Abstract: As the scale of model parameters continues to grow, the computational resources required for model training have become significantly larger, and in many cases a single computing cluster can no longer meet the training needs of large-scale language models. Cross-cloud joint training of large-scale language models has emerged as an effective way to address this challenge. Taking the cross-cloud pre-training and fine-tuning of large natural language processing models as examples, this paper introduces the main challenges and key technologies involved in cross-cloud training of large-scale language models, and discusses the specific applications, practical effects, and future scenarios of these technologies in the cross-cloud training process. These technologies will provide strong support for intelligent applications and human-computer interaction.
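As a concrete illustration of the cross-cloud joint training idea summarized above, the following minimal Python sketch (not taken from the paper; all variable names, data, and hyperparameters such as sync_every are illustrative assumptions) simulates two clusters that each train a local replica of a simple linear model on their own data shard and exchange averaged parameters only every few steps, in the spirit of local-SGD-style periodic synchronization for coping with limited inter-cluster bandwidth.

# Minimal sketch (illustrative only): two "clusters" train local replicas on
# their own data shards and synchronize by parameter averaging every
# sync_every steps, mimicking low-bandwidth cross-cloud training.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # model dimension
w_true = rng.normal(size=d)               # ground-truth weights for synthetic data

def make_shard(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

shards = [make_shard(256), make_shard(256)]   # one data shard per cluster
weights = [np.zeros(d), np.zeros(d)]          # per-cluster model replicas
lr, steps, sync_every = 0.05, 100, 10

for step in range(1, steps + 1):
    # Local least-squares gradient step on each cluster (no cross-cloud traffic).
    for k, (X, y) in enumerate(shards):
        grad = 2.0 / len(y) * X.T @ (X @ weights[k] - y)
        weights[k] -= lr * grad
    # Infrequent cross-cloud synchronization: average parameters across clusters.
    if step % sync_every == 0:
        avg = sum(weights) / len(weights)
        weights = [avg.copy() for _ in weights]

print("max deviation from true weights:", float(np.max(np.abs(weights[0] - w_true))))

Raising sync_every reduces cross-cloud communication at the cost of letting the per-cluster replicas drift further apart between synchronizations.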
Basic information:
DOI:
CLC number: TP391.1; TP18
Citation:
PAN Youcheng, HOU Yongshuai, YANG Qing, et al. Key technologies for cross-cloud joint training of large-scale language models [J]. ZTE Technology Journal, 2023, 29(4): 49-56.
Funding:
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0115301)