Abstract: As the scale of model parameters continues to grow, the computational resources required for model training have become significantly larger, and in many cases a single computing cluster can no longer meet the training needs of large-scale language models. Cross-cloud joint training of large-scale language models has emerged as an effective way to address this challenge. Taking the cross-cloud pre-training and fine-tuning of large natural language processing models as examples, this paper introduces the main challenges and key technologies involved in cross-cloud training of large-scale language models, and discusses the specific applications, practical effects, and future scenarios of these technologies in the cross-cloud training process. These technologies will provide strong support for intelligent applications and human-computer interaction.
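As a concrete illustration of the cross-cloud joint training idea summarized above, the following minimal Python sketch (not taken from the paper; all variable names, data, and hyperparameters such as sync_every are illustrative assumptions) simulates two clusters that each train a local replica of a simple linear model on their own data shard and exchange averaged parameters only every few steps, in the spirit of local-SGD-style periodic synchronization for coping with limited inter-cluster bandwidth.

# Minimal sketch (illustrative only): two "clusters" train local replicas on
# their own data shards and synchronize by parameter averaging every
# sync_every steps, mimicking low-bandwidth cross-cloud training.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # model dimension
w_true = rng.normal(size=d)               # ground-truth weights for synthetic data

def make_shard(n):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

shards = [make_shard(256), make_shard(256)]   # one data shard per cluster
weights = [np.zeros(d), np.zeros(d)]          # per-cluster model replicas
lr, steps, sync_every = 0.05, 100, 10

for step in range(1, steps + 1):
    # Local least-squares gradient step on each cluster (no cross-cloud traffic).
    for k, (X, y) in enumerate(shards):
        grad = 2.0 / len(y) * X.T @ (X @ weights[k] - y)
        weights[k] -= lr * grad
    # Infrequent cross-cloud synchronization: average parameters across clusters.
    if step % sync_every == 0:
        avg = sum(weights) / len(weights)
        weights = [avg.copy() for _ in weights]

print("max deviation from true weights:", float(np.max(np.abs(weights[0] - w_true))))

Raising sync_every reduces cross-cloud communication at the cost of letting the per-cluster replicas drift further apart between synchronizations.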
Basic information:
DOI:
CLC number: TP391.1; TP18
Citation:
PAN Youcheng, HOU Yongshuai, YANG Qing, et al. Key technologies for cross-cloud joint training of large-scale language models [J]. ZTE Technology Journal, 2023, 29(4): 49-56.
Funding:
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0115301)