Downloads: 5,808 | Citations: 51 | Reads: 103
Abstract: Pre-trained language models based on neural networks and deep learning have brought breakthrough progress to natural language processing. The Transformer model, built on the self-attention mechanism, is the foundation of pre-trained language models; large-scale pre-trained language models such as GPT, BERT, and XLNet are all obtained by stacking and optimizing Transformer architectures. However, today's large-scale pre-trained language models, which rely on powerful computing resources and massive data, face practical problems, and lightweight pre-trained language models are identified as an important direction for future development.
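The abstract names self-attention as the mechanism underlying the Transformer, and hence models such as GPT, BERT, and XLNet. As a rough illustration only (not code from the article), the minimal NumPy sketch below shows scaled dot-product self-attention in the sense of Vaswani et al.'s "Attention is all you need"; the function name, the projection matrices w_q, w_k, w_v, and all dimensions are illustrative assumptions.

import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token representations; w_q/w_k/w_v: (d_model, d_k) projections.
    q = x @ w_q                                        # queries
    k = x @ w_k                                        # keys
    v = x @ w_v                                        # values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                    # pairwise token-to-token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v                                 # each output is a weighted mix of value vectors

# Tiny usage example with random data (shapes are arbitrary choices).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                            # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(scaled_dot_product_self_attention(x, w_q, w_k, w_v).shape)   # (4, 8)

Stacking this operation with multiple heads, residual connections, and feed-forward layers yields the Transformer blocks from which the pre-trained models discussed in the abstract are built.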
Basic information:
DOI:
CLC number: TP391.1
Citation:
[1] WANG Haining. Development of natural language processing technology [J]. 中兴通讯技术 (ZTE Technology Journal), 2022, 28(2): 59-64.
Funding information: