2022, Vol. 28, No. 02, pp. 10-15
Knowledge-Guided Pre-trained Language Models
Abstract:

As a typical data-driven method, pre-trained language models (PLMs) still face challenges such as limited interpretability and poor robustness. Introducing the rich knowledge accumulated by humans is therefore an important direction for improving the performance of PLMs. This article surveys the latest progress and trends in knowledge-guided PLMs and summarizes their typical paradigms, including knowledge augmentation, knowledge support, knowledge regularization, and knowledge transfer, explaining the important role of knowledge for PLMs from the perspectives of input, computation, training, and parameter space.
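To make the knowledge augmentation paradigm concrete, the minimal sketch below (a hypothetical illustration, not taken from the article or from any system it surveys) prepends facts retrieved from a toy knowledge store to the input of an off-the-shelf masked language model; the knowledge store, the string-matching retriever, and the example sentence are all assumptions made for illustration.

```python
# Hypothetical sketch of knowledge augmentation: facts retrieved for the
# input are concatenated to the text before it is given to a pre-trained
# masked language model. The tiny knowledge store and the string-matching
# "retriever" below are illustrative assumptions only.
from transformers import pipeline

# Toy knowledge store: entity name -> descriptive fact.
KNOWLEDGE = {
    "Bob Dylan": "Bob Dylan is an American singer-songwriter.",
}

def retrieve_knowledge(text: str) -> str:
    """Return facts whose key entity literally appears in the input text."""
    return " ".join(fact for entity, fact in KNOWLEDGE.items() if entity in text)

def knowledge_augmented_fill(masked_text: str):
    """Prepend retrieved knowledge to the input, then query a masked LM."""
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    augmented = f"{retrieve_knowledge(masked_text)} {masked_text}".strip()
    return fill_mask(augmented)

if __name__ == "__main__":
    # With the prepended fact, predicting the masked word can draw on the
    # injected knowledge as well as on the model's parameters.
    print(knowledge_augmented_fill("Bob Dylan is a famous [MASK]."))
```

The other three paradigms named in the abstract act elsewhere: knowledge support on the model's computation, knowledge regularization on its training objective, and knowledge transfer on its parameter space, rather than on the input text.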

Basic Information:

CLC Number: TP391.1

Citation:

[1] HAN Xu, ZHANG Zhengyan, LIU Zhiyuan. Knowledge-guided pre-trained language models [J]. 中兴通讯技术 (ZTE Technology Journal), 2022, 28(02): 10-15.

