2024, Issue 02, Vol. 30, pp. 37-42
Dedicated Hardware Architecture for Large Language Models Based on Integrated Compute-in-Memory Chiplets
Foundation: National Natural Science Foundation of China (62322404); Fudan University-ZTE Joint Laboratory for Strong Computing Architecture Research, "Compute-in-Memory Architecture Research Project"
Abstract:

Artificial intelligence (AI) models represented by ChatGPT are growing exponentially in parameter count and in the system computing power they require. This paper studies dedicated hardware architectures for large models and analyzes in detail the bandwidth bottlenecks that large models face during deployment, as well as the significant impact of these bottlenecks on current data centers. To address this issue, a solution based on integrated compute-in-memory (CIM) chiplets is proposed, aiming to relieve data-transfer pressure while improving the energy efficiency of large-model inference. In addition, the possibility of co-designing lightweight pruning with in-memory compression under the CIM architecture is studied, so that sparse networks can be mapped densely onto CIM hardware, thereby significantly improving storage density and computational energy efficiency.
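To make the sparse-to-dense mapping idea concrete, a minimal sketch follows. It is not code from the paper: the crossbar size (ROWS, COLS), the column-pruning rule, and the helper names (prune_columns, dense_map, cim_matvec) are illustrative assumptions. It only shows how a structurally pruned weight matrix can be packed into dense CIM tiles together with an index table that preserves the original matrix-vector result.

import numpy as np

ROWS, COLS = 64, 64  # assumed CIM crossbar (macro) dimensions -- hypothetical

def prune_columns(w, keep_ratio=0.5):
    # Structured pruning: keep only the columns with the largest L2 norm.
    norms = np.linalg.norm(w, axis=0)
    k = int(w.shape[1] * keep_ratio)
    return np.sort(np.argsort(norms)[-k:])  # indices of surviving columns

def dense_map(w, keep):
    # Pack only the surviving columns into full CIM tiles; the index table
    # "keep" records where each packed column originally lived.
    packed = w[:, keep]
    return [packed[r:r + ROWS, c:c + COLS]
            for r in range(0, packed.shape[0], ROWS)
            for c in range(0, packed.shape[1], COLS)]

def cim_matvec(w, keep, x):
    # Equivalent dense mat-vec: gather the matching activations first.
    return w[:, keep] @ x[keep]

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128)).astype(np.float32)
x = rng.standard_normal(128).astype(np.float32)

keep = prune_columns(W, keep_ratio=0.5)
tiles = dense_map(W, keep)

# The packed mapping reproduces the pruned model's output exactly.
W_pruned = np.zeros_like(W)
W_pruned[:, keep] = W[:, keep]
assert np.allclose(cim_matvec(W, keep, x), W_pruned @ x, atol=1e-4)
print(f"CIM tiles used: {len(tiles)} (vs. {(128 // ROWS) * (128 // COLS)} without pruning)")

In this toy case, packing the surviving columns halves the number of CIM macros that must store weights, which is the kind of storage-density and energy-efficiency gain the abstract refers to.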

Basic information:

CLC number: TP18; TN40

Citation:

[1] HE Siqi, MU Chen, CHEN Chixiao. Dedicated hardware architecture for large language models based on integrated compute-in-memory chiplets [J]. ZTE Technology Journal (中兴通讯技术), 2024, 30(02): 37-42.
