2、Sequence to Sequence Learning with Neural Networks
3、Neural Machine Translation by Jointly Learning to Align and Translate
4、BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
5、Scaling Laws for Neural Language Models
6、Emergent Abilities of Large Language Models
7、Training Compute-Optimal Large Language Models (Chinchilla scaling law)
8、Scaling Instruction-Finetuned Language Models
9、Direct Preference Optimization: Your Language Model is Secretly a Reward Model
10、Progress measures for grokking via mechanistic interpretability
11、Language Models Represent Space and Time
12、GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
13、Adam: A Method for Stochastic Optimization
14、Efficient Estimation of Word Representations in Vector Space (Word2Vec)
15、Distributed Representations of Words and Phrases and their Compositionality