2、Sequence to Sequence Learning with Neural Networks
3、Neural Machine Translation by Jointly Learning to Align and Translate
4、BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
5、Scaling Laws for Neural Language Models
6、Emergent Abilities of Large Language Models
7、Training Compute-Optimal Large Language Models (Chinchilla scaling law)
8、Scaling Instruction-Finetuned Language Models
9、Direct Preference Optimization: Your Language Model is Secretly a Reward Model
10、Progress measures for grokking via mechanistic interpretability
11、Language Models Represent Space and Time
12、GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
13、Adam: A Method for Stochastic Optimization
14、Efficient Estimation of Word Representations in Vector Space (Word2Vec)
15、Distributed Representations of Words and Phrases and their Compositionality