[Speculative Decoding Series] Core Paper 2: Accelerating Large Language Model Decoding with Speculative Sampling
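“Accelerating Large Language Model Decoding with Speculative Sampling” is the second paper covered in this series of walkthroughs of the core speculative decoding papers. It was likewise published by the Google DeepMind team in 2023. It differs from the previous paper, “Fast Inference from Transformers via Speculative Decoding”, in that the previous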