Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

About this Event

Deriving compute-efficient methods for steering LLMs toward high-reward outputs at inference time is an important line of research in test-time scaling. In this talk, Harvard’s Jonathan Geuter will introduce Guided Speculative Inference (GSI), a new algorithm that uses speculative drafts from a small auxiliary model and a reward-likelihood tilt to provably approximate the optimal reward-regularized policy of a larger model. He will begin by motivating test-time scaling and reviewing prior approaches like soft best-of-n and reward-guided speculative decoding. Then he’ll describe the GSI algorithm, its theoretical guarantees, and its strong empirical gains on reasoning benchmarks.
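For readers who have not seen soft best-of-n, a minimal sketch may help: sample n candidate responses, then pick one with probability proportional to exp(beta * reward). The optional `log_ratio` term is an assumption about what a "reward-likelihood tilt" could look like, namely adding log p_large(y) - log p_small(y) to the score of a draft y sampled from the small model. The function names and the toy inputs are hypothetical, and this is not the GSI algorithm as presented in the talk or paper.

```python
import math
import random


def tilted_weights(rewards, beta, log_ratio=None):
    """Normalized selection probabilities for soft best-of-n.

    rewards:   one reward-model score per candidate.
    beta:      inverse temperature; 0 gives uniform choice, large values
               approach hard best-of-n.
    log_ratio: optional per-candidate log p_large(y) - log p_small(y)
               (an assumed form of a likelihood tilt; defaults to no tilt).
    """
    if log_ratio is None:
        log_ratio = [0.0] * len(rewards)
    scores = [beta * r + lr for r, lr in zip(rewards, log_ratio)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def soft_best_of_n(candidates, rewards, beta, log_ratio=None, rng=random):
    """Sample one candidate according to the tilted weights."""
    probs = tilted_weights(rewards, beta, log_ratio)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    drafts = ["draft A", "draft B", "draft C"]  # toy stand-ins for model outputs
    scores = [0.1, 0.9, 0.5]                    # toy reward values
    print(soft_best_of_n(drafts, scores, beta=5.0, rng=random.Random(0)))
```

With beta set to 0 every candidate is equally likely, and as beta grows the choice concentrates on the highest-scoring candidate.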

[PAPER]

Speaker

Jonathan Geuter is a PhD student in Applied Mathematics at Harvard University. He previously earned Bachelor’s and Master’s degrees in Mathematics from TU Berlin. His research interests lie in statistical machine learning, optimal transport, generative modelling, LLMs, and test-time scaling methods.