Project Alpha investigates a specific hypothesis in LLM optimization: that a large reasoning model's chain-of-thought, captured at an early stopping point, can be handed off to a significantly smaller model — and that the smaller model, guided by the partial thinking trace, can arrive at the correct answer at a fraction of the inference cost.
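Mechanically, the handoff reduces to prompt assembly: capture the large model's thinking up to a token budget, trim it to a coherent stopping point, and present the partial trace to the smaller model as reasoning to continue. A minimal sketch of that idea follows; the function names, the sentence-boundary heuristic, and the whitespace tokenization are illustrative assumptions, not the project's actual implementation (which would use a real tokenizer and model-specific prompt formats).

```python
def truncate_trace(trace: str, max_tokens: int) -> str:
    """Cut a chain-of-thought trace at a token budget.

    Whitespace splitting stands in for a real tokenizer here.
    After cutting, back up to the last sentence boundary so the
    small model receives a coherent (if incomplete) line of reasoning.
    """
    tokens = trace.split()
    if len(tokens) <= max_tokens:
        return trace
    partial = " ".join(tokens[:max_tokens])
    cut = partial.rfind(".")
    return partial[: cut + 1] if cut != -1 else partial


def build_handoff_prompt(question: str, partial_trace: str) -> str:
    """Assemble the small model's prompt: the original question plus
    the large model's partial reasoning, framed as a continuation task."""
    return (
        f"Question: {question}\n\n"
        "Partial reasoning from a stronger model:\n"
        f"{partial_trace}\n\n"
        "Continue this reasoning and give the final answer."
    )


if __name__ == "__main__":
    trace = (
        "First we factor the expression. Then we check each candidate "
        "root. Finally we verify the result against the constraints."
    )
    partial = truncate_trace(trace, 8)
    print(build_handoff_prompt("What are the roots?", partial))
```

The sentence-boundary cut matters in practice: an arbitrary mid-sentence truncation can leave the small model completing a malformed thought rather than extending a valid one, which is one of the trace properties the benchmarks are designed to probe.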
We benchmark frontier and open-weight models across a variety of challenging evaluations to identify where this handoff works, where it breaks, and what properties of the thinking trace matter most for successful transfer. The goal is a practical framework for high-accuracy, low-cost inference that doesn't require running a massive model to completion on every query.