Speculative Decoding

Speeding up generation with a small draft model whose proposed tokens the large model verifies.

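Speculative decoding uses a small, fast draft model to propose several tokens ahead, generating them one after another. The large target model then scores all of the proposed positions in a single parallel pass and accepts the longest prefix that passes its acceptance test, replacing the first rejected token with one of its own. Each round therefore yields at least one token, and several whenever the two models agree. With the standard rejection-sampling acceptance rule, the output distribution is exactly that of the target model. This can cut latency substantially, though the saving depends on how often the drafted tokens are accepted.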
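
The propose-then-verify round described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a production implementation: the two "models" are hypothetical functions (toy_draft, toy_target) that return a fixed probability list over a three-token vocabulary, and the target is called once per position here, whereas a real system would score every position in one batched forward pass. The acceptance test is the usual rejection-sampling rule: keep a drafted token with probability min(1, p/q), and on rejection resample from the normalized positive part of p - q.

```python
import random


def sample(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point rounding


def speculative_step(prefix, draft_probs, target_probs, k, rng):
    """Run one round: draft k tokens, verify them, return the tokens to append."""
    # 1. The draft model proposes k tokens autoregressively.
    ctx = list(prefix)
    drafted, q_dists = [], []
    for _ in range(k):
        q = draft_probs(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2. The target model scores all k + 1 positions.
    #    (A loop here; a real system would do this in one parallel pass.)
    p_dists = [target_probs(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept drafted tokens left to right until the first rejection.
    out = []
    for i, tok in enumerate(drafted):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q), renormalized.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        total = sum(residual)
        out.append(sample([r / total for r in residual], rng))
        return out

    # Every draft was accepted, so the target contributes one extra token.
    out.append(sample(p_dists[k], rng))
    return out


# Hypothetical toy models: fixed distributions that ignore the context.
def toy_target(ctx):
    return [0.6, 0.3, 0.1]


def toy_draft(ctx):
    return [0.4, 0.4, 0.2]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = []
    while len(tokens) < 20:
        tokens.extend(speculative_step(tokens, toy_draft, toy_target, k=4, rng=rng))
    print(tokens)
```

Each call returns between 1 and k + 1 tokens. With these toy distributions, a drafted token is accepted with probability 0.4 + 0.3 + 0.1 = 0.8, and the remaining 0.2 of rejected mass is resampled entirely onto token 0, so the first emitted token follows the target's [0.6, 0.3, 0.1] exactly. That is the sense in which the output distribution is unchanged.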