
The Verbosity Premium: What RLHF-Induced Token Inflation Costs the AI Industry

John Kearney · 15 Research Lab · March 31, 2026 · DOI: 10.5281/zenodo.19346709
Zenodo preprint

Abstract

Alignment training systematically lengthens language model outputs. We aggregate measurements from more than 30 published studies to quantify this effect across 14 models, finding verbosity compensation rates (the fraction of output tokens that can be removed without information loss) ranging from 13.6% (Llama-3-70B) to 74.2% (Mistral-7B). Because output tokens cost 4–8× more than input tokens at major providers, this inflation carries a direct economic cost; our central estimate is ~$1.2 billion annually, approximately 14% of global inference spend. In one striking case, 98% of PPO reward improvement on WebGPT is attributable to length alone. All 12 mitigations identified in the literature target response length rather than information density. We argue that optimizing for information density (facts per token) is a more principled alternative.

Key Findings

  • $1.2 billion annual cost. RLHF-induced verbosity accounts for ~14% of global inference spend.
  • 13.6%–74.2% verbosity compensation rates across 14 models. Mistral-7B is worst; Llama-3-70B is most efficient.
  • 98% of PPO reward improvement on WebGPT is attributable to length alone (Singhal et al., 2024).
  • DPO doubles response length within the first 10% of training (Park et al., 2024).
  • Average completions nearly tripled: ~150 to ~400 tokens from early 2024 to late 2025.
  • All 12 existing mitigations target response length, not information density. Wrong target.
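The cost claim above is a simple product of token volume, output-token price, and the compensation rate. A minimal sketch of that arithmetic follows; the function name and the workload figures are illustrative assumptions of ours, and only the 13.6%–74.2% compensation-rate range comes from the paper.

```python
# Back-of-envelope estimate of the "verbosity premium": spend on output
# tokens that could be compressed away without information loss.
# Workload and price figures below are illustrative, not from the paper.

def verbosity_premium(output_tokens: float,
                      price_per_output_token: float,
                      compensation_rate: float) -> float:
    """Dollars spent on compressible output tokens.

    compensation_rate: fraction of output tokens removable without
    information loss (the paper reports 13.6%-74.2% across 14 models).
    """
    return output_tokens * price_per_output_token * compensation_rate

# Hypothetical workload: 1B output tokens/month at $15 per 1M tokens,
# with a mid-range 30% verbosity compensation rate.
monthly_waste = verbosity_premium(1_000_000_000, 15 / 1_000_000, 0.30)
print(f"${monthly_waste:,.0f} per month")  # $4,500 per month
```

At these assumed rates roughly a third of output spend is compressible overhead, which is how per-deployment waste aggregates toward the paper's industry-wide estimate.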

Keywords

RLHF, language model alignment, verbosity, token efficiency, inference cost, information density, preference optimization, DPO

Citation

@article{kearney2026verbosity,
  title   = {The Verbosity Premium: What RLHF-Induced Token
             Inflation Costs the AI Industry},
  author  = {Kearney, John},
  year    = {2026},
  doi     = {10.5281/zenodo.19346709},
  url     = {https://doi.org/10.5281/zenodo.19346709},
  publisher = {Zenodo}
}

Operationalized in Authensor: The RCI framework from this paper informs how Authensor's policy engine evaluates output quality constraints and budget enforcement for token-level spending controls.
