Efficient Reasoning with Balanced Thinking

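1. What is it?

The paper proposes ReBalance, a training-free framework that addresses both overthinking and underthinking in Large Reasoning Models (LRMs) in order to achieve efficient and accurate reasoning.
LRMs suffer from inefficiency and potential inaccuracy because they either overthink, spending redundant computational steps on simple problems, or underthink, failing to explore enough reasoning paths.
ReBalance uses confidence as a continuous indicator of reasoning dynamics and controls the reasoning process dynamically in real time.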

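2. What is better than prior work?

Existing countermeasures against overthinking (e.g., suppressing reflective keywords or adjusting reasoning length) can inadvertently induce underthinking and hurt accuracy.
ReBalance aims to mitigate overthinking and underthinking together, in a balanced way.
It identifies overthinking through high confidence variance and underthinking through consistent overconfidence, and applies an adjustment suited to each.
It is training-free and can be applied to existing LRMs in a plug-and-play manner, which makes it broadly applicable.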

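3. What is the core of the technique?

**Confidence-based identification of the reasoning state:** The method continuously monitors confidence during the LRM's reasoning and uses its pattern of change to identify the current thinking state (overthinking, underthinking, or balanced thinking).
High confidence variance is treated as a sign of overthinking, and consistent overconfidence as a sign of underthinking.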
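
The identification step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes confidence is the maximum softmax probability of a decoding step, and the names `step_confidence` and `classify_state` as well as the thresholds `var_threshold` and `high_conf` are hypothetical.

```python
import math
from statistics import mean, pvariance


def step_confidence(logits):
    """Max softmax probability of one decoding step (an assumed confidence measure)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)


def classify_state(confidences, var_threshold=0.02, high_conf=0.9):
    """Label a window of step confidences; thresholds are illustrative only."""
    if pvariance(confidences) > var_threshold:
        return "overthinking"   # confidence swings widely across steps
    if mean(confidences) > high_conf:
        return "underthinking"  # confidence is high and barely moves
    return "balanced"
```

For example, a window such as `[0.95, 0.30, 0.90, 0.20]` has high variance and is labeled as overthinking under these assumed thresholds, whereas `[0.97, 0.98, 0.99, 0.97]` is labeled as underthinking.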
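**Reasoning mode prototypes and steering vector:** Hidden states of the LRM collected on a small-scale dataset are aggregated into "prototypes" of the different reasoning modes (overthinking, underthinking, balanced thinking). These prototypes are used to compute a "steering vector" that guides the current reasoning trajectory in the desired direction.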
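
One plausible way to realize this step is shown below. The abstract does not state how the aggregation or the vector is computed, so this sketch assumes that a prototype is the element-wise mean of hidden states and that the steering vector is the normalized difference between two prototypes. The function names `prototype` and `steering_vector` are hypothetical.

```python
import math


def prototype(hidden_states):
    """Element-wise mean of equal-length hidden-state vectors (assumed aggregation)."""
    n = len(hidden_states)
    return [sum(column) / n for column in zip(*hidden_states)]


def steering_vector(proto_from, proto_to):
    """Unit-norm direction pointing from one prototype toward another."""
    diff = [b - a for a, b in zip(proto_from, proto_to)]
    norm = math.sqrt(sum(d * d for d in diff))
    # Return the zero vector unchanged if both prototypes coincide.
    return [d / norm for d in diff] if norm > 0 else diff
```

In practice the hidden states would be taken from a chosen layer of the model; plain Python lists are used here only to keep the sketch dependency-free.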
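**Dynamic control function:** The strength and direction of the steering vector are adjusted based on confidence computed in real time.
During overthinking, redundant reasoning paths are pruned; during underthinking, further exploration is promoted.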
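
A control function of this kind might look like the sketch below. It is only an illustration under stated assumptions: the steering direction is taken to point toward more exploration, so a negative coefficient prunes and a positive one promotes exploration. The names `control_coefficient` and `steer`, the thresholds, and `max_strength` are all hypothetical and not taken from the paper.

```python
def control_coefficient(conf_mean, conf_var,
                        var_threshold=0.02, high_conf=0.9, max_strength=4.0):
    """Signed steering strength derived from recent confidence statistics."""
    if conf_var > var_threshold:
        # Overthinking: steer against exploration, scaled by the excess variance.
        excess = (conf_var - var_threshold) / var_threshold
        return -max_strength * min(1.0, excess)
    if conf_mean > high_conf:
        # Underthinking: steer toward exploration, scaled by the overconfidence.
        return max_strength * (conf_mean - high_conf) / (1.0 - high_conf)
    return 0.0  # balanced: leave the hidden state untouched


def steer(hidden, direction, alpha):
    """Add the scaled steering direction to a hidden-state vector."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

With these assumed defaults, `control_coefficient(0.5, 0.06)` saturates at `-4.0`, and `steer([1.0, 1.0], [1.0, 0.0], -4.0)` returns `[-3.0, 1.0]`.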

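4. How was it validated?

Extensive experiments were conducted on four models ranging from 0.5B to 32B parameters.
The evaluation covers nine benchmarks spanning math reasoning, general question answering, and coding tasks.
The results show that ReBalance effectively reduces redundancy in model outputs while also improving reasoning accuracy.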

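5. Is there any discussion?

The abstract contains no explicit discussion. Plausible open issues are that the best definition and computation of confidence may differ across models and tasks, and that the choice of the small-scale dataset used to build the reasoning mode prototypes may affect performance.
Although the method is training-free, it still needs additional modules or configuration for building prototypes and computing confidence, so it is not entirely cost-free.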

6. What should be read next?

Papers on uncertainty quantification and confidence estimation for LLMs.
Papers on methods that control or guide the LLM reasoning process (e.g., Tree of Thoughts, Self-Refine).
Papers on LLM efficiency (inference speed, reduced compute).

Abstract (original)

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking, compromising accuracy. Therefore, we propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking via consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector to guide LRMs' reasoning trajectories. A dynamic control function modulates this vector's strength and direction based on real-time confidence, pruning redundancy during overthinking, and promoting exploration during underthinking. Extensive experiments conducted on four models ranging from 0.5B to 32B, and across nine benchmarks in math reasoning, general question answering, and coding tasks demonstrate that ReBalance effectively reduces output redundancy while improving accuracy, offering a general, training-free, and plug-and-play strategy for efficient and robust LRM deployment. Code is available at https://github.com/yu-lin-li/ReBalance .
