Heterogeneous Scientific Foundation Model Collaboration

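1. What is it?

Eywa is a heterogeneous agentic framework that lets domain-specific scientific foundation models (FMs) collaborate with large language models (LLMs).
It draws on the capabilities of domain-specific FMs for the non-linguistic data and specialized tasks that LLMs have traditionally handled poorly.
Each domain-specific FM is augmented with an LLM-based reasoning interface so that the LLM can guide inference over non-linguistic data.
As a result, specialized FMs are not merely called as tools; they participate directly in the higher-level reasoning and decision-making of the agentic system.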
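The idea of giving a predictive model a language-facing reasoning interface can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `ToyForecaster`, `stub_llm`, and `ReasoningInterface` are hypothetical names, the "foundation model" is a moving-average stand-in, and the LLM is replaced by a deterministic stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ToyForecaster:
    """Hypothetical stand-in for a domain-specific foundation model.

    It consumes raw numeric data (not text) and returns a numeric prediction.
    """

    def predict(self, series: Sequence[float], window: int) -> float:
        # Moving average of the last `window` points as a trivial "forecast".
        tail = list(series)[-window:]
        return sum(tail) / len(tail)


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for an LLM call (assumption: a real system
    would query a language model here)."""
    # Pretend the LLM picks a short window for "short-term" questions.
    return "window=2" if "short-term" in prompt else "window=4"


@dataclass
class ReasoningInterface:
    """Wraps a non-linguistic model so that a language model can steer it."""

    model: ToyForecaster
    llm: Callable[[str], str]

    def answer(self, question: str, series: Sequence[float]) -> str:
        # 1. The LLM reads the question and decides how to configure inference.
        decision = self.llm(f"Question: {question}\nChoose a window size.")
        window = int(decision.split("=")[1])
        # 2. The domain model runs on the raw numeric data.
        value = self.model.predict(series, window)
        # 3. The numeric result is returned in a language-readable form.
        return f"forecast={value:.2f} (window={window})"


agent = ReasoningInterface(ToyForecaster(), stub_llm)
result = agent.answer("What is the short-term trend?", [1.0, 2.0, 3.0, 5.0])
print(result)
```

The point of the sketch is the division of labor: the language side decides how inference should be run and reports the outcome, while the numeric prediction itself comes from the specialized model.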

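2. How does it improve on prior work?

Earlier agentic LLM systems are language-centric, which limits how well they apply to specialized scientific tasks and non-linguistic data.
Eywa addresses this limitation by integrating specialized FMs, which do not rely on language, into LLM-based agentic systems.
The key advance is that specialized FMs keep the predictive power they were optimized for on their own data and tasks while taking part directly in higher-level reasoning.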

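3. What is the core of the technique?

**Language-model-based reasoning interface**: An interface is built around each domain-specific FM so that the LLM can understand what the FM does with non-linguistic data and issue instructions to it.
**Orchestration of heterogeneous agents**, in three configurations:
**EywaAgent**: A drop-in replacement for a single-agent pipeline that handles a specialized task.
**EywaMAS**: Integrated into an existing multi-agent system by replacing conventional LLM agents with specialized Eywa agents.
**EywaOrchestra**: A planning-based framework in which a planner dynamically coordinates conventional LLM agents and Eywa agents to solve complex tasks that span heterogeneous data modalities.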
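The planning-based coordination described above can be sketched as a planner that routes each subtask to an agent suited to its data modality. This is a toy illustration only: the abstract does not specify the planner's algorithm, so `detect_modality`, `orchestrate`, and both agents are hypothetical, and the routing rule (inspecting the Python type of the payload) is an assumption standing in for an LLM-driven planner.

```python
from typing import Any, Callable, Dict, List, Tuple

# An agent maps a payload to a text result. Both agents are toy stand-ins.
Agent = Callable[[Any], str]


def language_agent(payload: Any) -> str:
    """Stand-in for a conventional LLM agent that handles text."""
    return f"summary of {len(str(payload).split())} words"


def tabular_agent(payload: Any) -> str:
    """Stand-in for an Eywa-style agent wrapping a tabular model."""
    rows = list(payload)  # expects an iterable of numeric rows
    total = sum(sum(row) for row in rows)
    return f"tabular prediction from {len(rows)} rows (sum={total})"


def detect_modality(payload: Any) -> str:
    """Toy routing rule (assumption): strings are text, anything else is a table."""
    return "text" if isinstance(payload, str) else "table"


def orchestrate(
    subtasks: List[Tuple[str, Any]], registry: Dict[str, Agent]
) -> List[str]:
    """Send each subtask to the agent registered for its modality."""
    results = []
    for name, payload in subtasks:
        agent = registry[detect_modality(payload)]
        results.append(f"{name}: {agent(payload)}")
    return results


registry = {"text": language_agent, "table": tabular_agent}
subtasks = [
    ("describe", "patient reports mild fever"),
    ("predict", [[1, 2], [3, 4]]),
]
outputs = orchestrate(subtasks, registry)
for line in outputs:
    print(line)
```

In this sketch, replacing an entry in `registry` with a wrapped specialized model loosely mirrors the EywaMAS idea of swapping a conventional agent for a specialized one without changing the rest of the system.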

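4. How was it validated?

Eywa was evaluated on a range of tasks across diverse scientific domains spanning the physical, life, and social sciences.
The experiments show that Eywa improves performance on tasks involving structured and domain-specific data.
They also show that effective collaboration with specialized FMs reduces reliance on language-based reasoning.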

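5. Is there any discussion?

The abstract does not discuss limitations directly, but the following general challenges seem likely:
Standardizing the interface across different domain-specific FMs, and making the LLM robust at interpreting FM outputs and generating appropriate instructions.
The complexity, efficiency, and error handling of planning in a planning-based orchestrator such as EywaOrchestra.
Scalability and computational cost when integrating many heterogeneous FMs.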

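6. What should be read next?

Papers on connecting LLMs to external tools, such as "Toolformer: Language Models Can Teach Themselves to Use Tools" and "Gorilla: Large Language Model Connected with Massive APIs".
Papers on the current state and challenges of foundation models in science, such as "Scientific Foundation Models: A Survey of Challenges and Opportunities".
Papers on planning and coordination in multi-agent systems.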

Abstract (original)

Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world problems, especially in scientific domains where domain-specific foundation models have been developed to address specialized tasks beyond natural language. In this work, we introduce Eywa, a heterogeneous agentic framework designed to extend language-centric systems to a broader class of scientific foundation models. The key idea of Eywa is to augment domain-specific foundation models with a language-model-based reasoning interface, enabling language models to guide inference over non-linguistic data modalities. This design allows predictive foundation models, which are typically optimized for specialized data and tasks, to participate in higher-level reasoning and decision-making processes within agentic systems. Eywa can serve as a drop-in replacement for a single-agent pipeline (EywaAgent) or be integrated into existing multi-agent systems by replacing traditional agents with specialized agents (EywaMAS). We further investigate a planning-based orchestration framework in which a planner dynamically coordinates traditional agents and Eywa agents to solve complex tasks across heterogeneous data modalities (EywaOrchestra). We evaluate Eywa across a diverse set of scientific domains spanning physical, life, and social sciences. Experimental results demonstrate that Eywa improves performance on tasks involving structured and domain-specific data, while reducing reliance on language-based reasoning through effective collaboration with specialized foundation models.

Code for this paper is available.