Article Summaries

Japanese-language summaries of overseas articles and papers: 111 entries

Paper Summaries
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Paper, arXiv 2603.28767
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
Paper, arXiv 2605.14678
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
Paper, arXiv 2605.21572
Toto 2.0: Time Series Forecasting Enters the Scaling Era
Paper, arXiv 2605.20119
Active Learners as Efficient PRP Rerankers
Paper, arXiv 2605.14236
Auditing Agent Harness Safety
Paper, arXiv 2605.14271
MMSkills: Towards Multimodal Skills for General Visual Agents
Paper, arXiv 2605.13527
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
Paper, arXiv 2605.14906
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
Paper, arXiv 2605.14892
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
Paper, arXiv 2605.15128
FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
Paper, arXiv 2605.13757
World Model for Robot Learning: A Comprehensive Survey
Paper, arXiv 2605.00080
SEIF: Self-Evolving Reinforcement Learning for Instruction Following
Paper, arXiv 2605.07465
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
Paper, arXiv 2605.04615
Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
Paper, arXiv 2605.05242
Audio-Visual Intelligence in Large Foundation Models
Paper, arXiv 2605.04045
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
Paper, arXiv 2605.03042
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
Paper, arXiv 2604.28075
Representation Fréchet Loss for Visual Generation
Paper, arXiv 2604.28190
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
Paper, arXiv 2604.26752
Heterogeneous Scientific Foundation Model Collaboration
Paper, arXiv 2604.27351
ClawGym: A Scalable Framework for Building Effective Claw Agents
Paper, arXiv 2604.26904
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
Paper, arXiv 2604.24819
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
Paper, arXiv 2604.22446
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
Paper, arXiv 2604.23781
LLM Safety From Within: Detecting Harmful Content with Internal Representations
Paper, arXiv 2604.18519
Exploring Spatial Intelligence from a Generative Perspective
Paper, arXiv 2604.20570
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
Paper, arXiv 2604.19859
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
Paper, arXiv 2604.19734
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
Paper, arXiv 2604.15093
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
Paper, arXiv 2604.19254
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Paper, arXiv 2604.18486
PersonaVLM: Long-Term Personalized Multimodal LLMs
Paper, arXiv 2604.13074
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
Paper, arXiv 2604.15308
Exploration and Exploitation Errors Are Measurable for Language Model Agents
Paper, arXiv 2604.13151
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
Paper, arXiv 2604.09557
WildDet3D: Scaling Promptable 3D Detection in the Wild
Paper, arXiv 2604.08626
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
Paper, arXiv 2604.08546
ClawBench: Can AI Agents Complete Everyday Online Tasks?
Paper, arXiv 2604.08523
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
Paper, arXiv 2604.07430
General Multimodal Protein Design Enables DNA-Encoding of Chemistry
Paper, arXiv 2604.05181
LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
Paper, arXiv 2603.28301
SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
Paper, arXiv 2604.04911
ASI-Evolve: AI Accelerates AI
Paper, arXiv 2603.29640
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
Paper, arXiv 2604.02029
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery
Paper, arXiv 2604.01658
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Paper, arXiv 2604.02268
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Paper, arXiv 2603.28032
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
Paper, arXiv 2603.27460
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
Paper, arXiv 2603.25716
UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience
Paper, arXiv 2603.24533
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
Paper, arXiv 2603.24440
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
Paper, arXiv 2603.23516
EVA: Efficient Reinforcement Learning for End-to-End Video Agent
Paper, arXiv 2603.22918
Repurposing Geometric Foundation Models for Multi-view Diffusion
Paper, arXiv 2603.22275
Manifold-Aware Exploration for Reinforcement Learning in Video Generation
Paper, arXiv 2603.21872
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction
Paper, arXiv 2603.19231
FASTER: Rethinking Real-Time Flow VLAs
Paper, arXiv 2603.19199
3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
Paper, arXiv 2603.18524
Complementary Reinforcement Learning
Paper, arXiv 2603.17621
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
Paper, arXiv 2603.13594
Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
Paper, arXiv 2603.13366
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
Paper, arXiv 2603.12793
Efficient Reasoning with Balanced Thinking
Paper, arXiv 2603.12372
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
Paper, arXiv 2603.12255
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
Paper, arXiv 2603.12247
GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics
Paper, arXiv 2603.11442
ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation
Paper, arXiv 2603.11421
Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future
Paper, arXiv 2603.11299
Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
Paper, arXiv 2603.11214
FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System
Paper, arXiv 2603.10420
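Article and Blog Summaries
Hengefinder: Finding the Days When the Sun Aligns with a Street
HN, 2026/05/24
Project Glasswing: How AI Vulnerability Discovery Shifts the Rate-Limiting Step of Maintenance
HN, 2026/05/23
The Virtual OS Museum: A Hands-On Museum of OS History
HN, 2026/05/20
Leaving Tailwind to Learn CSS Structure: Notes on a Julia Evans Article
HN, 2026/05/17
Codex mobile and Removing RAV4 Telematics
HN, 2026/05/15
Flow Maps: Mappings That "Traverse the Whole Path in One Step" for Diffusion Models (Sander Dieleman)
Tech, 2026/05/06
Copy Fail: A "Straight-Line Logic Flaw" in the Linux Kernel Gives Anyone Root
Security, CVE-2026-31431
FlashPortrait: 6x Faster Infinite Portrait Animation
Tech
A 3D Gaussian Splatting Standard Is Added to glTF
Tech
Top 5 Popular Blogs on HN in 2025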
HN
Is MCP Unnecessary? The Case That CLIs Suit AI Better
Tech
The People Working Behind Meta's Smart Glasses "See Everything"
Society
Neural Networks: Zero to Hero
Education
Stack Overflow Question Volume Down 95%
Industry
Building a Voice Agent with Sub-500ms Latency from Scratch
Tech
The suck is why we're here
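Essay
Sir Tony Hoare Has Died: A Giant of Computer Science Departs
In Memoriam
WebMCP: Standardizing Web Access for AI Agents
Tech
Astral (Developer of uv/ruff) Joins OpenAI
Article
Autoresearch: Automatically Running Experiments on Old Research Ideas with AI Agents
Article
Flash-MoE: Running a 397B-Parameter Model on a Laptop
Article
"flash-moe": Running a 400B-Parameter LLM on an iPhone 17 Pro
Article
KittenTTS: An Ultra-Compact Text-to-Speech Model of 25MB or Less
Article
A Complete Record of Building a Local Voice Assistant: The Road from Google Home to Fully Local
Article
OpenRocket: A Free, Open-Source Model Rocket Simulator
Article
Python 3.15's JIT Compiler Is Progressing Well
Article
Rob Pike's 5 Rules of Programming (1989)
Article
Life Can Take Hold Even Without a Sun (Space.com Article)
Article
RollerCoaster Tycoon: The Gold Standard of Optimization
Article
The "Small Web" Is Bigger Than You Think
Article
Stravaleaks: French Aircraft Carrier Located via a Fitness App
Article
Manyana: The Future of Version Control with CRDTs
Article
Wander: A Small Decentralized Tool for Exploring the Small Web
Article
Xbox One Has Finally Been Hacked: The Voltage-Glitching Attack That Broke a 13-Year Barrier
Article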
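Asimov's "The Last Question": Can Entropy Be Reversed?
Article
Claude Design: Anthropic's AI Design Tool
Article
Why Are Humans Smart? A Commentary on Henrich's "Culture Drove Human Evolution"
Article
smol machines: Lightweight VMs with Sub-Second Cold Starts
Article
On Reading "The World as Seen by Living Creatures": Is Your World Really "the World"?
Reading
Nullsoft 1997-2004: The Death of the Last Maverick Company (Slate 2004), Japanese Summary
Article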