arXiv ML/AI/CV Papers Summary

Theme 1: Foundation Models for Specialized Domains

The era of “one-size-fits-all” transformers is giving way to domain-native foundation models that respect the intrinsic geometry and physical constraints of specific domains. By building governing equations and physical coordinates directly into the architecture, these models aim for a level of stability and efficiency that generic text-based models struggle to match.

Wireless & Physical Systems: Towards CSI-Native Foundation Models: A Channel-Adaptive Roadmap for 6G, Physics-Guided Fully Convolutional Spatiotemporal Learning Toward Digital-Twin-Enabled Microstructure Evolution Prediction, and Physics-Guided Dual-Stream Heterogeneous Graph Neural Network for Predicting Full-Field Structural Response of Stiffened Panels argue that representing data in terms of its physical coordinates, rather than as generic tensors, is key to long-term stability in engineering applications.
Brain & Health: B[FM]²: Brain Foundation Model via Flow Matching with SplitUNet, NeuroShield: A Device-Agnostic Foundation Model for EEG Authentication, and Cohort-Anchored Foundation Models for Electronic Health Records: From Risk Scores to Auditable Peer Cohorts emphasize models that handle heterogeneous, continuous signals and, in the case of electronic health records, align with clinical reasoning through cohort-based anchoring.
Scientific Discovery: BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language and Empowering Polymeric Materials Discovery by Artificial Intelligence illustrate how AI systems can act as “AI Scientists,” navigating complex design spaces by combining simulation and reasoning.

Theme 2: Agentic Reasoning, Planning, and Workflow Orchestration

We are witnessing a transition from passive chatbots to autonomous agents that act as orchestrators. This shift requires moving beyond “prompt-and-pray” methods toward structured, verifiable, and long-horizon planning.

Reasoning & Search: Provable Benefits of RLVR over SFT for Reasoning Models: Learning to Backtrack Efficiently, Scheduling Thoughts: Learning the Order of Thought in Diffusion Language Models, and SPIRAL: Learning to Search and Aggregate contribute theoretical and practical foundations for test-time scaling by training models to order their reasoning steps, search, backtrack, and aggregate candidate solutions.
Agentic Frameworks: From Question Answering to Task Completion: A Survey on Agent System and Harness Design, Sakana Fugu Technical Report, and Specialize Roles, Mix Deployments: Pushing the Cost-Accuracy Frontier of LLM Agent Teams explore the orchestration of multi-agent teams and the infrastructure required to manage them.
Cognitive Memory: Plans Don’t Persist: Why Context Management Is Load Bearing for LLM Agents, RaMem: Contextual Reinstatement for Long-term Agentic Memory, and Learning What to Remember: A Cognitively Grounded Multi-Factor Value Model for Agentic Memory highlight that agents must learn to prioritize information based on goal relevance rather than simply retaining everything.
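To make “prioritize by goal relevance rather than retain everything” concrete, here is a minimal sketch of a bounded agent memory that scores each entry by a weighted mix of goal relevance, recency, and importance, and evicts the lowest-scoring entry when full. The factors, the fixed weights, and all names (`MemoryItem`, `BoundedMemory`, `relevance`, `value`) are hypothetical illustrations, not the value model from Learning What to Remember or the reinstatement mechanism in RaMem; relevance is a crude word-overlap stand-in for an embedding similarity.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    importance: float  # hypothetical 0..1 score, e.g. assigned by the agent
    step: int          # time step at which the item was stored


def relevance(text: str, goal: str) -> float:
    """Jaccard word overlap: a crude stand-in for embedding similarity."""
    a, b = set(text.lower().split()), set(goal.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


@dataclass
class BoundedMemory:
    capacity: int
    w_rel: float = 0.6       # hypothetical fixed weights; a real system would learn them
    w_rec: float = 0.2
    w_imp: float = 0.2
    half_life: float = 10.0  # steps after which the recency term halves
    items: list = field(default_factory=list)

    def value(self, item: MemoryItem, goal: str, now: int) -> float:
        """Multi-factor value of keeping `item` for the current goal."""
        recency = math.exp(-math.log(2) * (now - item.step) / self.half_life)
        return (self.w_rel * relevance(item.text, goal)
                + self.w_rec * recency
                + self.w_imp * item.importance)

    def add(self, item: MemoryItem, goal: str, now: int) -> None:
        self.items.append(item)
        if len(self.items) > self.capacity:
            # Over capacity: drop the entry least valuable for the current goal.
            worst = min(self.items, key=lambda m: self.value(m, goal, now))
            self.items.remove(worst)


mem = BoundedMemory(capacity=2)
goal = "fix the failing login test"
mem.add(MemoryItem("login test fails on empty password", 0.5, step=0), goal, now=0)
mem.add(MemoryItem("user likes dark mode", 0.1, step=1), goal, now=1)
mem.add(MemoryItem("password check lives in auth.py", 0.6, step=2), goal, now=2)
print([m.text for m in mem.items])
# → ['login test fails on empty password', 'password check lives in auth.py']
```

The off-topic preference is evicted first because it scores zero on goal relevance and low on importance. Fixed weights keep the sketch short; the point made above is that agents should learn this prioritization rather than hard-code it.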

Theme 3: Verification, Governance, and Structural Safety

As agents gain the ability to execute code and interact with external systems, “black box” approaches to safety are proving insufficient. The field is shifting toward white-box mechanistic analysis and formal verification to make systems auditable and aligned with human intent.

Formal Verification: Hypothesis-Disciplined Multi-Agent Automated Formalization of Asymptotic Statistical Theory, VeriBound: PAC-Bayesian Generalization Bounds for Process Reward Models Trained with Formal Verification Tools, and Composing Verifiable Conceptual Models via Building Blocks: Towards Design-Time Verification of Agentic AI Workflows bridge the gap between symbolic logic and neural architectures.
Mechanistic Interpretability: The Geometry of Refusal: Linear Instability in Safety-Aligned LLMs, Beyond Importance: Interchange-Sobol Sensitivity Reveals Task-Specific Content Channels in Transformer Components, and AgentLens: Interpretable Safety Steering via Mechanistic Subspaces for Multi-Turn Coding Agent shift safety work from external guardrails toward analyzing and steering internal representations.
Governance & Auditing: The AI Evaluability Gap: The Missing Layer for Managing Risk and Sustaining Value, AgentRiskBOM: A Risk-Scoping Security Bill of Materials for Agentic AI Systems, and Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents address the critical need for evidence-based governance.

Theme 4: Efficiency, Optimization, and Hardware-Aware Design

As models grow, the computational cost of “thinking” becomes a bottleneck. The field is responding with optimization techniques that match their mathematical tools to the geometric structure of the models and the constraints of the hardware they run on.

Quantization & Compression: HyperQuant: A Rate-Distortion-Optimal Quantization Pipeline for Large Language and Diffusion Models, GRINQH: Graded Input-based Quantization Hierarchy for Efficient LLM Generation, and FORGE: Fused On-Register Gradient Elimination for Memory-Efficient LLM Training target memory and compute savings in inference (HyperQuant, GRINQH) and in training (FORGE).
Architectural Efficiency: SpotAttention: Plug-In Block-Sparse Routing for Pretrained Long-Context Transformers, HERALD: High-Throughput Block Diffusion LLM Serving via CPU-GPU Cooperative KV Cache Retrieval, and Ultrafast On-Chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks rethink how we run these systems on constrained hardware.
Symmetry-Compatible Design: Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers and REFINE: Super-efficient 3D Gaussian Splatting Pruning via Rendering-Free Primitive Importance argue that optimizers and pruning methods should respect the geometric symmetries of the neural network architecture.
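The quantization entries above trade bits (rate) for accuracy (distortion). As a generic illustration of that trade-off only, and not the method of HyperQuant or GRINQH, the sketch below applies plain symmetric uniform per-tensor quantization to synthetic Gaussian “weights” at several bit widths and measures the mean squared error. The function names `quantize` and `mse` and the synthetic data are assumptions for illustration.

```python
import random


def quantize(xs: list[float], bits: int) -> list[float]:
    """Symmetric uniform per-tensor quantization to signed `bits`-bit ints, then dequantize."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits, 7 for 4 bits
    scale = max(abs(x) for x in xs) / qmax
    if scale == 0:                        # all-zero tensor: nothing to quantize
        return list(xs)
    return [round(x / scale) * scale for x in xs]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between original and dequantized values (the distortion)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic stand-in for a weight tensor
for bits in (2, 4, 8):
    err = mse(weights, quantize(weights, bits))
    print(f"{bits}-bit: MSE = {err:.6f}")
```

Each extra bit roughly halves the step size, so distortion falls quickly as the bit width grows; rate-distortion-oriented pipelines aim to spend bits where they reduce error the most instead of using one uniform grid per tensor.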