arXiv ML/AI/CV papers summary
Theme 1: The Physics of Intelligence and Scientific Modeling
We are witnessing a shift from “black-box” curve fitting toward a paradigm in which physical laws are built into machine learning models. By embedding constraints directly into neural architectures, researchers aim for models that do not just mimic data but also respect the underlying mechanics of the system being modeled.
- Physics-Informed Discovery: Researchers are moving toward interpretable, physics-compliant symbolic structures. Physics-Informed Discovery of Yield Functions in Plasticity via Convex Neural Representations and Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks exemplify this, as does An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks, which reports large efficiency gains from replacing manual tuning with physics-based convergence.
- Surrogate Modeling & Efficiency: Hybridizing GNNs with traditional numerical methods such as the finite element method (FEM) allows for accelerated simulations while maintaining equilibrium, as seen in A Hybrid GNN-FEM Framework for Phase-Field Fracture Simulation. Related surrogate-modeling work includes Physics-Preserving Hybridization for Generalizable Surrogate Modeling and Adaptive Distance-Aware Trunk Deep Operator Learning for Long-Span Roadway Bridges. Lightweight alternatives like Physics-Informed Neural Network with Squeeze-Excitation-like Attention and Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System further reduce the computational burden of solving PDEs.
- Thermodynamic Agency: At a more philosophical level, Thermodynamic Measure of Intelligence and The Tao of Agency: Autotelic AI, Embedded Agency and Dissolution of the Self explore intelligence as a physical, recursive process of self-simulation and goal generation, framing the “self” as a necessary boundary for an agent to act within an environment.
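To make the idea of embedding physical constraints concrete, here is a minimal sketch of a physics-informed loss. It is not taken from any of the papers above. The toy equation u'(x) + u(x) = 0, the function name `physics_informed_loss`, the collocation grid, and the finite-difference derivative are all assumptions chosen for illustration; real physics-informed networks typically use automatic differentiation and a trainable model.

```python
import math

def physics_informed_loss(u, xs, data, weight=1.0, h=1e-4):
    """Data misfit plus the mean squared residual of the toy ODE u'(x) + u(x) = 0."""
    # Data term: mean squared error against observed (x, value) pairs.
    data_loss = sum((u(x) - y) ** 2 for x, y in data) / len(data)
    # Physics term: ODE residual at each collocation point, with u'
    # approximated by a central finite difference (an illustrative choice).
    residuals = [(u(x + h) - u(x - h)) / (2 * h) + u(x) for x in xs]
    physics_loss = sum(r * r for r in residuals) / len(residuals)
    return data_loss + weight * physics_loss

xs = [i / 10 for i in range(11)]   # collocation points on [0, 1]
data = [(0.0, 1.0)]                # one observation: u(0) = 1

exact = lambda x: math.exp(-x)     # satisfies both the data and the ODE
line = lambda x: 1.0 - x           # fits the data point but violates the ODE

loss_exact = physics_informed_loss(exact, xs, data)
loss_line = physics_informed_loss(line, xs, data)
```

Both candidates match the single observation, so a data-only loss cannot tell them apart. The residual term is what penalizes the straight line, which is the basic mechanism behind treating the governing equation as a training signal.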
Theme 2: Agentic Reasoning, Verification, and Safety
As LLMs evolve into autonomous agents, the industry is pivoting from probabilistic “chain-of-thought” to sound, verifiable reasoning. The goal is to move beyond mere fluency toward systems that can be trusted in high-stakes environments.
- Reasoning & Verification: To improve reliability, researchers are integrating search algorithms and formal solvers. VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving, StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling, and Process-Verified Reinforcement Learning for Theorem Proving via Lean demonstrate that symbolic oracles can guide LLMs toward sound conclusions. Analyzing the Narration Gap in LLM-Solver Loops serves as a critical reminder that the translation between LLM output and formal logic remains a significant safety bottleneck.
- Efficiency in Reasoning: ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models and Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling show that we can dynamically allocate compute based on problem difficulty, rather than wasting resources on uniform reasoning chains.
- Governance & Accountability: AgentArmor: A Framework, Evaluation, & Mitigation of Coding Agent Failures, Large Language Models Hack Rewards, and Society, and Deontic Policies for Runtime Governance of Agentic AI Systems highlight the urgent need for guardrails, while MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning and JustDiag!: A Diagnostic Justification Engine for Accountable Root Cause Analysis emphasize that agents must provide a “ledger” of evidence to be truly accountable.
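The verifier-guided pattern described in this theme can be sketched generically as a propose-and-check loop with a compute budget. This is an illustration under stated assumptions, not the method of any paper listed above: `verified_search`, the integer-root toy problems, and the enumerating proposer (a stand-in for an LLM that suggests candidates) are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple

def verified_search(
    propose: Callable[[], Iterable[int]],
    verify: Callable[[int], bool],
    budget: int,
) -> Tuple[Optional[int], int]:
    """Return the first candidate the verifier accepts and the number of checks spent."""
    spent = 0
    for candidate in propose():
        if spent >= budget:
            break                      # budget exhausted: give up without an answer
        spent += 1
        if verify(candidate):          # exact check: only sound answers are returned
            return candidate, spent
    return None, spent

# Hypothetical proposer: enumerates small integers in a fixed order.
propose = lambda: range(100)

# Two toy problems, each verified with exact integer arithmetic.
easy_verify = lambda x: x * x - 5 * x + 6 == 0        # roots 2 and 3
hard_verify = lambda x: x * x - 97 * x + 2280 == 0    # roots 40 and 57

easy = verified_search(propose, easy_verify, budget=100)
hard = verified_search(propose, hard_verify, budget=100)
capped = verified_search(propose, hard_verify, budget=10)
```

Because the loop stops as soon as the verifier accepts, the easy problem consumes far fewer checks than the hard one, and the capped run returns no answer rather than an unverified guess.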
Theme 3: Embodied AI and Physical Intelligence
The frontier of AI is moving from the screen to the physical world. This requires models that understand kinematics, spatial constraints, and the messy reality of robotics.
- Spatial & Embodied Awareness: Occ-VLM: Occupancy Grounded Vision Language Model for Indoor Scene Understanding, SpatialSV: Internalizing Interpretable 3D Spatial Awareness in MLLMs via Task-Oriented Visual Supervision, and 3D-PLOT-LLM: Part-Level Object Tokens for 3D Large Language Models provide the 3D geometric priors necessary for grounding language in physical space.
- Robot Navigation & Learning: PhysDrift: Bridging the Embodiment Gap in Humanoid Co-Speech Motion Generation, ENPIRE: Agentic Robot Policy Self-Improvement in the Real World, and Qwen-RobotNav Technical Report: A Scalable Navigation Model Designed for an Agentic Navigation System address the “embodiment gap,” while HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining reports that egocentric human video can outperform real-robot data for pretraining, which would ease data sourcing for robotics.
- Infrastructure: Data Standards for Humanoid Robotics: The Missing Infrastructure for Physical AI argues that standardized data is the key to scaling physical AI, while VOiLA: Vectorized Online Planning with Learned Diffusion Model for POMDP Agents provides the efficient planning tools needed for real-time autonomy.
Theme 4: Generative Modeling and Efficient Deployment
As models grow, the focus shifts to hardware-aware optimization, structural compression, and the fine-tuning of generative processes.
- Generative Optimization: Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion and CrossFlow: One-Step Generation Across Latent and Pixel Spaces optimize the diffusion process for speed. DiffMath: Symbol- and Graph-Aware Latent Diffusion Transformer for Handwritten Mathematical Expression Generation and Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought introduce structural priors to guide high-fidelity generation.
- Compression & Edge Deployment: Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices, UltraQuant: 4-bit KV Caching for Context-Heavy Agents, and Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs target fine-tuning and inference on resource-constrained hardware. How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural examines how well feed-forward blocks can be recovered by linear maps and argues that this property is learned rather than architectural, which bears on how far transformer blocks can be simplified.
- Sensor-to-Processor Pipelines: FrequencyFormer: A Co-Designed Sensor-to-Processor Pipeline for Frequency-Domain Vision Transformer Inference, Neural Events: Discrete Asynchronous Autoencoders for Event-Based Vision, and ViCoStream: Streaming VideoLLMs Can Run Beyond 100 FPS with Stage-Wise Coordinated Inference rethink the data flow to achieve real-time efficiency.
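As a rough illustration of what 4-bit storage involves, the sketch below applies plain min-max quantization to one group of values. It is a generic textbook scheme, not the UltraQuant method or any other technique from the papers above; the function names and the choice of a single scale and offset per group are assumptions.

```python
from typing import List, Tuple

def quantize_4bit(values: List[float]) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 15] using one scale and offset per group."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15
    if scale == 0.0:
        scale = 1.0                    # constant input: any scale works, avoid dividing by zero
    # Round to the nearest of the 16 levels and clamp to the 4-bit range.
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo

def dequantize_4bit(codes: List[int], scale: float, lo: float) -> List[float]:
    """Reconstruct approximate floats from 4-bit codes."""
    return [lo + c * scale for c in codes]

values = [-1.0, -0.2, 0.0, 0.35, 2.0]
codes, scale, lo = quantize_4bit(values)
restored = dequantize_4bit(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(values, restored))
```

Each code fits in 4 bits, so the codes take a quarter of the space of 16-bit values, before counting the per-group scale and offset. The price is a reconstruction error of at most half a quantization step per value.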
Theme 5: Interpretability, Robustness, and Evaluation
We are moving toward a paradigm where we can inspect the “gears” of a model and rigorously test its performance in the wild.
- Mechanistic Understanding: How Transparent is DiffusionGemma?, The Representational Limit of Scalar Interactions: An Interventional Decomposition, Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models, and GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs allow us to map information flow and steer model behavior without full retraining.
- Data-Centric AI & Clinical Translation: BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining and A Critical Look at Targeted Instruction Selection: Disentangling What Matters (and What Doesn’t) highlight the importance of data quality. In medicine, HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification and GEN-Guard: Correcting Generalization Failures for Deployable Federated Surgical AI provide frameworks for interpretable and robust clinical diagnostics.
- Predictive Benchmarking: Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents, StaminaBench: Stress-Testing Coding Agents over 100 Interaction Turns, and NRT-Bench: Multi-turn red-teaming of LLM agents move the field toward evaluating agents under sustained, adaptive, and real-world pressure.