arXiv ML/AI/CV papers summary

Theme 1: Physics-Informed and Structure-Preserving AI

Research is shifting from models that merely pattern-match data to models that respect physical law. Embedding physical constraints, such as conservation of energy, mass, and momentum, directly into neural architectures keeps predictions physically valid by construction rather than relying on fragile post-hoc filters.

Generative Physics: Physics-informed generative AI for semiconductor manufacturing: Enforcing hard physical constraints in generative models by construction and Least-Action-Guided Diffusion for Physical Extrapolation show how physical principles such as least action can steer generative processes toward physically consistent outputs.
Neural Operators: GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators and SirenFNO: Efficient and Full Frequency Learning of Fourier Neural Operators refine PDE solvers by encoding thermodynamic structures.
Constraint-Based Stability: Energy-Conserved Neural Pipelines: Attenuating Error Propagation in Modular Neural Networks via Physical Conservation Constraints uses hard conservation constraints to limit error accumulation in complex robotic and synthetic pipelines.
Engineering Applications: Human-Enhanced Loop Modeling (HELM): Agent-Based Finite Element Modeling of Concrete Bridge Barriers and A Lightweight Multi-Agent Framework for Automated Concrete Barrier Design suggest that grounding agents in physical constraints can let smaller models outperform much larger, unconstrained counterparts in engineering tasks.
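
As a minimal sketch of what enforcing a conservation constraint "by construction" can mean (an illustration only, not the method of any paper listed above), the hypothetical function conserve_total below projects a raw prediction onto the set of vectors whose entries sum to a prescribed total. The conserved quantity then holds for every input up to floating-point rounding, instead of being checked after the fact. The function name, the three-cell example, and the choice of a simple sum constraint are all assumptions made for illustration.

```python
def conserve_total(raw, total):
    """Return the vector closest to `raw` (Euclidean distance) whose entries sum to `total`."""
    # Spread the conservation violation evenly across all entries;
    # this is the orthogonal projection onto the hyperplane sum(x) == total.
    excess = (sum(raw) - total) / len(raw)
    return [x - excess for x in raw]


# Hypothetical raw model output for three cells that must hold 1.0 units in total.
prediction = conserve_total([0.2, 0.5, 0.9], total=1.0)
print(prediction, sum(prediction))
```

This projection only fixes the total; it does not, for example, keep individual entries non-negative, which a real physical model would typically need to handle separately.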

Theme 2: Mechanistic Interpretability and Structural Diagnosis

As our models grow in complexity, we must move beyond the “black box” paradigm. This theme focuses on the “latent geometry” of neural networks, treating them as objects of scientific study to understand how they compute, reason, and store information.

Geometric and Concept Discovery: Trajectory Geometry of Transformer Representations Across Layers and Visualizing LLM Latent Space Geometry Through Dimensionality Reduction reveal that LLMs organize knowledge in structured manifolds. Cross-Layer Discrete Concept Discovery for Interpreting Language Models and Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery provide methods to extract discrete, human-understandable concepts from hidden states.
Formal Theory: The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics offers a formal, physics-inspired framework for interpretability, while Anatomy of Post-Training: Using Interpretability to Characterize Data and Shape the Learning Signal uses these tools to audit training data for spurious correlations.
Mechanistic Tracing: Language Model Circuits Are Sparse in the Neuron Basis and Transformer Field Theory: A Response-Theoretic Approach to Mechanistic Interpretability allow us to trace specific “circuits” responsible for reasoning, moving us toward a true science of model internals.
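
To make the idea of extracting discrete concepts from hidden states more concrete, here is a minimal sketch of the generic vector-quantization assignment step: each hidden state is mapped to the index of its nearest codebook vector. This is an assumption-laden illustration, not the procedure from the papers above; the function nearest_code, the two-dimensional codebook, and the example hidden states are all hypothetical.

```python
def nearest_code(state, codebook):
    """Return the index of the codebook vector closest to `state` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(state, codebook[i]))


# Hypothetical 2-D codebook with two "concepts", and two hidden states to label.
codebook = [[0.0, 0.0], [1.0, 1.0]]
hidden_states = [[0.9, 0.8], [0.1, -0.2]]
concept_ids = [nearest_code(h, codebook) for h in hidden_states]
print(concept_ids)  # [1, 0]
```

In practice the codebook would be learned jointly with the model or fitted to real activations; only the assignment step is shown here.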

Theme 3: Agentic Architectures and Cognitive Workflows

The frontier of AI has shifted from passive text generation to autonomous, agentic systems that maintain state, plan, and execute multi-step workflows. This transition requires “cognition layers” that allow agents to reason, self-correct, and interact with tools reliably.

Reasoning and Planning: Arbor: Tree Search as a Cognition Layer for Autonomous Agents and WISE: A Long-Horizon Agent in Minecraft with Why-Which Reasoning introduce structured search and causal reasoning to decouple episodic memory from decision-making.
Tool-Use and Reliability: HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents optimizes tool invocation, while Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents and SENTINEL: Failure-Driven Reinforcement Learning for Training Tool-Using Language Model Agents address the unique safety risks inherent in multi-turn agentic interactions.
Verification: Operadic consistency: a label-free signal for compositional reasoning failures in LLMs and Operads for compositional reasoning in LLMs provide a mathematical framework to verify reasoning consistency at inference time.
Self-Optimization: Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference and FitText: Evolving Agent Tool Ecologies via Memetic Retrieval demonstrate that agents can iteratively refine their own tool usage and belief systems.

Theme 4: Multimodal Grounding and World Models

To truly understand our world, AI must bridge the gap between abstract language and physical reality. This theme explores how models can “see,” “act,” and predict consequences in 3D space.

Spatial and 3D Reasoning: GeoWorld-VLM: Geometry from World Models for Vision-Language Models enhances spatial judgment, while World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible allows models to infer occluded 3D surfaces.
Robotic World Models: NavWAM: A Navigation World Action Model for Goal-Conditioned Visual Navigation and MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models enable robots to “imagine” the visual consequences of their actions before execution.

Theme 5: Scientific Discovery and Domain-Specific Autonomy

AI is evolving into an autonomous researcher capable of handling the rigor of scientific workflows, from molecular dynamics to formal theorem proving.

Scientific Autonomy: EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery and Benchmarking AI Agents for Addressing Scientific Challenges Across Scales focus on creating the environments necessary for agents to conduct research.
Formal and Molecular Reasoning: Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation and MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback automate formal theorem proving and the design of molecular dynamics pipelines, respectively.
Knowledge Orchestration: Agents-K1: Towards Agent-native Knowledge Orchestration and InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem provide frameworks for agents to synthesize literature and evaluate research ideas.
Clinical Integrity: Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture and MARD: Mirror-Augmented Reasoning Distillation for Mechanism-Level Drug-Drug Interaction Prediction ensure that AI-assisted scientific work remains auditable and grounded in mechanism-level evidence.

Theme 6: Efficiency, Hardware Co-Design, and Adaptation

As agentic workflows become more sophisticated, the computational cost of inference and training becomes a primary bottleneck. Research is now focused on making these systems faster, more memory-efficient, and capable of running on edge hardware.

Inference Optimization: MPK: A Compiler and Runtime for Mega-Kernelizing Tensor Programs, MiniMax Sparse Attention, and Can I Buy Your KV Cache? introduce new ways to manage memory and compute, including reusable KV caches and mega-kernels.
Quantization and Edge AI: TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs, Characterizing the Impact of NVFP4 Quantization for Low-Power Edge AI Deployment, MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications, and Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite provide a roadmap for deploying high-performance AI on resource-constrained devices.
Tuning-Free Learning: GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning and Harness In-Context Operator Learning with Chain of Operators demonstrate that models can perform complex reasoning in-context, bypassing the need for expensive fine-tuning.
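
For readers unfamiliar with ternary weights, the sketch below shows a generic threshold-based ternarization: each weight is mapped to -1, 0, or +1, with a single shared scale factor. It is not the TWLA method and is offered only under stated assumptions: the function ternarize is hypothetical, and the default threshold ratio of 0.7 times the mean absolute weight is a commonly used heuristic chosen here for illustration.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map weights to codes in {-1, 0, +1} plus one shared scale factor."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs  # weights at or below this magnitude become 0
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # The scale is the mean magnitude of the weights that survived the threshold.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


# Hypothetical weight row; each weight is approximated by scale * code.
codes, scale = ternarize([0.9, -0.8, 0.05, -0.02])
approx = [scale * c for c in codes]
print(codes, scale, approx)
```

Storing only the codes and one scale per row or tensor is what makes this representation attractive on memory-constrained hardware; the accuracy cost depends on the model and is not captured by this sketch.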

Theme 7: Governance, Safety, and Evaluation Sovereignty

The final theme addresses the critical challenge of ensuring that models remain safe, reliable, and aligned with human values, moving beyond simple safety filters toward constitutional governance.

Robustness and Alignment: Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization, Bypassing Prompt Guards in Production with Controlled-Release Prompting, and Learning to Inject: Automated Prompt Injection via Reinforcement Learning highlight the ongoing arms race in adversarial security. Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence and Certifiable Safe RLHF: Semantic Grounding and Fixed Penalty Constraint Optimization for Safer LLM Alignment offer structural paths toward provable safety.
Governance and Ethics: Algorithmic Constitutionalism and Epistemic Constitutionalism propose explicit meta-norms for AI, while The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements audits current agentic frameworks for security. Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science warns against alignment-induced biases in research.
Evaluation Sovereignty: Agents’ Last Exam, Definitional alignment before capability alignment: a Design-Science framework for adjudicating claims about AGI, Evaluation Sovereignty in Metadata-Driven Classification, and AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility argue for a shift toward deployment-centered, governance-oriented evaluation that measures real-world utility.
Human-AI Collaboration: Human-Guided Agentic AI for Multimodal Clinical Prediction and (Human) Attention Is (Still) All You Need emphasize that human oversight remains a vital component for reliability in high-stakes settings.