arXiv ML/AI/CV papers summary

Theme 1: Mechanistic Interpretability & Steering

The field is rapidly maturing from treating models as “black boxes” to viewing them as systems with inspectable, manipulable internal dynamics. We are moving toward a paradigm where we can “steer” model behavior by intervening directly on latent representations, rather than relying solely on expensive retraining.

Internal Dynamics: Mechanistic Analysis of Alignment Algorithms in Language Models reveals that behavioral alignment (via DPO or PPO) does not imply uniform internal restructuring, highlighting the complexity of latent space shifts.
Steering & Control: Researchers are now using internal activations as a control surface. Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning uses orthogonal rotations to prime reasoning. Two further papers, Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders and A Geometric Account of Activation Steering through Angle-Norm Decomposition, show how sparse autoencoders and angle-norm decomposition, respectively, can isolate causal features for precise behavioral interventions.
Reliability: As we gain steering capabilities, we must ensure our diagnostics are sound. When Attribution Patching Lies: Diagnosis and a Second-Order Correction diagnoses cases where attribution patching, a first-order approximation of activation patching, gives misleading importance estimates, and proposes a second-order correction so that the estimates better reflect causal effects.
Ethics & Bias: Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability maps specific model behaviors to internal circuit-level activations, allowing for more precise safety interventions regarding contextual bias.
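
As a rough illustration of the geometric view mentioned under Steering & Control, the sketch below applies additive steering to a toy two-dimensional hidden state and splits the effect into a norm ratio and a rotation angle. The function names (steer, steer_norm_preserving, angle_norm_change), the vectors, and the value of alpha are illustrative assumptions and are not taken from any of the papers listed above.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def steer(h, v, alpha):
    """Additive steering: shift hidden state h along direction v by alpha."""
    return [hi + alpha * vi for hi, vi in zip(h, v)]


def steer_norm_preserving(h, v, alpha):
    """Steer, then rescale to the original norm so only the angle changes.

    Assumes the steered vector is non-zero.
    """
    s = steer(h, v, alpha)
    scale = norm(h) / norm(s)
    return [scale * x for x in s]


def angle_norm_change(h, h_new):
    """Describe an intervention as (norm ratio, rotation angle in radians)."""
    ratio = norm(h_new) / norm(h)
    cos = dot(h, h_new) / (norm(h) * norm(h_new))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding drift
    return ratio, math.acos(cos)


# Toy values (hypothetical): a hidden state and an orthogonal unit direction.
h = [3.0, 0.0]
v = [0.0, 1.0]

ratio, angle = angle_norm_change(h, steer(h, v, alpha=4.0))
rotated = steer_norm_preserving(h, v, alpha=4.0)
ratio_rot, angle_rot = angle_norm_change(h, rotated)
```

In this toy case plain additive steering both lengthens and rotates the hidden state, while the norm-preserving variant produces the same rotation with the length left unchanged. Separating the two effects is the kind of question an angle-norm analysis asks.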

Theme 2: Agentic Reasoning & Tool-Augmented Systems

The transition from passive models to autonomous agents is the defining shift of this period. We are moving away from simple prompt-response loops toward systems that can plan, use tools, and self-correct through evidence-grounded reasoning.

Planning and Tool Use: AutoPDE: Reliable Agentic PDE Solving via Explicitly Represented Solver Strategies and AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design show that agents are most effective when they maintain an explicit, inspectable strategy. ASA: Backbone-Training-Free Representation Engineering for Tool-Calling Agents uses an Activation Steering Adapter to amplify tool-use intent without weight updates.
Verifiable Reasoning: To combat “spurious correctness,” frameworks like SAFE: An LLM-as-Verifier Framework for Evidence-Grounded Multi-Hop Reasoning and RAG over Thinking Traces Can Improve Reasoning Tasks shift evaluation toward verifying intermediate reasoning steps. Learning Evidence Highlighting for Frozen LLMs allows models to focus on pivotal evidence spans without retraining.
Reliability and Self-Evolution: Agents are learning from their own failures. Trace2Policy: From Expert Behavior Traces to Self-Evolving Decision Agents and Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution highlight iterative refinement. Meanwhile, STAG-Claw: Automated State-based Agent Benchmarking for Realistic Scenarios and T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains provide the rigorous evaluation needed for professional-grade deployment.
Safety: HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing warns that agentic productivity tools introduce new attack surfaces.
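
To make the idea of “spurious correctness” under Verifiable Reasoning concrete, here is a minimal sketch in which a final answer matches the gold answer even though an intermediate step is wrong. A small arithmetic checker stands in for the verifier; the frameworks above use LLM-based verification over evidence, which this toy does not attempt to reproduce. The names safe_eval and verify_trace and the trace format are hypothetical.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr):
    """Evaluate a small arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)


def verify_trace(steps, final_answer, gold):
    """Check each (expression, claimed_value) step as well as the final answer."""
    bad_steps = [
        i for i, (expr, claimed) in enumerate(steps)
        if abs(safe_eval(expr) - claimed) > 1e-9
    ]
    answer_correct = final_answer == gold
    return {
        "answer_correct": answer_correct,
        "bad_steps": bad_steps,
        "spurious": answer_correct and bool(bad_steps),
    }


# Hypothetical trace for "(2 + 3) * 2": step 0 is wrong, yet the answer is right.
trace = [("2 + 3", 6), ("6 * 2", 12), ("12 - 2", 10)]
report = verify_trace(trace, final_answer=10, gold=10)
```

An answer-only metric would score this trace as correct. Checking the steps flags it as spurious because step 0 does not hold.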

Theme 3: Physics-Informed & Embodied Intelligence

As AI enters the physical world, the focus has shifted to grounding models in physical laws, spatial constraints, and 3D world modeling.

Physics Foundation Models: Structure-Preserving Learning Improves Geometry Generalization in Neural PDEs, Geometry-Aware Anisotropic Boundary Correction for Aerodynamic Simulation, and UniPixie: Unified and Probabilistic 3D Physics Learning via Flow Matching embed physical conservation laws directly into architectures. Conformal Prediction for Neural Operators adds distribution-free uncertainty estimates to operator predictions, while PL-KKT-hPINN targets physical consistency.
Embodied Control: HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers and IMPACT: Learning Internal-Model Predictive Control for Forceful Robotic Manipulation focus on robust, human-centric control.
Spatial Reasoning: What Spatial Memory Must Store: Occlusion as the Test for Language-Agent Memory and Constructing coherent spatial memory in LLM agents through graph rectification argue that agents must maintain explicit, graph-structured spatial representations to navigate the world.
Generative Environments: ABot-Earth 0.5 and Envision4D enable closed-loop simulation by synthesizing vast 3D worlds.
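
For readers unfamiliar with conformal prediction, the sketch below shows the standard split conformal recipe for a scalar prediction: compute absolute residuals on a held-out calibration set, take a finite-sample-corrected quantile, and use it as an interval radius. This is the generic textbook procedure, not the construction from Conformal Prediction for Neural Operators, whose outputs are functions rather than scalars. The function names and calibration values are illustrative assumptions.

```python
import math


def conformal_radius(cal_preds, cal_targets, alpha):
    """Split conformal radius targeting at least 1 - alpha marginal coverage.

    Assumes calibration and test points are exchangeable.
    """
    scores = sorted(abs(p - y) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def predict_interval(pred, radius):
    """Symmetric prediction interval around a point prediction."""
    return pred - radius, pred + radius


# Hypothetical calibration set whose residuals are exactly 1, 2, ..., 11.
cal_preds = [0.0] * 11
cal_targets = [float(i) for i in range(1, 12)]

radius = conformal_radius(cal_preds, cal_targets, alpha=0.25)
interval = predict_interval(5.0, radius)
```

The guarantee is marginal over calibration and test draws and does not depend on the model being accurate. A poor model simply yields wider intervals.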

Theme 4: Efficient Scaling & Hardware-Aware Optimization

The research community is prioritizing “doing more with less,” focusing on memory efficiency, inference latency, and the co-design of models with hardware.

Inference & KV Cache: IntentKV, RKSC, and ReasonAlloc optimize memory by pruning or sharing cache entries based on intent. FlashMemory-DeepSeek-V4 and Prefilling-dLLM address long-context bottlenecks.
Structured Pruning & Acceleration: TENP and SHAPE demonstrate hardware-aware expert pruning. K-Forcing, CLP, and PathRelax aim to enable multi-token inference without degrading output quality.
Training Efficiency: FedSLoP, PRISM, and Continual LLM Upcycling introduce architectures designed to break serial bottlenecks. A Theory of Training Profit-Optimal LLMs provides the economic framework for this efficiency.
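
The general shape of KV cache pruning can be sketched with a simple score-based eviction rule: always keep the most recent tokens, and spend the rest of the budget on older tokens that have received the most attention. This is a generic heuristic chosen for illustration. It is not the selection rule of IntentKV, RKSC, or ReasonAlloc, and the function name, the scores, and the budget are assumptions.

```python
def select_kv_to_keep(attn_scores, budget, recent=2):
    """Return sorted indices of cache entries to keep.

    attn_scores[i] is the accumulated attention mass token i has received.
    The most recent `recent` tokens are always kept; the remaining budget
    goes to the highest-scoring older tokens.
    """
    n = len(attn_scores)
    if budget >= n:
        return list(range(n))
    recent = min(recent, budget)
    recent_idx = list(range(n - recent, n))
    older = range(n - recent)
    top_older = sorted(older, key=lambda i: attn_scores[i], reverse=True)
    return sorted(top_older[: budget - recent] + recent_idx)


# Hypothetical accumulated attention scores for a six-token cache.
scores = [0.9, 0.1, 0.5, 0.05, 0.2, 0.3]
kept = select_kv_to_keep(scores, budget=4, recent=2)
```

With these scores the rule keeps tokens 0 and 2 from the older context plus the two most recent tokens, so the cache shrinks from six entries to four.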

Theme 5: Alignment, Truthfulness, and Evaluation

The “alignment problem” has evolved into a practical engineering discipline, with a growing focus on truthfulness, cultural awareness, and the limitations of current benchmarks.

Truthfulness & Sycophancy: TruthRL and PhantomBench incentivize models to recognize knowledge boundaries. The Price of Agreement and Recalling Too Well highlight the danger of models prioritizing user agreement over factual correctness.
Cultural Bias: The Shibboleth Effect and Who Brought Easter Eggs to Eid? reveal systematic biases that collapse cultural diversity into Western-centric ideologies.
Evaluation Paradigms: RankLLM and KCSAT-ML argue that aggregate accuracy hides weaknesses, necessitating benchmarks grounded in human difficulty. Does Capability Transfer to Subjective Behavior warns that objective scaling does not guarantee subjective reliability, requiring “trust-by-construction” evaluation.
RLHF & Plasticity: When RL Fails after SFT offers techniques to rejuvenate model plasticity, ensuring models remain capable of learning from RL rewards after supervised fine-tuning.
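
The point under Evaluation Paradigms that aggregate accuracy can hide weaknesses is easy to see with a stratified breakdown. The sketch below reports accuracy per difficulty bucket alongside the overall figure. The bucket labels and results are made-up illustrative data, not numbers from RankLLM or KCSAT-ML.

```python
from collections import defaultdict


def accuracy_by_bucket(results):
    """results: iterable of (bucket, is_correct) pairs.

    Returns (overall_accuracy, {bucket: accuracy}).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for bucket, ok in results:
        totals[bucket] += 1
        hits[bucket] += int(ok)
    overall = sum(hits.values()) / sum(totals.values())
    per_bucket = {b: hits[b] / totals[b] for b in totals}
    return overall, per_bucket


# Hypothetical results: eight easy items all correct, two hard items both wrong.
results = [("easy", True)] * 8 + [("hard", False)] * 2
overall, per_bucket = accuracy_by_bucket(results)
```

Here the overall accuracy is 0.8, which looks respectable, while accuracy on the hard bucket is 0.0.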