arXiv ML/AI/CV papers summary
Theme 1: Physics-Guided and Embodied Intelligence
The frontier of AI is shifting from the digital screen to the physical world. We are moving away from “black-box” models that merely correlate pixels toward architectures that internalize the fundamental laws of physics, geometry, and 3D space. By grounding models in physical constraints, we ensure that their reasoning remains consistent with the reality they inhabit.
- Physics-Guided Surrogates: Physics-guided Convolutional Neural Network for Domain Growth Prediction in Systems with Conserved Kinetics and fTNN: a tensor neural network for fractional PDEs embed physical laws directly into network architectures to ensure stable, energy-consistent simulations.
- Geometric Grounding: Neural Voxel Dynamics: Learning Implicit 3D Physics via Volumetric Feature Advection, PhysiFormer: Learning to Simulate Mechanics in World Space, and UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models demonstrate that injecting geometric priors is essential for maintaining structural fidelity in 3D space.
- Embodied Robotics: In-Context World Modeling for Robotic Control, CoStream: Composing Simple Behaviors for Generalizable Complex Manipulation, and Play2Perfect: What Matters in Dexterous Play Pretraining for Precise Assembly? show that robots must learn to “think” about world structure and compose simple, reusable behaviors to navigate complex physical tasks.
- Structural Priors: Appearance-Preserving Refinement of Generated 3D Assets for Monochromatic Fabrication and Controllable Egocentric Video Generation via Occlusion-Aware Sparse 3D Hand Joints bridge the gap between digital generation and physical reality.
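To make the "conserved kinetics" idea from the Physics-Guided Surrogates bullet concrete, here is a minimal sketch of one common way to build a conservation law into a model's structure: have the model predict fluxes between neighboring cells rather than the new state itself. This is an illustration under assumptions, not the method of any paper listed above; `conservative_step` and `toy_flux` are hypothetical names, and `toy_flux` is an arbitrary stand-in for a learned network.

```python
import math

def conservative_step(c, flux_fn, dt=0.1):
    """Advance a 1D periodic field by one step using a flux-form update."""
    n = len(c)
    # flux[i] is the flow from cell i into cell i+1 (periodic boundary).
    flux = [flux_fn(c[i], c[(i + 1) % n]) for i in range(n)]
    # Each cell loses its outgoing flux and gains its left neighbor's flux,
    # so the total over all cells is unchanged whatever flux_fn returns.
    return [c[i] - dt * (flux[i] - flux[i - 1]) for i in range(n)]

def toy_flux(left, right):
    # Hypothetical stand-in for a learned model of the inter-cell flux.
    return math.tanh(left - right) + 0.3 * left * right

c0 = [0.2, 0.9, 0.4, 0.7, 0.1]
c1 = conservative_step(c0, toy_flux)
mass_drift = abs(sum(c1) - sum(c0))
print(mass_drift)  # zero up to floating-point rounding
```

Because the flux terms cancel in pairs when summed, total mass is preserved by construction rather than encouraged through a loss penalty, which is the kind of guarantee the bullet alludes to.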
Theme 2: Agentic Industrialization and Self-Evolution
We are witnessing the transition from “artisanal” AI development to industrialized, self-improving loops. Modern agents are increasingly capable of planning, tool use, and recursive self-improvement, effectively managing their own research and development cycles.
- Recursive Improvement: The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators, EVOM: Agentic Meta-Evolution of Actor-Critic Architectures for Reinforcement Learning, and AgentX: Towards Agent-Driven Self-Iteration of Industrial Recommender Systems represent systems that brainstorm, code, and refine their own architectures.
- Scientific Discovery: Closing the Loop to Discover Psychological Theories with an Automated Cognitive Scientist applies this closed-loop logic to scientific inquiry, allowing AI to propose and test its own theories.
- Tool Use and Planning: Localizing RL-Induced Tool Use to a Single Crosscoder Feature and Finding the Time to Think: Learning Planning Budgets in Real-Time RL provide mechanistic insights into how agents balance deliberation with real-time constraints.
- Integration: A Process Harness for Uplifting Legacy Workflows to Agentic BPM: Design and Realization in CUGA FLO provides the structural framework to integrate these autonomous agents into deterministic business processes.
Theme 3: Grounding, Trust, and Reasoning
As AI takes on high-stakes roles, the “hallucination” problem is being addressed by forcing models to anchor their reasoning in verifiable evidence and domain-specific knowledge.
- Domain-Specific Grounding: KG-TRACE: A Neuro-Symbolic Framework for Mechanistic Grounding in Antimicrobial Resistance Prediction, TAVR-VLM: Risk-Conditioned Causal Grounding for Hallucination-Resistant Report Generation, and CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs ensure that predictions are grounded in established biological or anatomical pathways.
- Verification and Provenance: ProvenAI: Provenance-Native Traces of Evidence in Generated Answers, OpenRCA 2.0: From Outcome Labels to Causal Process Supervision, and Improving Reasoning in Vision-Language Models via Perception Verified Self-Training demand that we verify the propagation path of reasoning rather than just the final output.
- Self-Correction: Visual-OPSD: Cross-Modal On-Policy Self-Distillation for Efficient Unified Multimodal Reasoning and VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization highlight a shift toward models that can verify their own outputs and learn from feedback.
- Boundary Awareness: Know2Guess: A Contamination-Aware Multi-Zone Benchmark for Knowledge-Boundary Evaluation in Large Language Models provides protocols for auditing when a model should admit ignorance.
Theme 4: Efficiency, Scalability, and System-Level Optimization
To move AI from the server room to the edge, we must treat efficiency as a fundamental design principle rather than an afterthought.
- Hardware-Aware Optimization: SOLAR: AI-Powered Speed-of-Light Performance Analysis and Optimizing CUDA like a Human: Micro-Profiling Tools as Expert Surrogates for LLM-Based GPU Kernel Optimization use LLMs to automate low-level kernel tuning.
- Compression and Inference: SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference, PersistentKV: Page-Aware Decode Scheduling for Long-Context LLM Serving on Commodity GPUs, CAT-Q: Cost-efficient and Accurate Ternary Quantization for LLMs, and JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting optimize memory and compute for long-context serving.
- Resource-Efficient Deployment: TinySR: Pruning Diffusion for Real-World Image Super-Resolution, GreenRFM: Learning a resource-efficient radiology vision-language foundation model via supervision-centric pre-training, and CoVStream: Edge-Cloud Collaboration for Understanding of Long Video Streams demonstrate that high performance is achievable on constrained hardware.
- Evaluation Paradigms: The Generalization Spectrum: A Chromatographic Approach to Evaluating Learning Algorithms and The Capability Frontier: Benchmarks Miss 82% of Model Performance argue that we must move beyond single-number metrics to evaluate models across a spectrum of generalization and cost-performance frontiers.
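One way to read the "cost-performance frontier" idea in the Evaluation Paradigms bullet: instead of ranking models by a single score, keep every model that no other model beats on both cost and accuracy. The sketch below is a generic Pareto-frontier computation, not the procedure of either cited paper; the `Model` type, the model names, and all numbers are made up for illustration.

```python
from typing import NamedTuple

class Model(NamedTuple):
    name: str
    cost: float      # hypothetical cost per 1k queries, lower is better
    accuracy: float  # hypothetical benchmark accuracy, higher is better

def pareto_frontier(models):
    """Return the models not dominated on (cost, accuracy), sorted by cost."""
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on equal cost, try the more
    # accurate model first. A model stays only if it is more accurate than
    # every cheaper (or equally cheap) model seen so far.
    for m in sorted(models, key=lambda m: (m.cost, -m.accuracy)):
        if m.accuracy > best_accuracy:
            frontier.append(m)
            best_accuracy = m.accuracy
    return frontier

models = [
    Model("A", 1.0, 0.70),
    Model("B", 2.0, 0.68),
    Model("C", 3.0, 0.80),
    Model("D", 5.0, 0.79),
]
frontier_names = [m.name for m in pareto_frontier(models)]
print(frontier_names)  # ['A', 'C']
```

A single-number leaderboard would simply rank C first; the frontier view also keeps A as the best option for a tight budget and drops B and D, which cost more than a model that is at least as accurate.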
Theme 5: Governance and Safety
As AI systems grow in complexity, our governance models must evolve to address the risks of autonomous, multi-source, and high-stakes agents.
- Governance Models: Beyond Feedforward Networks: Reentry Neural Systems as the Fundamental Basis of Subjecthood and Intrinsic Safety of Next-Generation AGI and Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems propose architectural and cryptographic solutions for safety.
- Systemic Risk: The Governance Inversion Hypothesis: Why More AI Regulation May Produce Less Organisational Control and Fortress and Gatekeeper: Theorizing Transitive Trust in Third-Party Cybersecurity Risk Governance warn of the dangers of procedurally dense regulation and the “transitive trust” problem in AI supply chains.
- Privacy: Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents and Hybrid privacy-aware semantic search address the unique privacy risks inherent in agents that interact with multiple data sources.