Keywords: All-Atom Diffusion Transformer, Self-Supervised Process Reward Model, Autoregressive Video Generation, Position-Based Dynamics, AI Author Academic Conference, AI Forgetting Technique, Neural Rendering, 3D Generation, ADiT Framework, MetaStone-S1 SPRM, Lumos-1 MM-RoPE, Roblox AVBD Cloth Simulation, CoPart Part-Aware Diffusion
🔥 Spotlight
Meta/Cambridge/MIT propose all-atom diffusion Transformer framework: A joint research team from Meta FAIR, the University of Cambridge, and MIT has proposed the All-Atom Diffusion Transformer (ADiT), which breaks down the modeling barrier between periodic and non-periodic systems: through two key innovations, a unified all-atom latent representation and Transformer latent diffusion, a single ADiT model generates both molecules and crystals. Its design introduces almost no inductive bias, making both the autoencoder and the diffusion model far more efficient to train and sample from than traditional equivariant diffusion models: under the same hardware conditions, the time to generate 10,000 samples drops from 2.5 hours to under 20 minutes. (Source: HuggingFace Daily Papers)
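A minimal PyTorch sketch of the two-stage recipe described above (a shared autoencoder into one latent space, then a plain Transformer denoiser); all sizes, module choices, and the toy noising step are illustrative assumptions, not the paper's actual design:

```python
import torch
import torch.nn as nn

class ADiTSketch(nn.Module):
    """Two-stage ADiT recipe, heavily simplified: a shared autoencoder plus
    a plain Transformer latent denoiser. All sizes here are assumptions."""
    def __init__(self, atom_feat=32, latent_dim=64, depth=4, heads=4):
        super().__init__()
        # Stage 1: one autoencoder embeds all-atom systems (molecules and
        # crystals alike) into a shared latent set.
        self.encode = nn.Linear(atom_feat, latent_dim)
        self.decode = nn.Linear(latent_dim, atom_feat)
        # Stage 2: a plain, non-equivariant Transformer denoises latents,
        # deliberately carrying almost no inductive bias.
        layer = nn.TransformerEncoderLayer(latent_dim, heads, batch_first=True)
        self.denoise = nn.TransformerEncoder(layer, depth)

model = ADiTSketch()
atoms = torch.randn(2, 100, 32)            # 2 systems, 100 atoms, 32 features
z = model.encode(atoms)                    # same latent space for both domains
z_hat = model.denoise(z + 0.1 * torch.randn_like(z))  # one toy denoising pass
print(model.decode(z_hat).shape)           # torch.Size([2, 100, 32])
```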
Test-Time Scaling with a Reflective Generative Model: MetaStone-S1 achieves performance on par with OpenAI o3 through a Self-Supervised Process Reward Model (SPRM). By sharing the backbone network and attaching task-specific heads for next-token prediction and process scoring, SPRM integrates the policy model and the Process Reward Model (PRM) into a single interface without extra process annotations, cutting PRM parameters by over 99% for efficient inference. Equipped with SPRM, MetaStone-S1 is a natural fit for Test-Time Scaling (TTS) and offers three inference modes (low, medium, and high) based on a controllable thinking length. (Source: HuggingFace Daily Papers)
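A minimal sketch of the shared-backbone idea, with hypothetical sizes and head designs; the real MetaStone-S1 heads and training objectives differ:

```python
import torch
import torch.nn as nn

class SharedBackboneSPRM(nn.Module):
    """One backbone, two task heads, so the policy model and the PRM share
    almost all parameters. Sizes and head designs are illustrative."""
    def __init__(self, hidden=512, vocab=32000, heads=8, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, depth)  # shared weights
        self.lm_head = nn.Linear(hidden, vocab)    # next-token prediction
        self.prm_head = nn.Linear(hidden, 1)       # per-step process score

    def forward(self, hidden_states):
        h = self.backbone(hidden_states)
        return self.lm_head(h), self.prm_head(h).sigmoid()

model = SharedBackboneSPRM()
logits, scores = model(torch.randn(1, 16, 512))
print(logits.shape, scores.shape)   # (1, 16, 32000) and (1, 16, 1)
```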
Lumos-1: Autoregressive Video Generation with a Unified Model Perspective: Lumos-1 is an autoregressive video generator that retains the LLM architecture with minimal modifications. To inject spatiotemporal correlations into LLMs, the authors identify the effectiveness of incorporating 3D RoPE and diagnose its imbalanced frequency spectrum. They therefore propose MM-RoPE, a RoPE scheme that preserves the original text RoPE while providing a comprehensive frequency spectrum and scaled 3D positions for modeling multimodal spatiotemporal data. Lumos-1 also adopts a token dependency strategy with intra-frame bidirectionality and inter-frame temporal causality; building on it, the authors identify a frame-level loss imbalance caused by spatial information redundancy and address it with Autoregressive Discrete Diffusion Forcing (AR-DF). (Source: HuggingFace Daily Papers)
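For intuition, a naive 3D RoPE of the kind Lumos-1 starts from, as a hedged sketch; the frequency allocation and position scaling that MM-RoPE actually introduces are not reproduced here:

```python
import torch

def rope_1d(x, pos, base=10000.0):
    """Standard RoPE on one axis; x: (..., d) with d even, pos: (...,)."""
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=x.dtype) / half)
    ang = pos[..., None] * freqs                      # (..., half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * ang.cos() - x2 * ang.sin(),
                      x1 * ang.sin() + x2 * ang.cos()], dim=-1)

def rope_3d(x, t, h, w):
    """Split the head dim into (t, h, w) groups and rotate each group by its
    own position; MM-RoPE refines exactly this kind of split."""
    d = x.shape[-1] // 3                              # each group must be even
    xt, xh, xw = x.split(d, dim=-1)
    return torch.cat([rope_1d(xt, t), rope_1d(xh, h), rope_1d(xw, w)], dim=-1)

q = torch.randn(2, 8, 48)              # (batch, tokens, head_dim)
t = torch.zeros(2, 8)                  # toy frame index per token
h = torch.arange(8.0).expand(2, 8)     # toy row positions
w = torch.arange(8.0).expand(2, 8)     # toy column positions
print(rope_3d(q, t, h, w).shape)       # torch.Size([2, 8, 48])
```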
Roblox Solves the Physics Problem That Plagued Everyone!: Roblox has tackled the cloth simulation problem that has plagued real-time physics engines for years by combining the strengths of Position Based Dynamics and Projective Dynamics. The new method, Augmented Vertex Block Descent (AVBD), achieves highly realistic cloth simulation while maintaining real-time performance and has been deployed on the Roblox platform. (Source: )
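For context, the textbook Position Based Dynamics distance-constraint projection that such cloth solvers build on; this is the classic Müller et al. step, not Roblox's AVBD solver itself:

```python
import numpy as np

def project_distance(p1, p2, rest, w1=1.0, w2=1.0, stiffness=1.0):
    """One PBD distance-constraint projection: move both particles along the
    edge direction, weighted by inverse masses w1/w2, to restore rest length."""
    delta = p2 - p1
    dist = np.linalg.norm(delta)
    if dist < 1e-9:
        return p1, p2                       # degenerate edge, skip
    n = delta / dist                        # edge direction
    corr = stiffness * (dist - rest) / (w1 + w2)
    return p1 + w1 * corr * n, p2 - w2 * corr * n

# Stretch one cloth edge past its rest length, then project it back.
a, b = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.5])
a2, b2 = project_distance(a, b, rest=1.0)
print(np.linalg.norm(b2 - a2))              # 1.0
```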
🎯 Trends
First Author Must Be AI: The First Academic Conference for AI Authors Arrives: Stanford University has launched the first academic conference for AI authors – Agents4Science 2025, requiring the first author of submitted papers to be an AI system, with human researchers only as co-authors. The conference aims to explore the future of AI-driven scientific discovery and establish norms and ethical considerations for AI participation in scientific research. All submitted papers and reviews will be made public to transparently investigate the advantages and limitations of AI in scientific research. (Source: 36氪)
AI Amnesia: Only 3 Attention Heads Can Make Large Models Forget “Dogs Bark”: Meta, in collaboration with NYU, has proposed a method that scales Transformer attention heads to precisely locate and control a model’s concept modules, allowing large models to selectively “forget” specific facts or pieces of common sense. The method vectorizes concepts, computes their similarity with attention-head outputs to identify the heads forming a concept module, and then amplifies or erases a concept’s influence through scaling factors. This offers new ideas for personalized fine-tuning of large models, strengthening specific capabilities, safety control, and understanding how models store knowledge. (Source: 36氪)
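A sketch of the intervention half of the method only (scaling selected heads' outputs); locating concept-linked heads via concept vectors and similarity is the paper's other half and is not shown. Shapes and head indices are hypothetical:

```python
import torch

def scale_concept_heads(attn_out, scales):
    """attn_out: (batch, seq, n_heads, head_dim); scales: (n_heads,).
    A factor of 0 erases a concept-linked head's contribution to the
    residual stream; a factor > 1 amplifies it."""
    return attn_out * scales.view(1, 1, -1, 1)

out = torch.randn(1, 10, 12, 64)     # toy per-head attention outputs
scales = torch.ones(12)
scales[[3, 7, 9]] = 0.0              # hypothetical indices: "forget" 3 heads
print(scale_concept_heads(out, scales).shape)   # torch.Size([1, 10, 12, 64])
```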
🧰 Tools
CLiFT: Compressed Light Field Tokens for Computationally Efficient and Adaptive Neural Rendering: This paper proposes a neural rendering method that represents a scene as a set of Compressed Light Field Tokens (CLiFTs) preserving rich appearance and geometric information. The compressed tokens make rendering computationally efficient, and a single trained network can render novel views while varying the number of tokens used to represent the scene, trading compute for quality. (Source: HuggingFace Daily Papers)
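One way to picture the adaptive-compute property, as a hedged sketch: a renderer that cross-attends view queries to a variable-length token set, so the same weights work at any token budget. This is our illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TokenBudgetRenderer(nn.Module):
    """Toy renderer: ray queries attend to however many scene tokens are
    provided, so one network serves many compute budgets."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_rgb = nn.Linear(dim, 3)

    def forward(self, ray_queries, scene_tokens):
        feats, _ = self.cross(ray_queries, scene_tokens, scene_tokens)
        return self.to_rgb(feats)

r = TokenBudgetRenderer()
rays = torch.randn(1, 256, 128)                  # queries for one novel view
print(r(rays, torch.randn(1, 64, 128)).shape)    # few tokens: cheaper
print(r(rays, torch.randn(1, 1024, 128)).shape)  # many tokens: richer
```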
From One to More: Contextual Part Latent Representation for 3D Generation: Inspired by the human 3D design workflow, CoPart is a part-aware diffusion framework that decomposes a 3D object into contextual part latent representations for coherent multi-part generation. This paradigm offers three advantages: i) part decomposition reduces encoding complexity; ii) relationships between parts can be modeled explicitly; and iii) conditioning can be applied at the part level. (Source: HuggingFace Daily Papers)
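A hedged sketch of the part-latent idea: each part keeps its own latent set, and attention over the concatenated parts models cross-part relationships during denoising. Module choices are ours, not CoPart's actual network:

```python
import torch
import torch.nn as nn

class PartAwareDenoiserSketch(nn.Module):
    """Toy part-aware denoiser: joint attention over all part latents lets
    each part condition on the others during the diffusion step."""
    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.denoiser = nn.TransformerEncoder(layer, depth)

    def forward(self, part_latents):
        # part_latents: list of (batch, tokens_i, dim), one entry per part.
        sizes = [p.shape[1] for p in part_latents]
        joint = torch.cat(part_latents, dim=1)   # contextual joint denoising
        return self.denoiser(joint).split(sizes, dim=1)

m = PartAwareDenoiserSketch()
parts = [torch.randn(1, 8, 64), torch.randn(1, 12, 64)]  # e.g. seat + legs
print([p.shape for p in m(parts)])
```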
🌟 Community
jerryjliu0 discusses form extraction and LLM applications: jerryjliu0 shared a solution for adaptive form extraction using LlamaParse, which parses form pages into standardized key-value pairs and outputs them as a 2D table for easy downstream processing. He also recommended Clelia Bertelli’s article on Pydantic, emphasizing the importance of validation and readability in agent workflows and noting that Pydantic is an effective building block for structured output. He additionally retweeted posts on multi-agent setups, deep research, and LlamaIndex applications. (Source: Multiple tweets from jerryjliu0)
Alibaba_Qwen reminds developers to add special tokens when using Qwen3-embedding: Alibaba_Qwen noted that developers often forget to append the special token <|endoftext|> to the end of the context when using the GGUF build of Qwen3-embedding, which significantly hurts the model’s accuracy. The team recommends letting llama.cpp add the token automatically and plans to release an updated GGUF model package to simplify this step. (Source: Alibaba_Qwen)
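A minimal sketch of the workaround for callers that assemble embedding inputs by hand; the helper name is hypothetical, and per the team's advice llama.cpp can append the token automatically instead:

```python
def with_eot(text: str) -> str:
    """Append <|endoftext|> if missing; Qwen3-embedding GGUF expects it at
    the end of the context, and omitting it degrades embedding quality."""
    eot = "<|endoftext|>"
    return text if text.endswith(eot) else text + eot

print(with_eot("What is the capital of France?"))
# What is the capital of France?<|endoftext|>
```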
Ronald_vanLoon shares AI-related news and technologies: Ronald_vanLoon shared multiple AI-related news and technological advancements, including AI applications in healthcare, 3D-printed vegan steaks, a framework for evaluating LLM suitability, Gemini 2.5’s native audio capabilities, autonomous robot and drone collaborative patrols, reinforcement learning for control, exoskeleton robots, AI agent autonomy, cloud design frameworks, robot front flips, drug delivery methods in hospitals, future cars, and other technological innovations. (Source: Multiple tweets from Ronald_vanLoon)
Community discussion on AI models and tools: The community discussed various AI models and tools, including the performance, pricing, and applications of Kimi K2, the compressibility of DeepSeek models, system prompt tuning for Grok models, and evaluation results and application cases of other models. The discussion also touched upon AI agent autonomy, RLHF, RAG, multi-agent settings, and AI applications in different fields such as deep research, creative writing, code generation, and form extraction. (Source: Multiple posts from various users)
Discussion on AI and social issues: The community discussed the impact of AI on society, including its effects on employment, economic inequality, and mental health. The discussion also covered ethical issues, regulatory issues, and the future development of AI. (Source: Multiple posts from various users)
📚 Learning
RLHF book adds policy gradient algorithm derivation: Chapter 11 (on policy gradient algorithms) of Nathan Lambert’s (natolambert) RLHF book has been updated with a complete derivation of the policy gradient objective. (Source: natolambert)
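For reference, the standard log-derivative trick that such derivations are built on, for trajectories τ sampled from the policy π_θ:

```latex
\nabla_\theta J(\theta)
  = \nabla_\theta \int \pi_\theta(\tau)\, R(\tau)\, d\tau
  = \int \pi_\theta(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\, d\tau
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(\tau)\, R(\tau)\right],
\quad \text{with} \quad
\nabla_\theta \log \pi_\theta(\tau) = \sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t).
```

The environment dynamics terms inside log π_θ(τ) do not depend on θ and drop out of the gradient, leaving only the per-step action log-probabilities.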
💼 Business
SpaceX to invest $2 billion in xAI: SpaceX will invest $2 billion in xAI as part of xAI’s $5 billion equity financing, one of SpaceX’s largest investments ever. SpaceX has previously supported Tesla and The Boring Company. After this investment, the Grok model may be sent to Mars, and there may be more business cooperation between SpaceX and xAI in the future. (Source: 36氪)
Hanyang Technology Yarbo secures another 100 million RMB in financing: Hanyang Technology (Yarbo), a maker of consumer-grade snow-clearing yard robots, has completed a B+ round of financing exceeding 100 million RMB, backed by CAS Investment, CICC Capital, and Joyoung Ventures. The funds will go toward R&D, product iteration, and improving the supply chain and mass-production delivery. Hanyang Technology is currently the only company in the world to have achieved large-scale commercial delivery of consumer-grade snow-clearing robots; its product, Yarbo S1, has overcome key technical challenges such as battery performance in ultra-low-temperature environments and navigation algorithms for complex terrain. (Source: 36氪)
12-person team creates a hit AI companion app, securing $30 million in investment within six months: Portola, the company behind the AI companion app Tolan, has completed a $20 million Series A round. Combined with its earlier $10 million seed round, the company has raised $30 million within six months. Tolan offers AI alien characters as companions for users and generates revenue through subscriptions. (Source: 36氪)
💡 Other
Zuckerberg prepares to ambush Musk, Chinese technical talent becomes the key to winning the AI race: Meta is investing heavily in AI and poaching Chinese AI talent from companies like OpenAI, Google, and Apple at high salaries, aiming to sharpen its competitive edge in the field. (Source: 36氪)
DeepSeek is dead? The rumor gets diagnosed as clickbait journalism: The article refutes the rumor that DeepSeek is failing, arguing that the decline in DeepSeek’s usage stems not from product deficiencies but from its open-source strategy and a deliberately degraded official API experience, which steers users toward third-party-hosted DeepSeek models. DeepSeek’s core goal is to achieve AGI, not to make money selling large-model services. (Source: 36氪)
“$10 million in annual revenue” is the biggest lie in this AI application track: The article exposes inflated revenue claims in the AI emotional-companionship app track, pointing out that many companies rely on heavy ad spending to sustain growth while user payment and retention rates remain low, leaving actual revenue far below the advertised figures. Regulatory pressure also weighs heavily on the track’s development. (Source: 36氪)