Keywords: Large Language Models, Reinforcement Learning, AI Infrastructure, Multimodal AI, AI Ethics, Quantum Computing, AI Agents, Richard Sutton’s skepticism about LLMs, OpenAI’s Project Stargate, Meta’s Code World Model (CWM), Flash Attention 4 performance optimization, Unitree G1 robot security vulnerabilities

🔥 Spotlight

Richard Sutton Questions LLMs: Reinforcement learning pioneer Richard Sutton, author of the 2019 essay “The Bitter Lesson,” argues that current large language model (LLM) architectures are not the path to Artificial General Intelligence (AGI). He advocates new architectures that support continual, in-situ learning, allowing AI agents to learn the way humans and animals do, an approach that could render existing LLM methods obsolete. The view has sparked widespread discussion in the AI community and prompted a re-evaluation of AI learning paradigms. (Source: dwarkesh_sp, finbarrtimbers, scaling01, dejavucoder, teortaxesTex, jpt401)

OpenAI’s Trillion-Dollar AI Infrastructure Bet: OpenAI announced a partnership with NVIDIA, Oracle, and SoftBank, planning to invest trillions of dollars to build a supercomputing data center project named “Stargate.” The project is expected to require 17 gigawatts of power capacity, equivalent to the output of 17 nuclear power plants. This unprecedented capital investment aims to meet the infrastructure demands of AI’s exponential growth, with an anticipated annual revenue of $125 billion by 2029, marking a new phase in the AI arms race that emphasizes compute scale over singular algorithmic breakthroughs. (Source: Reddit r/ArtificialInteligence, cnbc.com, atroyn, jonst0kes, scaling01)

OpenAI Enhances Function Calling with File and Image Support: OpenAI has updated its function calling capabilities, now supporting files and images as outputs for tool calls. This means models can directly interact with visual and file data, for instance, by calling functions like “generate chart” or “load image,” and returning these files to the model for subsequent processing, significantly expanding the model’s application capabilities in complex tasks. (Source: OpenAIDevs)
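The tool-call loop such an update enables can be sketched in plain Python (hypothetical names throughout; this is a conceptual illustration, not the official OpenAI SDK):

```python
import base64
import json

# Hypothetical sketch of a tool-call loop where a tool returns a file
# payload (an image) that can be handed back to the model for further
# reasoning. The tool name and message shapes are invented for illustration.
def generate_chart(data):
    # Stand-in for real chart rendering: fabricate PNG-like bytes.
    png_bytes = b"\x89PNG" + json.dumps(data).encode()
    return {"type": "image", "b64": base64.b64encode(png_bytes).decode()}

TOOLS = {"generate_chart": generate_chart}

def run_tool_call(call):
    """Dispatch a model-issued tool call and package its file output."""
    result = TOOLS[call["name"]](call["arguments"])
    # The image payload is returned to the model as the tool result.
    return {"tool_call_id": call["id"], "output": result}

reply = run_tool_call({"id": "call_1", "name": "generate_chart",
                       "arguments": {"sales": [3, 5, 8]}})
print(reply["output"]["type"])  # prints "image"
```

The key change the update describes is that the tool result can now be a file or image rather than text only.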

Anthropic Claude Model Quality Issues Post-Mortem: Anthropic released a detailed post-mortem report, revealing three complex and overlapping infrastructure errors that led to intermittent degradation in Claude’s response quality. The report highlights the challenges in maintaining the reliability of large-scale AI systems, emphasizing that even leading AI companies must continuously address issues of system stability and performance degradation. (Source: dl_weekly)

Gemini Flash Model Updates Boost Efficiency and Reliability: Google AI developers announced updates to the Gemini 2.5 Flash and Flash-Lite models, focusing on improved tool usage, system reliability, and overall efficiency. The new versions quickly deliver the latest features to users via preview models and support skipping code updates using the -latest alias. Users have reported a slight performance improvement with the updated models, alongside a nearly 30% reduction in cost, significantly enhancing token efficiency. (Source: nin_artificial, scaling01)

Meta Releases Code World Model (CWM): Meta AI introduced the Code World Model (CWM), a 32B-parameter open-source model focused on code generation and reasoning. Trained by combining static code, execution traces, and agent interactions, CWM can understand code syntax and semantics, simulate Python execution, and support multi-turn software engineering tasks. It also features long context handling (131k tokens) and performs exceptionally well on code benchmarks like SWE-bench Verified and LiveCodeBench. (Source: TheTuringPost, awnihannun, ImazAngel)
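The execution-trace signal described above can be sketched with Python’s own tracing hook — an illustration of the kind of line-level data such a model could train on, not Meta’s actual pipeline:

```python
import sys

# Record a toy execution trace: for each line executed inside `fn`, capture
# the relative line number and a snapshot of local variables. The trace
# format here is invented for illustration.
def trace_execution(fn, *args):
    events = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            events.append((frame.f_lineno - fn.__code__.co_firstlineno,
                           dict(frame.f_locals)))
        return tracer
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, events

def running_sum(xs):
    total = 0
    for x in xs:
        total += x
    return total

result, trace = trace_execution(running_sum, [1, 2, 3])
print(result)  # prints 6; `trace` holds per-line variable snapshots
```

Pairing source code with traces like these is what lets a model learn to simulate execution rather than only predict tokens.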

Tencent Hunyuan Launches Hunyuan3D-Part for Part-Level 3D Generation: Tencent Hunyuan released Hunyuan3D-Part, an open-source part-level 3D shape generation model. The model achieves highly controllable and high-quality generation of 3D object shapes through two major innovations: P3-SAM (a native 3D part segmentation model) and X-Part (a part generation model). Its training process avoids the use of 2D SAM and leverages a large-scale dataset containing 3.7 million shapes, achieving leading results in 3D generation. (Source: ImazAngel)

NVIDIA Jet-Nemotron Model Significantly Boosts Inference Speed: NVIDIA’s research team introduced Jet-Nemotron, a new “hybrid architecture” model that boasts a 53x faster inference speed than existing top open-source models (e.g., Qwen3, Gemma3, Llama3.2) while maintaining comparable accuracy. This breakthrough is attributed to the PostNAS framework, which reduces training costs by freezing MLP weights and searching only over attention mechanisms. The core innovation, JetBlock, employs dynamic convolutions, further enhancing accuracy in mathematical reasoning and retrieval tasks. (Source: 量子位)
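The freeze-MLP search trick described above can be illustrated with a toy update rule (invented names and values; not NVIDIA’s code):

```python
# Toy illustration of freezing one parameter group during training: apply
# gradient updates only to the non-frozen group, so the frozen weights
# contribute no training cost. Loss and values here are invented.
def sgd_step(params, grads, frozen, lr=0.1):
    return {name: (vals if name in frozen
                   else [p - lr * g for p, g in zip(vals, grads[name])])
            for name, vals in params.items()}

params = {"mlp": [1.0, -2.0], "attention": [0.5, 0.5]}
grads  = {"mlp": [9.9, 9.9], "attention": [1.0, -1.0]}

new = sgd_step(params, grads, frozen={"mlp"})
print(new["mlp"])        # [1.0, -2.0] — untouched despite large gradients
print(new["attention"])  # updated by the step
```

Restricting the search to the attention blocks while the MLP stays fixed is what keeps the architecture search affordable.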

Tsinghua University’s OpenLens AI Automates Full Medical Research Workflow: Tsinghua University’s Department of Automation, led by Professor Suo Jinli’s research group, released OpenLens AI, the first fully autonomous AI research framework designed specifically for medical informatics. This system achieves a full-chain automated closed-loop from literature mining, experimental design, data analysis, and code generation to publishable papers, compressing research cycles from months to hours. OpenLens AI ensures research rigor, traceability, and high-quality output through modular agent collaboration and medical-specific quality control mechanisms, heralding a “zero-manual” era for medical research. (Source: 量子位)

Alibaba Cloud’s Tongyi Qianwen Releases Native All-Modal Large Model Qwen3-Omni: Alibaba Cloud’s Tongyi Qianwen officially released Qwen3-Omni, a new generation native all-modal large model. This model can seamlessly process various input forms including text, images, audio, and video, and can simultaneously generate text and natural speech output via real-time streaming responses, further expanding the application boundaries and interactive experience of multimodal AI. (Source: 36氪)

🧰 Tools

Unsloth GPT-OSS Reinforcement Learning Improves Inference Efficiency: Unsloth AI released a reinforcement learning update for GPT-OSS, significantly boosting inference speed and VRAM efficiency. The new version achieves a 3x increase in GPT-OSS RL inference speed (approx. 21 tokens/sec), BF16 inference speed of about 30 tokens/sec, a 50% reduction in VRAM usage, and supports 8x longer context lengths, allowing the GPT-OSS 20B model to run within 15GB of VRAM. Additionally, the update includes strategies to combat reward hacking and supports Vision RL. (Source: danielhanchen, Reddit r/LocalLLaMA)

vLLM Supports Hybrid Models for Enhanced Performance: The vLLM project announced that its v1 release officially supports hybrid models, including Mamba, Mamba2, and linear attention mechanisms, treating them as first-class citizens. This update aims to further enhance inference performance and efficiency by integrating different types of model architectures. (Source: vllm_project)

CompLLM Compression Technology Optimizes Long-Context QA: CompLLM is a soft compression technique designed for LLMs, aimed at addressing computational challenges in long context processing. This technique segments contexts into independent chunks for compression, achieving linear scaling, generalization from short sequences to 100k tokens, and chunk reuse across queries. At a 2x compression rate, CompLLM can accelerate Time To First Token (TTFT) by 4x and reduce KV cache size by 50%, while maintaining or exceeding the performance of uncompressed contexts. (Source: HuggingFace Daily Papers, gabriberton)
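The chunking-and-reuse idea can be sketched as follows (the compression step is a placeholder; chunk size and cache layout are assumptions, not CompLLM’s implementation):

```python
import hashlib

# Conceptual sketch of chunk-wise context compression: split the context
# into fixed-size chunks, "compress" each independently, and cache the
# result by content hash so later queries sharing chunks reuse them.
CHUNK_SIZE = 32
cache = {}

def compress_chunk(chunk):
    # Placeholder for a learned soft-compression step.
    return f"<compressed:{len(chunk)}>"

def compress_context(text):
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    out, hits = [], 0
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode()).hexdigest()
        if key in cache:
            hits += 1          # reused across queries: no recomputation
        else:
            cache[key] = compress_chunk(chunk)
        out.append(cache[key])
    return out, hits

doc = " ".join(str(i) for i in range(100))
_, first_hits = compress_context(doc)
_, second_hits = compress_context(doc)  # same chunks: every one is a cache hit
print(first_hits, second_hits)
```

Because each chunk is compressed independently, cost scales linearly with context length, and repeated chunks pay the compression cost once.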

LMCache Open-Source Extension Boosts LLM Inference Efficiency: LMCache is an open-source LLM serving engine extension that acts as a caching layer for large-scale inference. It intelligently manages KV caches and reuses key-value states of previous texts across GPU, CPU, and local disk, thereby reducing RAG costs (4-10x), shortening Time To First Token (TTFT), and increasing throughput under load. NVIDIA has integrated it into the Dynamo inference project. (Source: TheTuringPost)
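The tiered-placement idea can be modeled with a toy cache that spills from a small fast tier to a larger slow tier (illustrative only; tier sizes and the LRU policy are assumptions, not LMCache’s actual code):

```python
from collections import OrderedDict

# Toy model of tiered KV-cache placement: a small "GPU" tier spills
# least-recently-used entries to a larger "CPU" tier instead of discarding
# them, so reused prefixes stay warm somewhere.
class TieredKVCache:
    def __init__(self, gpu_slots=2):
        self.gpu = OrderedDict()   # fast tier, LRU-ordered
        self.cpu = {}              # slow tier
        self.gpu_slots = gpu_slots

    def put(self, prefix, kv_state):
        self.gpu[prefix] = kv_state
        self.gpu.move_to_end(prefix)
        while len(self.gpu) > self.gpu_slots:
            evicted, state = self.gpu.popitem(last=False)
            self.cpu[evicted] = state          # spill, don't drop

    def get(self, prefix):
        if prefix in self.gpu:
            self.gpu.move_to_end(prefix)
            return "gpu", self.gpu[prefix]
        if prefix in self.cpu:                 # promote back on reuse
            self.put(prefix, self.cpu.pop(prefix))
            return "cpu", self.gpu[prefix]
        return "miss", None

cache = TieredKVCache(gpu_slots=2)
for p in ["sys-prompt", "doc-A", "doc-B"]:     # third insert spills the first
    cache.put(p, f"kv({p})")
tier, _ = cache.get("sys-prompt")
print(tier)   # prints "cpu": found in the slow tier, then promoted
```

Avoiding recomputation of a spilled prefix — even from a slower tier — is what shortens TTFT for repeated contexts such as RAG documents.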

Qwen3 Coder Model Enhances Local Coding Capabilities: The Qwen3 Coder model has garnered attention for its “amazing stability” in local coding tasks, especially when combined with tools like Cline and LM Studio, providing a high-quality coding experience on consumer-grade hardware. This offers strong support for developers performing LLM-assisted coding in local environments. (Source: ImazAngel)

mlx-lm and oLLM Library Updates Enhance Local LLM Inference: The mlx-lm library received an update, adding models like Meta’s Code World Model and improving batch inference capabilities for hybrid SSM and sliding window attention. Concurrently, oLLM, a lightweight Python library, also supports running LLMs like Qwen3-next-80B, GPT-OSS, and Llama3 on consumer-grade hardware, offering a wider range of choices and higher efficiency for local model inference. (Source: awnihannun, ImazAngel, huggingface)

Replit Improves AI Agent and Automation Features: Replit is enhancing its AI agent and automation building capabilities on its platform, now allowing developers to directly test and track scheduled automations in real-time from the dashboard, significantly improving development efficiency and convenience. (Source: amasad)

OpenWebUI Users Report GPT-OSS Model Streaming Issues: OpenWebUI users reported encountering a “502: Upstream Error” when streaming the GPT-OSS 20B cloud model on the platform, despite the same model running normally on CLI and Ollama Web UI. This suggests potential issues with OpenWebUI’s integration or streaming mechanism with specific LLM models, affecting user experience. (Source: Reddit r/OpenWebUI)

DeepAgent Desktop Launches Model-Agnostic Coding Agent: DeepAgent Desktop has been released, claiming its coding agent surpasses Claude Code and GPT-5 (Codex) in performance. The tool offers powerful coding agent capabilities in both CLI and editor, cleverly leveraging multiple state-of-the-art models to handle complex tasks. This suggests that a model-agnostic integration approach might be more efficient in the coding agent domain. (Source: matanSF)

Rumors of AI-Native Browsers Could Reshape Market: Rumors suggest that OpenAI and Google are soon to launch “AI-native” browsers. This move is seen as a strategic play by tech giants in distribution, data collection, and seamless AI automation, potentially posing a significant challenge to startups offering AI browser plugins and extensions, and signaling a deeper integration of AI into users’ daily computing experience. (Source: dotey)

📚 Learning

Free Python Data Structures Book Recommended: “A First Course on Data Structures in Python” by Donald R. Sheehy is recommended as an excellent free resource for learning data structures, algorithmic thinking, complexity analysis, recursion/dynamic programming, and search methods. These skills are fundamental to AI and machine learning, and highly valuable for learners wishing to delve deeper into these fields. (Source: TheTuringPost, huggingface)

Seeking Deep Learning and LLM Learning Resources: A user on Reddit sought recommendations for the best learning resources on LLM internal architectures and deep learning, specifically mentioning François Chollet and Matthew Watson’s “Deep Learning with Python, Third Edition.” This reflects the AI community’s demand for high-quality, in-depth educational content on LLMs and deep learning. (Source: Reddit r/deeplearning)

AI Mastery Roadmap and Brief History of AI Shared: An AI mastery roadmap was shared on social media, providing learning paths and guidance on key skills for aspiring AI professionals. Concurrently, resources on the brief history of AI were also shared, helping people understand the evolution and key milestones of AI technology. (Source: Ronald_vanLoon, Ronald_vanLoon)

DSPy Getting Started Guide and Tutorials Shared: DSPy’s getting started guide was shared on social media, covering how to run examples from its homepage, along with detailed tutorials on RAG, mathematical reasoning, and building AI agents. Additionally, video resources were provided to help users conceptually understand the problems DSPy solves and its practical application methods. (Source: lateinteraction)

💼 Business

Applied Compute Raises New Round at a $500M Valuation: Applied Compute, a startup founded by three former OpenAI researchers focusing on Reinforcement Learning as a Service (RLaaS), is reportedly raising a new funding round at a $500 million valuation, led by Lux Capital. This comes just three months after its previous funding round, demonstrating high market recognition for the RLaaS model and its team. (Source: steph_palazzolo)

Mistral AI Completes €1.7 Billion Series C Funding, Led by ASML: European AI unicorn Mistral AI completed a €1.7 billion (approximately 14.2 billion RMB) Series C funding round, reaching a post-money valuation of €11.7 billion. ASML led the round with €1.3 billion, acquiring an 11% stake. This move is seen as a strategic alliance between a European tech giant and an AI newcomer, aiming to unlock AI value in industrial manufacturing, promote Europe’s independent development in AI, and focus on vertical AI applications. (Source: 36氪)

Hangzhou HENGWEI Technology Acquires Shuxing Information, Pioneering AIRaaS: Hangzhou HENGWEI Technology announced the acquisition of a 75% stake in Shanghai Shuxing Information, marking the first A-share listed company acquisition of an AIRaaS (AI Result as a Service) target. This signifies a shift in the AI industry’s business model from merely “selling compute power” to “selling results.” Shuxing Information, leveraging its large model technology combined with industry scenarios, has achieved profitability in sectors such as fast-moving consumer goods, automotive, and finance, providing Hangzhou HENGWEI Technology with an opportunity to transition from hardware sales to high-value-added services. (Source: 36氪)

🌟 Community

ChatGPT 4o Performance Degradation Sparks Strong User Discontent: ChatGPT Plus users widely report significant degradation in GPT-4o model performance and “personality.” Many users claim that even when selecting 4o, conversations are secretly routed to GPT-5, especially when handling “sensitive” or “emotional” prompts, leading to responses that are “cold, lazy, and lacking emotional intelligence.” Users feel “deceived” and betrayed, questioning OpenAI’s transparency and integrity, and expressing dissatisfaction with the paid product. (Source: Reddit r/ChatGPT (multiple threads), menhguin)

AI Agents: The Gap Between Hype and Reality: Discussions on social media regarding AI agents reveal a gap between their ambitious vision and current practical capabilities. Former Google CEO Eric Schmidt stated that “there is no evidence that AI can self-improve.” Developers report that giving AI agents more freedom often leads to worse results, and truly successful agents are those that are strictly controlled and focused on specific assistive tasks. This indicates that the maturity of AI agents is far from expected, still requiring significant human intervention and fine-grained management. (Source: Reddit r/ArtificialInteligence, dotey)

In-depth Analysis of Flash Attention 4 Performance Sparks Discussion: A 4000-word in-depth technical analysis of Flash Attention 4 sparked widespread discussion, detailing how the technology achieves a 20% performance boost. The article reveals its core optimizations, including more complex warp-specialized asynchronous pipelines, an innovative cubic approximation exponential function for “software softmax,” and efficient re-scaling for numerical stability. These technical details provide the AI community with a deeper understanding of efficient attention mechanisms. (Source: charles_irl, akshat_b, TheZachMueller, jonst0kes, atroyn, swyx, dejavucoder)
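Two of the described optimizations can be demonstrated numerically — a cubic polynomial standing in for exp on a bounded range (the article describes a fitted cubic; the Taylor cubic below is a stand-in), and online max-rescaling for a numerically stable softmax:

```python
import math

# Illustrative numerics, not FA4's kernel code.
def exp_cubic(x):
    # Taylor cubic for exp(x); accurate on a bounded range like [-1, 0].
    return 1.0 + x + x * x / 2.0 + x ** 3 / 6.0

def softmax_online(scores):
    """One-pass softmax with a running max, rescaling the partial sum
    whenever the max grows, so exponents never overflow."""
    m, s = float("-inf"), 0.0
    for x in scores:
        new_m = max(m, x)
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
    return [math.exp(x - m) / s for x in scores]

# Max error of the cubic against math.exp on [-1, 0]:
max_err = max(abs(exp_cubic(-t / 100) - math.exp(-t / 100)) for t in range(101))
# Huge scores would overflow a naive softmax; the online version is fine:
probs = softmax_online([1000.0, 1001.0, 1002.0])
print(round(max_err, 4), round(sum(probs), 6))
```

The real kernel applies these ideas per warp inside the attention loop; the point of the demo is only that a low-degree polynomial is adequate once arguments are confined to a safe range by rescaling.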

In-depth Discussion on AI’s Impact on Employment and Society: Sam Altman predicts that 30-40% of economic tasks will be performed by AI in the future, accelerating career transitions. He emphasizes “learning to learn,” adaptability, resilience, understanding human needs, and interpersonal interaction as key future skills. Discussions also touched upon the socio-ethical impacts of AI, such as concerns about “mind-altering drugs” and AI-generated content polluting the internet, as well as the balance between AI replacing job tasks and creating new opportunities. (Source: dotey, Ronald_vanLoon, TheEthanDing, swyx, cloneofsimo, MillionInt, glennko, Reddit r/ArtificialInteligence)

AI Ethics: Challenges of Trust, Privacy, and Control: Social media discussions focused on AI ethical challenges, including data privacy, ad-funded AI agents and trust issues, and the broader societal impact of AI’s growing power. The community called for increased transparency in AI systems and debated whether AI should serve “intelligence for intelligence’s sake” or prioritize human well-being. These discussions reflect deep public concern about the direction of AI development. (Source: Ronald_vanLoon, pmddomingos, Reddit r/ChatGPT, Reddit r/ArtificialInteligence)

💡 Other

Unitree G1 Robot Bluetooth Security Vulnerability Exposed: The Unitree G1 humanoid robot (potentially including Go2, H1, B2) has been exposed to a critical Bluetooth security vulnerability. Any device within Bluetooth range can exploit a hardcoded AES key to execute root commands, thereby controlling the robot or implanting backdoors. Although vulnerabilities in some older firmware versions may have been patched, the fundamental security flaw of a hardcoded key persists, raising concerns about AI robot security. (Source: Sentdex, teortaxesTex)
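Why a hardcoded key is so damaging — and the standard mitigation — can be sketched briefly (illustrative names and secrets; not Unitree’s firmware):

```python
import hashlib
import hmac

# With one hardcoded key baked into every unit, extracting it from a single
# robot authenticates an attacker to the whole fleet. Deriving a per-device
# key from a master secret plus the device serial limits the blast radius.
MASTER_SECRET = b"factory-master-secret"   # illustrative placeholder

def per_device_key(serial: str) -> bytes:
    # HMAC-SHA256 key derivation (a sketch, not a full HKDF).
    return hmac.new(MASTER_SECRET, serial.encode(), hashlib.sha256).digest()

k1 = per_device_key("G1-0001")
k2 = per_device_key("G1-0002")
print(k1 != k2)   # True: compromising one unit's key doesn't unlock another
```

Firmware updates can rotate a leaked per-device key; a hardcoded fleet-wide key, as the report notes, remains a fundamental flaw even after individual bugs are patched.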

Synergistic Development of AI and Quantum Computing: Discussions on social media highlighted the transformative potential of quantum computing in cybersecurity and noted NVIDIA’s active investment in quantum startups, developing platforms like CUDA-Q and DGX Quantum to support hybrid quantum-classical programming. This indicates growing industry recognition of the synergy between quantum technology and AI, and of their prospects in commercial applications. (Source: Ronald_vanLoon, TheTuringPost)

Modular Manifolds: A New Theory for Neural Network Optimization: Thinking Machines proposed the “Modular Manifolds” theory, a method for co-designing optimizers by imposing manifold constraints on weight matrices, thereby achieving more stable and higher-performing neural network training. This theory delves into the geometric properties of neural network optimization, aiming to surpass traditional optimization methods like Adam and offering new directions for AI research. (Source: thinkymachines, dejavucoder, johnschulman2, giffmana, menhguin, jeremyphoward, rown, suchenzang, teortaxesTex, zacharynado)
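A toy version of a manifold-constrained update (not Thinking Machines’ actual optimizer): constrain a weight vector to the unit sphere by projecting back after each gradient step.

```python
import math

# After each gradient step, project the weights back onto the unit sphere,
# so optimization proceeds on the manifold ||w|| = 1 rather than in free
# Euclidean space. Learning rate and gradient values are invented.
def project_to_sphere(w):
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

def constrained_step(w, grad, lr=0.1):
    stepped = [wi - lr * gi for wi, gi in zip(w, grad)]
    return project_to_sphere(stepped)

w = project_to_sphere([3.0, 4.0])
w = constrained_step(w, grad=[1.0, -0.5])
norm = math.sqrt(sum(x * x for x in w))
print(round(norm, 9))   # 1.0 — the constraint is maintained after the step
```

Keeping weights on a fixed manifold bounds their scale by construction, which is one route to the more stable training the theory aims at.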