Keywords: GPT-5, Quantum Computing, AI Material Design, Reinforcement Learning, Large Language Models, AI Infrastructure, Multimodal Models, AI Agent, Quantum NP Problems, CGformer Crystal Graph Neural Network, RLMT Reinforcement Learning Framework, DeepSeek Sparse Attention (DSA), UniVid Unified Vision Task Framework

🔥 Spotlight

GPT-5 Conquers “Quantum NP Problem”: Quantum computing expert Scott Aaronson has, for the first time, credited GPT-5 in a published paper with a substantive assist in quantum complexity theory research. GPT-5 solved a critical derivation step in the “quantum version of the NP problem” in about 30 minutes, a task that typically takes human researchers 1-2 weeks. The result shows AI beginning to contribute to core scientific discovery, long considered the preserve of human intellect, and signals a major leap in AI’s potential for scientific research. (Source: arXiv, scottaaronson.blog)

New Material AI Design Model CGformer: A team led by Professors Li Jinjin and Huang Fuqiang at Shanghai Jiao Tong University has developed CGformer, a new AI model for material design that integrates Graphormer’s global attention mechanism with CGCNN and adds centrality and spatial encoding, breaking through the limitations of traditional crystal graph neural networks. The model captures global information about complex crystal structures, significantly improving prediction accuracy and screening efficiency for new materials such as high-entropy sodium-ion solid electrolytes. (Source: Matter)

UniVid Unified Visual Task Framework: UniVid is an innovative framework that fine-tunes a pre-trained video diffusion Transformer to handle diverse image and video tasks without task-specific modifications. It represents each task as a visual sentence, using an in-context sequence to specify the task and the expected output modality, demonstrating the potential of pre-trained video generation models as a unified foundation for visual modeling. (Source: HuggingFace Daily Papers)

RLMT Revolutionizes Large Model Post-Training: A team led by Associate Professor Danqi Chen at Princeton University proposed the “Reinforcement Learning with Model-rewarded Thinking” (RLMT) framework, which has LLMs generate long chains of thought before responding and optimizes them with online RL against a preference-based reward model. The method significantly enhances LLMs’ reasoning and generalization on open-ended tasks, allowing even 8B models to surpass GPT-4o in chat and creative writing. (Source: arXiv)

CHURRO Historical Text Recognition Model: CHURRO is a 3B-parameter open-source Vision-Language Model (VLM) specifically designed for high-accuracy, low-cost historical text recognition. Trained on the CHURRO-DS dataset, comprising 99,491 pages of historical documents spanning 22 centuries and 46 languages, its performance surpasses existing VLMs like Gemini 2.5 Pro, significantly enhancing the efficiency of cultural heritage research and preservation. (Source: HuggingFace Daily Papers)

Altman Predicts AI Superintelligence and Pulse Feature: Sam Altman predicts that AI will fully surpass human intelligence by 2030, emphasizing the astonishing speed of AI development. OpenAI’s launch of ChatGPT’s proactive Pulse feature marks a shift from passive responses to thinking ahead on the user’s behalf: it surfaces relevant information based on prior conversations, offering highly personalized service and hinting at AI becoming an outsourced subconscious for its users. (Source: 36氪)

Jensen Huang Refutes AI Bubble Theory and NVIDIA Strategy: In an exclusive interview, Jensen Huang rejected the “AI bubble empire” theory, emphasizing AI’s critical role in the economy and predicting NVIDIA could become the first $10 trillion company. He highlighted the immense computing demand behind AI inference, noting that NVIDIA ships a new architecture every year through extreme co-design while keeping its system ecosystem open. Unfazed by the trend toward in-house chips, NVIDIA aims to shape the AI economic system and promote “sovereign AI” as a new consensus. (Source: 36氪)

DeepSeek Open-Sources V3.2-Exp and DSA Mechanism: DeepSeek has open-sourced V3.2-Exp, a 685B-parameter experimental model, alongside a paper detailing its new sparse attention mechanism, DeepSeek Sparse Attention (DSA). DSA explores and validates optimizations for training and inference efficiency in long-context scenarios, significantly improving long-context processing efficiency while maintaining output quality. (Source: 36氪, HuggingFace)
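
The full kernel design in the paper is more involved, but the core pattern DSA exemplifies, a cheap indexer scoring past tokens so that exact attention runs only over a selected top-k subset, can be sketched briefly. The PyTorch toy below illustrates that general pattern only; it is not DeepSeek’s implementation, and all names and shapes are ours.

```python
# Toy top-k sparse attention: a lightweight "indexer" projection scores past
# tokens per query, and exact attention is computed only over the top-k picks.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, idx_q, idx_k, top_k=64):
    # q, k, v: (seq, d); idx_q, idx_k: (seq, d_idx) cheap indexer projections.
    scores = idx_q @ idx_k.T                              # (seq, seq) index scores
    causal = torch.tril(torch.ones_like(scores)).bool()
    scores = scores.masked_fill(~causal, float("-inf"))
    top = scores.topk(min(top_k, scores.size(-1)), dim=-1).indices

    out, scale = torch.empty_like(v), q.size(-1) ** 0.5
    for i in range(q.size(0)):                            # per-query gather; clarity over speed
        sel = top[i][top[i] <= i]                         # drop any masked picks
        att = F.softmax(q[i] @ k[sel].T / scale, dim=-1)
        out[i] = att @ v[sel]
    return out

seq, d, d_idx = 16, 32, 8
q, k, v = (torch.randn(seq, d) for _ in range(3))
iq, ik = torch.randn(seq, d_idx), torch.randn(seq, d_idx)
print(topk_sparse_attention(q, k, v, iq, ik, top_k=4).shape)  # torch.Size([16, 32])
```

The payoff is that the expensive exact attention touches only top_k keys per query instead of the whole prefix, which is where the long-context savings come from.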

GLM-4.6 Imminent Release: Zhipu AI’s GLM-4.6 model is expected to be released soon. The Z.ai official website already labels GLM-4.5 the “previous generation flagship model,” hinting that the new version may improve context length among other things, drawing community attention and anticipation. (Source: Reddit r/LocalLLaMA, karminski3)

Apple’s AI Strategy and Internal Chatbot Veritas: Apple’s internal AI chatbot, codenamed “Veritas,” has been revealed. It serves as a sparring partner for Siri and can perform in-app actions. Even so, Apple maintains it will not launch a consumer-facing chatbot, focusing instead on system-level AI integration and planning to deepen third-party model integration through an AI answer engine and a universal MCP interface. (Source: 36氪)

AI PC Market Growth and Technical Bottlenecks: The AI PC market is projected to grow strongly in 2025-2026, driven primarily by the end of Windows 10 support and PC replacement cycles rather than by disruptive AI technology. Current AI features mostly supplement traditional PCs and face challenges such as insufficient local computing power, passive interaction, and closed ecosystems. True AI devices require a “local computing first, cloud as supplement” approach and proactive sensing capabilities. (Source: 36氪)

AI Floods Power Trading Market: AI is being widely applied in the power trading market. Companies like Qingpeng Smart use time-series large models to predict wind and solar power generation and electricity demand, assisting trading decisions. AI’s advantage in processing massive data is expected to amplify profits, but it may also lead to losses due to immature models and market complexity, with the industry still in an exploratory phase. (Source: 36氪)

Alibaba Cloud’s Tongyi Large Model Update and Full-Stack AI Services: At the Apsara Conference, Alibaba Cloud significantly upgraded its full-stack AI system, releasing six new models including Qwen3-MAX and Qwen3-Omni, positioning itself as a “full-stack AI service provider.” Alibaba Cloud is committed to building an “Android for the AI era” and “the next-generation computer,” offering full-stack AI cloud services from foundational models to infrastructure, to address the evolution of AI Agents from “intelligent emergence” to “autonomous action.” (Source: 36氪)

NVIDIA Blackwell Architecture Deep Dive: A deep dive event into NVIDIA’s Blackwell architecture will explore its architecture, optimizations, and implementation in GPU clouds. Hosted by SemiAnalysis and NVIDIA experts, the event aims to reveal how Blackwell GPUs, as “the GPU for the next decade,” will drive AI computing power development and the future of GPU clouds. (Source: TheTuringPost)

🧰 Tools

Factory AI’s Agentic Harnesses: Factory AI has developed world-class Agentic Harnesses that significantly boost the performance of existing models, especially in coding tasks, described by users as a “cheat code.” Their Droids agent ranks first on Terminal-Bench and achieves reliable code refactoring through multi-agent verification workflows. (Source: Vtrivedy10, matanSF, matanSF)

RAGLight Open-Source RAG Library: LangChainAI released RAGLight, a lightweight Python library for building production-grade RAG systems. Featuring LangGraph-powered agent pipelines, multi-provider LLM support, built-in GitHub integration, and CLI tools, the library aims to simplify the development and deployment of RAG systems. (Source: LangChainAI, hwchase17)

ArgosOS Semantic Operating System: ArgosOS is a desktop application that enables intelligent document search and content integration through a tag-based architecture rather than a vector database. It uses LLMs to generate relevant tags, stores them in a SQLite database, and processes queries intelligently, for example analyzing shopping bills, providing an accurate and efficient document management solution for small-scale applications. (Source: Reddit r/MachineLearning)
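
As a concrete picture of the tag-based approach described above, here is a minimal sketch: an LLM call (faked below with keyword spotting) assigns tags at ingest, tags live in SQLite, and search is a plain SQL join. This is our illustration of the architecture, not ArgosOS code.

```python
# Minimal tag-based document search: tags generated at ingest, stored in
# SQLite, queried with a join; no vector database involved.
import sqlite3

def llm_generate_tags(text: str) -> list[str]:
    # Stand-in for a real LLM tagging call; here, naive keyword spotting.
    vocab = {"receipt", "grocery", "invoice", "total"}
    return sorted({w.strip(".,:$").lower() for w in text.split()} & vocab)

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, path TEXT, body TEXT);
    CREATE TABLE tags (doc_id INTEGER REFERENCES docs(id), tag TEXT);
""")

def ingest(path: str, body: str) -> None:
    doc_id = con.execute("INSERT INTO docs (path, body) VALUES (?, ?)",
                         (path, body)).lastrowid
    con.executemany("INSERT INTO tags VALUES (?, ?)",
                    [(doc_id, t) for t in llm_generate_tags(body)])

def search(tag: str) -> list[str]:
    rows = con.execute("SELECT d.path FROM docs d "
                       "JOIN tags t ON t.doc_id = d.id WHERE t.tag = ?", (tag,))
    return [r[0] for r in rows]

ingest("bills/2024-05.txt", "Grocery receipt. Total: $84.12")
print(search("receipt"))  # ['bills/2024-05.txt']
```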

Ollama’s Web Search Tool: Ollama now offers a web search tool, allowing users to integrate web search into Minions workloads, enriching the context available to AI applications and improving their ability to handle complex tasks. (Source: ollama)
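
For reference, a hedged sketch of calling the hosted search endpoint from Python; the URL, auth header, and response shape follow Ollama’s announcement, but treat them as assumptions and confirm against the current docs.

```python
# Query Ollama's hosted web-search API and print result titles/URLs.
# Endpoint and payload shape per Ollama's announcement; verify before use.
import os
import requests

resp = requests.post(
    "https://ollama.com/api/web_search",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={"query": "DeepSeek sparse attention DSA"},
    timeout=30,
)
resp.raise_for_status()
for item in resp.json().get("results", []):
    print(item.get("title"), "-", item.get("url"))
```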

Hyperlink Local Multimodal RAG: Hyperlink offers local multimodal RAG capabilities, allowing users to search and summarize screenshot/photo libraries offline. Through OCR and embedding technologies, the tool converts unstructured image data into queryable content, enabling completely private, on-device document management and information extraction. (Source: Reddit r/LocalLLaMA)
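
The pipeline it describes, OCR then local embeddings then similarity search, is straightforward to sketch. The snippet below uses pytesseract and sentence-transformers as stand-ins; Hyperlink’s actual stack is not public here, so this illustrates the pattern, not the product.

```python
# Offline screenshot search: OCR each image, embed the extracted text locally,
# answer queries by cosine similarity. Everything runs on-device.
import numpy as np
from PIL import Image
import pytesseract                                      # local OCR
from sentence_transformers import SentenceTransformer  # local embeddings

model = SentenceTransformer("all-MiniLM-L6-v2")
index: list[tuple[str, np.ndarray]] = []                # (path, embedding)

def ingest(path: str) -> None:
    text = pytesseract.image_to_string(Image.open(path))
    index.append((path, model.encode(text, normalize_embeddings=True)))

def search(query: str, top_k: int = 3) -> list[str]:
    q = model.encode(query, normalize_embeddings=True)
    ranked = sorted(index, key=lambda item: -float(item[1] @ q))
    return [path for path, _ in ranked[:top_k]]
```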

Azure PostgreSQL LangChain Connector: Microsoft launched a native Azure PostgreSQL connector to unify agent persistence for the LangChain ecosystem. This connector provides enterprise-grade vector storage and state management, simplifying the complexity of building and deploying AI agents in the Azure environment. (Source: LangChainAI)

LLM API Standardization and MCP Protocol: The community is discussing the fragmentation of LLM APIs, pointing out incompatibilities in message structures, tool-calling patterns, and reasoning field names across providers, and calling for an industry-standard JSON API protocol. Concurrently, the rise of MCP (Model Context Protocol) has sparked discussion about its impact on agent development. (Source: AAAzzam, charles_irl)
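
The fragmentation is easiest to see in tool calling: the same assistant turn comes back in provider-specific shapes, so agent frameworks all carry thin normalizers like the (simplified) one below. Field names reflect the commonly documented OpenAI and Anthropic response formats, trimmed for brevity.

```python
# Normalize tool calls from two provider-native message shapes into one form.
import json

def extract_tool_calls(provider: str, message: dict) -> list[tuple[str, dict]]:
    """Return (tool_name, arguments) pairs from a provider-native message."""
    if provider == "openai":     # tool calls live in message["tool_calls"]
        return [(c["function"]["name"], json.loads(c["function"]["arguments"]))
                for c in message.get("tool_calls", [])]
    if provider == "anthropic":  # tool calls are "tool_use" content blocks
        return [(b["name"], b["input"])
                for b in message.get("content", []) if b.get("type") == "tool_use"]
    raise ValueError(f"unknown provider: {provider}")

openai_msg = {"tool_calls": [{"function": {"name": "get_weather",
                                           "arguments": '{"city": "Paris"}'}}]}
anthropic_msg = {"content": [{"type": "tool_use", "name": "get_weather",
                              "input": {"city": "Paris"}}]}
assert extract_tool_calls("openai", openai_msg) == \
       extract_tool_calls("anthropic", anthropic_msg)
```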

Grok Code’s Application on OpenRouter: Grok Code accounts for 57.6% of coding traffic on the OpenRouter platform, surpassing the combined total of all other AI code generators, with Grok Code Fast 1 ranking first. This demonstrates its strong market performance and user preference in the code generation domain. (Source: imjaredz)

📚 Learning

AI Fundamentals Course Cursor Learn: Lee Robinson launched Cursor Learn, a free six-part video series designed to help beginners grasp fundamental AI concepts such as tokens, context, and agents. The course, approximately one hour long, offers quizzes and AI model trials, serving as a convenient resource for learning AI basics. (Source: crystalsssup)

Free Book on Python Data Structures: Donald R. Sheehy released a free book titled “A First Course on Data Structures in Python,” covering data structures, algorithmic thinking, complexity analysis, recursion/dynamic programming, and search methods. It provides a solid foundation for learners in AI and machine learning. (Source: TheTuringPost)

dots.ocr Multilingual OCR Model: Xiaohongshu Hi Lab released dots.ocr, a powerful multilingual OCR model supporting 100 languages. It parses text, tables, formulas, and layout end-to-end (outputting Markdown) and is free for commercial use. The compact 1.7B-parameter VLM achieves SOTA performance on OmniDocBench and dots.ocr-bench. (Source: mervenoyann)

8 Types of Large Language Models Explained: Analytics Vidhya summarized 8 mainstream large language model types, including GPT (Generative Pre-trained Transformer), MoE (Mixture of Experts), LRM (Large Reasoning Model), VLM (Vision-Language Model), SLM (Small Language Model), LAM (Large Action Model), HLM (Hierarchical Language Model), and LCM (Large Concept Model), providing detailed interpretations of their architectures and applications. (Source: karminski3)

AI Weekly Report: Latest Paper Summary: DAIR.AI released this week’s AI paper selection (September 22-28), covering cutting-edge research such as ATOKEN, LLM-JEPA, Code World Model, Teaching LLMs to Plan, Agents Research Environments, Language Models that Think, Chat Better, and Embodied AI: From LLMs to World Models, providing the latest updates for AI researchers. (Source: dair_ai)

Advice for Young Researchers in the AI Era: Jascha Sohl-Dickstein shared practical advice for young researchers on choosing research projects and making career decisions in what he calls the final stage of the “Anthropocene.” He discussed the profound impact of AGI on academic careers and emphasized the need to rethink research directions and professional development as AI systems are poised to surpass human intelligence. (Source: mlpowered)

RAG Concepts and AI Agent Construction: Ronald van Loon shared the basic concepts of RAG (Retrieval-Augmented Generation) and its importance for LLMs, along with 8 key steps for building AI Agents. The content covers agent concepts, the agent stack, their advantages, and evaluation frameworks, offering AI developers guidance from theory to practice. (Source: Ronald_vanLoon)
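
Compressed into code, the retrieval loop at the heart of RAG looks like the sketch below; the bag-of-words similarity and the prompt template are deliberately minimal stand-ins for a real embedding model and LLM call.

```python
# Minimal RAG loop: score chunks against the query, pack the best ones into
# the prompt, and hand that prompt to the generator.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())     # toy bag-of-words "embedding"

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    q = embed(query)
    best = sorted(chunks, key=lambda c: -cosine(q, embed(c)))[:k]
    context = "\n".join(f"- {c}" for c in best)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

chunks = ["RAG retrieves documents before generation.",
          "MCP standardizes tool access for agents.",
          "LoRA fine-tunes models with low-rank adapters."]
print(build_prompt("What does RAG do?", chunks))
```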

Meta Addresses LLM Inference Inefficiency: Meta’s research reveals LLM inference inefficiency due to repetitive work in long chains of thought. They propose compressing repetitive steps into small, named behaviors that the model calls instead of re-deriving, thereby reducing token consumption and improving inference efficiency and accuracy, offering a new approach to optimizing LLM inference processes. (Source: ylecun)
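
The mechanism is easiest to see as prompt construction: recurring procedures are distilled once into a handbook, then referenced by name so the model spends tokens invoking a behavior rather than re-deriving it. The handbook entries below are illustrative, not drawn from Meta’s paper.

```python
# "Named behaviors": recall distilled procedures by name in the prompt
# instead of letting the model re-derive them in every chain of thought.
BEHAVIOR_HANDBOOK = {
    "behavior_quadratic_formula":
        "For ax^2 + bx + c = 0, the roots are (-b ± sqrt(b^2 - 4ac)) / (2a).",
    "behavior_check_units":
        "Before finalizing, verify both sides of an equation carry the same units.",
}

def prompt_with_behaviors(problem: str, names: list[str]) -> str:
    recalled = "\n".join(f"{n}: {BEHAVIOR_HANDBOOK[n]}" for n in names)
    return ("You may invoke these known behaviors by name instead of "
            f"re-deriving them:\n{recalled}\n\nProblem: {problem}")

print(prompt_with_behaviors("Solve x^2 - 5x + 6 = 0",
                            ["behavior_quadratic_formula"]))
```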

Veo-3 Visual Reasoning Capabilities Emerge: Lisan al Gaib points out that the Veo-3 video model exhibits emergent visual reasoning capabilities reminiscent of GPT-3’s emergent abilities in language, suggesting that native multimodal models, once their full potential is realized, will deliver more comprehensive visual understanding and reasoning. (Source: scaling01)

💼 Business

OpenAI’s Trillion-Dollar Bet and AI Infrastructure Bubble: OpenAI is spending aggressively to weave a giant network spanning chips, cloud computing, and data centers, including a $100 billion investment from NVIDIA and a $300 billion “Stargate” partnership with Oracle. With projected 2025 revenue of only $13 billion, OpenAI’s management nonetheless calls AI infrastructure investment a “once-in-a-century opportunity,” sparking debate over whether AI infrastructure is inflating a dot-com-style bubble. (Source: 36氪)

Musk Sues OpenAI for the Sixth Time: Elon Musk’s xAI has sued OpenAI for the sixth time, accusing it of systematically poaching employees and stealing trade secrets, including Grok source code and data-center strategy plans. The lawsuit marks an escalation in the rivalry between the two AI giants: Musk argues that OpenAI has abandoned its non-profit origins, while OpenAI denies the allegations, calling them “persistent harassment.” (Source: 36氪)

Top AI Scientist Steven Hoi Joins Alibaba Cloud’s Tongyi Lab: Steven Hoi (Xu Zhuhong), a globally renowned AI scientist and IEEE Fellow, has joined Alibaba Cloud’s Tongyi Lab to focus on foundational frontier research in multimodal large models. With over 20 years of experience spanning AI research, academia, and industry, Hoi previously served as Vice President at Salesforce and founded HyperGAI. His arrival signals Alibaba’s renewed heavy investment in multimodal large models to accelerate model iteration and multimodal breakthroughs. (Source: 36氪)

🌟 Community

ChatGPT 4o Performance Decline and User Sentiment: Numerous ChatGPT users report declining 4o performance, citing “dumbing down” and “safety routing” issues and describing frustration and a feeling of being deceived. Many neurodivergent users are particularly distressed, considering 4o a “lifeline” for communication and self-understanding. Users widely question OpenAI’s lack of transparency, oppose opaque censorship mechanisms, and call on the company to honor its promise to “treat adult users like adults.” (Source: Reddit r/ChatGPT)

AI Era Employment and Layoff Controversy: The community is actively discussing AI’s impact on the job market, including the sharp decline in entry-level positions, companies laying off staff while investing in AI, and whether “AI” is the real reason behind announced layoffs. Discussions highlight the trend of “people who understand AI replacing those who don’t” and call for companies to redesign entry-level jobs rather than simply eliminating them, in order to cultivate the scarce talent the AI era demands. (Source: 36氪, Reddit r/artificial)

Challenges and Barriers in LLM Research: The community is hotly debating the increasing barriers to machine learning research, where individual researchers struggle to compete with large tech giants. Facing massive amounts of papers, expensive computing power, and complex mathematical theories, many find it difficult to get started and achieve breakthroughs, raising concerns about the field’s sustainability. (Source: Reddit r/MachineLearning)

Impact of MoE Models on Local Hosting: The community is debating the pros and cons of MoE models for local LLM hosting. The prevailing view is that while MoE models have a larger total memory footprint, they are computationally cheap per token because only a few experts are active, so CPU offloading lets machines with ample RAM but limited GPU run much larger models, an effective way to improve local LLM performance. (Source: Reddit r/LocalLLaMA)
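
Back-of-the-envelope math shows why, as sketched below: compute per token scales with active parameters while memory scales with total parameters, so the cold experts can live in system RAM. All numbers are illustrative assumptions, not benchmarks of any particular model.

```python
# Why MoE + CPU offload suits RAM-rich, GPU-poor machines: only the active
# experts are touched per token, so the hot set is a small slice of the model.
total_params  = 120e9   # all experts combined (illustrative)
active_params = 5e9     # parameters actually used per token (illustrative)
bytes_per_w   = 0.5     # ~4-bit quantization

total_gb  = total_params  * bytes_per_w / 1e9   # must fit in RAM + VRAM combined
active_gb = active_params * bytes_per_w / 1e9   # working set per token

print(f"weights to hold:   {total_gb:.0f} GB")   # 60 GB
print(f"touched per token: {active_gb:.1f} GB")  # 2.5 GB
```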

Rapid Development and Application of AI Agents: The community notes how quickly AI Agents have progressed: in under a year they have gone from “almost unusable” to “performing well” in narrow scenarios, with general-purpose agents now starting to be useful, faster progress than expected. However, some argue that current coding agents are highly homogenized, with little meaningful differentiation. (Source: nptacek, HamelHusain)

RL Research Trends and GRPO Controversy: The community is deeply discussing the latest trends in Reinforcement Learning (RL) research, particularly the status and controversy surrounding the GRPO algorithm. Some argue that RL research is shifting towards pre-training/modeling, and GRPO is an important open-source advancement, while OpenAI employees believe it significantly lags behind cutting-edge technology, sparking intense debate about algorithmic innovation versus practical performance. (Source: natolambert, MillionInt, cloneofsimo, jsuarez5341, TheTuringPost)

OpenAI’s Energy Consumption and AI Infrastructure: The community discusses OpenAI’s massive future energy demands, projected to exceed that of the UK or Germany within five years, and India within eight years, raising concerns about the scale of AI infrastructure construction, energy supply, and environmental impact. Concurrently, Google’s data center site selection has also faced opposition from local residents due to water consumption issues. (Source: teortaxesTex, brickroad7)

Sutton’s Bitter Lesson and AI Development: The community discusses Richard Sutton’s “Bitter Lesson” and its implications for AI research, emphasizing that general computational methods are superior to human prior knowledge. The discussion revolves around the relationship between “imitation and world models,” suggesting that pure imitation can lead to “cargo cults” and that imitation without real experience has fundamental limitations. (Source: rao2z, jonst0kes)

💡 Other

BionicWheelBot Biomimetic Robot: The BionicWheelBot robot achieves versatile navigation on complex terrains by mimicking the rolling motion of the wheel spider. This innovation demonstrates the potential of biomimetics in robot design, offering new solutions for future robots to cope with varied environments. (Source: Ronald_vanLoon)

PC Storage Optimization and RAID Configuration: A user shared how to reach data throughput of up to 47 GB/s by striping multiple M.2 NVMe drives across PCIe lanes in RAID0 and RAID10 configurations to accelerate loading of large models. The setup meets high-speed read/write demands while balancing capacity against redundancy, providing an efficient hardware foundation for local AI model deployment. (Source: TheZachMueller)
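
The headline number is plausible from simple striping arithmetic: RAID0 read throughput adds roughly linearly across member drives. The figures below are assumptions for illustration, not the user’s actual part list.

```python
# RAID0 striping: aggregate sequential read scales ~linearly with drive count.
per_drive_gbs = 7.0    # typical PCIe 4.0 x4 NVMe sequential read, GB/s (assumed)
drive_count   = 7      # striped drives (assumed)

aggregate = per_drive_gbs * drive_count
print(f"theoretical sequential read: {aggregate:.0f} GB/s")  # ~49 GB/s vs. reported 47
```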

Liangzhu “Digital Habitat Bay AI+ Industry Community” Opens: Hangzhou Liangzhu “Digital Habitat Bay AI+ Industry Community” officially opened, focusing on cutting-edge fields such as artificial intelligence, digital nomad economy, and cultural creativity. Through the “Digital Habitat Eight Policies” special policies and “Four Spaces” layout, the community provides full-cycle support for AI explorers, from creative incubation to ecological leadership, aiming to build an innovative ecosystem where technology and humanities are deeply integrated. (Source: 36氪)
