AI Daily - 2025-05-10(Morning)

Keywords：Transformer, Noam Shazeer, ChatGPT, Gemini, DeepSeek R1, AI Technology, Large Language Model (LLM), Mixture of Experts (MoE), Multi-Query Attention (MQA), Gated Linear Unit (GLU), Absolute Zero Reinforcement Learning Paradigm, Seed-Coder-8B Code Model

🔥 Focus

Noam Shazeer: The Mastermind Behind Transformer and the Evolution of AI Technology: Noam Shazeer, one of the eight authors of the Transformer architecture, is widely recognized as the largest contributor. His research not only laid the foundation for modern large language models (such as “Attention Is All You Need”) but also prophetically drove the development of key technologies like Mixture of Experts (MoE), the Adafactor optimizer, Multi-Query Attention (MQA), and Gated Linear Units (GLU). Recently, his early research achievements have once again garnered attention, highlighting his advanced technological foresight. Shazeer co-founded Character.AI and later returned to Google to lead the Gemini project, continuing to influence the AI field. (Source: 36Kr)

A legendary figure "always" at the forefront of large model technology

ChatGPT Traffic Soars, Challenging Google Search’s Dominance: Similarweb data shows that in April 2025, ChatGPT’s monthly visits grew by 13.04%, exceeding 5 billion and surpassing X (formerly Twitter) to become the fifth largest website globally. It was also the only platform among the top ten websites to achieve positive monthly growth. This trend indicates that AI applications, represented by ChatGPT, are significantly changing how users access information, posing a substantial threat to traditional search engines, especially in work and study scenarios where user reliance on AI tools is increasingly growing. (Source: 36Kr, Similarweb on X)

Google Search, the sky is falling! ChatGPT unscrupulously steals traffic

DeepSeek R1’s 100-Day Sensation: Reshaping the AI Venture Capital Landscape and Startup Ecosystem: Since its release in January 2025, DeepSeek R1 has garnered widespread attention in the AI field with its low-cost open-source strategy, profoundly impacting the venture capital market and startup ecosystem. The model has not only brought new development opportunities for startups in AI hardware and Agent development but has also prompted leading players like Moonshot AI’s Kimi and Zhipu AI to adjust their market strategies, intensifying competition in AI application and commercialization. Investor interest in AI applications and embodied intelligence has increased, while investment in foundational large models has become more cautious, indicating a market shift towards downstream applications. (Source: 36Kr)

Entrepreneurs want to pay homage to Liang Wenfeng

Gemini 2.5 Pro Shows Significant Progress in Video Understanding: Google’s Gemini 2.5 Pro has demonstrated exceptional video understanding capabilities, not only leading in traditional video analysis tasks but also unlocking new application scenarios. Its video understanding ability surpasses existing SOTA models and even human performance on multiple test sets. Jeff Dean noted that the new 66 tokens per frame mode (replacing 258 tokens) allows processing of over 6 hours of video (at 1fps) within a 2M token context, greatly expanding the potential for long video analysis. (Source: matvelloso, op7418, JeffDean)

Paper “Absolute Zero”: Enhancing LLM Reasoning via Reinforced Self-Play Without External Data: A paper titled “Absolute Zero: Reinforced Self-play Reasoning with Zero Data” introduces a new reinforcement learning paradigm, “Absolute Zero,” aimed at enhancing the reasoning abilities of Large Language Models (LLMs) by having a single model self-propose tasks and solve them, without relying on any external data. The system, AZR, verifies tasks and answers through a code executor, achieving open-loop learning and SOTA performance on coding and mathematical reasoning tasks, demonstrating the potential for AI autonomous evolution. (Source: Reddit r/LocalLLaMA, teortaxesTex)

🎯 Trends

Llama.cpp Server Adds Vision Model Support, Expanding Local Multimodal Applications: The built-in llama-server in Llama.cpp now supports vision models, allowing users to start using gguf-quantized multimodal models. This significant update, contributed by Xuan-Son Nguyen (ngxson) and others, makes it more convenient to run and interact with multimodal AI applications on local devices, which is important for edge computing and privacy-preserving scenarios. (Source: karminski3, reach_vb, ggerganov, Reddit r/LocalLLaMA)

Google May Release New Image/Video Models Veo 3.0 and Imagen 4.0 at I/O Conference: Reports suggest Google plans to release new image and video generation models at its May I/O conference, including veo-3.0-generate-preview, imagen-4.0-generate-preview-05-20, and imagen-4.0-ultra-generate-exp-05-20. This indicates significant updates from Google in multimodal generation, with Veo 3.0’s performance being particularly anticipated. (Source: op7418)

Flow-GRPO: Improving Image Generation in Flow Matching Models with Online Reinforcement Learning: Flow-GRPO is a newly proposed method that, for the first time, integrates online reinforcement learning (RL) into flow matching models. Experiments show that SD3.5, fine-tuned with RL, achieves near-perfect accuracy in object count, spatial relationships, and fine-grained attributes when generating images, significantly enhancing prompt adherence and generation quality in text-to-image tasks. (Source: teortaxesTex)

ByteDance Open-Sources Seed-Coder-8B: Code Model Achieves SOTA with Self-Data Curation: ByteDance’s Seed team has released the Seed-Coder-8B series of large code models, including Base, Instruct, and Reasoner versions. Trained on 6T tokens of data, its core innovation lies in “letting the code model curate data for itself,” achieving a SOTA data processing method and outperforming Qwen3-8B. This demonstrates the significant potential of automated data management in enhancing code LLM capabilities. (Source: Dorialexander, scaling01)

Google AI Launches Mobility AI to Advance Urban Transportation Intelligence: Google AI has announced the Mobility AI project, dedicated to leveraging artificial intelligence to improve urban transportation systems. The project may cover various aspects such as traffic flow optimization, public transit scheduling, and autonomous driving coordination, aiming to enhance traffic efficiency, safety, and sustainability. (Source: Ronald_vanLoon)

Progress in Research on Single Transistor Simulating a Neuron: A paper in Nature indicates that a single transistor can simulate the function of a neuron. While this doesn’t mean PCs will run superhuman intelligence in the short term (as synapses also require transistors), this research opens new avenues for future processor design and neuromorphic computing, potentially having a profound impact on AI hardware in the coming years. (Source: Reddit r/LocalLLaMA)

MIT Research Uses AI to Enhance Air Traffic Planning: Researchers at MIT are utilizing artificial intelligence to improve the planning and management of air traffic. This may include optimizing flight routes, increasing airspace utilization efficiency, and predicting and responding to potential conflicts, aiming to make air travel more efficient and safer. (Source: Ronald_vanLoon)

AI Trends in Software Development (2025 Outlook): A report predicts the top 15 trends in software development for 2025, where artificial intelligence, deep learning, and machine learning will continue to play central roles, driving advancements in automation, intelligent coding, testing, and operations. (Source: Ronald_vanLoon)

Outlook for AI-Powered 6G Networks: Discussion on the critical role of artificial intelligence in future 6G networks, including intelligent resource allocation, network self-optimization, personalized services, and support for massive IoT device connectivity. AI will be a core technology for realizing the 6G vision. (Source: Ronald_vanLoon)

DeepMind Researcher Believes LLMs Already Possess Partial World Model Capabilities: DeepMind researcher Sam Wolfstone argues that Large Language Models (LLMs) construct many limited and local world models during their pre-training and post-training phases. A model’s ability to solve tasks is related to how well its partial world models represent the task, but current LLMs cannot dynamically develop new partial world models. (Source: SamWolfstone)

OpenAI Aims to Expand Applications of Reinforcement Learning (RL): Dan Roberts of OpenAI, speaking at Sequoia AI Ascent, shared how the company is working to change the traditional view of reinforcement learning (RL) as merely “icing on the cake” and is committed to expanding its application to a broader range of scenarios. (Source: jeffreygwang)

ByteDance Deep Research Agent Uses Typescript Interfaces to Define JSON Output Schemas: Analysis of ByteDance’s open-source Deep Research Agent reveals that the project uses Typescript interfaces to enforce and standardize JSON output schemas. This approach helps improve the stability and reliability of data exchange in multi-Agent collaboration. (Source: _philschmid)

🧰 Tools

WebOllama: A Sleek Web UI for Ollama: WebOllama is a web interface designed for Ollama, aiming to simplify the management and use of local Large Language Models (LLMs). It provides an intuitive UI to manage Ollama models, chat with AI, and generate text, making it convenient for users to interact with LLMs in a local environment. (Source: Reddit r/LocalLLaMA, GitHub)

ArchAI: AI-Powered Codebase Analysis and Documentation Tool Based on CrewAI and Qdrant: ArchAI is a tool that utilizes AI Agents to interpret codebases. It can automatically clone, analyze code, and generate documentation and PlantUML diagrams. ArchAI builds AI Agents based on CrewAI, uses Qdrant for context storage, and integrates SonarQube for code quality checks, supporting local or cloud-based LLMs (like OpenAI, Gemini, Ollama). (Source: qdrant_engine, GitHub)

SkyRL: Reinforcement Learning Training Pipeline Optimized for Long-Horizon Tasks Released: The UC Berkeley RISE team has released SkyRL, a reinforcement learning (RL) training pipeline built on VeRL and OpenHands, specifically optimized for long-horizon tasks such as SWE-Bench. SkyRL introduces an Agent layer that supports efficient multi-turn inference, tool use, and scalable environment execution, and integrates W&B for visualization. (Source: weights_biases)

RunwayML Gen-1 Updated with More Intuitive Video Generation Controls: RunwayML’s Gen-1 video generation tool has released an update aimed at providing more precise, intuitive, and versatile controls. Users can try these new features for free, with more updates planned for the future. (Source: c_valenzuelab)

Chatlog: WeChat Chat History Export Tool: Chatlog is a project that supports exporting WeChat chat records, including images, videos, and audio, and supports multi-account operations. This provides convenience for users to back up personal data or use chat data for AI applications like building digital humans. (Source: karminski3)

Local AI Radio Station Project ACE-Step-RADIO Released: PasiKoodaa has released the ACE-Step-RADIO project on GitHub, a local AI radio station application using the ACE (Agentic Communication Environment) framework. It can theoretically run seamlessly on 24GB VRAM and easily integrate AI anchor functions like DIA, offering new ideas for personalized content generation. (Source: Reddit r/LocalLLaMA, GitHub)

qxresearch-event-1: Python Mini-App Collection: The GitHub project qxresearch-event-1 features over 50 applications written in just 10 lines of Python code each, covering functions like notifications, voice recording, drawing boards, password generators, and more, providing simple and practical code examples for Python beginners and enthusiasts. (Source: karminski3)

Polish 4B Language Model Polanka Released: Piotr-AI has released Polanka (polanka_4b_v0.1_qwen3_gguf), a 4B parameter Polish language model based on the Qwen3 architecture. The model was created by continuously pre-training the Qwen3 4B base model on a single RTX 4090 for about 10 days, using high-quality Polish content mixed with multilingual, math, and code datasets, totaling approximately 1.4B tokens. Its GGUF format allows it to run quickly on laptops. (Source: Reddit r/LocalLLaMA)

Arlo Security Cameras Add AI Video Summary Feature: Arlo has added a new AI feature to its security camera system that automatically summarizes video content recorded by the cameras, helping users quickly understand key events and improving the convenience and efficiency of home security. (Source: Reddit r/artificial)

Gemini 2.0 Flash Preview Adds Image Generation and Editing Features: Google’s latest Gemini 2.0 Flash Preview model supports image generation and editing. Users can edit images in multi-turn conversations, and the documentation has been updated to showcase these new model capabilities. (Source: _philschmid)

📚 Learning

Andrew Ng Deep Learning Notes Compilation Project: A GitHub project (Andrew-NG-Notes) compiling notes for Andrew Ng’s deep learning courses has emerged, suitable for students wishing to start and systematically learn deep learning in conjunction with Coursera courses. It has already gained significant attention. (Source: karminski3)

Microsoft Releases Generative AI for Beginners Tutorial: Microsoft has launched the “generative-ai-for-beginners” tutorial, aimed at helping beginners understand the fundamental principles of large language models and guiding them to build Agent/RAG platforms programmatically. The GitHub repository has garnered over 82k stars, indicating its popularity. (Source: karminski3)

Free Math Textbook: “Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning”: A free e-book by Jean Gallier and Jocelyn Quaintance comprehensively covers key mathematical foundations for computer science and machine learning, including linear algebra, affine and projective geometry, geometry of bilinear forms, topology and calculus, linear and nonlinear optimization, with examples of machine learning applications. (Source: TheTuringPost)

Teaching Suggestions for AI General Education Courses in Vocational Colleges: For AI general education courses in vocational colleges conducted entirely in computer labs, it is suggested that the curriculum focus on the application of generative AI, particularly text and image/video generation. By setting up a series of tasks from basic (Q&A, summarization, translation), intermediate (writing, data extraction, AI search/RAG) to advanced (AI-assisted programming, data analysis), students can learn through practice, cultivate interest, and independently supplement theoretical knowledge. (Source: dotey)

💼 Business

VCpedia: AI-Driven Startup Intelligence Platform: Yohei Nakajima has launched VCpedia, a daily briefing service that uses AI to analyze discussions about startup funding on X, enhances information with OpenAI and ExaAI, and is built with Replit Agents. The platform aims to provide venture capital with AI-driven deal sourcing and insights. (Source: yoheinakajima)

Rumors Suggest OpenAI May Adjust ChatGPT API Pricing Strategy: There are reports that ChatGPT might adjust its API pricing, introducing a credit-based billing model (e.g., 50 credits/USD, minimum $20, maximum $1000). This potential change has raised concerns among users, with some stating that if Plus and Pro users also have to pay API fees at this rate, they might consider switching to competitors like Grok or Gemini. (Source: scaling01)

China’s Baidu Applies for Patent to Interpret Animal Sounds with AI: Chinese tech giant Baidu is seeking a patent for an AI system to interpret animal sounds. If successful, this technology could open new possibilities in animal behavior research, species conservation, and human-animal communication. (Source: Reddit r/artificial)

🌟 Community

User Discusses AI’s Impact on Interpersonal Relationships and Mental Health: A Reddit post titled “I lost my mom to ChatGPT” sparked heated discussion. The poster claimed their mother became addicted to communicating with ChatGPT, leading to estranged family relationships and even emotional dependence on AI. The comments section explored issues like AI fulfilling emotional needs, real-world loneliness, technological alienation, and how to balance technology use with human interaction. Many comments pointed out that the mother might have been lonely to begin with, and AI merely filled an emotional void, advising the poster to communicate more with and accompany their mother. (Source: Reddit r/ChatGPT)

New Pope’s Choice of “Leo XIV” Name Possibly Inspired by AI Development: Reports and discussions indicate that the newly elected Pope chose “Leo XIV” as his pontifical name partly due to deep concern about cultural changes brought by artificial intelligence, robotics, and other technologies. Inspired by Leo XIII’s encyclical “Rerum Novarum” during the Industrial Revolution, he believes the Church should exert moral authority and academic strength in the current technological revolution to guide society in seriously addressing these changes. This topic has sparked reflection on AI ethics, societal impact, and how religious institutions adapt to technological advancements. (Source: jpt401, AndrewLampinen, jachiam0, itsclivetime)

AI-Generated “Ideal Woman” Images Spark Discussion: A Reddit user shared images of their “ideal woman” generated by ChatGPT based on its understanding of them, with results often depicting women in armor. This prompted community members to follow suit and share their own AI-generated results, discussing AI’s understanding of the “ideal” concept, how user data influences generated content, and common biases or patterns in AI-generated images. (Source: Reddit r/ChatGPT)

Creative AI Image Generation: “Figurine and Real Person in the Same Frame”: A social media user shared AI-generated images placing anime figurines and their corresponding real-life human counterparts in similar poses, providing the prompts used. This creative use showcases AI’s fun and customizable aspects in image generation, capable of creating visually engaging works with a sense of daily life and contrast based on specific user descriptions. (Source: dotey)

Increasing Demand for DSPy Framework Experience in AI/ML Recruitment: The job market shows growing demand for talent with experience in DSPy (a framework for programmatically optimizing language model prompts and weights). This reflects the industry’s emphasis on building more controllable, efficient language model applications capable of algorithmic optimization. (Source: lateinteraction)

Discussion on AI Application and Acceptance in the Workplace: Reddit users discussed their use of AI at work and the views of employers and colleagues. Most users reported that AI effectively improves work efficiency, such as assisting with programming, writing emails and reports, taking meeting minutes, and market research. Some companies encourage AI use, while others are cautious or opposed, leading employees to potentially use it discreetly. The discussion highlighted AI’s potential in boosting productivity while also touching upon correct understanding of AI capabilities and data security issues. (Source: Reddit r/ArtificialInteligence)

Is AI Eroding Reddit’s Core Competency—Human Interaction?: A Business Insider article notes that Reddit’s CEO believes its human-led community is its biggest competitive advantage, but AI bot-generated posts and comments are threatening this edge. Reddit has acknowledged the problem and plans to introduce new mechanisms to verify user identity, sparking discussions about AI content proliferation, community authenticity, and how online platforms will cope with AI-generated content in the future. (Source: Reddit r/artificial, Business Insider)

ManaBench: A New Benchmark for Testing LLM Reasoning via Magic: The Gathering Deck Building: Jake Boggs released ManaBench, a new benchmark that tests the reasoning abilities of Large Language Models (LLMs) through Magic: The Gathering deck-building tasks. The benchmark does not focus on game knowledge but evaluates the model’s strategic reasoning and system understanding capabilities, aiming to provide model differentiation relevant to user experience. (Source: Teknium1)

User Shares Experience of Using AI for Deep Research and Listening to it as Speech: A user shared their experience of using ChatGPT for in-depth research on a topic and then using tools like Speechify to convert the research results into audio in Obama’s voice for listening. This practice reflects AI’s potential in information acquisition and personalized content consumption but also raises concerns about potential decline in reading ability due to over-reliance on AI. (Source: Reddit r/artificial)

💡 Others

Former UK Government AI Risk Team Member Exposes Ethical Issues and Subsequent Repercussions: A former employee of the UK government’s central AI risk function publicly stated that after raising concerns within the team about ethical issues such as AI bias and discrimination, they faced stonewalling, surveillance, and institutional retaliation. The incident has sparked discussion about whistleblower protection in government technology environments and the effectiveness of public accountability mechanisms for AI ethics. (Source: Reddit r/ArtificialInteligence)

Indirect Impact of AI on “AI-Proof” Jobs: Discussion points out that even if certain skilled trades (like mechanics) are not easily replaced directly by AI, if AI leads to mass unemployment and a shrinking consumer base, these “AI-proof” jobs will also be impacted by insufficient demand. This reminds us to view AI’s impact on employment from a broader economic system perspective. (Source: Reddit r/artificial)

Viewpoint: LLMs Exploit Human Cognitive Biases by Simulating Intelligence: Pedro Domingos argues that Large Language Models (LLMs) excel at generating text that appears intelligent, which exploits the cognitive weakness of some people who have difficulty distinguishing genuine intelligence from “bullshit” (BS). (Source: pmddomingos)

🔥 Focus

🎯 Trends

🧰 Tools

📚 Learning

💼 Business

🌟 Community

💡 Others

Related Tags

Related Posts

AI Daily – 2025-08-13(Evening)

AI Daily – 2025-08-12(Evening)

AI Daily – 2025-08-12(Morning)