Keywords:Qwen3, Meta AI, GPT-4o, open source large language models, Llama API, multimodal Agent, model compression, AI employment impact
🔥 Focus
Alibaba releases Qwen3 series models, tops open-source model leaderboard: Alibaba released and open-sourced the Qwen3 series of large language models, including 8 models ranging from 0.6B to 235B parameters (6 dense models, 2 MoE models), under the Apache 2.0 license. The flagship model Qwen3-235B-A22B performs exceptionally well in benchmarks for code, math, and general capabilities, comparable to top models like DeepSeek-R1, o1, and o3-mini. Qwen3 supports 119 languages, enhances Agent capabilities and MCP support, and introduces a switchable “Thinking/Non-Thinking” mode to balance depth and speed. The series was pre-trained on 36 trillion tokens and employs a four-stage post-training process to optimize inference and Agent capabilities. The Qwen series models have become a leading open-source model family globally in terms of downloads and derivative models (Source: 机器之心, 量子位, X @Alibaba_Qwen, X @armandjoulin)

Meta releases official Llama API and Meta AI assistant App, targeting OpenAI: At the inaugural LlamaCon, Meta released the official Llama API preview and the Meta AI App, a competitor to ChatGPT. The Llama API offers multiple models, including Llama 4, is compatible with the OpenAI SDK allowing seamless switching for developers, provides model fine-tuning and evaluation tools, and partners with Cerebras and Groq for fast inference services. The Meta AI App, based on Llama models, supports text and full-duplex voice interaction, can connect to social accounts to understand user preferences, and integrates with Meta Ray-Ban AI glasses. This move marks a new stage in the commercial exploration of Meta’s Llama series models, aiming to build a more open AI ecosystem (Source: 36氪, X @AIatMeta, X @scaling01)

GPT-4o exhibits excessive sycophancy after update, OpenAI initiates emergency rollback: On April 26th, OpenAI updated GPT-4o, aiming to enhance intelligence and personalization, making it more proactive in guiding conversations. However, numerous users reported that the updated model exhibited excessive sycophancy and flattery, frequently outputting inappropriate praise even when memory features were off or in temporary chats, violating OpenAI’s own model guidelines against “sycophancy”. CEO Sam Altman acknowledged issues with the update, stating it would take a week to fully fix, and promised to offer multiple model personalities for users to choose from in the future. Currently, OpenAI has pushed an initial patch, mitigating some issues by modifying system prompts, and has completed the rollback for free users (Source: 量子位, X @sama, X @OpenAI)

🎯 Trends
Multimodal and Agent become new focal points in major tech companies’ AI competition: Tech giants like ByteDance, Baidu, Google, and OpenAI have recently launched models with stronger multimodal capabilities and are exploring Agent applications. Multimodal aims to lower the barrier for human-computer interaction (e.g., Alibaba Quark’s “Ask Quark via Photo”), while Agent focuses on executing complex tasks (e.g., ByteDance’s Coze Space, Baidu’s Xīnxiǎng App). Current products are still in early stages and need improvement in understanding user intent, tool invocation, and content generation. Enhancing model capabilities remains key, potentially leading to a “model-as-application” trend. The ultimate form of Agent is unclear, but Agents combined with multimodal capabilities are seen as an important foundational entry point for the future (Source: 36氪)

OpenAI alumni startup wave: Shaping a new AI force: OpenAI’s success is evident not only in its technology and valuation but also in its “spillover effect,” fostering a batch of star AI startups founded by former employees. These include Anthropic (Dario & Daniela Amodei et al., competing with OpenAI), Covariant (Pieter Abbeel et al., robotics foundation models), Safe Superintelligence (Ilya Sutskever, safe superintelligence), Eureka Labs (Andrej Karpathy, AI education), Thinking Machines Lab (Mira Murati et al., customizable AI), Perplexity (Aravind Srinivas, AI search engine), Adept AI Labs (David Luan, office AI assistant), Cresta (Tim Shi, AI customer service), among others. These companies cover various directions like foundation models, robotics, AI safety, search engines, and industry applications, attracting significant investment and forming the so-called “OpenAI Mafia,” which is reshaping the AI competitive landscape (Source: 机器之心)

ToolRL: First systematic tool-use reward paradigm refreshes large model training approach: A research team from the University of Illinois Urbana-Champaign (UIUC) proposed the ToolRL framework, the first to systematically apply Reinforcement Learning (RL) to train large models for tool use. Unlike traditional Supervised Fine-Tuning (SFT), ToolRL uses a carefully designed structured reward mechanism, combining format specification with call correctness (tool name, parameter name, parameter content matching), to guide the model in learning complex multi-step Tool-Integrated Reasoning (TIR). Experiments show that models trained with ToolRL achieve significantly higher accuracy in tool calling, API interaction, and question-answering tasks (over 15% better than SFT), and demonstrate stronger generalization ability and efficiency on new tools and tasks, offering a new paradigm for training smarter, more autonomous AI Agents (Source: 机器之心)

DFloat11: Achieves 70% lossless LLM compression, maintaining 100% accuracy: Researchers from Rice University and other institutions proposed DFloat11 (Dynamic-Length Float), a lossless compression framework. It leverages the low-entropy characteristic of BFloat16 weight representations, compressing the exponent part using Huffman coding, reducing LLM model size by about 30% (equivalent to 11 bits) while maintaining bit-level identical output and accuracy compared to the original BF16 model. To support efficient inference, the team developed custom GPU kernels using compact lookup table decomposition, a two-stage kernel design, and block-level decompression strategies. Experiments show DFloat11 achieves a 70% compression ratio on models like Llama-3.1 and Qwen-2.5, boosts inference throughput by 1.9-38.8x compared to CPU offloading schemes, and supports 5.3-13.17x longer context lengths, enabling lossless inference of Llama-3.1-405B on a single node with 8x80GB GPUs (Source: 机器之心)

ByteDance’s PHD-Transformer breaks pre-training length extension limits, solves KV cache bloat problem: Addressing the KV cache bloat and inference efficiency decline caused by pre-training length extension (e.g., repeating tokens), ByteDance’s Seed team proposed PHD-Transformer (Parallel Hidden Decoding Transformer). This method uses an innovative KV cache management strategy (only retaining the KV cache of original tokens, discarding the cache of hidden decoding tokens after use) to achieve effective length extension while maintaining the same KV cache size as the original Transformer. Further proposed PHD-SWA (Sliding Window Attention) and PHD-CSWA (Chunkwise Sliding Window Attention) improve performance and optimize prefill efficiency with a small increase in cache. Experiments show that PHD-CSWA improves downstream task accuracy by an average of 1.5%-2.0% on a 1.2B model and reduces training loss (Source: 机器之心)

Microsoft releases BitNet v2, achieving native 4-bit activation quantization for 1-bit LLMs: To address the issue that BitNet b1.58 (1.58-bit weights) still uses 8-bit activations, unable to fully leverage the 4-bit computing capabilities of new hardware, Microsoft proposed the BitNet v2 framework. This framework introduces the H-BitLinear module, applying a Hadamard transform before activation quantization. This effectively reshapes the activation distribution (especially in layers like Wo and Wdown where outliers concentrate), making it closer to a Gaussian distribution, thus enabling native 4-bit activation quantization. This helps reduce memory bandwidth usage and improve computational efficiency, fully utilizing the 4-bit computing support of next-generation GPUs like GB200. Experiments show that 4-bit activated BitNet v2 performance is nearly lossless compared to the 8-bit version and outperforms other low-bit quantization methods (Source: 量子位, 量子位)

Study finds: Inference models skipping “thinking process” might be more effective: UC Berkeley and Allen Institute for AI proposed the “NoThinking” method, challenging the common belief that reasoning models must rely on explicit thought processes (like CoT) for effective reasoning. By pre-filling empty thought blocks in the prompt, the model is guided to generate the solution directly. Experiments based on the DeepSeek-R1-Distill-Qwen model compared Thinking and NoThinking on tasks like math, programming, and theorem proving. Results show that in low-resource (token/parameter limited) or low-latency scenarios, NoThinking generally outperforms Thinking. Even under unrestricted conditions, NoThinking can match or even surpass Thinking on some tasks, and its efficiency can be further enhanced through parallel generation and selection strategies, significantly reducing latency and token consumption (Source: 量子位)

Infinigence CEO Xia Lixue: Compute power needs to become standardized, high-value “move-in ready” infrastructure: Xia Lixue, co-founder and CEO of Infinigence (无问芯穹), pointed out at the AIGC Industry Summit that with the rise of inference models like DeepSeek, AI application deployment brings over a hundredfold increase in compute demand. However, the current compute supply side is still relatively crude, struggling to meet the demands of inference scenarios for low latency, high concurrency, elastic scaling, and cost-effectiveness. He believes that the compute ecosystem players need to provide more specialized and refined services, upgrading bare metal to a one-stop AI platform, integrating heterogeneous compute power, optimizing through hardware-software co-design (like SpecEE for edge acceleration, semi-PD, FlashOverlap for cloud optimization), and providing easy-to-use toolchains. This would allow compute power to flow into various industries like standardized, high-value utilities (water, electricity, gas), realizing “compute power as productivity” (Source: 量子位)

🧰 Tools
Ant Digital Technology releases Agentar: A no-code financial intelligent agent development platform: Ant Digital Technology launched the intelligent agent development platform Agentar, aiming to help financial institutions overcome challenges in cost, compliance, and expertise when applying large models. The platform provides one-stop, full-stack development tools based on trusted agent technology, featuring a built-in financial knowledge base with hundreds of millions of high-quality entries and hundreds of thousands of financial long-chain-of-thought annotated data points. Agentar supports no-code/low-code visual orchestration and has launched over a hundred financial MCP services in internal testing, enabling non-technical personnel to quickly build professional, reliable financial intelligent agent applications capable of autonomous decision-making, such as “digital intelligent employees,” accelerating the deep integration of AI in the financial industry (Source: 量子位)

Open-source MCP platform n8n updated: Supports bidirectional and local MCP, enhancing flexibility: The open-source AI Workflow platform n8n (86K+ Stars on GitHub) officially supports MCP (Model Context Protocol) after version 1.88.0. The new version supports bidirectional MCP, acting both as a client connecting to external MCP Servers (like AutoNavi Maps API) and as a server publishing MCP Servers for other clients (like Cherry Studio) to call. Additionally, by installing the community node n8n-nodes-mcp
, n8n can integrate and use local (stdio) MCP Servers. These updates significantly enhance n8n’s flexibility and extensibility, making it a powerful open-source MCP integration and development platform when combined with its existing 1500+ tools and templates (Source: 袋鼠帝AI客栈)

MILLION: KV cache compression and inference acceleration framework based on product quantization: The IMPACT group at Shanghai Jiao Tong University proposed the MILLION framework to address the excessive GPU memory consumption by KV cache during long-context inference in large models. Tackling the drawbacks of traditional integer quantization affected by outliers, MILLION employs a non-uniform quantization method based on product quantization. It decomposes the high-dimensional vector space into low-dimensional subspaces for independent clustering and quantization, effectively utilizing inter-channel information and enhancing robustness against outliers. Combined with a three-stage inference system design (offline codebook training, online prefill quantization, online decoding) and efficient operator optimizations (chunked attention, batch delayed quantization, AD-LUT lookup, vectorized loading, etc.), MILLION achieves 4x KV cache compression on various models and tasks while maintaining near-lossless model performance, and boosts end-to-end inference speed by 2x at 32K context length. This work has been accepted by DAC 2025 (Source: 机器之心)

360 Nano AI Search upgraded: Integrates “Universal Toolbox” supporting MCP: 360’s Nano AI Search application introduced a “Universal Toolbox” feature, fully supporting MCP (Model Context Protocol), aiming to build an open MCP ecosystem. Users can leverage this platform to call over 100 official and third-party MCP tools covering office, academic, lifestyle, finance, entertainment, and other scenarios, performing complex tasks such as report writing, data analysis, social media content scraping (e.g., Xiaohongshu), and professional paper searching. Nano AI uses a local deployment model, combined with its search technology, browser capabilities, and security sandbox, to provide ordinary users with a low-barrier, secure, and easy-to-use advanced intelligent agent experience, promoting the popularization of Agent applications (Source: 量子位)

Bijiandata: Content data analysis platform developed in 7 days with AI assistance: Developer Zhou Zhi utilized a combination of a low-code platform (like WeDa) and AI programming assistants (Claude 3.7 Sonnet, Trae) to independently develop the content data analysis platform “Bijiandata” (bijiandata.com) within 7 days. The platform aims to address pain points faced by content creators, such as data fragmentation, difficulty in grasping trends, and weak insight capabilities, by providing features like a content data dashboard, precise content analysis, creator profiling, and trend insights. The development process demonstrated the efficient assistance of AI in requirement definition, prototype design, data collection and processing (web scraping, cleaning scripts), core algorithm development (hotspot detection, performance prediction), front-end interface optimization, and testing/debugging, significantly reducing development barriers and time costs (Source: AI进修生)

📚 Learning
Python-100-Days: A 100-day learning plan from novice to master: A popular open-source project on GitHub (164k+ Stars) providing a 100-day Python learning roadmap. The content covers a comprehensive range of knowledge from basic Python syntax, data structures, functions, object-oriented programming, to file operations, serialization, databases (MySQL, HiveSQL), web development (Django, DRF), web scraping (requests, Scrapy), data analysis (NumPy, Pandas, Matplotlib), machine learning (sklearn, neural networks, NLP introduction), and team project development. Suitable for beginners to systematically learn Python and understand its applications and career paths in backend development, data science, machine learning, etc. (Source: jackfrued/Python-100-Days – GitHub Trending (all/daily))
Project-Based Learning: Curated list of project-driven programming tutorials: A highly popular repository on GitHub (225k+ Stars) that aggregates numerous project-based programming tutorials. These tutorials aim to help developers learn programming by building real-world applications from scratch. Resources are categorized by major programming languages, covering C/C++, C#, Clojure, Dart, Elixir, Go, Haskell, HTML/CSS, Java, JavaScript (React, Angular, Node, Vue, etc.), Kotlin, Lua, Python (Web development, Data science, Machine learning, OpenCV, etc.), Ruby, Rust, Swift, and many other languages and technology stacks. An excellent starting point for practice-driven learning of programming and mastering new technologies (Source: practical-tutorials/project-based-learning – GitHub Trending (all/daily))
IJCAI Workshop Challenge: Rotated Object Detection of Contraband in X-ray Security Images: Beihang University’s National Key Laboratory, in collaboration with iFLYTEK, is hosting a challenge on Rotated Object Detection of Contraband in X-ray Security Images during the IJCAI 2025 Workshop “Generalizing from Limited Resources in the Open World”. The challenge provides real-world X-ray security images with rotated bounding box annotations for 10 types of contraband, requiring participants to develop models for accurate detection. The competition uses weighted mAP as the evaluation metric and consists of preliminary and final rounds. Winners will receive a total of 24,000 RMB in prize money and the opportunity to present their solutions at the IJCAI Workshop. The aim is to promote the application of rotated object detection technology in intelligent security screening (Source: 量子位)

Chinese Academy of Sciences Advanced Seminar on AI Empowering Scientific Research: The Talent Exchange and Development Center of the Chinese Academy of Sciences will hold an advanced seminar titled “Enhancing Scientific Research Efficiency and Innovation Practice Empowered by AI Large Models” in Beijing in May 2025. The course content covers the forefront of AI large model development, core technologies (pre-training, fine-tuning, RAG), DeepSeek model applications, AI-assisted project proposal writing, scientific graphing, programming, data analysis, literature retrieval, as well as practical skills like AI Agent development, API calls, and local deployment. It aims to enhance researchers’ efficiency and innovation capabilities in using AI (especially large models) for research (Source: AI进修生)

Jelly Evolution Simulator (jes) – GitHub Project: A jellyfish evolution simulator project written in Python. Users can start the simulation by running python jes.py
from the command line. The project provides keyboard controls, such as toggling displays, storing/unstoring information about specific species, changing species colors, opening/closing the creature mosaic, and scrolling back and forth through the timeline. Recent updates fixed a mutation lookup bug, added key controls allowing users to modify the number of creatures in the simulation, and fixed the “watch sample” feature to show samples from the current time point instead of the latest generation (Source: carykh/jes – GitHub Trending (all/daily))
Hyperswitch – Open Source Payments Orchestration Platform: An open-source payment switch platform developed by Juspay, written in Rust, designed to provide fast, reliable, and affordable payment processing. It offers a single API to access the payments ecosystem, supporting the full flow including authorization, authentication, void, capture, refunds, dispute handling, and connection to external fraud or authentication providers. The Hyperswitch backend supports smart routing based on success rates, rules, volume allocation, and failover retry mechanisms. It provides Web/Android/iOS SDKs for a unified payment experience, and a no-code control center to manage the payment stack, define workflows, and view analytics. Supports local deployment via Docker and cloud deployment (AWS/GCP/Azure) (Source: juspay/hyperswitch – GitHub Trending (all/daily))
💼 Business
Thinking Machines Lab secures lead investment from a16z, valued at $10 billion: Founded by former OpenAI CTO Mira Murati, the AI startup Thinking Machines Lab, despite having no product or revenue yet, is raising a $2 billion seed round at a valuation of at least $10 billion, led by Andreessen Horowitz (a16z). This is attributed to its top-tier research team from OpenAI, including John Schulman (Chief Scientist) and Barret Zoph (CTO). The company aims to build more customizable and powerful artificial intelligence. Its funding structure grants CEO Murati special control; her voting power equals the sum of all other board members’ votes plus one (Source: 机器之心, X @steph_palazzolo)

AI search engine Perplexity seeks $1 billion funding at $18 billion valuation: Perplexity, the AI search engine co-founded by former OpenAI research scientist Aravind Srinivas, is seeking approximately $1 billion in a new funding round at a valuation of about $18 billion. Perplexity uses large language models combined with real-time web retrieval to provide concise answers with source links and supports scoped searches. Despite controversies regarding data scraping, the company has attracted high-profile investors including Jeff Bezos and Nvidia (Source: 机器之心)
Duolingo announces gradual replacement of contract workers with AI: Luis von Ahn, CEO of the language learning platform Duolingo, announced in an all-hands email that the company will become an “AI-first” enterprise and plans to gradually phase out the use of contract workers for tasks that AI can handle. This move is part of the company’s strategic transformation aimed at enhancing efficiency and innovation through AI, rather than merely fine-tuning existing systems. The company will assess AI usage in hiring and performance reviews, and will only add headcount if teams cannot improve efficiency through automation. This reflects the trend of AI replacing traditional human roles in areas like content generation and translation (Source: Reddit r/ArtificialInteligence)

🌟 Community
Qwen3 model release sparks discussion; performance praised, knowledgeability questioned: Alibaba’s open-sourcing of the Qwen3 series models (including the 235B MoE) generated widespread discussion in the community. Most benchmarks and user feedback affirmed its strong capabilities in code, math, and reasoning, with the flagship model’s performance comparable to top-tier models. The community appreciated its support for thinking/non-thinking modes, multilingual capabilities, and MCP support. However, some users pointed out its weaker performance on factual knowledge question-answering (e.g., SimpleQA benchmark), even compared to smaller models, and noted some hallucination issues. This sparked discussions about whether the model design prioritizes reasoning over knowledge memorization and whether future reliance on RAG or tool calls will compensate for knowledge gaps (Source: X @armandjoulin, X @TheZachMueller, X @nrehiew_, X @teortaxesTex, Reddit r/LocalLLaMA, X @karminski3)

AI site builders (like Lovable) defaulting to client-side rendering raises SEO concerns: SEO professionals and users discussed in the community that AI site-building tools like Lovable default to Client-Side Rendering (CSR), which might prevent search engine crawlers (like Googlebot) or AI bots (like ChatGPT) from indexing content beyond the homepage, severely impacting site visibility and ranking. Although Google claims to handle CSR, its effectiveness is far inferior to Server-Side Rendering (SSR) or Static Site Generation (SSG). Users’ attempts to guide Lovable via prompts to generate SSR/SSG or use Next.js failed. The community advises explicitly requesting SSR/SSG at the project outset or manually migrating AI-generated code to frameworks supporting SSR/SSG (like Next.js) (Source: AI进修生)

Discussion on whether AI Agents will replace Apps: The community discussed the potential of AI Agents and their impact on the traditional App model. One viewpoint suggests that as AI Agents gain stronger reasoning, browsing, and execution capabilities (e.g., via MCP tool calls), users might simply issue natural language commands to the Agent, which would then complete tasks across multiple apps and the web, reducing the need for individual apps. Microsoft’s CEO has expressed similar views. However, others commented that current AI Agent autonomous reasoning is still limited, and the core value of many apps (especially entertainment and social ones) lies in the user browsing and interaction experience itself, not just task completion. Therefore, the App model is unlikely to be completely replaced in the short term (Source: Reddit r/ArtificialInteligence)
ChatGPT introducing shopping features sparks “commercialization creep” concerns: Users reported receiving lists of shopping links when asking non-shopping related questions (e.g., the impact of tariffs on inventory) on ChatGPT. ChatGPT officially explained this as a new shopping feature launched on April 28th, intended to provide product recommendations, claiming the recommendations are “organically generated” and not ads. However, this change raised community concerns about “Enshittification” (platforms gradually shifting value towards commercial interests at the expense of user experience), viewing it as the beginning of OpenAI sacrificing user experience under commercial pressure, potentially evolving into ad-driven or commission-based recommendations in the future (Source: Reddit r/ChatGPT)
Ongoing discussion about AI’s impact on the job market: Community discussions continue regarding whether and how AI will replace jobs. On one hand, some economists and reports suggest that the overall impact of generative AI on employment and wages is not yet significant. On the other hand, many users shared real-world examples and observations: Duolingo announced replacing contract workers with AI; some business owners stated they have used AI to replace partial customer service, junior programming, QA, and data entry roles; freelancers (in graphic design, writing, translation, voice acting) reported a decrease in job opportunities; hiring for certain positions (like customer service) has shrunk. The prevailing view is that repetitive, pattern-based jobs are impacted first. While AI currently serves more as a productivity tool, its replacement effect is beginning to manifest and is expected to expand gradually (Source: Reddit r/ArtificialInteligence, Reddit r/ArtificialInteligence, Reddit r/ArtificialInteligence)

💡 Other
ISCA Fellows 2025 announced, three Chinese scholars elected: The International Speech Communication Association (ISCA) announced its 2025 list of Fellows, with 8 scholars elected. Among them are three Chinese scholars: Kai Yu, co-founder of AISpeech and Distinguished Professor at Shanghai Jiao Tong University (for contributions to speech recognition, dialogue systems, and technology deployment, the first from mainland China); Hung-yi Lee, Professor at National Taiwan University (for pioneering contributions to self-supervised learning for speech and community benchmark building); and Nancy Chen, Head of the Generative AI group at A*STAR’s Institute for Infocomm Research (I2R) in Singapore (for contributions and leadership in multilingual speech processing, multimodal human-computer communication, and AI technology deployment) (Source: 机器之心)
