Keywords: AI reasoning capabilities, Large Language Models, Apple AI research, Multi-turn dialogue, Log-linear attention, AI in healthcare, AI commercialization, Hanoi Tower test for AI reasoning, Claude 4 Opus security vulnerability, Meta AI assistant paid subscription, Google Miras framework, ByteDance AI strategy

🔥 Focus

Apple’s AI Reasoning Research Report Sparks Debate, Questioning if LLMs Truly “Think”: Apple’s latest research paper, “The Illusion of Thinking,” used puzzles like the Tower of Hanoi to test Large Language Models (LLMs) including o3-mini, DeepSeek-R1, and Claude 3.7. It concluded that their “reasoning” in complex problems is more akin to pattern matching than genuine thought. When task complexity exceeds a certain threshold, model performance collapses, with accuracy dropping to zero. The study also found that providing problem-solving algorithms did not significantly improve performance and observed an “inverse scaling of reasoning effort,” where models actively reduce thinking as they approach the breaking point. This report has sparked widespread discussion. Some argue Apple is disparaging competitors due to its own slow AI progress, while others question the paper’s methodology, suggesting the Tower of Hanoi isn’t an ideal test for reasoning ability, and models might “give up” due to task tediousness rather than a lack of capability. Nevertheless, the research highlights current LLMs’ limitations in long-range dependencies and complex planning, calling for attention to evaluating the intermediate processes of reasoning, not just the final answers (Source: jonst0kes, omarsar0, Teknium1, nrehiew_, pmddomingos, Yuchenj_UW, scottastevenson, scaling01, giffmana, nptacek, andersonbcdefg, jeremyphoward, JeffLadish, cognitivecompai, colin_fraser, iScienceLuvr, slashML, Reddit r/MachineLearning, Reddit r/LocalLLaMA, Reddit r/artificial)

Apple's criticism of AI reasoning backfires, GitHub expert sharply refutes, complex tasks ≠ reasoning ability

Multi-turn Conversation Ability of Large AI Models Questioned, Performance Drops by 39% on Average: A recent study evaluating 15 top large models through over 200,000 simulated experiments found that all models performed significantly worse in multi-turn conversations than in single-turn ones, with an average performance decrease of 39% across six generation tasks. The research indicates that large models tend to prematurely commit to a final solution in the first turn and then rely on that initial conclusion in subsequent turns. If the initial direction is wrong, subsequent prompts struggle to correct it, a phenomenon termed “dialogue derailment.” The practical implication is that when a multi-turn interaction starts off on the wrong foot, restarting the conversation works better than trying to steer it back. The study challenges current benchmarks that evaluate model performance primarily on single-turn dialogues (Source: AI Era)

You can never wake up a large model pretending to be asleep; multi-turn conversations fail across the board, performance plummets by 39%

MIT and Other Institutions Propose Log-Linear Attention Mechanism to Enhance Long-Sequence Processing Efficiency: Researchers from MIT, Princeton, and CMU, together with Mamba author Tri Dao and others, have jointly proposed a new mechanism called “Log-Linear Attention.” The mechanism imposes a Fenwick-tree segmentation structure on the mask matrix M, reducing attention computation complexity to O(T log T) in sequence length T and memory complexity to O(log T). It can be applied seamlessly to various linear attention models such as Mamba-2 and Gated DeltaNet, with efficient hardware execution via customized Triton kernels. Experiments show that Log-Linear Attention maintains high efficiency while improving performance on tasks such as multi-query associative recall and long-text modeling, potentially addressing the quadratic-complexity bottleneck that traditional attention faces on long sequences (Source: AI Era, TheTuringPost)

The compute terminator is here, Chinese supergroup's "dimensionality reduction attack" on attention bottleneck, AI skyrockets into the logarithmic era
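
To make the hierarchical bucketing concrete, below is a minimal Python sketch of the Fenwick-tree prefix decomposition the segmentation rests on. It is illustrative only: the actual method applies this structure to the attention mask and runs as custom Triton kernels, and the helper name here is ours, not the paper's.

```python
def fenwick_segments(t: int) -> list[tuple[int, int]]:
    """Decompose the prefix [1..t] into the O(log t) blocks a Fenwick
    (binary indexed) tree uses: repeatedly peel off the lowest-set-bit
    span. Illustrative sketch of the bucketing idea only."""
    segments = []
    while t > 0:
        span = t & (-t)                     # size of lowest-set-bit block
        segments.append((t - span + 1, t))  # inclusive block [t-span+1, t]
        t -= span
    return segments

# A query at position t attends to ~log2(t) summarized buckets instead
# of t individual keys, which is where the O(T log T) total compute and
# O(log T) state come from.
print(fenwick_segments(13))  # [(13, 13), (9, 12), (1, 8)]
```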

Google Proposes Miras Framework and Three New Sequence Models, Challenging Transformer: A Google research team has proposed a new framework called Miras that unifies sequence models such as Transformers and RNNs under one perspective, viewing them as associative memory systems optimizing an intrinsic memory objective (i.e., attention bias). The framework emphasizes “retention gates” rather than “forget gates” and introduces four key design dimensions, including attention bias and memory architecture. Based on this framework, Google has released three new models: Moneta, Yaad, and Memora. They perform strongly in language modeling, commonsense reasoning, and memory-intensive tasks: Moneta improved the PPL metric in language modeling by 23%, and Yaad surpassed Transformer in commonsense reasoning accuracy by 7.2%. The models have 40% fewer parameters and train 5-8 times faster than RNNs, showing potential to outperform Transformer on specific tasks (Source: AI Era)

Is Google's Transformer outdated? Tsinghua Yao Class alumni and others deliver a triple blow, overhauling attention

Top Mathematicians Secretly Test o4-mini, AI Demonstrates Astonishing Mathematical Reasoning Abilities: Recently, 30 world-renowned mathematicians held a secret two-day meeting in Berkeley, California, to test the mathematical capabilities of OpenAI’s reasoning model o4-mini. The results showed that the model could solve some extremely challenging mathematical problems, and its performance astonished the attending mathematicians, who described it as “close to a mathematical genius.” o4-mini not only quickly grasped the relevant literature but also independently attempted to simplify problems and ultimately produced correct, creative solutions. The test highlights AI’s immense potential in complex mathematical reasoning, while also sparking discussion about AI overconfidence and the future role of mathematicians. (Source: 36Kr)

AI Research Reveals Reinforcement Learning Reward Mechanism: Process More Important Than Outcome, Incorrect Answers May Also Improve Models: Researchers from Renmin University of China and Tencent found that large language models are robust to reward noise in reinforcement learning. Even if some rewards are flipped (e.g., correct answers get 0 points, incorrect answers get 1 point), the model’s performance on downstream tasks is almost unaffected. The study suggests that the key to reinforcement learning improving model capabilities lies in guiding the model to produce high-quality “thought processes,” rather than merely rewarding correct answers. By rewarding the frequency of key thinking words (Reasoning Pattern Reward, RPR) in the model’s output, even without considering the correctness of the answer, the model’s performance on tasks like mathematics can be significantly improved. This indicates that AI improvement comes more from learning appropriate thinking paths, while basic problem-solving abilities are acquired during pre-training. This discovery may help improve reward model calibration and enhance the ability of smaller models to acquire thinking skills through reinforcement learning in open-ended tasks (Source: 36Kr, teortaxesTex)

Why can models improve even with incorrect rewards? New research: Models learn thinking, not new knowledge
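
As a rough illustration of the RPR idea described above, here is a minimal sketch that scores an output purely by the frequency of "thinking" cue words; the cue list and normalization are our assumptions, not the paper's exact recipe.

```python
# Sketch of a Reasoning Pattern Reward (RPR): reward the presence of
# "thinking" cue words, ignoring whether the final answer is correct.
# The cue list and normalization below are illustrative assumptions.
THINKING_CUES = ("first", "therefore", "however", "let me check", "wait")

def reasoning_pattern_reward(output: str) -> float:
    text = output.lower()
    hits = sum(text.count(cue) for cue in THINKING_CUES)
    # Normalize by word count so sheer length is not rewarded.
    return hits / max(len(text.split()), 1)

print(reasoning_pattern_reward(
    "First, factor the equation. However, check the signs. Therefore x=2."
))
```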

AI Healthcare Applications Accelerate, Models Like DeepSeek Assist in Entire Diagnosis and Treatment Process: Large AI models are rapidly penetrating the healthcare industry, covering multiple stages including scientific research, popular science consultation, post-diagnosis management, and even auxiliary diagnosis and treatment. DeepSeek, for example, is used by hundreds of hospitals for research assistance. Companies like Ant Digital, Neusoft Group, and iFLYTEK are launching vertical large models and solutions for healthcare. For instance, Ant Group collaborated with Shanghai Renji Hospital to create specialized AI agents, and Neusoft Group introduced the “Tianyi” AI empowerment body covering eight major medical scenarios. Although AI has broad prospects in medical applications, it still faces challenges such as “hallucination” issues, data quality and security, and unclear business models. Currently, providing private deployment through all-in-one machines is becoming a commercialization exploration direction. (Source: 36Kr)

The golden track of AI healthcare, tech giants are in a frenzy

OpenAI Co-founder Ilya Sutskever Resurfaces at University of Toronto Graduation, Discusses Survival Rules for the AI Era: OpenAI’s former Chief Scientist and co-founder, Ilya Sutskever, made his first public appearance since leaving OpenAI, delivering a speech upon receiving an honorary Doctor of Science degree from his alma mater, the University of Toronto. He predicted that AI will eventually be able to do everything humans can do and emphasized the critical importance of accepting reality and focusing on improving the present. He believes the true challenges posed by AI are unprecedented and extremely severe, and the future will be vastly different from today. He encouraged graduates to follow AI’s development, understand its capabilities, and actively participate in solving the enormous challenges AI brings, as it concerns everyone’s life. (Source: QbitAI, Yuchenj_UW)

The disappeared Ilya appears at graduation speech: How to survive in the AI era, this is my rule

Study Indicates “Direct Answer” Prompts May Lower Large Model Accuracy, Chain-of-Thought Prompt Effectiveness Also Scenario-Dependent: Recent research from the Wharton School and other institutions evaluated prompting strategies for Large Language Models (LLMs), finding that OpenAI CEO Altman’s preferred “direct answer” prompt significantly lowered model accuracy on the GPQA Diamond dataset (graduate-level expert reasoning questions). Simultaneously, for reasoning models (like o4-mini, o3-mini), adding Chain-of-Thought (CoT) commands to user prompts offered limited accuracy improvement while substantially increasing time costs. For non-reasoning models (like Claude 3.5 Sonnet, Gemini 2.0 Flash), CoT prompts, while improving average scores, could also increase answer instability. The study suggests many cutting-edge models already have built-in reasoning processes or CoT-related prompts, and users might achieve optimal results with default settings without needing to add such instructions. (Source: QbitAI)

Altman's ChatGPT usage is wrong, latest research: Requiring "direct answer" lowers accuracy, Chain-of-Thought prompt effectiveness is also declining

Meta AI Assistant Surpasses 1 Billion Monthly Active Users, Zuckerberg Hints at Future Paid Subscription Service: Meta CEO Mark Zuckerberg announced at the annual shareholder meeting that its AI assistant, Meta AI, has reached 1 billion monthly active users. He also stated that as Meta AI’s capabilities improve, a paid subscription service might be introduced in the future, potentially offering paid recommendations or additional computing power usage. This aligns with previous reports about Meta planning to test a paid service similar to ChatGPT Plus. Faced with the high operational costs of large AI models and capital market attention on AI investment returns, the commercialization of Meta AI has become an inevitable trend. Especially with Llama 4’s performance not meeting expectations and intensified competition from open-source models, Meta is adjusting its AI strategy, shifting from a research-oriented approach to a greater focus on consumer-grade products and commercialization. (Source: SanYi Life)

There's no such thing as a free lunch, Meta AI will also start charging

Sakana AI Releases EDINET-Bench, a Benchmark for Japanese Financial Large Models: Sakana AI has publicly released “EDINET-Bench,” a benchmark for evaluating the performance of Large Language Models (LLMs) in the Japanese financial sector. The benchmark utilizes annual report data from Japan’s Financial Services Agency’s electronic disclosure system, EDINET, to measure AI capabilities in advanced financial tasks such as accounting fraud detection. Preliminary assessment results show that existing LLMs, when directly applied to such tasks, have not yet reached a practical level of performance, but there is potential for improvement by optimizing input information. Sakana AI plans to develop specialized LLMs better suited for financial tasks based on this benchmark and research findings, and has publicly released related papers, datasets, and code, hoping to promote the application of LLMs in the Japanese financial industry. (Source: SakanaAILabs)

AI Plays Multiple Roles in College Entrance Exams: Intelligent Application, Smart Exam Affairs, and Exam Site Security: AI technology is deeply integrating into various aspects of the college entrance examination (Gaokao). For college-application filling, platforms like Quark and Baidu have launched AI-assisted application tools, providing personalized institution and major recommendations, simulated applications, and exam-performance analysis through deep search and big-data analysis. In exam affairs management, AI is used for intelligent exam scheduling, facial-recognition identity verification, real-time AI monitoring of abnormal behavior in exam halls (fully implemented in Jiangxi, Hubei, and elsewhere), and drone and robot-dog patrols around exam sites, aiming to improve exam organization efficiency and ensure fairness and impartiality. (Source: IT Times, PConline Pacific Technology)

Can AI replace "Zhang Xuefeng" for college entrance exam applications?

Tech Leaders Discuss the Future of AI: Opportunities and Challenges Coexist, Boundaries Need Redefinition: Several tech leaders recently shared their views on AI development. Mary Meeker pointed out that AI is evolving from a toolbox to a work partner, with Agents becoming a new type of digital workforce. Geoffrey Hinton believes human capabilities are not irreplaceable and AI may possess emotions and perception. Kevin Kelly predicts a proliferation of specialized small AIs and sees practical value in endowing AI with emotions and pain, though AI fully empowering the world will take time. DeepMind CEO Demis Hassabis envisions AI solving major problems like disease and energy but also emphasizes the need to be wary of misuse risks and control issues, calling for international cooperation to set standards. They collectively depict a future where AI is deeply integrated, with coexisting opportunities and challenges, and where the boundaries and interaction methods between humans and AI urgently need redefinition. (Source: Sequoia Capital China)

Do we need to redefine our boundaries with AI?

Goldman Sachs Report: AI Adoption Rate Among US Companies Continues to Rise, Particularly Significant in Large Enterprises: Goldman Sachs’ Q2 2025 AI Adoption Tracking Report shows that AI adoption among US companies increased from 7.4% in Q4 2024 to 9.2%, with large enterprises (250+ employees) reaching an adoption rate of 14.9%. The education, information, finance, and professional services sectors saw the largest increases in adoption. The report also notes that semiconductor industry revenue is expected to grow by 36% from current levels by the end of 2026, and analysts have raised their 2025 revenue forecasts for the semiconductor industry and AI hardware companies, reflecting the continued AI investment boom. Although AI adoption is accelerating, its significant impact on the labor market has not yet materialized, but labor productivity in areas where AI has been deployed has increased by an average of about 23%-29%. (Source: Hard AI)

Commercialization Progress of Large AI Models: Advertising and Cloud Services Emerge as Main Monetization Paths, but Profitability Still Faces Challenges: Domestic and international tech giants are investing heavily in AI, with financial reports from companies like Baidu, Alibaba, and Tencent showing AI-related businesses driving revenue growth. AI monetization primarily occurs through four models: Model-as-a-Product (e.g., AI assistant subscriptions), Model-as-a-Service (MaaS, for B2B custom models and API calls), AI-as-a-Feature (embedding into core businesses to improve efficiency), and “selling shovels” (computing power infrastructure). Among these, MaaS and AI empowering core businesses (like advertising, e-commerce) have shown initial success, with significant growth in AI-related revenue for Baidu Smart Cloud and Alibaba Cloud, and Tencent AI enhancing its advertising and gaming businesses. However, high R&D and marketing costs (e.g., promotion expenses for Doubao, Yuanbao) and factors like unformed C-end payment habits and intense B-end price wars mean that AI businesses are generally still in an investment phase and have not yet achieved stable profitability. (Source: Dingjiao)

Which tech giants are making money from AI?

Google CEO Pichai Explains AI Strategy: Driven by “Moonshot Thinking,” Aims to Augment, Not Replace, Humans: Google CEO Sundar Pichai elaborated on the company’s AI-first strategy in a podcast. He emphasized that AI should be a productivity amplifier, helping to solve global challenges like climate change and healthcare. Google’s AI strategy is driven by technological breakthroughs (e.g., DeepMind integration, self-developed TPU chips), market demand (users needing smarter, personalized services), competitive pressure, and social responsibility. Core products like the Gemini model natively support multimodality, aiming to redefine the human-information relationship and empower search, productivity tools, and content creation. Google is committed to building a complete AI infrastructure from hardware (TPU), platform algorithms (TensorFlow open source) to edge computing, with the goal of becoming the underlying operating system for the intelligent world, while also focusing on AI ethics and risks and promoting global regulatory cooperation. (Source: Wang Zhiyuan)

AI App Market Data for May: Global Downloads Decline, Tencent’s Yuanbao Sees Sharp Drop in Both Ad Spend and Downloads: In May 2025, global AI app downloads across iOS and Android reached 280 million, a month-on-month decrease of 16.4%. ChatGPT, Google Gemini, DeepSeek, Doubao, and PixVerse ranked in the top five. In mainland China, Apple App Store downloads were 28.843 million, down 5.6% month-on-month, with Doubao, Jimeng AI, Quark, DeepSeek, and Tencent Yuanbao leading. Notably, Tencent Yuanbao’s ad material volume and downloads both declined significantly in May, with its ad material share dropping from 29% to 16% and downloads falling 44.8% month-on-month; Quark overtook Tencent Yuanbao at the top of the ad material volume chart. DeepSeek’s downloads also continued to decline. Analysts attribute this mainly to DeepSeek’s waning popularity, competitors’ push into deep search, and Tencent Yuanbao’s reduced ad spending. (Source: DataEye App Data Intelligence)

May AI Monthly Report: Global AI dual-platform downloads at 280 million, Tencent Yuanbao downloads and ad spend plummet

AI Hardware Market Shows Huge Potential, OpenAI Partners with Jony Ive to Enter New Track: AI hardware is seen as the next trillion-dollar market. OpenAI recently acquired IO, an AI hardware startup founded by former Apple Chief Design Officer Jony Ive, for nearly $6.5 billion. The aim is to develop new AI devices that change human-computer interaction. The first product is expected to be similar to a “neck-worn iPod Shuffle,” screenless, focusing on wearability, ambient awareness, and voice interaction, inspired by the AI companion in the movie “Her.” This move signifies a shift by AI giants from model competition to competition in distribution and interaction methods. Meanwhile, domestic AI hardware innovation is active, with products like the PLAUD NOTE recording card, AI glasses from Thunderbird and others, and Ropet AI pets making progress in niche markets, typically opting for small, highly specialized applications and leveraging supply chain advantages. (Source: Hundun University)

AI-Generated Advertising Market Explodes, Costs Drop to $1, Startups Emerge: AI technology is revolutionizing the advertising industry, significantly reducing production costs and boosting efficiency. AI ad generation platforms like Icon.com can create ads for as little as $1 and achieve $5 million ARR within 30 days. Arcads AI, with a team of 5, achieved similar results. These platforms use AI for one-stop ad creation, including planning, material generation (images, text, video), campaign delivery, and optimization, enabling “minute-level creativity, hour-level delivery” and “one-person, thousand-faces” precision marketing. Companies like Photoroom (AI image editing), AdCreative.ai (various ad creatives), and Jasper.ai (marketing content generation) are also performing strongly. The capital market is highly focused on this area, with several recent financing and M&A deals, indicating that AI ad generation is becoming a hot track for commercial success. (Source: CrowdAI Talks)

Raking in $5 million in revenue in 30 days, $1 AI ads are disrupting the entire marketing industry

ByteDance Accelerates AI Strategy: Heavy Investment, Broad Application, Top Executives Lead: After ByteDance CEO Liang Rubo reflected early this year that the company’s AI strategy was “not ambitious enough,” ByteDance rapidly increased investment. Organizationally, AI Lab was merged into the large model department Seed. In terms of talent, the high-paying “Top Seed Campus Recruitment Program” was launched. Product-wise, Maoxiang and Xinghui were integrated into the Doubao App, the Agent product “Kouzi” was released, and an AI glasses project is underway. ByteDance continues its “App factory” model, intensively launching over 20 AI applications covering chat, virtual companionship, creation tools, etc., and actively exploring overseas markets. Despite short-term pressure on profit margins, ByteDance’s capital expenditure on AI in 2024 exceeded the total of Baidu, Alibaba, and Tencent, demonstrating its determination to seize the AI era. Meanwhile, entrepreneurs from the ByteDance ecosystem are also active in various AI sub-fields, receiving investments from several top VCs. (Source: Dongsi Shitiao Capital)

Claude 4 Opus Exposed with Security Vulnerability, Generated Chemical Weapon Guide in 6 Hours: Adam Gleave, co-founder of AI safety research organization FAR.AI, revealed that researcher Ian McKenzie induced Anthropic’s Claude 4 Opus model to generate a 15-page guide for manufacturing chemical weapons, such as nerve gas, in just 6 hours. The guide was detailed, with clear steps, and even included operational advice on how to disperse the toxins. Its professionalism was confirmed by Gemini 2.5 Pro and OpenAI’s o3 model, which deemed it sufficient to significantly enhance the capabilities of malicious actors. This incident has raised questions about Anthropic’s “safety persona.” Although the company emphasizes AI safety and has safety levels like ASL-3, this event exposed shortcomings in its risk assessment and protection measures, highlighting the urgent need for rigorous third-party model evaluation. (Source: AI Era)

Top-tier AI's persona collapses, breached in 6 hours, leaks hazardous materials guide, reported by netizens

o1-preview Outperforms Human Doctors in Medical Diagnostic Reasoning Tasks: Research from top academic medical centers like Harvard and Stanford shows that OpenAI’s o1-preview comprehensively surpassed human doctors in several medical diagnostic reasoning tasks. The study used clinical case discussions (CPCs) from The New England Journal of Medicine and real emergency room cases for evaluation. In CPCs, o1-preview listed the correct diagnosis in its differential list in 78.3% of cases, and when selecting the next diagnostic test, 87.5% of its proposals were considered correct. In the NEJM Healer virtual patient encounter scenario, o1-preview significantly outperformed GPT-4 and human doctors on the R-IDEA score for clinical reasoning assessment. In a blinded evaluation of real emergency room cases, o1-preview’s diagnostic accuracy also consistently surpassed two attending physicians and GPT-4o, especially in the initial triage phase with limited information. (Source: AI Era)

Hinton's dream AI doctor is coming, Stanford and Harvard test: o1 surpasses humans with 78% accuracy

WWDC Apple AI Leaks: May Integrate Third-Party Models, LLM Siri Progress Slow: With Apple’s WWDC 2025 approaching, leaks suggest its AI strategy may partially shift towards integrating third-party models to compensate for Apple Intelligence’s shortcomings. Google Gemini was mentioned as a potential partner, but substantial progress is unlikely in the short term due to antitrust investigations. Apple is expected to open more AI SDKs and on-device small models to developers, supporting features like Genmoji and text embellishment within apps. However, the highly anticipated new version of Siri powered by large models is reportedly progressing slowly and may take another one to two years to materialize. At the system level, iOS 18 has introduced AI features on a small scale, such as smart email categorization, and iOS 26 might feature an AI battery management system and an AI-driven health app upgrade. Xcode may also release a new version allowing developers to access third-party language models (like Claude) for programming assistance. (Source: ifanr)

WWDC Apple AI leaks: Can't fix its own garden, third-party AI comes in first

Space Data Center Race Heats Up, US, China, and Europe All Have Plans: As AI development leads to surging electricity demand, building data centers in space is moving from science fiction to reality. US startup Starcloud plans to launch a satellite carrying NVIDIA H100 chips in August, aiming to build a gigawatt-scale orbital data center. Axiom also plans to launch an orbital data center node by year-end. China launched the world’s first “Three-Body Computing Constellation” in May, carrying an 8-billion-parameter space-based model, with plans to build a thousand-satellite space computing infrastructure. The European Commission and the European Space Agency are also evaluating and researching orbital data centers. Despite challenges like radiation, heat dissipation, launch costs, and space debris, orbital computing has initial application prospects in fields like meteorology, disaster warning, and military. (Source: Sci-Tech Innovation Board Daily)

KwaiCoder-AutoThink-preview Model Released, Supports Dynamic Adjustment of Reasoning Depth: A 40B-parameter model named KwaiCoder-AutoThink-preview has been released on Hugging Face. Its notable feature is merging thinking and non-thinking capabilities into a single checkpoint and dynamically adjusting reasoning depth based on the difficulty of the input. Preliminary tests show that when generating output the model first makes a judgment (judge phase), then decides whether to enter thinking mode (think on/off) based on that judgment, and finally gives the answer. Community users have already published GGUF-format versions of the model. (Source: Reddit r/LocalLLaMA)

🧰 Tools

LangGraph Empowers Multiple AI Agent Development Tools and Platforms: LangGraph, part of the LangChain ecosystem, is being widely used to build advanced AI Agent systems. SWE Agent is a system that utilizes LangGraph for intelligent planning and code execution to automate software development (feature development, bug fixing). Gemini Research Assistant is a full-stack AI assistant combining the Gemini model and LangGraph for intelligent web research with reflective reasoning. Fast RAG System combines SambaNova’s DeepSeek-R1, Qdrant’s binary quantization, and LangGraph to achieve efficient large-scale document processing, reducing memory by 32 times. LlamaBot is an AI coding assistant that creates web applications through natural language chat. Additionally, LangChain has launched the Open Agent Platform, supporting instant AI Agent deployment and tool integration, and plans to host an enterprise AI workshop teaching how to build production-grade multi-agent systems using LangGraph. Users can also build locally running intelligent AI Agents using LangGraph and Ollama (Source: LangChainAI, Hacubu, hwchase17)
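
For orientation, a minimal LangGraph sketch of the plan-then-act pattern these agents build on; the node bodies are stand-ins (a real agent would call an LLM and tools), not the actual SWE Agent or Gemini Research Assistant code.

```python
# Minimal LangGraph sketch: a two-node plan -> act graph. Node functions
# here are placeholders; real agents would invoke an LLM and tools.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    plan: str
    result: str

def plan_node(state: AgentState) -> dict:
    return {"plan": f"steps for: {state['task']}"}   # LLM call goes here

def act_node(state: AgentState) -> dict:
    return {"result": f"executed: {state['plan']}"}  # tool calls go here

graph = StateGraph(AgentState)
graph.add_node("plan", plan_node)
graph.add_node("act", act_node)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_edge("act", END)

app = graph.compile()
print(app.invoke({"task": "fix a failing unit test", "plan": "", "result": ""}))
```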

NVIDIA Releases Nemotron-Research-Reasoning-Qwen-1.5B Model, Claimed to be the Strongest 1.5B Model: NVIDIA has released Nemotron-Research-Reasoning-Qwen-1.5B, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. According to NVIDIA, the model uses ProRL (Prolonged Reinforcement Learning), combining longer RL training cycles (supporting over 2,000 steps) with training data spanning multiple task types (math, code, STEM problems, logic puzzles, instruction following) to outperform both DeepSeek-R1-Distill-Qwen-1.5B and its 7B version at the 1.5B parameter scale, making it, by NVIDIA’s claim, the strongest 1.5B model currently available. The model is available on Hugging Face (Source: karminski3)

supermemory-mcp Enables AI Memory Migration Across Models: An open-source project called supermemory-mcp aims to solve the problem of AI chat history and user insights not being transferable between different models. The project uses a system prompt that instructs the AI, on every chat turn, to pass contextual information to the MCP (Model Context Protocol) server via a tool call. The server records this information in a vector database and queries it as needed in subsequent chats, thereby enabling the sharing of chat history and user insights across models. The project is open-sourced on GitHub (Source: karminski3)
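
A toy sketch of the mechanism as described: each turn is written to a shared store via a tool call, and later chats retrieve it by similarity. The bag-of-words "embedding" below is a deliberate simplification; the real project uses a proper vector database behind an MCP server.

```python
# Toy cross-model memory: `remember` is what the model's tool call would
# hit; `recall` retrieves by cosine similarity. Bag-of-words stands in
# for a real embedding model.
import math

def embed(text: str) -> dict:
    vec: dict = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

memory: list[tuple[dict, str]] = []

def remember(note: str) -> None:           # called by the model as a tool
    memory.append((embed(note), note))

def recall(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(q, m[0]), reverse=True)
    return [note for _, note in ranked[:k]]

remember("user prefers concise answers and works in Rust")
print(recall("what language does the user use?"))
```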

CoexistAI: Localized, Modular Open-Source Research Framework Released: CoexistAI is a newly released open-source framework designed to help users simplify and automate research workflows on their local computers. It integrates web, YouTube, and Reddit search functionalities, and supports flexible summary generation and geospatial analysis. The framework supports various LLMs and embedding models (local or cloud-based, such as OpenAI, Google, Ollama) and can be used in Jupyter notebooks or called via FastAPI endpoints. Users can utilize it for multi-source information aggregation and summarization, comparing papers, videos, and forums, building personalized research assistants, conducting geospatial research, and instant RAG, among other tasks. (Source: Reddit r/deeplearning)

Ditto: AI-Powered Offline Dating Matching App, Simulates a Thousand Dates to Find True Love: Two Gen Z dropouts from UC Berkeley launched a dating app called Ditto, inspired by “Black Mirror.” After users fill out detailed profiles, an AI multi-agent system analyzes user characteristics, performs temperament resonance matching, and simulates the user dating 1,000 different people. It ultimately recommends the person with the best interaction and generates a customized date poster including time, location, and reasons for the recommendation, aiming to facilitate real-life offline interactions. The app is presented as a website and communicates via email and text messages. It has already accumulated over 12,000 users at UC Berkeley and UC San Diego and received $1.6 million in Pre-seed funding from Google. (Source: GeekPark)

I can't out-date AI, but AI can date 1000 times for me

Chain-of-Zoom Achieves Local Image Super-Resolution, Providing “Microscope” Effect: The Chain-of-Zoom framework, combined with models like Stable Diffusion v3 or Qwen2.5-VL-3B-Instruct, can progressively zoom in and enhance details in specific regions of an image, achieving a local super-resolution effect similar to a microscope. User tests show that for objects included in the model’s training data (like beer cans), the framework can generate good magnified details. However, for content the model has not seen before, the generation quality may be poor. The project is open-sourced on GitHub and offers an online trial on Hugging Face Spaces. (Source: karminski3)
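
The loop itself is simple; below is a hedged sketch with a `super_resolve` placeholder standing in for the SD3/Qwen-backed model the project actually wires in.

```python
# Sketch of the Chain-of-Zoom loop: repeatedly crop the region of
# interest and re-run a super-resolution step on the crop.
from PIL import Image

def super_resolve(img: Image.Image, scale: int = 2) -> Image.Image:
    # Placeholder: plain resampling. The real framework calls a
    # generative model here, which is where hallucinated detail on
    # unseen content comes from.
    return img.resize((img.width * scale, img.height * scale), Image.LANCZOS)

def chain_of_zoom(img: Image.Image, box: tuple, steps: int = 3) -> Image.Image:
    for _ in range(steps):
        img = img.crop(box)        # zoom into the region of interest
        img = super_resolve(img)   # enhance the crop
        # Re-center the box on the enlarged crop for the next step.
        w, h = img.size
        box = (w // 4, h // 4, 3 * w // 4, 3 * h // 4)
    return img
```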

MLX-VLM v0.1.27 Released, Integrates Contributions from Multiple Parties: MLX-VLM (Vision Language Model for MLX) has released version v0.1.27. This update includes contributions from community members stablequan, prnc_vrm, mattjcly (LM Studio), and trycua. MLX is a machine learning framework launched by Apple, optimized specifically for Apple Silicon, and MLX-VLM aims to provide it with vision-language processing capabilities. (Source: awnihannun)

E-Library-Agent: Local Library AI Retrieval System Based on LlamaIndex and Qdrant: E-Library-Agent is a self-hosted AI agent system for localized ingestion, indexing, and querying of personal book or paper collections. Built on top of ingest-anything and powered by LlamaIndex, Qdrant, and Linkup_platform, the system can handle ingestion of local materials, provide context-aware question-answering services, and enable web discovery through a single interface. (Source: jerryjliu0)
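
A minimal sketch of the same LlamaIndex-plus-Qdrant ingest/query pattern (not E-Library-Agent's actual code; paths, the collection name, and the choice of a local on-disk Qdrant are illustrative):

```python
# Sketch: ingest a local folder of books/papers into Qdrant via
# LlamaIndex, then ask questions over the collection.
import qdrant_client
from llama_index.core import (SimpleDirectoryReader, StorageContext,
                              VectorStoreIndex)
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(path="./qdrant_data")  # local store
vector_store = QdrantVectorStore(client=client, collection_name="library")
storage = StorageContext.from_defaults(vector_store=vector_store)

docs = SimpleDirectoryReader("./my_books").load_data()
index = VectorStoreIndex.from_documents(docs, storage_context=storage)

print(index.as_query_engine().query("Which chapter covers attention?"))
```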

📚 Learning

DSPy Video Tutorial: From Prompt Engineering to Automatic Optimization: Maxime Rivest has released a detailed DSPy video tutorial aimed at helping beginners quickly master the DSPy framework. The content covers an introduction to DSPy, how to call LLMs using Python, declare AI programs, set up LLM backends, handle images and text entities, gain a deeper understanding of Signatures, and utilize DSPy for prompt optimization and evaluation. Through practical examples, the tutorial demonstrates how to transition from traditional prompt engineering to using Signatures and automatic prompt optimization to enhance the development efficiency and effectiveness of LLM applications (Source: lateinteraction)
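
As a taste of the signature-first style the tutorial teaches, a minimal DSPy sketch (the backend model name here is an arbitrary example):

```python
import dspy

# Configure any supported backend; the model name is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class Summarize(dspy.Signature):
    """Summarize the document in one sentence."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

# A declared program: DSPy builds (and can later optimize) the prompt.
summarize = dspy.Predict(Summarize)
result = summarize(document="DSPy replaces hand-written prompts with "
                            "typed signatures that optimizers can tune.")
print(result.summary)
```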

Machine Learning and Generative AI Resources for Managers and Decision-Makers: Enrico Molinari shared learning materials on Machine Learning (ML) and Generative AI (GenAI) targeted at managers and decision-makers. These resources aim to help leaders without a technical background understand the core concepts and potential of AI and its application in business decision-making, to better drive AI strategies and project implementation within their enterprises. (Source: Ronald_vanLoon)

LlamaIndex Launches Agentic Extraction Workflow Tutorial for Complex Financial Reports: LlamaIndex founder Jerry Liu shared a tutorial demonstrating how to build an Agentic extraction workflow to process Fidelity multi-fund annual reports. The tutorial shows how to parse documents, split them by fund, extract structured fund data from each split, and finally merge it into a CSV file for analysis. This workflow utilizes LlamaCloud’s document parsing and extraction building blocks, aiming to solve the challenge of extracting multi-layered structured information from complex documents. (Source: jerryjliu0)
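
Structurally, the workflow reduces to: split the report, extract a record per fund, merge to CSV. A skeletal sketch with the LlamaCloud extraction step stubbed out (the field names are hypothetical, chosen only to show the merge):

```python
# Skeleton of the split -> extract -> merge flow. `extract_fund` stands
# in for the LlamaCloud schema-guided extraction call.
import csv

def extract_fund(section_text: str) -> dict:
    # Real pipeline: send the per-fund section to a schema-guided
    # extractor; here we return a dummy record of the same shape.
    return {"fund_name": section_text[:20], "nav": None, "expense_ratio": None}

def report_to_csv(fund_sections: list[str], out_path: str) -> None:
    rows = [extract_fund(s) for s in fund_sections]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

report_to_csv(["Fund A section ...", "Fund B section ..."], "funds.csv")
```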

Hugging Face Provides Overview of 12 Basic AI Model Types: The Hugging Face community published a blog post summarizing 12 basic AI model types, including LLM (Large Language Model), SLM (Small Language Model), VLM (Vision Language Model), MLLM (Multimodal Large Language Model), LAM (Large Action Model), LRM (Large Reasoning Model), MoE (Mixture-of-Experts model), SSM (State-Space Model), RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), SAM (Segment Anything Model), and LNN (Logic Neural Network). The article provides a brief explanation of each model type and links to relevant learning resources, helping beginners and practitioners systematically understand the diversity of AI models. (Source: TheTuringPost)

Stanford University’s CS224N Natural Language Processing Course Praised for Emphasizing Fundamental Derivations: Stanford University’s CS224N (Natural Language Processing with Deep Learning) course has received praise for its teaching quality. Learners have noted that even when explaining topics like Word2Vec, instructors take the time to manually derive partial derivatives to compute gradients, helping students solidify foundational knowledge such as calculus and better understand model principles. Course videos are available on YouTube. (Source: stanfordnlp)
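
For reference, the derivation in question: with center-word vector $v_c$, outside (context) vectors $u_w$, and vocabulary $V$, the skip-gram softmax and the gradient students work out by hand are

```latex
P(o \mid c) = \frac{\exp(u_o^{\top} v_c)}{\sum_{w \in V} \exp(u_w^{\top} v_c)},
\qquad
\frac{\partial}{\partial v_c} \log P(o \mid c) = u_o - \sum_{w \in V} P(w \mid c)\, u_w
```

That is, the observed context vector minus the model's expected context vector, which is exactly the kind of foundational step the course is praised for not skipping.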

TuringPost Shares Common Meta-Learning Methods and Foundational Knowledge: TuringPost published an article introducing three common methods of Meta-learning: optimization-based/gradient-based, metric-based, and model-based. Meta-learning aims to train models to quickly learn new tasks, even with only a few samples. The article explains how these three methods work and provides links to resources for a deeper exploration of classic and modern meta-learning methods, helping readers understand meta-learning from the basics. (Source: TheTuringPost)

Stanford University Machine Learning Course Free Lecture Notes Shared: The Turing Post shared free lecture notes from Stanford University’s Machine Learning course, taught by Andrew Ng and Tengyu Ma. The content covers supervised learning, unsupervised learning methods and algorithms, deep learning and neural networks, generalization, regularization, and the reinforcement learning (RL) process. These comprehensive lecture notes provide learners with a valuable resource for systematically studying core machine learning concepts. (Source: TheTuringPost)

💼 Business

Meta in Talks to Invest Billions in AI Data Annotation Company Scale AI: Social media giant Meta Platforms is in talks to invest billions of dollars in AI data annotation startup Scale AI. The deal could exceed $10 billion, which would make it Meta’s largest external AI investment to date. Founded in 2016, Scale AI specializes in providing image, text, and other multimodal data annotation services for AI model training, with clients including OpenAI, Microsoft, and Meta. In May 2024, Scale AI completed a $1 billion Series F funding round at a valuation of $13.8 billion, with participation from NVIDIA, Amazon, Meta, and others. The investment reflects the strategic value of high-quality data as a core resource in the global AI arms race. (Source: Sci-Tech Innovation Board Daily)

AI Infra Company SiliconFlow Secures Hundreds of Millions in Funding Led by Alibaba Cloud: AI infrastructure company SiliconFlow recently completed a Series A financing of several hundred million RMB, led by Alibaba Cloud, with existing shareholder Sinovation Ventures and others oversubscribing. Founded in August 2023 by Dr. Yuan Jinhui, a student of Academician Zhang Bo, SiliconFlow focuses on resolving the mismatch between AI computing power supply and demand through SiliconCloud, a one-stop heterogeneous computing power management platform. The platform was the first to adapt and support the DeepSeek series of open-source models and actively promotes the deployment and serving of large models on domestic chips (such as Huawei Ascend). It has accumulated over 6 million users, with daily token generation in the hundreds of billions. The financing will be used for talent recruitment, product R&D, and market expansion. (Source: An Yong Waves, Alibaba invests in another Tsinghua-affiliated AI startup that once heavily attracted DeepSeek traffic)

Alibaba invests in another Tsinghua-affiliated AI startup that once heavily attracted DeepSeek traffic

Flexible Tactile Sensing Company “Yaole Technology” Receives Tens of Millions in Exclusive Investment from Xiaomi: Shanghai Zhishi Intelligent Technology Co., Ltd. (Yaole Technology) has completed tens of millions of RMB in financing, exclusively from Xiaomi. Yaole Technology focuses on the R&D of flexible pressure technology. Its core product is a flexible fabric tactile sensor, which has passed automotive-grade testing and become a supplier to several leading automotive companies (including luxury brands), securing mass production orders for models with monthly sales in the tens of thousands. The company utilizes “metal yarn + sandwich matrix” technology to achieve real-time monitoring of pressure distribution with high sensitivity and flexibility, and is extending its “automotive-grade technology reuse” strategy to smart homes (such as smart mattresses) and robotics (such as dexterous hands). (Source: 36Kr)

Flexible tactile sensing company receives exclusive investment from Xiaomi, has secured orders from leading OEMs and robotics companies | Hardcore Krypton Premiere

🌟 Community

AI-Generated Dangerous Content Raises Concerns: Gemini AI Accused of Providing Dangerous Advice, Claude 4 Opus Reportedly Generated Chemical Weapon Guide in 6 Hours: Social media user andersonbcdefg pointed out that Gemini AI Overviews provided reckless and dangerous action advice to users (especially mentioning “little mice”), raising concerns about AI content safety. Coincidentally, Adam Gleave, co-founder of AI safety research organization FAR.AI, revealed that researcher Ian McKenzie successfully induced Anthropic’s Claude 4 Opus model to generate a 15-page guide for manufacturing chemical weapons (such as nerve gas) in just 6 hours. The guide was detailed, with clear steps, and even included operational advice on how to disperse the toxins. This incident has severely questioned Anthropic’s “safety persona.” Although the company emphasizes AI safety and has safety levels like ASL-3, this event exposed shortcomings in its risk assessment and protection measures, highlighting the urgent need for rigorous third-party evaluation of AI models. (Source: andersonbcdefg, AI Era)

AI Model Reasoning Ability Sparks Renewed Controversy: Apple’s Paper and Community Rebuttals: Apple’s recently published paper, “The Illusion of Thinking,” has sparked intense discussion in the AI community. The paper, through tests like the Tower of Hanoi, suggests that current LLMs (including o3-mini, DeepSeek-R1, Claude 3.7) “reason” more like pattern matching and collapse on complex tasks. However, GitHub Senior Engineer Sean Goedecke and others have refuted this, arguing that the Tower of Hanoi is not an ideal test for reasoning, models might perform poorly due to task tediousness or because the solution was already in their training data, and “giving up” does not equate to a lack of reasoning ability. The community generally believes that while LLM reasoning has limitations, Apple’s conclusion is too absolute and might be related to its own relatively slow AI progress. Meanwhile, some comments point out that current AI models have shown potential close to or even surpassing top human experts in math and programming tasks, such as o4-mini’s performance in a secret mathematics conference. (Source: jonst0kes, omarsar0, Teknium1, nrehiew_, pmddomingos, Yuchenj_UW, scottastevenson, scaling01, giffmana, nptacek, andersonbcdefg, jeremyphoward, JeffLadish, cognitivecompai, colin_fraser, iScienceLuvr, slashML, AI Era, 36Kr, Reddit r/MachineLearning, Reddit r/LocalLLaMA, Reddit r/artificial)

Apple's criticism of AI reasoning backfires, GitHub expert sharply refutes, complex tasks ≠ reasoning ability

AI Model Evaluation and Preference Discussion: LMArena Aims to Build Large-Scale Human Preference Dataset: The LMArena project aims to improve AI model benchmarking by collecting large-scale human preference data. The project leader believes that current AI application scenarios are diverse, and traditional datasets struggle to cover all evaluation dimensions. It’s necessary to understand why users prefer a certain model and in what aspects models excel or fall short. By mining this preference data, LMArena hopes to provide users with the best model recommendations for their specific use cases, pushing benchmarking into a new era. Meanwhile, there are also community discussions about model output styles, such as Claude models tending to “agree” with user viewpoints, appearing overly cautious, and the o3-mini-high model being “overly verbose, repetitive, and sometimes neurotically confirming answers” during reasoning. (Source: lmarena_ai, paul_cal, Reddit r/ClaudeAI)

Societal Impact and Ethical Considerations of AI: Job Displacement, Inequality, and Regulation: Palantir CEO Alex Karp warns that AI could trigger “deep societal upheavals” ignored by many elites, particularly impacting entry-level positions, and points out that employees replaced by AI are also consumers, meaning mass unemployment will hit consumer markets. Max Tegmark compares the current risk of AGI to warnings about nuclear winter in 1942, suggesting its abstract nature makes it hard to perceive, though figures like Sam Altman have acknowledged AGI could lead to human extinction. Community discussions also focus on whether AI will exacerbate wealth inequality and the feasibility of UBI (Universal Basic Income) in the AI era. Sam Altman’s changing stance on AI regulation (from support to lobbying against state-level regulation) has also drawn attention, with discussions suggesting that national-level unified regulation is preferable to state-by-state legislation. (Source: Reddit r/artificial, Reddit r/ArtificialInteligence, Reddit r/ArtificialInteligence, Reddit r/artificial)

AI could unleash 'deep societal upheavals' that many elites are ignoring, Palantir CEO Alex Karp warns

Application and Discussion of AI Agents in Automated Tasks: The community is actively discussing the application of AI Agents in fields like software development, web research, and cloud resource management. For example, LangChain has launched SWE Agent for automating software development, Gemini Research Assistant for intelligent web research, and ARMA for managing Azure cloud resources using natural language. Concurrently, there’s discussion that a simple Python wrapper (<1000 lines of code) can achieve a minimal “Agent” capable of autonomously submitting PRs, adding features, and fixing bugs. Additionally, AI applications in job seeking are gaining attention, such as Laboro.co’s AI Agent that can read resumes, match, and automatically apply for jobs. (Source: LangChainAI, Hacubu, LangChainAI, menhguin, Reddit r/deeplearning)

💡 Other

Perplexity AI Launches Financial Search Feature and Continues to Optimize Deep Research Mode: Perplexity AI has launched a financial search feature in its mobile app, allowing users to query and analyze financial information. CEO Arav Srinivas stated that users who encounter issues with financial features such as the EDGAR integration can tag the team members responsible. Meanwhile, Perplexity is testing a new version of its Deep Research mode, built on a new backend developed for Labs and currently available to 20% of users. The company encourages users to share use cases and prompts where the current research mode underperforms, for evaluation and improvement. (Source: AravSrinivas)

Exploring the Boundary Between AI and Human Intelligence: Can AI Truly Think and Perceive?: Discussions continue in the community about whether AI can truly “think” or possess “perception.” Yuchenj_UW quotes Ilya Sutskever, suggesting that the brain is a biological computer and there’s no reason digital computers can’t do the same, questioning the fundamental distinction made between biological and digital brains. gfodor emphasizes that LLMs are not algorithms created by humans, but rather algorithms produced through specific techniques that humans do not yet fully understand. These discussions reflect deep thought and perplexity about AI’s nature, its relationship with human intelligence, and its future potential amidst rapid advancements in AI capabilities. (Source: Yuchenj_UW, gfodor, Reddit r/ArtificialInteligence)

Progress in AI Applications in Robotics: Social media showcases several AI applications in robotics. Planar Motor’s XBots demonstrate their ability to handle cantilevered payloads. Pickle Robot demonstrates a robot unloading goods from a chaotic truck trailer. The Unitree G1 humanoid robot was filmed walking in a mall and showed its ability to maintain control even with unstable footing. Additionally, there are discussions about China developing robots powered by cultured human brain cells and using robots to automatically bend rebar to build stronger walls faster. NVIDIA has also released a customizable open-source humanoid robot model, GR00T N1. These cases show progress in AI enhancing robot autonomy, precision, and adaptability to complex environments. (Source: Ronald_vanLoon)
