AI News - 2025-05-14 (Morning Edition)

Keywords: Healthcare AI, Language Models, Reinforcement Learning, AI Reasoning, AI Benchmarking, AI Tools, AI Business, AI Ethics, OpenAI HealthBench, Meta Physics of Language Models, FlashInfer Inference Engine, Matrix-Game Virtual World Generation, INTELLECT-2 Distributed Training

🔥 Spotlight

OpenAI HealthBench benchmark released, AI medical capabilities significantly improved: OpenAI released HealthBench, a medical AI evaluation benchmark built in collaboration with 262 doctors worldwide. Tests show that in medical dialogue scenarios the latest models (such as o3 and GPT-4.1) score on par with the best results doctors achieve with AI assistance, and roughly four times higher than doctors working without AI. Smaller models also improved. The results point to AI’s large potential in healthcare, and the benchmark is intended to promote the safe and effective use of AI in clinical practice. (Source: Reddit r/ArtificialInteligence, BorisMPower, clefourrier)

Meta Physics of Language Models Part 4 released: Meta AI Research released the fourth part of the “Physics of Language Models” research series. Through controlled synthetic pre-training environments, they discovered a lightweight component called “Canon layers” which, by adding “horizontal residual connections” between tokens, can significantly improve the reasoning and generalization capabilities of various architectural models like Transformer, Mamba, and GLA. (Source: AIatMeta, arohan)
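
To make the idea of "horizontal residual connections" concrete, here is a minimal sketch in plain Python. It assumes that each token simply adds a weighted sum of the hidden states of the few tokens before it to its own state. The function name `canon_layer`, the fixed scalar weight per offset, and the toy vectors are all illustrative assumptions; the item above does not describe how Meta actually parameterizes the layer.

```python
from typing import List

Vector = List[float]


def canon_layer(hidden: List[Vector], weights: List[float]) -> List[Vector]:
    """Mix each token's state with the states of the tokens just before it.

    hidden[t] is the hidden vector of token t.
    weights[k - 1] scales the contribution of token t - k (assumed fixed scalars).
    The mixing is causal: a token only reads from earlier positions.
    """
    out: List[Vector] = []
    for t, h_t in enumerate(hidden):
        mixed = list(h_t)  # the token keeps its own state (residual path)
        for k, w in enumerate(weights, start=1):
            if t - k >= 0:
                prev = hidden[t - k]
                for d in range(len(mixed)):
                    mixed[d] += w * prev[d]
        out.append(mixed)
    return out


# Three toy tokens with 2-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed_states = canon_layer(states, weights=[0.5, 0.25])
```

The first token is unchanged because it has no predecessors, while later tokens pick up information from their neighbors without going through attention.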

FlashInfer wins MLSys 2025 Best Paper and receives NVIDIA support: FlashInfer, an efficient and customizable attention engine for LLM inference serving, won the MLSys 2025 Best Paper award. NVIDIA announced support for the project and will integrate its top LLM inference kernels, such as those from TensorRT-LLM, into FlashInfer for use by vLLM, SGLang, and others, aiming to improve LLM inference efficiency and scalability. (Source: vllm_project, _philschmid)

Kunlun Wanwei releases Matrix-Game interactive world generation engine: Kunlun Wanwei launched Matrix-Game, an interactive engine capable of generating and controlling virtual worlds via text commands. It supports generating various scenes like deserts and forests, and enables smooth action control (forward, jump, attack) and 360° perspective switching. This technology is expected to accelerate game development, embodied AI training, and metaverse content production. (Source: WeChat)

Prime Intellect releases INTELLECT-2 distributed RL training model: Prime Intellect released INTELLECT-2, claiming it is the first model trained using distributed reinforcement learning by integrating global idle computing resources, with performance comparable to DeepSeek-R1. The project aims to reduce RL training costs, break dependence on centralized computing power, and has received investment from notable figures like Karpathy and Tri Dao. Its core components (PRIME-RL, SHARDCAST, TOPLOC, Protocol Testnet) are open-sourced. (Source: 36氪)

Reinforcement learning pioneers Andrew Barto and Richard Sutton awarded Turing Award: Andrew Barto and Richard Sutton were awarded the Turing Award for their foundational contributions to reinforcement learning, including temporal difference learning. Their work has had a profound impact on AI and is reflected in projects like AlphaGo. The duo plans to use part of the prize money to support young scientists’ research freedom and establish graduate scholarships. (Source: WeChat)
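
For readers unfamiliar with temporal difference learning, the following is a minimal sketch of the standard tabular TD(0) update, V(s) ← V(s) + α(r + γV(s') − V(s)). The two-step toy episode, the state names, and the step size and discount values are assumptions chosen only for illustration.

```python
from typing import Dict


def td0_update(values: Dict[str, float], state: str, reward: float,
               next_state: str, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one TD(0) update to values[state] and return the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Toy chain: A -> B -> end, with a reward of 1.0 only on the last step.
values = {"A": 0.0, "B": 0.0, "end": 0.0}
episode = [("A", 0.0, "B"), ("B", 1.0, "end")]

for _ in range(100):  # replay the same episode many times
    for state, reward, next_state in episode:
        td0_update(values, state, reward, next_state)
```

After repeated replays, the value of B approaches the reward of 1.0 and the value of A approaches the discounted value 0.9, which shows how the estimate for an earlier state is bootstrapped from the estimate for the next one.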

New Pope’s name chosen partly in response to AI revolution, AI Czar predicts million-fold AI growth in four years: Newly elected Pope Leo XIV stated that his choice of name was partly a response to the challenges that the “new industrial revolution” brought by AI poses to human dignity, justice, and labor, indicating the Church’s concern for AI ethics. David Sacks, the first US “AI and Cryptocurrency Czar,” predicted that due to exponential advancements in models, chips, and computing power, AI capabilities will increase a million-fold in four years, emphasizing the importance of understanding exponential growth and its disruptive impact. (Source: WeChat)
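
To put the million-fold figure in perspective, the short calculation below works out the compounding it implies. The equal split across the three drivers named in the item (models, chips, computing power) is an assumption made here purely for illustration, not a breakdown given in the source.

```python
total_growth = 1_000_000  # the predicted overall increase
years = 4
drivers = 3  # models, chips, computing power (equal split is an assumption)

# Growth factor needed every year if improvement compounds evenly.
annual_factor = total_growth ** (1 / years)

# Growth each driver would need over the full four years if they multiply.
per_driver_factor = total_growth ** (1 / drivers)

print(f"Implied growth per year: about {annual_factor:.1f}x")
print(f"Implied growth per driver over {years} years: about {per_driver_factor:.0f}x")
```

In other words, the prediction amounts to capabilities multiplying by roughly 32 every year, or each of three multiplying drivers improving about a hundredfold over the period.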

🎯 Trends

Alibaba Qwen3 Technical Report reveals training details: Alibaba Cloud released the Qwen3 technical report, detailing its training process on 36 trillion tokens, including large-scale data investment for smaller models and multi-stage post-training (e.g., CoT, RL). The model performed well on benchmarks like MathArena, but community discussions also pointed out bugs in its chat template and performance on non-reasoning tasks being inferior to Mistral Medium 3. (Source: cognitivecompai, rishdotblog, Dorialexander, teortaxesTex, qtnx_, nrehiew_, Reddit r/LocalLLaMA)

US Congress considers pausing state-level AI regulation for a decade: A draft text from the US House Commerce Committee includes a proposal suggesting a ten-year pause on state-level AI regulation to prevent complex state laws from hindering AI innovation. This move has received support from some state officials who believe AI regulation should occur at the federal level. (Source: ylecun, pmddomingos, jd_pressman, Reddit r/artificial)

Coding assistants evolving towards “always-on” agents: Coding assistants are shifting from being pair programmers requiring significant prompting and human assistance to “always-on” agents that continuously search for bugs and vulnerabilities in the background. (Source: steph_palazzolo)

New concepts emerging in the AI field: Several new concepts have appeared in AI research, including SakanaAI’s “Continuous Thought Machines” (emphasizing the time element), Salesforce’s “Elastic Reasoning” (separating thinking and solving stages), Alibaba’s “ZeroSearch” (using LLMs as simulated search engines), and Tsinghua University’s “Absolute Zero” (learning entirely through self-play). (Source: TheTuringPost)

Kuaishou Kling 2.0 video model tops charts: Kuaishou’s Kling 2.0 surpassed Veo 2 and Runway Gen 4 on Artificial Analysis’s video generation leaderboard, becoming the leading image-to-video model. Community users have acknowledged its performance. (Source: scaling01)

OpenAI GPT-4.1 leads Claude 3.5 Sonnet in user preference tests: User preference tests show that OpenAI’s GPT-4.1 (and even 4.1-mini) leads Claude 3.5 Sonnet in user experience. (Source: imjaredz)

AMD and NVIDIA intensify competition in AI software development: Activity on GitHub shows that the number of Pull Requests submitted by AMD’s ROCm PyTorch team is catching up to NVIDIA’s PyTorch technical lead, indicating increasing competition in the underlying AI hardware and software development space. (Source: zacharynado)

Anthropic’s new model “claude-neptune” undergoing safety testing: Reports indicate that Anthropic is conducting safety tests on its new model “claude-neptune,” suggesting a potential upcoming release. (Source: scaling01)

Gemini 2.5 Pro free API access paused due to high demand: Due to immense demand, Google has temporarily paused free tier API access to Gemini 2.5 Pro to ensure developers can continue scaling their applications. The model remains free to use in Google AI Studio. (Source: matvelloso)

Firefox exploring llama.cpp integration in WASM: Firefox is experimenting on GitHub with integrating the llama.cpp library into WebAssembly (WASM), which could mean users might be able to run local LLMs directly in the browser in the future. (Source: ClementDelangue, ggerganov)

AMD Ryzen AI Max+ PRO 395 LLM benchmark tests: LLM benchmark tests on the AMD Ryzen AI Max+ PRO 395 on Linux show its performance appears to be below the RTX 4060 Ti. Community discussion points out that the test might only reflect CPU performance and discusses its iGPU performance, VRAM advantages, and current compatibility issues with Intel GPUs regarding FP8, Flash Attention, and memory allocation. (Source: Reddit r/LocalLLaMA)

🧰 Tools

Minions Secure Chat open-source protocol released for encrypted cloud LLM chat: An open-source protocol called “Minions Secure Chat” has been released, aiming to enable end-to-end encrypted cloud LLM chat with minimal latency overhead (<1%), even for models with 30B+ parameters. The protocol ensures cloud service providers cannot view message content, with inference occurring in secure GPU enclaves to guarantee confidentiality. (Source: realDanFu, ollama, rebeccatqian, code_star)

DSPy enables recursive summarization of arbitrarily long texts: A program built using DSPy was demonstrated, capable of recursively summarizing texts of arbitrary length. The program achieves this by building a directory, chunking content, and processing parts in parallel, providing a general solution for handling long documents. (Source: lateinteraction)
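
The chunk, summarize in parallel, and recurse pattern can be sketched without DSPy. The code below is not the demonstrated program: `recursive_summarize`, `chunk_text`, and the stand-in summarizer `first_characters` are hypothetical names, character-based chunking is an assumption, and the directory-building step is left out. In practice the `summarize` callable would wrap an LLM call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def recursive_summarize(text: str, summarize: Callable[[str], str],
                        chunk_size: int = 2000, max_len: int = 2000) -> str:
    """Summarize text of any length by summarizing chunks, then the summaries."""
    if len(text) <= max_len:
        return summarize(text)  # small enough to handle in one call

    chunks = chunk_text(text, chunk_size)
    with ThreadPoolExecutor() as pool:  # process the chunks in parallel
        partials = list(pool.map(summarize, chunks))

    combined = "\n".join(partials)
    if len(combined) >= len(text):
        raise ValueError("summarize() must shorten its input for the recursion to terminate")
    return recursive_summarize(combined, summarize, chunk_size, max_len)


def first_characters(text: str) -> str:
    """Hypothetical stand-in for an LLM summarizer: keep the first 50 characters."""
    return text[:50]


long_document = "word " * 5000  # 25,000 characters
summary = recursive_summarize(long_document, first_characters)
```

The length guard matters: if the summarizer does not actually shrink its input, the recursion would never reach the base case.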

Runway AI video generation adds cinematic controls and reference features: Runway introduced new features in its Gen-4 video generation model, including over 20 cinematic camera controls, multi-element referencing and blending, and smoother handling of complex motion. Enhanced reference capabilities also improve the precision of object placement. (Source: c_valenzuelab, TomLikesRobots)

OpenMemory MCP launched, providing local private memory for AI agents: OpenMemory MCP has been released, a private, local, persistent memory layer designed for MCP-compatible AI clients (such as Cursor, Claude Desktop). It allows different AI tools to securely and privately read and write shared memory, running entirely on the user’s machine without relying on cloud services. (Source: omarsar0)

HeyGen launches Voice Mirroring feature: HeyGen released its Voice Mirroring feature, allowing users to replicate specific voice styles or characteristics in AI-generated audio. (Source: Ronald_vanLoon)

Step1X-3D open-source framework released for controllable 3D asset generation: StepFun AI released Step1X-3D on Hugging Face, an open-source framework for generating high-fidelity, controllable 3D assets with textures. (Source: huggingface, _akhaliq, reach_vb)

Hugging Face Whisper transcription speed improved: Hugging Face launched a Whisper transcription endpoint based on vLLM and optimized for NVIDIA GPUs, achieving up to 8x speed improvement and offering better performance at lower cost. (Source: ClementDelangue, huggingface, vllm_project)

LlamaIndex Memory API updated, supports long-term and short-term memory fusion: LlamaIndex updated its Memory API to be more flexible, fusing short-term chat history and long-term memory through pluggable modules (Static, Fact Extraction, Vector Memory). (Source: jerryjliu0)

NVIDIA releases CUTLASS 4.0, supporting native Python GPU programming: NVIDIA released CUTLASS 4.0, a library supporting native Python GPU programming. This update aims to accelerate kernel development and explore new ideas in ML and GPU programming. (Source: marksaroufim, tri_dao)

WeClone open-source project creates digital clones from chat history: WeClone, a popular open-source project on GitHub, provides a solution for creating digital clones from WeChat chat records. It fine-tunes large language models to capture personal conversation styles and binds them to chat bots on platforms like WeChat, QQ, and Telegram, also including privacy filtering features. (Source: GitHub Trending)

Google Maps Scraper open-source tool for scraping map data: A popular open-source tool on GitHub scrapes Google Maps listing data. It offers command-line, Web UI, and REST API interfaces, can extract business names, addresses, contact info, ratings, reviews, and more, and supports email extraction and a “fast mode.” (Source: GitHub Trending)

OpenWebUI users report multiple technical issues: OpenWebUI users have reported several technical issues, including Modelfile parameters (like num_ctx) being ignored causing crashes, inability to access the UI on the local network after updates, inability to use the built-in OpenAI web search with specific models, and timeout issues with old chat sessions. (Source: Reddit r/OpenWebUI)

Railway surface inspection robot: A multi-functional robot named RailScan was mentioned, used for railway surface inspection work, an example of AI and robotics in industrial applications. (Source: Ronald_vanLoon)

3D printing construction robots: 3D printing is being combined with robotics to print buildings on site, an example of progress in automated construction with robotics and AI. (Source: Ronald_vanLoon)

Embodied AI robots: Autonomous, AI-driven robots capable of seamlessly navigating complex environments and executing tasks with precision were mentioned, showcasing the potential of embodied AI and robotics in real-world applications. (Source: Ronald_vanLoon)

Bio-inspired robots: Research was mentioned about mushrooms being given robot bodies and learning to climb, demonstrating how biological inspiration can drive advancements in robotics. (Source: Ronald_vanLoon)

📚 Learning

Collection of AI learning resources: The community shared various AI learning resources, including positive feedback on @dair_ai resources, online masterclasses and book workshops on AI evaluation, video guides on LLM inference, explanations of the difference between Agentic AI and regular AI, a free RLHF book, a data analysis course module on data processing and using GenAI for debugging, an event on AI code intelligence, and an infographic explaining how LLMs work. (Source: dair_ai, HamelHusain, omarsar0, bobvanluijt, natolambert, DeepLearningAI, l2k, Ronald_vanLoon, Reddit r/deeplearning, Reddit r/artificial)

LangChain Interrupt event and workshops: LangChain hosted the Interrupt event, including workshops on building reliable AI agents. Content covered designing agent workflows using LangGraph, human-in-the-loop collaboration, and leveraging LangSmith for observability and evaluation. Cisco showcased their text-to-SQL agent built using LangGraph and LangSmith. (Source: LangChainAI, hwchase17)

RL and Video Games Workshop announcement: The RLC 2025 conference will host a Reinforcement Learning and Video Games Workshop, calling for papers on game-related RL topics such as complex environments, multi-agent scenarios, and content generation, and announced confirmed speakers. (Source: Reddit r/MachineLearning)

mlabonne/llm-course GitHub repository provides comprehensive LLM learning roadmap: A popular GitHub repository, mlabonne/llm-course, offers a comprehensive LLM learning course and roadmap, covering fundamentals, LLM science (fine-tuning, quantization, evaluation), and LLM engineering (running, RAG, deployment, security), and includes related code notes and references. (Source: GitHub Trending)

Qwen3 Base GRPO advanced notebook released: A new advanced GRPO (Group Relative Policy Optimization) notebook has been released, specifically for the Qwen3 Base model. Content covers how to fine-tune the model to enhance reasoning capabilities, proximity scoring, GRPO templates, the OpenR1 dataset, and optimizing the RL process through pre-fine-tuning. (Source: danielhanchen)
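
As background on what makes GRPO (Group Relative Policy Optimization) "group relative": several completions are sampled for the same prompt, and each one is scored against the others in its group rather than against a learned value function. The sketch below shows only that advantage computation. It is not taken from the notebook; the function name, the use of the population standard deviation, and the example rewards are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four sampled completions of one prompt (1.0 = correct answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every completion gets the same reward, there is no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Completions that beat their group average receive a positive advantage and are reinforced, while those below average are discouraged.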

TRL library integrates GRPO stabilization trick: A new GRPO stabilization trick developed by Prime Intellect has been integrated into the popular Transformer Reinforcement Learning (TRL) library. It can be used by installing the latest version and aims to improve the stability of GRPO training. (Source: ClementDelangue)

💼 Business

Perplexity AI nearing completion of $500M funding round, valued at $14B: AI search startup Perplexity AI is reportedly close to completing a $500 million funding round led by Accel, which would value the company at $14 billion. This shows strong capital support for Perplexity despite competition from Google and OpenAI. (Source: TheRundownAI, Reddit r/ClaudeAI, 36氪)

NVIDIA partners with Saudi Arabia to build AI factory: NVIDIA announced a partnership with HUMAIN, the AI subsidiary of Saudi Arabia’s Public Investment Fund, planning to build an “AI factory” in Saudi Arabia. NVIDIA will provide infrastructure and expertise to help Saudi Arabia become a global leader in AI. (Source: nvidia)

WizardLM team leaves Microsoft to join Tencent Hunyuan: The WizardLM team, including its head Can Xu, has left Microsoft and joined Tencent Hunyuan. Tencent’s Hunyuan-TurboS model previously ranked high (8th) on leaderboards, and the move has sparked discussion about talent competition among major AI labs. (Source: andrew_n_carr, cognitivecompai, teortaxesTex, Sentdex, WizardLM_AI, madiator)

Johnson & Johnson widely applies Generative AI in pharmaceutical business: After conducting around 900 internal experiments, Johnson & Johnson has expanded the application of its Generative AI across multiple stages of its pharmaceutical business, including accelerating drug discovery, predicting supply chain risks, streamlining clinical trials, and supporting sales and employee services. (Source: DeepLearningAI)

Somite AI raises funding to build foundation model for human cells: Somite AI is building a foundation model for human cells called “DeltaStem” and developing technology to generate cell signaling data faster. The company has raised $5.9 million in funding. (Source: saranormous, finbarrtimbers)

🌟 Community

Users express dissatisfaction with declining AI model quality and sycophancy: Many users expressed frustration over the declining quality of current AI models, particularly ChatGPT, which is accused of becoming “sycophantic” (overly positive or flattering), lazy, and more prone to hallucinations. Some users are considering canceling subscriptions as a result, while others discuss whether custom instructions are effective or whether the dissatisfaction on social media is exaggerated. (Source: Reddit r/ArtificialInteligence, Reddit r/ChatGPT)

AI Ethics and Responsibility Discussion: Who is responsible for AI decision errors?: The community widely discussed who should bear responsibility when AI makes errors due to autonomous decisions. Perspectives include the company owning the AI being responsible (similar to parents for children or drivers for autonomous vehicles), the possibility of AI itself being responsible in the future, the need for human oversight, and companies profiting from AI being responsible. (Source: Reddit r/ArtificialInteligence)

Impact of AI on Education and Employment: Teacher use of AI for grading sparks controversy: Teachers using AI to grade student assignments sparked controversy, with some worrying that it devalues students’ work or signals that teachers may become obsolete. Counterarguments suggest AI is merely a tool that can provide timely feedback and that exams serve multiple purposes. The community also discussed the broader impact of AI on employment and the specific work tasks users wish AI would fully take over. (Source: Reddit r/ArtificialInteligence)

LLM reliability concerns: Poor performance handling specific data sources: Users expressed disappointment with LLMs’ performance when processing specific, fragmented data sources (like legal documents), where the output sounds authoritative but is factually inaccurate or vague. While LLMs perform well in general summarization or coding, their reliability for tasks requiring precise single-pass data processing is questioned. (Source: Reddit r/artificial)

AI Hardware Geopolitics: US Senator proposes requiring geotracking in high-end GPUs: A proposal from a US senator would require built-in geotracking in high-end GPUs (like the RTX 4090) to prevent their use by foreign governments. This sparked community concerns about government overreach, potential remote disablement features, and hardware DRM. (Source: Reddit r/LocalLLaMA)

Young people using ChatGPT to assist with life decisions: Sam Altman noted that younger generations are increasingly using ChatGPT to assist in making life decisions. Some view this positively (seeking advice when human resources are insufficient), but others worry about reliance on potentially unreliable LLMs for critical choices. (Source: Reddit r/ChatGPT)

AI industry perception and strategy discussion: Community discussions covered perspectives on why Meta is perceived as lagging behind other major AI labs, the trade-offs between fine-tuning small models and prompt engineering, AI company secrecy, and the view of “search” as a core moat for AI agents. (Source: Reddit r/MachineLearning, cto_junior, madiator, Dorialexander)

💡 Other

China releases fourth-generation quantum control system: China released its fourth-generation quantum control system supporting over 500 qubits, representing the latest advancement in quantum computing technology. (Source: Ronald_vanLoon)

AI application in defense: China uses DeepSeek to develop stealth fighters: Reports indicate that China is using DeepSeek AI technology to assist in the development of its sixth-generation stealth fighters (J-35, J-50). (Source: Ronald_vanLoon)

METACOG-25 project introduction video released: The METACOG-25 project released an introduction video, hinting at new developments in AI research or development. (Source: Reddit r/deeplearning)

Hugging Face platform updates: Collections within collections and official PyTorch account: The Hugging Face Hub introduced “Collections within Collections,” allowing for more granular organization of resources. Additionally, PyTorch now has an official account on the platform. (Source: ClementDelangue, Reddit r/LocalLLaMA)
