AI News - 2025-08-14 (Morning Edition)

Keywords: AI in the Legal System, GPT-5, Kunlun Wanwei Matrix-3D, AI Cancer Treatment, Multimodal Large Models, AI Video Generation, Embodied AI, AI Hallucination, 3D World Generation from a Single Image, Live Cell AI Models, GLM-4.5V Visual Reasoning, 360° Panoramic Video Generation

🔥 Spotlight

AI Application in Legal Systems and the Controversy Over GPT-5’s Health Advice : The U.S. legal system is exploring AI applications, such as accelerating legal research, summarizing cases, and drafting routine orders, to alleviate case backlogs. However, AI hallucinations have already led lawyers to submit fabricated cases and have introduced errors into expert testimony. Meanwhile, OpenAI’s GPT-5 model, despite not meeting performance expectations, has begun explicitly advising users to use it for health consultations. This has sparked safety and ethical controversies over AI’s use in sensitive areas and suggests that AI companies are venturing into riskier service domains. (Source: MIT Technology Review)

Kunlun Wanwei Matrix-3D: Single Image Generates Roamable 3D Worlds, Setting a New Industry Benchmark : Kunlun Wanwei has released Matrix-3D, a unified framework integrating panoramic video generation and 3D reconstruction. The model can generate 360° panoramic videos from a single image and directly reconstruct freely roamable 3D spaces, achieving SOTA results in panoramic video generation tasks. Its core advantages include global scene consistency, large-scale generation, high controllability, strong generalization ability, and fast generation speed. Its technical contributions include using panoramic data as an intermediate representation, mesh rendering to enhance geometric and color consistency, a feed-forward network that optimizes 3DGS to accelerate 3D generation, and a high-quality synthetic dataset, Matrix-Pano. This marks a significant advancement for Chinese AI in the field of “spatial intelligence”. (Source: 量子位)

AI Empowers Cancer Treatment: Tahoe Therapeutics Raises $30 Million to Build Live Cell AI Models : Startup Tahoe Therapeutics has secured $30 million in funding to build AI models of live cells, aiming to discover new methods for curing cancer. The company has developed scalable data generation methods and open-sourced the Tahoe-100M dataset, containing 100 million data points on cancer cell-molecule interactions. Its AI model has successfully developed a drug candidate for a major cancer subtype, which has entered pre-human trial research. Tahoe’s Mosaic platform efficiently integrates cell data from multiple sources, accelerating data production, with the goal of building a dataset containing over 1 billion single-cell data points, thereby boosting the efficiency of oncology research. (Source: 量子位)

🎯 Trends

OpenAI GPT-5 and Grok Model Updates and Performance Controversy : OpenAI’s GPT-5 model has recently received several updates, including new “Auto,” “Fast,” and “Thinking” modes for users to balance speed and reasoning depth, alongside improvements in API latency and caching efficiency. However, user opinions on GPT-5’s actual performance are divided, with some praising its capabilities in complex tasks and coding, while others complain about performance degradation and even question OpenAI’s pricing strategy and model differences across user tiers. Additionally, Grok has launched an automatic translation feature for the X platform, with some users claiming it is setting industry standards. (Source: Yuhu_ai_, sama, gdb, aidan_mclau, scaling01, scaling01)

Multimodal Large Models GLM-4.5V and LFM2-VL Released : Zhipu AI has released GLM-4.5V, hailed as the “best open-source visual reasoning model in the global 100B class” (106B total parameters, 12B active parameters), demonstrating outstanding performance across 41 benchmarks, especially achieving significant breakthroughs in visual reasoning. LiquidAI has also launched LFM2-VL, an efficient vision-language model available in 440M and 1.6B versions. It achieves native resolution processing via the SigLIP2 NaFlex encoder, boosting speed by up to 2x on GPUs while maintaining competitiveness. (Source: code_star, mervenoyann, clefourrier, Reddit r/ArtificialInteligence)

Video Generation AI Model Progress: Hailuo 2 Pro and Wan2.2 : MiniMax’s Hailuo 2 Pro has been rated by the community as the best audio-free video model, particularly excelling in image-to-video generation. Concurrently, Alibaba’s Wan2.2 model demonstrates the ability to generate realistic 360° rotating videos from a single image. Its strong instruction following and physical understanding enable complex visual generation with simple commands, earning it praise from users as a “terrifying child” and “perfect” video generation tool, further pushing the technological boundaries in video generation. (Source: Alibaba_Wan, lmarena_ai, Alibaba_Wan, lmarena_ai)

Embodied AI and Humanoid Robot Technology Breakthroughs : Progress continues in the robotics field, including a rope-climbing robot developed by the University of Illinois, China’s Robot Era company releasing the 5-foot-7-inch humanoid robot L7, 1x_tech launching the home humanoid robot NEO Beta, and Booster Robotics’ kung fu robot Booster T1. Furthermore, humanoid robots have for the first time achieved clothes folding solely through neural networks and new data, without architectural modifications, signaling an improvement in robot learning and generalization capabilities. These advancements collectively drive the potential for embodied AI applications in real-world tasks. (Source: Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon, adcock_brett)

AI Application Expansion in Finance : Perplexity Finance has expanded into the Indian market, offering comprehensive analysis of the Indian market and latest news, real-time stock prices for BSE and NSE, bull/bear market analysis for key issues, explanations of price fluctuations, and historical data downloads, with plans to introduce natural language stock screening and price alerts. Additionally, the qqWen project has open-sourced a full-stack fine-tuning model series (1.5B to 32B) for the niche financial programming language Q, outperforming GPT-4.1 and Claude Opus-4 in Q benchmarks, demonstrating AI’s strong potential in vertical financial sectors. (Source: AravSrinivas, AravSrinivas, Dorialexander, HuggingFace Daily Papers)

AI Model Progress in Gaming and Simulation Environments : DeepMind’s Genie 3 showcased a real-time interactive world model, but it is not open-source. Skywork’s Matrix-Game 2.0, described as the first open-source, real-time, long-sequence interactive world model, supports minutes of interaction at 25 FPS. Furthermore, the TextQuests benchmark indicates that AI currently cannot complete long video games without clues, but its capabilities are rapidly improving. These advancements suggest that AI’s understanding and interaction capabilities in complex simulation and gaming environments are progressively strengthening. (Source: QuixiAI, tokenbender, lmthang)

Significant ChatGPT User Growth, Perplexity Aims to Acquire Chrome : As of July 2025, ChatGPT’s monthly active users have grown by 134.90% year-over-year, making it one of the fastest-growing websites globally and ranking fifth in total visits. Concurrently, AI startup Perplexity has made a staggering $34.5 billion offer to acquire Google’s Chrome browser, a move that highlights AI companies’ growing ambition and competitive stance in internet entry points and data traffic. (Source: BorisMPower, Reddit r/ArtificialInteligence)

🧰 Tools

DocStrange: Structured Data Extraction Tool for Images/PDFs/Documents : DocStrange is an open-source library that has launched a free web application. It supports structured data extraction from PDFs, images, and documents, with output to Markdown, CSV, JSON, or specific field formats. The tool is particularly suited to scenarios that require clear, actionable information from unstructured documents, such as court case analysis. Users can upload large volumes of files for processing and download the extracted data. (Source: Reddit r/LocalLLaMA)

Runway Aleph: Precise Video Content Replacement and Reconstruction : Runway Aleph is an advanced video editing tool that supports precise replacement, re-texturing, or complete reimagining of specific parts of a video. Users can quickly conceptualize and iterate new ideas through text prompts, applying them to existing footage. This feature significantly simplifies the video post-production process, enhances creative efficiency, and makes video content creation more flexible and controllable. (Source: c_valenzuelab)

WebWatcher: Multimodal Deep Research AI Agent : WebWatcher is a groundbreaking multimodal deep research agent designed to address the issue of existing research primarily focusing on textual information while neglecting visual information. It leverages high-quality synthetic multimodal trajectories for efficient cold-start training and employs various tools for deep reasoning, further enhancing generalization capabilities through reinforcement learning. WebWatcher significantly outperforms proprietary baselines and open-source agents across four challenging VQA benchmarks, paving the way for solving complex cross-modal information retrieval tasks. (Source: HuggingFace Daily Papers, _akhaliq)

AI Avatar: Full-Body Motion and Emotion Matching : SynthesiaIO has launched a new AI Avatar feature, enabling AI characters to match full-body movements with script content and tone. These AI Avatars can understand text and simultaneously generate natural body language and gestures, creating more expressive and engaging video content. This advancement makes AI-generated videos more realistic and captivating, promising new applications in content creation, education, and marketing. (Source: synthesiaIO)

Qwen Chat Deep Research: Supports Image and File Input : Alibaba Cloud’s Qwen Chat Deep Research now supports image and file input, significantly expanding its deep research capabilities. Users can upload images and documents for the model to analyze and extract information; for instance, one user successfully used this feature to fix an air conditioner malfunction. This update enhances the model’s utility in processing multimodal information, enabling it to better assist users in solving real-world problems. (Source: Alibaba_Qwen)

📚 Learning

IJCAI-25 International Joint Conference on Artificial Intelligence Preview : The 2025 International Joint Conference on Artificial Intelligence (IJCAI-25) will be held in August in two locations: Montreal, Canada, and Guangzhou, China. The conference will feature keynote speeches, tutorials, workshops, and competitions, with four special tracks: AI for Social Good, AI and Art, Human-Centered AI, and AI-Powered Key Technologies. This conference has invited several renowned scholars for keynote speeches and offers a wealth of tutorials and workshops covering cutting-edge areas such as LLM training, Agent evaluation, RAG, neuroevolution, fairness, computational pathology, and multimodal LLMs, providing a valuable learning and exchange platform for AI researchers and developers. (Source: aihub.org)

New Progress in LLM Evaluation and Optimization : GEPA (Reflective Prompt Evolution can Outperform Reinforcement Learning) proposes a method to optimize LLM performance through reflective prompt evolution, marking a significant step in automated prompt optimization. Concurrently, research on Curriculum Learning for Efficient Reasoning shows that by progressively tightening token budgets, LLMs can discover more effective solutions and distill them into more concise reasoning traces, significantly improving accuracy and token efficiency. These studies offer new insights into LLM evaluation, optimization, and efficient reasoning. (Source: davisblalock, EthanJPerez, Reddit r/deeplearning, HuggingFace Daily Papers)
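
The summary above does not say how the token budget is tightened. As a minimal sketch, assuming a geometric decay from a starting budget to a final budget (the function name `token_budget_schedule` and the decay rule are illustrative assumptions, not the method from the cited research), a progressively tightening schedule could look like this:

```python
def token_budget_schedule(start_budget: int, final_budget: int, num_stages: int) -> list[int]:
    """Return one token budget per training stage, shrinking geometrically.

    Hypothetical helper: the geometric decay is an assumption made for
    illustration; the source only says budgets are progressively tightened.
    """
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    if not 0 < final_budget <= start_budget:
        raise ValueError("require 0 < final_budget <= start_budget")

    # Constant ratio so that stage 0 uses start_budget and the last stage
    # uses final_budget.
    ratio = (final_budget / start_budget) ** (1 / (num_stages - 1))
    budgets = [round(start_budget * ratio**stage) for stage in range(num_stages)]
    budgets[-1] = final_budget  # guard against rounding drift on the last stage
    return budgets


if __name__ == "__main__":
    for stage, budget in enumerate(token_budget_schedule(4096, 512, 4)):
        print(f"stage {stage}: at most {budget} reasoning tokens")
```

In a training loop, each stage would cap the length of generated reasoning traces at that stage's budget, so the model starts with room to explore and is gradually pushed toward shorter solutions.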

AI Learning Resources and Practical Experience Sharing : The community has shared various AI learning resources and practical experiences, including: 6 must-read articles on GPT-5 and GPT-OSS, covering model advancements, user experience, and architectural analysis; a weekly list of the latest AI/ML research papers, touching upon cutting-edge areas like social intelligence, agent training, and reinforcement learning; and a tutorial on building multi-head attention mechanisms using Excel, aiding in a deeper understanding of the Transformer architecture. These resources provide a comprehensive learning path from theory to practice for AI enthusiasts and practitioners. (Source: TheTuringPost, TheTuringPost, ProfTomYeh)
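
For readers who would rather follow the multi-head attention computation in code than in a spreadsheet, here is a minimal pure-Python sketch. It is not taken from the tutorial: it is a simplification that assumes self-attention with identity projections (the learned query, key, value, and output matrices of a real Transformer are omitted), and all function names are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate.

    Assumption: Q = K = V = the head's slice of x (no learned projections).
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    head_outputs = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        head_outputs.append(attention(part, part, part))

    # Concatenate the per-head outputs back into rows of width d_model.
    return [sum((head[i] for head in head_outputs), []) for i in range(len(x))]


if __name__ == "__main__":
    tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    for row in multi_head_attention(tokens, num_heads=2):
        print([round(v, 3) for v in row])
```

Each head sees only its own slice of the features, which is what lets different heads attend to different patterns in the same sequence.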

LLM Fine-tuning and Model Merging Techniques : A technical report details full-stack fine-tuning methods for the niche financial programming language Q, including pre-training, SFT, and RL, providing a blueprint for LLM adaptability in vertical domains. Furthermore, model merging techniques have seen significant progress in the past year, demonstrating how combining different models can enhance performance and efficiency. These techniques offer developers new avenues for optimizing LLMs on specific tasks, especially crucial in scenarios with scarce data or strong domain specificity. (Source: maximelabonne, HuggingFace Daily Papers)

LLM Generation Layer Architecture and Retrieval-Augmented Generation (RAG) Course : Together Compute, in collaboration with Andrew Ng, has launched a RAG course that delves into architectural patterns for LLM generation layers in production systems, emphasizing how to effectively build generation layers to optimize RAG performance. The course aims to help developers understand and implement LLM generation mechanisms in practical applications, ensuring the quality and efficiency of model outputs, and is highly relevant for engineers looking to achieve high-quality content generation in RAG applications. (Source: togethercompute)

Discussions on AI Ethics and Its Societal Impact : The community has engaged in extensive discussions on the potential impact of AI on jobs, personal privacy, and mental health. Some worry that tools like AI lawyers will replace human jobs, but the general consensus is that AI is more likely to enhance efficiency than to fully replace roles, and that it will create new positions. Regarding AI companions and human-AI emotional connections, discussions point out that the brain’s recognition of emotional patterns does not depend on the “author’s” identity, but emphasize that AI currently lacks a physical body and genuine subjective experience. Furthermore, cases of “AI psychosis” have raised concerns about AI-induced delusions, and whether AI should manage economic administrative structures is hotly debated. Together, these threads highlight the deep socio-ethical challenges in AI development. (Source: Reddit r/ArtificialInteligence, Reddit r/ArtificialInteligence, Reddit r/ClaudeAI, Reddit r/ArtificialInteligence, Reddit r/artificial)

💼 Business

Chinese Companies Halt NVIDIA H20 Chip Purchases Amid US-China Chip Rivalry : The Chinese government has urged tech companies to halt purchases of NVIDIA H20 chips, citing security concerns, which strikes a blow to NVIDIA’s agreement with the U.S. government. Chinese officials are concerned that the U.S. might embed “backdoors” in the chips. This move reflects the ongoing technological and geopolitical rivalry between the U.S. and China in the AI chip sector, as well as China’s determination to promote domestic alternatives, further intensifying uncertainties in the global semiconductor supply chain. (Source: jeremyphoward, MIT Technology Review)

Zhipu AI Faces a Large-Model Shakeout, Accelerates IPO Process : Zhipu AI, a top-tier Chinese large model company, has seen its update pace slow and its market share diluted following the rise of competitors like DeepSeek. Although its GLM-4.5 model performs excellently in reasoning, coding, and agent capabilities, and has achieved cost breakthroughs (API call price as low as 0.8 RMB per million tokens), high R&D investment has led to continuous losses. To alleviate cash flow pressure and capture market opportunities, Zhipu AI has initiated A-share and Hong Kong IPO processes at a valuation exceeding 40 billion RMB, seeking to maintain its leading position and achieve commercialization amid fierce competition. (Source: 36氪)
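
To put the quoted price in perspective, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes the "as low as" figure of 0.8 RMB per million tokens applies uniformly to all tokens, which the source does not confirm, and the function name `api_cost_rmb` is hypothetical.

```python
# Lowest quoted price from the item above; actual tiers may differ (assumption).
PRICE_RMB_PER_MILLION_TOKENS = 0.8


def api_cost_rmb(tokens: int, price_per_million: float = PRICE_RMB_PER_MILLION_TOKENS) -> float:
    """Estimate the cost in RMB of processing `tokens` tokens at a flat rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    for tokens in (100_000, 1_000_000, 50_000_000):
        print(f"{tokens:>11,} tokens -> {api_cost_rmb(tokens):.2f} RMB")
```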

OpenAI Partners with Commonwealth Bank of Australia, Anthropic Acquires Humanloop : OpenAI has partnered with Commonwealth Bank, Australia’s largest bank, to jointly explore advanced generative AI solutions. Additionally, Anthropic has announced the acquisition of the Humanloop team, aiming to accelerate the safe application of AI. These partnerships and acquisitions indicate that AI giants are actively integrating with traditional industries and innovative teams, driving the deep application and commercialization of AI technology in sectors such as finance and security. (Source: gdb, swyx, RazRazcle)

🌟 Community

Musk and Altman’s AI Spat Escalates: Grok and ChatGPT “Take Sides” : Musk accused Apple’s App Store of favoring OpenAI, and Altman retaliated by claiming that Musk manipulated the X platform’s algorithms. Subsequently, Musk’s AI assistant Grok unexpectedly “sided” with Altman, stating that Musk’s accusations were unfounded and that he had a history of manipulating algorithms. Musk, in turn, posted a screenshot of ChatGPT 5 Pro “siding” with him, turning the debate into a satirical drama of AI tools “taking sides”. The episode not only exposed potential biases in AI systems on subjective issues but also sparked deeper discussions on AI ethics and platform control. (Source: 36氪, 36氪)

AI Hallucinations and Information Pollution: Intensifying Internet Trust Crisis : AI hallucinations are becoming increasingly prominent, and false information now spreads rapidly through a closed loop of AI generation, media amplification, and AI regurgitation. For instance, a purported DeepSeek “apology statement” and “court judgment” were reported as genuine by the media. This phenomenon of “feeding AI garbage” has led to the “industrialized” pollution of internet information, with users’ excessive trust in AI and tech worship exacerbating the problem. Commentary suggests that hallucination is an inherent characteristic of AI and that the key lies in managing it rather than eliminating it. At the same time, the role of humans as “gatekeepers” is being challenged, which calls for vigilance against the erosion of social trust by mass-produced false information. (Source: 36氪)

ChatGPT Pricing, Performance, and User Loyalty Controversies : ChatGPT Plus’s $20 monthly fee has become a benchmark for AI product pricing, though its pricing process was actually rushed, quickly determined via a Discord community survey. However, after the release of GPT-5, some users complained about performance degradation, even deeming it inferior to GPT-4o, sparking discussions of “broken user trust” and calls for GPT-4o’s return. Concurrently, some users express concern about over-reliance on specific AI models (such as Claude Sonnet 3.5), fearing that their disappearance could impact livelihoods, reflecting user anxieties about product stability under the cloud service model. (Source: Reddit r/ChatGPT, Reddit r/ClaudeAI, dotey, TheTuringPost)

GPT-OSS Model Performance and Vendor Discrepancy Controversy : OpenAI’s GPT-OSS-120B was advertised as the most intelligent model capable of running at native precision on H100, but its performance obtained through API providers like Microsoft and Amazon in benchmarks such as GPQA Diamond and AIME25 was significantly lower than OpenAI’s official data, leading to strong accusations of “performance fraud” from users. Concurrently, the base model of GPT-OSS-20B was successfully extracted, and its “alignment” to safety instructions was found to be easily reversible, allowing it to answer sensitive questions, raising concerns about model safety and the effectiveness of “alignment”. (Source: Reddit r/LocalLLaMA, nrehiew_, Reddit r/LocalLLaMA, imjaredz, jpt401)

💡 Other

Portable Local AI Server ‘SERVE-AI-VAL Box’ : A developer has built a portable local AI server named “SERVE-AI-VAL Box,” capable of operating offline and off-grid, powered by solar and hand-cranked generation, at a cost under $300. The device is equipped with the Gemma3:4b model, supporting camera, microphone, speaker, and touchscreen input, designed to provide medical or survival knowledge in emergency situations, showcasing the potential of local AI in extreme environments. (Source: Reddit r/LocalLLaMA)

Surya: Multilingual OCR and Document Analysis Toolkit : Surya is a document OCR toolkit offering OCR for over 90 languages, line-level text detection, layout analysis (tables, images, headings, etc.), reading order detection, table recognition, and LaTeX OCR. It outperforms cloud services in OCR performance and supports various document types. The toolkit is written in Python, provides an interactive application and Python interface, and supports GPU acceleration, offering an efficient and comprehensive solution for processing document data. (Source: GitHub Trending)

Alibaba AI Try-on App ‘Lookie’ Launched: Generates Personal Digital Avatars and Virtual Try-ons : Alibaba has launched its standalone AI try-on app “Lookie,” allowing users to upload photos to generate personal digital avatars and virtually try on various clothing styles in a short time. The app utilizes Alibaba’s Wanxiang image generation and text generation algorithms, aiming to build an interactive platform integrating apparel brand display and virtual try-on shopping. Users can share try-on photos to get styling suggestions, while merchants can accurately capture fashion trends. Although challenges remain in simulating fabric dynamic effects, it is expected to redefine the online try-on experience and integrate with e-commerce. (Source: 36氪)
