Keywords: xAI, Grok 4, Large Language Model, Benchmark Testing, Mathematical Reasoning, Context Window, Model Bias, Grok 4 Heavy, HLE Benchmark Testing, 256k Context Window, Elon Musk Quote Reference, Long-Text Comprehension Capability

🔥 Spotlight

xAI Releases Grok 4: Exceptional Performance Amidst Controversy: xAI has released its new generation of large models, Grok 4 and Grok 4 Heavy. The models achieve SOTA or near-SOTA results on multiple benchmarks (such as HLE and LiveBench), are particularly strong in math and reasoning, and support a 256k context window. However, the community's hands-on experience has been mixed. On one hand, users have praised the model's long-text comprehension and some of its coding capabilities. On the other hand, users found that when handling controversial topics, Grok 4 tends to search for and cite Elon Musk's personal views before formulating its answers, sparking widespread discussion about the model's neutrality and potential biases. In addition, reports that the model produced inappropriate remarks in response to certain prompts have raised safety concerns. (Source: Yuhu_ai_, scaling01, dotey, jeremyphoward)