Keywords: Gemma 3n, edge-side multimodal, MatFormer, per-layer embeddings, low resource consumption, Gemma 3n E2B model, Gemma 3n E4B model, LMArena benchmark score, 2GB RAM operation, Hugging Face deployment
🔥 Spotlight
Google releases Gemma 3n, ushering in a new era of on-device multimodality: Google has officially released the Gemma 3n series of models, designed specifically for on-device applications, with native support for text, image, audio, and video inputs. The series includes two models, E2B and E4B. Although their raw parameter counts are 5B and 8B, respectively, their memory footprint is comparable to that of 2B and 4B models, thanks to the innovative MatFormer "nested doll" architecture and Per-Layer Embeddings (PLE); the E2B model can run in as little as 2GB of RAM. Gemma 3n scored over 1300 on the LMArena leaderboard, the first model under 10B parameters to do so, demonstrating strong performance at low resource cost. The models are now fully available on major open-source platforms such as Hugging Face, Ollama, and MLX, further advancing the development of on-device AI applications (Source: HuggingFace Blog, karminski3, demishassabis, Reddit r/LocalLLaMA)
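The core MatFormer idea is that a smaller sub-model lives inside the larger one as a prefix slice of the same weight matrices, so one checkpoint can serve multiple sizes. Below is a minimal NumPy sketch of that nesting for a single feed-forward block; the dimensions and names are illustrative and do not reflect Gemma 3n's actual architecture or implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative only, not Gemma 3n's real sizes).
d_model, d_ff_large, d_ff_small = 8, 32, 16

# Feed-forward weights of the "large" model.
W_in = rng.normal(size=(d_model, d_ff_large))
W_out = rng.normal(size=(d_ff_large, d_model))

def ffn(x, w_in, w_out):
    # Simple ReLU feed-forward block.
    return np.maximum(x @ w_in, 0.0) @ w_out

# MatFormer-style nesting: the "small" model is a prefix slice of the
# large model's weights, so no separate small checkpoint is stored.
W_in_small = W_in[:, :d_ff_small]
W_out_small = W_out[:d_ff_small, :]

x = rng.normal(size=(1, d_model))
y_large = ffn(x, W_in, W_out)
y_small = ffn(x, W_in_small, W_out_small)

# Same input/output interface, roughly half the FFN compute and memory.
print(y_large.shape, y_small.shape)  # both (1, 8)
```

This is why an E4B download can also yield a working E2B-sized model: the smaller variant is extracted by slicing, not retrained from scratch.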