Tiny Liquid AI Model Runs Locally on Your Phone and Beats Giants 4x Its Size!

Forget renting massive cloud servers just to have an AI draft an email. A new pocket-sized genius is proving that size really doesn't matter when you actually know how to use your brain. This is the local AI revolution we've been promised.

Liquid AI has released the LFM2.5-8B-A1B model, which uses a Mixture-of-Experts architecture where only 1.5 billion of its 8.3 billion total parameters are active for any given token. This clever resource management keeps the per-token compute low enough for standard laptops and smartphones while retaining reasoning capabilities that usually require massive cloud clusters.
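For readers who want to see what "only some parameters are active" means, here is a minimal toy sketch of top-k expert routing in Python. It is not Liquid AI's implementation: the eight scalar "experts", the router scores, and the helper names `route_top_k` and `moe_forward` are all made up for illustration. Only the 1.5 billion and 8.3 billion figures come from the article.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Hypothetical toy setup: 8 scalar "experts", 2 of them active per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

selected = route_top_k(router_logits, k=2)
output = moe_forward(3.0, experts, router_logits, k=2)

# Figures quoted in the article: 1.5B active out of 8.3B total parameters.
active_fraction = 1.5 / 8.3
print(f"experts used: {[i for i, _ in selected]}, "
      f"active share of weights: {active_fraction:.0%}")
```

The other six experts are never evaluated for this input, which is where the compute savings come from. All the weights still have to be stored somewhere, so sparse activation mainly helps speed and power draw rather than storage.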

Instead of memorizing the entire internet, the model is built specifically for agentic tasks and tool calling, meaning it excels at executing complex, multi-step instructions and triggering external APIs.
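To make "tool calling" concrete, here is a minimal sketch of the runtime side of the loop: the model emits a structured request, and ordinary code looks up and runs the matching function. The JSON shape, the tool names, and the `dispatch` helper are assumptions for illustration only, not the format LFM2.5 actually emits.

```python
import json

# Hypothetical tools; a real agent would wrap files, calendars, or APIs.
def get_weather(city):
    return {"city": city, "forecast": "sunny"}

def add_numbers(a, b):
    return {"sum": a + b}

TOOLS = {"get_weather": get_weather, "add_numbers": add_numbers}

def dispatch(tool_call_json):
    """Parse one model-emitted tool call and run the matching function."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    try:
        return TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:
        # Wrong or missing arguments: report back instead of crashing.
        return {"error": f"bad arguments for {name}: {exc}"}

# Pretend the model produced this string in response to "what is 2 + 3?".
model_output = '{"name": "add_numbers", "arguments": {"a": 2, "b": 3}}'
result = dispatch(model_output)
print(result)
```

In a multi-step task, the result would be fed back to the model as text so it can decide on the next call. Returning errors as data rather than raising gives the model a chance to correct its own request.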

In terms of speed, the model clocks a blistering 253 tokens per second on an Apple M5 Max chip. On a Ryzen AI Max+ 395 processor, it hits 146 tokens per second while sipping less than 6 gigabytes of RAM. Even on high-end smartphones, it manages a very respectable 30 tokens per second.
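To put those numbers in everyday terms, the quick calculation below estimates how long a reply would take to generate at each quoted speed. The 500-token reply length is an assumption chosen for illustration, and the estimate ignores the time spent reading the prompt.

```python
# Decode throughput figures quoted above, in tokens per second.
THROUGHPUT = {
    "Apple M5 Max": 253,
    "Ryzen AI Max+ 395": 146,
    "high-end smartphone": 30,
}

REPLY_TOKENS = 500  # assumed reply length, not a figure from Liquid AI

def seconds_to_generate(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second

estimates = {
    device: round(seconds_to_generate(REPLY_TOKENS, tps), 1)
    for device, tps in THROUGHPUT.items()
}

for device, secs in estimates.items():
    print(f"{device}: about {secs} s for {REPLY_TOKENS} tokens")
```

That works out to roughly 2 seconds on the M5 Max, a little over 3 on the Ryzen chip, and about 17 on a phone.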

Developers can showcase these capabilities through the LocalCowork demo, which runs 67 different tools across 13 MCP servers completely offline on a single machine without any API keys. However, the creators openly admit that the model struggles with heavy coding and raw trivia if it isn't hooked up to an external database.

The team also doubled the model's tokenizer vocabulary to 128,000 tokens, which drastically reduces the fragmentation of non-English text. Fewer tokens per sentence means faster text processing for languages like Hindi, Arabic, and Vietnamese. The model runs natively on llama.cpp, MLX, vLLM, and SGLang.
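A toy example shows why vocabulary size matters for non-English text. The sketch below is a greedy longest-match tokenizer that falls back to raw UTF-8 bytes when no vocabulary entry matches. It is not the LFM2.5 tokenizer, and both vocabularies are made-up stand-ins. The point is only that a word missing from the vocabulary shatters into many byte tokens.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer that falls back to raw UTF-8 bytes."""
    tokens = []
    i = 0
    max_len = max((len(piece) for piece in vocab), default=0)
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry starts here: emit one token per byte.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens

# The Hindi word "namaste": 6 code points, 3 UTF-8 bytes each.
word = "\u0928\u092e\u0938\u094d\u0924\u0947"

small_vocab = {"hello", "world"}                 # hypothetical English-heavy vocabulary
large_vocab = small_vocab | {word[:2], word[2:]} # hypothetical vocabulary with Hindi pieces

small_count = len(tokenize(word, small_vocab))
large_count = len(tokenize(word, large_vocab))
print(f"small vocabulary: {small_count} tokens, large vocabulary: {large_count} tokens")
```

Here the same word costs 18 tokens with the small vocabulary and 2 with the larger one. Since generation speed is measured in tokens per second, fewer tokens per word translates directly into faster output in that language.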

The era of sending every private text and document to a remote server just to summarize a PDF is finally starting to look incredibly dumb. It turns out that smart engineering can actually beat throwing billions of dollars of hardware at a problem. The cloud giants must be sweating as local chips finally get something useful to run.

Source: Liquid AI

Comments

Neon Wolf

finally my search history stays on my device, now i can ask dumb questions in peace

It is truly heartwarming to see someone prioritize the privacy of their questionable intellectual pursuits.