Google Launches LiteRT-LM to Run LLMs on Any Edge Device
Krasa AI
2026-04-08
4 minute read
Running a large language model used to require a data center. Now Google wants you to run one on your watch.
Google today publicly released LiteRT-LM, an open-source C++ inference framework that brings production-grade LLM performance to edge devices — everything from Android phones and iPhones to Chromebooks, IoT sensors, and even the Pixel Watch.
Why Edge AI Matters Right Now
Every time you ask an AI assistant a question, your data typically travels to a distant server, gets processed, and the answer travels back. That round trip adds latency, costs money, and raises privacy concerns.
LiteRT-LM eliminates that round trip entirely. The framework runs LLMs directly on your device, which means three things for you: faster responses (sub-second time-to-first-token is the target), complete privacy (your data never leaves the device), and offline capability (no internet required).
Why this matters: As AI becomes embedded in healthcare, finance, and enterprise tools, processing sensitive data locally, without sending it to the cloud, is more than a convenience. In regulated sectors it is increasingly becoming a compliance requirement.
What Makes LiteRT-LM Different
Google has been running LiteRT-LM internally for months. It already powers features across Chrome, Chromebook Plus, and the Pixel Watch. Today's release gives every developer access to the same production-tested engine.
The architecture is built around an Engine/Session system. A single foundation model (like Gemini Nano or Gemma) serves as the shared base, while lightweight LoRA adapters (low-rank adaptation: small sets of extra weights that customize the base model's behavior) handle feature-specific customization. This means one loaded model can power text summarization, proofreading, smart replies, and image understanding at the same time.
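To make the shared-base idea concrete, here is a minimal sketch of one engine serving several adapter-specific sessions. It is not the LiteRT-LM API: the names Engine, Session, LoraAdapter, and MatVec are hypothetical, and a single small weight matrix stands in for a whole model. It uses the standard LoRA formulation, in which each session's output is the base output plus a scaled low-rank correction, y = Wx + scale * B(Ax).

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // row-major: m[row][col]

// Multiply a matrix by a vector (assumes every row has x.size() columns).
Vec MatVec(const Mat& m, const Vec& x) {
  Vec y(m.size(), 0.0f);
  for (std::size_t r = 0; r < m.size(); ++r)
    for (std::size_t c = 0; c < x.size(); ++c) y[r] += m[r][c] * x[c];
  return y;
}

// Hypothetical adapter: the weight update is scale * B * A, with A and B small.
struct LoraAdapter {
  std::string feature;  // e.g. "summarize" (illustrative label only)
  Mat a;                // rank x input_dim
  Mat b;                // output_dim x rank
  float scale;
};

class Session;

// Hypothetical engine: holds the base weights once and hands them to sessions.
class Engine {
 public:
  explicit Engine(Mat base)
      : base_(std::make_shared<const Mat>(std::move(base))) {}
  Session CreateSession(LoraAdapter adapter) const;
  long BaseUseCount() const { return base_.use_count(); }

 private:
  std::shared_ptr<const Mat> base_;
};

// Hypothetical session: shares the base weights, owns only its adapter.
class Session {
 public:
  Session(std::shared_ptr<const Mat> base, LoraAdapter adapter)
      : base_(std::move(base)), adapter_(std::move(adapter)) {}

  // y = W x + scale * B (A x)
  Vec Forward(const Vec& x) const {
    Vec y = MatVec(*base_, x);
    Vec delta = MatVec(adapter_.b, MatVec(adapter_.a, x));
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += adapter_.scale * delta[i];
    return y;
  }

 private:
  std::shared_ptr<const Mat> base_;
  LoraAdapter adapter_;
};

Session Engine::CreateSession(LoraAdapter adapter) const {
  return Session(base_, std::move(adapter));  // copies a pointer, not the weights
}
```

In this sketch, creating a second session adds only the adapter's small matrices to memory. The base matrix is reference-counted and never duplicated, which is the property that lets one model back several features.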
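The reported performance numbers are strong. Session cloning, which spins up a new AI conversation from an existing one, takes under 10 milliseconds. A copy-on-write KV cache (the stored attention keys and values that act as the model's working memory) lets a cloned session share its parent's cached data until one of them writes to it, which minimizes memory usage and makes context switching between tasks nearly instant.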
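Copy-on-write is what makes a clone cheap: the new session copies pointers to cache blocks, and a block's contents are duplicated only when one side writes to it. The following is a minimal single-threaded sketch of that technique, not LiteRT-LM's actual cache. KvBlock, KvCache, and their methods are hypothetical names, and plain floats stand in for real key/value tensors.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// One fixed-size block of cached values (floats as a stand-in for K/V tensors).
struct KvBlock {
  std::vector<float> data;
};

// Hypothetical copy-on-write cache. Single-threaded sketch: use_count() is
// not a safe sharing test when other threads can copy the pointers.
class KvCache {
 public:
  KvCache(std::size_t num_blocks, std::size_t block_size) {
    for (std::size_t i = 0; i < num_blocks; ++i) {
      auto block = std::make_shared<KvBlock>();
      block->data.assign(block_size, 0.0f);
      blocks_.push_back(std::move(block));
    }
  }

  // Cloning copies only the block pointers, so its cost does not depend on
  // how much data each block holds.
  KvCache Clone() const { return *this; }

  float Read(std::size_t block, std::size_t offset) const {
    return blocks_[block]->data[offset];
  }

  // Before writing, detach the block if another cache still points at it.
  void Write(std::size_t block, std::size_t offset, float value) {
    std::shared_ptr<KvBlock>& slot = blocks_[block];
    if (slot.use_count() > 1) slot = std::make_shared<KvBlock>(*slot);
    slot->data[offset] = value;
  }

  // True while both caches still point at the same underlying block.
  bool SharesBlockWith(const KvCache& other, std::size_t block) const {
    return blocks_[block] == other.blocks_[block];
  }

 private:
  std::vector<std::shared_ptr<KvBlock>> blocks_;
};
```

After a clone, a write through either cache copies only the one block being modified. Every untouched block stays shared, so two conversations that diverge late in a long context pay for only the divergent tail.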
Cross-Platform, Cross-Model Support
LiteRT-LM isn't locked to Google's own models. The framework supports Gemma, Llama, Phi-4, Qwen, and other popular open-source LLMs. It runs on Android, iOS, Web, Desktop, and IoT platforms including Raspberry Pi.
The April 2026 update adds full support for Gemma 4, including the Edge 2B variant designed specifically for mobile deployment, plus 4B and 12B models for desktop-class hardware. GPU and NPU (neural processing unit) hardware acceleration lets the framework use the fastest compute available on whatever device you're running on.
For developers, the framework is modular by design. Building for a Pixel Watch with strict memory limits? You can strip LiteRT-LM down to just the executor, tokenizer, and sampler. Building for a powerful desktop? Load the full stack with all optimizations enabled.
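The stripped-down configuration described above (executor, tokenizer, sampler) maps onto a very small generation loop. Here is a minimal sketch of how three such pluggable pieces fit together. The type aliases, the Generate function, and the toy components are all hypothetical and are not taken from LiteRT-LM. The toy tokenizer assumes the prompt contains only the letters a to d, and the toy executor uses a four-token vocabulary.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

using TokenId = int;
using Logits = std::vector<float>;

// Hypothetical component interfaces; each can be swapped independently.
using Tokenizer = std::function<std::vector<TokenId>(const std::string&)>;
using Executor = std::function<Logits(const std::vector<TokenId>&)>;
using Sampler = std::function<TokenId(const Logits&)>;

// Greedy sampling: pick the index of the highest logit (first one on ties).
TokenId GreedySample(const Logits& logits) {
  return static_cast<TokenId>(std::distance(
      logits.begin(), std::max_element(logits.begin(), logits.end())));
}

// Minimal decode loop: tokenize once, then alternate executor and sampler
// until the end-of-sequence token or the token budget is reached.
std::vector<TokenId> Generate(const std::string& prompt,
                              const Tokenizer& tokenizer,
                              const Executor& executor, const Sampler& sampler,
                              std::size_t max_new_tokens, TokenId eos) {
  std::vector<TokenId> tokens = tokenizer(prompt);
  for (std::size_t i = 0; i < max_new_tokens; ++i) {
    TokenId next = sampler(executor(tokens));
    tokens.push_back(next);
    if (next == eos) break;
  }
  return tokens;
}

// Toy tokenizer: maps 'a'..'d' to token ids 0..3 (assumes no other input).
std::vector<TokenId> ToyTokenize(const std::string& text) {
  std::vector<TokenId> tokens;
  for (char c : text) tokens.push_back(c - 'a');
  return tokens;
}

// Toy executor over a 4-token vocabulary: favors (last token + 1) mod 4.
Logits ToyExecutor(const std::vector<TokenId>& tokens) {
  Logits logits(4, 0.0f);
  TokenId last = tokens.empty() ? 0 : tokens.back();
  logits[static_cast<std::size_t>((last + 1) % 4)] = 1.0f;
  return logits;
}
```

Calling Generate("a", ToyTokenize, ToyExecutor, GreedySample, 5, 3) walks the token sequence 0, 1, 2, 3 and stops at the end-of-sequence token. Because the loop depends only on the three interfaces, a memory-constrained build could ship just these pieces, while a desktop build wraps the same loop with additional layers.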
Already Shipping in Google Products
This isn't vaporware. LiteRT-LM already powers real features that millions of people use daily.
In Chrome and Chromebook Plus, it enables on-device text summarization and proofreading through built-in Web AI APIs. On the Pixel Watch, it drives the Smart Replies feature — generating contextual message responses on a device with severely constrained resources.
The fact that Google has battle-tested this framework across such diverse hardware gives developers confidence that it's ready for production use cases.
The Bigger Picture
LiteRT-LM arrives at a pivotal moment for edge AI. Companies are increasingly concerned about sending sensitive data to cloud-based AI services. Regulations like the EU AI Act and various healthcare data privacy laws are pushing computation closer to the user.
Google's move to open-source a production-grade edge inference framework lowers the barrier significantly. Previously, deploying LLMs on edge devices required deep expertise in model optimization and hardware-specific tuning. LiteRT-LM packages all of that into a developer-friendly framework.
Why this matters: If running a capable LLM on a phone becomes as easy as importing a library, we'll see an explosion of AI-powered apps that work offline, protect privacy, and respond instantly — fundamentally changing how AI integrates into daily life.
How to Get Started
LiteRT-LM is available now on GitHub under the google-ai-edge organization. Documentation and quickstart guides are available at Google AI for Developers.
The Bottom Line
Google just made it dramatically easier to run AI on the devices people actually carry. LiteRT-LM isn't just a framework — it's a bet that the future of AI isn't in the cloud, but in your pocket. For developers building the next generation of privacy-first, low-latency AI applications, this is the starting gun.