Google Launches LiteRT-LM to Run LLMs on Any Edge Device

Krasa AI

2026-04-08

Running a large language model used to require a data center. Now Google wants you to run one on your watch.

Google today publicly released LiteRT-LM, an open-source C++ inference framework that brings production-grade LLM performance to edge devices — everything from Android phones and iPhones to Chromebooks, IoT sensors, and even the Pixel Watch.

Why Edge AI Matters Right Now

Every time you ask an AI assistant a question, your data typically travels to a distant server, gets processed, and the answer travels back. That round trip adds latency, costs money, and raises privacy concerns.

LiteRT-LM removes that round trip. The framework runs LLMs directly on your device, which means three things for you: faster responses (sub-second time-to-first-token is the target), stronger privacy (prompts are processed on the device instead of being sent to a server), and offline capability (no internet connection required once the model is on the device).
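
Time-to-first-token is simply the delay between sending a prompt and receiving the first piece of the answer. As a rough illustration, here is a minimal Python sketch of how you might measure it for any streaming model. The function name `time_to_first_token` and the stand-in `fake_model_stream` are hypothetical and are not part of LiteRT-LM.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, str]:
    """Return (seconds until the first token arrived, that token)."""
    start = clock()
    for token in stream:
        # Stop at the very first token: that delay is the metric.
        return clock() - start, token
    raise ValueError("stream produced no tokens")


def fake_model_stream() -> Iterator[str]:
    """Hypothetical stand-in for a model that streams tokens."""
    time.sleep(0.05)  # pretend the model needs 50 ms to process the prompt
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    ttft, first = time_to_first_token(fake_model_stream())
    print(f"first token {first!r} after {ttft * 1000:.0f} ms")
```

The clock is passed in as a parameter so the measurement can be checked with a fake clock instead of real sleeps.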

Why this matters: As AI becomes embedded in healthcare, finance, and enterprise tools, the ability to process sensitive data locally — without sending it to the cloud — isn't just convenient. It's becoming a regulatory requirement.

What Makes LiteRT-LM Different

Google has been running LiteRT-LM internally for months. It already powers features across Chrome, Chromebook Plus, and the Pixel Watch. Today's release gives every developer access to the same production-tested engine.

The architecture is built around an Engine/Session split. The Engine holds a single foundation model (like Gemini Nano or Gemma) as the shared base, while each Session carries the state for one feature or conversation. Lightweight LoRAs (low-rank adaptations: small sets of extra weights that customize behavior without duplicating the base model) handle feature-specific customization. This means one loaded model can serve text summarization, proofreading, smart replies, and image understanding side by side.
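
To make the idea concrete, here is a toy Python sketch of one shared base model with per-session LoRA adapters. It is a conceptual illustration only: the class names `Engine`, `Session`, and `LoraAdapter` and every method on them are assumptions for this example, not LiteRT-LM's actual C++ API, and a single small matrix stands in for the whole model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass(frozen=True)
class LoraAdapter:
    """Low-rank update delta_W = B @ A, with A (r x d_in) and B (d_out x r)."""
    a: Matrix
    b: Matrix


class Engine:
    """Owns the single copy of the base weights and the registered adapters."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight
        self.adapters: Dict[str, LoraAdapter] = {}

    def register_adapter(self, name: str, adapter: LoraAdapter) -> None:
        self.adapters[name] = adapter

    def create_session(self, adapter_name: Optional[str] = None) -> "Session":
        return Session(self, adapter_name)


class Session:
    """Per-feature handle: points at the shared engine, picks one adapter."""

    def __init__(self, engine: Engine, adapter_name: Optional[str]) -> None:
        if adapter_name is not None and adapter_name not in engine.adapters:
            raise KeyError(f"unknown adapter: {adapter_name}")
        self.engine = engine
        self.adapter_name = adapter_name

    def forward(self, x: Vector) -> Vector:
        # Base output uses the shared weights; nothing is copied per session.
        y = matvec(self.engine.base_weight, x)
        if self.adapter_name is None:
            return y
        adapter = self.engine.adapters[self.adapter_name]
        # LoRA correction: B @ (A @ x), computed without forming B @ A.
        delta = matvec(adapter.b, matvec(adapter.a, x))
        return [yi + di for yi, di in zip(y, delta)]


if __name__ == "__main__":
    engine = Engine([[1.0, 0.0], [0.0, 1.0]])
    engine.register_adapter(
        "summarize", LoraAdapter(a=[[1.0, 1.0]], b=[[0.5], [0.0]])
    )
    plain = engine.create_session()
    tuned = engine.create_session("summarize")
    print(plain.forward([2.0, 3.0]), tuned.forward([2.0, 3.0]))
```

Both sessions read the same base weights, and only the small adapter differs between them. That is why adding another feature costs far less memory than loading a second model.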

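Session handling is fast. Session cloning, which forks an existing conversation's state into a new session, takes under 10 milliseconds. A copy-on-write KV-cache (the stored attention keys and values that act as the model's working memory) lets cloned sessions share memory until one of them writes, which keeps memory usage low and makes context switching between tasks almost instant.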
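
Copy-on-write is easier to see in code than in prose. The Python sketch below is a simplified illustration of the general technique, assuming a cache split into fixed-size blocks with reference counts. The `Block` and `KVCache` classes are hypothetical, and LiteRT-LM's real implementation may differ in every detail.

```python
from typing import List, Optional


class Block:
    """A fixed-size chunk of cached entries, shared via a reference count."""

    def __init__(self, entries: List[str]) -> None:
        self.entries = entries
        self.refcount = 1


class KVCache:
    """Toy cache: strings stand in for per-token key/value tensors."""

    def __init__(self, block_size: int = 4,
                 blocks: Optional[List[Block]] = None) -> None:
        self.block_size = block_size
        self.blocks: List[Block] = blocks if blocks is not None else []

    def clone(self) -> "KVCache":
        # Cloning copies only the list of block references, not the data.
        for block in self.blocks:
            block.refcount += 1
        return KVCache(self.block_size, list(self.blocks))

    def append(self, entry: str) -> None:
        # Start a new private block when there is none or the last is full.
        if not self.blocks or len(self.blocks[-1].entries) >= self.block_size:
            self.blocks.append(Block([entry]))
            return
        last = self.blocks[-1]
        if last.refcount > 1:
            # The block is shared with another cache: copy it before writing.
            last.refcount -= 1
            last = Block(list(last.entries))
            self.blocks[-1] = last
        last.entries.append(entry)

    def tokens(self) -> List[str]:
        return [entry for block in self.blocks for entry in block.entries]


if __name__ == "__main__":
    parent = KVCache()
    for tok in ["a", "b", "c", "d", "e"]:
        parent.append(tok)
    child = parent.clone()  # cheap: shares both existing blocks
    child.append("f")       # copies only the partly filled last block
    print(parent.tokens(), child.tokens())
```

In this sketch a clone costs one list copy, and later writes duplicate at most the last, partly filled block. The full blocks holding the shared conversation history stay shared.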

Cross-Platform, Cross-Model Support

LiteRT-LM isn't locked to Google's own models. The framework supports Gemma, Llama, Phi-4, Qwen, and other popular open-source LLMs. It runs on Android, iOS, Web, Desktop, and IoT platforms including Raspberry Pi.

The April 2026 update adds full support for Gemma 4, including the Edge 2B variant designed specifically for mobile deployment, plus 4B and 12B models for desktop-class hardware. Hardware acceleration on GPUs and NPUs (neural processing units) is supported, so the framework can take advantage of whichever accelerator a device provides.

For developers, the framework is modular by design. Building for a Pixel Watch with strict memory limits? You can strip LiteRT-LM down to just the executor, tokenizer, and sampler. Building for a powerful desktop? Load the full stack with all optimizations enabled.

Already Shipping in Google Products

This isn't vaporware. LiteRT-LM already powers real features that millions of people use daily.

In Chrome and Chromebook Plus, it enables on-device text summarization and proofreading through built-in Web AI APIs. On the Pixel Watch, it drives the Smart Replies feature — generating contextual message responses on a device with severely constrained resources.

The fact that Google has battle-tested this framework across such diverse hardware gives developers confidence that it's ready for production use cases.

The Bigger Picture

LiteRT-LM arrives at a pivotal moment for edge AI. Companies are increasingly concerned about sending sensitive data to cloud-based AI services. Regulations like the EU AI Act and various healthcare data privacy laws are pushing computation closer to the user.

Google's move to open-source a production-grade edge inference framework lowers the barrier significantly. Previously, deploying LLMs on edge devices required deep expertise in model optimization and hardware-specific tuning. LiteRT-LM packages all of that into a developer-friendly framework.

Why this matters: If running a capable LLM on a phone becomes as easy as importing a library, we'll see an explosion of AI-powered apps that work offline, protect privacy, and respond instantly — fundamentally changing how AI integrates into daily life.

How to Get Started

LiteRT-LM is available now on GitHub under the google-ai-edge organization. Documentation and quickstart guides are available at Google AI for Developers.

The Bottom Line

Google just made it dramatically easier to run AI on the devices people actually carry. LiteRT-LM isn't just a framework — it's a bet that the future of AI isn't in the cloud, but in your pocket. For developers building the next generation of privacy-first, low-latency AI applications, this is the starting gun.
