    Loc.ai

    Turn end-user devices into sovereign off-cloud AI infrastructure

    Orchestrate local models across user devices from a central control plane. On-device first — route to on-prem or cloud when you need to. OpenAI-compatible API.

    Start free — Free Forever
    3 devices, 5GB registry, 2GB egress / month — no card required.

    Backed by

    Google for Startups - supporting edge computing infrastructure startups
    NVIDIA Inception Program - accelerating air-gapped AI deployment solutions
    Fuel Ventures - early-stage venture capital

    Simple

    Turn user devices into AI inference infrastructure with a single command. OpenAI-compatible API — no code changes required.

    Scalable

    Unlike centralised cloud or big-server setups where everyone connects to one place, on-device inference scales instantly with every new user.

    Cost-Efficient

    Move repeated, high-frequency workloads to user devices. Heavy or rare jobs route to your own servers or cloud — you decide where each runs.
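A minimal sketch of what "you decide where each runs" could look like as a routing rule. The `Job` and `RoutingPolicy` shapes and the `chooseTarget` function are hypothetical names used for illustration; they are not part of the Loc.ai API.

```typescript
// Hypothetical types: none of these names come from Loc.ai.
type Target = "device" | "server" | "cloud";

interface Job {
  promptTokens: number;      // rough size of the request
  needsLargeModel: boolean;  // true for heavy or rare jobs
}

interface RoutingPolicy {
  maxDeviceTokens: number;   // largest prompt the local model should take
  hasOnPremServer: boolean;  // whether an on-prem GPU server is available
}

// Device-first: stay local unless the job is too heavy for the device,
// then prefer your own server, and use the cloud only as a last resort.
function chooseTarget(job: Job, policy: RoutingPolicy): Target {
  const fitsOnDevice =
    !job.needsLargeModel && job.promptTokens <= policy.maxDeviceTokens;
  if (fitsOnDevice) return "device";
  return policy.hasOnPremServer ? "server" : "cloud";
}

const policy: RoutingPolicy = { maxDeviceTokens: 2048, hasOnPremServer: true };

// A small, frequent job stays on the device.
const frequentJob = chooseTarget({ promptTokens: 300, needsLargeModel: false }, policy); // "device"
// A heavy job goes to the on-prem server when one exists.
const heavyJob = chooseTarget({ promptTokens: 300, needsLargeModel: true }, policy); // "server"
// Without an on-prem server, oversized jobs fall through to the cloud.
const oversizedJob = chooseTarget(
  { promptTokens: 9000, needsLargeModel: false },
  { maxDeviceTokens: 2048, hasOnPremServer: false }
); // "cloud"
```

The thresholds here are placeholders; in practice they would come from what you measure on real devices.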

    For devs and teams that are building

    real-time AI features where API latency breaks UX

    products where user data must stay on the device

    workloads that can run on local models instead of paid APIs

    AI features where inference cost scales with usage

    applications that must run without stable connectivity

    Most AI-powered product features don't need frontier models

    Local models are viable for many product features — once validated and adapted in real end-user environments. Loc.ai lets you do that without manual setup or per-device configuration, from a single control plane.

    What you can run locally

    Repeated inference
    Classification, extraction, routing
    Structured generation
    Reliable JSON, schemas, deterministic outputs
    Real-time features
    Assistants, typing, live interactions
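As an illustration of the classification and structured-generation cases above, here is a sketch of how an app might check that a local model's JSON output matches the shape it expects before using it. The label set, the `Classification` shape, and `parseClassification` are assumptions made for this example, not something Loc.ai defines.

```typescript
// Hypothetical schema for a ticket classifier; adjust to your own feature.
type Label = "billing" | "technical" | "other";

interface Classification {
  label: Label;
  confidence: number; // expected to be between 0 and 1
}

const LABELS: readonly Label[] = ["billing", "technical", "other"];

// Returns a typed result, or null when the model output is not valid JSON
// or does not match the expected shape.
function parseClassification(raw: string): Classification | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  const record = data as Record<string, unknown>;
  const label = record["label"];
  const confidence = record["confidence"];

  if (typeof label !== "string" || LABELS.indexOf(label as Label) === -1) return null;
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) return null;

  return { label: label as Label, confidence };
}

const accepted = parseClassification('{"label":"billing","confidence":0.9}');
const rejectedLabel = parseClassification('{"label":"spam","confidence":0.9}'); // null
const rejectedText = parseClassification("Sure! The label is billing.");        // null
```

Rejecting malformed output at this boundary lets the app retry or fall back to another route instead of passing bad data downstream.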

    Supported models

    Language models
    GGUF (llama.cpp)
    Image / audio
    TFLite

    How you run it with Loc.ai

    Run models on real user devices, not dev machines

    Deploy and test models across hardware and environments from the control plane

    Compare models in real product flows, not benchmarks

    Choose the right model and optimisation strategy based on real usage data

    Adapt or fine-tune models and roll them out across all user devices from one place

    Request a personal demo to get Loc.ai evaluated on your fleet

    Deploy on-device AI infrastructure in minutes

    We provide the full on-device AI infrastructure, completely application-agnostic.

    Step 1

    Install the runtime (Loc.ai:Link)

    One curl command — handles dependencies, Python setup, device registration, and agent startup automatically.

    Step 2

    Deploy a local model

    Pick from the model library (GGUF, TFLite) or bring your own. Deploy to your entire fleet from the control plane.

    Step 3

    Run & call via API

    Runs on-device first and routes to your servers or cloud when needed, all behind an OpenAI-compatible API.

    const openai = new OpenAI({ baseURL: "http://localhost:8100" })
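For readers who want to see what travels over the wire, the sketch below builds the equivalent raw HTTP request without the SDK. It assumes the local runtime serves the standard chat-completions route directly under the base URL shown above, which is where the OpenAI SDK would send it. The model name is a placeholder, and `buildChatRequest` is a hypothetical helper.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  method: "POST";
  headers: { [key: string]: string };
  body: string;
}

// Base URL taken from the snippet above.
const LOCAL_BASE_URL = "http://localhost:8100";

// Builds an OpenAI-style chat completion request for a given base URL.
// Assumption: the runtime exposes "/chat/completions" under that base URL.
function buildChatRequest(
  baseURL: string,
  model: string,
  messages: ChatMessage[]
): ChatRequest {
  return {
    url: baseURL.replace(/\/+$/, "") + "/chat/completions",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  };
}

// "example-local-model" is a placeholder, not a real model identifier.
const chatRequest = buildChatRequest(LOCAL_BASE_URL, "example-local-model", [
  { role: "user", content: "Classify this ticket: my invoice is wrong." },
]);
```

The resulting object can be handed to `fetch(chatRequest.url, chatRequest)`. Because only the base URL differs from a hosted setup, the same builder also works for a server or cloud endpoint.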

    Step 4

    Manage everything centrally

    Register devices, push model updates, monitor inference results and device telemetry — CPU, RAM, temperature — all from Loc.ai:Control.

    Loc.ai architecture

    Device-first execution. Control plane orchestrates. Heavy or rare jobs route to your own servers or cloud — optional.

    [Architecture diagram. On the end-user device, Your App talks over HTTP / JSON to the localhost API (OpenAI-compatible) exposed by Loc.ai:Link (runtime), which runs the local model on the device. Loc.ai:Link exchanges commands and results & telemetry with Loc.ai:Control, the control plane (dashboard / backend). Optionally, heavy or rare jobs route to Your Server (on-prem GPU) or a Cloud API (OpenAI / Anthropic / …).]

    Built on a pub/sub architecture — the standard for distributed, low-latency systems. Powered by Zenoh, an open protocol for edge and robotics workloads.
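To make the pub/sub idea concrete, here is a small in-memory sketch of the two flows in the architecture: the control plane publishing commands to a device, and the device publishing telemetry back. It is a toy topic bus written for illustration and does not use the Zenoh API. The topic names, message shapes, and field names are assumptions.

```typescript
// Telemetry fields follow the ones named on this page (CPU, RAM, temperature);
// the exact field names and units are assumptions.
interface DeviceTelemetry {
  deviceId: string;
  cpuPercent: number;
  ramMb: number;
  temperatureC: number;
}

// Hypothetical command message sent by the control plane.
interface DeviceCommand {
  deviceId: string;
  action: "deploy_model";
  model: string;
}

type Handler<T> = (message: T) => void;

// Minimal exact-match topic bus: publishers and subscribers only share a topic name.
class TopicBus {
  private handlers: { [topic: string]: Handler<unknown>[] } = {};

  subscribe<T>(topic: string, handler: Handler<T>): void {
    const list = this.handlers[topic] || (this.handlers[topic] = []);
    list.push(handler as Handler<unknown>);
  }

  // Delivers the message to every subscriber and returns how many were reached.
  publish<T>(topic: string, message: T): number {
    const list = this.handlers[topic] || [];
    list.forEach((handler) => handler(message));
    return list.length;
  }
}

const bus = new TopicBus();
const receivedTelemetry: DeviceTelemetry[] = [];
const deployedModels: string[] = [];

// Control-plane side: collect telemetry from one device.
bus.subscribe<DeviceTelemetry>("telemetry/device-1", (t) => { receivedTelemetry.push(t); });
// Device-runtime side: react to commands addressed to this device.
bus.subscribe<DeviceCommand>("commands/device-1", (c) => { deployedModels.push(c.model); });

bus.publish<DeviceCommand>("commands/device-1", {
  deviceId: "device-1",
  action: "deploy_model",
  model: "example.gguf", // placeholder file name
});
bus.publish<DeviceTelemetry>("telemetry/device-1", {
  deviceId: "device-1",
  cpuPercent: 37,
  ramMb: 5120,
  temperatureC: 61,
});
```

Neither side calls the other directly, so devices can join or leave without the publisher changing. A real deployment would replace `TopicBus` with a networked protocol such as Zenoh.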

    Call local models via the OpenAI-compatible API

    Point your existing SDK at a localhost endpoint. No code changes, no new client, no per-device setup.

    locai — zsh
    ❯ uv run main.py run
    ℹ️ Resuming from latest session: configs/session_20260428_113353.json

    Same API. Local execution.

    Your inference cost no longer scales with usage

    With local inference, repeated and high-frequency workloads run on user devices, and API usage is limited to tasks that actually require large models.

    As a result, cost per user becomes more predictable and margins don't shrink as usage grows.

    Simulate cost scenarios

    Compare local vs API-based execution costs
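A back-of-the-envelope version of that comparison can be sketched as below. The input names are hypothetical and the numbers are purely illustrative, not measurements. The model also ignores any cost of running inference on user devices or on your own servers.

```typescript
interface CostInputs {
  users: number;
  requestsPerUserPerMonth: number;
  apiCostPerRequest: number; // in your currency; illustrative only
  localShare: number;        // fraction (0..1) of requests served on device
}

// Monthly spend on paid APIs: only requests that are NOT served locally are billed.
function monthlyApiCost(inputs: CostInputs): number {
  if (inputs.localShare < 0 || inputs.localShare > 1) {
    throw new RangeError("localShare must be between 0 and 1");
  }
  const totalRequests = inputs.users * inputs.requestsPerUserPerMonth;
  return totalRequests * (1 - inputs.localShare) * inputs.apiCostPerRequest;
}

// Fraction of API spend saved compared with sending every request to the API.
function savingsFraction(inputs: CostInputs): number {
  const baseline = monthlyApiCost({
    users: inputs.users,
    requestsPerUserPerMonth: inputs.requestsPerUserPerMonth,
    apiCostPerRequest: inputs.apiCostPerRequest,
    localShare: 0,
  });
  if (baseline === 0) return 0;
  return 1 - monthlyApiCost(inputs) / baseline;
}

// Illustrative scenario: 1,000 users, 200 requests each, 75% handled on device.
const scenario: CostInputs = {
  users: 1000,
  requestsPerUserPerMonth: 200,
  apiCostPerRequest: 0.002,
  localShare: 0.75,
};
const apiOnlyCost = monthlyApiCost({
  users: 1000,
  requestsPerUserPerMonth: 200,
  apiCostPerRequest: 0.002,
  localShare: 0,
});
const hybridCost = monthlyApiCost(scenario);
const saved = savingsFraction(scenario);
```

In this simple model the saving equals the local share, so the key number to measure is how much of your real traffic a local model can actually serve.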

    ~0.4s
    time-to-first-token on device
    Up to 85%
    of workloads move off paid APIs
    60-90%
    cost reduction by offloading to user devices
    70%+
    less dev time on infrastructure and model debugging

    See Loc.ai in action

    See how to install, deploy, and run a model on a device

    Explore and build with Loc.ai

    Join builders working on AI systems where latency, cost, and data are under your control.

    FAQ