Vane (Perplexica 2.0) Quickstart With Ollama and llama.cpp

Self-hosted AI search with local LLMs

Vane is one of the more pragmatic entries in the “AI search with citations” space: a self-hosted answering engine that mixes live web retrieval with local or cloud LLMs, while keeping the whole stack under your control.

The project was originally known as Perplexica, and the rename to Vane is not cosmetic: it reflects both a branding clean-up and a steady shift away from being framed as “a clone” and toward being a general answering engine.

Because the useful part of the stack is not only the UI but also where inference and data live, this comparison of LLM hosting in 2026 pulls local, self-hosted, and cloud setups together so you can place Vane next to other runtimes and deployment choices.

This post focuses on the parts technical readers actually care about: how the system works, a minimal Docker quickstart, and how to run it with local inference via Ollama and llama.cpp (directly or through LM Studio). Along the way, each FAQ topic is answered in-context, not parked at the bottom.

What Vane is and how AI search engines work

At a high level, Vane is a Next.js application that combines a chat UI with search and citations. The core architectural pieces are also exactly what you would expect from a modern AI search engine: API routes for chat and search, orchestration that decides when to do retrieval, and a citation-aware answer writer.

When you submit a query in the UI, Vane calls POST /api/chat. Internally, the workflow is deliberately structured:

  • It classifies the question first to decide whether research is needed and which helpers should run.
  • It runs research and widgets in parallel.
  • It generates the final answer and includes citations.
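The flow above can be sketched in a few lines. This is not Vane's actual implementation, just the shape of a classify, research-in-parallel, cited-answer pipeline; all function arguments are hypothetical stand-ins.

```python
import asyncio

# Hedged sketch of the /api/chat flow: classify first, run research and
# widgets concurrently, then hand the retrieved context to the answer writer.
async def answer(query, classify, run_research, run_widgets, write_answer):
    plan = classify(query)  # step 1: does this query need retrieval at all?

    async def maybe_research():
        return await run_research(query) if plan["needs_research"] else []

    # step 2: research and widgets run in parallel
    research, widgets = await asyncio.gather(maybe_research(), run_widgets(query))

    # step 3: the answer writer sees the retrieved sources, enabling citations
    return write_answer(query, research, widgets)
```

The key design point is step 3: because the writer receives the retrieved material explicitly, citations can be attached to real sources rather than reconstructed after the fact.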

That “AI search engine” label matters, because this is not just a chat frontend. The key difference is retrieval augmented generation: instead of relying purely on the LLM’s parameters, Vane fetches external context (web results and optionally user uploads) and uses that material as the grounding substrate for the final answer. Its docs explicitly call out web lookup and “searching user uploaded files” as part of research, with embeddings used for semantic search over uploads.

Citations are not an afterthought. Vane prompts the model to cite references it used, then the UI renders those citations next to the response. In practice, this is what separates “helpful” AI search from a confident hallucination generator that just happens to have a search button.

SearxNG sits underneath the web retrieval layer for most setups. SearxNG is a free metasearch engine that aggregates results from many search services and, by design, does not track or profile users. That is a fundamentally different philosophy from paid search APIs, which usually give you a single vendor index and a commercial data contract.

Perplexica to Vane history and rename

Perplexica started as an open-source, self-hostable answering engine inspired by Perplexity AI. Several public guides still describe the project as “previously known as Perplexica” and treat Vane as the continuation rather than a hostile fork.

The rename was implemented directly in the upstream repo. In the master branch commit history, the commit titled feat(app): rename to 'vane' appears on Mar 9, 2026 (SHA 39c0f19).

The “how” is more interesting than the headline. That rename commit is not just a README tweak: it updates Docker image names from itzcrazykns1337/perplexica to itzcrazykns1337/vane, adjusts container filesystem paths from /home/perplexica to /home/vane, and updates project text and assets accordingly.

If you are wondering why open source AI projects get renamed, Vane is a textbook example of the usual drivers:

  • Name proximity to a commercial brand creates confusion (and sometimes legal risk).
  • The project scope expands beyond the original framing (from “clone” to “answering engine”).
  • Distribution artefacts need a coherent identity (Docker images, docs, UI labels).

Also, the ecosystem does not switch names overnight. Docker Hub still shows both repositories under the maintainer account, including itzcrazykns1337/vane and itzcrazykns1337/perplexica. So you will still see older blog posts, compose files, and registry references using the Perplexica naming even after the repo rebrand.

Docker quickstart and basic configuration

Vane’s official README is refreshingly direct: run a single container and you get Vane plus a bundled SearxNG search backend. The minimal Docker quickstart looks like this.

docker run -d -p 3000:3000 -v vane-data:/home/vane/data --name vane itzcrazykns1337/vane:latest

That image is positioned as the “just works” path because it includes SearxNG already, so you do not need an external search backend just to test the UI. Configuration happens in the setup screen after you open the web UI at http://localhost:3000.

If you already run SearxNG (common in homelabs), the “slim” Vane image expects you to point it at an external SearxNG instance via SEARXNG_API_URL. The README also calls out two practical SearxNG configuration requirements: JSON output must be enabled, and the Wolfram Alpha engine must be enabled.

docker run -d -p 3000:3000 \
  -e SEARXNG_API_URL=http://your-searxng-url:8080 \
  -v vane-data:/home/vane/data \
  --name vane \
  itzcrazykns1337/vane:slim-latest

Keeping Vane updated is also documented in-repo. The official update workflow is essentially: pull the latest image and restart with the same volume, which preserves your settings.

docker pull itzcrazykns1337/vane:latest
docker stop vane
docker rm vane
docker run -d -p 3000:3000 -v vane-data:/home/vane/data --name vane itzcrazykns1337/vane:latest

Once you have it running, Vane can be used as a browser search engine shortcut by pointing a custom engine at http://localhost:3000/?q=%s. That is a small feature with outsized impact if you want “AI search” to feel like search rather than an app you visit.

For automation and integration, Vane exposes an API. The docs describe GET /api/providers to discover configured providers and models, and POST /api/search to execute a search with a chosen chat model, embedding model, sources, and an optimizationMode (speed, balanced, quality).
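A minimal script-level sketch of POST /api/search follows. The field names mirror the docs' description (chatModel, embeddingModel, optimizationMode), but check your installed version for the exact schema; it may differ, and the model names below are placeholders.

```python
import json
from urllib import request

def build_search_payload(query, chat_model, embedding_model,
                         optimization_mode="balanced"):
    """Assemble the JSON body for POST /api/search (field names per the docs)."""
    return {
        "query": query,
        "chatModel": chat_model,                # e.g. {"provider": ..., "name": ...}
        "embeddingModel": embedding_model,
        "optimizationMode": optimization_mode,  # "speed" | "balanced" | "quality"
    }

def search(payload, base_url="http://localhost:3000"):
    """POST to a running Vane instance and return the parsed JSON response."""
    req = request.Request(
        f"{base_url}/api/search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_search_payload(
    "what is searxng",
    chat_model={"provider": "ollama", "name": "llama3.1:8b"},
    embedding_model={"provider": "ollama", "name": "nomic-embed-text"},
)
```

Discovering valid provider and model names first via GET /api/providers keeps the payload in sync with what your instance actually has configured.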

Local LLM setup with Ollama

Vane supports local LLMs through Ollama and cloud providers in the same UI, which is the right abstraction if you think in terms of “connections” and “models” rather than “vendors”.

This Ollama cheatsheet lists common CLI commands and quick API checks that pair well with picking models and verifying that Ollama responds before you chase Docker networking issues.

The most common issue is not model choice; it is networking. When Vane runs in Docker and Ollama runs on the host, “localhost” inside the container does not point at the host. Vane documents OS-specific base URLs for connecting to Ollama from a container.

Connectivity gotchas with Docker

Vane’s troubleshooting section explicitly recommends:

  • Windows and macOS: http://host.docker.internal:11434
  • Linux: http://<private_ip_of_host>:11434

For Linux, Vane also notes that Ollama may be bound to 127.0.0.1 by default and needs to be exposed. The README suggests setting OLLAMA_HOST=0.0.0.0:11434 in the systemd service and restarting the service.

This lines up with Ollama’s own serve environment variables, where OLLAMA_HOST controls the server bind address and defaults to 127.0.0.1:11434.
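The OS-specific rules above can be captured in a small helper, useful if you script your homelab setup. This mirrors the README guidance rather than anything Vane ships; the system argument stands in for platform.system() output.

```python
def ollama_base_url(system, host_ip=None, port=11434):
    """Return the Ollama base URL a Dockerised Vane should use, per OS."""
    if system in ("Windows", "Darwin"):
        # Docker Desktop resolves this alias to the host machine
        return f"http://host.docker.internal:{port}"
    if system == "Linux":
        if host_ip is None:
            raise ValueError("on Linux, pass the host's private IP")
        # Remember: Ollama must bind 0.0.0.0, not 127.0.0.1, to be reachable
        return f"http://{host_ip}:{port}"
    raise ValueError(f"unsupported system: {system}")
```

On Linux the function deliberately refuses to guess the host IP, since "which interface is the Docker bridge facing" is exactly the thing worth being explicit about.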

Keep models warm and pick models

If you run local inference, you will feel cold starts. Ollama has two related mechanisms for keeping models loaded:

  • OLLAMA_KEEP_ALIVE as a server setting.
  • keep_alive as a per-request parameter for /api/generate and /api/chat, which overrides the server default.
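A minimal sketch of the per-request override, assuming Ollama's documented /api/generate body shape (keep_alive accepts duration strings like "30m", or -1 to keep the model loaded until the server stops):

```python
import json

def generate_body(model, prompt, keep_alive="30m"):
    """Build a JSON body for Ollama's POST /api/generate with keep_alive set.

    The per-request keep_alive overrides the server-wide OLLAMA_KEEP_ALIVE
    default for this model.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    })
```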

Vane added its own keep_alive support for Ollama models (so the app can influence how long a model stays in memory). That feature appears in Vane’s v1.10.0 release notes.

Model selection is the part that gets overcomplicated on the internet. For Vane-style work, the most practical split is:

  • A chat model that is instruct tuned (for summarisation and synthesis).
  • An embedding model for similarity search over uploads and retrieved text. Vane’s API docs show that the search request explicitly chooses both a chat model and an embedding model.

Ollama itself supports embeddings workflows, and even the CLI docs include an example using nomic-embed-text for embeddings.
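For completeness, a hedged sketch of an Ollama embeddings request body. Older builds expose POST /api/embeddings with a "prompt" field, while newer ones also offer /api/embed with "input"; check the API docs for your version before relying on either shape.

```python
import json

def embeddings_body(text, model="nomic-embed-text"):
    """Build a JSON body for Ollama's legacy POST /api/embeddings endpoint."""
    return json.dumps({"model": model, "prompt": text})
```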

This is also the answer to the FAQ about running AI search locally without cloud APIs: with Vane in Docker, SearxNG local, and Ollama on your hardware, you can keep both your search queries and your private document uploads within your own network boundary. (If you decide to connect to a cloud provider instead, the connection obviously changes the data path.)

Local LLM setup with llama.cpp

There are two realistic ways to pair Vane with llama.cpp:

  • Use LM Studio as the server layer (and let Vane talk to it).
  • Run llama.cpp’s own HTTP server (llama-server) and connect via an OpenAI-compatible endpoint.

Vane explicitly supports “Local OpenAI-API-Compliant Servers” and calls out the usual requirements: bind to 0.0.0.0 rather than 127.0.0.1, use the correct port, set a model name that exists on the server, and do not leave the API key field empty even if the server does not enforce auth.
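Concretely, "OpenAI-API-compliant" means the client sends a body like the following to <base_url>/v1/chat/completions. The model name here is a placeholder and must match one the local server actually serves:

```python
import json

def chat_completions_body(model, user_message):
    """Build an OpenAI-style chat completions request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })

# A non-empty Authorization header is sent even if the local server ignores
# auth, which is why Vane asks you not to leave the API key field blank.
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer local-key",  # placeholder value
}
```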

LM Studio is relevant here because it sits on top of local backends (often llama.cpp) while exposing an OpenAI-compatible API. Vane v1.12.1 specifically notes adding an LM Studio provider.

LM Studio’s docs list the supported OpenAI-compatible endpoints and show a base URL example using http://localhost:1234/v1 (assuming port 1234). That matters because, from Vane’s perspective, it is “just another OpenAI-style server”.

If you prefer to run llama.cpp directly, llama.cpp Quickstart with CLI and Server covers install, llama-cli, and llama-server. The official llama.cpp HTTP server supports OpenAI API compatible chat completions, responses, and embeddings routes, along with a long list of server features (batching, monitoring, tool use).

Even if you do not memorise the flags, the important parts are:

  • The server exists and is actively documented.
  • The API surface is compatible enough that OpenAI-style clients can talk to it, which is exactly what Vane needs for its “OpenAI compatible” connection pattern.

What shipped recently and what is changing now

If you want to understand what Vane became over the last year, follow the release notes and the master branch history rather than the hype.

As of Apr 10, 2026 (Australia/Melbourne), the latest tagged GitHub release visible on the releases page is v1.12.1 (Dec 31, 2025). That release notes adding an LM Studio provider and fixes around function calling with OpenAI-compatible providers and JSON parsing.

The preceding releases outline the bigger shifts:

  • v1.11.0 (Oct 21, 2025) introduced a new setup wizard and a redesigned configuration system, along with broader provider support and a single-command Docker installation path. It also mentions dynamic model fetching and various UI and developer experience improvements.
  • v1.12.0 (Dec 27, 2025) is an architectural reset: it removes LangChain in favour of a custom implementation for streaming, generation, and provider-specific behaviour. It also renames “providers” to “connections”, adds UI and code rendering improvements, and moves more capability into the project’s own abstractions (including improved function calling versus previous parsing approaches).
  • Earlier, v1.10.0 (Mar 20, 2025) added file uploads (PDF, TXT, DOCX), added an Ollama keep_alive parameter, added a meta search agent class to improve maintainability and focus mode creation, and added automatic image and video search functionality.

On the branding side, the rename to Vane landed on Mar 9, 2026 in master (feat(app): rename to 'vane'), updating both the codebase naming and Docker artefacts.

And the project did not stop evolving after the Dec 2025 release. Master branch commits on Apr 8-9, 2026 include work described as “updated deep research mode, context management” and new search execution and scraping-related changes. In other words, the “AI search engine” part is still being actively iterated, not frozen behind release tags.
