Ollama Setup Guide - Run AI Offline with Rephlo
Set up Ollama for 100% offline AI with Rephlo. Run Llama 4, Mistral, and other models locally on your Windows PC or Mac. Complete privacy, zero cost.
Difficulty: intermediate. Reading time: 6 minutes.
What is Ollama and why use it with Rephlo?
Ollama is a tool that runs open-source AI models like Llama 4 and Mistral directly on your computer. When used with Rephlo, it enables 100% offline AI processing — your data never leaves your machine. This makes it ideal for HIPAA- and GDPR-regulated work and for air-gapped environments.
Ollama is free and works on Windows, macOS, and Linux. It downloads and manages AI models locally, exposing them through a local API that Rephlo connects to. No internet connection is needed after initial model download. This is perfect for lawyers, doctors, financial advisors, and anyone handling sensitive data who cannot use cloud AI services.
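To make the "local API" concrete, here is a minimal sketch of what a client request to Ollama looks like, assuming Ollama's default address (http://localhost:11434) and its standard /api/generate endpoint. The request is only built here, not sent, since Ollama must be running to answer it:

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # Ollama's default local API address

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a request to Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama3.2", "Summarize this note in one sentence.")
print(req.full_url)  # http://localhost:11434/api/generate
```

Rephlo sends requests of this shape for you; the point is that everything stays on localhost — no data crosses the network.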
How do I install and configure Ollama for Rephlo?
Download Ollama from ollama.ai, run the installer, then pull a model with "ollama pull llama3.2" in your terminal. In Rephlo, add a new Ollama provider with the default base URL http://localhost:11434. No API key is needed.
Ollama runs as a background service on your machine. We recommend llama3.2 (3B) for general use or mistral for balanced performance. For coding tasks, try deepseek-coder or qwen3. Models range from about 2 GB to over 40 GB in size depending on parameter count. For optimal performance, 16 GB RAM and a GPU are recommended.
- Install Ollama: Download the Ollama installer from ollama.ai and run it. Ollama installs as a system service.
- Download a model: Open a terminal and run "ollama pull llama3.2" to download the recommended general-purpose model (about 2 GB).
- Configure in Rephlo: In Rephlo, go to Settings > Providers > Add Provider > Ollama. The default base URL is http://localhost:11434. No API key needed.
- Test the connection: Click "Test Connection" in Rephlo. If Ollama is running, you will see available models listed.
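The "Test Connection" step can also be verified from a script. Here is a minimal sketch using Ollama's standard /api/tags endpoint, which lists installed models; the function name and structure are illustrative, not part of Rephlo or Ollama:

```python
import json
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            models = json.load(resp).get("models", [])
            print(f"Ollama is running with {len(models)} model(s) installed.")
            return True
    except (urllib.error.URLError, OSError):
        return False

if not ollama_reachable():
    print("Ollama is not reachable - is the service running?")
```

If this prints the "not reachable" message, start the Ollama app (or service) and try again before testing in Rephlo.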
What are the system requirements for running local AI models?
Minimum: 8 GB RAM for small models (7B parameters). Recommended: 16 GB RAM and a GPU for larger models (13B-70B parameters). Models require roughly 2-40 GB of disk space depending on size. An NVIDIA or AMD GPU significantly improves inference speed.
Small models like Llama 3.2 (3B) run well on 8 GB RAM without a GPU, producing responses in 5-15 seconds. Medium models (13B) need 16 GB RAM. Large models (70B) require 32+ GB RAM or GPU offloading. Apple Silicon Macs and NVIDIA RTX GPUs provide the best local AI performance.
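These numbers follow from a rough rule of thumb (an approximation that ignores context-window and runtime overhead): a model's memory footprint is its parameter count times bytes per parameter, and the common 4-bit quantization uses about half a byte per parameter:

```python
def model_size_gb(params_billions: float, bits_per_param: int = 4) -> float:
    """Approximate model weight size in GB: parameters x bytes per parameter."""
    bytes_total = params_billions * 1e9 * (bits_per_param / 8)
    return bytes_total / 1e9

# A 7B model at 4-bit quantization needs roughly 3.5 GB just for weights,
# which is why it fits on an 8 GB machine; a 70B model needs about 35 GB.
print(round(model_size_gb(7), 1))   # 3.5
print(round(model_size_gb(70), 1))  # 35.0
```

Budget extra headroom beyond this estimate for your operating system and the model's context window.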
Frequently Asked Questions
Is Ollama completely free?
Yes, Ollama is free and open-source. The AI models it runs are also free. There are no subscription fees or usage costs.
Can I use Ollama without an internet connection?
Yes, after downloading your models, Ollama works completely offline. No internet connection is needed for AI processing.