Warning: This documentation is still a work in progress. Some details may be out of date depending on the version of Glyph you are using, but it is being actively reviewed and improved.

Ollama Configuration

Run AI models locally with Ollama in Glyph

Run open-source AI models locally on your machine using Ollama. No API keys, no cloud dependencies, complete privacy.

Prerequisites

  • Ollama installed: ollama.ai
  • Sufficient disk space for models (2-10GB per model)
  • Adequate RAM (8GB minimum, 16GB+ recommended)

Setup

Install Ollama

Download and install Ollama from ollama.ai.

macOS:

brew install ollama
ollama serve

Linux:

curl -fsSL https://ollama.ai/install.sh | sh
ollama serve

Windows: download the installer from ollama.ai/download and run it.

Pull a Model

Download a model from Ollama’s library:

# Fast, capable model
ollama pull llama3.2

# Larger, more capable
ollama pull llama3.1:70b

# Code-focused
ollama pull codellama

# Lightweight
ollama pull phi3

View all models at ollama.ai/library.

Verify Ollama is Running

Check that Ollama is running on localhost:11434:

curl http://localhost:11434/v1/models

You should see a JSON response with your installed models.
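For reference, `/v1/models` returns an OpenAI-style list object. This Python sketch parses a sample payload with the shape Ollama returns (the model names here are illustrative, not from a live server):

```python
import json

# Illustrative sample of the JSON shape returned by GET /v1/models.
sample = """
{
  "object": "list",
  "data": [
    {"id": "llama3.2:latest", "object": "model", "owned_by": "library"},
    {"id": "phi3:latest", "object": "model", "owned_by": "library"}
  ]
}
"""

# Each installed model appears as an entry in "data" with its name in "id".
models = [m["id"] for m in json.loads(sample)["data"]]
print(models)  # ['llama3.2:latest', 'phi3:latest']
```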

Open Glyph AI Settings

Go to Settings → AI and select the Ollama profile.

Configure Base URL

The default base URL is http://localhost:11434/v1. If Ollama runs on a different host or port, update the base URL.

Allow Private Hosts is enabled by default for Ollama.

Select Model

Click the Model dropdown. Glyph fetches models from Ollama’s local API.

Select your downloaded model (e.g., llama3.2, llama3.1:70b).

Test Connection

Open the AI panel and send a test message. You should receive a response from your local model.
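Under the hood, clients like Glyph talk to Ollama through its OpenAI-compatible chat endpoint. A minimal request resembles the following sketch; it builds the request without sending it, since sending requires Ollama to be running (the model name is an example):

```python
import json
import urllib.request

# Build (but do not send) a chat-completion request against Ollama's
# OpenAI-compatible API.
url = "http://localhost:11434/v1/chat/completions"
payload = {
    "model": "llama3.2",  # example model; use one you've pulled
    "messages": [{"role": "user", "content": "Say hello."}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# To actually send it (requires Ollama running):
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
#       print(reply["choices"][0]["message"]["content"])
print(req.full_url, req.get_method())
```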

Configuration

Provider Settings

  • Service: ollama
  • Base URL: http://localhost:11434/v1 (default)
  • Authentication: None (local API)
  • Allow Private Hosts: Enabled (required for localhost)

Custom Port

If Ollama runs on a different port:

Base URL: http://localhost:8080/v1

Remote Ollama Server

To connect to Ollama on another machine:

Base URL: http://192.168.1.100:11434/v1

Ensure Allow Private Hosts is enabled.

Model Selection

Glyph uses Ollama’s OpenAI-compatible /v1/models endpoint to list models.

| Model        | Size | Use Case             | RAM Required |
|--------------|------|----------------------|--------------|
| llama3.2     | 3B   | Fast, everyday tasks | 8GB          |
| llama3.1     | 8B   | General purpose      | 8GB          |
| llama3.1:70b | 70B  | Most capable         | 32GB+        |
| mistral      | 7B   | Balanced performance | 8GB          |
| codellama    | 7B   | Code generation      | 8GB          |
| phi3         | 3.8B | Lightweight          | 4GB          |
| gemma2       | 9B   | Google’s open model  | 8GB          |

Explore all models at ollama.ai/library.

Model Tags

Ollama models use tags for variants:

  • llama3.1:latest - Latest stable version
  • llama3.1:70b - 70 billion parameter variant
  • llama3.1:8b-q4_0 - 4-bit quantized (smaller, faster)
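The tag is the part after the colon; if it is omitted, Ollama assumes `latest`. A small sketch of how a model reference can be split apart (illustrative helper, not part of any API):

```python
def split_model_tag(name: str) -> tuple[str, str]:
    """Split an Ollama model reference into (model, tag); tag defaults to 'latest'."""
    model, _, tag = name.partition(":")
    return model, tag or "latest"

print(split_model_tag("llama3.1:70b"))  # ('llama3.1', '70b')
print(split_model_tag("llama3.2"))      # ('llama3.2', 'latest')
```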

Features

Chat Mode

Conversational interaction:

  • Back-and-forth dialogue
  • No file system access
  • Fast local inference
  • Best for brainstorming and Q&A

Create Mode

Local AI with workspace tools:

  • read_file - Read files from your space
  • search_notes - Search note content
  • list_dir - List directory contents
  • Tool usage tracked in timeline view
  • Best for research and knowledge retrieval

Context Attachment

Attach notes for grounded responses:

  • Attach files or folders via context menu
  • Mention with @filename syntax
  • Configure character budget (up to 250K chars)
  • Context sent locally, never leaves your machine
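The character budget caps how much attached text is included in the request. As a rough sketch of the idea (this helper is illustrative, not Glyph’s actual implementation):

```python
def apply_char_budget(chunks: list[str], budget: int = 250_000) -> str:
    """Concatenate attached note contents, truncating at the character budget."""
    out: list[str] = []
    used = 0
    for chunk in chunks:
        remaining = budget - used
        if remaining <= 0:
            break  # budget exhausted; drop the rest
        out.append(chunk[:remaining])
        used += len(chunk[:remaining])
    return "".join(out)

# Two 10-char notes against a 15-char budget: the second note is truncated.
context = apply_char_budget(["a" * 10, "b" * 10], budget=15)
print(len(context))  # 15
```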

Performance

Inference Speed

Local inference speed depends on:

  • Model size: Smaller models (3B-8B) are faster
  • Hardware: GPU acceleration significantly improves speed
  • Context length: Longer contexts increase latency

GPU Acceleration

Ollama automatically uses GPU if available:

  • NVIDIA: CUDA support
  • AMD: ROCm support
  • Apple Silicon: Metal acceleration

Check GPU usage:

ollama ps

Context Window

Ollama models have varying context windows:

  • llama3.1: 128K tokens
  • mistral: 32K tokens
  • codellama: 16K tokens

Larger contexts increase memory usage and latency.
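As a rule of thumb, one token is roughly four characters of English text, so you can estimate whether attached context will fit a model’s window. This is a heuristic only; real tokenizers vary by model:

```python
# Approximate context-window sizes in tokens, from the list above.
CONTEXT_WINDOWS = {"llama3.1": 128_000, "mistral": 32_000, "codellama": 16_000}

def fits(model: str, text: str, chars_per_token: int = 4) -> bool:
    """Heuristic check: does this text roughly fit the model's context window?"""
    approx_tokens = len(text) // chars_per_token
    return approx_tokens <= CONTEXT_WINDOWS[model]

print(fits("codellama", "x" * 100_000))  # ~25K tokens > 16K window -> False
print(fits("llama3.1", "x" * 100_000))   # ~25K tokens, fits 128K -> True
```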

Privacy and Security

Ollama runs entirely on your machine:

  • ✅ No data sent to external servers
  • ✅ No API keys required
  • ✅ Complete privacy for sensitive notes
  • ✅ Works offline
  • ✅ No usage limits or billing

Note

Ollama is ideal for private notes, confidential documents, or offline environments.

Troubleshooting

“model list failed”

Cause: Glyph can’t connect to Ollama.

Solution:

  1. Verify Ollama is running: ollama ps
  2. Check the base URL in settings
  3. Ensure Allow Private Hosts is enabled
  4. Test connection: curl http://localhost:11434/v1/models
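The steps above can be folded into a small reachability check, sketched here with only the Python standard library (an unreachable server surfaces as a URLError):

```python
import urllib.error
import urllib.request

def ollama_reachable(base_url: str = "http://localhost:11434/v1",
                     timeout: float = 2.0) -> bool:
    """Return True if the Ollama model-list endpoint answers, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

# Port 1 is almost certainly not serving HTTP, so this reports False.
print(ollama_reachable("http://localhost:1/v1"))
```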

Model not in dropdown

Solution: Type the model name manually (e.g., llama3.2, codellama).

“connection refused”

Cause: Ollama is not running.

Solution: Start Ollama:

ollama serve

Responses are very slow

Possible causes:

  • Large model (70B+) without sufficient RAM
  • No GPU acceleration
  • Long context

Solutions:

  • Use a smaller model (llama3.2, phi3)
  • Enable GPU acceleration (automatic if hardware supports it)
  • Reduce context size
  • Close other memory-intensive applications

“out of memory”

Cause: Model is too large for available RAM.

Solution:

  • Use a smaller model
  • Use quantized variants (e.g., llama3.1:8b-q4_0)
  • Close other applications
  • Increase system swap space

Tool calls fail in create mode

Cause: Some Ollama models don’t support function calling well.

Solution: Use chat mode instead, or try a different model. llama3.1 has good tool support.

Advanced Configuration

Custom Ollama Endpoint

If you run Ollama with custom settings:

OLLAMA_HOST=0.0.0.0:8080 ollama serve

Update base URL in Glyph:

Base URL: http://localhost:8080/v1

Model Parameters

To adjust model parameters (temperature, top_p, etc.), you’ll need to modify Glyph’s source code or use a different provider; the OpenAI-compatible provider exposes more options.
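Outside Glyph, Ollama’s OpenAI-compatible endpoint does accept sampling parameters such as `temperature` and `top_p` in the request body. A sketch of such a body, built locally rather than sent (the model name is an example):

```python
import json

# Request body for POST /v1/chat/completions with explicit sampling parameters.
payload = {
    "model": "llama3.2",  # example model name
    "messages": [{"role": "user", "content": "Summarize this note."}],
    "temperature": 0.2,   # lower = more deterministic output
    "top_p": 0.9,         # nucleus sampling cutoff
}
print(json.dumps(payload, indent=2))
```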

Multiple Ollama Instances

Run multiple Ollama instances on different ports and create separate profiles in Glyph for each.

Next Steps