
🦙 Ollama


Use Ollama for

  • Running large language models on local hardware
  • Hosting multiple models
  • Dynamically loading models on request

1. Install Ollama

macOS, Linux, and Windows Install

Ollama supports GPU acceleration on NVIDIA, AMD, and Apple Metal hardware. Follow the instructions at Ollama Download.
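On Linux, for example, the official install script from ollama.com can be used; this is a sketch, so check the download page for the installer appropriate to your platform:

```bash
# Linux install via the official script (macOS and Windows use installers from the download page)
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the install succeeded
ollama --version
```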

Docker Install

See docker-compose.override.yml.example for an example configuration of Ollama in a Docker environment.
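As a rough illustration of what that override might contain (take the exact service definition from the bundled example file; the NVIDIA device reservation shown here assumes the NVIDIA Container Toolkit is installed):

```yaml
# docker-compose.override.yml: illustrative sketch, not the bundled example verbatim
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"          # Ollama's default API port
    volumes:
      - ./ollama:/root/.ollama # persist downloaded models on the host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia   # assumes NVIDIA Container Toolkit; omit for CPU-only
              count: all
              capabilities: [gpu]
```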

Run docker exec -it ollama /bin/bash to access the ollama command-line tool inside the container.
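For instance (assuming the container is named ollama, as in the example compose file):

```bash
# Open a shell in the running container
docker exec -it ollama /bin/bash

# Inside the container, the ollama CLI is available, e.g.:
ollama list   # show models already downloaded
```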

2. Load Models in Ollama

  1. Browse the available models at the Ollama Library.
  2. Copy the command from the Tags tab of the model's library page and paste it into your terminal. It should begin with ollama run (see the example after this list).
  3. Check the model size; models that fit entirely in GPU memory perform best.
  4. Use /bye to exit the interactive session.
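For example, pulling and running a model might look like this (llama3 is only an illustrative model name):

```bash
# Download the model (if needed) and start an interactive chat session
ollama run llama3

# At the interactive prompt, type /bye to exit:
# >>> /bye
```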

3. Configure LibreChat

Use the librechat.yaml configuration file (guide here) to add Ollama as a separate endpoint.
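A minimal sketch of such an endpoint entry is shown below. The baseURL assumes Ollama's default port (11434) and host.docker.internal for a Docker setup, and the model name is illustrative; adjust both to match your installation and the linked guide:

```yaml
# librechat.yaml: illustrative custom endpoint for Ollama
endpoints:
  custom:
    - name: "Ollama"
      apiKey: "ollama"                                # Ollama ignores the key, but the field is required
      baseURL: "http://host.docker.internal:11434/v1" # adjust host/port to your setup
      models:
        default: ["llama3"]   # example model name
        fetch: true           # query Ollama for its installed models
      titleConvo: true
      titleModel: "current_model"
```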