# 🦙 Ollama
Use Ollama for:
- Running large language models on local hardware
- Hosting multiple models
- Dynamically loading the model upon request
## 1. Install Ollama
### Mac, Linux, Windows Install
Ollama supports GPU acceleration on NVIDIA, AMD, and Apple Metal. Follow the instructions at [Ollama Download](https://ollama.com/download).
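On Linux, for example, the published install script can be run in one line:

```sh
# Downloads and runs Ollama's official Linux install script
curl -fsSL https://ollama.com/install.sh | sh
```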
### Docker Install
Reference `docker-compose.override.yml.example` for configuring Ollama in a Docker environment.
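A minimal override might look like the sketch below; the service name, image tag, port mapping, and volume path are illustrative and should be adapted from the shipped example file:

```yaml
# Hypothetical excerpt of docker-compose.override.yml
services:
  ollama:
    image: ollama/ollama:latest   # official Ollama image
    ports:
      - "11434:11434"             # Ollama's default API port
    volumes:
      - ./ollama:/root/.ollama    # persist pulled models on the host
```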
Run the following command to open a shell inside the Ollama container, where the `ollama` CLI is available:

```sh
docker exec -it ollama /bin/bash
```
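Once inside, you can sanity-check the CLI; for example, `ollama list` shows the models currently available locally:

```sh
# Inside the container: show locally available models
ollama list
```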
## 2. Load Models in Ollama
- Browse the available models at the [Ollama Library](https://ollama.com/library).
- Copy the command from the **Tags** tab of a model's library page and paste it into the terminal. It should begin with `ollama run` (see the example after this list).
- Check the model size first: models that fit entirely in GPU memory perform best.
- Use `/bye` to exit the interactive session.
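For example, pulling and chatting with a model (here `mistral`, an arbitrary choice from the library) looks like this:

```sh
# First run downloads the model weights, then drops into an interactive chat;
# type /bye at the prompt to return to your shell.
ollama run mistral
```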
## 3. Configure LibreChat
Use the `librechat.yaml` configuration file (guide here) to add Ollama as a separate endpoint.
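A minimal sketch of such an endpoint entry, assuming LibreChat's custom-endpoint schema; the endpoint name, `baseURL`, and model list below are placeholders to adapt to your setup:

```yaml
# librechat.yaml -- hypothetical Ollama endpoint entry
endpoints:
  custom:
    - name: "Ollama"
      apiKey: "ollama"                       # Ollama ignores the key, but the field must be set
      baseURL: "http://localhost:11434/v1/"  # Ollama's OpenAI-compatible API
      models:
        default: ["mistral"]                 # example model; list ones you have pulled
        fetch: true                          # query Ollama for installed models
```

If LibreChat itself runs in Docker, `host.docker.internal` typically replaces `localhost` in the `baseURL`.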