How to Set Up Local LLMs with Ollama
About Ollama
Ollama is a streamlined tool for running open-source LLMs locally.
Prerequisites
You should have at least 8 GB of RAM available to run the 7B models, 16 GB for the 13B models, and 32 GB for the 33B models. A GPU is not strictly required (Ollama can fall back to the CPU), but a supported GPU makes inference considerably faster.
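As a quick sanity check before pulling a model, you can confirm how much memory is free and whether a GPU is visible (Linux commands shown as an example; adjust for your OS):
# Check total and available RAM
free -h
# Optional: confirm the driver sees an NVIDIA GPU
nvidia-smi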
Download and Install Ollama
Download and install Ollama by following the official guide at ollama.com/download.
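On Linux, for example, Ollama publishes a one-line install script; after installing you can verify that the binary is on your PATH:
# Linux: install via the official script
curl -fsSL https://ollama.com/install.sh | sh
# Verify the installation
ollama --version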
Install Ollama Models
Ollama supports a list of models available at ollama.com/library.
Here are some example models that can be downloaded (see the pull example after the table):
Model | Parameters | Size | Download
---|---|---|---
Llama 3.1 | 8B | 4.7GB | ollama run llama3.1
Llama 3.1 | 70B | 40GB | ollama run llama3.1:70b
Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b
Phi 3 Mini | 3.8B | 2.3GB | ollama run phi3
Phi 3 Medium | 14B | 7.9GB | ollama run phi3:medium
Gemma 2 | 2B | 1.6GB | ollama run gemma2:2b
Gemma 2 | 9B | 5.5GB | ollama run gemma2
Gemma 2 | 27B | 16GB | ollama run gemma2:27b
Mistral | 7B | 4.1GB | ollama run mistral
Moondream 2 | 1.4B | 829MB | ollama run moondream
Neural Chat | 7B | 4.1GB | ollama run neural-chat
Starling | 7B | 4.1GB | ollama run starling-lm
Code Llama | 7B | 3.8GB | ollama run codellama
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored
LLaVA | 7B | 4.5GB | ollama run llava
Solar | 10.7B | 6.1GB | ollama run solar
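The tag after the model name selects the parameter size. For example, to fetch and run the 2B build of Gemma 2 from the table above:
# Pull a specific size via its tag
ollama pull gemma2:2b
# Start an interactive chat with it
ollama run gemma2:2b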
Ollama CLI
# List installed models
ollama list
# Download (pull) a model
ollama pull llama3.1:8b
# Run a model interactively from the CLI
ollama run llama3.1:8b
# Show model information
ollama show llama3.1
# Remove a model
ollama rm llama3.1
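Ollama also serves a local REST API on port 11434 while the server is running. A minimal sketch with curl (llama3.1:8b is just an example; use any model you have already pulled):
# Start the server if it is not already running as a background service
ollama serve
# In another terminal: one-shot generation against the local API
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Why is the sky blue?", "stream": false}'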
Tips
Some useful Ollama environment variables
The following environment variables may need to be set (shown here in systemd Environment= syntax):
# Listen on 0.0.0.0 instead of 127.0.0.1
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Maximum number of models kept loaded concurrently
Environment="OLLAMA_MAX_LOADED_MODELS=4"
# Keep models loaded in memory indefinitely
Environment="OLLAMA_KEEP_ALIVE=-1"