About Ollama
Ollama is a streamlined tool for running open-source LLMs locally.
Prerequisites
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. A GPU is not strictly required, but it is strongly recommended for acceptable inference speed; without one, Ollama falls back to the CPU.
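As a quick sanity check on Linux, you can verify available memory and GPU visibility before pulling a large model (a minimal sketch; `nvidia-smi` assumes an NVIDIA GPU with drivers installed):

```bash
# Check total and available RAM
free -h

# Check that the GPU is visible (NVIDIA only; skip on CPU-only machines)
nvidia-smi
```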
Download and Install Ollama
See the official download guide at ollama.com/download to download and install Ollama.
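On Linux, installation is a single command (this is the script from the official download page; macOS and Windows use packaged installers instead):

```bash
# Download and run the official install script (Linux)
curl -fsSL https://ollama.com/install.sh | sh
```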
Install Ollama Models
Ollama supports a wide range of models; the full list is available at ollama.com/library.
Here are some example models that can be downloaded:
Model | Parameters | Size | Download
---|---|---|---
Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1`
Llama 3.1 | 70B | 40GB | `ollama run llama3.1:70b`
Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b`
Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3`
Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium`
Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b`
Gemma 2 | 9B | 5.5GB | `ollama run gemma2`
Gemma 2 | 27B | 16GB | `ollama run gemma2:27b`
Mistral | 7B | 4.1GB | `ollama run mistral`
Moondream 2 | 1.4B | 829MB | `ollama run moondream`
Neural Chat | 7B | 4.1GB | `ollama run neural-chat`
Starling | 7B | 4.1GB | `ollama run starling-lm`
Code Llama | 7B | 3.8GB | `ollama run codellama`
Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored`
LLaVA | 7B | 4.5GB | `ollama run llava`
Solar | 10.7B | 6.1GB | `ollama run solar`
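For example, downloading and querying the smallest Llama 3.1 model from the table above looks like this (the one-shot prompt form prints a single reply and exits):

```bash
# Download the model (about 4.7GB) without starting a chat session
ollama pull llama3.1:8b

# Ask a one-shot question; omit the quoted prompt to start an interactive chat
ollama run llama3.1:8b "Explain what a context window is in one sentence."
```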
Ollama CLI
```bash
# List installed models
ollama list

# Install a model
ollama pull llama3.1:8b

# Run a model from the CLI
ollama run llama3.1:8b

# Show model information
ollama show llama3.1

# Remove a model
ollama rm llama3.1
```
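Besides the CLI, a running Ollama instance also serves an HTTP API on port 11434; for example, the `/api/generate` endpoint returns completions (shown here with `"stream": false` so curl prints a single JSON object):

```bash
# Generate a completion over the local HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```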
Tips
Ollama reads its configuration from environment variables. When Ollama runs as a systemd service (the default on Linux), the following variables may need to be set in the service's `[Service]` section:
```ini
# Listen on 0.0.0.0 instead of 127.0.0.1 (exposes the API to the network)
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Maximum number of models loaded concurrently
Environment="OLLAMA_MAX_LOADED_MODELS=4"

# Keep models loaded in memory indefinitely (-1 disables the idle unload)
Environment="OLLAMA_KEEP_ALIVE=-1"
```