About Ollama

Ollama is a streamlined tool for running open-source LLMs locally.

Prerequisite

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. A GPU is not strictly required, but one is recommended for faster inference.

Download and Install Ollama

Please see the Guide to download and install Ollama.
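After installation, you can confirm that the CLI is available on your PATH:

```shell
# Print the installed Ollama version; if this fails, the install
# did not complete or your shell's PATH needs to be refreshed.
ollama --version
```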

Install Ollama Models

Ollama supports the models available at ollama.com/library.

Here are some example models that can be downloaded:

| Model | Parameters | Size | Download |
|-------|------------|------|----------|
| Llama 3.1 | 8B | 4.7GB | `ollama pull llama3.1` |
| Llama 3.1 | 70B | 40GB | `ollama pull llama3.1:70b` |
| Llama 3.1 | 405B | 231GB | `ollama pull llama3.1:405b` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama pull phi3` |
| Phi 3 Medium | 14B | 7.9GB | `ollama pull ph3:medium` |
| Gemma 2 | 2B | 1.6GB | `ollama pull gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama pull gemma2` |
| Gemma 2 | 27B | 16GB | `ollama pull gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama pull mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama pull moondream` |
| Neural Chat | 7B | 4.1GB | `ollama pull neural-chat` |
| Starling | 7B | 4.1GB | `ollama pull starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama pull codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama pull llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama pull llava` |
| Solar | 10.7B | 6.1GB | `ollama pull solar` |

Ollama CLI

```bash
# List installed models
ollama list

# Install a model
ollama pull llama3.1:8b

# Run a model with the CLI
ollama run llama3.1:8b

# Show model information
ollama show llama3.1

# Remove a model
ollama rm llama3.1
```
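Besides the CLI, a running Ollama server exposes a local REST API on port 11434. A minimal sketch of a non-streaming completion request (the model name and prompt are illustrative, and the model must already be pulled):

```shell
# Ask the local Ollama server for a single completion.
# "stream": false returns one JSON object instead of a token stream.
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

The generated text is in the `response` field of the returned JSON.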

Tips

Some useful Ollama environment variables

You may need to set the following environment variables (shown here in systemd `Environment=` syntax):

```bash
# Listen on 0.0.0.0 instead of 127.0.0.1
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Maximum number of models loaded concurrently
Environment="OLLAMA_MAX_LOADED_MODELS=4"

# Keep a model loaded in memory indefinitely
Environment="OLLAMA_KEEP_ALIVE=-1"
```
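On a Linux install managed by systemd, these `Environment=` lines can be applied through a service override file rather than by editing the unit directly. A sketch of that workflow:

```shell
# Open an override file for the Ollama service; add the
# Environment= lines under a [Service] section and save.
sudo systemctl edit ollama.service

# Reload systemd and restart Ollama so the new variables take effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

On macOS, the same variables can instead be set with `launchctl setenv` before starting the Ollama app.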