How to Set Up Local LLMs with Ollama

About Ollama

Ollama is a streamlined tool for running open-source LLMs locally.

Prerequisites

You should have at least 8 GB of RAM to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models. A GPU is not strictly required, but one will significantly speed up inference.
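Before pulling a large model, it is worth checking how much memory the machine actually has. On Linux, for example:

```shell
# Show total and available RAM in human-readable units
free -h

# If an NVIDIA GPU is present, check its VRAM as well
nvidia-smi
```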

Download and Install Ollama

See the official guide at ollama.com/download to download and install Ollama for macOS, Windows, or Linux.
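On Linux, you can also use the official one-line install script published on ollama.com instead of the manual download:

```shell
# Download and run the official Ollama install script
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the install worked
ollama --version
```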

Install Ollama Models

Ollama supports a library of models, available at ollama.com/library.

Here are some example models that can be downloaded:

| Model              | Parameters | Size  | Download                      |
| ------------------ | ---------- | ----- | ----------------------------- |
| Llama 3.1          | 8B         | 4.7GB | ollama pull llama3.1          |
| Llama 3.1          | 70B        | 40GB  | ollama pull llama3.1:70b      |
| Llama 3.1          | 405B       | 231GB | ollama pull llama3.1:405b     |
| Phi 3 Mini         | 3.8B       | 2.3GB | ollama pull phi3              |
| Phi 3 Medium       | 14B        | 7.9GB | ollama pull phi3:medium       |
| Gemma 2            | 2B         | 1.6GB | ollama pull gemma2:2b         |
| Gemma 2            | 9B         | 5.5GB | ollama pull gemma2            |
| Gemma 2            | 27B        | 16GB  | ollama pull gemma2:27b        |
| Mistral            | 7B         | 4.1GB | ollama pull mistral           |
| Moondream 2        | 1.4B       | 829MB | ollama pull moondream         |
| Neural Chat        | 7B         | 4.1GB | ollama pull neural-chat       |
| Starling           | 7B         | 4.1GB | ollama pull starling-lm       |
| Code Llama         | 7B         | 3.8GB | ollama pull codellama         |
| Llama 2 Uncensored | 7B         | 3.8GB | ollama pull llama2-uncensored |
| LLaVA              | 7B         | 4.5GB | ollama pull llava             |
| Solar              | 10.7B      | 6.1GB | ollama pull solar             |
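As a quick sanity check, you can pull one of the smaller models above and send it a one-shot prompt (gemma2:2b here is just an example choice):

```shell
# Download the 1.6 GB Gemma 2 2B model
ollama pull gemma2:2b

# One-shot prompt: Ollama loads the model, prints the answer, and exits
ollama run gemma2:2b "Why is the sky blue?"
```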

 

Ollama CLI

```shell
# List installed models
ollama list

# Install (pull) a model
ollama pull llama3.1:8b

# Run a model interactively from the CLI
ollama run llama3.1:8b

# Show model information
ollama show llama3.1

# Remove a model
ollama rm llama3.1
```
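Beyond the CLI, a running Ollama server also exposes a local REST API on port 11434. A minimal completion request with curl (using llama3.1 as an example model) looks like this:

```shell
# Generate a completion via the local HTTP API (non-streaming)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```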

Tips

Some useful Ollama environment variables

You may need to set the following environment variables:

```shell
# Listen on 0.0.0.0 instead of 127.0.0.1
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Maximum number of models loaded at the same time
Environment="OLLAMA_MAX_LOADED_MODELS=4"

# Keep models loaded in memory indefinitely
Environment="OLLAMA_KEEP_ALIVE=-1"
```
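These Environment= lines are in systemd unit syntax. On a Linux install managed by systemd, one common way to apply them is via an override file (a sketch; the service name assumes the default ollama.service):

```shell
# Open an override file and add the Environment= lines under a [Service] section
sudo systemctl edit ollama.service

# Reload unit files and restart the service so the new settings take effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```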