Running Large Language Models locally

Ollama is currently a popular option for running LLMs locally. With newer versions you can download models other than Llama too, such as Google's Gemma or smaller task-specific models.

You can download Ollama from ollama.com; it is available for all major operating systems. Again: you need a fairly beefy computer for this, preferably with a recent NVIDIA graphics card and quite a bit of storage.

Once installed, Ollama runs in the background. On Windows, you can still see it among the icons in the system tray at the lower right of the taskbar.

With Ollama installed, open a terminal (press the Windows key, type cmd) and run the ollama command from there.

Ollama starter tips

No idea where to start? Type 

ollama --help

See all installed models by typing

ollama list

Add a model with the following command. Replace [model name] with one of the models listed in the Ollama model library on ollama.com.

ollama run [model name]
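ollama run gives you an interactive chat, but the same local server also exposes an HTTP API on port 11434 that you can call from a script. A minimal Python sketch, assuming the server is running and llama3.2 has been pulled (the model name and prompt are placeholders):

```python
# Sketch: calling a locally running Ollama model through its HTTP API.
# Ollama listens on http://localhost:11434 by default; the model name
# and prompt below are placeholders for whatever you have pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one complete response instead of chunks
    }

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the response text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (only works with the server running and the model pulled):
# print(generate("llama3.2", "Why is the sky blue?"))
```

Nothing here replaces the command line; it just shows that the same models you pull with ollama run are reachable programmatically.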

All models (and instructions, see below) are saved as blob files in C:\Users\[Username]\.ollama\models\blobs. Because the blobs don't carry recognizable file names, you can't easily remove models by hand. To delete a model, type

ollama rm [model name]

Create your own instructions

One way to modify a model is to give it instructions before it starts conversation mode. Ollama calls these instruction sets Modelfiles, and the result a new model. Note that this is not the same as training your own model; it is just an additional set of instructions layered on top of a pre-trained model. The workflow is to copy the instructions of an existing model (like llama3.2), modify them, and then create a new model in Ollama from those new instructions.
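As a sketch of that workflow: you can dump the instructions of an existing model with ollama show --modelfile llama3.2 > Modelfile, edit the file, and then build a new model from it. The model name "pirate" and the system prompt below are made-up examples; a minimal Modelfile might look like this:

```
# Minimal example Modelfile; "pirate" and the system prompt are illustrative.
# Base the new model on an existing pre-trained one:
FROM llama3.2

# Optional sampling parameter (higher values give more varied answers):
PARAMETER temperature 0.9

# The instructions that run before conversation mode starts:
SYSTEM """You are a helpful assistant who answers every question in the voice of a pirate."""
```

You then create and start the new model with ollama create pirate -f Modelfile followed by ollama run pirate. It will show up in ollama list like any downloaded model.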