The ollama Python library is a client for interacting with local LLMs hosted via the Ollama runtime. It provides a simple API for loading models, running inference, and managing sessions on your own machine.
Key features:
- Connects to the Ollama runtime, which must be installed and running (ollama serve)
- Works with a variety of models (e.g. llama2, mistral, gemma)
- Supports chat-style interaction, system prompts, and streaming responses
Installation:
pip install ollama
Example usage:
import ollama

response = ollama.chat(
    model='llama2',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': 'Explain photosynthesis.'}
    ]
)
print(response['message']['content'])
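The same call can also stream the reply as it is generated. A minimal sketch, assuming a current version of the library where chat() accepts stream=True and yields chunks that each carry a partial 'message':

import ollama

# Ask for a streamed response instead of waiting for the full reply
stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Explain photosynthesis.'}],
    stream=True
)
for chunk in stream:
    # Each chunk holds a fragment of the assistant's message; print as it arrives
    print(chunk['message']['content'], end='', flush=True)
print()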
Other methods:
• ollama.list() – list available models
• ollama.pull('mistral') – download a model
• ollama.create() – create a custom model from a Modelfile
• ollama.generate() – run single-turn inference without chat format
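Roughly how these fit together, as a sketch (exact response fields can vary between library versions, so treat the keys below as assumptions):

import ollama

ollama.pull('mistral')   # download the model if it is not already local
print(ollama.list())     # show which models are available locally

# Single-turn inference without the chat message format
result = ollama.generate(
    model='mistral',
    prompt='Summarize photosynthesis in one sentence.'
)
print(result['response'])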
Docs: https://github.com/ollama/ollama/tree/main/python
Issue with "Error: could not connect to ollama app, is it running?"
from: https://collabnix.com/running-ollama-with-docker-for-python-applications
Create a .sh file:
#!/bin/sh
# Start Ollama in the background
ollama serve &
# Wait for Ollama to start
sleep 5
# Pull the required model(s)
ollama pull mistral
# Start your Python application
python app.py
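Inside app.py, instead of relying on the fixed sleep alone, the client can wait until the server actually responds. A sketch assuming the Ollama API is reachable at http://localhost:11434 (the default port; adjust the host if Ollama runs in a separate container):

import time
import ollama

# Point the client at the Ollama server; change the host if it runs elsewhere
client = ollama.Client(host='http://localhost:11434')

# Retry until the server answers rather than guessing with a fixed delay
for attempt in range(30):
    try:
        client.list()
        break
    except Exception:
        time.sleep(1)
else:
    raise RuntimeError('Could not connect to the Ollama server')

response = client.chat(
    model='mistral',
    messages=[{'role': 'user', 'content': 'Hello!'}]
)
print(response['message']['content'])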