Serve your Ollama API on your Kubernetes cluster
Author
Erdi Köse

Wouldn’t it be cool to send HTTP requests to an LLM running on your own infrastructure and turn it into an API you can use for any purpose?
I am going to show you how to do it.
Requirements
- A Kubernetes server
- A machine with Helm installed
- A domain
- Some Kubernetes knowledge
That’s all we need. We will start by preparing our Kubernetes cluster. We need to install ingress-nginx to proxy requests to our Kubernetes services, and cert-manager to get a valid SSL certificate (this part is optional, but personally I like having free certificates 🤗). Finally, we need to install Ollama itself.
Installing Helm Packages
Please make sure that you have Helm installed and that you can connect to your cluster.
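A quick sanity check before going further, assuming kubectl is configured for the same cluster you will install into:

helm version
kubectl cluster-info
kubectl get nodes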
We will first add the ingress-nginx repository to Helm and install the chart into our cluster.
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --create-namespace
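Once the chart is deployed, check that the controller is running and note the external IP of its LoadBalancer service; this is the IP your domain’s DNS record should point to. The service is typically named ingress-nginx-controller when the release is named ingress-nginx, but the exact name may differ depending on the chart version:

kubectl get pods -n ingress-nginx
kubectl get svc ingress-nginx-controller -n ingress-nginx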
After that, we need to add the cert-manager repository to Helm and install it into our cluster.
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true
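It is worth confirming that cert-manager’s pods are up before creating any Issuer:

kubectl get pods -n cert-manager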
And now let’s install Ollama in our cluster. For this we need to create a values.yaml file to configure our Ollama installation, for example which models to pull and how many resources it may use.
ollama:
  gpu:
    enabled: false
    number: 1
  models:
    - mistral
    - llama2
persistentVolume:
  enabled: true
  size: 100Gi
resources:
  limits:
    cpu: '8000m'
    memory: '8192Mi'
  requests:
    cpu: '4000m'
    memory: '4096Mi'
Now we can install Ollama with our values.yaml file by running:
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update
helm install ollama ollama-helm/ollama --namespace ollama --create-namespace -f ./values.yaml
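Pulling the models can take a while. You can watch the pod and, once it is ready, talk to Ollama locally through a port-forward (11434 is Ollama’s default port, and the chart exposes a service named after the release, ollama in this case):

kubectl get pods -n ollama -w
kubectl port-forward -n ollama svc/ollama 11434:11434
curl http://localhost:11434/api/tags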
K8S resources
letsencrypt.yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: ollama
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    # cert-manager requires a secret name to store the ACME account key
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      - http01:
          ingress:
            class: nginx
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  # must live in the same namespace as the ollama service and the Issuer
  namespace: ollama
  annotations:
    nginx.ingress.kubernetes.io/use-regex: 'true'
    cert-manager.io/issuer: 'letsencrypt'
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - llm.example.com
      secretName: tls-secret
  rules:
    - host: llm.example.com
      http:
        paths:
          - path: /(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: ollama
                port:
                  number: 11434
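With both files saved as letsencrypt.yaml and ingress.yaml, apply them and check that cert-manager issues the certificate for the Ingress:

kubectl apply -f letsencrypt.yaml
kubectl apply -f ingress.yaml
kubectl get certificate -n ollama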
Testing
You can send a request from your terminal using curl:
curl https://llm.example.com/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
Or you can send it through Postman:

That’s all! Now you have a running API for your LLM models. You can scale it or optimize it by configuring your Kubernetes resources.
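For example, if you later want to change the resource limits or the model list, edit values.yaml and roll the change out with a Helm upgrade:

helm upgrade ollama ollama-helm/ollama -n ollama -f ./values.yaml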
If you would like to install a home Kubernetes cluster, check out my previous post about it.