
Ollama on M1: thermal baseline

A baseline run to learn where M1 throttles and why it matters.

Key takeaways

Local performance is a temperature story, not just a model story.

Baselines should be short, repeatable, and recorded.

Throttling hides inside long runs and ruins comparisons.

I wanted a fair baseline for local models on an M1 machine. The surprise was not raw speed. It was how quickly a clean run drifted once the system warmed up.

Architecture map

The system has three parts: the model runtime, the host machine, and the environment around it. The host is the real boundary, because its thermal state decides how stable the baseline really is.

What happened

Short runs were fast. Long runs slowed down. Once I logged temperatures, the pattern was obvious: the system throttled after a few minutes, and later numbers could no longer be compared with earlier ones.

The two gates

The first gate is time. If you test too long, you stop testing the model and start testing heat dissipation.

The second gate is consistency. Without a fixed prompt and a fixed run length, the baseline drifts even when nothing else changes.

Setup walkthrough

Pick one model and one prompt.
Run the prompt in short bursts to avoid thermal drift.
Log run time and system temperature together.
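
The three steps above can be sketched as one timed burst that appends a row to a log. This is a minimal sketch, not the setup used for this post: timed_run, read_thermal, PROMPT, and LOG are illustrative names, timing is in whole seconds from date +%s, and the thermal reading assumes pmset -g therm, which on Apple Silicon may report only coarse warning state rather than degrees. Swap in whatever temperature tool you trust.

```shell
#!/usr/bin/env bash
# Sketch: one timed burst, logged together with a thermal reading.
# All names here (timed_run, read_thermal, PROMPT, LOG) are illustrative.

PROMPT="${PROMPT:-ping}"          # keep the prompt fixed across runs
LOG="${LOG:-baseline.csv}"

# Thermal reading. Assumption: pmset is available (macOS only).
# Replace the body with another tool if you need degrees.
read_thermal() {
  if command -v pmset >/dev/null 2>&1; then
    pmset -g therm | tr '\n' ' ' | tr -d ','
  else
    printf 'n/a'
  fi
}

# Run a command, then append "start_epoch,seconds,thermal" to $LOG.
timed_run() {
  local start end
  start=$(date +%s)
  "$@" >/dev/null
  end=$(date +%s)
  printf '%s,%s,%s\n' "$start" "$((end - start))" "$(read_thermal)" >> "$LOG"
}

# Usage (needs a running Ollama server):
#   timed_run ollama run "llama3.1:8b" "$PROMPT"
```

Whole-second timing is coarse, so this works best when the fixed prompt produces a run of at least several seconds.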

First-time config

export OLLAMA_HOST="http://127.0.0.1:11434"
export OLLAMA_MODEL="llama3.1:8b"
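
Before timing anything, it can help to confirm that the server answers at that address and that the model is already pulled, so a download does not end up inside the first measurement. A minimal sketch, assuming curl is installed: it queries /api/tags, Ollama's endpoint for listing local models. OLLAMA_MODEL is treated here as a convenience variable for your own scripts, and tags_url and model_available are hypothetical helper names.

```shell
# Sketch: confirm the server answers and the model is present locally.
# Assumes curl is installed; helper names are illustrative.

OLLAMA_HOST="${OLLAMA_HOST:-http://127.0.0.1:11434}"
OLLAMA_MODEL="${OLLAMA_MODEL:-llama3.1:8b}"

# Build the URL, tolerating a trailing slash on OLLAMA_HOST.
tags_url() {
  printf '%s/api/tags\n' "${OLLAMA_HOST%/}"
}

# Return 0 if the model name appears in the server's model list.
model_available() {
  curl -fsS --max-time 5 "$(tags_url)" | grep -qF "\"$OLLAMA_MODEL\""
}

# Usage:
#   model_available && echo "ready" || echo "server down or model missing"
```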

Quick checks

ollama run "$OLLAMA_MODEL" "ping"

Failure modes

Long runs hide throttling until you compare them against a later test.
Background apps push the host into a hotter state.
Changing the prompt changes the performance profile.
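
One way to catch the first failure mode early is to compare every run against the first one and flag anything noticeably slower. A minimal sketch, assuming a CSV log whose second column is run time in seconds (an illustrative format). flag_drift is a hypothetical helper and THRESHOLD_PCT is an arbitrary knob, not a measured value.

```shell
# Sketch: flag runs slower than the first run by more than a threshold.
# Input: CSV lines where column 2 is run time in seconds (illustrative format).
# THRESHOLD_PCT is a hypothetical knob; 15 is an arbitrary default.

flag_drift() {
  awk -F, -v pct="${THRESHOLD_PCT:-15}" '
    NR == 1 { base = $2; next }    # the first run is the reference
    base > 0 && ($2 - base) / base * 100 > pct {
      printf "run %d: %ss vs %ss baseline\n", NR, $2, base
    }
  ' "$@"
}

# Usage:
#   flag_drift baseline.csv
```

A flagged run does not prove throttling on its own, but it is a cue to check the thermal column for that row before trusting the comparison.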

What made the difference

I treated the baseline as a short, repeatable ritual and logged the host temperature alongside the timing. That made later comparisons honest.

What I would do next time

I would automate a short benchmark loop, record system temperature per run, and stop the test before the host warms up.
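
That plan could look roughly like the loop below. It is a sketch under stated assumptions, not a finished tool: bench_loop, its arguments (RUNS, BUDGET_S, COOLDOWN_S), and the read_thermal hook are all hypothetical. It uses a wall-clock budget as a stand-in for "before the host warms up", since the right cutoff depends on the machine.

```shell
# Sketch: a short benchmark loop that stops at a wall-clock budget.
# bench_loop and read_thermal are illustrative names, not part of Ollama.

# Thermal hook: override this function with a tool you trust.
read_thermal() { printf 'n/a'; }

# usage: bench_loop RUNS BUDGET_S COOLDOWN_S command [args...]
# Prints one "run,seconds,thermal" line per completed run.
bench_loop() {
  local runs=$1 budget=$2 cooldown=$3
  shift 3
  local t0 i start end
  t0=$(date +%s)
  i=1
  while [ "$i" -le "$runs" ]; do
    # Do not start another run once the budget is used up.
    if [ $(( $(date +%s) - t0 )) -ge "$budget" ]; then
      echo "stopped: ${budget}s budget reached after $((i - 1)) runs" >&2
      break
    fi
    start=$(date +%s)
    "$@" >/dev/null
    end=$(date +%s)
    printf '%s,%s,%s\n' "$i" "$((end - start))" "$(read_thermal)"
    sleep "$cooldown"
    i=$((i + 1))
  done
}

# Usage (needs a running Ollama server):
#   bench_loop 5 120 10 ollama run "llama3.1:8b" "ping" >> baseline.csv
```

The cooldown between runs is a design choice: it trades a longer session for runs that start from a more similar thermal state.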

#experiments #local-llm #ollama #performance
