#caching

Caching shows up across 5 sections and 5 pages in this workspace. Use this page as a topic map, not just an archive.

Start here

If you are new to this topic, begin with the strongest entry points, then move into related notes and supporting material.

How to reduce latency and cost in LLM applications by caching semantically equivalent queries using vector similarity.

A reflective look at debugging a user-facing issue where a semantic cache returned a stale context block due to a loose similarity threshold.

Best practices for defining caching policies, setting TTLs, and scoping namespaces in similarity-based cache structures.
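
For orientation, the ideas these entries share (similarity lookup, a match threshold, TTLs, and namespaces) can be sketched in a few lines. This is only a minimal illustration, not code taken from any of the linked pages: the `SemanticCache` class, its parameter names, and the default values are hypothetical, and the embedding vectors are assumed to come from whatever embedding model the application already uses.

```python
import math
import time
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Entry:
    vector: list
    value: str
    expires_at: float


@dataclass
class SemanticCache:
    # Hypothetical defaults; real values depend on the embedding model and workload.
    threshold: float = 0.9      # minimum similarity that counts as a hit
    ttl_seconds: float = 300.0  # how long an entry stays valid
    _store: dict = field(default_factory=dict)  # namespace -> list of entries

    def put(self, namespace, vector, value, now=None):
        now = time.time() if now is None else now
        entry = Entry(list(vector), value, now + self.ttl_seconds)
        self._store.setdefault(namespace, []).append(entry)

    def get(self, namespace, vector, now=None):
        now = time.time() if now is None else now
        # Drop expired entries so stale values are never returned.
        live = [e for e in self._store.get(namespace, []) if e.expires_at > now]
        self._store[namespace] = live
        # Linear scan for the closest entry; a vector index would replace this at scale.
        best = max(live, key=lambda e: cosine(e.vector, vector), default=None)
        if best is not None and cosine(best.vector, vector) >= self.threshold:
            return best.value
        return None
```

A lookup only searches the caller's namespace, so one tenant or context never receives another's cached value. Raising `threshold` reduces the risk of returning a near-miss match, at the cost of fewer hits.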