Evaluation and prompting references
A shortlist of references for building safer, more measurable prompts.
- Language Model Evaluation Harness — A practical baseline for comparing model behavior.
- OpenAI Evals — Useful patterns for building custom evals.
- Chain-of-Thought Prompting — A clear framing for multi-step reasoning prompts.