Research

Notes from the field. What we're learning while building production AI systems.

Why your RAG pipeline retrieves the wrong documents

Evaluation-driven development for LLM applications

The hidden cost of prompt chains: a latency audit

Fine-tuning vs. few-shot: when each approach wins

Building agent frameworks that fail gracefully