Testing Fine-tuned Model Quality
Generic benchmarks don't predict production quality. Domain-specific evals, regression tests, and A/B testing reveal whether your fine-tuning actually worked.
13 posts tagged with "quality"
1,000 high-quality examples often outperform 100,000 noisy ones. Data quality dominates quantity for fine-tuning. Curation is the work.
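A curation pass can be as simple as a few aggressive filters. The sketch below is illustrative only: it assumes examples are prompt/response dicts, and the dedup, length, and noise rules are placeholders for whatever signals your data actually needs.

```python
# Minimal sketch of a curation pass: deduplicate, drop malformed or low-signal
# examples, keep a smaller high-quality set. Field names and thresholds are
# illustrative assumptions, not a prescribed pipeline.
def curate(examples: list[dict]) -> list[dict]:
    seen = set()
    kept = []
    for ex in examples:
        prompt, response = ex.get("prompt", ""), ex.get("response", "")
        key = (prompt.strip().lower(), response.strip().lower())
        if key in seen:                        # exact duplicates add no signal
            continue
        if len(response.split()) < 5:          # too short to teach anything
            continue
        if "lorem ipsum" in response.lower():  # obvious scraping noise
            continue
        seen.add(key)
        kept.append(ex)
    return kept

data = [
    {"prompt": "Summarize the ticket", "response": "Customer reports login failures after the 2.3 release."},
    {"prompt": "Summarize the ticket", "response": "Customer reports login failures after the 2.3 release."},
    {"prompt": "Summarize the ticket", "response": "ok"},
]
print(len(curate(data)))  # -> 1
```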
Models advertise 128K context windows. But attention quality degrades with distance. The last 10% of context often contributes less than the first 10%.
Every configuration lives on a quality-cost curve. Some sit on the efficient frontier; most don't. Map the frontier, then choose your spot deliberately.
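Mapping the frontier is mechanical once each configuration has a measured (cost, quality) point. A minimal sketch, with made-up configuration names and numbers standing in for real measurements:

```python
# Minimal sketch: find configurations on the quality-cost efficient frontier.
# The (cost, quality) values below are illustrative placeholders, not measurements.
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    cost_per_1k: float   # dollars per 1K requests (assumed unit)
    quality: float       # eval score in [0, 1] (assumed metric)

def efficient_frontier(configs: list[Config]) -> list[Config]:
    """Keep configs not dominated by a cheaper config with equal or higher quality."""
    frontier = []
    for c in sorted(configs, key=lambda c: (c.cost_per_1k, -c.quality)):
        if not frontier or c.quality > frontier[-1].quality:
            frontier.append(c)
    return frontier

configs = [
    Config("base-int8", 0.4, 0.71),
    Config("base-fp16", 0.9, 0.72),
    Config("ft-int8", 0.5, 0.78),
    Config("ft-fp16", 1.1, 0.80),
]
for c in efficient_frontier(configs):
    print(f"{c.name}: ${c.cost_per_1k}/1K req, quality {c.quality}")
```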
Quality regressions are silent killers. Users notice before your metrics do. Automated regression detection catches drops before they become incidents.
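One hedged sketch of what that detection can look like: compare each metric against a stored baseline and flag drops beyond a tolerance. The metric names, scores, and tolerance below are assumptions, not recommendations.

```python
# Minimal sketch of automated regression detection: compare the current model's
# eval scores against a stored baseline and flag drops beyond a tolerance.
BASELINE = {"helpfulness": 0.82, "format_adherence": 0.97, "refusal_accuracy": 0.91}
TOLERANCE = 0.02  # absolute drop allowed before we call it a regression

def detect_regressions(current: dict[str, float],
                       baseline: dict[str, float] = BASELINE,
                       tolerance: float = TOLERANCE) -> list[str]:
    """Return human-readable regression reports (empty list = healthy)."""
    regressions = []
    for metric, base_score in baseline.items():
        score = current.get(metric)
        if score is None:
            regressions.append(f"{metric}: missing from current run")
        elif base_score - score > tolerance:
            regressions.append(f"{metric}: {base_score:.2f} -> {score:.2f}")
    return regressions

# Example: a drop in format adherence gets flagged before users see it.
print(detect_regressions({"helpfulness": 0.83, "format_adherence": 0.90,
                          "refusal_accuracy": 0.91}))
```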
LLM judges excel at subjective quality. They fail at factual correctness. Knowing when each applies determines whether your evals are useful or misleading.
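A rough illustration of that split: route subjective criteria to an LLM judge and keep factual or structural checks deterministic. `call_llm` is a placeholder for whatever client you use, and the prompt is only a sketch.

```python
# Sketch of splitting eval criteria by judge type: an LLM judge for subjective
# quality (tone, clarity) and a deterministic check for factual correctness.
import json

JUDGE_PROMPT = """Rate the response for clarity and tone on a 1-5 scale.
Return JSON: {{"score": <int>, "reason": "<short justification>"}}

Question: {question}
Response: {response}"""

def judge_subjective(question: str, response: str, call_llm) -> dict:
    """LLM judge: suited to fuzzy criteria where humans would also disagree a bit."""
    raw = call_llm(JUDGE_PROMPT.format(question=question, response=response))
    return json.loads(raw)

def check_factual(response: str, expected_facts: list[str]) -> bool:
    """Deterministic check: don't ask the judge to verify facts it may not know."""
    return all(fact.lower() in response.lower() for fact in expected_facts)
```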
If your optimization breaks an eval, the optimization is wrong. Evals are invariants, not suggestions. Ship nothing that fails them.
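In practice that means wiring the eval suite into the release pipeline as a hard gate. A minimal sketch, assuming scores land in a JSON file and using illustrative metric names and thresholds:

```python
# Sketch of treating evals as invariants in CI: the deploy pipeline runs
# `python eval_gate.py scores.json` and a nonzero exit blocks the release.
# Metric names and thresholds are illustrative assumptions.
import json
import sys

THRESHOLDS = {"helpfulness": 0.80, "format_adherence": 0.95}

def gate(scores: dict[str, float]) -> int:
    """Return a process exit code: 0 if every invariant holds, 1 otherwise."""
    failed = False
    for metric, minimum in THRESHOLDS.items():
        score = scores.get(metric, 0.0)
        if score < minimum:
            failed = True
            print(f"EVAL INVARIANT FAILED: {metric} = {score:.2f} (minimum {minimum:.2f})")
    return 1 if failed else 0

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        sys.exit(gate(json.load(f)))
```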
Bad evals give false confidence. Good evals predict production failures. The difference is designing for the problems users actually hit.
Human review doesn't scale. At 10M responses per day, you're sampling 0.001%. Automated evals are the only path to quality at scale.
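The arithmetic is stark: 100 human reviews against 10M responses is 100 / 10,000,000 = 0.001% coverage. The sketch below works that number and shows the kind of cheap per-response checks that can run on all traffic; the specific checks are illustrative.

```python
# Coverage math plus cheap automated checks that can run on every response.
# The check functions are illustrative; real suites add many more.
DAILY_RESPONSES = 10_000_000
HUMAN_REVIEWS_PER_DAY = 100            # optimistic for a small review team
coverage = HUMAN_REVIEWS_PER_DAY / DAILY_RESPONSES
print(f"Human review coverage: {coverage:.5%}")   # -> 0.00100%

def automated_checks(response: str) -> dict[str, bool]:
    """Cheap per-response checks that scale to full traffic."""
    return {
        "non_empty": bool(response.strip()),
        "no_refusal_boilerplate": "as an ai language model" not in response.lower(),
        "reasonable_length": 10 <= len(response) <= 8000,
    }

print(automated_checks("Here is the summary you asked for: ..."))
```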
Eval suites catch problems benchmarks miss. Here's how to build testing that prevents quantization regressions from reaching users.
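One shape such a test can take: run the same prompts through the full-precision and quantized models and fail if the average score drops more than a threshold. The `generate_*` callables, the exact-match scorer, and the threshold are assumptions about your own stack, not the post's recipe.

```python
# Sketch of a quantization regression test: score the full-precision and
# quantized models on the same prompts and flag an average-score drop.
def score_exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0

def quantization_regresses(prompts: list[tuple[str, str]],
                           generate_fp16, generate_int8,
                           max_drop: float = 0.01) -> bool:
    """Return True if the quantized model regresses beyond `max_drop` on average."""
    fp16_scores, int8_scores = [], []
    for prompt, expected in prompts:
        fp16_scores.append(score_exact_match(generate_fp16(prompt), expected))
        int8_scores.append(score_exact_match(generate_int8(prompt), expected))
    drop = sum(fp16_scores) / len(fp16_scores) - sum(int8_scores) / len(int8_scores)
    print(f"mean score drop after quantization: {drop:.3f}")
    return drop > max_drop
```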