In general, yes: benchmark pollution is a big problem and why only dynamic bench...

brookst · on Feb 28, 2025

This is true, but how would pollution work for a benchmark designed to test hallucinations?

llm_trw · on Feb 28, 2025

A dataset of answers labelled as hallucinations or not hallucinations gets published, based on the benchmark, as part of a paper.

People _seriously_ underestimate just how much stuff is online and how much impact it can have on training.