They don't mention BM25, which still outperforms a lot of semantic search. A fun exercise is to watch the benchmarks of the latest semantic embedding models and see that they still struggle to match good ol' BM25.
BM25 uses the relative statistical frequency of words to identify relevant material, with adjustments for term-frequency saturation and document length. It doesn't use ML at all, but it works very well, especially for technical content.
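For anyone who hasn't seen it spelled out, here is a minimal sketch of BM25 scoring. It assumes documents are already tokenized into word lists, uses the common defaults k1=1.5 and b=0.75, and uses the smoothed IDF variant that never goes negative. The function name `bm25_scores` and the toy documents are made up for illustration; real engines differ in tokenization and IDF details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by length (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, just to show the ranking behavior.
docs = [
    "the mutex guards the shared queue".split(),
    "the queue is drained by a worker thread".split(),
    "cats sleep most of the day".split(),
]
scores = bm25_scores("mutex queue".split(), docs)
```

The first document matches both query terms and ranks highest, the second matches only "queue", and the third scores zero because it shares no terms with the query.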
SPLADE works well in some areas but is slow, and it often doesn't offer much benefit over BM25 (or is worse) for technical searches, where specific technical words don't have many synonyms for it to pull in.
The best search systems today use a mix of semantic search and BM25 or SPLADE, depending on the type of material and the speed required.
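One common way to mix the two result lists is reciprocal rank fusion. To be clear, that's my choice for the illustration, not something the comment above specifies. A minimal sketch, assuming each retriever returns a ranked list of document IDs and using the conventional constant k=60; the function name and the document IDs are hypothetical:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs from a lexical and a semantic retriever.
bm25_ranking = ["d2", "d1", "d3"]
semantic_ranking = ["d1", "d4", "d2"]
fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
```

Documents that both retrievers rank highly rise to the top, and nothing needs to be calibrated between the two scoring scales because only ranks are used.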
I've had pretty good success with BM25 + stemming, or even easier, BM25 with trigram tokenization. If the index isn't too big, the whole search can be done client-side and is lightning fast.
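A rough sketch of what trigram tokenization can look like, assuming character trigrams taken per word with one space of padding on each side. The padding scheme and the `trigrams` name are my assumptions; implementations vary.

```python
def trigrams(text):
    """Split text into overlapping lowercase character trigrams, word by word."""
    grams = []
    for word in text.lower().split():
        padded = f" {word} "  # pad so word boundaries become part of the grams
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

# Inflected forms share most of their trigrams, so they match without a stemmer.
shared = set(trigrams("tokenizer")) & set(trigrams("tokenizers"))
```

Feeding these grams into BM25 in place of whole words gives partial and fuzzy matching with no language-specific stemming rules, at the cost of a larger index.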
Isn't the big problem that BM25 (and friends) will help you find (and rank) exact search terms (or stemmed varieties of that search term), whereas semantic search can typically find items out-of-dictionary but "close" semantically? SPLADE, on my reading of it, seems to do a "pre-materialization" of the out-of-dictionary part.
It's various measures of recall rate. Recall@500 is the percentage of queries for which the target document shows up in the top 500 results from the retrieval system.
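In code, that definition comes out to something like the sketch below, assuming one target document per query. The function name and the toy data are made up for illustration.

```python
def recall_at_k(ranked_results, targets, k):
    """Fraction of queries whose target doc appears in the top-k results."""
    hits = sum(
        1
        for ranked, target in zip(ranked_results, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)

# Two hypothetical queries with their ranked results and target documents.
ranked_results = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
targets = ["d1", "d4"]
```

With this data, `recall_at_k(ranked_results, targets, 2)` counts only the first query as a hit, since "d4" sits at position 3 for the second one.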
I found BM25 and everything resembling it (like TF-IDF) to be near useless. Back in the day it was really necessary to use external semantic info, or at least data gathered by examining the whole document set for signals beyond term frequency. I was excited by the first part of the SPLADE article because I thought it was going to use LLMs to somehow find concept embeddings in documents and let you search for those. But as someone said, it turns out to be a version of synonym search where the thesaurus is generated automatically. I remember someone did that with Word2Vec some years back and it was sort of useful, but generally the problem with search systems is too many results rather than missing some relevant ones.