I remember ~10 years ago that Google said 40% of searches were unique. Just sear...

RadiozRadioz · on June 20, 2023

Mullvad does have the happenstantial advantage that its userbase is likely nowhere near as diverse as Google's, so it naturally follows that the queries themselves are not as diverse. While Google fields requests across the full diversity of the globe, Mullvad's userbase likely skews toward middle-to-high-income westerners with a STEM background searching in English. The queries these users make probably come from a much narrower corpus of topics; I wonder what percentage of them revolve around privacy, Linux, software, typical hacker hobbies like woodworking, et cetera. This isn't to say that these are the only types of queries being made, but if you were to group Mullvad users into equivalently broad advertising cohorts, you'd probably end up with far fewer cohorts than you would for Google's users.

The interests being more homogeneous results in more similar queries, which would increase the proportion of cache hits. Whether this is enough to help make the strategy viable is another matter, but I do think it's worth noting.
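A rough way to get a feel for this effect is a toy simulation. The sketch below is only an illustration under stated assumptions: queries are drawn from a Zipf-like distribution (an assumption, not anything Mullvad or Google has published), the cache is unbounded, and the names `hit_rate` and `simulate` are made up for this example. It compares a small pool of possible queries against a large one.

```python
import random


def hit_rate(queries):
    """Fraction of queries already seen earlier in the stream (unbounded cache)."""
    seen = set()
    hits = 0
    for q in queries:
        if q in seen:
            hits += 1
        else:
            seen.add(q)
    return hits / len(queries) if queries else 0.0


def simulate(num_distinct, num_queries=5000, seed=0):
    """Draw queries from a Zipf-like distribution over `num_distinct` possible queries.

    The 1/rank weighting is an assumption chosen for illustration only.
    """
    rng = random.Random(seed)
    population = range(num_distinct)
    weights = [1.0 / (rank + 1) for rank in population]
    return hit_rate(rng.choices(population, weights=weights, k=num_queries))


narrow = simulate(num_distinct=100)      # stand-in for a narrow-interest userbase
broad = simulate(num_distinct=100_000)   # stand-in for a globally diverse userbase
print(f"narrow pool hit rate: {narrow:.2f}, broad pool hit rate: {broad:.2f}")
```

With the same number of searches, the narrow pool repeats itself far more often, which is the whole argument in miniature. Real traffic would also depend on cache expiry and per-country keys, which this ignores.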

I also wonder about the complexity of the queries themselves. The more technical users would probably use more complex combinations of operators, but they're also more likely to search by keyword rather than natural language.

KRAKRISMOTT · on June 20, 2023

But people who actively use VPNs are not necessarily those with a search history that follows a short tail distribution. Mullvad gets a good chunk of its revenue from Firefox and other white labels too.

bentcorner · on June 20, 2023

As far as I can tell there's no predictive search. UI is a simple search box, optional country selector dropdown and an "Only search in cache" checkbox. Smoke test shows the cache checkbox works - apparently nobody else has searched for "dog" in the US.

The country dropdown is interesting as far as the cache goes - not selecting a country is meaningful as far as the cache is concerned. My prior "dog" query in the US does not return hits if I don't select a country. Not selecting a country and searching the cache appears to return English results (based on a few sample searches).
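The behaviour described above is what you'd expect if the country is simply part of the cache key. A minimal sketch of that idea, purely as a guess at the shape of it: the `SearchCache` class, its methods, and the query normalization are all hypothetical and not taken from Mullvad's actual implementation.

```python
from typing import Dict, List, Optional, Tuple


class SearchCache:
    """Toy cache where (query, country) is the key; 'no country' is its own key."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, Optional[str]], List[str]] = {}

    @staticmethod
    def _key(query: str, country: Optional[str]) -> Tuple[str, Optional[str]]:
        # Normalization here is an assumption; the real service may differ.
        return (query.strip().lower(), country.upper() if country else None)

    def put(self, query: str, country: Optional[str], results: List[str]) -> None:
        self._store[self._key(query, country)] = results

    def get(self, query: str, country: Optional[str]) -> Optional[List[str]]:
        # Returns None on a miss, mirroring an empty "only search in cache" result.
        return self._store.get(self._key(query, country))


cache = SearchCache()
cache.put("dog", "US", ["example result about dogs"])
print(cache.get("dog", "US"))   # hit: same query, same country
print(cache.get("dog", None))   # miss: no country selected is a different key
```

Keying this way splits the cache per country, which would lower the hit rate compared to a single shared pool.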

It's interesting that you can explore the cache with this checkbox. Not sure if there are any privacy concerns with this feature - considering cache searches are "free", you can kind of scrape what other users are searching for, though maybe with enough users it doesn't really matter. I suppose there could be rate limiting and such to prevent this kind of attack, but that's just a guess.

It may be useful to have an option to opt your search out of the cache.

pnt12 · on June 20, 2023

Good question: I see one way it may work and another it may not.

I think the profile of their users is less diversified: mostly tech savvy people. "Normies" are using those VPNs advertised on YouTube, or not using any at all. This may result in similar interests and lower the number of unique queries.

On the other hand, we may produce more unique queries than other people: who will ever reuse the cached "how to fix ValueError on main.py:67"?

flas9sd · on June 20, 2023

A statistic I'd be interested in: what percentage of searches can be answered computationally cheaply. As in: Wikipedia title index, simple word lookup dictionaries. Indices that could complement a caching search-engine proxy so that it doesn't have to hit its origin crawl repository.
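To make the idea concrete, here is a minimal sketch of such a tiered lookup. Everything in it is hypothetical: the `answer` function, the in-memory dicts standing in for a title index and a dictionary, and the `fetch_origin` callback are illustrations, not how any real search proxy is built.

```python
def answer(query, cache, cheap_indices, fetch_origin):
    """Try the cache, then cheap local indices, and only then the origin.

    Returns a (source, result) pair so the caller can count where answers came from.
    """
    key = query.strip().lower()
    if key in cache:
        return "cache", cache[key]
    for name, index in cheap_indices:
        if key in index:
            return name, index[key]
    result = fetch_origin(query)  # the expensive path
    cache[key] = result
    return "origin", result


# Hypothetical stand-ins for a Wikipedia title index and a word dictionary.
title_index = {"linux": "Wikipedia article: Linux"}
dictionary = {"ephemeral": "lasting for a very short time"}
indices = [("titles", title_index), ("dictionary", dictionary)]

cache = {}
origin_calls = []


def fake_origin(q):
    origin_calls.append(q)
    return f"origin results for {q!r}"


for q in ["Linux", "ephemeral", "woodworking jigs", "woodworking jigs"]:
    source, _ = answer(q, cache, indices, fake_origin)
    print(q, "->", source)
```

Counting the `source` values over a real query log would give exactly the percentage asked about, though exact-match lookups like these will undercount what a fuzzier index could answer.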

A study[1] by Wikipedia done with DDG notes that Wikipedia showed up in the top 5 results and in the information module for ~13% of searches, with a click-through rate of ~8% for each - so a total click-through rate of ~16%. Granted, that number comes from whole articles, not just title searches.

[1]: https://diff.wikimedia.org/2021/09/23/searching-for-wikipedi...

zjnevnf · on June 20, 2023

There's a checkbox to only search in cached results, although in my experience so far it had no results except for very generic searches like "google". Even "python" didn't show anything.