Suppose a Google competitor emerged that produced wonderful results in every category. Within a week, tens of thousands of SEO specialists will be on the case, reverse-engineering the magic and figuring out how to get their crappy sites back to the top of the rankings. The wonderful results would quickly degrade, and it's unclear whether this small new company would have the resources to keep up.
You can unfortunately see this with DuckDuckGo: people are clearly targeting more than just Google. Python results, for example, are so infested with SEO-specialist spam that searching for standard library functions will return spam above the actual standard library documentation, particularly for more popular functions. Searching for "python datetime.now" or "python json.loads" both return spam above the documentation. This problem also heavily affects anything 'data scientist' spammers consider important, and it's actually worse on DDG than on Google.
What's frustrating is that these often seem to be a handful of domains, like 'geeksforgeeks' and 'towardsdatascience'; for a while, of course, there was also the "gitmemory" spammer who seemed to be able to push out GitHub results on both DDG and Google. Yet I think Google removed the ability to report and blacklist domains from searches long ago, and I think DDG never had it, which leaves client-side scripts and extensions, which work poorly, as the only way to remove them. Likewise, no search engine appears to be manually blacklisting them. Yet as you point out, if one did, the spammers would probably just move to using many domains, which would probably be worse.
Domain age & quantity/breadth of content should be taken into account in ranking.
A fresh domain that suddenly has a ton of content should be viewed with suspicion and downranked, as it's likely a spammer (copying GitHub/Stack Overflow/official docs).
Legitimate sites that are starting out shouldn't be affected as they are unlikely to have a ton of content from the start.
Of course, this isn't perfect, but it should take care of the majority of spam copycat sites.
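A minimal sketch of that heuristic, assuming the engine already knows each domain's age and how many pages it has indexed from it. The `Domain` type, the 180-day "young" cutoff, and the 5-pages-per-day growth threshold are all hypothetical values for illustration, not anything a real engine is known to use (and, as the next reply argues, a known signal like this can be gamed).

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    age_days: int    # days since the domain was first seen (assumed to be known)
    page_count: int  # number of pages indexed from this domain


def suspicion_penalty(d: Domain, young_days: int = 180,
                      pages_per_day: float = 5.0) -> float:
    """Return a penalty in [0, 1], where 0 means "not suspicious".

    Hypothetical heuristic: only young domains are considered, and only if
    their content grew faster than `pages_per_day` on average.
    """
    if d.age_days >= young_days:
        return 0.0
    rate = d.page_count / max(d.age_days, 1)
    if rate <= pages_per_day:
        return 0.0
    # The penalty grows with how far the rate exceeds the threshold, capped at 1.
    return min(1.0, (rate - pages_per_day) / (10 * pages_per_day))


def adjusted_score(base_score: float, d: Domain) -> float:
    """Downrank a result from the domain in proportion to its suspicion penalty."""
    return base_score * (1.0 - suspicion_penalty(d))


copycat = Domain("new-mirror.example", age_days=30, page_count=50_000)
blog = Domain("new-blog.example", age_days=30, page_count=12)
print(adjusted_score(1.0, copycat))  # 0.0
print(adjusted_score(1.0, blog))     # 1.0
```

A month-old domain with 50,000 pages gets zeroed out, while a month-old blog with a dozen posts is untouched, which is the "legitimate sites starting out" case.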
You think all the existing search engineers haven't thought of using domain age as a signal and tried it already? And if you did, you think SEO people wouldn't figure it out and take advantage of it? The adversarial game around this particular signal has already played out in full - https://www.searchenginejournal.com/ranking-factors/domain-a...
And what makes you think search engines don't already take quantity and breadth of content into account? And why do you think SEO people won't be able to take advantage of that?
Most people lamenting search quality need to understand how much of it is an adversarial game that neither side (search engines vs. SEO people who want to manipulate the results in their favor) can win decisively forever. Generally, it's not for lack of trying that quality hasn't improved more; it's just that in some areas progress is harder because more money is spent on making things difficult for the search engines.
What disappoints me about Google and search engines in general isn't so much the ranking of simple things like generic recipes or product reviews. Most recipes for common dishes are similar, and product reviews are inherently commercial, so they have always been biased (short of Consumer Reports, almost all magazine review articles are biased and influenced by manufacturers). What disappoints me most is the lack of improvement in UI and retrieval/recall. An interactive UI that narrows down the search space iteratively, so that I can describe what I want more precisely, is missing. Where is the virtual librarian I can talk to in order to narrow down what I want step by step? Instead of a single query being the only input, why shouldn't the search engine ask for clarification and let users describe more precisely what they want? Human language and the human brain just don't produce a single phrase to describe what we want; it takes sentences and back-and-forth questioning, and we often need more input to realize and articulate what we actually want. All search engines have failed to provide such an interface. My bet is that whoever cracks it will beat all the other search engines, since users could give better, higher-quality input for the search to work with.
I have to wonder what Google's justification is for not letting users decide which sites they never want to see results from. I'm sure it can't be storage or compute limitations. This one feature alone would make so many people (me included) happier to use the engine.
SEO isn’t magic. Google decided to deprioritize Wikipedia, for example, which noticeably degraded results. Where major websites show up has nothing to do with SEO and simply reflects what Google thinks is important.
Somehow humans are very good at telling SEO crap apart from legitimate content even without understanding the content or the language itself: SEO crap has some common elements such as ads, affiliate links, a certain page layout, etc.
I remember using an open-source ML model (trained on BuzzFeed article titles) to detect YouTube clickbait based on titles, and it worked brilliantly; that was just downloading some code from GitHub and running it as-is. I'm sure the same could be applied to search results, and you could achieve much better quality if you actually put some effort into it.
I very much doubt this is some kind of hard problem as opposed to Google just giving up because their business model doesn't actually incentivize good search results.
This isn't a new problem, and I'm pretty sure Google has countermeasures for it. Even if they didn't, it doesn't look like an unsolvable problem: automation can help, but having a "report" feature on the search results page or literally paying real people (using real browsers) to review results can work and is virtually bulletproof.
It’s all about incentives, isn’t it? Google gave all the power to these SEO websites by making it difficult to get your content listed as a search result on the first page.
Google could start incentivizing high-quality, unique content and websites from domain experts, but they have decided they can’t make as much money off of independent publishers as they can from marketing/content farms.
I read a blog post mentioning how Google artificially boosted Wikipedia in search results years ago. So any movement up or down is the result of manipulation on their part.
> Within a week, tens of thousands of SEO specialists will be on the case, reverse-engineering the magic and figuring out how to get their crappy sites back to the top of the rankings.
An algorithm could punish things inherent to SEO-optimized sites (ads and tracking). Removing the ability to passively generate money without providing useful content is the key.
Off the top of my head, a system like the following would be a pretty good start:
Score each site on a scale of 0 to 100 (closer to 100 ranks higher in the results).
You are penalized for the following:
- subtract 10 points for any ad, using uBlock Origin's lists as a good starting point (this stacks: 10 points off for each ad link)
- subtract 100 points for Google Tag Manager
- subtract 100 points for the Facebook Like button
- ... and so on for each of the major tracking scripts
This would obviously need to be updated as ad-tech evolves, but it would cut out 90% of the current SEO spam.
Can Google do this? No, they have a conflict of interest around placing ads. Somebody else, however, can absolutely do this.
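A rough sketch of that point system, scoring a page's raw HTML. The tiny `AD_HOSTS` set is a stand-in for uBlock Origin's filter lists (which this does not parse), the tracker host names are my assumption about how Google Tag Manager and the Facebook SDK are typically loaded, and regex-scanning `src`/`href` attributes is a crude substitute for real HTML parsing.

```python
import re
from urllib.parse import urlparse

# Stand-in for uBlock Origin's filter lists (not parsed here): a few ad hosts.
AD_HOSTS = {"doubleclick.net", "googlesyndication.com", "adnxs.com"}
AD_PENALTY = 10  # per ad link; stacks

# Assumed host names for the trackers named in the comment; 100 points each.
TRACKER_PENALTIES = {
    "googletagmanager.com": 100,  # Google Tag Manager
    "connect.facebook.net": 100,  # Facebook SDK, used by the Like button
}

# Crude attribute scan; a real crawler would use a proper HTML parser.
URL_ATTR_RE = re.compile(r"""(?:src|href)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def page_score(html: str) -> int:
    """Start at 100 and subtract penalties, clamped to the 0-100 scale."""
    score = 100
    trackers_seen = set()
    for url in URL_ATTR_RE.findall(html):
        host = (urlparse(url).hostname or "").lower()
        if any(host_matches(host, ad) for ad in AD_HOSTS):
            score -= AD_PENALTY
        for tracker, penalty in TRACKER_PENALTIES.items():
            # Each tracker costs its penalty once per page, however often it appears.
            if host_matches(host, tracker) and tracker not in trackers_seen:
                trackers_seen.add(tracker)
                score -= penalty
    return max(0, min(100, score))


print(page_score('<a href="/about">About</a>'))  # 100
ads = ('<a href="https://ad.doubleclick.net/x">1</a>'
       '<a href="https://ad.doubleclick.net/y">2</a>'
       '<img src="https://pagead2.googlesyndication.com/z">')
print(page_score(ads))  # 70
print(page_score('<script src="https://www.googletagmanager.com/gtm.js"></script>'))  # 0
```

Ad links stack as the comment describes; each tracker is charged once per page, since the comment doesn't say whether those stack.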
> Within a week, tens of thousands of SEO specialists will be on the case, reverse-engineering the magic and figuring out how to get their crappy sites back to the top of the rankings.
The Google competitor will have a human nuke their entire domain or business (based on a manual index of banned products/brands) from the search results forever, or apply a default bias against ad-filled websites, which would remove any commercial incentive for those websites to exist in the first place.
That kind of manual intervention doesn't scale, though. The only way it can work is to have community-curated lists of bad domains, similar to adblock lists, that users can upload to personalize their search results.
Somehow a distributed community of unpaid volunteers manages to keep the entire advertising industry (where billions are at stake) at bay by curating adblock lists. I'm sure a company can achieve the same. It will never be 100% perfect, but it will surely be better than what we have now.
But yes, supporting community-supplied adblock-style lists would be a start, and Google isn't even doing that.
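A minimal sketch of applying adblock-style domain lists to a page of search results. The list format here (one domain per line, with `!` starting a comment, loosely borrowed from Adblock Plus-style lists) is a simplification I'm assuming; real filter-list syntax is much richer. Each listed domain also blocks its subdomains.

```python
from urllib.parse import urlparse


def parse_blocklist(text: str) -> set[str]:
    """Parse an assumed list format: one domain per line, '!' starts a comment."""
    domains = set()
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("!"):
            domains.add(line)
    return domains


def is_blocked(url: str, blocked: set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is on the list."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


def filter_results(urls: list[str], *lists: str) -> list[str]:
    """Merge all subscribed lists and drop results from listed domains."""
    blocked = set().union(*(parse_blocklist(t) for t in lists))
    return [u for u in urls if not is_blocked(u, blocked)]


results = [
    "https://docs.python.org/3/library/json.html",
    "https://www.spam-mirror.example/python-json",
    "https://blog.example.org/post",
]
community_list = "! hypothetical community list\nspam-mirror.example\n"
print(filter_results(results, community_list))
```

Users could subscribe to several lists at once, as with adblockers; the union of all of them is applied.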
I wonder if there's a project like SponsorBlock for Google Search instead of YouTube. Basically a centralized, community-driven database that blacklists certain URLs (timestamps, in YouTube's case) based on submissions and votes. It would be much less clear-cut than YouTube's case, though.
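A hypothetical in-memory sketch of such a submission-and-vote database; it is not modeled on how SponsorBlock itself is implemented. Each user gets one vote per URL (a later vote replaces the earlier one), and a URL is hidden once its net score reaches an assumed threshold, which lets the community settle the less clear-cut cases by majority.

```python
from collections import defaultdict


class VoteBlocklist:
    """Hypothetical community blocklist: users vote URLs up (hide) or down (keep)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # assumed policy: net votes needed to hide a URL
        # url -> {user: +1 or -1}
        self.votes: dict[str, dict[str, int]] = defaultdict(dict)

    def vote(self, url: str, user: str, up: bool) -> None:
        # One vote per user per URL; voting again replaces the previous vote.
        self.votes[url][user] = 1 if up else -1

    def score(self, url: str) -> int:
        return sum(self.votes.get(url, {}).values())

    def is_hidden(self, url: str) -> bool:
        return self.score(url) >= self.threshold


db = VoteBlocklist(threshold=2)
db.vote("https://spam-mirror.example/python-json", "alice", up=True)
db.vote("https://spam-mirror.example/python-json", "bob", up=True)
print(db.is_hidden("https://spam-mirror.example/python-json"))  # True
```

A real service would also need URL normalization and some defence against sock-puppet voting, which this sketch leaves out.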
I personally just append `wiki` or `reddit` to queries. Crappy but kinda works.
It doesn't have to be community based. The search engine could be paid and employ actual people to sift through the garbage and vet domains/brands/etc before they are added to the index.
The problem with Google is that their business model is to show you ads (either on their own website or third-party websites embedding Google ads/analytics) and not to provide you quality search results, therefore they have no incentive to combat even the most obvious SEO spam (see Pinterest & image search for an example).
That was the original Yahoo business model. It doesn't scale. You can't hire enough human beings to curate the entire web, and if you try to automate it, then the SEO spammers can game the automation.
You can enforce penalties (bans for the entire brand/domain/etc.) to deter gaming the system, which should take some pressure off the humans, who can then focus on the top issues reported by end users.
It won't be perfect, but at least Pinterest wouldn't have been polluting image search for years, for example.
This will block sites like OutdoorGearLab, which has great content and trustworthy reviews. I think checking the copy on a suspected SEO site against other websites might do the trick, though.
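Checking a page's copy against other sites is essentially near-duplicate detection. One standard way to do it is w-shingling with Jaccard similarity, sketched here; the 5-word shingle size and the 0.5 threshold are illustrative assumptions, and at web scale you would use MinHash or a similar approximation rather than comparing full shingle sets pairwise.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("shingles") from lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined as 0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_copied(candidate: str, originals: list[str],
                 k: int = 5, threshold: float = 0.5) -> bool:
    """Flag the candidate if it is too similar to any known original page."""
    cand = shingles(candidate, k)
    return any(jaccard(cand, shingles(o, k)) >= threshold for o in originals)


original = ("json.loads deserializes a str, bytes or bytearray instance containing "
            "a JSON document to a Python object using this conversion table")
copy = "In this tutorial, " + original
print(looks_copied(copy, [original]))  # True
```

Unlike an ad/tracker penalty, this only punishes pages whose text mostly duplicates another source, so a site with original reviews would not be caught by it.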
I wasn't talking about domain-specific search engines. I was trying to say that if we had multiple competing search engines, with different ranking algorithms, then gaming the algorithm would be less effective.
The efficacy of SEO is largely dependent on a search monoculture. They only need to optimize for one set of unknown rules, and that's something that is relatively easy to do well with simple machine learning tools.
Real competition in search is probably the best way of reining in the SEO sector.
What if the search index was much more manually curated? For example, say that you could create a custom search index relevant to your field of expertise, and other users would rate its results to let the engine know which indices are actually good, and for which types of queries. You could still game it, but probably not with traditional SEO techniques.
> What if the search index was much more manually curated?
Google would need to not have a monopoly to do that. Otherwise, they would be accused of anti-competitive practices (since their curation policies would be aligned with their interests in other services, they would rarely ban their own properties).
One of the problems here is the effective monoculture.
If there were multiple search engines in use with different ranking algorithms and different search strategies, then "SEO" would be difficult to impossible to achieve, because optimising for one search engine would likely be a pessimisation for the others.
We don't just need one competitor, we need many. And they can't just be copycats, they need to do their own thing, and that might well make the cost:benefit of SEO unviable.
I remember a time when the teacher would say "form a line", and there would be chaos over who got there first.
This was met with a second instruction: the first five people in line move to the back, and the last five move to the front.
It became trial and error, with the kids who usually jostled for position now wanting to be at the back of the line, then in the middle, hunting for the condition that created pole position.
The idea that it was the kids' personalities and values, not their position in the line, that triggered the condition was seemingly too abstract to deduce. Keep in mind this was grade 2.
So maybe the browser could filter the first through nth SEO spammers in the query results and send them somewhere away from the front of the line until a match for the query terms occurs.