The issue I was having was with the query "term+wikipedia" it then shows the wik...

saltysalt · 2026-01-23T22:13:20 1769206400

It's a difficult problem to fix. You can set an Accept-Language header on crawl requests, but this only works if the target website uses content negotiation. Some sites ignore the header and determine language based on the client's IP address (Geo-IP) or the URL structure (e.g., /es/ vs /en/). Basically a mess...
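
For illustration, here is a minimal Python sketch of what setting that header on a crawl request could look like. The helper names, the example URL, the user agent string and the way q-values are assigned are my own assumptions, not how any particular crawler does it. It only builds the request; whether the server honours the header is exactly the open question.

```python
from urllib.request import Request


def accept_language(preferences):
    """Build an Accept-Language value from language tags in preference order.

    The first tag gets the implicit q=1.0; each later tag gets a q-value
    0.1 lower, floored at 0.1 (an arbitrary choice for this sketch).
    """
    parts = []
    for i, tag in enumerate(preferences):
        q = max(0.1, 1.0 - 0.1 * i)
        parts.append(tag if i == 0 else f"{tag};q={q:.1f}")
    return ", ".join(parts)


def build_crawl_request(url, preferences):
    """Return a urllib Request carrying the language preference header."""
    headers = {
        "Accept-Language": accept_language(preferences),
        "User-Agent": "example-crawler/0.1",  # hypothetical user agent
    }
    return Request(url, headers=headers)


# Hypothetical URL; no network access happens until the request is opened.
req = build_crawl_request("https://example.com/", ["en", "en-GB", "es"])
print(req.get_header("Accept-language"))
```

A site that does content negotiation may answer with a Content-Language response header; a site that keys off Geo-IP or the URL path will simply ignore what was sent.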

1718627440 · 2026-01-23T22:23:00 1769206980

I don't get the problem you claim. You crawl something and get a document in whatever language the site delivers to you. You know the language of that document from its lang=... attribute. Which results you show for a given language is under your control and not influenced by what the crawled site chose to serve to the crawler.
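
A rough sketch of what reading that attribute could look like with only the Python standard library. The function and class names are made up for this example, and it only looks at the lang attribute on the root html element, ignoring things like meta tags or per-element overrides.

```python
from html.parser import HTMLParser
from typing import Optional


class _HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            value = dict(attrs).get("lang")
            if value and value.strip():
                self.lang = value.strip()


def declared_lang(html_text: str) -> Optional[str]:
    """Return the language the document declares, or None if it declares none."""
    parser = _HtmlLangParser()
    parser.feed(html_text)
    parser.close()
    return parser.lang


print(declared_lang('<!DOCTYPE html><html lang="es"><body>Hola</body></html>'))
print(declared_lang("<html><body>no declaration here</body></html>"))
```

This only reports what the page claims about itself; it says nothing about whether the claim matches the actual text.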

saltysalt · 2026-01-25T22:10:24 1769379024

I'm currently working on the language improvements, but I need to clean out a lot of bad entries in my index. In essence what I am trying to say is many servers ignore "Accept-Language", so you have to rely on other means of reliably detecting the language of the page, e.g. inspecting the body content of the response. It's a non-trivial problem online.
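
To give an idea of what "inspecting the body content" can mean, here is a deliberately naive Python sketch that counts common stopwords per language. The word lists, the three languages and the function name are assumptions picked for this example; a real index would more likely use a trained language identification model, and this toy version will misfire on short or mixed-language text.

```python
import re
from typing import Optional

# Tiny, hand-picked stopword lists (an assumption of this sketch, far from complete).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it", "on", "with"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "las", "es", "no"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "mit", "zu", "den"},
}


def guess_language(text: str) -> Optional[str]:
    """Guess the language of plain text by counting stopword hits.

    Returns the language code with the most hits, or None when no
    stopword from any list appears. Ties go to whichever language
    comes first in STOPWORDS.
    """
    # Runs of Unicode letters only; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for w in words if w in vocab)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("the cat is on the table and it is sleeping"))
print(guess_language("el gato está en la mesa y no quiere comer"))
```

In practice you would run something like this on the visible text after stripping markup, and treat disagreement with the declared language as a signal that the entry needs a closer look.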

1718627440 · 2026-01-25T22:20:10 1769379610

So html lang=... is wrong, or doesn't exist?

> I am trying to say is many servers ignore "Accept-Language"

I wouldn't have expected that to be a hard rule; more of a factor that, when there are multiple versions of a page to return, helps decide which one the user most likely wants.