
Agents respecting robots.txt is clearly going to end soon. Users will be installing browser extensions or full browsers that run the actions on their local computer with the user's own cookie jar, IP address, etc.
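For context, here is a minimal sketch of what "respecting robots.txt" means for an agent, using Python's standard `urllib.robotparser`. The rules, the `ExampleAgent` name, the example.com URLs, and the `is_allowed` helper are all made up for illustration; no real site or agent is implied.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named agent is kept out of /private/,
# everyone else is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleAgent
Disallow: /private/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAgent", "https://example.com/private/data"),
        ("ExampleAgent", "https://example.com/public"),
        ("OtherBot", "https://example.com/private/data"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check is entirely voluntary and runs on the client side, which is why an agent driving the user's own browser can simply skip it.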


I hope agents.txt becomes standard and websites actually start to build agent-specific interfaces (or just have API docs in their agents.txt). In my mind it's different from "robots", which is meant to apply rules to broad web-scraping tools.


I hope they don't build agent-specific interfaces. I want my agent to have the same interface I do. And even more importantly, I want to have the same interface my agent does. It would be a bad future if the capabilities of human and agent interfaces drift apart and certain things are only possible to do in the agent interface.


I think the word you're looking for is Apartheid, and I think you're right.


I wonder how many people will think they are being clever by using the Playwright MCP or browser extensions to bypass robots.txt on the sites blocking the direct use of ChatGPT Agent and will end up with their primary Google/LinkedIn/whatever accounts blocked for robotic activity.


I don't know how others are using it, but when I ask Claude to use Playwright, it's for ad-hoc tasks which look nothing like old-school scraping, and I don't see why it should bother anyone.


Claude doesn't look at advertisements.


I'm surprised older OpenAI agents respected robots.txt.

Expecting AI agents to respect robots.txt is like expecting browser extensions like uBlock Origin to respect "please-dont-adblock.txt".

Of course it's going to be ignored, because it's an unreasonable request, it's hard to detect, and the user agent works for the user, not the webmaster.

Assuming the agent is not requesting pages at an overly fast speed, of course. In that case, feel free to 429.

Q: but what about botnets-

I'm replying in the context of "Users will be installing browser extensions or full browsers that run the actions on their local computer with the user's own cookie jar, IP address, etc."
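The "feel free to 429" point above can be sketched as a per-client sliding-window rate limiter. This is only an illustration under assumed names and limits (`SlidingWindowLimiter`, `status_for`, and the two-requests-per-second limit are hypothetical), not how any particular site does it.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client_id -> timestamps of requests accepted within the window
        self._hits = defaultdict(deque)

    def status_for(self, client_id, now=None):
        """Return the HTTP status to send: 200 if accepted, 429 if too fast."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429  # Too Many Requests
        hits.append(now)
        return 200


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=1.0)
    for t in (0.0, 0.1, 0.2, 1.05):
        print(t, limiter.status_for("203.0.113.7", now=t))
```

Keying on request rate rather than on a declared user agent matches the argument in the comment: the server reacts to load it can observe, not to who the client claims to be.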



