Also, very cute of you to assume that LLMs are still being trained on websites.
Or that the cradle of software (California), with its elite engineers (OpenAI comp averages 900k/yr), needs help with a task (web scraping) that Indians can do for 3 bucks an hour.
I might have mistaken you for OP, or for a defender of OP's tool.
1. Creating a tool based on helping LLMs train on websites implies two things: first, that LLMs have a problem with training on websites (even though HTML is designed for easy machine parsing of content), and second, that LLMs are still crawling and have not moved on to other, harder sources of data.
2. I am challenging those raison d'être assumptions behind the tool. I am questioning not only the tool and its usefulness, but also its creator's understanding of the state of LLM development.
> Creating a tool based on helping llms train on website
What are you talking about? The tool has basically nothing to do with websites, other than the assumption that the author of the document will provide it to the user via their website and that the user will know to find it there. Technically speaking, the user could instead request the document from the author over email, fax, or even a letter delivered by hand. But HTTP is more convenient for a number of reasons.
> llms have a problem with training on websites
If you mean LLMs have a problem with keeping up with current events, yes, that is essentially the problem this is intended to solve. It offers a document you can inject into your prompt (think RAG) containing current information that the LLM is probably not up to date on, including information that may not have even existed a minute ago.
You could go to the regular HTML website and copy/paste the content out of page after page after page to much the same effect. But consolidating it all into one place, so it only has to be copied once, makes things easier for the user, with the added bonus of leaving out extraneous content that would eat up tokens.
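To make "inject into your prompt" concrete, here is a rough sketch in Python. Everything named in it is an assumption for illustration only: the URL is a placeholder, `fetch_document` and `build_prompt` are hypothetical helpers rather than anything the tool's author ships, and the character cap is a crude stand-in for real token counting.

```python
from urllib.request import urlopen


def fetch_document(url: str, timeout: float = 10.0) -> str:
    """Download the consolidated plain-text document over HTTP."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_prompt(document: str, question: str, max_chars: int = 20_000) -> str:
    """Place the document ahead of the question so the model can draw on it."""
    # Crude size cap (assumption): a real setup would budget in tokens.
    context = document.strip()[:max_chars]
    return (
        "Use the following up-to-date reference material to answer.\n\n"
        f"--- BEGIN DOCUMENT ---\n{context}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    # Placeholder URL (assumption); the text could just as well come from
    # an email attachment, since only the document contents matter.
    doc = fetch_document("https://example.com/docs-for-llms.txt")
    print(build_prompt(doc, "What changed in the latest release?"))
```

The resulting string is what you would paste or send to whichever model you use. Nothing here involves training or crawling; the document is only read at prompt time.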
> Questioning not only the tool and its usefulness
Its usefulness is worth questioning. It very well may not be useful, and the author who proposed this even admits as much, putting it out there merely to test the waters and see if anyone finds it useful. But your questions are a long way from being relevant to the tool and how it might potentially be useful.