Always happy to answer questions! The code is mostly Node.js, with a lot of shell scripts to glue things together. The “background worker” is mostly me running things in tmux, though I do (ab)use GitLab CI for some scheduled tasks. The main full-text index is currently Elasticsearch (as I mention elsewhere in this thread, I’m not a fan of it); various other data in the ingestion process is stored in a combination of JSON-Lines files, SQLite, and bespoke binary formats as needed. Because I’m squeezing this into the hardware I have, the details are generally dictated by performance constraints for the particular problem at hand.
No plans to open-source it at the moment; that implies a level of stewardship that I don’t have the energy for, and some of the code is also kind of tied to my specific server right now.
Could I ask you a question? What is your tech stack (programming language, background worker, database)? How often does the index update?
Are you planning to make it open source?