
I'm talking about an "at scale" point of view. A 90-95% success rate is at worst 9:1 odds and at best 19:1, meaning your best-case scenario is roughly one failure in every 20 inferences. That may be passable on an ad hoc, individual basis, but why accept it when a deterministic solution can achieve beyond 99% with proper error handling? Data normalization tools are rigid as a feature - LLMs are not, even at a temperature of 0.
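To make the scale argument concrete, here's a rough back-of-the-envelope sketch (the one-million-record volume is just an illustrative assumption):

  # Best-case LLM accuracy vs. a deterministic pipeline, over a hypothetical volume
  records = 1_000_000
  llm_failures = records * (1 - 0.95)            # ~50,000 records to triage
  deterministic_failures = records * (1 - 0.99)  # ~10,000, and these fail loudly and predictably
  print(llm_failures, deterministic_failures)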


Thanks for the info, z3c0. Have you implemented any open-source solutions? Do you mind sharing some deterministic approaches to detecting entities without huge dictionaries (NER)? Or maybe extractive QA, if you used that instead?


Deterministic? Not anything that is "one size fits all". If your documents are of an unpredictable shape, then ML is your best bet. For both NER and QA, BERT models are very capable.
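As a rough sketch of what the QA side looks like (assuming the Hugging Face transformers library; the SQuAD-fine-tuned checkpoint named here is just an example, not a recommendation):

  from transformers import pipeline

  # Any BERT-style checkpoint fine-tuned on SQuAD will work here
  qa = pipeline("question-answering",
                model="distilbert-base-cased-distilled-squad")

  result = qa(question="Who manufactured the part?",
              context="The replacement valve was manufactured by Acme Corp in 2019.")
  print(result["answer"], result["score"])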

To your first question, most of my ML work is proprietary, unfortunately. I am hoping to change that in the near future.

Edit: spaCy is a great library for NER, if you're hoping for an open-source solution that gets you up and running quickly. For QA, look at models fine-tuned on SQuAD.
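For the NER piece, a minimal spaCy sketch (assuming the small English model, en_core_web_sm, has been downloaded):

  import spacy

  # Requires: python -m spacy download en_core_web_sm
  nlp = spacy.load("en_core_web_sm")

  doc = nlp("Apple is opening a new office in Berlin in 2025.")
  for ent in doc.ents:
      print(ent.text, ent.label_)  # e.g. Apple ORG, Berlin GPE, 2025 DATE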



