Determining whether the latest off the shelf LLMs are good enough should be stra...

sandworm101 · on Jan 18, 2025

This is not an LLM problem. It was solved years ago via OCR. Worldwide, postal services long ago deployed OCR to read handwitten addresses. And there was an entire industry of OCR-based data entry services, much of it translating the chicken scratch of doctor's handwiting on medical forms, long before LLMs were a thing.

prng2021 · on Jan 18, 2025

It was never “solved” unless you can point me to OCR software that is 100% accurate. You can take 5 seconds to google “ocr with llm” and find tons of articles explaining how LLMs can enhance OCR. Here’s an example:

https://trustdecision.com/resources/blog/revolutionizing-ocr...

sandworm101 · on Jan 18, 2025

By that standard, no problem has ever been solved by anyone. I prefer to believe that a great many everyday tech issues were in fact tackled and solved in the past by people who had never even heard of LLMs. So too many things were done in finance long before blockchains solved everything for us.

asveikau · on Jan 18, 2025

OCR is very bad.

As an example look at subtitle rips for DVD and Blu-ray. The discs store them as images of rendered computer text. A popular format for rippers is SRT, where it will be stored as utf-8 and rendered by the player. So when you rip subtitles, there's an OCR step.

These are computer rendered text in a small handful of fonts. And decent OCR still chokes on it often.

prng2021 · on Jan 18, 2025

From the article I linked:

“Our internal tests reveal a leap in accuracy from 98.97% to 99.56%, while customer test sets have shown an increase from 95.61% to 98.02%. In some cases where the document photos are unclear or poorly formatted, the accuracy could be improved by over 20% to 30%.”

flir · on Jan 18, 2025

In my experience the chatbots have bumped transcription accuracy quite a bit. (Of course, it's possible I just don't have access to the best-in-class OCR software I should be comparing against).

(I always go over the transcript by hand, but I'd have to do that with OCR anyway).

VWWHFSfQ · on Jan 18, 2025

OCR is not perfect. And therefore it is not "solved".

Dylan16807 · on Jan 18, 2025

That definition, solved=perfect, is not what sandworm meant and it's an irrelevant definition to this conversation because it's an impossible standard.

Insisting we switch to that definition is just being unproductive and unhelpful. And it's pure semantics because you know what they meant.

philipwhiuk · on Jan 18, 2025

Not really, because this entire post is about that last fraction of a %.

Dylan16807 · on Jan 18, 2025

It's not, because then they wouldn't want humans, because humans can't do 100% either.

jadamson · on Jan 18, 2025

That's only true if the x% humans can't do is the same x% that OCR can't do.

Dylan16807 · on Jan 19, 2025

I know it matters what percent humans can do. But specifically "that last fraction of a percent" is in comparison to 100, not to humans. The argument I was replying to was about perfection, and rejecting anything short of it. Comparing to humans is a much better idea, and removes the entire argument of "OCR literally can't be that good so the problem isn't solved".

tjwebbnorfolk · on Jan 18, 2025

point me to handwriting that is 100% legible...

If 100% is your standard, good luck solving anything ever.

elmomle · on Jan 18, 2025

Most handwriting is legible to its owner. This would indicate that there is enough consistency within a person's writing style to differentiate letters, etc., even if certain assumptions about resemblance to any standard may not hold. I wonder if there are modern OCR methods that incorporate old code-breaking techniques like frequency analysis.

Boldened15 · on Jan 19, 2025

> Most handwriting is legible to its owner.

Not necessarily, I'd be surprised if I could fully understand my old handwritten notes from when I was in school (years ago), since I've always had messy handwriting and no longer have the context in each subject matter to guess.

LLMs could help in some of those cases, since it would have knowledge of history/chemistry/etc. and could fill in the blanks better than I could at this point. Though the hallucinations would no doubt outweigh it.

manquer · on Jan 18, 2025

I think OP is saying there is always scope for improvement until it is 100% not that 100% or bust .

lukeschlather · on Jan 18, 2025

LLMs improve significantly on state of the art OCR. LLMs can do contextual analysis. If I were transcribing these by hand, I would probably feed them through OCR + an LLM, then ask an LLM to compare my transcription to its transcription and comment on any discrepancies. I wouldn't be surprised if I offered minimal improvement over just having the LLM do it though.

sandworm101 · on Jan 18, 2025

Why assume that OCR does not involve context? OCR systems regularly use context. It doesnt require an LLM for a machine reading medical forms to generate and use a list of the hundred most common drugs appearing in a paticular place on a specific form. And an OCR reading envelopes can be directed to prefer numbers or letters depending on what it expects.

Even if LLMs can push a 99.9% accuracy to 99.99, at least an OCR-based system can be audited. Ask an OCR vendor why the machine confused "Vancouver WA" and "Vancouver CA" and one can get a solid answer based in repeated testing. Ask an LLM vendor why and, at best, you'll get a shrug and some line citing how much better they were in all the other situations.

iterance · on Jan 18, 2025

Are you guessing, or are there results somewhere that demonstrate how LLMs improve OCR in practical applications?

Modified3019 · on Jan 18, 2025

Someone linked this above

https://trustdecision.com/resources/blog/revolutionizing-ocr...

> Our internal tests reveal a leap in accuracy from 98.97% to 99.56%, while customer test sets have shown an increase from 95.61% to 98.02%. In some cases where the document photos are unclear or poorly formatted, the accuracy could be improved by over 20% to 30%.

While a small percentage increase, when applied to massive amounts of text it’s a big deal.

imtringued · on Jan 19, 2025

It's not a small percentage. The moment you OCR a book, you'll end up with hundreds to thousands of errors.

dambi0 · on Jan 18, 2025

For the addresses it might be a bit easier because they are a lot more structured and in theory and the vocabulary is a lot more limited. I’m less sure about medical notes although I’d suspect that there are fairly common things they are likely to say.

Looking at the (admittedly single) example from the National Archives seems a bit more open than perhaps the other two examples. It’s not impossible thst LLMs could help with this

WillAdams · on Jan 18, 2025

Yes, but there was usually a fall-back mechanism where an unrecognized address would be shown on a screen to an employee who would type it so that it could then be inkjetted with a barcode.

iandanforth · on Jan 18, 2025

Fun fact, convolutional neural networks developed by Yann LeCunn were instrumental in that roll out!

pinoy420 · on Jan 18, 2025

Agree. Sounds like not wanting to let go of a legacy