
Impressive demo!

However, the hardware requirements and cost make this inaccessible for anyone but large companies. When do you envision that the price could be affordable for hobbyists?

Also, while the CNN Vapi demo was impressive as well, a few weeks ago here[1] someone shared https://smarterchild.chat/. That also has _very_ low audio latency, making natural conversation possible. From that discussion it seems that https://www.sindarin.tech/ is behind it. Do we know if they use Groq LPUs or something else?

I think that once you reach ~50 t/s, real-time interaction is possible. Anything higher than that is useful for generating large volumes of data quickly, but there are diminishing returns as it's far beyond what humans can process. Maybe such speeds would be useful for AI-AI communication, transferring knowledge/context, etc.
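To make the "diminishing returns" point concrete, here's a back-of-envelope conversion from token throughput to an equivalent reading speed. The words-per-token ratio and reading speed are my own assumptions, not figures from the thread:

```python
# Back-of-envelope: how does 50 t/s compare to human reading speed?
# Assumptions (mine): ~0.75 English words per token, and a fast
# reader at ~300 words per minute.
WORDS_PER_TOKEN = 0.75
READING_WPM = 300

def tokens_per_sec_to_wpm(tps: float) -> float:
    """Convert model throughput to an equivalent words-per-minute rate."""
    return tps * WORDS_PER_TOKEN * 60

for tps in (50, 100, 500):
    wpm = tokens_per_sec_to_wpm(tps)
    print(f"{tps:>3} t/s = {wpm:>6.0f} wpm ({wpm / READING_WPM:.1f}x reading speed)")
```

Even at 50 t/s the model is already emitting text several times faster than a human can read it, which is the intuition behind capping AI-human-focused hardware there.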

So an LPU product that's only focused on AI-human interaction could have much lower capabilities, and thus much lower cost, no?

[1]: https://news.ycombinator.com/item?id=39180237



> However, the hardware requirements and cost make this inaccessible for anyone but large companies. When do you envision that the price could be affordable for hobbyists?

For API access to our tokens as a service we guarantee to beat any other provider on cost per token (see https://wow.groq.com). In terms of selling hardware, we're focused on selling whole systems, and they're only really suitable for corporations or research institutions.


Do you have any data on how many more tokens I would use with the increased speed?

In the demo alone I just used way more tokens than I normally would testing an LLM since it was so amazingly fast.


Interesting question! Hopefully being faster is so much more useful to you that you use a lot more :)


How open is your early access? I.e., what's the likelihood of getting API access granted right now?


We are absolutely slammed with requests right now, so I don't know, sorry.


>>50 t/s is absolutely necessary for real-time interaction with AI systems. Most of the LLM's output will be internal monologue and planning, performing RAG and summarization, etc., with only the final output being communicated to you. Imagine a blazingly fast GPT-5 that goes through multiple cycles of planning out how to answer you, searching the web, writing book reports, debating itself, distilling what it finds, critiquing and rewriting its answer, all while you blink a few times.
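The "internal monologue" pattern described here can be sketched roughly as follows. Everything in this snippet is hypothetical: `llm` and `search` stand in for any completion API and retrieval step, and the prompts are placeholders:

```python
# Sketch of the plan / retrieve / critique loop: most tokens are spent on
# intermediate reasoning, and only the final draft reaches the user.
def answer(question: str, llm, search, cycles: int = 3) -> str:
    plan = llm(f"Plan how to answer: {question}")
    notes = search(plan)                      # RAG step: fetch supporting docs
    draft = llm(f"Answer {question!r} using notes: {notes}")
    for _ in range(cycles):                   # critique-and-rewrite loop
        critique = llm(f"Critique this answer: {draft}")
        draft = llm(f"Rewrite to address: {critique}\n\n{draft}")
    return draft                              # only this reaches the user
```

With `cycles=3` this makes eight model calls per user-visible answer, which is why raw token throughput matters far beyond human reading speed.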


Given the size of the Sindarin team (3 people AFAICT), it mostly looks like a clever combination of existing tech. Some speech APIs offer word-by-word realtime transcription (Google has one), so I assume most of the special sauce is very well-thought-out pipelining between speech recognition -> LLM -> TTS.

(Not to denigrate their awesome achievement; I wouldn't be asking if I weren't curious about how to reproduce their result!)
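One guess at what that pipelining might look like: run the three stages concurrently so partial ASR output feeds the LLM while earlier LLM tokens are already being synthesized, instead of waiting for each stage to finish. The stage functions here are hypothetical stand-ins, not anyone's actual API:

```python
# Minimal sketch of overlapping ASR -> LLM -> TTS stages with queues.
# Latency comes down because stage N+1 starts on the first item from
# stage N rather than on the whole utterance.
import queue
import threading

def run_pipeline(asr_words, llm_stage, tts_stage):
    q1, q2 = queue.Queue(), queue.Queue()
    audio_out = []

    def llm_worker():
        while (word := q1.get()) is not None:
            q2.put(llm_stage(word))           # consume partial transcripts
        q2.put(None)                          # propagate end-of-utterance

    def tts_worker():
        while (tok := q2.get()) is not None:
            audio_out.append(tts_stage(tok))  # synthesize as tokens arrive

    t1 = threading.Thread(target=llm_worker)
    t2 = threading.Thread(target=tts_worker)
    t1.start(); t2.start()
    for w in asr_words:                       # words arrive as ASR emits them
        q1.put(w)
    q1.put(None)                              # end-of-utterance sentinel
    t1.join(); t2.join()
    return audio_out
```

In a real system the per-word granularity would be chunks of streamed tokens and audio frames, but the overlap structure is the same.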



