I’ve spent some time in enterprise TFO/demo engineering, and this kind of generative tool would’ve been a game changer. Synthetic data sits in a sweet spot of being both genuinely hard and in high business demand. Working with real customer data is risky: just anonymizing PII doesn’t cut it. You’ve got to create data that’s far enough removed from the original to really stay in the clear. And even if you can do that once, AI demos often need thousands of rows to be worthwhile. Without that volume, the visualizations fall flat and the demo has no impact.
The challenge I found with LLMs isn’t generating a "real enough" data point; that’s doable. It’s "How do I load this in?", then "How do I generate hundreds of these?", and beyond that, "How do I make these pseudo-random in a way that tells a coherent story in the graphs?" It always feels like you’re right on the edge, but getting it to work reliably is harder than it looks.
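For what it’s worth, the "coherent story" part usually comes down to baking a trend and some seasonality into the generator rather than sampling pure noise, with a fixed seed so the demo is reproducible. A minimal sketch of that idea in plain Python (column names and numbers are made up for illustration, nothing to do with any particular tool):

```python
import csv
import io
import math
import random

def make_demo_rows(n=500, seed=42):
    """Generate pseudo-random rows whose chart tells a 'growth' story:
    an upward trend plus weekly seasonality plus bounded noise."""
    rng = random.Random(seed)  # fixed seed -> same demo every run
    rows = []
    for day in range(n):
        trend = 100 + 0.5 * day                          # steady growth
        seasonality = 10 * math.sin(2 * math.pi * day / 7)  # weekly wobble
        noise = rng.gauss(0, 5)                          # small jitter
        rows.append({"day": day, "revenue": round(trend + seasonality + noise, 2)})
    return rows

rows = make_demo_rows()

# Dump to CSV so it can be bulk-loaded into whatever DB the demo uses.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["day", "revenue"])
writer.writeheader()
writer.writerows(rows)
```

The point is just that the randomness rides on top of a deterministic signal, so the graph always looks like a story and not static.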
Yup, agreed. We built an orchestration engine into Neosync for exactly that reason. It handles all of the reading/writing from databases for you, and it can also generate data from scratch (with or without LLMs).