Very fun read! I’m curious though, when it comes to non-Ruby-specific optimizati...

byroot · on Dec 18, 2024

I somewhat answered that in https://news.ycombinator.com/item?id=42450085

In short, since `ruby/json` ships with Ruby, it has to be compatible with Ruby's constraints, which today means plain C99 and no C++. There would probably also be a licensing issue with simdjson (Apache 2), though I'm not sure.

Overall there's a bunch of really nice C++ libraries I'd love to use, like dragonbox, but just can't.

Another thing is that, last time I checked, simdjson only provided a parser. The ruby/json gem does both parsing and encoding, so it would only help with half of the problem space.

meisel · on Dec 18, 2024

yyjson is a very fast, C89-compliant C library that can both parse and generate JSON.

mort96 · on Dec 18, 2024

The benefit of a Ruby-specific JSON parser is that the parser can directly output Ruby objects. Generic C JSON parsers generally have their own data model, so instead of just parsing JSON text into Ruby objects, you'd be parsing JSON text into an intermediate data structure and then walking that to generate Ruby objects. That'd necessarily use more memory, and it'd probably be slower too unless the parser is way faster.

Same applies to generating JSON: you'd have to first walk the Ruby object graph to build a yyjson JSON tree, then hand that over to yyjson.
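To make the extra step concrete, here is a minimal sketch in Ruby. The `Node` struct, the hand-built `TREE`, and the `to_ruby` helper are hypothetical stand-ins for a generic library's document model and the conversion walk; they are not yyjson's API, and the sketch says nothing about actual timings.

```ruby
require "json"

TEXT = '{"id": 1, "tags": ["a", "b"]}'

# Hypothetical stand-in for a generic parser's own data model:
# every value is wrapped in a tagged node.
Node = Struct.new(:type, :value)

# Step 1: the intermediate tree a generic parser would hand back for TEXT.
# It is built by hand here, since no such parser is involved in this sketch.
TREE = Node.new(:object, [
  [Node.new(:string, "id"), Node.new(:number, 1)],
  [Node.new(:string, "tags"), Node.new(:array, [
    Node.new(:string, "a"),
    Node.new(:string, "b"),
  ])],
])

# Step 2: a second walk that allocates the Ruby objects the caller wanted.
def to_ruby(node)
  case node.type
  when :object
    node.value.each_with_object({}) { |(k, v), h| h[to_ruby(k)] = to_ruby(v) }
  when :array
    node.value.map { |child| to_ruby(child) }
  else
    node.value
  end
end

# A Ruby-specific parser produces the same Hash/Array/String objects
# in a single pass over the text, with no tree in between.
puts to_ruby(TREE) == JSON.parse(TEXT) # => true
```

Both paths end with the same Ruby objects; the difference is that the first one also had to allocate and traverse every `Node`.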

meisel · on Dec 18, 2024

All of that would be a big savings in code complexity and a win for reliability, compared to doing new untested optimizations. If memory usage is a concern, I’m sure there’s a fast C SAX parser out there (or maybe one within yyjson).

mort96 · on Dec 18, 2024

I don't understand what you're getting at. If performance is a concern, integrating a different parser written in C isn't desirable, as it would probably be slower than the existing parser for the reasons I mentioned (or at least be severely slowed down by the conversion step), so you need to optimize the Ruby-specific parser. If performance isn't a concern, keeping the old, battle-tested Ruby parser unmodified would surely be better for reliability than trying to integrate yyjson.

meisel · on Dec 18, 2024

Take a look at SAX parsers
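For readers unfamiliar with the term: a SAX-style (event-driven) parser reports tokens through callbacks instead of returning a tree, so a binding could build Ruby objects as the events arrive. A rough sketch of that idea follows. The handler method names and the hand-written `EVENTS` list are assumptions made for illustration, not the callback API of any particular library, and the sketch does not show whether this would be faster than the existing parser.

```ruby
# Hypothetical event handler that builds Ruby objects directly.
class RubyObjectBuilder
  attr_reader :result

  def initialize
    @stack = [] # containers that are still open
    @keys = []  # pending key for each open Hash
  end

  def start_object
    @stack.push({})
  end

  def start_array
    @stack.push([])
  end

  def key(name)
    @keys.push(name)
  end

  def value(v)
    add(v)
  end

  # Closing a container attaches it to its parent.
  def end_object
    add(@stack.pop)
  end
  alias end_array end_object

  private

  # Attach a finished value to its parent, or record it as the final result.
  def add(v)
    parent = @stack.last
    case parent
    when Hash then parent[@keys.pop] = v
    when Array then parent << v
    else @result = v
    end
  end
end

# Events an event-driven parser might emit for {"id": 1, "tags": ["a", "b"]}.
EVENTS = [
  [:start_object],
  [:key, "id"], [:value, 1],
  [:key, "tags"], [:start_array],
  [:value, "a"], [:value, "b"],
  [:end_array],
  [:end_object],
]

# Feed the events to a fresh builder and return the finished object.
def build_from_events(events)
  builder = RubyObjectBuilder.new
  events.each { |name, *args| builder.public_send(name, *args) }
  builder.result
end

p build_from_events(EVENTS)
```

No intermediate document is kept around; the only extra state is the stack of open containers and their pending keys.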

akira2501 · on Dec 18, 2024

What I love about this article is it's actual engineering work on an existing code base. It doesn't seek to just replace things or swap libraries in an effort to be marginally faster. It digs into the actual code and seeks to genuinely improve it not only for speed but for efficiency. This simply does not get done enough in modern projects.

I wonder, if this were done more regularly, whether we would even end up with libraries like simdjson or oj in the first place. The problem domain simply isn't _that_ hard.

chucke · on Dec 19, 2024

Bear in mind that the author is part of the Ruby core team; json is a standard lib gem; the json gem's repo was in the original author's namespace; and the repo had seen no activity for more than a year, despite several quality MRs.

It took some time to track down the original author and get them to migrate it to the Ruby team namespace.

While I'm glad they went to all this trouble, there are only a few who could pull this off. Everyone else would flock to or build a narrative.