Hacker News

I wonder what would be wrong with changing the recursive function to take a depth and then bailing out if the depth is too big.


Two things come to mind:

- Maximum stack depth varies greatly between targets. If you wanted to support a recursion depth of, say, 50, that may already be too much for some of the hardware this library needs to be able to run on, so the original problem still exists.

- This would add an arbitrary limit to the parsing capabilities of the library. Stack space is very limited to begin with compared to heap space, so any solution that remains recursive would probably require a limit far lower than a heap-based solution could offer.


Or just take the limit as an argument so developers can adjust it based on the platform. Python also has a recursion limit but you can change it if you need to.
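To illustrate the suggestion, a minimal Python sketch: a recursive parse step that takes the limit as a parameter, alongside Python's own adjustable interpreter limit. The `parse_nested` helper is hypothetical, not any real library's API.

```python
import sys

def parse_nested(node, depth=0, max_depth=100):
    """Hypothetical recursive parse step with a caller-adjustable limit."""
    if depth > max_depth:
        raise RecursionError("nesting too deep for this platform")
    return [parse_nested(child, depth + 1, max_depth) for child in node]

# Python's interpreter-wide limit is adjustable in the same spirit:
old = sys.getrecursionlimit()
sys.setrecursionlimit(10_000)   # raise it for an unusually deep document
sys.setrecursionlimit(old)      # restore the previous limit

# A caller on a small embedded target could pass max_depth=20 instead.
print(parse_nested([[[[]]]], max_depth=100))
```

The point is that the limit becomes a deployment decision rather than something baked into the library.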


Depth is easy to miss when the parser calls a lot of functions that call each other and that have vastly different stack usage.

A more reliable check is to compare the address of something on the stack at the API boundary with the address of something currently on the stack. SpiderMonkey, the JS engine in Firefox, used something like that in the past to stop runaway recursion, or at least to switch to slower algorithms with bounded stack usage.


That idea works in general but causes false positives: no artificial limit you pick is "right", and the false positives can be avoided by getting rid of the recursion altogether.
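For illustration, a minimal sketch (hypothetical, not Expat's actual code) of the recursion-free approach: an explicit heap-allocated stack tracks nesting, so depth is bounded only by available heap rather than by call-stack size.

```python
def max_nesting_depth(xml: str) -> int:
    """Count maximum tag nesting iteratively, using a heap-allocated list
    as the stack instead of recursive calls. Assumes well-formed markup and
    ignores attributes, comments, and self-closing tags for brevity."""
    stack = []       # grows on the heap, not on the call stack
    deepest = 0
    i = 0
    while i < len(xml):
        if xml[i] == "<":
            end = xml.index(">", i)
            tag = xml[i + 1:end]
            if tag.startswith("/"):
                stack.pop()          # closing tag: leave one nesting level
            else:
                stack.append(tag)    # opening tag: enter one nesting level
                deepest = max(deepest, len(stack))
            i = end
        i += 1
    return deepest

print(max_nesting_depth("<a><b><c></c></b><d></d></a>"))  # → 3
```

No matter how deeply nested the input is, the call stack stays flat, so there is no arbitrary cutoff and no false positives.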

PS: It's not one single function, and the recursion is indirect, not direct.


Sure, if it's indirect I agree it will get messy fast, with a dozen functions suddenly needing to handle an additional parameter. But unrelated to that... I'd really like to know who needs recursion for this that's deeper than 3 or 4 levels. What's the use case? Such XML would surely be unreadable and unwritable for humans, but if it's used as some form of exchange format between systems, what would that be? How would it end up with such deeply nested entities? It sounds like something you deliberately implement that way to show how "smart" you are, not like "hey, that seems the reasonable thing to do here".

This makes me wonder: do any of the popular XML libs have a sort of safe mode, where custom entities and similar features are disabled, schema URLs ignored, and namespaces just flattened (and whatever else I forgot or don't even know about)? You know, for when I know I only need to parse simple XML files that should contain a couple of plain tags and attributes, and want to reduce the attack surface.


There are parsers that only implement a tiny subset of XML, and Expat has compile-time flags to disable some of that machinery where it's not needed. It's arguably no longer XML then, though.


You can think of this as having two base cases, your normal success case and an error case. You could use metaprogramming to do this semi-automatically, like a decorator that took a maximum depth and checked it before calling your inner function.
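A sketch of that decorator idea in Python (the names `max_depth` and `parse` are hypothetical): the decorator tracks how deep the wrapped function has recursed and triggers the error base case once the configured limit is exceeded.

```python
import functools

def max_depth(limit):
    """Decorator that counts live recursive calls into the wrapped function
    and raises once the configured limit is exceeded (the error base case)."""
    def decorate(fn):
        depth = 0
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal depth
            if depth >= limit:
                raise RecursionError(f"{fn.__name__} exceeded depth {limit}")
            depth += 1
            try:
                return fn(*args, **kwargs)
            finally:
                depth -= 1          # unwind the counter on every return path
        return wrapper
    return decorate

@max_depth(50)
def parse(node):
    # normal success base case: a leaf value
    if not isinstance(node, list):
        return node
    return [parse(child) for child in node]

print(parse([1, [2, [3]]]))  # → [1, [2, [3]]]
```

The depth bookkeeping lives in one place instead of being threaded through every function's signature, though with indirect recursion each function in the cycle would still need its own (or a shared) counter.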


It's not ambitious enough. This 'solution' would be someone else's problem, to be avoided at all costs.



