
This is the worst kind of internet rant. It uses all kinds of elaborate similes to make the author (and presumably the sympathetic reader) feel smug and superior, but has very little technical content to justify it.

A reasoned criticism of XML could note that XML is a quite well-designed syntax for its intended purpose (domain-specific structured document formats for interchange on the internet), but that it has grown to be used outside of this niche for purposes like (non-document) data interchange, RPCs, configuration files and so on, where the advantages of the XML syntax for its intended domain turn into disadvantages.

For example, the distinction between elements and attributes is very useful when marking up documents, like in (X)HTML:

    <a href="http://harmful-cat">A <em>wonderful</em> rant</a>
However, if you want to mark up a data record that is not intended as a readable document, like:

    name: Justin
    address: Copenhagen
    phone: (12)34-56
Then the distinction between elements, attributes and content just becomes superfluous, and the XML syntax needlessly verbose. This kind of data is much clearer when marked up with YAML, JSON or s-expressions.
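
To make the verbosity point concrete, here is a minimal Python sketch that serializes the record above both ways with the standard library only. The root element name `person` and the choice of child elements over attributes are my assumptions; nothing in the record itself dictates either, which is rather the point.

```python
import json
import xml.etree.ElementTree as ET

# The data record from the example above.
record = {"name": "Justin", "address": "Copenhagen", "phone": "(12)34-56"}

def to_json(rec):
    # Keys and values map directly onto a JSON object.
    return json.dumps(rec)

def to_xml(rec, root_tag="person"):
    # root_tag is a hypothetical name: XML forces us to invent one,
    # and to decide between child elements and attributes.
    root = ET.Element(root_tag)
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = value
    return ET.tostring(root, encoding="unicode")

json_text = to_json(record)
xml_text = to_xml(record)
print(json_text)
print(xml_text)
print(len(json_text), len(xml_text))
```

The XML version repeats every field name in its end tag, so it comes out longer for the same information.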

On the other hand, the link markup above would become pretty convoluted and error-prone to write using any of these formats.
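
As a rough illustration of how the link example fares in a data-oriented format, the sketch below parses it and re-encodes it as nested JSON arrays. The `[tag, attributes, children...]` encoding is just one hypothetical convention I picked for the example, not a standard.

```python
import json
import xml.etree.ElementTree as ET

link = '<a href="http://harmful-cat">A <em>wonderful</em> rant</a>'

def to_tree(elem):
    # Hypothetical encoding: [tag, attributes, children...], where each
    # child is either a plain string or another nested list.
    node = [elem.tag, dict(elem.attrib)]
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_tree(child))
        if child.tail:
            # Text that follows a child element belongs to the parent.
            node.append(child.tail)
    return node

tree = to_tree(ET.fromstring(link))
print(json.dumps(tree))
```

This prints `["a", {"href": "http://harmful-cat"}, "A ", ["em", {}, "wonderful"], " rant"]`, which is workable for a machine but not something you would want to type by hand for a whole document.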

The "verbose" end-tags like </p>, </body> are very helpful syntax when manually editing large documents (which may have deeply nested structures spanning several screenfuls). However, for simple and compact data structures a simple ")" (or even "}") is easier and clearer. Of course, if the content is never edited by hand anyway it doesn't make any difference, and you might as well choose the format that is simplest to parse (or in the very rare circumstance where bandwidth is the bottleneck, you could choose the format with the highest content-to-markup ratio).

So if XML syntax is better for structured documents, YAML for configuration files, and s-expressions for data structures and code, which format is "best" in general? Should you always choose the optimal format, or does it make sense to choose the same format everywhere for consistency? For example, a Lisp-based system might choose to use s-expressions for documentation even if it is a pain to edit, and conversely an XML-based publishing system might choose XML for configuration as well, even if YAML would be easier to edit. These are just trade-off decisions.

But reasoned trade-off decisions are not glamorous and don't make you into an internet hero. If you want to be an internet hero you should write rants that provide the reader a conceptual framework which allows the reader to feel smart and superior. In this case, technical details detract from the purpose, since an informed reader might disagree with technical details, which might undermine the ego-boost the reader is supposed to feel.

But Erik goes beyond the common smugness, and introduces the concept of the stupid, moronic (XML-using?) masses which somehow reign over and suppress the few intelligent (presumably s-expression-using) persons. Thereby Erik taps into the deeply rooted insecurities (and consequently delusions of grandeur coupled with a persecution complex) of many socially challenged geeks.



'The "verbose" end-tags like </p>, </body> are very helpful syntax when manually editing large documents (which may have deeply nested structures spanning several screenfuls). However for simple and compact data structures a simple ")" (or even "}") is easier and clearer.'

The verbose end tags also make it easier to write consistent, robust parsers. One complaint about SGML was that it was hard to find a tool that correctly implemented the entire spec. The XML spec is 11 pages.
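
One way to read the parser point: a named end tag lets the parser notice a nesting mistake at the spot where it happens, while an anonymous closer can only be counted. The sketch below contrasts Python's standard XML parser with a toy parenthesis checker; `paren_error` is a hypothetical helper written for this comparison, not a real s-expression reader.

```python
import xml.etree.ElementTree as ET

def xml_error(text):
    # Returns the parser's error message, or None if the text is well-formed.
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None

def paren_error(text):
    # Toy balance check: anonymous closers can only be counted,
    # not matched against the name of what they are closing.
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "unexpected ')'"
    return None if depth == 0 else f"{depth} unclosed '('"

# The <p> is never closed; the parser complains when it sees </body>.
print(xml_error("<body><p>text</body>"))
# The same mistake with parentheses is only visible at end of input.
print(paren_error("(body (p text)"))
```

In the XML case the error is reported as a mismatched tag at the offending end tag; in the parenthesized case all the checker can say is that something, somewhere, was left open.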

XML came from a desire to have SGML on the Web. As you've pointed out, people have used XML where it likely didn't belong.

To be fair, though, once the world had a choice of decent XML parsers and tools it made sense to use XML for many things, even where the syntax itself was less than ideal for the given task. The proliferation of JSON parsers will likely fix a lot of this abuse moving forward.

Still, berating XML for how people misuse it would be like saying Git is crap because some people use it as a general purpose database, and there are better ways to design relational databases.


An 11 page XML spec? That would be surprising. But you're right, XML itself is pretty simple and useful for document processing. Where things really went completely awry is XML Schema.

I have read (and implemented) a lot of weird specs in my life, but XML Schema has to be the worst. What makes it stand out is that it's incredibly convoluted and completely unfit for purpose at the same time.


" Where things really went completely awry is XML Schema."

I think XML worked out OK for its intended purpose because there was a lot of experience with SGML, HTML, and ad-hoc attempts at "re-purposing" HTML. Folks could say, well, we tried this and that, and this works and that is painful. And since it was not assured to be a success, there were fewer major vendors clamoring to get their fingerprints all over it.

But after XML caught on there was interest from tool vendors to beef things up, largely with abstractions that had yet to see real-world testing, and with things that just so happened to require massive IDE support.

The worst may have been the schema stuff, but there's a lot of competition.

BTW, this page http://www.w3.org/TR/REC-xml/ gives me 40 pages of print preview. A good chunk consists of appendices, but the main part runs more than 11 pages. I don't recall where I got that number from.

I'll just blame Tim Bray, for lack of a real excuse. :)


What really surprises me in XML Schema is not so much all the half-baked stuff they put in, or even the crazy nesting of complex types, for instance. What surprises me is what XML Schema cannot do.

One thing it cannot express is probably the most frequently used structure in all the structured documents I have seen: a particular set of quantified elements that can occur in any order.

The reason they gave for not supporting this is that validators would have to be more than contextless state machines. That's insane. They have created a schema language that doesn't support the most important schema constraint of them all for performance reasons.
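
For what it's worth, the constraint itself is cheap to check outside of XML Schema. Here is a minimal hand-rolled sketch in Python: count the child elements and compare against per-tag bounds, ignoring order entirely. The `RULES` table, the tag names and the `validate_children` helper are all hypothetical, made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical rule set: tag -> (min occurrences, max occurrences or None).
RULES = {"title": (1, 1), "author": (1, None), "note": (0, 2)}

def validate_children(parent, rules):
    # Count direct children by tag; order plays no role in the check.
    counts = Counter(child.tag for child in parent)
    errors = []
    for tag in counts:
        if tag not in rules:
            errors.append(f"unexpected <{tag}>")
    for tag, (low, high) in rules.items():
        n = counts[tag]
        if n < low or (high is not None and n > high):
            errors.append(f"<{tag}> occurs {n} times")
    return errors

doc = ET.fromstring(
    "<book><author>A</author><note/><title>T</title><author>B</author></book>"
)
print(validate_children(doc, RULES))
```

The children appear in a jumbled order here and the check still passes with an empty error list; a single counting pass per parent element is all it takes.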



