This is the worst kind of internet rant. It uses all kinds of elaborate similes ...

jamesbritt · on June 21, 2009

'The "verbose" end-tags like </p>, </body> are very helpful syntax when manually editing large documents (which may have deeply nested structures spanning several screenfulls). However for simple and compact data structures a simple ")" (or even "}") is easier and clearer.'

The verbose end tags also make it easier to write consistent, robust parsers. One complaint about SGML was that it was hard to find a tool that correctly implemented the entire spec. The XML spec is 11 pages.
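As a rough illustration of why named end tags help a parser, here is a minimal Python sketch of a nesting check. It is not a real XML parser: the `TAG` pattern and the `check_nesting` function are hypothetical names made up for this example, and comments, CDATA sections, and processing instructions are ignored. The point is only that a named end tag lets the checker say which element was closed wrongly, where a bare ")" could only reveal that the counts are off.

```python
import re

# Simplified tag pattern (an assumption for this sketch, not the XML grammar):
# group 1 = "/" for an end tag, group 2 = element name, group 3 = "/" for <empty/>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")

def check_nesting(text):
    """Return a description of the first nesting error, or None if tags balance."""
    stack = []  # names of the elements that are currently open
    for m in TAG.finditer(text):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if self_closing:
            continue  # <br/> opens and closes in one step
        if not closing:
            stack.append(name)
        elif not stack:
            return f"</{name}> at offset {m.start()} has no open element"
        elif stack[-1] != name:
            # The end tag's name tells us exactly which element was expected.
            return f"</{name}> at offset {m.start()} closes <{stack[-1]}>"
        else:
            stack.pop()
    if stack:
        return f"<{stack[-1]}> is never closed"
    return None

print(check_nesting("<a><b></b></a>"))
print(check_nesting("<a><b></a></b>"))
```

The first call reports no error; the second reports the mismatched end tag together with the element it should have closed.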

XML came from a desire to have SGML on the Web. As you've pointed out, people have used XML where it likely didn't belong.

To be fair, though, once the world had a choice of decent XML parsers and tools it made sense to use XML for many things, even where the syntax itself was less than ideal for the given task. The proliferation of JSON parsers will likely fix a lot of this abuse moving forward.

Still, berating XML for how people misuse it would be like saying Git is crap because some people use it as a general purpose database, and there are better ways to design relational databases.

fauigerzigerk · on June 21, 2009

An 11 page XML spec? That would be surprising. But you're right, XML itself is pretty simple and useful for document processing. Where things really went completely awry is XML Schema.

I have read (and implemented) a lot of weird specs in my life, but XML Schema has to be the worst. What makes it stand out is that it's incredibly convoluted and completely unfit for purpose at the same time.

jamesbritt · on June 22, 2009

" Where things really went completely awry is XML Schema."

I think XML worked out OK for its intended purpose because there was a lot of experience with SGML, HTML, and ad-hoc attempts at "re-purposing" HTML. Folks could say, well, we tried this and that, and this works and that is painful. And since it was not assured to be a success, there were fewer major vendors clamoring to get their fingerprints all over it.

But after XML caught on there was interest from tool vendors to beef things up, largely with abstractions that had yet to see real-world testing, and with things that just so happened to require massive IDE support.

The worst may have been the schema stuff, but there's a lot of competition.

BTW, this page http://www.w3.org/TR/REC-xml/ gives me 40 pages of print preview. A good chunk consists of appendices, but the main part runs more than 11 pages. I don't recall where I got that number from.

I'll just blame Tim Bray, for lack of a real excuse. :)

fauigerzigerk · on June 22, 2009

What really surprises me in XML Schema is not so much all the half-baked stuff they put in, and not even, for instance, the crazy nesting of complex types. What surprises me is what XML Schema cannot do.

One thing it cannot express is probably the most frequently used structure in all the structured documents I have seen: that a particular set of quantified elements can occur in any order.

The reason they gave for not supporting this is that validators would have to be more than contextless state machines. That's insane. They have created a schema language that doesn't support the most important schema constraint of them all for performance reasons.
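To make the constraint concrete, here is a minimal Python sketch of an order-independent check: each child element has its own minimum and maximum count, and the order of the children is ignored. It uses the standard library's `xml.etree.ElementTree`. The `RULES` table, the element names, and the `check_unordered` function are hypothetical, made up for this example, and the sketch says nothing about how any actual schema validator works.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical rules: child tag -> (min occurrences, max occurrences).
# None as the maximum means "unbounded".
RULES = {
    "title": (1, 1),
    "author": (1, None),
    "note": (0, 2),
}

def check_unordered(parent, rules):
    """Return a list of violations for parent's children, ignoring their order."""
    counts = Counter(child.tag for child in parent)
    problems = []
    # Children that no rule mentions are rejected.
    for tag in counts:
        if tag not in rules:
            problems.append(f"unexpected element <{tag}>")
    # Each allowed child must fall within its occurrence bounds.
    for tag, (lo, hi) in rules.items():
        n = counts.get(tag, 0)
        if n < lo:
            problems.append(f"<{tag}> occurs {n} times, need at least {lo}")
        elif hi is not None and n > hi:
            problems.append(f"<{tag}> occurs {n} times, allowed at most {hi}")
    return problems

doc = ET.fromstring(
    "<book><author>A</author><title>T</title><author>B</author></book>"
)
print(check_unordered(doc, RULES))
```

Counting children needs one pass over them and one counter per element name, so at least in this simple form the check is cheap.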