XML Sucks (a.k.a. Why S-expressions Are Better)

fauigerzigerk · on Sept 28, 2007

I like s-expressions. But this guy makes a slightly odd argument that doesn't help his cause at all. He basically says XML is too complex, but s-expressions are really simple if you are an experienced lisp hacker. Otherwise, so he says, "this is going to get really hard really quickly".

Goladus · on Sept 28, 2007

Actually he basically says XML is a complex, unpredictable mess that is not actually extensible; while s-expressions are simple, stable, and extensible. (Though extending them is still a somewhat advanced topic.)

olavk · on Sept 28, 2007

He is kind of right, since he is talking about extending the basic XML _syntax_, not about creating or extending XML vocabularies. Obviously his solution with reader macros only works in Lisp, and he seems to be proud of that, but it kind of defeats the purpose of a platform-independent exchange format.

_csoo · on Sept 28, 2007

XML is a pain to parse properly. Have you seen how large XML libraries are?

It also adds extra bytes. Imagine a 10 MB XML file... now figure out how much of that is space wasted on closing tags.
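One rough way to put a number on that, as a minimal Python sketch. The function name `closing_tag_fraction` and the sample document are made up for illustration, and the regex is only an estimate because it does not understand comments or CDATA sections.

```python
import re

# Rough pattern for closing tags such as </title>; it does not understand
# comments or CDATA sections, so treat the result as an estimate.
CLOSING_TAG = re.compile(r"</[^>]*>")


def closing_tag_fraction(xml_text):
    """Return the share of UTF-8 bytes spent on closing tags (0.0 to 1.0)."""
    total = len(xml_text.encode("utf-8"))
    if total == 0:
        return 0.0
    closing = sum(len(m.group(0).encode("utf-8"))
                  for m in CLOSING_TAG.finditer(xml_text))
    return closing / total


# Hypothetical sample document, only here to exercise the function.
sample = "<person><name>bct</name><dob>19xx-07-03</dob></person>"
print(f"{closing_tag_fraction(sample):.0%} of the bytes are closing tags")
```

The share depends heavily on how long the tag names are relative to the content, so a real 10 MB file would have to be measured rather than guessed.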

olavk · on Sept 29, 2007

XML syntax has a certain amount of redundancy. This is a feature which makes it easier to manually write or edit XML using a simple text editor, which is one of the use cases that XML was designed to support.

mojuba · on Sept 28, 2007

I raised this problem earlier on HN and never got the answer to this question:

Let's say I hate XML and I love S-expressions. I'm willing to replace all my XML with Sexp now and all I need is a clearly defined syntax with charsets, escapes and all, so that I can write generators and parsers. That spec would probably fit 2 or 3 pages, but someone has to do it carefully. So where is it, the Sexp syntax for representing tree structures everybody's talking about?

tokipin · on Sept 28, 2007

i'm not sure what you're asking. s-exp's already represent trees:

(html (head (title "hi" ) ) (body (h1 "hello" ) ) )

then (defmacro html ... ) to transform it. so lisp already has the syntax, the parsers, and the generators

mojuba · on Sept 28, 2007

I understand that's simple in Lisp, but I also need interoperability, and that implies different languages, among other things. Perhaps the reason nobody's adopting Sexp's is that you will never get an answer from lispers, other than "you can do it in Lisp". Great, but I need C++, Perl, then C#, Java, PHP, Python and all kinds of crappy languages to support that format.

tokipin · on Sept 28, 2007

lisp syntax is a lot simpler than XML. a basic lisp _interpreter_ would take maybe a night or three in any language, a mere parser much less. but why you don't see many such parsers is a different issue, not closely related to s-exp's themselves
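To give a feel for the size of "a mere parser", here is a minimal Python sketch. It assumes a tiny dialect (parentheses, double-quoted strings with backslash escapes, and bare atoms) that is not any agreed spec, and the names `Symbol`, `tokenize` and `parse` are made up for this example.

```python
class Symbol(str):
    """A bare atom such as html, kept distinct from a quoted string."""


def tokenize(text):
    """Yield "(" or ")" markers, or (kind, value) pairs for strings and atoms."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "()":
            yield ch
            i += 1
        elif ch == '"':
            i += 1
            buf = []
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # a backslash makes the next character literal
                buf.append(text[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated string")
            i += 1  # skip the closing quote
            yield ("str", "".join(buf))
        else:
            start = i
            while i < n and not text[i].isspace() and text[i] not in '()"':
                i += 1
            yield ("atom", Symbol(text[start:i]))


def parse(text):
    """Parse exactly one s-expression into nested Python lists."""
    stack = [[]]
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(tok[1])
    if len(stack) != 1:
        raise ValueError("missing ')'")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one expression")
    return stack[0][0]


tree = parse('(html (head (title "hi" ) ) (body (h1 "hello" ) ) )')
print(tree)
```

The parser itself is short, but everything it leaves out (character sets, the exact escape rules, comments, numbers) is exactly what a cross-language spec would have to pin down.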

inklesspen · on Sept 29, 2007

So, why _don't_ you see many such parsers?

_csoo · on Sept 28, 2007

> Great, but I need C++, Perl, then C#, Java, PHP, Python and all kinds of crappy languages to support that format.

Lucky for you, Greenspun's Tenth Rule (http://en.wikipedia.org/wiki/Greenspun's_Tenth_Rule) comes into play. All of those languages actually have Lisp embedded inside them: "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."

olavk · on Sept 28, 2007

But how would you express attributes and mixed content and named character entities and namespaces and encoding declarations?

S-expressions are beautiful for lisp code and for many kinds of structured data, but they get ugly when you try to use them for something as messy as a human-writable hypertext document format. XML syntax on the other hand is designed for this.

brlewis · on Sept 28, 2007

http://okmij.org/ftp/Scheme/SXML.html

mojuba · on Sept 28, 2007

Something like that, yes, but attributes (a (@ (href /path))...) are ugly in particular.

I think it shouldn't necessarily be parsable by Lisp. Attributes, for example, could be as simple as

  (a (href=/path)  ...)

Why '=' - because this character is not used in attribute names historically, so with this syntax you can move from XML to Sexp smoothly, automatically that is. Formally, the syntax I suggested is not Lisp, of course.

_csoo · on Sept 28, 2007

> Something like that, yes, but attributes (a (@ (href /path))...) are ugly in particular.

So use :named attributes instead?

  (a :href "/path" ...)

mojuba · on Sept 28, 2007

Nice indeed, but how about attribute names that themselves contain colons, or maybe start with a colon? XML/HTML is less restrictive in this regard: all that matters basically is spaces and '=' inside a tag.

_csoo · on Sept 28, 2007

Replace the colon with a dash in your attribute names. Easy solution.

olavk · on Sept 28, 2007

Cool! So bct's example

    <p>here is a <a href="http://example.com">link</a>.</p>

would be:

    (p "here is a " (a (@ (href "http://example.com")) "link") ".")
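A translation like this can be done mechanically. The following is a minimal Python sketch using the standard `xml.etree.ElementTree` module; the helper names `quote` and `to_sxml` are made up here, the output only imitates the SXML attribute convention `(@ (name "value"))`, and namespaces, comments and processing instructions are not handled.

```python
import xml.etree.ElementTree as ET


def quote(s):
    """Render a Python string as a double-quoted s-expression string."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sxml(elem):
    """Render an element as (tag (@ (attr "value") ...) children...)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f"({k} {quote(v)})" for k, v in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text:
        parts.append(quote(elem.text))  # text before the first child
    for child in elem:
        parts.append(to_sxml(child))
        if child.tail:
            parts.append(quote(child.tail))  # text following this child
    return "(" + " ".join(parts) + ")"


xml = '<p>here is a <a href="http://example.com">link</a>.</p>'
result = to_sxml(ET.fromstring(xml))
print(result)
```

Mixed content survives because each text run becomes its own string, including the space before the link, which a converter has to keep so the words do not run together.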

tokipin · on Sept 28, 2007

content issues like namespaces wouldn't be unique to s-exp's -- XML had to deal with them. i'm not sure about the readability. i would say XML wins because angle brackets delimit things more clearly, especially by the fact that they're duplicated on closing. but i might have a different opinion if s-exp's were more standard than XML, so i can't really say. some of the SXML examples on that page look really nice

---------

don't judge straight translations. if s-exp's were the standard i suspect links would look more like:

(p "here is a" (a "link" @href "http://example.com" ) )

olavk · on Sept 28, 2007

XML is supposed to be a human-readable platform- and language-independent format for data and document exchange. The article just shows that you can do all kinds of cool stuff with s-expressions if you happen to use CL.

Just as if a smalltalk enthusiast proposed exchange of serialized smalltalk-images as the universal solution for data interchange, because XML is just too complex to parse.

Tichy · on Sept 28, 2007

Didn't read the article through to the end, but I suppose, if you can use an XML parser to read a document, you might as well use an S-expression parser.

_csoo · on Sept 28, 2007

> Just as if a smalltalk enthusiast proposed exchange of serialized smalltalk-images as the universal solution for data interchange.

That's not a bad idea, and as I understand it, the Lisp operating systems back in the day passed actual objects back and forth between programs instead of strings.

axod · on Sept 28, 2007

I like JSON. Easy to read, easy to work with.

cmars232 · on Sept 30, 2007

I prefer JSON too. It provides a more natural expression of associations, objects, etc. Sure, you can represent any XML/JSON in sexps, but then you have to come up with a convention, follow it, explain it, and check that it's followed correctly.

JSON is a precise DSL for declaring basic data structures -- semantic batteries included! Why re-invent the wheel?

bct · on Sept 28, 2007

XML is for documents, s-expressions (or JSON or YAML or ...) are for data.

jimbokun · on Sept 28, 2007

Or as expressed by Glenn Reid, quoted in the article:

"A markup language is predicated on the idea that the markup is an exception in a river of text."

jsmcgd · on Sept 28, 2007

Actually s-expressions are pretty good for documents too. If you start to use lisp you'll probably wish everything was in a sexpr.

fauigerzigerk · on Sept 28, 2007

So an entire HTML page is a document whereas a fragment of a page retrieved via XHR is data, right?

bct · on Sept 28, 2007

If you're just going to shove it straight in the page, then it's part of a document.

It's the difference between

    <p>here is a <a href="http://example.com">link</a>.</p>

and

    <name>bct</name>
    <dob>19xx-07-03</dob>
    <address>...</address>

One is markup, the other is something different.

fauigerzigerk · on Sept 28, 2007

I see what you mean but I'm not convinced that this difference merits the use of two different representations. And the difference gets blurred if you consider that XHTML makes it possible to include basically any kind of data in the document, not just elements meant to be document structure.

bct · on Sept 28, 2007

Oh, I don't think there's anything wrong with using XML for data - it's just that these "XML sucks" rants always seem to assume that's all it's used for.

I know that a JSON version of my XHTML fragment up there would suck pretty hard; would a s-exp one be any better? (honest question)

fauigerzigerk · on Sept 28, 2007

Why would the JSON version of that fragment suck pretty hard? I don't think there is any rational argument for XML, JSON or s-expressions solely based on whether something is data or a document.

I think to make that judgement, we need to consider things like when do we really need namespaces? How often is a particular expression written by hand, how often is it generated automatically? How important are different character encodings? Does it have to be bandwidth efficient? Is this a type of data that might actually morph into a program or is that something we want to exclude for security or complexity reasons? Are there existing tools that help us to process one or the other representation more easily?

All these things depend on many more aspects than just data or document.

olavk · on Sept 28, 2007

> Why would the JSON version of that fragment suck pretty hard?

The fragment uses attributes and mixed content - something that would be quite ugly to express in JSON. I won't even try.

bct · on Sept 28, 2007

> Why would the JSON version of that fragment suck pretty hard?

    ["p", "this is a", ["a", { "text" : "link", "href": "http://example.com" } ], "." ]

I don't do much JSON so maybe that can be improved, but I wouldn't imagine by much.

Certainly there are lots of other factors to be considered, but that just goes to prove my point; that "XML Sucks" rarely looks at anything beyond very simple uses of XML.

fauigerzigerk · on Sept 28, 2007

Maybe like this:

    ["p", "this is a", ["a", "link", {"href": "http://example.com"}], "."]

Whether that is uglier is debatable, maybe a little bit. However, the point is really that suckyness is not just a matter of esthetics. All the more I agree with your second point.

bct · on Sept 28, 2007

Now bold something inside the link, and put a class attribute on the paragraph.
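For reference, the nested-array style can absorb both changes mechanically. This is a minimal Python sketch under one assumption that differs from the examples above: the attribute object is placed right after the tag name instead of last. The function name `to_array`, the sample markup and the class name `my-p-class` are illustrative only.

```python
import json
import xml.etree.ElementTree as ET


def to_array(elem):
    """Render an element as [tag, {attributes}?, child-or-text, ...]."""
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))  # assumed position: right after the tag
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_array(child))
        if child.tail:
            node.append(child.tail)  # text that follows the child element
    return node


xml = ('<p class="my-p-class">this is a '
       '<a href="http://example.com"><b>link</b></a>.</p>')
array_form = to_array(ET.fromstring(xml))
print(json.dumps(array_form))
```

Because order lives in the arrays, text runs and child elements can interleave freely. Whether the result is pleasant to write by hand is a separate question.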

twism · on Sept 28, 2007

i actually thought about that, it's harder to read than plain ol html

 {"p":{">":"this is a","a":{"@href":"http://example.com","b":{">":"link"}},">":".","@class":"my-p-class"}}
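One practical catch with this object-based shape is that the key ">" appears twice inside "p". The sketch below shows what Python's standard `json` module does with it: by default it keeps only the last value for a repeated name. Other parsers may behave differently, and the variable names are made up for this example.

```python
import json

text = ('{"p":{">":"this is a","a":{"@href":"http://example.com",'
        '"b":{">":"link"}},">":".","@class":"my-p-class"}}')

# Default behaviour: a repeated name keeps only its last value,
# so the text run "this is a" is lost.
plain = json.loads(text)
print(plain["p"][">"])

# Asking for the raw (name, value) pairs keeps every entry, in order.
pairs = json.loads(text, object_pairs_hook=list)
names_in_p = [name for name, _ in pairs[0][1]]
print(names_in_p)
```

So a format that relies on repeated keys for mixed content needs a parser that preserves pairs, which is one reason the array-based variants are easier to round-trip.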

leoc · on Sept 29, 2007

Now imagine the complete works of Shakespeare in that format...

fauigerzigerk · on Sept 29, 2007

you're right, but look at an average html page. Is that pretty? I guess it makes no sense to argue about purely syntactical issues. Comparing static HTML and JSON is not what it's about in the real world.

twism · on Sept 28, 2007

i think both JSONs above are a bit lispy :)...

 {"p":{">":"this is a","a":{"@":"http://example.com",">":"link"},">":"."}

twism · on Sept 28, 2007

missing closing "}"

mdemare · on Sept 28, 2007

I knew there was a code/data divide, but not that there also was one between documents and data.

cstejerean · on Sept 28, 2007

A document needs to contain information (most of the time) about the layout of the document. So an HTML page is a document most of the time. The data is the TEXT and most of the HTML is there simply to tell the browser how to render the data. I think the argument above was that XML is great for creating document formats whereas JSON or S-Expressions are great for sending data around.

bct · on Sept 28, 2007

Data isn't really a good word for it (because obviously a document is a kind of data), but I can't think of a better one.