XML Sucks (a.k.a. Why S-expressions Are Better) (hcsw.org)
25 points by dpapathanasiou on Sept 28, 2007 | 44 comments


I like s-expressions. But this guy makes a slightly odd argument that doesn't help his cause at all. He basically says XML is too complex, but s-expressions are really simple if you are an experienced lisp hacker. Otherwise, so he says, "this is going to get really hard really quickly".


Actually he basically says XML is a complex, unpredictable mess that is not actually extensible; while s-expressions are simple, stable, and extensible. (Though extending them is still a somewhat advanced topic.)


He is kind of right, since he is talking about extending the basic XML _syntax_, not about creating or extending XML vocabularies. Obviously his solution with reader macros only works in Lisp, and he seems to be proud of that, but it kind of defeats the purpose of a platform-independent exchange format.


XML is a pain to parse properly. Have you seen how large XML libraries are?

It's also bloated, in that it adds extra bytes. Imagine a 10 MB XML file... now figure out how much of that is space wasted on closing tags.
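A rough way to check that for a given file, sketched in Python. The function name is hypothetical, and the regex simply counts every `</...>` sequence, so it will miscount inside CDATA sections or comments; treat the result as an estimate, not an exact figure.

```python
import re

# Any end tag such as </name> or </ns:item >; CDATA and comments are not special-cased.
END_TAG = re.compile(rb"</[^>]*>")

def closing_tag_overhead(xml_bytes: bytes) -> float:
    """Fraction of the input's bytes that belong to end tags (a rough estimate)."""
    if not xml_bytes:
        return 0.0
    end_tag_bytes = sum(len(m.group(0)) for m in END_TAG.finditer(xml_bytes))
    return end_tag_bytes / len(xml_bytes)
```

For the tiny fragment `<name>bct</name>` the end tag is 7 of the 16 bytes, so the function returns 0.4375. On a real document you would pass the file's raw bytes, e.g. `closing_tag_overhead(open(path, "rb").read())` with `path` pointing at your file.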


XML syntax has a certain amount of redundancy. This is a feature which makes it easier to manually write or edit XML using a simple text editor, which is one of the use cases that XML was designed to support.


I raised this problem earlier on HN and never got an answer to this question:

Let's say I hate XML and I love S-expressions. I'm willing to replace all my XML with Sexp now and all I need is a clearly defined syntax with charsets, escapes and all, so that I can write generators and parsers. That spec would probably fit in 2 or 3 pages, but someone has to do it carefully. So where is it, the Sexp syntax for representing tree structures that everybody's talking about?
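To make the question concrete, here is a minimal sketch of the generator half in Python. Every rule in it is an assumption for illustration, not an existing standard: output is UTF-8, strings are double-quoted with only `\` and `"` escaped, integers are written as-is, and bare symbols (the hypothetical `Symbol` class) may not contain whitespace, parentheses, `"` or `;`.

```python
from dataclasses import dataclass
from typing import Union

# Characters that would make a bare symbol ambiguous when read back (assumed rule).
_FORBIDDEN_IN_SYMBOL = set(' \t\r\n()";')

@dataclass(frozen=True)
class Symbol:
    """A bare atom such as html or href, written without quotes."""
    name: str

    def __post_init__(self) -> None:
        if not self.name or any(c in _FORBIDDEN_IN_SYMBOL for c in self.name):
            raise ValueError(f"invalid symbol name: {self.name!r}")

SExp = Union[Symbol, str, int, list]

def escape_string(s: str) -> str:
    """Quote a string; only backslash and double quote are escaped."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def dumps(expr: SExp) -> str:
    """Serialize nested lists of Symbols, strings and ints to s-expression text."""
    if isinstance(expr, Symbol):
        return expr.name
    if isinstance(expr, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(expr, int):
        return str(expr)
    if isinstance(expr, str):
        return escape_string(expr)
    if isinstance(expr, list):
        return "(" + " ".join(dumps(item) for item in expr) + ")"
    raise TypeError(f"cannot serialize {type(expr).__name__}")

def dump_bytes(expr: SExp) -> bytes:
    """Encode the serialized text; UTF-8 is the assumed charset."""
    return dumps(expr).encode("utf-8")
```

Even this small sketch has to settle things a real spec would need to pin down, such as which characters a symbol may contain and whether `1` is a number or a symbol.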


i'm not sure what you're asking. s-exp's already represent trees:

(html (head (title "hi" ) ) (body (h1 "hello" ) ) )

then (defmacro html ... ) to transform it. so lisp already has the syntax, the parsers, and the generators
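Outside Lisp, the same transformation is an ordinary recursive function rather than a macro. A minimal Python sketch, assuming the tree has already been read into nested lists (tag name first, then children, plain strings as text) and ignoring attributes entirely; `render` is a hypothetical name.

```python
from html import escape

def render(node) -> str:
    """Render a nested list like ["html", ["head", ...]] to an HTML string."""
    if isinstance(node, str):
        return escape(node, quote=False)  # text node: escape &, < and >
    tag, *children = node  # element: first item is the tag name
    inner = "".join(render(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"
```

For the tree above, `render(["html", ["head", ["title", "hi"]], ["body", ["h1", "hello"]]])` gives `<html><head><title>hi</title></head><body><h1>hello</h1></body></html>`.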


I understand that's simple in Lisp, but I also need interoperability, and that implies different languages, among other things. Perhaps the reason nobody's adopting Sexp's is that you will never get an answer from lispers, other than "you can do it in Lisp". Great, but I need C++, Perl, then C#, Java, PHP, Python and all kinds of crappy languages to support that format.


lisp syntax is a lot simpler than XML. a basic lisp _interpreter_ would take maybe a night or three in any language, a mere parser much less. but why you don't see many such parsers is a different issue, not closely related to s-exp's themselves
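To put the "a mere parser much less" claim in concrete terms, here is a minimal reader sketch in Python. The rules are assumptions for this example, not a published spec: parentheses delimit lists, double-quoted strings use a backslash to escape the next character, integer tokens become numbers and every other token becomes a bare symbol. Comments, floats and Lisp's quote syntax are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    """A bare atom such as html or h1."""
    name: str

def _is_delimiter(ch: str) -> bool:
    return ch.isspace() or ch in '()"'

def parse(text: str):
    """Parse exactly one s-expression into lists, strings, ints and Symbols."""
    pos = 0

    def skip_ws() -> None:
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def read():
        nonlocal pos
        skip_ws()
        if pos >= len(text):
            raise SyntaxError("unexpected end of input")
        ch = text[pos]
        if ch == "(":  # list: read items until the matching ')'
            pos += 1
            items = []
            while True:
                skip_ws()
                if pos >= len(text):
                    raise SyntaxError("missing ')'")
                if text[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if ch == ")":
            raise SyntaxError(f"unexpected ')' at offset {pos}")
        if ch == '"':  # string: a backslash escapes the next character
            pos += 1
            chars = []
            while pos < len(text):
                c = text[pos]
                if c == '"':
                    pos += 1
                    return "".join(chars)
                if c == "\\":
                    pos += 1  # skip the backslash, keep the next char verbatim
                    if pos >= len(text):
                        break
                chars.append(text[pos])
                pos += 1
            raise SyntaxError("unterminated string")
        start = pos  # atom: runs until whitespace, a paren or a quote
        while pos < len(text) and not _is_delimiter(text[pos]):
            pos += 1
        token = text[start:pos]
        try:
            return int(token)
        except ValueError:
            return Symbol(token)

    result = read()
    skip_ws()
    if pos != len(text):
        raise SyntaxError(f"trailing input at offset {pos}")
    return result
```

Fed the `(html (head (title "hi" ) ) ...)` example from earlier in the thread, it returns nested Python lists with `Symbol` objects for the tag names and plain strings for `"hi"` and `"hello"`.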


So, why _don't_ you see many such parsers?


> Great, but I need C++, Perl, then C#, Java, PHP, Python and all kinds of crappy languages to support that format.

Lucky for you, Greenspun's Tenth Rule (http://en.wikipedia.org/wiki/Greenspun's_Tenth_Rule) comes into play. All of those languages actually have Lisp embedded inside them: "Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp."


But how would you express attributes and mixed content and named character entities and namespaces and encoding declarations?

S-expressions are beautiful for lisp code and for many kinds of structured data, but they get ugly when you try to use them on something as messy as a human-writable hypertext document format. XML syntax, on the other hand, is designed for this.



Something like that, yes, but attributes (a (@ (href /path))...) are ugly in particular.

I think it shouldn't necessarily be parsable by Lisp. Attributes, for example, could be as simple as

  (a (href=/path) ...)

Why '='? Because this character has historically not been used in attribute names, so with this syntax you can move from XML to Sexp smoothly, that is, automatically. Formally, the syntax I suggested is not Lisp, of course.
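A rough Python sketch of the automatic XML-to-Sexp conversion this syntax is meant to allow, built on the standard library's `xml.etree.ElementTree`. Quoting every attribute value as a string is an assumption (the proposal leaves it open, but values can contain spaces or parentheses), the function names are hypothetical, and namespaced names are not handled: ElementTree reports them as `{uri}local`.

```python
import xml.etree.ElementTree as ET

def quote_value(s: str) -> str:
    """Double-quote a string, backslash-escaping backslashes and quotes (assumed rule)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexp(elem: ET.Element) -> str:
    """Render an element as (tag (name="value" ...) children...)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f"{name}={quote_value(value)}" for name, value in elem.attrib.items())
        parts.append(f"({attrs})")
    if elem.text:  # text before the first child element
        parts.append(quote_value(elem.text))
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail:  # text following the child, still inside this element
            parts.append(quote_value(child.tail))
    return "(" + " ".join(parts) + ")"

def xml_to_sexp(xml_text: str) -> str:
    return to_sexp(ET.fromstring(xml_text))

print(xml_to_sexp('<a href="/path">home</a>'))  # prints: (a (href="/path") "home")
```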


> Something like that, yes, but attributes (a (@ (href /path))...) are ugly in particular.

So use :named attributes instead?

  (a :href "/path" ...)


Nice indeed, but how about attribute names that themselves contain colons, or maybe start with a colon? XML/HTML is less restrictive in this regard: all that matters basically is spaces and '=' inside a tag.


Replace the colon with a dash in your attribute names. Easy solution.


Cool! So bct's example

    <p>here is a <a href="http://example.com">link</a>.</p>

would be:

    (p "here is a" (a (@ (href "http://example.com")) "link") ".")
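The same translation can be produced mechanically. A Python sketch using the standard library's `xml.etree.ElementTree`, where XML text and tails become string children and attributes go into an `@` list as in the SXML example; namespaces, comments and processing instructions are ignored, and `to_sxml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def to_sxml(elem: ET.Element) -> list:
    """Convert an element to SXML-shaped lists: [tag, ["@", [name, value], ...]?, children...]."""
    node = [elem.tag]
    if elem.attrib:
        node.append(["@"] + [[name, value] for name, value in elem.attrib.items()])
    if elem.text:  # text before the first child element
        node.append(elem.text)
    for child in elem:
        node.append(to_sxml(child))
        if child.tail:  # text after the child, inside the parent
            node.append(child.tail)
    return node

fragment = '<p>here is a <a href="http://example.com">link</a>.</p>'
print(to_sxml(ET.fromstring(fragment)))
```

For bct's fragment it returns `["p", "here is a ", ["a", ["@", ["href", "http://example.com"]], "link"], "."]`; note that ElementTree keeps the trailing space in `"here is a "`, which the hand translation drops.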


content issues like namespaces wouldn't be unique to s-exp's; XML had to deal with them too. i'm not sure about the readability. i would say XML wins because angle brackets delimit things more clearly, especially since they're duplicated on closing. but i might have a different opinion if s-exp's were more standard than XML, so i can't really say. some of the SXML examples on that page look really nice

---------

don't judge straight translations. if s-exp's were the standard i suspect links would look more like:

(p "here is a" (a "link" @href "http://example.com" ) )


XML is supposed to be a human-readable platform- and language-independent format for data and document exchange. The article just shows that you can do all kinds of cool stuff with s-expressions if you happen to use CL.

Just as if a smalltalk enthusiast proposed exchange of serialized smalltalk-images as the universal solution for data interchange. Because XML is just too complex to parse.


Didn't read the article through to the end, but I suppose if you can use an XML parser to read a document, you might as well use an S-expression parser.


> Just as if a smalltalk enthusiast proposed exchange of serialized smalltalk-images as the universal solution for data interchange.

That's not a bad idea, and as I understand it, the Lisp operating systems back in the day passed actual objects back and forth between programs instead of strings.


I like JSON. Easy to read, easy to work with.


I prefer JSON too; it provides a more natural expression of associations, objects, etc. Sure, you can represent any XML/JSON in sexps, but then you have to come up with a convention, follow it, explain it, and check that it's followed correctly.

JSON is a precise DSL for declaring basic data structures: semantic batteries included! Why reinvent the wheel?


XML is for documents; s-expressions (or JSON or YAML or ...) are for data.


Or as expressed by Glenn Reid, quoted in the article:

"A markup language is predicated on the idea that the markup is an exception in a river of text."


Actually s-expressions are pretty good for documents too. If you start to use lisp you'll probably wish everything was in a sexpr.


So an entire HTML page is a document whereas a fragment of a page retrieved via XHR is data, right?


If you're just going to shove it straight in the page, then it's part of a document.

It's the difference between

    <p>here is a <a href="http://example.com">link</a>.</p>

and

    <name>bct</name>
    <dob>19xx-07-03</dob>
    <address>...</address>

One is markup, the other is something different.


I see what you mean but I'm not convinced that this difference merits the use of two different representations. And the difference gets blurred if you consider that XHTML makes it possible to include basically any kind of data in the document, not just elements meant to be document structure.


Oh, I don't think there's anything wrong with using XML for data - it's just that these "XML sucks" rants always seem to assume that's all it's used for.

I know that a JSON version of my XHTML fragment up there would suck pretty hard; would an s-exp one be any better? (honest question)


Why would the JSON version of that fragment suck pretty hard? I don't think there is any rational argument for XML, JSON or s-expressions solely based on whether something is data or a document.

I think to make that judgement, we need to consider things like when do we really need namespaces? How often is a particular expression written by hand, how often is it generated automatically? How important are different character encodings? Does it have to be bandwidth efficient? Is this a type of data that might actually morph into a program or is that something we want to exclude for security or complexity reasons? Are there existing tools that help us to process one or the other representation more easily?

All these things depend on many more aspects than just data or document.


> Why would the JSON version of that fragment suck pretty hard?

The fragment uses attributes and mixed content, something that would be quite ugly to express in JSON. I won't even try.


> Why would the JSON version of that fragment suck pretty hard?

    ["p", "this is a", ["a", { "text" : "link", "href": "http://example.com" } ], "." ]

I don't do much JSON so maybe that can be improved, but I wouldn't imagine by much.

Certainly there are lots of other factors to be considered, but that just goes to prove my point: "XML Sucks" rants rarely look at anything beyond very simple uses of XML.


Maybe like this:

    ["p", "this is a", ["a", "link", {"href": "http://example.com"}], "."]

Whether that is uglier is debatable; maybe a little bit. However, the point is really that suckiness is not just a matter of aesthetics. So I agree all the more with your second point.
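For comparison, the array-based layout above is close to the JsonML convention (tag name first, an optional attribute object second, then the children). A Python sketch that derives it from the XHTML fragment with `xml.etree.ElementTree` and `json.dumps`; `to_jsonml` is a hypothetical name and namespaces are not handled.

```python
import json
import xml.etree.ElementTree as ET

def to_jsonml(elem: ET.Element) -> list:
    """Convert an element to a JsonML-style array: [tag, {attrs}?, children...]."""
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))  # attribute object goes right after the tag
    if elem.text:
        node.append(elem.text)  # text before the first child element
    for child in elem:
        node.append(to_jsonml(child))
        if child.tail:
            node.append(child.tail)  # text that follows the child, inside this element
    return node

fragment = '<p>here is a <a href="http://example.com">link</a>.</p>'
print(json.dumps(to_jsonml(ET.fromstring(fragment))))
```

This prints `["p", "here is a ", ["a", {"href": "http://example.com"}, "link"], "."]`. Because children stay in an ordered array, nested elements and repeated text runs keep their position.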


Now bold something inside the link, and put a class attribute on the paragraph.


i actually thought about that; it's harder to read than plain ol' html

 {"p":{">":"this is a","a":{"@href":"http://example.com","b":{">":"link"}},">":".","@class":"my-p-class"}}
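One practical problem with this object-based encoding, beyond readability: the `">"` key appears twice in the same object. RFC 8259 only says object member names SHOULD be unique and leaves the handling of duplicates to the implementation; Python's `json` module, for one, silently keeps the last value, so the text "this is a" is lost. A short Python check (the hook name is hypothetical):

```python
import json

# The object-based encoding from the comment above, verbatim.
encoded = '{"p":{">":"this is a","a":{"@href":"http://example.com","b":{">":"link"}},">":".","@class":"my-p-class"}}'

# By default json.loads keeps the last value for a repeated key,
# so "this is a" is silently replaced by ".".
decoded = json.loads(encoded)
print(decoded["p"][">"])  # prints: .

def pairs_with_duplicates(pairs):
    """object_pairs_hook that rejects objects with repeated keys."""
    keys = [key for key, _ in pairs]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        raise ValueError(f"duplicate keys: {duplicates}")
    return dict(pairs)

try:
    json.loads(encoded, object_pairs_hook=pairs_with_duplicates)
except ValueError as err:
    print(err)  # prints: duplicate keys: ['>']
```

With `object_pairs_hook` the parser hands over every key/value pair in order, so the repeat can at least be detected, but the encoding itself still has no way to keep both text runs in their original positions.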


Now imagine the complete works of Shakespeare in that format...


you're right, but look at an average html page. Is that pretty? I guess it makes no sense to argue about purely syntactical issues. Comparing static HTML and JSON is not what it's about in the real world.


i think both JSONs above are a bit lispy :)...

 {"p":{">":"this is a","a":{"@":"http://example.com",">":"link"},">":"."}


missing closing "}"


I knew there was a code/data divide, but not that there also was one between documents and data.


A document needs to contain information (most of the time) about the layout of the document. So an HTML page is a document most of the time. The data is the TEXT, and most of the HTML is there simply to tell the browser how to render the data. I think the argument above was that XML is great for creating document formats, whereas JSON or S-expressions are great for sending data around.


Data isn't really a good word for it (because obviously a document is a kind of data), but I can't think of a better one.



