
If you're looking for a happy medium between the readability of JSON and XML and the efficiency of ASN.1 and protobufs, take a look at canonical S-expressions[1].

There's an advanced representation, which looks like this: (message (header (sender "Billy Joe Bob") (sent "2015-03-26T12:02:00Z")) (body "Hey guys! Let's meet up for lunch!")). It's possible to encode any byte string using Base64 or hex. It's also possible to encode types with data: (message (header (sender "Billy Joe Bob") (sent "2015-03-26T12:02:00Z")) (body [text/html]"<p>Hey guys! Let's meet up for lunch!</p>"))

While there are multiple advanced encodings for the same data (e.g. foo or "foo" or |Zm9v| or #666f6f#), there is a _single_ canonical encoding for any datum: the messages above would be (7:message(6:header(6:sender13:Billy Joe Bob)(4:sent20:2015-03-26T12:02:00Z))(4:body34:Hey guys! Let's meet up for lunch!)) and (7:message(6:header(6:sender13:Billy Joe Bob)(4:sent20:2015-03-26T12:02:00Z))(4:body[9:text/html]41:<p>Hey guys! Let's meet up for lunch!</p>)).
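To make the length-prefix rule concrete, here is a minimal Python sketch of a canonical encoder. It is not taken from Rivest's draft: the names csexp, atom and Hinted are made up for illustration, nested lists stand in for S-expression lists, and strings are assumed to be UTF-8. Each prefix is the byte length of the atom that follows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hinted:
    """Hypothetical wrapper for an atom carrying a display hint, e.g. [text/html]."""
    hint: str
    data: str


def atom(value):
    """Encode one atom as <decimal byte length>:<raw bytes>."""
    if isinstance(value, str):
        raw = value.encode("utf-8")  # assumption: text atoms are UTF-8
    elif isinstance(value, (bytes, bytearray)):
        raw = bytes(value)
    else:
        raise TypeError(f"unsupported atom type: {type(value).__name__}")
    return str(len(raw)).encode("ascii") + b":" + raw


def csexp(node):
    """Encode nested lists of atoms as canonical S-expression bytes."""
    if isinstance(node, Hinted):
        return b"[" + atom(node.hint) + b"]" + atom(node.data)
    if isinstance(node, (list, tuple)):
        return b"(" + b"".join(csexp(child) for child in node) + b")"
    return atom(node)


message = [
    "message",
    ["header", ["sender", "Billy Joe Bob"], ["sent", "2015-03-26T12:02:00Z"]],
    ["body", Hinted("text/html", "<p>Hey guys! Let's meet up for lunch!</p>")],
]
print(csexp(message).decode("utf-8"))
```

Since the encoder never has to choose between quoting styles or whitespace, the same tree always comes out as the same bytes.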

A huge advantage of this canonical encoding is that it's amenable to cryptographic hashing and signing; a weakness of JSON is that one has to layer requirements atop JSON itself (e.g. alphabetising object properties) in order for two parties to be able to hash the same datum and get the same value.
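A small Python sketch of the hashing point, using only the standard library. The two dicts are equal as objects but serialise to different JSON bytes unless both sides agree on extra rules (sorted keys and fixed separators here, which is just one possible convention). The csexp helper is a hypothetical minimal encoder, and note that S-expression lists are ordered, so the element order is part of the datum itself.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# The same logical object, built with keys in a different order.
a = {"sender": "Billy Joe Bob", "sent": "2015-03-26T12:02:00Z"}
b = {"sent": "2015-03-26T12:02:00Z", "sender": "Billy Joe Bob"}

# Plain JSON: json.dumps follows insertion order, so the bytes differ.
naive_a = json.dumps(a).encode("utf-8")
naive_b = json.dumps(b).encode("utf-8")
print(a == b, digest(naive_a) == digest(naive_b))  # True False

# JSON plus a layered convention: sorted keys, no optional whitespace.
layered_a = json.dumps(a, sort_keys=True, separators=(",", ":")).encode("utf-8")
layered_b = json.dumps(b, sort_keys=True, separators=(",", ":")).encode("utf-8")
print(digest(layered_a) == digest(layered_b))  # True


def csexp(node):
    """Hypothetical minimal canonical encoder: lists and UTF-8 string atoms only."""
    if isinstance(node, list):
        return b"(" + b"".join(csexp(child) for child in node) + b")"
    raw = node.encode("utf-8")
    return str(len(raw)).encode("ascii") + b":" + raw


# One datum, one byte string, so the digest needs no extra agreement.
canon = csexp(["header", ["sender", "Billy Joe Bob"], ["sent", "2015-03-26T12:02:00Z"]])
print(digest(canon))
```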

Another advantage of canonical S-expressions is that it's straightforward to define a mapping between them and HTML: "<p class='foo'>This is a <em>nifty</em> paragraph.<br /></p>" could be represented as ((p (class foo)) "This is a " (em nifty) " paragraph." (br)); the last string has to be quoted to keep its leading space. There are other possible mappings between S-expressions and HTML, of course, but I like that one. Another might be (p (/ (class foo)) "This is a " (em nifty) " paragraph." (br)).
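As a rough illustration of the first mapping, here is a Python sketch that renders such a tree back to HTML. The function name to_html and the use of nested Python lists for the S-expression are my own assumptions; the head of each list is either a bare tag name or a (tag (attr value) ...) list, and an element with no children is written as a self-closing tag.

```python
from html import escape


def to_html(node):
    """Render ((tag (attr value)...) child...) or (tag child...) as HTML text."""
    if isinstance(node, str):
        return escape(node)  # text node
    head, *children = node
    if isinstance(head, str):
        tag, attrs = head, []  # bare tag, no attributes
    else:
        tag, *attrs = head  # (tag (name value) ...)
    attr_text = "".join(f' {name}="{escape(value)}"' for name, value in attrs)
    if not children:
        return f"<{tag}{attr_text} />"
    inner = "".join(to_html(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


tree = [["p", ["class", "foo"]], "This is a ", ["em", "nifty"], " paragraph.", ["br"]]
print(to_html(tree))
# <p class="foo">This is a <em>nifty</em> paragraph.<br /></p>
```

The output uses double quotes around attribute values rather than the single quotes in the example, which is equivalent HTML.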

[1] http://people.csail.mit.edu/rivest/Sexp.txt



> there is a _single_ canonical encoding for any datum: the messages above would be (7:message(6:header(6:sender13:Billy Joe Bob)(4:sent20:2015-03-26T12:02:00Z))(4:body34:Hey guys! Let's meet up for lunch!))

This reminds me a lot of bencode, with the advantage for bencode that it doesn't need any fiddling for non-printable characters: no more base64, no more hex.


The base64 & hex stuff is only used for the advanced, human-readable bits; on the wire it's just straight length-encoding and byte strings.

I'd say that bencode's advantages are a built-in standard for integer encoding (with canonical S-expressions one must decide between ASCII decimals or little/big-endian bit strings) and a clearer standard for a dictionary/map/hash: a canonical S-expression would probably use an alist-like structure such as (map (foo bar) (baz quux)), but one could also go with (map foo bar baz quux), (map (foo bar baz quux)) or some other encoding.
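For comparison, a minimal bencode encoder sketch in Python (the function name bencode is mine, and str values are assumed to be UTF-8). Integers get their own i...e form, and dictionaries are written as d...e with keys sorted as raw byte strings, so neither choice is left to the application.

```python
def bencode(value):
    """Encode ints, byte/text strings, lists and dicts in bencode."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value  # e.g. 42 -> i42e
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is UTF-8
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)  # length-prefixed, like csexp atoms
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys must be strings and are emitted in sorted byte order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(bencode(42))                              # b'i42e'
print(bencode({"foo": "bar", "baz": "quux"}))   # b'd3:baz4:quux3:foo3:bare'
```

The alist-style canonical S-expression for the same map would be (3:map(3:foo3:bar)(3:baz4:quux)), with the ordering of the pairs left to whatever convention the parties agree on.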



