
But parsers are easy to write and unfriendly syntax will completely kill a nascent language's adoption. If you're aiming for users that like s-exprs, then by all means use them. Otherwise, try to stick to a syntax roughly similar to one familiar to the users you're targeting.


As a user of programming languages, I don't care about s-expr vs. infix syntax, but I do care about the amount of semantic innovation that went into a language (as in, I refuse to learn any language whose main contribution is syntactic). Sadly, it seems that wasting time on syntax distracts many PL creators from the semantic poverty of their creations.


This was my primary frustration with the article. It notes that the "bulk" of the work in building a programming language is in defining the semantics, but devotes no space at all to the process, tools, and traditions involved.

A programming language is its semantics, and without a strong notion of those semantics and their interactions you end up with confusion and difficulty. For an example, one need look no further than Ruby's "implementation as specification" craziness.

I won't go as far as to say that all language creators should adopt operational semantics when building a language, but if you're serious about it then you should at least investigate the process.


I deliberately tried to avoid topics that are well-covered elsewhere.


Appropriate, but can you point us at some of those elsewheres?


If you want to figure out how to write a lexer or parser, what better way than to read the code for one?

https://github.com/D-Programming-Language/dmd/blob/master/sr...

https://github.com/D-Programming-Language/dmd/blob/master/sr...

There's a lot of detail there, but the operation of both is straightforward.


I'd thought this subthread was discussing semantics.


For anyone who comes across this thread and is genuinely interested in educating themselves about how programming languages can be constructed in a principled way (what I would call sane): consult Pierce's Types and Programming Languages.

With some work anyone can learn the material. It's not as simple as reading a blog post unfortunately, but you'll have a much deeper understanding of what it means to build a programming language.


I'm in the midst of reading Types and Programming Languages right now.

I'm also reading the more recent Practical Foundations for Programming Languages by Robert Harper. It's written for a similar audience.

I would recommend both.


If someone wants to build a language and "a programming language is the semantics", doesn't it really make sense to tell them what the semantics should look like?


Sorry, I thought I was being clear when I mentioned operational semantics, but in retrospect that may have been totally unclear. I recommend Pierce's Types and Programming Languages. It has everything you need to learn about operational semantics and type systems.


I dunno, I definitely find

  d = {}
  d[key] = value
easier to read/write than

  (set! d (make-hash-table))
  (hash-table-put! d key value)


On the other hand, I find

  (define d (make-hash-table))
  (hash-table-put! d key1 value1 key2 value2 key3 value3)

easier to read than

  d = {}
  d[key1] = value1
  d[key2] = value2
  d[key3] = value3

So syntax matters, but it's not obvious which syntax is best.


Yes, syntax matters.


Although I largely agree, I think that line of thinking can be taken too far.

One reason I think that is because syntax helps develop conventions and make them obvious. If everything is an s-expr and you can do absolutely anything with macros anywhere, then two programmers might not really be able to communicate even if they are technically using the same language.


When several programmers work on a shared codebase, there is no programming language in which coding conventions are unnecessary, so this is hardly an argument against Lisps. In fact, the Clojure code I have read so far seems to adhere far more closely to conventions than code in almost any other programming language I have worked with.

Macros are pretty much like classes, gotos/loops, exceptions, functions, etc. in this respect: the communities have mostly agreed on conventions for _when_ and _how_ to use them.

I think the argument that macros will make a codebase unreadable to other programmers is largely an exaggeration used to turn a pro-Lisp argument against Lisp. It's a pattern I have heard a little too often:

    "C++ is powerful" -> "C++ is too powerful"

    "Macros let you express ..." -> "Macros can do just too much"

Now, in my non-Lisp programming, I miss macros a lot, especially those that are built into a lot of Lisps and are quite "simple" additions: the Clojure threading macro, for example, or `if-let` variable binding, or the awesome XML-templating languages that are realizable with macros. On the other hand, I have seen code that, for me as an outsider, came quite close to what you describe, except that it was hardly Lisp but PHP, JavaScript, etc. Macros, like functions and classes, are part of the vocabulary programmers build.

So next time you criticize macros, talk about something that is worth debating, like the fact that they can be unsafe (hygiene), that they are not as intuitively writable as functions, that they might make debugging harder, etc.


"So next time when you criticize macros"

I wasn't criticizing macros; I claimed that they don't give you much guidance to any particular convention because you can do anything.

A language without conventions is incomplete, so you need to do something active to bring the conventions about. In other words, the designers of Lisp leave conventions as an exercise for the reader and offer no real guidance from the language itself.

Clojure and Racket are trying to address that, which is good. I've played around with them a bit, generally found them pleasant, and I hope they succeed (though I did not find anything that would compel me to actually use them for the kinds of things I work on).

It's interesting that you bring up C++. I wonder what your feelings are about people who prefer C over C++?


Regarding C and C++, I am slightly leaning towards C. While C++ offers some abstractions over C that are really nice to have, it often feels like an awkward chimera. I see C as an even-higher-level assembly language. C++ was a logical step, but it is kind of obsolete for non-realtime systems.

Guidance in a language is a hard thing to do right. I think the only reasonable form of guidance is to make good coding practice simple; limiting guidance is often quite bad [1], because while in a lot of cases it makes things easier, there are those cases where you then have to work around the limitations, and this is when you get those "bug-ridden, incomplete implementations" of the "liberal" language.

Which forms of guidance from which programming languages are you fond of?

[1] Now that I think about it, I do like immutability in functional programming languages: a not-so-liberal thing.


I'm not sure there is a point worth debating. Hygiene is left to the programmer, and the facilities exist to write "safe" macros (gensym). Debugging isn't terribly difficult with macro-expansion facilities. The only "problem" with them, if you can call it that, is that newbies like to overuse them once they learn them. However, that problem goes away with time.

Macros in the hands of an intermediate-to-experienced programmer are powerful tools. They remove patterns from your code. They allow you to modify the compiler to run your program efficiently. And all sorts of other good things.

People arguing that X "is too powerful" are saying that nobody needs a combine harvester when a team of farmers with scythes will do.


"People arguing that X "is too powerful""

Who is arguing that?


One example in a recent HN post: https://news.ycombinator.com/item?id=7085682


I fully agree. Syntax is the least of my concerns when I'm using a programming language. I use many languages in many IDEs and editors, and when I'm coding, I don't really see syntax, I see the algorithms and patterns behind it.


It's a misconception that parsers are easy to write. Plus, they do an unfathomable amount of damage.

Parsers are far from harmless pieces of logic you can just throw together. And not just because they take time to write: parsers are physically dangerous. Compiler writers develop incipient carpal tunnel syndrome trying to write parsers.

When you finish writing a parser as a language designer, the problem of writing a parser for the language doesn't go away. The language's users might want to apply transformations to source code written in it, which means they need to write brand-new parsers from scratch to do those transformations.

What s-exprs give you instead is the option to write a "parser" in one function call: using the read function. It doesn't get shorter than that.

Now, that's a parser that's easy to write.
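For illustration, here is roughly what that one-call "parser" amounts to, sketched as a toy Python function (in a Lisp you would just call the built-in read; this sketch assumes whitespace-separated atoms and no string, quote, or comment syntax):

```python
# Toy s-expression reader: pad parens with spaces, split on whitespace,
# then recursively collect nested lists. Atoms stay as strings.
def read_sexpr(text):
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        token = tokens[pos]
        if token == "(":
            items = []
            pos += 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1  # skip the closing ")"
        return token, pos + 1

    result, _ = parse(0)
    return result
```

That really is the whole parser for the surface syntax; everything else is semantics.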

edit: rewording


It's a misconception that parsers are hard to write, and believing this does an "unfathomable amount of damage" (whatever that means).

If you understand formal languages, writing a recursive descent parser is a simple paper-and-pencil exercise.

If you don't understand formal languages, you should not be designing a language until you stop and learn them, and you should STFU about what people designing languages should do until then.


Agreed. I wrote a recursive descent parser generator in python and it took about an hour. Takes a BNF grammar and spits out a parser. End of story. Recursive descent parsers are dead simple.
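Not the generator itself, but the kind of hand-written recursive descent parser such a generator might emit can be sketched in a few lines. The grammar here is a made-up toy, not anything from the parent's project; each nonterminal becomes one function returning an AST node:

```python
# Hand-written recursive descent parser for a toy grammar:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
import re

def tokenize(src):
    return re.findall(r"\d+|[-+*/()]", src)

def parse_expr(tokens, pos=0):
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)          # left-associative tree
    return node, pos

def parse_term(tokens, pos):
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_factor(tokens, pos):
    if tokens[pos] == "(":
        node, pos = parse_expr(tokens, pos + 1)
        return node, pos + 1              # skip the ")"
    return int(tokens[pos]), pos + 1
```

Precedence falls out of the call structure: `term` binds tighter than `expr` simply because it is called deeper.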


Yes, the complexity of the parser is related to the size of the language: a smaller language needs a smaller, easier-to-write parser.

Have you ever written a recursive descent parser for C?

I realize now what was inaccurate about what I wrote: the things you have to do after you parse might be the more harmful parts, i.e. processing the parse tree you get back.

edit: rewording


I would point out that a recursive descent parser does not have to return an AST or any particular tree, and often it doesn't.

Instead, when you create a recursive descent parser, you create a series of functions, each called whenever a syntax element is discovered. In these functions, you construct whatever your final data structures are going to be.

Of course, you can still create and return a full abstract syntax tree, but one nice thing about recursive descent is that if you are only going to do a few things, you can just put those few operations in your parser and be done with it.
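A sketch of that style in Python: the same functions that recognize the grammar compute the result directly, so no tree is ever built (the grammar here is a toy arithmetic language chosen purely for illustration):

```python
# Recursive descent with inline semantic actions: each rule returns a
# computed value instead of an AST node.
import re

def evaluate(src):
    tokens = re.findall(r"\d+|[-+*/()]", src)
    pos = 0

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]; pos += 1
            value = value + term() if op == "+" else value - term()
        return value

    def term():
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in "*/":
            op = tokens[pos]; pos += 1
            value = value * factor() if op == "*" else value / factor()
        return value

    def factor():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            value = expr()
            pos += 1          # skip the ")"
            return value
        value = int(tokens[pos]); pos += 1
        return value

    return expr()
```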


I've gotten really far with a precedence parser before, and as a bonus they are intrinsically incremental. However, beware that braces may have to be matched separately.


> It's a misconception parsers are easy to write.

Really, they are easy. They are literally insignificant when you factor in all the hours you'll work on a language.

I wrote another one recently for a small side project. It took more time to write the unit tests for it. The parser practically wrote itself.


But, to be fair, you've obviously had lots of experience, and understand a great deal more about formal languages and how they translate into something that is trivially parsed than the majority of people who will read this article. Should they attempt a language, they'll likely fail to produce a working parser long before they spend the rest of the hours on it.


If you cannot trivially implement a parser, you probably shouldn't be implementing your own language. The problems only get worse from there.

I'm egalitarian inasmuch as I believe every serious programmer ought to implement some sort of toy language at some point, but I'm not so stupid as to think that this is a good idea at all phases of a programmer's development. Beginners should concentrate on other basic tasks, even low-intermediate really should too. I wouldn't reserve this task for "experts" though, because this is one of the big steps in moving from intermediate to expert. (Anyone who has assembled the skill set to implement a toy-but-nontrivial language has assembled the skill set to accomplish a very wide variety of programming tasks. If I were interviewing someone and they could demonstrate this, I would almost entirely cease to care what actual languages or frameworks they may have worked in.)


I agree in principle, but that won't stop someone from fighting with shift-reduce conflicts or dangling elses for hours on end before giving up. Constructing an unambiguous CFG isn't trivial without experience, and it's tough to get that experience without bashing your head a few times.


> someone from fighting with shift reduce conflicts or dangling elses for hours on end before giving up.

Right, but if you just hand-write a recursive descent parser, you won't have to deal with shift-reduce conflicts. Dangling elses are trivial to solve. The nice thing about hand-writing a parser is that it lets you learn one new thing (how to implement a parser for a grammar) while taking advantage of what you already know (how to write, run, test, and debug code in some existing language).

Throwing a parser generator at someone means they end up learning the weird vagaries of that generator instead of focusing on their own language. Meanwhile, the resulting generated code is nigh-unreadable, so all of those debugging skills and nice IDE they have go to waste.
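A toy sketch of that point: in a hand-written recursive descent parser, the `else` is simply consumed by the innermost active `if` parse, so the dangling else binds to the nearest `if` with no ambiguity. The statement syntax here is invented for brevity, with one-token conditions:

```python
# Dangling else in recursive descent: whichever recursive call sees the
# `else` first claims it, i.e. the innermost (nearest) `if`.
def parse_stmt(tokens, pos):
    if tokens[pos] == "if":
        cond = tokens[pos + 1]              # toy: a condition is one token
        then, pos = parse_stmt(tokens, pos + 2)
        if pos < len(tokens) and tokens[pos] == "else":
            other, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then, other), pos
        return ("if", cond, then, None), pos
    return ("stmt", tokens[pos]), pos + 1
```

Parsing `if a if b s1 else s2` attaches the `else` to the inner `if b`, which is exactly the resolution most languages specify.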


Are you including the time taken to make the parser produce helpful error messages? That always seems like the hard bit to me ...


The harder error messages to do well are the ones that come out of the semantic processing.

The ones coming out of the parser aren't that hard to do.


What is your approach for generating error messages during parsing?


Ad-hoc, with the proviso that error recovery should always consume tokens, so the parser can never get stuck in a loop.
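A minimal sketch of that discipline (the sync-token set here is hypothetical; the key property is that the error path always advances the position, so the parser can never loop forever):

```python
# Panic-mode recovery sketch: report the bad token, then skip forward to a
# synchronizing token, always consuming at least one token.
SYNC = {";", "}"}   # hypothetical statement-boundary tokens

def recover(tokens, pos, errors):
    errors.append(f"unexpected token {tokens[pos]!r} at position {pos}")
    pos += 1                                   # guarantee forward progress
    while pos < len(tokens) and tokens[pos] not in SYNC:
        pos += 1
    if pos < len(tokens):
        pos += 1                               # consume the sync token too
    return pos
```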


Good point -- also, dealing with corner cases, writing tests, and maintaining the parser as the language evolves are a few more things that might take a lot more work.

Do you know of any strategies for error reporting, or of tools that implement them? I'm always on the lookout for cleaner approaches to this.


> dealing with corner cases, writing tests, and maintaining the parser as the language evolves are a couple more things that might take a lot more work.

Not really.

Seriously, if you're going to get bogged down trying to get the lexer/parser to work, you're not ready to work on a full blown language/compiler. Lexing/parsing is the EASY part, as in a minute, insignificant part of the time you'll invest in the project. I really do mean that.


Wait a second. Please don't misrepresent what I'm saying. I never said anything about "get[ting] bogged down trying to get the lexer/parser to work". I never mentioned a full-blown language/compiler.

I'm genuinely interested by your comment. If I'm interpreting it correctly, you seem to be claiming that ensuring corner cases are handled correctly, testing, and maintaining a parser require a trivially small amount of work. What do you use to build your parsers?


> What do you use to build your parsers?

I use a text editor.

Using a coverage analyzer is adequate for evaluating the thoroughness of the tests.

Yes, it all is a trivially small amount of work compared to the rest of a language project (and even compared with the rest of the compiler). You'll spend much more time just trying to figure out how to convert floating point values to strings.
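To illustrate the float-to-string point with Python, whose repr is documented to produce the shortest string that round-trips: a fixed precision either silently loses information or prints noise, and producing the shortest correct digits is the genuinely hard part:

```python
# Fixed precision vs. shortest round-trip formatting of a float.
x = 2.0 / 3.0
fixed = "%.6f" % x           # six digits: information is lost
shortest = repr(x)           # shortest digit string that round-trips
assert float(fixed) != x     # fixed precision does not read back exactly
assert float(shortest) == x  # repr's output reads back to the same bits
```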


> You'll spend much more time just trying to figure out how to convert floating point values to strings.

I should hope not, since the source code to printf() should have pretty much the complete answer to that!


It may not have a license that is compatible with your needs.

And yes, I did have to do my own float => string implementation.


Just to clarify: I'm asking what strategy/approach you use -- hand-written vs. generator, bottom-up vs. top-down, backtracking, ambiguous, context-sensitive, semantic actions, etc. Sorry for the confusion.


Walter - I'd really be interested in an article/blog about writing a lexer/parser, from your experience and perspective. Just a suggestion ;)


I might actually do that. I see that people are persistent in believing it's hard, maybe after making these claims I have an obligation to back them up.


I think there are a few things worth mentioning here.

One is that there is a big difference between writing a parser for a language you are inventing and writing a parser that is attempting to implement an existing language. Languages have lots of corner cases; if you are inventing the language then every quirk of your parser is correct by definition. You might not even be aware of some of the subtle choices that your hand-written parser is making.

As an example of this, it was not discovered that ALGOL 60 had a "dangling else" ambiguity in its grammar until after it had been implemented, used extensively, and even published in a technical report. It was essentially an accident of the implementation that it resolved the ambiguity the way it did. So while it might not be too much work to "get the lexer/parser to work", it doesn't follow that all of the issues around parsing are trivial. There is still a lot of complexity and subtlety around parsing if you're trying to design something that could reasonably have multiple interoperating implementations.

Secondly, there is a seriously wide variation in lexical/syntactic complexity between languages. You can pretty easily write a 100% correct JSON parser in an hour or less (possibly much less, depending on what language you choose to write it in). On the other hand, it takes man-years to write a 100% correct C++ parser (not least because C++ tightly couples parsing and semantic analysis). Now, I know this article is talking more about designing your own language, and no language will start out as syntactically complicated as C++, but empirically most of the languages we actually use have a fair bit of complexity to them, so delivering the lesson that lexers/parsers are easy in general is, I think, the wrong message to be sending.
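The easy end of that spectrum can indeed be sketched quickly; here is a plain recursive descent reader for a small JSON subset (objects, arrays, escape-free strings, integers), written for illustration rather than spec compliance:

```python
# Parser for a small JSON subset. Tokenizes with a regex, then descends:
# every value form is handled by one branch of `value`.
import re

def parse_json(text):
    tokens = re.findall(r'[{}\[\],:]|"[^"]*"|-?\d+', text)

    def value(pos):
        tok = tokens[pos]
        if tok == "{":
            obj, pos = {}, pos + 1
            while tokens[pos] != "}":
                key = tokens[pos][1:-1]       # strip the quotes
                val, pos = value(pos + 2)     # skip key and ":"
                obj[key] = val
                if tokens[pos] == ",":
                    pos += 1
            return obj, pos + 1
        if tok == "[":
            arr, pos = [], pos + 1
            while tokens[pos] != "]":
                item, pos = value(pos)
                arr.append(item)
                if tokens[pos] == ",":
                    pos += 1
            return arr, pos + 1
        if tok.startswith('"'):
            return tok[1:-1], pos + 1
        return int(tok), pos + 1

    return value(0)[0]
```

Full JSON adds escapes, floats, literals, and error handling, but the shape of the parser does not change.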

Thirdly, there are a lot of practical considerations that can make parsing more complex. For example, take Steve Yegge's attempt to do some incremental parsing (from http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascr...):

  I had two options: incremental parsing, or asynchronous
  parsing. Clearly, since I'm a badass programmer who can't
  recognize my own incompetence, I chose to do incremental
  parsing. I mentioned this plan a few months ago to Brendan
  Eich, who said: "Let me know how the incremental parsing
  goes." Brendan is an amazingly polite guy, so at the time I
  didn't realize this was a code-phrase for: "Let me know when
  you give up on it, loser."

  The basic idea behind incremental parsing (at least, my
  version of it) was that I already have these little
  functions that know how to parse functions, statements,
  try-statements, for-statements, expressions,
  plus-expressions, and so on down the line. That's how a
  recursive-descent parser works. So I figured I'd use
  heuristics to back up to some coarse level of granularity —
  say, the current enclosing function – and parse exactly one
  function. Then I'd splice the generated syntax tree fragment
  into my main AST, and go through all the function's siblings
  and update their start-positions.

  Seems easy enough, right? Especially since I wasn't doing
  full-blown incremental parsing: I was just doing it at the
  function level. Well, it's not easy. It's "nontrivial", a
  word they use in academia whenever they're talking about the
  Halting Problem or problems of equivalent decidability.
  Actually it's quite doable, but it's a huge amount of work
  that I finally gave up on after a couple of weeks of effort.
  There are just too many edge-cases to worry about. And I had
  this nagging fear that even if I got it working, it would
  totally break down if you had a 5,000 line function, so I
  was kinda wasting my time anyway.

All of this is to say: I can't argue with your basic point that "getting lexing/parsing to work" for a language you are inventing isn't terribly difficult. But I disagree with your larger (somewhat implied) point that parsers as a whole are easy.


A couple of points: (1) the dangling-else problem is trivial to deal with; (2) I've written a C++ parser, I know the issues involved, and I know that a parser generator isn't going to fix them.

> delivering the lesson that lexers/parsers are easy in general is, I think, the wrong message to be sending.

I stand by the message :-) in the sense that if a person finds lexing/parsing to be hard, they're likely to find the semantic/optimization/codegen parts of the compiler to be insurmountable.

I've written compilers for numerous languages, including C++, including 2 languages I invented, and lexing & parsing is just not that hard relative to the rest of a compiler.


Sure, but you argue that problems of the sort you mention come from ambiguous grammars.

Hand-created parsers may not flag ambiguous grammars, while automatically generated ones might (I've only done hand-created parsers, so I don't know).

And the Steve Yegge quote just shows how much formal languages are something you need to learn rather than something you can power your way through. Plenty of good programmers can power their way through almost any other kind of programming challenge, so someone who seems very smart doing a very dumb thing in parsing doesn't surprise me (I've tried that myself).


Top-down operator precedence parsing (http://javascript.crockford.com/tdop/tdop.html) is pretty easy to implement, is inherently efficient, and is trivial to extend.

Even that might be overkill, though: a generic operator parser for unary, binary, ternary, n-ary, and so on will take about 50 lines of code. You can encode a surprisingly large number of control structures with a cleverly crafted precedence parser.
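A sketch of such a table-driven parser using precedence climbing, in Python (atoms are single tokens, and the operator table is invented for illustration; extending the language is just adding a row to the table):

```python
# Precedence-climbing expression parser driven by an operator table.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}

def parse(tokens):
    pos = 0

    def expr(min_prec):
        nonlocal pos
        node = tokens[pos]; pos += 1          # toy: an atom is one token
        while (pos < len(tokens) and tokens[pos] in PREC
               and PREC[tokens[pos]] >= min_prec):
            op = tokens[pos]; pos += 1
            # Right-associative ops recurse at the same level; left ones
            # recurse one level tighter, so equal-precedence ops stop.
            next_min = PREC[op] + (0 if op in RIGHT_ASSOC else 1)
            node = (op, node, expr(next_min))
        return node

    return expr(1)
```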


  The programming language users might want to apply transformations
  to source code written in the language -- which now means they need to
  write brand new parsers from scratch to do these transformations.
That would be Doing It Wrong. Tools like clang-format (source formatting) and clang-modernize (source transformation to use new language features) use exactly the same parser library — among other things — as the compiler proper.


May I add this article by Laurence Tratt about parsing:

http://tratt.net/laurie/blog/entries/parsing_the_solved_prob... (LtU discussion: http://lambda-the-ultimate.org/node/4489)


No-one sane writes parsers any more. People use parser generators.


GCC uses a hand-written parser. So does the C# compiler for .NET. V8 and, as far as I know, the other browsers' JS implementations have hand-written parsers. Dart (both the VM and the self-hosted dart2js compiler) has hand-written parsers.

Offhand, I'm not aware of any real-world language with lots of users that has a generated parser.


If you start with s-exprs and then gradually (and thoughtfully) add syntactic sugar, there shouldn't be too much harm.


Complexity aside, concrete syntax places a hard barrier on the composability of constructs (I say that based on usage, not as a PL researcher). This is quite far from the applicative/functional world (Lisp, stack-based languages, ML, Haskell and its predecessors...), where the possibilities for combination are greater (since most things are expressions).


I don't care about S-expr vs. infix syntax.

I use Haskell and Clojure happily.

Sounds like petty nonsense to me.



