Please stop recommending the Dragon Book already. It is not just heavy, it is mo...

barrkel · on Dec 14, 2015

It's fairly heavy in lexing and parsing theory, especially around finite automata, pushdown automata, etc. It's the kind of book you might want to read if you're reimplementing yacc.

Modern code generation has moved on a bit, so I wouldn't dig too deeply into the latter third or so of the book.

All in all, for a hobby compiler, it would be a poor choice; heavy on unnecessary theory in the front end and outdated on on back end stuff.

Let's Build a Compiler by Jack Crenshaw is one of my favourite guides for the hobbyist, although it's a bit dated now since it used Turbo Pascal and targeted 16-bit x86.

sklogic · on Dec 14, 2015

It does not make any sense to reimplement yacc in the 21st century. There are far more powerful and yet simple parsing techniques, rendering all that automata stuff useless and outdated. Take a look at PEG, Pratt parsing and GLR.

wtetzner · on Dec 14, 2015

Or even better, GLL.

http://dotat.at/tmp/gll.pdf

nickpsecurity · on Dec 15, 2015

Best features of both types of parsers? Hell yeah! Never even heard of this one despite all the research updates I did a few years ago. Thanks for the link.

sklogic · on Dec 15, 2015

It's an interesting one, I'll add it to my collection.

Although, to be fair, PEG (Packrat) do support left recursion [1], and it can also be combined with Pratt for the binary expressions very efficiently, at O(n).

[1] http://www.vpri.org/pdf/tr2007002_packrat.pdf

wtetzner · on Dec 15, 2015

> Although, to be fair, PEG (Packrat) do support left recursion [1]

Unfortunately it sometimes generates the wrong parse: http://tratt.net/laurie/research/pubs/papers/tratt__direct_l...

sklogic · on Dec 15, 2015

And all the edge cases are related to the binary expressions - and you should be using Pratt for them instead.

Left recursive PEG must be reserved for things like method, structure field and array access, which are ok with that algorithm.

wtetzner · on Dec 15, 2015

Ah, gotcha. I struggled with PEG and left recursion for a while before finding GLL. Do you have any resources on how to combine PEG and Pratt?

sklogic · on Dec 15, 2015

There must have been some papers on it, but I lost a track. Will try to dig out something.

Meanwhile, you can take a look a my implementation of such a mix: https://github.com/combinatorylogic/mbase/tree/master/src/l/...

An example of a grammar built on top of it: https://github.com/combinatorylogic/clike/blob/master/clike/... - see the `binary` nodes there, they're translated into Pratt while all the others are Packrat.

sparkie · on Dec 14, 2015

Pfft. Who wants to parse in 2015? We all know that constructing your language as a JSON schema is the way forward. Your JSON parser is the only parser you'll ever need.

ORioN63 · on Dec 14, 2015

Or a lisp.

nickpsecurity · on Dec 14, 2015

There's quite a few decent alternatives. I often suggest Wirth's Compiler Construction and Oberon sources because they're straightforward lessons plus give experience with Wirth style of simple, safe, efficient languages. Then, they can improve the Oberon System or compilers for personal projects and improvement.

That said, I typically recommend compilers get written in an ML or LISP given it's so much easier to do in those languages. Ocaml and Racket are my main recommendations for modern work. One can also use techniques for deriving imperative code from functional programs if one wants to re-implement that compiler in specific imperative language on a machine. The constructs are even straight-forward enough for macro assembly for those that want to understand it down to bare metal.

sklogic · on Dec 14, 2015

Yes, functional side of things is interesting and rarely covered in the standard compilers courses.

There is an old but good book, often overlooked: "Functional Programming" by Anthony J. Field and Peter G. Harrison (1988). Despite the title, it's more about various compilation techniques.

Also an old but still relevant (it is missing the STG, but otherwise full of interesting stuff): http://research.microsoft.com/en-us/um/people/simonpj/papers...

It also worth following this blog: https://wingolog.org/

nickpsecurity · on Dec 14, 2015

I got the 2nd link here before, maybe from you. Might be old but very thorough treatment of about every aspect of functional compilation. I have it saved in djvu and pdf in case I neet someone starting in that area. :)

The other two are new to me. Thanks for those.

urs · on Dec 14, 2015

I definitely agree with ML and OCaml, but why do you recommend Lisp for compiler work? I love working withs Lisps, but how do you deal with dynamic typing bugs in compiler work? I prefer OCaml/Haskell/ML for its really strong static type checking for compiler work. Just curious though...

nickpsecurity · on Dec 14, 2015

LISP js easy to parse, transform, build clean-slate, verify, and do fast. My first 4GL tool was a BASIC w/ macros compatible with C and C++. When converted to LISP, it was vastly easier to do transforms and macros plus productivity boost with incremental, per-function compilation. Turns out, Julia did something similar by converting the source into AST's that are LISP expressions with compiler being femtolisp.

Far as safet, you can build checks into your code for protection or use a typed subset of LISP (eg Typed Racket). Shen goes further by embedding whole sequent calculus for custom, per app/module, type systems.

So, not saying LISP is ideal for compilers but does have some ideal and good attributes. Ocaml is my default given it combines conciseness, correctness, decent performance, ease of learning, and some helpful libs.

sklogic · on Dec 14, 2015

ML got ADTs and static type guarantees, but Lisp got macros, which allows to implement things like http://andykeep.com/pubs/np-preprint.pdf - which is significantly less boilerplate than in ML. An ideal language for compilers would have combined both properties, but to make it happen, an expression problem must be solved first.

See a relevant discussion here from not long ago:

https://news.ycombinator.com/item?id=10712566

nickpsecurity · on Dec 17, 2015

A commenter elsewhere said Racket had ADT's with langplai but also with langracket via subtypes of struct type with match macro. Said it was same style of programming.

So that's static typing (Typed Racket) and macros for sure. Possibly ADT's depending on whether you object to claim above. Should cover your list of requirements.

sklogic · on Dec 17, 2015

The problem here is that a single variant tag does not uniquely define a type, as all such type systems expect.

nickpsecurity · on Dec 17, 2015

You think it could be modified to handle that?

troydj · on Dec 14, 2015

I wouldn't go so far to say that the Dragon Book is outdated and irrelevant. (I'm assuming you're referring to the 2nd edition from 2006.) Unless you're focusing on back-end optimization and code generation techniques (something a new compiler writer typically does NOT do), the bulk of the theory and material you'd cover in a first semester compiler course is fairly static.

But if a person is merely looking to bang out a compiler without getting overwhelmed with how to convert NFAs to DFAs for lexing, etc., some good alternative books are:

A Retargetable C Compiler: Design and Implementation, by Hanson and Fraser (http://www.amazon.com/Retargetable-Compiler-Design-Implement...). This book constructs and documents the explains the code for a full C compiler with a recursive descent approach (no flex/lex or bison/yacc). I have some experience augmenting this compiler, so I can vouch for the book's ability to clearly convey their design.

Compiler Design in C, by Allen Holub (http://www.holub.com/software/compiler.design.in.c.html). Downloadable PDF at that link as well. A book from 1990 in which Holub constructs his own version of lex and yacc, and then builds a subset-C compiler which generates intermediate code.

Practical Compiler Construction, by Nils Holm (http://www.lulu.com/shop/nils-m-holm/practical-compiler-cons...). A recent book which documents the construction of a SubC (subset of C) compiler and generates x86 code on the back end.

sklogic · on Dec 14, 2015

It is actually an outdated view, to split a compiler into dedicated monolithic front-end and back-end parts. The more modern approach is very much the opposite. It is a nearly continuous, very long sequence of very simple transforms, rewriting a code seamlessly, all the way down from a front-end (i.e., a parser) to a back-end or multiple back-ends. And this approach is very alien to anything you'd find in the Dragon Book.

As for parsing, as I already said elsewhere in this thread, all the techniques from Dragon Book are not practical any more and are not used in the modern compilers. There are far better ways, which are not covered in the book, and they're far simpler, not even deserving a dedicated book at all.

fahadkhan · on Dec 14, 2015

Please could you recommend an alternative?

xigency · on Dec 14, 2015

Engineering a Compiler 2nd edition, by Keith D. Cooper and Linda Torczon

The dragon books are better used as references to those who are already familiar with all of the high-level concepts in compiler writing.

landmark2 · on Dec 14, 2015

Modern Compiler Implementation in C

fao_ · on Dec 14, 2015

By A. Appel? I heard that the ML version is better since much of the code apparently isn't very idiomatic (It was supposedly translated directly from the ML book). I would probably recommend both that and The Dragon Book, as they cover roughly the same material but in a slightly different manner.

Ace17 · on Dec 14, 2015

Dick Grune - Modern Compiler Design

spooningtamarin · on Dec 14, 2015

I learned compilers this way and yes, it's extremely hard. Things got so high-level that I forgot what the real problem was and kept wondering why the algorithm I wrote isn't working.

hitlin37 · on Dec 14, 2015

Since technologies changes before the book is out, how does that hold in compiler domain? Are there any modern compiler book that explains around llvm/gcc as an example compiler?

sklogic · on Dec 14, 2015

Not sure if there are any books around LLVM or gcc, but there is a lot of up to date texts.

Grune at al., "Modern Compiler Design",

Appel, "Modern Compiler Implementation in ML",

and many more.

P.S. Because of a slow-ban, adding another one to this answer. This is the best source on SSA:

http://www.cri.ensmp.fr/people/pop/papers/2006-12-thesis.pdf

sgeisenh · on Dec 14, 2015

I'll just add this on: https://www.cs.cmu.edu/~fp/courses/15411-f14/schedule.html

The lecture notes are simple and excellent, though at a very high level of abstraction. Though most of the notes are full of references to the original material, which is super nice.