Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Numbers and tagged pointers in early Lisp implementations (snellman.net)
89 points by jsnell on Sept 9, 2017 | hide | past | favorite | 21 comments


A piece of more recent history that people might not know: the 32-bit SPARC chips had some special instructions for adding and subtracting integers, assuming a representation in which the two low-order bits were used as tag bits, and a value of 00 in those bits indicated a fixnum (Lisp-speak for an immediate integer). The TADDCC instruction ("Tagged ADD and set Condition Codes") was just like ADD, except that it would also set the overflow condition code if either operand had a tag value other than 00 (it would also be set if the addition overflowed). The next instruction would call a handler if the overflow bit was set; the handler would determine whether (a) this was a case of actual integer overflow requiring construction of a "bignum" (a heap-allocated integer); (b) one or both operands were already heap-allocated integers (or perhaps other kinds of numbers) requiring some other form of addition; or (c) one or both operands were not numbers at all, a runtime error. But the point is, the most common case was handled in two instructions, and those could even be combined into one using the TADDCCTV variant, which trapped automatically on overflow. (There were also TSUBCC and TSUBCCTV, of course.)

My understanding of the history here is that a lot of Sun people, notably Bill Joy, came from Berkeley, where Franz Lisp was also developed; and in the mid-1980s (the first SPARC came out in 1987) it still seemed possible that Lisp would be of great commercial importance. AFAIK, these instructions were dropped for the 64-bit SPARC.

But the point of posting this is to underscore that the use of "fixnums", tagged immediate integers, is well known. Not only, as Juho has helpfully uncovered, does it date back to the 1960s, but it had explicit support in what was once a major microprocessor architecture.

I'll digress a little further and add that I really wish Java had used the Lisp approach for arbitrary-precision integers. Then Joshua Bloch would never have had to write this blog post: [0], among many other benefits.

[0] https://research.googleblog.com/2006/06/extra-extra-read-all...


Furthermore, SPARC, like most RISC at the time, didn't support unaligned accesses, so to a large extent you got tag checking for free. Eg. assuming "01" tags for pointers and a simple two-word CONS cell, car would simple be one load instruction.

  ld %rd, [%r1 - 1]
if %r1 didn't have a "01" tag, this would trap.

Finally, SPARC reserved 8 registers for "global" use, that is registers what wouldn't be otherwise used by the compiler. Use was as simple as

  register int *sp asm("%g3");
which was immensely valuable for VM implementations (and others).

Unfortunately, the tagged add instructions didn't get much use; perhaps SPARC wasn't a big enough base to justify designing around it.


... and these days tag checking seems to be almost for free on out-of-order processors (e.g. x86) because the check is only a couple of instructions with a predictable control dependency and no data dependency.


Right. This is probably one reason TADDCC was omitted from SPARC64.


There certainly were Lisp implementations that used it -- notably Franz Allegro Common Lisp, which I used for years, FWIW.


Eric Benson claimed that this was added for Lucid CL:

https://compilers.iecc.com/comparch/article/91-04-082

> Yes, the tagged arithmetic instructions were put in the SPARC architecture for Lucid Common Lisp.


That could be true. But I'd like to hear what the Franz people have to say about it before I accept it as the whole truth.


Don't forget that Lucid was the OEM for the Lisp implementation in SUN Common Lisp and the SUN Symbolic Programming Environment.

http://ftp.lanet.lv/ftp/sun-info/sunflash/1990/Aug/20.01.lis...

http://3e8.org/pub/scheme/doc/lisp-pointers/v2i2/p5-endelman...


Interesting. I never knew about that, or had long forgotten. The assertion about the editor being written in Common Lisp is especially surprising; is that true? I never used Lucid or SPE, but I certainly did use Lucid Emacs -- in fact I still use XEmacs daily. Why would they have written a second editor?


Lucid had an editor called Helix. It was based on CMUCL's Hemlock. Probably that's also the editor SUN used in SPE. Hemlock is a Emacs variant written in Common Lisp.

Lispworks's and Clozure CL's editor also started from Hemlock.


Some of the ideas in SPARC came from SOAR, described in David Ungar's thesis [1]. He suggests that the Rice University R2 was the first machine with tags though the R1 [2] seems to have used tags too.

[1] https://books.google.co.uk/books/about/The_design_and_evalua... [2] https://en.wikipedia.org/wiki/Rice_Institute_Computer


Sadly Java was only half way to Lisp, as Guy Steele puts it.

Thanks for sharing the experience.

Ada was also another example of having Ada Machines, that is how Rational was created.


Just tangentially: I have high hopes for an experiment with a 96-bit word size in RaptorJIT (a fork of LuaJIT, a compiler for a Lisp-like language.) That's a 32-bit tag and a 64-bit value.

This seems practical for a few reasons:

Full 64-bit pointers and numbers can fit into tagged words. (No boxing and so no heap allocation for u64 and pointer arithmetic.)

Naked 64-bit values can still be stored in registers and used directly. The JIT knows the type of each register and does not need to load the tag bits (they are checked on load and added on store.)

The x86-64 target CPU is efficient for unaligned access.

We will see how this works out in practice. Link: https://github.com/raptorjit/raptorjit/pull/93


How do you pass arguments? In two registers? On the stack?


This turns out to be much simpler than expected, on paper at least.

First, this is a tracing JIT and so it basically inlines everything. There is no argument passing convention at all within a block of JIT code because subroutines bodies are completely inline. So - passing tagged values as arguments is actually a fairly infrequent operation.

Then in cases where tagged values really do need to be passed around they will always be passed by pointer reference. The JIT will write them to memory - the Lua stack or the heap - and pass a pointer reference in a register.

The really happy circumstance is that the whole C runtime system is already written to reference tagged values using pointers, and there is already a port that supports 64-bit Lua values using 32-bit machine words.


I see. It seems like that would still make calling supporting routines annoyingly expensive, but perhaps that (and the extra tag memory) doesn't matter as much as I think it does.

Good luck with the project.


Thanks!

There are really two kinds of supporting routines here.

Runtime system routines operate on tagged values and do have to swallow the stack convention. Thankfully the code is already written in this style and the JIT seldom invokes any of these routines in optimized programs (it generates inline versions.)

Second is C routines called via FFI. These use native C data types and the standard platform calling convention. They never see the tagged values (get an untagged uint64_t in a register instead of a TValue struct.) These are easy, cheap, and frequently used.


I would read the PDP-6 Lisp description as meaning that negative and larger integers were boxed, not that there wasn't a way of representing them.


Indeed. But I think the text of the article already agrees with you :)


Yes, you did write that you were just considering immediate values.

I'm not sure that PDP-6 Lisp used tags though, MacLisp used BiBop to determine the type of an object so the entries in the type table corresponding to the special pointers would have contained the fixnum type.


They were still writing up BIBOP as the new hotness in 1977, so I'm pretty sure this predates BIBOP by a decent margin.

But it's a fair point that the encoding in PDP-6 LISP doesn't match what we'd think of as tagging today. They left the 0-pointer as a non-integer (probably that meant NIL), so the type test would have been done using two comparisons rather than a mask+compare.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: