
It's not just calling conventions. LLVM IR also bakes in various assumptions about the target platform, from endianness to structure alignment to miscellaneous facts like whether char is signed or unsigned.

For many of these there is no going back to the original representation; the lowering is one-way.

If you have IR that happens to compile and work across several architectures, that is luck in that particular case. It is not what LLVM IR was designed for, nor should it be expected to work in general.



I don't understand what this means. Could you please give an example of some code that loses information in this way when compiled with LLVM?


Say you have a struct type X with properties int, double, int. The offset of the last property depends on how alignment works on the target platform - it could be 12 or 16 on some common ones. LLVM IR can contain a read from offset 12, hardcoded. Whereas the C code contains X.propertyThree, which is more portable.


But that's not how LLVM works, at least when I worked with it a couple years ago. You would define a struct type in terms of primitive types (int64, ptr, etc), and then use getelementptr with the offset of the field path you wanted. Yes, it's a numeric offset, but it's a field offset within the struct, not a byte offset. LLVM handles packing, alignment, and pointer size issues for you automatically.


Yes, you can define structs and use getelementptr to access values. But, frontends can also bake in offsets calculated from getelementptr. They can also bake in sizeofs of structs, for example. And LLVM optimizations end up baking in hardcoded values in more subtle ways too.


Once you have defined a struct in terms of primitive types, it is platform dependent.

Consider C:

A C int can be 16 bits. Or 32. Or 64. Etc., as long as the constraints on its relation to the other types are met.

The moment the frontend specifies a primitive type for a field in the struct, that code is incompatible with a whole lot of platforms.


Your primitive types aren't LLVM's though, are they? I mean, I haven't looked at LLVM thoroughly (just enough to be familiar with it; a friend is writing a language he wanted some input on), but I would be surprised and disappointed if they had a C "int" type as opposed to "signed 32-bit integer" or whatever. At which point it's compatible with whatever else is throwing around a signed 32-bit integer.


But that is exactly the point - that LLVM IR is not platform independent.

The frontend must choose which specific integer type C's "int" maps to. At that point, the IR is no longer machine independent - if you pick 32-bit signed ints to represent C "int", your program will not match the C ABI on any platform that uses a 16-bit int as C "int", and you won't be able to directly make calls to libraries on that platform, for example.


So use uint32_t?


This misses the point. The point is that if you pass a C program that uses "int" through a C-compiler that spits out LLVM IR, the resulting LLVM IR is not portable.

You might not be able to change the C program - it might be using "int" because the libraries it needs to interface with use "int" in their signatures (and handle it appropriately) on the platforms you care about.


Ah, I think I see... you mean I could write non-portable IR code by doing that, although LLVM would never produce code like that? I guess there must always be IR that the frontend will never produce, then?


No, the implication is that the LLVM IR that the frontend produces changes depending on the ultimate target that the LLVM IR will be compiled to. In other words, the frontends aren't backend-agnostic.


Oh, right! That makes more sense. So you have to specify the backend when you start the process? I didn't know LLVM did that.


Yes, the frontend very much knows what target you are aiming for. It greatly affects the IR that is generated.

And once you generate that IR, you can't just build it for an arbitrary target; it must be the one it was generated for.


No, every LLVM frontend (e.g. Clang) has to do so all the time for things to work.


That makes no sense at all. You've just said that every LLVM frontend has to produce code that every LLVM frontend won't produce! If you mean something different, then could you please be clearer, as I really don't understand what you're talking about.

[EDIT: caf's post made it clearer. I know what you meant now]


     int a() { return sizeof(void *); }
Obviously a trivial example, but it's illustrative: the front-end compiler knows a bunch of things about your target and bakes that information into the IL. If you took IL generated by the compiler with "-arch i386" and then compiled the IL using "-arch x86_64" it's quite possible to get a non-working executable.

It's possible to carefully write IL that will work on multiple platforms as done in this blog post, but I'm not sure how useful that really is. You're still giving up the exact control that assembly gives you, so I don't know how much better you'd get than clang-produced IR. In other words, if you want "portable assembly language" use C.

Still, it's an interesting blog post. It's good to show people how the compiler works behind the curtain.



