
Great article. My personal take on this is that C programs are so damn reliable because there is nothing under the hood: the building blocks are so simple and transparent that you can follow the thread of execution with minimal mental overhead.

That means that when you lay out your program the most important parts (memory map and failure modes) are clearly visible.

IF you are a good programmer.

And that's the reason there is an obfuscated C contest: if a C programmer sets his or her mind on being deliberately hard to understand, that same power can be used against any future reader of the code. Incompetence goes a long way towards explaining some of C's bad reputation. You can write bad code in any language, but none give you as much rope to hang yourself with as C (and of course, C++).



    the building blocks are so simple and transparent
    that you can follow the thread of execution with 
    minimal mental overhead.
I do not agree.

I've seen plenty of code that does weird things with pointers, like passing around a reference to a struct's member, then retrieving the struct by decrementing a value from the pointer and casting. Or XOR-ing pointers in doubly-linked lists for compression. And these are just simple examples.

I've seen code where I was like "WTF was this guy thinking?".

My biggest problem with C is that error handling is non-standard. In case of errors, some functions return 0. Some return -1. Some return a value > 0. Some return a value in an out parameter. Some put an error in errno. Some reset errno on each call. Some do not reset errno.

Also, the Glibc documentation is so incomplete on so many important issues that it isn't even funny.

Yes, kernel hackers can surely write good code after years of experience with buggy code that they had to debug.

But for the rest of the code, written by mere mortals, I basically get a headache every time I have to take a peek at code somebody else wrote.


> I've seen plenty of code where I was like "WTF was this guy thinking?".

Yes, that happens. But I've seen that in COBOL, Perl, Pascal, Java, PHP and in Ruby as well.

> In case of errors, some functions return 0. Some return -1.

That's not a feature of the language.


     That's not a feature of the language.
Well, yes, but it's kind of nice when you've got exceptions with stack traces attached.

Some people don't like exceptions, but I do.


It is indeed "kind of nice". But the question at hand is whether it's a requirement for writing reliable software. I tend to agree with the posts here that argue that it's not. It saves time for developers, but it doesn't meaningfully improve the quality of the end product.

Serious C projects tend to come up with this stuff on their own, often with better adapted implementations than the "plain stack trace" you see in higher level environments. Check out the kernel's use of BUG/WARN for a great example of how runtime stack introspection can work in C.


gdb: where


No, but C could use more consistent error handling semantics, rather than conflating return values and error codes. Worse still: a combination of a return code and a global error code.


"Worse still: a combination of a return code and a global error code."

That's not the worst that exists in C :-) Let me quote a dietlibc developer from http://www.koders.com/c/fid1639C203A2255EB1FA11DC6A68D74FEB2...

    /* Oh boy, this interface sucks so badly, there are no  words for it.
    * Not one, not two, but _three_ error signalling methods!  (*h_errnop
    * nonzero?  return value nonzero?  *RESULT zero?)  The glibc goons
    * really outdid themselves with this one. */


And a comment written by a 13-year-old proves what exactly?


But first, functions returning multiple values. Baby steps. :-)


> In case of errors, some functions return 0. Some return -1.

That's not a feature of the language.

The inconsistency is a natural, expected, unavoidable result of the language forcing, er, strongly encouraging, the use of an unsuitable error-reporting mechanism ("find some value in the range of the function's return type that isn't in the range of the function, and use it to indicate an error"). This wouldn't be an issue with exceptions or with tuples / multi-value return like some languages allow.


> like passing around a reference to a struct's member, then to retrieve the struct decrementing a value from the pointer + casting.

that's not weird, that's a pretty standard way to enqueue structures on singly/doubly linked lists... it's made somewhat prettier by offsetof/CONTAINING_RECORD though


Yeah, but why?

I mean, can't you pass a reference to the whole structure instead? I'd prefer a void* pointer to the whole thing, with a normal cast later, instead of seeing pointer arithmetic.

I'm not a C developer, I just play around -- I've seen this practice used in libev, for example: passing around extra context along with the file handle in event callbacks.

That seems really ugly to me, as they could have added an extra parameter for whatever context you wanted to be passed around.


It's a relatively common pattern to have a "collection" data structure (like a list or hash table) use link structs embedded inside other structs to simplify memory management. When you append to a linked list in Java, you always allocate a new Link object and then update the various pointers. Using this pattern in C, the object you want to put in a list contains a list_link_t structure, and the list library takes as arguments pointers to these structures. This may sound like an argument about convenience, but the implications of this are very significant: if you have to allocate memory, the operation can fail. So in the C version (unlike the Java one) you can control exactly in which contexts something can fail.

For example, if you want to trigger some operation after some number of seconds elapses, you can preallocate the structure and just fill it in when the timeout fires. Timeouts are usually delivered via signals or other contexts where you have no way to return failure, so it's important that it be possible to always handle that case correctly without the possibility of failing.


This pattern is sometimes called an intrusive data structure. For example, see boost::intrusive in the C++ world. It saves allocation, gives better locality, and allows various optimizations such as the ability to remove an object from a doubly-linked list in constant time.

Another way to think about all the offsetof() stuff is that it's emulating multiple inheritance in C. You can think of structures as inheriting the "trait" of being a participant in a container; the "pointer-arithmetic-and-cast" idiom to move from a container entry to the corresponding object is isomorphic to downcasting from the trait to the object that contains it.

Interestingly it is not possible to express this pattern in a generic way within Java's type system.


True - and the important point here is that this is a pattern. Like any language, to be truly fluent in C you have to understand the common idioms as well as the syntax, grammar and vocabulary.


the purpose of offsetof/CONTAINING_RECORD is so that you don't "see" the pointer arithmetic, it's safely ensconced in a macro ;)

one advantage is it produces a generic linked list API. you can write routines to traverse, add, and remove elements from the list without caring about the structure of data stored in the list. if you use offsetof, you can also have the list data for a structure at any position inside of the structure instead of the beginning. some systems do that so they can store header information at the beginning.

you can also have elements enqueued on multiple lists. you might say that if you're doing that, you have bigger problems, but sometimes Shakespeare got to get paid.


Check out the Linux kernel's linked list implementation for an example of it being done right. The actual workings are hidden behind macros that you can look at (quite simple, easy to grasp), so it's very clean to use.


> I've seen code where I was like "WTF was this guy thinking?".

You can write obfuscated code in any language. The point about C is that the mental model is very simple. There's no magic happening anywhere, so if you can parse the language, you can figure out what's happening line-by-line pretty easily.


This is one of the biggest reasons Linus Torvalds refuses to rewrite the Linux kernel in C++ even though he is repeatedly pushed to do so... in C++ a whole bunch of things outside the file (templates, operator overloading) can make it so that what you're looking at doesn't do what you think.

In C, what you see is what you get. :)


For definitions of "repeatedly pushed to do so" that mean "asked on occasion by language trolls on the mailing lists who aren't even kernel devs". ;)


    #define int double


There is a special place in hell for you.

Had me laughing though :)


Remember that if you want to use C++ in kernel development, you'll only get a small subset of C++ because there's no runtime system to rely on. Exceptions are one example of this.

Without exceptions, the advantages of C++ are not that great compared to the hassle needed to get it running in kernel mode: dealing with name mangling, static/global constructors, etc., and fighting with compilers.


> like passing around a reference to a struct

I thought that references are a feature of C++, not C. Personally, I never really got references... They are just a kind of magical pointer that programmers can forget about, but they make the code much less readable and can interact in funny ways...


Btw, I corrected my statement -- I was referring to "a struct's member" from which later you can retrieve the actual struct that owns that value.

Of course C has references because C has pointers. References in C++ are just constant pointers.


> Of course C has references because C has pointers. References in C++ are just constant pointers.

This is incorrect. You cannot have a reference to nullptr, for example. Pointers and references are different beasts, nowhere in the C standard does it refer to pointers as references. The underlying representation in compilers does not imply equivalence.


Absolutely. Having learned Pascal before C, I really missed pass-by-reference for quite some time. Efficiency-wise, a reference is just a hidden pointer, BUT it is nice to know that the reference CANNOT be null. The caller of a routine expecting a reference must have actual data to pass, or the routine never gets called.

C is a very handy portable assembler, though.


You might like to read "Moron Why C is NOT Assembly"

http://james-iry.blogspot.com/2010/09/moron-why-c-is-not-ass...


I've seen code that does things like "Foo &foo = *((Foo *)0);" (possibly split among multiple statements). It seemed to work fine, but I suppose it's really undefined behavior?


Dereferencing a pointer to 0 seemed to work fine?


I think what happened was that the reference was passed to a function that (1) under most conditions (I think there was a fast-path added after the function had been around a while) accessed it directly, and (2) under all conditions turned it back into a pointer for another call. The way that function was called in this particular case was outside of those "most conditions", so the only thing that was done with the invalid reference was to turn it back into a pointer and then null-check it. And so while making the reference probably counts as "dereferencing" the pointer as far as language rules go, the memory that it pointed to was never actually accessed.

(Why yes, that does sound like something badly in need of refactoring. And illegal reliance on implementation details.)


On AIX, page 0 is mapped and readable, so dereferencing null works just fine. :)


Actually, the C standard does say:

  A pointer type describes an object whose value
  provides a reference to an entity...


References are a C++ creation with a precise definition. This is what this quote is talking about, considering C does not have references.


You don't need to do wacky things with pointers to get into trouble in C:

  int i;

  /* Iterates over everything except the last n elements of array... right? */
  for (i = 0; i < length - innocent_little_function(); i++)
      do_something_with(array[i]);


>You don't need to do wacky things...

A function call in a for loop's conditional doesn't fit your definition of wacky?


A function call in a for loop's conditional is practically C best practice. K&R do it on almost every page.


They usually only do it for builtins (e.g., strlen) and const functions, which makes this example a lot safer.


A bit suspicious, maybe? I was trying to suggest the unsigned issue without actually spelling it out, not anything to do with side effects.

I probably should have used sizeof, even though that doesn't make sense there.


> Incompetence goes a long way towards explaining some of C's bad reputation

Not just incompetence. Also bad language choice (usually due to legacy).

If programmers don't get enough time to properly test and review the code, which needs to be done very thoroughly in C, it's easy for even experienced developers to shoot themselves in the foot.

C is very good (let's say irreplaceable) for low-level hardware and OS code. This is code that needs to be verified and tested very well.

On the other hand, using C for run-of-the-mill business projects or higher-level stuff on a tight deadline can be a very bad idea. It results in a lot of overhead for programmers to think about the details of error handling, buffer sizes, pointers, memory allocation/deallocation and so on, especially getting it right for every function. It is a recipe for screwups.

In this case it is very useful to have garbage collection, bounds checking, built-in varlength string handling, and other "luxuries" that modern languages afford you.


Right. C with Lua is a nice combination serving a wider range of projects without sacrificing C's approach to low-level correctness.


I love Lua, but what stack would allow you to use it in a web project?


Mongrel2 has a Lua web framework called tir.


Precisely. These days, with the advent of easy-to-integrate JavaScript/Python/Ruby/Lua scripting languages, there is very little reason to write an entire application in C. My last two projects, I wrote the business logic in a scripting language, and all of the performance-bound stuff in C. There are a few gotchas the first time you do this - your bindings code needs to fit with the object model that you're using in C, you need to force all calls into the VM onto the same thread, etc. I ended up just writing my own IDL parser/bindings generator, but once you've done that once, you can use it for all of your future projects, and it really isn't that much work. The pay-off is huge - you can arbitrarily move a module of code between C and your scripting language depending on the optimal split for performance/ease of programming.


Very true. When you have managers shouting "get it out" and over-promising to clients, then any language is a bad choice, but particularly powerful languages that require more careful thought and testing. I love C++ and use it regularly, but when I need to "get stuff out quickly" I'll use something like Python as I can be more reckless with it.


I think that in other languages such as Object Pascal it is easier to be a good programmer. I've seen horrible code written in OP, yes, but I think the language itself helps a programmer be better. For one thing there are fewer ways to kill yourself than even in C++, and yet there is little difference in speed between them.

I think C is one of these "other languages" because "here be dragons".

I watch a lot of people bang out C++ code as if it's totally safe, and fail. I see a lot of people hammer out C# code and say, "to hell with you, you don't even have .NET!" And so on. But today, when a programmer sits down and writes a C program they must sit and think out what they're doing and why—with no abstractions like OO to make an easy solution.

There are so many ways to blow your head off in C without knowing you left the opportunity in the program, that it forces a competent programmer to think differently about how they code. And a newbie? Well, if they aren't scared stiff about blowing a hole in their system, they should be! ;D

And C doesn't change often, unlike other languages.

I'm no C programmer, but I've seen C code for years and translated it into whatever language I'm using at the time. I have tremendous respect for UNIX/Linux, and a great many C-powered programs. Thanks for your work on them, guys and gals.


> But today, when a programmer sits down and writes a C program they must sit and think out what they're doing and why

That's true for any programming language. Sadly, far too often, programmers are unable to afford taking the time needed to think about what they are doing or understanding what happens under the hood of the libraries they link against.


>Incompetence goes a long way towards explaining some of C's bad reputation

Incompetence is THE reason for C's bad reputation (if any).


Agreed. I'd also say that someone who is not a good programmer will find C quite unreliable, with bugs and issues popping up every now and then.


That is why in the "old days" C was used as a language for advanced courses in CS programs. It teaches you to think straight about your code. Nowadays people think it is easier just to write anything and catch exceptions later.


> the building blocks are so simple and transparent that you can follow the thread of execution with minimal mental overhead.

The first fundamental purpose of any programming language is to provide abstraction via functions. This implies that following the thread of execution is never easy and the blocks are never simple. It's pretty much a wash, with special mention for languages in the Hindley-Milner family.

The second fundamental purpose of any programming language is to provide specification abstraction via replaceable modules. This is where C fails. It is common practice in C culture to not specify interfaces in depth (we are all good programmers, aren't we?) and the implementation via manual virtual tables makes it painfully difficult to find the specific implementations in the code base.



