How to Get Fired Using Switch Statements and Statement Expressions (2016)

hinkley · on Oct 15, 2020

    for(i = 0; i < 10; i++){
        case 1:{

I just threw up a little in my mouth.

All these years I didn't know such a monstrous thing was legal in C code. Is it possible to make amendments to the Geneva Convention, and if so, who should I call?

kijdnrs · on Oct 15, 2020

The International Committee of the Red Cross (ICRC) in Geneva is the traditional guardian of international humanitarian law, and has led the development of additions to the Geneva Conventions of 1949, such as the Additional Protocols 1 and 2 (https://www.icrc.org/en/doc/resources/documents/misc/additio...). You can reach them at +41-22-734-60-01

akerro · on Oct 15, 2020

FUCK, you might think it's funny, but I literally work with a senior developer (10+ years of experience in pure Java) who writes code like that and, during review, insists it's OK because he's been writing such code for over a decade! I inherited this project after he moved to another country and was no longer legally allowed to work remotely for a company in the UK. He has permissions to override Jenkins + SonarQube, so he merges code without test coverage. His code has literally 0 tests. Only today I was rewriting code like:

   if(i == 1) {}
   else if(b == true){}
   else if(s == "string") {}
   else if(response.status() == Status.OK){}
   else if(hereIsARecursiveMethodAlwaysReturningFalse()){}
   ... 3 more

Notice that each `if` condition tests a different variable of a different type!

hinkley · on Oct 15, 2020

I have a Good Fences Make Good Neighbors trick I use, and some people call it a Canary Build.

When contributors to two different modules keep running into contract violations, you set up a CI build that triggers when either of the modules is built. If it fails it means that something got broken. It doesn't stop the build pipeline, but it warns you that garbage is about to come out the other end.

There's a general dynamic between people where peer pressure does not work when the delay between action and consequence grows too long. Nobody truly internalizes how upset other people are when they are found out for something bad they did a year ago, a month ago, or in some cases days ago (hence why roommates fight so often about chores). But getting called out for something you did two hours ago has sorted out an awful lot of bad behavior.

And the nice thing about the Canary Build is that in many CI tools you can set it up and not give him any permissions.

lostcolony · on Oct 15, 2020

"It doesn't stop the build pipeline, but it warns you that garbage is about to come out the other end."

Why doesn't it? Block the merge and the build; require the dev to either fix it or convince the rest of the team that the change should be allowed.

J-Kuhn · on Oct 16, 2020

You make an API change.

The API change is communicated and approved by both parties.

You build the first thing.

Canary fails, because API has changed.

You change the other to match the new API.

Canary fails again, because it uses the most recent non-failed build.

???

Owner of the canary build is now shunned.

lostcolony · on Oct 19, 2020

I think we must have very different expectations as to what a contract violation is.

tomc1985 · on Oct 16, 2020

I mean, that isn't exactly hideous code. Sometimes you need to check a variety of different conditions in sequence, and I don't think a switch statement works here.

So what if they're different types? It's not like you're passing those variables to functions right there. And are we really so robotic that we can't understand different types in a conditional?

Except the recursive function bit. Why bother if it's always false...

akerro · on Oct 16, 2020

> It's not like you're passing those variables to functions right there

Yes, it was. The branch bodies had several lines each.

>So what if they're different types?

It makes it hard to read and destroys expectation of what possible cases there are. It makes it hard to test as a lot of test preparation/mocking is necessary.

>Except the recursive function bit. Why bother if it's always false...

Yeah, that's the point! Because there was no test for it.

tomc1985 · on Oct 16, 2020

Wait, so what's the remedy for the different types issue then? Should he coerce all these diverse types into one type just to do the comparison? That just seems wasteful.

Assuming he has to do the check that way, of course.

theflyinghorse · on Oct 16, 2020

Better question is "Why are they gating all this crap here like that? What is the presumed set of expected outcomes?". This code is bordering somewhere on the line between sabotage and incompetence and it smells really bad.

akerro · on Oct 16, 2020

My solution was to extract each check into its own method and to break up the one god decision-making method into classes, each a component with its own logic.

hedberg10 · on Oct 16, 2020

> I mean, that isn't exactly hideous code

It is. When you read it in isolation here, it's fine. If you have to read pages and pages of it and you have to concentrate/have half your brain working on the logic of the code, you don't want to annoy your brain with details like these.

specialist · on Oct 15, 2020

I gently encourage you to dust off your copy of the CIA Sabotage Field Manual and set about getting your tormenter fired.

lmilcin · on Oct 15, 2020

At least he didn't put each one of those statements in separate classes to make you hunt for half a day to figure out what his code does.

kazinator · on Oct 15, 2020

Is there a bug in there? What's a better way to structure those tests, if they are logically valid?

IggleSniggle · on Oct 15, 2020

Depends on the actual situation, but just from looking at it I would guess that named functions instead of nebulous unrelated nested branches would be a start.

Edit: it’s also just kind of a code smell that suggests the overall structure is not well thought out, with that particular collection of tests. Could be fine in that respect in context tho

akerro · on Oct 16, 2020

As far as our manual testing goes, there was no bug in it. But there were also no tests for it, and writing tests for it is horrible: you need to prepare a test for each branch, which adds a lot of mocking, faking data, setting up variables...

joshxyz · on Oct 15, 2020

Fucking hell that's a nightmare haha

kazinator · on Oct 15, 2020

I have used the following pattern two or three times before. When it comes up, it's very useful.

It can eliminate code repetition, without the bother of making a whole new function:

    for(i = 0; i < 10; i++){
      switch (i) {
      case 1:
         // this is special for 1
         break;
      case 3:
         // this is special for 3
         // fallthrough
      case 5:
         // case 5 needs 3 processing, plus its own
         break;     
      case ...:
         // ...
      }

      // a slightly long block here
      // common to all cases.
    }

I don't think I've ever done anything quite like this though:

   switch (i) {
   case 0: // so we can enter the for at all
     for (i = 0; i < 10; i++) {
   case 1: ;
     }
   }

which is what the parent comment is getting at, if taken literally.

jmholla · on Oct 16, 2020

I think you have your comments for 3 and 5 backwards.

kazinator · on Oct 16, 2020

Right; case 3 needs to do the 5 processing, too.

See, this is what happens when, in a forum, I pretend that I comment. Don't worry, I don't, IRL.
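For reference, a minimal sketch of the loop-plus-switch pattern with the fallthrough as corrected above (case 3 does its own work, then falls into the case 5 work). The `tally` struct and `run_loop` are hypothetical names, there only to make the control flow observable.

```c
/* Hypothetical counters so the control flow can be checked. */
struct tally {
    int special3;  /* times the case-3-only work ran */
    int special5;  /* times the case-5 work ran (for 3 and 5) */
    int common;    /* times the common tail ran */
};

struct tally run_loop(void)
{
    struct tally t = {0, 0, 0};

    for (int i = 0; i < 10; i++) {
        switch (i) {
        case 3:
            t.special3++;   /* special for 3 */
            /* fall through */
        case 5:
            t.special5++;   /* 5 processing, shared by 3 and 5 */
            break;
        default:
            break;
        }

        t.common++;         /* block common to every iteration */
    }
    return t;
}
```

Calling `run_loop()` runs the case 3 work once, the case 5 work twice, and the common block ten times.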

frutiger · on Oct 16, 2020

This is what happens when you don’t bother to make a separate function!

kazinator · on Oct 16, 2020

True; but wait til you see how I plan to screw that up.

souprock · on Oct 15, 2020

So... I did this:

  switch(override){
  default:
    if(foo==42){
  case THING1:
      code_here();
    }else if(bar&0x42){
  case THING2:
      other_code();
    }else{
  case THING3:
      more_code();
    }
  }

I thought it was more readable than the alternatives.

jolmg · on Oct 15, 2020

If override can only be those 3 and they're all non-zero, and default is when override is 0, then the following seems clearer in my opinion.

  if (override ? override == THING1 : foo == 42) {
    code_here();
  } else if (override ? override == THING2 : bar & 0x42) {
    other_code();
  } else if (override ? override == THING3 : true) {
    more_code();
  }
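A rough way to compare the two versions is to have each branch record which arm it took. This is only a sketch: the enum values and the `pick_*` names are made up, and it relies on the same assumption as above (override is 0 or one of the three THING values).

```c
/* Hypothetical values; 0 means "no override". */
enum { NONE = 0, THING1, THING2, THING3 };

/* The switch version: case labels sit inside the if/else arms. */
int pick_switch(int override, int foo, int bar)
{
    int picked = 0;
    switch (override) {
    default:
        if (foo == 42) {
    case THING1:
            picked = 1;
        } else if (bar & 0x42) {
    case THING2:
            picked = 2;
        } else {
    case THING3:
            picked = 3;
        }
    }
    return picked;
}

/* The ternary version. */
int pick_ternary(int override, int foo, int bar)
{
    int picked = 0;
    if (override ? override == THING1 : foo == 42) {
        picked = 1;
    } else if (override ? override == THING2 : bar & 0x42) {
        picked = 2;
    } else if (override ? override == THING3 : 1) {
        picked = 3;
    }
    return picked;
}
```

For any other nonzero override the two differ: the switch version lands on `default` and runs the normal tests, while the ternary version runs nothing.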

predakanga · on Oct 15, 2020

There's a particularly famous instance of this, called Duff's Device[0], with a great quote attached (regarding fall-through in case blocks):

> "This code forms some sort of argument in that debate, but I'm not sure whether it's for or against."

[0]: https://en.wikipedia.org/wiki/Duff%27s_device

tasty_freeze · on Oct 15, 2020

The very article this thread is about talks about Duff's device. This comment confirms my suspicion that most HN readers, like me, read the comments before optionally reading the article.

saagarjha · on Oct 16, 2020

Often the comments are better than the article :( But yes, you need to be careful in order to not bring up things that were already mentioned. Often it is good enough to read the comments, skim the article, and then only post a comment.

predakanga · on Oct 16, 2020

A fair judgement - I had only skimmed the article before replying.

I mostly wanted to provide the quote from Duff, it seemed relevant to the OP.

wahern · on Oct 15, 2020

It's also used to portably implement generators and coroutines. See https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html More generally, the semantics are useful for machine code generation and translation.

Computed gotos are even more useful for the above, but they're an extension. I'd love to see computed gotos added to the C standard, but it's far too late to change the semantics of switch. Rather, just accept that the control flow semantics of switch statements make them slightly more type-safe syntactic sugar for goto, not just in how they're implemented, but in how they can be used.
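A stripped-down sketch of the switch-based generator idea, in the spirit of the linked coroutines page but not copied from it. The `gen` struct and `gen_next` are hypothetical names; the resume point is stored as a small integer and the switch jumps back into the middle of the loop.

```c
/* Hypothetical generator state: where to resume, plus the loop counter. */
struct gen {
    int resume;  /* 0 = start from the top, 1 = resume inside the loop */
    int i;
};

/* Yields 0, 1, 2 on successive calls, then -1 from then on. */
int gen_next(struct gen *g)
{
    switch (g->resume) {
    case 0:
        for (g->i = 0; g->i < 3; g->i++) {
            g->resume = 1;
            return g->i;    /* "yield" the current value */
    case 1:;                /* the next call re-enters the loop body here */
        }
    }
    g->resume = 2;          /* no matching case: later calls skip the switch */
    return -1;
}
```

Start with a zeroed `struct gen` and call `gen_next` repeatedly. The loop counter has to live in the struct because locals do not survive the return.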

wiredfool · on Oct 15, 2020

You need to call the IOCC.

saagarjha · on Oct 16, 2020

*IOCCC

georgeecollins · on Oct 15, 2020

This is a great example of the dangerous syntax of Switch statements in C. However, I do think programmers should not reflexively avoid Switch/Case statements in their code. Polymorphism is often a way of writing switch statements that looks very clean, but hides some of the same spooky branching behavior. The nice thing about a switch/branch is you can see where the code might go from the text in front of you.

lmilcin · on Oct 15, 2020

I also think that every tool has its purpose. Just because it can be misused isn't reason to banish it altogether.

In the end, readability and maintainability of the code are the primary concern. I don't really care for those fads that say "you should absolutely ban switch statements". I look at these as a collection of tools that help me make the code more readable.

Usually radical statements like this come from people who will take simple, naive code and make it completely incomprehensible, achieving exactly the opposite of what the "best practice" was supposed to achieve.

Heck, there are even good uses for goto that make the code more readable as long as you follow accepted convention and don't try to be fancy with it. When I spent a couple of years working on embedded ANSI C applications I often used it to make complicated inner loops (like reading and parsing input from device) more readable.

marcosdumay · on Oct 15, 2020

In C? No, C developers should reflexively avoid switch statements, and only fight this reflex after a serious risk evaluation (and very likely, after a lot of tests are written).

Like macros, C switch is too powerful and dangerous to use on a whim.

simias · on Oct 15, 2020

I sorta see where you're coming from, but I can't really see myself avoiding switch entirely in C. I do agree that it's very poorly designed since it makes it very easy to shoot yourself in the foot. It does make it very nice to write some things, in particular state machines. It can even be less error prone in these situations since compilers will (sometimes) warn if you don't match all possible values of an enum, which is nice if you add a state to your machine and forget to update it everywhere.

Fortunately these days there are compiler warnings to alleviate some of the risks. GCC with -Wextra will warn if you have an unmarked fallthrough in a switch for instance (but not with -Wall).

But beyond that I do agree that if I could go back in time and tell Dennis Ritchie to change some things about the language "make switch break by default and add a fallthrough keyword when you actually want the behaviour" would be very high on the list. I get why he made it that way in the first place (if you implement switch/case in assembly with a jump table you get basically the same behaviour by default and you need additional code to break) but it makes for poor ergonomics in C.
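A small sketch of the state machine case, with made-up states and events. Leaving out `default` is deliberate: GCC's -Wswitch (part of -Wall) only reports a missing enumerator when the switch has no default label.

```c
/* Hypothetical states and events for illustration. */
enum state { IDLE, RUNNING, DONE };
enum event { EV_START, EV_FINISH };

enum state step(enum state s, enum event ev)
{
    switch (s) {    /* no default, so a newly added state gets flagged here */
    case IDLE:
        return ev == EV_START ? RUNNING : IDLE;
    case RUNNING:
        return ev == EV_FINISH ? DONE : RUNNING;
    case DONE:
        return DONE;
    }
    return s;       /* only reached if s holds a value outside the enum */
}
```

Every case returns, so there is no fallthrough for -Wextra to complain about either.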

brundolf · on Oct 15, 2020

C is one of those languages (like JavaScript, arguably) that's stuffed full of so many foot-guns you can't possibly hope to make them all statically impossible. So strong conventions need to exist no matter what, and "don't put case statements below the first block" seems like an easy enough one to visually enforce.

AnimalMuppet · on Oct 15, 2020

Baloney. There is absolutely no need to reflexively avoid switch statements. Just don't do this kind of garbage with them.

marcosdumay · on Oct 15, 2020

And never forget a break. And triple check the type of those case labels. And keep them short, so all of the above is verifiable.

Or, alternatively, you can default to some syntax where the compiler will help you, and leave switches to use with care only where they bring a huge gain.

StillBored · on Oct 15, 2020

Lack of 'break' is a very useful bit of syntax that I use somewhat frequently for categorization, i.e. when you have a handful of case values that fall into a smaller set of categories. It beats an unreadable set of if/else's that all have multiple "a||b||c||d.." conditions, a big state machine, or a match structure.

Then, if one enforces a clearer formatting style than is common on most open source projects, the missing "break" statements stick out visually, and it becomes much harder to forget one when the intentionally missing ones are marked with lines like

"// Intentional fall-through"

Or you use a "nobreak" macro and enforce the use of break/nobreak in the automated style/linter.
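A minimal sketch of that categorization style, with invented category names. Stacked case labels share one body, so no `a||b||c` chains are needed.

```c
/* Hypothetical categories for illustration. */
enum category { CAT_WHITESPACE, CAT_DIGIT, CAT_OTHER };

enum category classify(char c)
{
    switch (c) {
    case ' ':
    case '\t':
    case '\n':
        return CAT_WHITESPACE;      /* three values, one category */
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        return CAT_DIGIT;
    default:
        return CAT_OTHER;
    }
}
```

Labels stacked with no statements between them are not what the fallthrough warnings target; those fire when a case body with code runs into the next label.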

touisteur · on Oct 15, 2020

But if you had a syntax that allowed inner/embedded functions, exhaustive matching, and the ability to combine a|b..e, you might never ever EVER need fallthrough. The case statement in Ada is the best control flow structure (I'm not very familiar with advanced FP-style pattern matching, but I can read and follow Ada's with almost closed eyes, and they're exhaustive by default: your code won't compile until you've handled all the cases). Ada 2012 added case- and if-expressions, and now it seems there's some ongoing work on making it a bit more powerful...

This kind of construction makes the power of enumerated types, and restricted range types so much more evident.

StillBored · on Oct 15, 2020

Without a doubt, even Pascal's case statement is better syntactically. It also happens to be more terse.

georgeecollins · on Oct 16, 2020

Sometimes I miss Pascal.

Kranar · on Oct 15, 2020

Also be careful about declaring variables within a case unless you explicitly introduce a block statement.

Honestly, the potential mistakes when using a switch are so numerous that I absolutely agree with you about reflexively avoiding them and preferring the use of an if statement. The optimization benefit no longer holds on modern compilers, so all you're left with is fallthrough to simplify some really complex branching needs.
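A short sketch of the block-per-case habit, using an invented `area` function. Before C23, a declaration placed directly after a case label is a syntax error, and without braces every case shares one scope, so a later label can jump past an earlier initializer.

```c
/* Hypothetical example: shape 0 is a square, shape 1 is a rectangle. */
int area(int shape, int a, int b)
{
    switch (shape) {
    case 0: {               /* braces give `side` its own scope */
        int side = a;
        return side * side;
    }
    case 1: {               /* `w` and `h` exist only in this block */
        int w = a, h = b;
        return w * h;
    }
    default:
        return -1;          /* unknown shape */
    }
}
```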

Cerium · on Oct 15, 2020

At least you get a helpful warning if you try to declare a variable without a block.

dylan604 · on Oct 15, 2020

"In this article, we will discuss how you can leverage switch statements and statement expressions to produce C code that is so difficult to understand, you'll need to look at the assembly to figure out what it does. "

Good gawd, man, how bad is your code that it's easier to read the assembly? I'm not familiar with reading assembly code directly, but in my mind that just reiterates the point the author was making. That's definitely in a special category of bad code.

beagle3 · on Oct 15, 2020

I have inherited C++ "class spaghetti" code for which the disassembled compiled code was, in fact, easier to read than the source code, because the compiler was often able to prove and inline what actually gets called, making the code logical (and relatively easy to follow), whereas the source code was abstract nonsense.

"spaghetti hierarchy", common in C++ and even more so in Java, is in my opinion and experience, much worse than "spaghetti code" of the old Basic/Fortran and very-early-C days -- the old "goto" spaghetti was hard to follow, but at least every goto named a concrete target. In a spaghetti hierarchy, execution jumps every 2 lines (with actual actions sparsely sprinkled among those lines), but to determine where it goes - you have to keep track of which class/subclass every object was actually instantiated in, and what methods that class/subclass overrides.

im3w1l · on Oct 16, 2020

I mean the fact that it sometimes triggers compiler crashes is kind of a hint that the compiler has no idea what to do and just makes it up as it goes along.

ddingus · on Oct 15, 2020

Interestingly, that means just using:

GOTO as in jmp #address

Would be cleaner.

czbond · on Oct 15, 2020

Makes me think someone wanted to use their "CompSci Assembly" class for the first time ever and show they're an 'alpha engineer'.

brianberns · on Oct 15, 2020

> Statement expressions ... allow you to embed a compound statement within an expression. The value returned by the last expression is the value returned by the entire statement expression.

> You might ask "Why would you ever want to do such a thing?"

I just want to mention that expressions are crucial in functional programming, where even an if-statement returns a value. Statements that don't return a value don't make any sense in a pure functional world, because they aren't functions.

So hopefully that answers the "why?" question. (Of course, abusing expressions in switch statements, as in this article, isn't something you can do in FP.)
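For a concrete non-abusive use, here is a sketch of the usual max macro written as a statement expression. This is a GNU C extension (GCC and Clang accept it), not standard C, and `MAX_ONCE` and `max_with_side_effect` are made-up names.

```c
/* GNU extension: the value of ({ ... }) is the value of its last expression. */
#define MAX_ONCE(a, b) ({           \
    __typeof__(a) a_ = (a);         \
    __typeof__(b) b_ = (b);         \
    a_ > b_ ? a_ : b_;              \
})

int max_with_side_effect(void)
{
    int calls = 0;
    /* ++calls is evaluated once; a plain ((a) > (b) ? (a) : (b))
       macro would evaluate it twice. */
    int m = MAX_ONCE(++calls, 0);
    return m * 10 + calls;          /* m == 1, calls == 1 */
}
```

Each argument is copied into a local exactly once, which is the property an ordinary expression macro cannot give you.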

pdonis · on Oct 15, 2020

> expressions are crucial in functional programming, where even an if-statement returns a value

Yes, but in functional programming, expressions can't have side effects, so the problem that the article is discussing in that part doesn't even exist.

Jtsummers · on Oct 15, 2020

Yes they can have side effects. It's Haskell and other "pure functional" languages that reduce or eliminate side effects.

pdonis · on Oct 15, 2020

> Yes they can have side effects.

I'm not saying expressions can't have side effects in particular languages. I'm saying that, by definition, "functional programming" means that expressions can't have side effects; which means that if you are doing "functional programming", then even if the language you are programming in allows expressions to have side effects, you are not making use of that feature, but are writing your expressions to make sure they don't have side effects, since that is what functional programming requires.

ramshorns · on Oct 15, 2020

Are void functions not considered functions in functional programming?

I guess in math they're not. Functions have to have a codomain and map some things into it.

Jtsummers · on Oct 15, 2020

Regarding terminology, this is something I find frustrating in some languages. C functions aren't all functions. Java's methods aren't really all methods on objects (is it really accurate to call a static method, which is called using the class name and not an object instance, a method?). Some languages do make these distinctions, though. Playing around with SPARK/Ada at home lately, it makes a strong distinction between procedures (no return) and functions (have a return value). Procedures look like:

  procedure Put_Line (A : Integer) is
  begin
    Put(A); New_Line;
  end Put_Line;

Functions look like:

  function Square (A : Integer) return Integer is
  begin
    return A * A;
  end Square;

Similarly, in Common Lisp there's a clear distinction between functions and methods (really multimethods) by way of how they're declared. Functions are singular and have no multiple dispatch based on type, whereas you can have many implementations of the same method dispatched on type. Though, again, a function may not really be a function and could be more accurately called a procedure.

However, it probably makes sense for most people to think of everything as functions or everything as methods. At least it doesn't leave them asking "Which syntax do I use to declare this?"

touisteur · on Oct 15, 2020

I almost burned all my Ada books, my Ada-u-akbar t-shirt, my 'in strong typing we trust' Lady Lovelace medallion, when 'they' added the 'in out' access mode to functions in AdaFucking2005. I mean you wait ten years and then add 'that'? Y'all are going to PL HELL for that...

Jtsummers · on Oct 15, 2020

For others, here's [0] the Ada rationale writeup on this. I suppose because I've been trying to learn SPARK more than Ada proper, this hasn't bitten me, since SPARK's version of functions is closer to pure functions. But reading the rationale, I sort of get why they did it. Functions were already impure, but couldn't be marked as explicitly changing their inputs (access types could be altered and you'd never know by the function signature), so changing it to allow explicit `in out` parameters made some sense (as the Ada language has a preference for more explicit rather than implicit behavior). Though that kind of defeats my case since I used Ada as an example, and its functions may as well be procedures.

[0] https://www.adaic.org/resources/add_content/standards/05rat/...

touisteur · on Oct 19, 2020

There's pragma pure and I sure would like it to be enforced and to allow lazy execution sometimes.

As for pointer types being undetectable on call sites I agree that it is annoying, but they could be hidden behind a private or limited private type so I'm not sure there's an easy way out. But I still feel this change was a lazy way to make some border cases 'nicer to look at' or 'easier to code' but not easier to read...

But don't take my word for it, I hate the dot notation with burning hatred... So...

kazinator · on Oct 15, 2020

In Common Lisp, methods are the bits of code which specialize a generic function.

defgeneric most certainly defines a function. The symbol becomes fbound and can be used like function: (mapcar #'mygeneric ....) and so on.

defmethod defines specializations for it. If you write a defmethod without a matching defgeneric, then the generic function gets implicitly defined.

brianberns · on Oct 15, 2020

> Are void functions not considered functions in functional programming?

There's usually a "Void" type that's inhabited by a single value to handle that case. In a void function, all inputs are mapped to that value.

lokedhs · on Oct 16, 2020

Kotlin made this explicit by calling the void type Unit, where it is a class that has a single instance with the same name.

So when you define a function:

    fun foo(a: Int): Unit {
        ...
    }

You can actually assign a value to the result:

    val x = foo(42)  // x now contains the value Unit

There is another type which does not have any instances, called Nothing. Declaring a function to return Nothing indicates that the function will never return. Other than that, Nothing is a regular type, but code that uses it will be unreachable (and flagged as such by the IDE) because you can never create an instance of it.

    fun foo(): Nothing {
        throw SomeException()
    }

This leads me to a question: Unit is obviously the mathematical Unit type, but is there a way to model the Nothing type mathematically?

Tyr42 · on Oct 16, 2020

Yeah, it's called Bottom (⊥)

    fun foo(): ⊥ {
        return foo();
    }

In Rust it's called !, and you can write

    fn will_panic() -> ! {
        panic!("uh oh");
    }

This is actually useful for some embedded systems, where the main function isn't allowed to return, and gets declared with the Nothing / Bottom / ! type.

The compiler is also smart enough to say that an infinite loop has type !, so you can have

    fn main() -> ! {
        let setup = ...;
        loop {
            blink_led();
            sleep(100);
        }
    }

and it'll complain if you put a break statement in the loop.

ramshorns · on Oct 15, 2020

Cool. So `void f (double a, double b)` is like `f : R^2 -> R^0`, where R^0 is a singleton set.

kazinator · on Oct 15, 2020

In C and C++, where this void kludge comes from, the void type is an incomplete type which cannot be completed and contains no values.

Jweb_Guru · on Oct 15, 2020

void is the unit type, which has a single inhabitant, so those functions have a codomain; you can think of the unit type as the set containing the empty set. Sure, you don't explicitly write a return statement in C for `void` functions, but as long as the function still terminates, it can still be thought of as producing a value. Admittedly, not a very useful one, since all having a value of type void tells you is that whatever function produced it completed, but that mostly reflects the fact that `void` functions generally do other non-functional stuff; there's nothing inherently wrong with returning or having such a value, and it can be quite useful for generic code. For example, a map from keys to unit can be used to implement a set "for free."

Even in non-generic, purely functional code, the unit type is still useful--as the domain for a constant function! In most C-like languages, this is of course just represented by a function that takes no arguments, but type theoretically it's equivalent to taking a single void argument (or any number of void arguments, of course, since the Cartesian product of two sets with one inhabitant produces another set with one inhabitant). In strict functional languages that insist that everything has a type, you will often use this encoding explicitly, to implement thunks (call-by-name evaluation).

In partial languages (aka every language you're likely to practically use) or total languages which have the principle of explosion (which covers most of the remaining languages in existence), there is also the bottom type (the empty set) which has no inhabitants; this represents falsity or impossibility, which is computationally meaningful as a (perhaps not terribly informative) type for programs that can't return a value; for instance, nonterminating ones, or ones that always throw an exception. However, since in general you shouldn't be able to produce a closed value of that type, it can usually be freely cast to any other type you want, so many languages lack explicit syntax for it.

That said, it can still be convenient at times to have a way of explicitly talking about the empty type; for example, in Rust (behind a feature flag currently), if you use a sum type where all but one of the variants includes the bottom type `!`, the compiler will recognize that only one variant is possible, and allow you to directly extract data from the inhabited variant (and can optimize out the tag, or at least that's the intent). This is useful when writing generic code that has to implement an API that returns (for instance) a `Result<T, E>`, but your implementation doesn't have an error condition; in such cases, you can set `E = !`.

In total, dependently typed languages with the principle of explosion (which are certainly functional!), the type is also useful for another reason; a function from A to False is the equivalent (constructively anyway) of the negation of A, `~A`. Therefore, False ends up getting used quite a lot in type-level expressions, for the same reason as tests against the empty set occur a lot in set theory; even though strictly speaking you shouldn't be able to produce a closed value of type False (or else you should file a bug), when you're doing proofs by contradiction you can end up with one due to some false assumption in your context (which you can immediately use to prove that anything you wish derives from that context; this can be particularly useful to discharge impossible arms of pattern match expressions). This also happens implicitly in languages that perform flow-sensitive analysis of pattern match expressions; if they detect that one arm never returns (e.g. because it throws an exception or runs a loop that clearly doesn't terminate) they can implicitly assign it the bottom type, which can then be automatically cast to the type of the full expression.

In short: not only are function types with void (and even empty!) domain and codomain functional, they are actually remarkably useful :) In fact, they are so fundamental, that almost all of (standard) dependent type theory can be constructed from just three base types: the empty type, the unit type, and bool! It's a bit unfortunate that two of these three fundamental types don't have direct syntax in a lot of languages, but as you can see this is mostly because they are so ubiquitous that they are largely hidden within other language features.

kazinator · on Oct 15, 2020

> void is the unit type, which has a single inhabitant

Not in C. void is an incomplete type, with an empty domain. There is no value which is an instance of void.

The void type cannot be completed, and so there is no way to instantiate an object of void type. An object of type void cannot be defined or declared. There can be no array of void, nor a structure member of type void.

The void keyword in C and C++ just serves as a syntactic/semantic hack.

Casting an expression to (void) is a gesture which indicates that the value is deliberately being discarded (and is not an actual conversion).

A function declared as returning void returns nothing; and a return statement with a value must not be used in its body: even a value that has been cast to (void) type. Because, remember, that is not actually a conversion to a void type, and the function has no return type.

In ANSI C, (void) was introduced to distinguish the syntax of a declared-empty parameter list from that of an unspecified parameter list. That's a pure syntactic hack. It could easily have been something else, like (static) or (--).

There can be a pointer to void, and that is loaded with yet more hacks. Such a pointer can't be dereferenced or subject to arithmetic (that being possible is a GCC extension).
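A small sketch of those uses in standard C. All function names here are invented for illustration.

```c
#include <stddef.h>

/* (void) in the parameter list: declared to take no arguments. */
int forty_two(void)
{
    return 42;
}

int next_id(int *counter)
{
    return ++*counter;
}

int discard_demo(void)
{
    int c = 0;
    (void)next_id(&c);      /* result deliberately discarded */
    return next_id(&c);     /* second call, so this yields 2 */
}

/* A void pointer must be converted before it can be dereferenced
   or used in arithmetic in standard C. */
int sum_bytes(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bytes[i];
    return sum;
}
```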

Jweb_Guru · on Oct 15, 2020

I enjoy pedantry too, so I guess I kind of understand why you keep making this point. However, it's not relevant to what the OP asked, which was about the mathematical modeling of C functions that return void, rather than trivia related to C syntax. In the two positions in C and C++ that I discussed, the domain and codomain of a function, void is functionally identical to unit, despite your protestations to the contrary (there is literally no semantic distinction between them). In Rust, none of these restrictions exist and we can freely use unit in all the ways you described, but in the domain and codomain of a function it still works just like void does in C (we can't cast from arbitrary types to void, but this is more because Rust takes a hard stance against stupid casts than because it would be difficult to implement, and it has other ways of representing deliberately throwing away a value).

The C standard, of course, may disagree, and claim that there is no value of type "void" and that void functions "have no return type." That I can call and successfully return from a void function, call a function taking void, and even cast a value to void, provides ample evidence (which would also be backed up by reading the standard) that what it means by "type" is not the same thing as what a mathematician means by "type" (and "completed" is a complete red herring here). The fact that you don't explicitly return a void value doesn't really mean anything--no return or return with no value are pure syntactic sugar for returning a void value, which is also how it's implemented in many other languages that enjoy explicit unit types.

What I can't do in C is bind one of the many values of type void to a variable (and do a few other things, like call `sizeof` on the type, or explicitly name the inhabitant of void as a literal, which again have pretty much nothing to do with the semantics of the thing). That is a much weaker restriction, for the same reason that defining a binding as `const` is much weaker than a guarantee that the underlying value isn't mutated; bindings are a largely syntactic artifact and don't affect the mathematical model of a C function in any way. This is especially true for unit types, since having an instance of one is completely uninformative as they both always exist and are all definitionally equal!

Of course, this doesn't apply to void * , which as you point out is really its own bizarre thing that is mostly unrelated to void itself (I'd like to call it a pointer to the top type, but I'm not certain even that would cover its semantics). I think it's fair to say that despite some degree of overlap, void * is an unrelated concept overloading the "void" keyword, and that the actual interpretation of bare "void" is indeed equivalent to unit, not that every other use of `void` in C is completely arbitrary and unprincipled. The reason, AFAIK, why C doesn't just add all those void-related features it's missing is because it has baked in decisions like "every type has a nonzero size" and "arrays are pointers" (conflicting with void * ) that make adding stuff like this after the fact very complicated, not because it wouldn't make sense semantically (in a proper semantic model, where void * was called something like any * and zero sized types were legal, many of these issues would go away, including--I'm pretty sure--all the reasons why C and C++ must insist that void values not be completed). In any case, none of this helps with or is relevant to reasoning about C functions as mathematical objects.

And just to be extra clear--even if C had no keyword at all for `void` functions and didn't have `void` as an option for argument lists, these functions would still be modeled as taking / returning values of type unit.

kazinator · on Oct 15, 2020

> What I can't do in C is bind one of the many values of type void to a variable

The type void has no values. It's similar to the nil type in Lisp (which also has no values), except that it doesn't form the bottom of a type spindle in terms of inheritance: whereas nil is the subtype of every other type including itself, void is no such thing.

The claim that void is a type which holds one element is like saying it's the null type/class of Common Lisp (which has one element, the object nil). That's a different beast.

> The C standard, of course, may disagree, and claim that there is no value of type "void"

Since there is actually such an abstract concept (a type with an empty set of values), the C definition holds water.

I just added Common Lisp to the Wikipedia page: https://en.wikipedia.org/wiki/Unit_type That kind of wrecks some of its claims.

Jweb_Guru · on Oct 15, 2020

The Wikipedia page on unit types is completely accurate with regards to what a unit type is, and a unit type is the correct model for the type that a void function returns in C. Sure, void can be interpreted in a few different ways, but void in function return or argument position definitely cannot be interpreted as the bottom type. As far as I know, C does not really have bottom as a first-class concept, however useful it might be. So while I agree with you that it would be theoretically possible for C's statement about void not having any inhabitants to coincide with the type theoretic definition of void, in practice it does not (for all the reasons I mentioned earlier).

You will note that all the differences mentioned between void and unit in the Wikipedia article are syntactic, not semantic; this is because semantically there is no real distinction between them.

What is bizarre to me, though, is that you didn't read the Wikipedia article you just carelessly edited, which already mentions Common Lisp and explicitly tells you not to confuse the NULL type (which has single inhabitant NIL--i.e. you agree with me that it is the same as void, and they are both unit types) with the NIL type (bottom). Firstly, the existence of a subtyping relationship that lets you upcast values of type unit to values of type symbol (or indeed, any other type) does not somehow make it not a unit type (only subtyping in the other direction would accomplish such a thing), and certainly does not invalidate any of the article's claims as you assert. Secondly, no implementation detail of Common Lisp will affect the definition of a unit type, regardless of how many edits one makes to a Wikipedia page, any more than the C standard claiming void functions don't have a return type changes how they are modeled mathematically. Notions as fundamental as unit types are defined semantically and aren't really tied to specific language syntax.

saagarjha · on Oct 16, 2020

Fun fact, in C++ void(); is a syntactically valid statement.

im3w1l · on Oct 16, 2020

I think the best way to sum up the discussion is this: void in C has no values, but the most straightforward way of modeling void mathematically is a type with one value.

kazinator · on Oct 16, 2020

A type with no values is not any less straightforward.

ISO C already provides a mathematical model, and one in which void is a type with no value.

Neither concept extends well to functions that return multiple values. The proper generalization which covers that case is that a function has an ordered list of return types. A function returning nothing has an empty such list: a zero-length list of return types.

Under this model, we don't require any unit type or bottom type or anything of the sort in connection with functions that don't return anything.

Note that this is similar to a parameter list: an empty return value list is similar to an empty parameter list, in which there are no parameters and hence no parameter types at all.

This model can be applied to single-value return languages like C, if we pretend that functions return multiple values, restricted to the zero or one plurality.

Jweb_Guru · on Oct 19, 2020

For what I hope is the last time, whatever the merits of a type with no values, that does not describe the semantics of void in argument and return position in C. ISO's definition is not relevant at all here, since they are using definitions specific to C rather than type-theoretic ones, and given that you haven't addressed any of the points I've made except to cite various standards bodies, I'm not really sure how else to explain this to you.

You are correct that it is possible to forego unit in exchange for defining vectors of length N as a fundamental type. This is the solution used by Rust, for example, in defining its tuple types, and in Rust indeed the empty tuple and unit are equivalent. Personally, I do not find this simpler than defining unit as a base type and pairing as a fundamental operation on types, since you are effectively just hiding the same cons/nil definition in the definition of natural numbers used for the list length. In any case, unit is not intended to automatically generalize to multiple types, and the rule for product types is a completely orthogonal feature that is useful by itself, so this seems like a bit of an aside. It also seems a bit pointless to introduce a general type like tuples of arbitrary length if you are going to restrict "arbitrary" to 0 or 1 (well, not pointless in some contexts, but in this one it seems strictly more complicated than just having a unit type, since any type you could use with a 1-tuple already has to exist in the first place).

I don't know why you assert that we don't need bottom under this model, however. A list of length zero is not the same as bottom, it is the same as unit (I believe you may have a fundamental misunderstanding about this?). To model bottom in an equivalent fashion, you would want a mechanism for defining sum types with N constructors--0 constructors represents bottom. And just like with the tuple case, this functionality is effectively equivalent to providing the bottom type plus a two-constructor sum type (although depending on the strength of your type theory, a more elaborate construction may be preferred, generally speaking you don't need to be able to define more than two constructors at a time).
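One way to see the "a list of length zero is unit" point, sketched in C++ rather than C (so this illustrates the type-theoretic claim, not C's own wording): `std::tuple<>` is a product of zero types, and it has exactly one value that can be constructed, returned, and compared. The function names are made up for illustration.

```cpp
#include <tuple>

// A function that "returns nothing", modeled as returning the empty product.
// do_nothing is a hypothetical name.
std::tuple<> do_nothing() { return std::tuple<>{}; }

// The empty tuple is a real value: it can be stored and compared,
// and any two values of the type are equal.
bool empty_tuples_are_equal() {
    std::tuple<> a = do_nothing();
    std::tuple<> b{};
    return a == b;
}

static_assert(std::tuple_size<std::tuple<>>::value == 0, "zero components");
```

A type with no values at all would be one for which `do_nothing` could never return.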

Just to reiterate (and close the door on my end of this conversation): bottom and unit are not the same, and your proposed solution does not allow you to represent bottom. It allows you to represent void because void is not semantically bottom, it is semantically unit. I am extremely confident that whatever solution you come up with for representing void, whatever its individual merits, will either be interpretable as unit, or not reflect the actual semantics of C.

kazinator · on Oct 20, 2020

> void in argument and return position in C

void in the argument position in C is just a punctuation that was added by ANSI C to denote a prototyped empty parameter list. This is because () already had the meaning of "unspecified parameter list".

C++ fixed this issue from the beginning by banning the concept of an unspecified parameter list, so initially it had no (void). In C++, () means "empty, fully declared parameter list". C++ added (void) for ANSI C compatibility.

Note that a parameter list like (void, void) or (void, int) and other possibilities is not allowed. It really is just a syntactic hack for that special case and not a parameter type declaration.

The (void) spelling could instead have used some other token. ANSI C could plausibly have chosen, say, the token sequence (!) to distinguish a prototyped empty parameter list from (). The shape is certainly available because (!) is a syntax error. Would ! then denote the unit type, even though it's not a type specifier?

> I don't know why you assert that we don't need bottom under this model, however.

That's nothing compared to the question of where I assert such a thing.

(Still, why would we need to represent a bottom type in describing a language that doesn't have it as a concept? It's handy for "internal use" in the model. Every type system should have a bottom type.)

> your proposed solution does not allow you to represent bottom

We can easily establish the existence of more than one empty type, and stipulate that they are all distinct and incompatible, even though their domain is the same empty set. So if we have used one such type U for a specific purpose which somehow makes it unsuitable as bottom, we can invent another type V very similar to it, and call that one bottom. We endow V (and only V) with the required property that it's the subtype of every type, including itself. Since U is not endowed that way, it's not the bottom type, but U does not hinder the existence of V in any way.

As long as we don't create a conflict in a system, giving rise to a contradiction, we are not prevented from adding anything.

I understand that if the role for U is to represent something like the C void, then it could be a unit type. A unit type will do everything we need in the model. Yet, it's not elegant; the unit type carries a useless value which has no manifestation in the system being modeled.

saagarjha · on Oct 15, 2020

Well, in general you use this to write macros that don't evaluate twice.
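A minimal sketch of that use, assuming GCC or Clang, since statement expressions are a GNU extension and not standard C or C++. The macro and function names are made up for illustration. Each argument of `MAX_ONCE` is evaluated exactly once, so a side effect like `i++` does not repeat.

```cpp
// GNU statement expression: ({ ... }) yields the value of its last expression.
// MAX_ONCE is a hypothetical name; this needs GCC or Clang.
#define MAX_ONCE(a, b)     \
    ({                     \
        auto a_ = (a);     \
        auto b_ = (b);     \
        a_ > b_ ? a_ : b_; \
    })

// The naive macro evaluates whichever argument wins a second time.
#define MAX_TWICE(a, b) ((a) > (b) ? (a) : (b))

// Each helper returns how many times i++ actually ran.
int increments_with_once() {
    int i = 1;
    int m = MAX_ONCE(i++, 0);
    (void)m;
    return i - 1;
}

int increments_with_twice() {
    int i = 1;
    int m = MAX_TWICE(i++, 0);
    (void)m;
    return i - 1;
}
```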

jlebar · on Oct 15, 2020

My favorite (not actually horrible) switch statement trick:

Instead of

  string x;
  switch (y) {
    case 0: x = "foo"; break;
    case 1: x = "bar"; break;
  }

try an IIFE!

  string x = [&] {
    case(y) {
      case 0: return "foo";
      case 1: return "bar";
    }
  }();

Once you get used to reading it, there are a bunch of advantages. Among them, you can't forget a "break", and you can't forget to assign into `x`.
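A compilable sketch of the lambda version, assuming `switch (y)` was intended and that `string` means `std::string`. The `default` branch and the wrapper function `name_for` are additions for illustration, not part of the snippet above.

```cpp
#include <string>

// name_for is a hypothetical wrapper so the snippet can be called.
std::string name_for(int y) {
    // Immediately invoked lambda: every path returns, so x is always
    // initialized and can be declared const.
    const std::string x = [&]() -> std::string {
        switch (y) {
            case 0: return "foo";
            case 1: return "bar";
            default: return "baz";  // assumed fallback value
        }
    }();
    return x;
}
```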

userbinator · on Oct 16, 2020

I don't see how either of those is simpler than

    string x = y ? "foo" : "bar";

or if y can take more values than 0 or 1, then that strongly suggests it should become a table lookup:

    string x = some_strings[y];

usefulcat · on Oct 16, 2020

I think the point is more the general technique, not this particular usage of it.

Regarding the ternary, nested ternary expressions aren't the most readable thing.

Regarding table lookup, sure but only if all values of y are sequential integers.
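For the table-lookup variant, a hedged sketch that only indexes when `y` is in range. The fallback value and the function name `lookup` are assumptions.

```cpp
#include <array>
#include <cstddef>
#include <string>

// lookup is a hypothetical name; some_strings follows the snippet above.
std::string lookup(int y) {
    static const std::array<const char*, 2> some_strings = {"foo", "bar"};
    // Reject negative and too-large indices before touching the table.
    if (y < 0 || static_cast<std::size_t>(y) >= some_strings.size()) {
        return "baz";  // assumed fallback value
    }
    return some_strings[static_cast<std::size_t>(y)];
}
```

This only works directly when the keys are small consecutive integers; sparse keys would need a map or the switch.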

lmilcin · on Oct 15, 2020

Always prefer simple over clever. Both statements do the same thing but only one of them is simple to understand and hard to break for a novice developer (or roughly 3/4th of your team).

KeytarHero · on Oct 15, 2020

In the case of the lambda version, the compiler will fail if you forget a break statement or don't handle a possible case. Plus this way you can make x const, which can also prevent breaking other code further down.

It seems to me the extra safety it provides actually makes it harder for a novice to break your code?

Unless you mean strictly in the sense of "harder to break compilation", but I'm not sure that's a good thing (insert obligatory mention of Rust here)

mpfundstein · on Oct 15, 2020

if your novice dev can't understand the second example, he should be fired on the spot

Jtsummers · on Oct 15, 2020

They shouldn't be fired, they should be taught. And it depends on what languages they already know and which version of C/C++ they were taught. C++'s lambdas looked very strange to me even though I'd worked with C++ code for 15 years, but nothing newer than C++03 (both what I learned in school and due to code I worked on professionally just being that old). It took me a while and finally sitting down with a couple books on "Modern C++" to grok what was going on with that syntax.

lmilcin · on Oct 15, 2020

Oh, wow, what an attitude.

Did you consider that it is possible that everybody can read the second example but may need to spend more time reading it to comprehend if they haven't seen it before and may make mistake interpreting the code?

Did you consider not every company is Intel or Google and there are a lot of companies that can't get "top 1% talent"?

Readability is about making it easy to understand the code, without putting effort into reading.

Readability is important because code is written once and read many times by people who may need to read a lot of code and don't want to spend much time trying to understand every line of it.

kortilla · on Oct 16, 2020

“Novice”

jolmg · on Oct 15, 2020

> you can't forget a "break", and you can't forget to assign into `x`.

Same benefits that you also get with:

  string x
    = y == 0 ? "foo"
    : y == 1 ? "bar"
    : "baz"
    ;

Only, using ?: additionally forces you via syntax to specify what x's value should be when all prior conditions fail. In both your examples, x is undefined when y is neither 0 nor 1.

im3w1l · on Oct 16, 2020

An autoformatter will ruin the way you laid it out and make it look awful.

userbinator · on Oct 16, 2020

Then stop using or pandering to stupid tools.

saagarjha · on Oct 16, 2020

That's a mean way to refer to your coworkers…

jolmg · on Oct 16, 2020

The formatting is beside the point.

KeytarHero · on Oct 15, 2020

This also lets you make x const

dyingkneepad · on Oct 15, 2020

Lambdas are a C++ thing, not C.

If 0 and 1 are the only possible values, you can be much much more elegant:

string x = y ? "bar" : "foo";

Else:

string x = (y == 1) ? "bar" : (y == 0) ? "foo" : "";

karlerss · on Oct 15, 2020

I'm sorry:

x = ["foo", "bar"][y]

saagarjha · on Oct 16, 2020

In C:

  char *x = (char *[]){"foo", "bar"}[y];

Gaelan · on Oct 15, 2020

Wait, what language is this? That doesn't look like C.

Jtsummers · on Oct 15, 2020

C++ lambda. You can basically do this:

  auto succ = [](int i) { return i + 1; };

To make variables in the surrounding scope visible (i.e., create a closure) you have to specify that they're available and how. Grabbing a reference to all variables in scope you could do:

  int x = 0;
  auto inc_x = [&] () { return x++; };

This makes x available via reference so it can be modified. If you just want the value:

  int x = 0;
  auto always_one = [=] () { return x + 1; };

And you can constrain which variables are visible within the lambda:

  int x,y;
  auto foo = [&x] (auto n) { x += n; };

The return type can be determined by the compiler, or they can be made explicit (I chose not to). The first examples all returned integers, the last one has void return type.

dyingkneepad · on Oct 15, 2020

That's a C++ lambda.

dataflow · on Oct 15, 2020

I think you meant switch(y) instead of case(y)?

ncmncm · on Oct 16, 2020

Putty has coroutines coded exactly that way -- one huge file, with a case for each spot that does something that could block. A coworker said, "I love it, and I hate myself for loving it."

Clang, and Gcc up to 8, will turn a switch statement with small numeric alternatives into a bitmask constant and a test against the bits. So,

  switch (c)
    case'a':case'e':
    case'i':case'o':case'u':
      return true;
  return false;

turns into a range check and a "bt" instruction, effectively

   !!((1 << c-'a') & 0x104111)

BUT: Gcc-9 and Gcc-10 both generate, instead, a jump table 168 bytes long. Microbenchmark results notwithstanding, this seems like a radical pessimization.
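To make the transformation concrete, here is a hand-written sketch of the two forms, assuming an ASCII-compatible character set. The function names are made up, and the mask is built from the five letter offsets rather than written as a hex constant. This shows what the optimization computes, not what any particular compiler version emits.

```cpp
// Switch form, as in the snippet above.
bool is_vowel_switch(unsigned char c) {
    switch (c) {
        case 'a': case 'e':
        case 'i': case 'o': case 'u':
            return true;
    }
    return false;
}

// Hand-written equivalent: one range check, then one bit test.
bool is_vowel_mask(unsigned char c) {
    constexpr unsigned mask = (1u << ('a' - 'a')) | (1u << ('e' - 'a')) |
                              (1u << ('i' - 'a')) | (1u << ('o' - 'a')) |
                              (1u << ('u' - 'a'));
    constexpr unsigned last = 'u' - 'a';
    // Unsigned subtraction wraps to a huge value when c < 'a',
    // so a single comparison covers both ends of the range.
    const unsigned idx = static_cast<unsigned>(c) - 'a';
    return idx <= last && ((mask >> idx) & 1u) != 0;
}
```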

pfarnsworth · on Oct 15, 2020

One of the first nasty bugs I had to work on when I first came to Silicon Valley was stack corruption from a fall through of a switch statement. After something like that, you learn pretty quick to always put break lines at the end of the switch before anything else.

cjfd · on Oct 15, 2020

And then one puts in one too many in a case where fall through actually was the idea....

panda88888 · on Oct 15, 2020

I try to be explicit and place a comment saying fall through if it is the intended behavior. This helps to inform the next person reading the code.

acheron · on Oct 15, 2020

So there was a classic bug in the game NetHack where there was indeed a comment indicating there should be a fall through. Then somebody added a new case, and the fall through now went to the wrong place. But since the comment said it was supposed to fall through, nobody reported the bug.

https://nethackwiki.com/wiki/Yeenoghu#.22A_ludicrous_bug.22

climb_stealth · on Oct 16, 2020

There are linters that raise an error when there is a fallthrough without an explicit comment [0]. I feel like that should be a must for working with C code.

[0] I don't remember which one it actually was. Possibly something commercial for MISRA compliance.

cjfd · on Oct 16, 2020

Actually in C++ there is a fallthrough attribute in the language nowadays. https://en.cppreference.com/w/cpp/language/attributes/fallth...
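A short sketch of that attribute, assuming a C++17 compiler. The function is a made-up example; with warnings such as GCC's `-Wimplicit-fallthrough` enabled, an unmarked fall through gets flagged while the marked ones do not.

```cpp
// count_steps is a hypothetical example: higher levels run the
// lower levels' steps too, so falling through is intended.
int count_steps(int level) {
    int steps = 0;
    switch (level) {
        case 2:
            steps += 1;
            [[fallthrough]];  // intentional: continue into case 1
        case 1:
            steps += 1;
            [[fallthrough]];  // intentional: continue into case 0
        case 0:
            steps += 1;
            break;
        default:
            break;
    }
    return steps;
}
```

Unlike a comment, the attribute must sit directly before the next case label, so it cannot silently end up pointing at the wrong place when a case is inserted.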

thaliaarchi · on Oct 15, 2020

This same blog has a post on the idiosyncrasies of the C preprocessor. Reminds me of several convoluted macros I assembled to stress test a C/C++ static analyzer at my last job. My favorite that I wrote was the following, combining line continuations, trigraphs, digraphs, universal character names, and block comments. Particularly insidious is the line continuation between / and * in the block comment. It is equivalent to: #define 🇩🇪() "de"

   ??/
  %: \
  /??/
  */*\
  **/\
  def\
  ine\
   \U\
  0??/
  001\
  F??/
  1E9\
  ??/\
  U00\
  01F\
  1EA\
  (??/
  ) "\
  de"\

This example also exposes undefined behavior from the C++ standard: "if a splice results in a character sequence that matches the syntax of a universal-character-name, the behavior is undefined" [0].

[0]: https://eel.is/c++draft/lex.phases#1.2

saagarjha · on Oct 15, 2020

The best part is the syntax highlighter struggling to make sense of the code :P

dylan604 · on Oct 15, 2020

"I understand all of the words, but I am unfamiliar with the sentence structure" but in code.

ramshorns · on Oct 15, 2020

Surely it's because statement expressions aren't part of the standard. But maybe something like GNU source-highlight can handle the extensions.

qw3rty01 · on Oct 15, 2020

Is the uninitialized read example a compiler bug? `i` in the code is definitely initialized, but the compiler creates a temporary variable, where its initialization is bypassed by the switch jump. Isn't that a code generation issue (specifically where it places the jump label)?

LorenPechtel · on Oct 15, 2020

This just reinforces my opinion: Programs, like ships, sink in the C.

klyrs · on Oct 16, 2020

There are two kinds of C programmers.

  - folks who haven't seen Duff's device
  - folks who understand Duff's device

Seriously, I was waiting for this article to do something evil but... maybe I'm a horrible person but none of the examples were too bad.

a1369209993 · on Oct 16, 2020

Well, the ones involving gotoing into halfway through the evaluation of an expression were bad in that they don't work, or appear to work due to coincidence but will break if slightly perturbed, but they're not hard to understand.

klyrs · on Oct 17, 2020

To be fair, that's gcc, not C.

a1369209993 · on Oct 17, 2020

No, that's clang/llvm; gcc handles this correctly (at least by the extremely low standards of gcc/llvm handling of undefined behaviour):

  $ gcc test.c
  test.c:3:5: error: jump into statement expression

ChrisMarshallNY · on Oct 15, 2020

I remember writing code like that.

Basically, I used a switch statement as a goto.

I must deeply and profoundly apologize. It will never happen (by me) again.

luord · on Oct 15, 2020

As if I needed more reason to believe that C should absolutely not be the first language taught in CS programs (sadly my alma mater still disagrees). I think I'm gonna have nightmares.

Of course, C isn't taught like this over there, but the point is that C gives you way too much rope to hang yourself with, and that's if you're an experienced developer, let alone a college freshman. Sure, the argument can be made that screwing up as a freshman helps you learn in a way that doesn't compromise your career, but I believe that quite a few of my classmates wouldn't have given up on programming altogether had they not been thrown at C right away. I myself didn't learn to love programming until I tried less spartan languages.

quantumet · on Oct 15, 2020

My favorite "extremely compact C" style:

   if (*len * ("11124811248484"[*type < 14 ? *type:0]-'0') > 4) { ... }

because naming lookup tables is clearly too verbose. Among other interesting decisions.
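For readers unpacking that line: indexing the string literal picks a digit character, and subtracting `'0'` turns it into a number. A sketch of the compact form next to a named-table equivalent follows; both function names and `kSizes` are made up, and what the `type` values stand for is not stated in the comment.

```cpp
#include <array>
#include <cstddef>

// The compact form, wrapped in a function. A negative type would index
// out of bounds here, exactly as in the original expression.
int size_compact(int type) {
    return "11124811248484"[type < 14 ? type : 0] - '0';
}

// Same values with a named table (kSizes and size_named are hypothetical).
int size_named(int type) {
    static constexpr std::array<int, 14> kSizes = {1, 1, 1, 2, 4, 8, 1,
                                                   1, 2, 4, 8, 4, 8, 4};
    const bool in_range = type >= 0 && type < 14;
    // Out-of-range types map to entry 0, like the compact form does
    // for types of 14 and above.
    return kSizes[static_cast<std::size_t>(in_range ? type : 0)];
}
```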

userbinator · on Oct 16, 2020

I pretty much understood that line right away... it's not that different from the code I'm used to reading and writing (embedded, low-level stuff), which just shows that even within C alone there can be a huge range of style from APL-terse to mind-numbingly-Enterprise-Java verbose.

TYMorningCoffee · on Oct 15, 2020

How do switches simulate coroutines? I'm reading thru the article they linked https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

But don't follow how they achieve independent stacks so a caller can continue where it left off in the callee.

masklinn · on Oct 15, 2020

> But don't follow how they achieve independent stacks so a caller can continue where it left off in the callee.

They're "stackless" coroutines: you can only "yield" at the top level (where the switch can resume), and each coroutine function can only animate one coroutine because the state is global (it's a static).
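A minimal sketch of the trick in the style of the linked article, with made-up names. The `case 1:` label sits inside the loop body, so the next call jumps straight back to the point after the "yield". Both the resume point and the loop counter are static, which is why one function can drive only one coroutine.

```cpp
// next_value is a hypothetical example: returns 0, 1, 2 on successive
// calls, then -1 on every call after that.
int next_value() {
    static int state = 0;  // where to resume on the next call
    static int i = 0;      // must be static to survive across calls

    switch (state) {
        case 0:
            for (i = 0; i < 3; i++) {
                state = 1;
                return i;  // "yield" the current value
        case 1:;           // the next call resumes here, inside the loop
            }
    }
    state = 2;  // finished: no case matches, so later calls skip the switch
    return -1;
}
```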

swiley · on Oct 15, 2020

There's no way to actually create something like call/cc in C so the coroutines can't yield while in a subroutine. It's more like syntactic sugar for for(;;)switch(task){...}

simias · on Oct 15, 2020

The mednafen PSX emulator uses this trick to implement some of the modules (the MDEC and SPU use it IIRC). I always found it hard to follow and personally prefer simpler, more verbose code with explicit state management.

xaedes · on Oct 15, 2020

It is using static variables for coroutine state.

busfahrer · on Oct 15, 2020

This article reminded me of 10+ years ago when I was browsing some MediaWiki code and came across "do { ... } while (false)", which had me flummoxed for a bit until I remembered that PHP has no goto statement.

fake edit: I just looked it up and it seems they added it to the language in 2009.
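The same idiom, sketched in C++ rather than PHP with a made-up function: each `break` acts as a forward jump to the single exit path after the loop, which is what a `goto cleanup` would otherwise do.

```cpp
// check_order is a hypothetical example. Returns 0 on success,
// or the number of the check that failed.
int check_order(int quantity, int stock) {
    int error = 0;
    do {
        if (quantity <= 0)    { error = 1; break; }  // jump to the exit path
        if (quantity > stock) { error = 2; break; }
        // ... the real work would go here ...
    } while (false);  // the body runs exactly once

    // Shared exit path, reached normally or from any break above.
    return error;
}
```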

aasasd · on Oct 16, 2020

I've read the first example and the following explanation, and now I'm conflicted between idle desire to know more about programming-related curiosities, and the foreboding of having this atrocity in my head afterwards. It's like seeing gore on Reddit.

spaetzleesser · on Oct 16, 2020

Usually when I am planning on getting fired I get drunk every day and make inappropriate comments to my colleagues :)

But yes, C allows you to do crazy stuff. Not that you should.

glitchc · on Oct 16, 2020

Oh I loved this! Thank you for sharing it.

anonymousiam · on Oct 15, 2020

Step 1: Slip the code fragment that puts the compiler into an infinite loop into a Git commit for a large collaborative project.

Step 2: Watch the fun.

Step 3: Profit?

TheDong · on Oct 16, 2020

A git bisect script + 'timeout' would let you find the specific commit in a few minutes.

Not to mention CI should prevent that from merging in the first place.

Kuraj · on Oct 15, 2020

At risk of being downvoted for not bringing much to the discussion, but holy shit.