
> syntax = "proto2" uses explicit presence by default

> syntax = "proto3" used implicit presence by default (where cases 2 and 3 cannot be distinguished and are both represented by an empty string), but was later extended to allow opting into explicit presence with the optional keyword

> edition = "2023", the successor to both proto2 and proto3, uses explicit presence by default

The root of the problem seems to be Go's zero values. It's like putting makeup on a pig: you get rid of null panics, but the null-ish values are still everywhere, and you just have bad data creeping into every last corner of your code. There is no amount of validation that can fix the lack of decoding errors. And it's not runtime errors instead of compile-time errors, which can be kept in check with unit tests to some degree. It's just bad data and defaulting to carry on no matter what, like PHP back in the day.



> It's like putting makeup on a pig: you get rid of null panics

How so? In Go, nil is the zero value for a pointer and is ripe for panic just like null. Zero values do not avoid that problem at all, nor do they intend to.


I'll give you that nil is a fine default for pointers and pointer-like things (interfaces, maps, slices). It's mostly fine to use the empty string. However, 0 has semantic meaning for just about every serialized numeric type I've ever encountered. The zero value also does really poorly for PUT-style APIs: "did the user forget to send this, or did they mean to set this field to the empty string" is very poorly expressed in Go and often has footguns around adding new fields.


In Go, you would use additional bits to identify whether the zero value is valid/present. You would not use the zero value of an integer, or other type, alone, unless the zero value truly has special meaning and can stand on its own; an uncommon case for integers, as you point out.

Unfortunately, there is no escaping the extra bits. Not a terribly big deal for a large, powerful machine with lots of memory, but might be a big deal over constrained networks. Presumably that is why proto3 tried to save on the amount of data being transferred. It adds up at Google scale. But it did eventually walk back on that idea, making the extra data opt-in.
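A minimal sketch of the "additional bits" idea in plain Go, using a pointer field as the presence marker when decoding JSON. The `Update` type, its fields, and the helper names are hypothetical and only for illustration; this is not how any particular protobuf runtime does it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Update is a hypothetical PUT-style payload.
// Name is a plain string, so "absent" and "" look the same.
// Age is a pointer, so nil can mean "not sent".
type Update struct {
	Name string `json:"name"`
	Age  *int   `json:"age"`
}

// decode parses a JSON body into an Update.
func decode(body string) (Update, error) {
	var u Update
	err := json.Unmarshal([]byte(body), &u)
	return u, err
}

// describeAge reports whether the age field carried a value.
func describeAge(u Update) string {
	if u.Age == nil {
		return "age absent"
	}
	return fmt.Sprintf("age set to %d", *u.Age)
}

func main() {
	for _, body := range []string{`{}`, `{"age":0}`, `{"name":""}`} {
		u, err := decode(body)
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%-12s name=%q %s\n", body, u.Name, describeAge(u))
	}
}
```

Note that an explicit JSON `null` also decodes to a nil pointer here, so this only separates "zero" from "missing or null". The `Name` field shows the other side: `{}` and `{"name":""}` decode to the same value.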


I'm not convinced at all that data size is the reason for any of this. You would also be sending more data for any type that is not a primitive, i.e. for any type that is bigger than the sentinel null value, you would have to send the full-blown struct.


As is often the case, Go's designed anachronism creates more problems than it solves: had Go had a modern, expressive type system, rather than staying with one from the 70s, this problem would never have existed.


That's just that they picked a worse case of zero value for slices and maps, presumably for performance gains.


The slice type is an implicit struct, in the shape:

    struct {
        data uintptr
        len  int
        cap  int
    }

Which is usable when the underlying memory is set to zero. So its zero value is really an empty slice. Most languages seem to have settled on empty slice, array, etc. as the initialized state just the same. I find it interesting you consider that the worst case.

Maps are similar, but have internal data structures that require initialization, thus cannot be reliably used when zeroed. This is probably not so much a performance optimization as convention. You see similar instances in the standard library. For example:

    var file os.File
    file.Read([]byte{}) // panics; file must be initialized first.


> Maps are similar, but have internal data structures that require initialization, thus cannot be reliably used when zeroed.

It's a performance thing: the map structure requires a heap allocation even with zero elements. This is because the map structure is type-generic without being fully monomorphized.


The heap allocation could transparently happen at use. The only additional performance overhead that would bring is the conditional around if it has already been initialized or not, which is negligible for modern computers. Go is not obsessed with eking every last bit of performance anyway.

But, that is not the convention. It is well understood that if a struct needs allocations that you must first call a constructor function (e.g. package.NewFoo). It is reasonable to expect that map would stick to the same convention.
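For what it's worth, the "allocate transparently at use" idea can be sketched in user code. `Counter` below is a hypothetical wrapper, not anything from the standard library; it only shows that the cost is one nil check per write.

```go
package main

import "fmt"

// Counter is a hypothetical type whose zero value is ready to use:
// the map is allocated on first write instead of in a constructor.
type Counter struct {
	counts map[string]int
}

// Inc lazily allocates the map, then increments the count for key.
func (c *Counter) Inc(key string) {
	if c.counts == nil { // the one extra branch per write
		c.counts = make(map[string]int)
	}
	c.counts[key]++
}

// Get needs no check: reading a nil map yields the zero value.
func (c *Counter) Get(key string) int {
	return c.counts[key]
}

func main() {
	var c Counter // no NewCounter call needed
	c.Inc("a")
	c.Inc("a")
	fmt.Println(c.Get("a"), c.Get("b")) // prints "2 0"
}
```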


It should be noted that slices and maps are completely opposite ends of how they behave in relation to nil in Go. A nil slice is just an empty slice, there is no operation you could do with one that will fail if done with the other. In contrast, a nil map doesn't support any operation whatsoever, it will panic on doing anything with it.


You're quite mistaken. You can use len() on both nil maps and slices, and it will return zero (as with empty ones). Panic occurs on both nil assignments. But then, access only panics on nil slices - nil maps produce the zero value. It's horrible.

https://go.dev/play/p/2KFOoJ0oyWB


Access on an empty slice would also panic because of out of bounds. So there’s no useful distinction there.

However, nil slices work just fine when calling append on them, while writing to a nil map in any way will panic.


Right, forgot that len() works on nil maps, and I was really not aware that reading from a nil map is not an error, that's crazy.

For the nil slice though what I said remains true: a nil slice is the same thing as an empty slice. Of course reading from or writing to its first element panics, given that it doesn't have any elements. The same would have happened if you had initialized it as `var ns []int = make([]int, 0)` or `ns := []int{}`.
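To pin down the behaviors discussed in this subthread, here is a small self-contained sketch (the helper names are made up). It takes the slice or map as a parameter so the same function can be tried with nil and non-nil values.

```go
package main

import "fmt"

// appendToSlice appends 1 to s; this works for nil and empty slices alike.
func appendToSlice(s []int) []int {
	return append(s, 1)
}

// readMap looks up key in m; on a nil map this returns the zero value.
func readMap(m map[string]int, key string) (int, bool) {
	v, ok := m[key]
	return v, ok
}

// writeMap assigns into m and reports whether the assignment panicked.
func writeMap(m map[string]int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m["a"] = 1
	return false
}

func main() {
	var s []int
	var m map[string]int

	fmt.Println(len(s), len(m))         // prints "0 0"
	fmt.Println(appendToSlice(s))       // prints "[1]"
	fmt.Println(readMap(m, "a"))        // prints "0 false"
	fmt.Println(writeMap(m))            // prints "true"
	fmt.Println(writeMap(map[string]int{})) // prints "false"
}
```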


Ah now I get it, yes you're right, they behave the same.

Ironically, just today I read that in the next release they will add some JSON annotation to distinguish nil slices (omitzero): https://tip.golang.org/doc/go1.24
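One place where nil and empty slices already differ observably is `encoding/json` output. The sketch below deliberately does not use the `omitzero` option mentioned above, so it does not depend on Go 1.24; `encode` is just an illustrative helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode marshals a slice of ints and returns the JSON text.
func encode(s []int) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var nilSlice []int
	emptySlice := []int{}

	fmt.Println(encode(nilSlice))   // prints "null"
	fmt.Println(encode(emptySlice)) // prints "[]"
}
```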


I don't think the reason for zero values has anything to do with "avoiding null panics". If you want to inline the types, that is avoid using most of your runtime on pointer chasing, you can't universally encode a null value. If I'm unclear, ask yourself: What would a null int look like?

If what you wanted was to avoid null-panics, you can define the elementary operations on null. Generally null has always been defined as aggressively erroring, but there's nothing stopping a language definition from defining propagation rules like for float NaN.


Sorry, I don't follow you. If you don't have zero values, you either have nulls and panics, or you have some kind of sum type à la Option<T> and cannot possibly construct null or zero-ish values.

Is there a way to have your cake and eat it too, and are there real world examples of it?


You're thinking in abstract terms; I'm talking about the concrete implementation details. Take C, just as an example: an int can never be NULL. It can be 0, and compilers will sometimes tell you it's "uninitialized", but it can never be NULL. All possible combinations of bit patterns are meaningfully int.

Pointers are different in that we've decided that the pattern where all bits are 0 is a value that indicates that it's not valid. Note that there's nothing in the definition of the underlying hardware that required this. 0 is an address just like any other, and we could have decided to just have all pointers mean the same thing, but we didn't.

NULL is just a language construct, and as a language construct it could be defined in any way you want. You could define your language such that dereferencing NULL would always return 0. You could decide that doing pointer arithmetic with NULL would yield another NULL. Once you realize that it's just language semantics and not fundamental computer science, you realize that the definition is arbitrary, and any other definition would do.

As for sum types: you can't fundamentally encode any more information into an int. It's already completely saturated. What a sum type does, at a fundamental level, is bundle your int (which has a default value) with a boolean (which also has a default value) indicating whether your int is valid. There are some optimizations you can do with a "sufficiently smart compiler", but like auto-vectorization, that's never going to happen.

I guess my point can be boiled down to the dual of the old C++ adage. Resource Allocation is NOT initialization. RAINI.


Then your point is tangential to the question of zero values, and even more so to the abstract concept of zero values spilling over into protobuf.


No, there isn't. It is just other versions of the same problem with people pretending it is somehow different.

People generally like to complain about NULL/nil whatever, but they rarely think about what the alternatives mean and what arrangements are completely equivalent. No matter what you do, you have to put some thought into design. Languages can't do the design work for programmers.


There is a way to have your cake and eat it too: rust.

In rust, you have:

    let s = S{foo: 42, ..Default::default()};

You just got all the remaining fields of 'S' set to "zero-ish" values, and there's no NPEs.

The way you do this is by having types opt in to it, since zero values only make sense in some contexts.

In go, the way to figure out if a type has a meaningful zero value is to read the docs. Every type has a zero value, but a lot of them just nil-pointer-exception or do something completely nonsensical if you try to use them.

In rust, at compiletime you can know if something implements default or not, and so you can know if there's a sensible zero value, and you can construct it.

Go doesn't give you your cake, it gives you doc comments saying "the zero value is safe to use" and "the zero value will cause unspecified behavior, please don't do it", which is clearly not _better_.


> There is a way to have your cake and eat it too: rust.

Suppose my cake is that I have a struct A which holds a value, that doesn't have a default value, from your library B. Suppose that at the time I want to allocate A I don't yet have the information I need to initialize B, but I also know that I won't need B before I do have that information and can initialize it. In simple terms: I want to allocate A, which requires allocating B, but I don't want to initialize B yet.

What do I do?

If your answer involves Option<B> then you're asking me to grow my struct for no gain. That is clearly not _better_.


Doesn't Rust have explicit support for uninitialized memory, using the borrow checker to make sure you don't access it before initializing it? Or does that just work for local variables, not members of structs?


You can’t do the “declare before initializing” thing with structs, that’s correct.


Then you can't eat it too (or else you'll get very sick with NPEs/panics), sorry.


More specifically, it could result in undefined behavior, if a panic happens between the allocation and initialization (i.e., it was allocated, not initialized, panicked, and something observed the incomplete struct after the panic). Alternatively, the allocation would always have to leak on panic, or the struct would have to be deallocated without a destructor running.


I agree that rust, with Option and Default, is the only right choice - at least from what I've tried. Elm for example has Option but nothing like Default, so sometimes it's tedious that you have to repeat a lot of handmade defaults, or you're forced to use constructor functions everywhere. But at least the program is correct!

Go is like PHP in regards to pushing errors forward. You simply cannot validate everything at every step. Decoding with invariants is the right alternative.


What is with Rust evangelicals shitting up Go posts? Shut up and go away! Go talk to other Rust users about it if you love it so much!

it's for different things!

the things I build in Go simply do not need to be robust in the way Rust requires everything to be, and it would be much more effort to use Rust in those problem domains

Is Go a more crude language? maybe! but it lets me GET SHIT DONE and in this case worse really is better.

All I know is that I've spent less time over the last ten years writing Go dealing with NPEs than I have listening to Rust users complaining about them!

if you love Rust so much, YOU use it then! We like Go, in threads about Go. I might like Rust too, in the same way I like my bicycle and my car, if only the cyclists would shut up about how superior their choices are


> or you have some kind of sum type à la Option<T> and cannot possibly construct null or zero-ish values.

Option types specifically allow defaulting (to none) even if the wrapped value is not default-able.

You can very much construct null or zero-ish values in such a language, but it's not universal; types have to be opted into this capability.


Exactly my point, you have to opt-in, and in practice you only do precisely where it's actually necessary. Which is completely different than "every single type can be a [null | zero value]". You cannot possibly construct some type A (that is not Option<A> or A@nullable or whatever) without populating it correctly.

Of course you need some way to represent "absence of a value", the matter is how: simple but incorrect, or complex but correct. And, simple/complex here can mean both the language (so performance tradeoff), and (initial) programmer ergonomics.

That's why I ask if you can have your cake and eat it too, the answer is no. Or you'll get sick sooner than later, in this case.


> You cannot possibly construct some type A (that is not Option<A> or A@nullable or whatever) without populating it correctly.

Except you can. The language runtime is clearly doing it when it stores [None|Some(x)] inline in a fixed size struct.


There is no way to store None | Some(x) in sizeof(x) bytes, for simple information theory reasons. What you can do is store between 1 and 8 optional fields with only 1 byte of overhead, by using a single bit field to indicate which of the optional fields is set or not (since no commonly used processor supports bit-level addressing, storing 1 extra bit still needs an entire extra byte, so the other 7 bits in that byte are "free").
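A rough sketch of that shared presence byte, with made-up names (`Record`, `fieldA`, `fieldB`): one `uint8` holds up to eight "is set" bits, and each accessor returns the value together with its bit.

```go
package main

import "fmt"

// Record is a hypothetical struct whose optional fields share one
// presence byte: bit i set means field i holds a real value.
type Record struct {
	present uint8
	a, b    int32
}

const (
	fieldA uint8 = 1 << iota // bit 0
	fieldB                   // bit 1
)

// SetA stores v and marks field a as present.
func (r *Record) SetA(v int32) { r.a = v; r.present |= fieldA }

// SetB stores v and marks field b as present.
func (r *Record) SetB(v int32) { r.b = v; r.present |= fieldB }

// A returns field a and whether it was ever set.
func (r *Record) A() (int32, bool) { return r.a, r.present&fieldA != 0 }

// B returns field b and whether it was ever set.
func (r *Record) B() (int32, bool) { return r.b, r.present&fieldB != 0 }

func main() {
	var r Record
	r.SetA(0) // an explicit zero is still "present"

	fmt.Println(r.A()) // prints "0 true"
	fmt.Println(r.B()) // prints "0 false"
}
```

In memory, alignment padding may make the real overhead larger than one byte; the one-byte figure applies most cleanly to a packed wire encoding.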


> There is no way to store None | Some(x) in sizeof(x) bytes

That's subtly incorrect. Almost all languages with NULLs in fact already do this, including C. On my machine sizeof(void*)=8, and pointers can in fact express Some(x)|None. The cost of that None is neither a bit nor a byte; it's a single value. A singular bit pattern.

See, the None that you talk about is grafted on. It wraps the original without interfacing with it. It extends the state by saying "whatever value this thing has, it's invalid". That's super wasteful. Instead of adding a single state, you've exploded the state space exponentially (in the literal sense).


I should have made that caveat: if X doesn't need all of the bits that it has, then yes, you can do this. But there is no way to know that this is the case for a generic type parameter, you can only do this if you know the semantics of the specific type you are dealing with, like the language does for pointer types.

I should also point out that in languages which support both, Option(int*) is a valid construct, and Some(nullptr) is thus not the same thing as None. There could even be valid reasons for needing to do this, such as distinguishing between the JSON objects {} and {"abc": null}. So you can't even use knowledge of built-in types to special-case your generic Option implementation. Even Option(Option(bool)) should be able to represent at least 4 distinct states, not 3: None, Some(None), Some(Some(true)), Some(Some(false)).
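The `{}` versus `{"abc": null}` case can be observed in Go by decoding into a map, where "key absent" and "key present with null" stay distinct. `lookup` is an illustrative helper, not a library function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// lookup decodes a JSON object and reports, for the given key,
// whether the key was present and whether its value was null.
func lookup(body, key string) (present, isNull bool, err error) {
	var obj map[string]any
	if err = json.Unmarshal([]byte(body), &obj); err != nil {
		return false, false, err
	}
	v, ok := obj[key]
	return ok, ok && v == nil, nil
}

func main() {
	for _, body := range []string{`{}`, `{"abc": null}`, `{"abc": 1}`} {
		present, isNull, err := lookup(body, "abc")
		if err != nil {
			fmt.Println("decode error:", err)
			continue
		}
		fmt.Printf("%-14s present=%t null=%t\n", body, present, isNull)
	}
}
```

The three inputs map onto None, Some(null), and Some(value): the map lookup's `ok` plays the role of the outer option, and the nil check plays the inner one.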


From what I remember, proto3 behavior happened to map to Objective-C since iOS maps coincidentally happened at around the same time so they could be loud.

It was partially reverted with proto3 optional and fully reverted finally. Go's implementation happened to come around the same time as proto3 so allowed struct access, despite behaving quite differently when accessing nil fields. That is also finally reverted. Hopefully more lessons already learned from the Java days will come sooner than later going forward...


Yes, as much as I love Go and love working with it every day, the way Go handles zero values has been a design issue that comes up again and again and again.



