Yes, that's C's biggest mistake. (But remember, they had to cram the compiler into a 16-bit machine.) No, "fat pointers" are not a backwards-compatible solution. They've been tried. They were a feature of GCC at one time, used by almost nobody.
I once had a proposal on this. See [1]. Enough people looked it over to find errors; this is version 3. The consensus is that it would work technically but not politically.
The basic idea is that the programmer knows how big the array is; they just don't have a way to tell the compiler what expression defines the length of the array. Instead of
int read(int fd, char buf[], size_t n);
you write
int read(int n; int fd, char (&buf)[n], size_t n);
It generates the same calling sequence. Arrays are still passed as plain pointers. But the compiler now knows how big "buf" is, both on the caller and callee side, and can check.
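For comparison, here is roughly how close standard C99 already gets, as a minimal sketch. It relies on variably modified parameters, which only work when the length is declared before the array that uses it, so the argument order differs from the UNIX idiom. `fill_buf` is a hypothetical helper invented for illustration, not something from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* C99 variably modified parameter: n must be declared before buf.
 * buf is still adjusted to a plain `char *`, so the calling sequence
 * is the same as for `char buf[]`. The [n] documents the intended
 * length; the standard does not require the compiler to check it.
 * fill_buf is a hypothetical helper, used only for illustration. */
size_t fill_buf(size_t n, char buf[n], char value)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = value;   /* every index stays below the declared length */
    return n;
}
```

A caller would write something like `char b[16]; fill_buf(sizeof b, b, 0);`. This assumes a compiler with VLA support, which C11 made optional.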
I also proposed adding slice syntax to C, so, when you want to talk about part of an array, you do it as a slice, not via pointer arithmetic.
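To make "slice, not pointer arithmetic" concrete, here is a library-level sketch in today's C. It is only an approximation: carrying a pointer and a length in a struct is closer to a fat pointer than to the proposed syntax, which is not reproduced here. `char_slice`, `slice_of`, and `subslice` are hypothetical names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice type: a pointer plus the element count it covers. */
typedef struct {
    char  *ptr;
    size_t len;
} char_slice;

/* View a whole array of known length as a slice. */
char_slice slice_of(char *base, size_t len)
{
    char_slice s = { base, len };
    return s;
}

/* Take the sub-range [start, end) of a slice.
 * Returns false and leaves *out untouched if the range
 * would step outside the original slice. */
bool subslice(char_slice in, size_t start, size_t end, char_slice *out)
{
    if (start > end || end > in.len)
        return false;
    out->ptr = in.ptr + start;
    out->len = end - start;
    return true;
}
```

The point of the sketch is that the range check happens once, where the slice is created, instead of being implicit in every `p + k` expression scattered through the code.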
The key idea here is that you can call old code from new ("strict") code, and strict code from old code. Once all the code is strict, every subscript error should be checkable.
I suspect that the reason your idea was not adopted was the syntax. It's not a phat pointer, it's two arguments with some rather complex syntax to connect the two.
The reason I'm fairly confident of that assessment is that I've had similar experiences with D when the syntax for something was too complex. Early on, the syntax for lambdas was rather clunky. Everyone either hated it or insisted that D didn't even have lambdas. Greatly simplifying the syntax was a revelation: suddenly D had lambdas, and they were used everywhere.
int read(int n; int fd, char (&buf)[n], size_t n);
is a bit bulky. The initial "int n;" is a little-used GCC extension. Allowing
int read(int fd, char (&buf)[n], size_t n);
is an option. Here "n" is used before it is declared, which is strange for C. This is only a problem because of the UNIX idiom that the buffer pointer comes before the size in most system calls.
char (&buf)[n]
is also a bit bulky, but that, too, is forced by C/C++ tradition.
char &buf[n]
would be an array of refs, and
char buf[n]
would be an array passed by copy.
There have been many, many attempts to "fix" C in a non-backwards compatible way. The result is always a new language. It's the backwards compatibility that's hard.
How about eschewing passing by value altogether? If someone wants that, they can memcpy() the array inside the function with no extra syntax. So, the following would mean a fat pointer:
int read(int fd, char buf[n], size_t n);
The following guarantees your array is not modified (but it's still passed by pointer).
int write(int fd, const char buf[n], size_t n);
The following emulates passing by value:
void foo(const int buf[n], size_t n) {
int tmp[n];
memcpy(tmp, buf, n * sizeof tmp[0]); // destination first; size is in bytes
}
Alternatively:
void foo(const int buf[n], size_t n) {
int tmp[n];
arrcpy(buf, tmp); // may or may not check bounds
}
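For reference, roughly the same copy-then-work pattern compiles as C99 today if the size is moved ahead of the array. This is a sketch under that assumption. `sum_after_doubling` is a hypothetical function made up to show that the caller's array is left untouched.

```c
#include <stddef.h>
#include <string.h>

/* Emulated pass-by-value: work on a private copy of the caller's array.
 * sum_after_doubling is a hypothetical example, not from any library. */
long sum_after_doubling(size_t n, const int buf[n])
{
    if (n == 0)
        return 0;                 /* a VLA must have a positive length */

    int tmp[n];                   /* local copy, lives on the stack */
    memcpy(tmp, buf, sizeof tmp); /* sizeof a VLA is n * sizeof(int) */

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        tmp[i] *= 2;              /* modifies the copy only */
        sum += tmp[i];
    }
    return sum;
}
```

Since `tmp` lives on the stack, this only suits arrays that are known to be small. A large `n` would call for heap allocation instead.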
Maybe this would make the proposal less useful, but it would already help. Here, for instance, is authenticated encryption from Monocypher, my crypto library:
It is not crystal clear that `text_size` refers to the size of both the `plaintext` and the `cipher_text`. With something like your proposal, I could write this instead:
That way, the size of each buffer is crystal clear. Bonus: a sanitizer can check that I don't overflow my bounds (and I love sanitizers for stuff as sensitive as a crypto library).
> I also proposed adding slice syntax to C, so, when you want to talk about part of an array, you do it as a slice, not via pointer arithmetic.
I strongly disagree with this. One of the advantages of conflating pointers with arrays is an obvious and very consistent way of indexing and slicing across the entire language, with minimal syntactic baggage.
The thing is, you don't want to index pointers. You only want to index a particular kind of "indexable pointers." They should be separate constructs, even if you don't have fat pointers.
Edit: That is more useful if you have function overloading, or templates, to avoid touchy ambiguities. It's still a slightly useful distinction to have in C, just for human readability.
[1] http://www.animats.com/papers/languages/safearraysforc43.pdf