Hacker News
Are pointers and arrays equivalent in C? (2009) (thegreenplace.net)
112 points by deanstag on March 21, 2015 | 64 comments


You can actually pass an array as an array in C99:

    #include <stdlib.h>

    void f(size_t m, size_t n, float a[m][n])
    {
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < n; j++) {
                a[i][j] = 0;
            }
        }
    }

    int main(int argc, char* argv[]) {
        float a[10][20];
        f(10,20,a);
    }

This "conformant array" feature was supported in C99. GCC supports it. Microsoft didn't like it, though, and didn't implement it, so nobody uses it, and it was made optional in later versions of the C standard.

It doesn't do what it looks like it does. The first (outermost) dimension is totally ignored by the compiler: the parameter is adjusted to a pointer to a row of n floats. For 1D arrays, the size is not only ignored, the object is still passed as a pointer to the element type.

I once proposed making arrays in C size-safe by taking conformant arrays seriously and adding some features to make them more useful.[1] After much discussion, the idea was considered sound, but not popular enough. So buffer overflows in C continue.

I have great hopes for Rust.

[1] http://animats.com/papers/languages/safearraysforc43.pdf


That's ordinary C89 behavior, not a C99 feature, and I'm pretty sure Microsoft supports it too. The C99 feature here is VLA (variable-length array), arrays whose length is a runtime value.

If you change m and n to be constants, it works with C89.

The rule is simple (yet confusing!): In a function prototype, if the parameter (top-level) type is an array, it is de-sugared to be an element-pointer instead.

Thus, the code you wrote with constants instead:

    #define M 10
    #define N 20
    void f(size_t m, size_t n, float a[M][N])
De-sugars, as per C89 ordinary rules, to:

    void f(size_t m, size_t n, float (*a)[N])
And it is an ordinary array degradation to a pointer, except instead of pointing to a float, it happens to point at a float[N].
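A minimal sketch of that adjustment, for illustration only. The helper name zero_rows and its rows parameter are made up here, and the default GCC or Clang C mode is assumed. The two prototypes are accepted as redeclarations of one function because the array-of-array parameter is adjusted to a pointer to float[N]:

```c
#include <stddef.h>

#define M 10
#define N 20

/* Two spellings of the same prototype. They are compatible because the
   array parameter is adjusted to a pointer to its element type, float[N]. */
void zero_rows(size_t rows, float a[M][N]);
void zero_rows(size_t rows, float (*a)[N]);

/* Hypothetical helper: zero the first `rows` rows of a table with N columns. */
void zero_rows(size_t rows, float (*a)[N])
{
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < N; j++) {
            a[i][j] = 0.0f;
        }
    }
}
```

A caller with float t[M][N] would write zero_rows(M, t); the M in the first prototype is documentation only, which is why the row count has to travel as a separate argument. Some compilers may warn about the differing spellings when extra warnings are enabled.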


Variable length arrays are a different feature. They are local temporary arrays of variable size:

    int f(size_t n) {
        int temp[n]; /* local array of size n */
    ...
    }
Someone once used that feature in the Linux kernel to get temp space for a string. It was taken out, because those arrays went on the stack and thus allowed large kernel stack growth based on data from outside the kernel.

Nobody uses that feature either.
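For reference, a small sketch of a local VLA, assuming a compiler that supports them (they are optional since C11) and a small, trusted n > 0. The function name vla_bytes is hypothetical. It also shows that sizeof is evaluated at run time for such an array:

```c
#include <stddef.h>

/* Hypothetical helper: returns the byte size of a local VLA of n ints.
   Assumes n > 0 and that n is small enough to fit on the stack. */
size_t vla_bytes(size_t n)
{
    int temp[n];          /* storage is sized when execution reaches here */
    temp[0] = 0;          /* touch the array so it is clearly in use */
    return sizeof temp;   /* computed at run time for a VLA */
}
```

The stack-growth concern mentioned above is visible here: nothing in the language checks n, so a value derived from outside input decides how much stack is consumed.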


EDIT: I now see what you mean, sorry for misunderstanding.

The feature sounds useful, but the reason I misunderstood you is that you used the same syntax that represents array-to-ptr degradation, which made me think you were confused :-) I'm not sure how you would make this properly backwards compatible with C. You'd need to use a different syntax for such a major new feature.

Kept here for posterity:

You used variable length arrays in conjunction with ordinary array-of-array notation that (in function parameter list context) degrades to ptr-to-array.

Here is the output of gcc on your code, with -Wvla (which warns about use of vla):

  gcc -c testvla.c -o /dev/null -Wvla -std=c99
  testvla.c:3:1: warning: variable length array ‘a’ is used [-Wvla]
   void f(size_t m, size_t n, float a[m][n])
   ^
  testvla.c:3:1: warning: variable length array ‘a’ is used [-Wvla]
The VLA is why your code is C99. If you use a[10][20] in the function prototype, it is ordinary C89, and if Microsoft wrote a C89 conformant compiler, it works on Microsoft's compiler as well.


Safe C++ equivalent which I've used on Microsoft's compilers:

    template< size_t m, size_t n > void f( float (&a)[m][n] ) { ... }


When I was first learning C, pointers scared the crap out of me, and it took me longer to understand them than I am comfortable admitting to.

Then I got pointers and this big "NOW I get it"-moment and felt really clever.

And then came the moment I had a problem with pointers and arrays being interchangeable most of the time and became very confused about when they were not equivalent.

People who like C (which I do, too) will often point out C's perceived simplicity when compared with, say, C++, but C definitely does have its murky corners, and the need for backward compatibility with existing code will most likely prevent them from getting cleaned up.


I never had trouble with this because from an early age I developed a mental model of how memory worked: numbered bytes stored in a long sequence. What can you actually store like this? Well, bytes obviously, and numbers or characters represented by sequences of bytes. But you can also store addresses to locations in this long list of bytes. And you can store an address of an address. I mean, how else could it work, I asked myself. Now a pointer is a location in memory that stores an address to something. What is an array then? It is a pointer to a beginning of a typed sequence (typed as in you tell the compiler how to treat each subsequence but that is fluid because of casting). So a[1] and *(a + 1) are the same thing. C is simple in that it always deals with sequences of bytes, nothing more.
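A tiny sketch of the a[1] and *(a + 1) identity, with a made-up helper name. Note that by the time the function runs, the caller's array has already been converted to a pointer, so this shows how indexing is defined rather than what an array is:

```c
#include <stddef.h>

/* Returns 1 if all three spellings name the same element.
   i must be a valid index into the array a points into. */
int same_element(const int *a, size_t i)
{
    /* a[i] is defined as *(a + i); addition commutes, so i[a] works too. */
    return &a[i] == a + i && &i[a] == a + i;
}
```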


There is a quote (I forget who it is attributed to), that "Unix is simple, it just takes a genius to understand its simplicity" or something like that.

If you come from a bash/perl background to C, suddenly being so close to the bare metal can be overwhelming. Most languages go out of their way to hide how the CPU and memory actually interact. C (and C++ to the degree that it is C's offspring) is the only language that makes it this explicit. And even so, many C tutorials and textbooks will try to distract readers from this, for various reasons.

I've met a few other programmers that also found pointers confusing and intimidating at first, and they, just like me, had this moment when we finally got it, and suddenly it was so simple, and we wondered how we could ever have been so stupid not to understand this.

It's a mild version of the enlightenment Lisp hackers must experience when they finally get it. ;-)


Much of my initial confusion with C style pointers was using & to show pass-by-reference and to get the address of a variable. It still doesn't make sense, but at least I've got it deeply ingrained now :\


C doesn't have pass by reference.


It has "pass a pointer by value", which works out conceptually to the same thing, only with slightly different syntax.
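A short sketch of "pass a pointer by value", using a hypothetical add_one function. The callee can change the caller's int through the pointer, but reassigning the pointer itself only touches the callee's copy:

```c
/* The pointer p is itself passed by value; the callee reaches the
   caller's variable only by dereferencing it. */
void add_one(int *p)
{
    *p += 1;      /* modifies the caller's int */
    p = 0;        /* changes only the local copy of the pointer */
}
```

A call looks like add_one(&x); the & at the call site is the address-of operator, not a reference declaration.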


Yeah I was talking about this:

> was using & to show pass-by-reference


No, it does not. That is why pointers are so important in C. But to a newbie, the difference between a local variable and a pointer to it can be rather confusing, and the fact that with arrays this difference becomes rather blurry unless one has a good mental model of what the compiler turns your code into, makes it even more confusing.

Given that some data structures such as linked lists are highly inconvenient or even impossible to implement without dynamic memory management, one is pretty much bound to writing code that is either primitive (not always a bad thing) or horrible to look at (and maintain) until one understands pointers.


C++ hates creating new words so they try to reuse old stuff as much as possible with new meanings.

There are 3 meanings of static, 2 meanings of auto, 2 meanings of delete, 2 meanings of default.


This is not limited to C++. All widely-used, "old" programming languages shy away from adding keywords, because it can break previously working code. Python is the same way too.


I'd say that in the case of pass-by-reference, the symbol they reused makes sense: references are basically syntactic sugar for a limited form of "automatically-dereferencing pointers", meaning that creating a reference involves taking an address, and so & would be the logical choice.

Personally I don't think it's so useful of a feature; it saves typing a few asterisks but obscures the fact that you're actually modifying some other variable.


> Personally I don't think it's so useful of a feature; it saves typing a few asterisks but obscures the fact that you're actually modifying some other variable

There are many real-world use cases where references are exactly the right feature, and the const-ness of that reference should make the intentions clear.

Performance-critical code is one place where not having the ability to pass large objects around by reference would be a performance killer. References allow us to avoid making unnecessary copies.

Yes, you can accomplish the same thing with a pointer, but raw pointers are much less common in modern C++. It's not just syntactic sugar; a reference conveys information. When writing a function you can eschew null pointer checks when using references, whereas a function taking a (raw) pointer should handle the null pointer case since technically you can call foo(nullptr) and it'll compile.

With a reference, bar must exist for the foo(bar) expression to compile when calling void foo(const Bar& bar), so it's safer, simpler code.


Without references there would be tons of problems.

How would you implement a copy constructor or copy assignment operator without them? Those two things are necessary for user defined types to be able to be treated like built in types, which is necessary if you want templates to be fully generic, which is essential for all the STL containers.


Given that references are implemented with pointers, those special functions are really taking pointer parameters anyway. Syntax-wise things might look a little different, but I don't think it poses much of a real problem.


But you lose the constant syntax that allows STL containers to work. vector<int> and vector<Person> work because vector can be templated, and

    int i1;
    int i2 = i1;

    Person p1;
    Person p2 = p1;
both have the same syntax. If suddenly p2 = p1 no longer compiles and you have to do p2 = &p1 that looks really weird because the two sides of the = have different types and vector<Person> no longer will compile. On the other hand if we say all overloaded operators automatically convert operands to their addresses, then we are sort of back where we started with references: you can modify a variable inside a function without having to say &. Only this would be much less flexible than what we have now because sometimes you want an overloaded operator to take something by value, but wouldn't be able to do it.


I'm the same way! Should have used '@' to create a pointer, and '*' to deref one. They got it all twisted up in 'C'.


I think the easiest compare-contrast is with Pascal, and 'C' just shows the implementation details of pass-by-reference a bit more than Pascal.


Granted it was a long time ago, but what helped me to understand pointers was learning a simple computer architecture. I happened upon a book on the Z80, that I recall was published by Howard Sams, but might have been sold at Radio Shack.


C++ inherited this particular complexity from C, so it naturally falls out of a complexity comparison.


One other difference on modern compilers relating specifically to character arrays:

This will work fine:

    char arr[] = "stuff";

    printf("arr:%s\n", arr);
    arr[0] = 'b'; arr[1] = 'l';
    printf("arr:%s\n", arr);
arr is allocated on the stack (not heap, sorry about the mistake) and is mutable.

But the following will segfault:

    char* ptr = "stuff";

    printf("ptr:%s\n", ptr);
    ptr[0] = 'b'; ptr[1] = 'l';
    printf("ptr:%s\n", ptr);
String literals are typically placed in a read-only data segment, and writing to one is undefined behavior. You should really write const char* ptr = "stuff", but to remain backward compatible, gcc will let you write the above and not even warn you unless you ask for -Wwrite-strings.
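A hedged sketch of the safe pattern: keep the literal behind a const pointer and edit a private copy instead. The function name make_bluff and the caller-provided buffer are illustrative choices, not anything from the comment above:

```c
#include <string.h>

/* Copies the literal into a caller-provided buffer and edits the copy.
   buf must have room for at least 6 chars ("stuff" plus the terminator). */
void make_bluff(char *buf)
{
    const char *lit = "stuff";   /* const: the literal is never written */
    strcpy(buf, lit);            /* buf now holds its own mutable copy */
    buf[0] = 'b';
    buf[1] = 'l';
}
```

This is roughly what char arr[] = "stuff" does for you: the array gets its own copy of the characters, so modifying it never touches the literal.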


Yeah. I was exploring exactly this problem when I came across the original article. Later I saw that char arr[] = "stuff" actually emits code that copies the value "stuff" onto the stack. That was interesting news to me. stackoverflow.com/questions/29184100/char-array-and-pointer-initialization-semantics/29184190#29184190


This is straight from Expert C Programming book, including the diagrams. It's dishonest to recycle and present it as if it's yours with a reference mention at the very bottom. This is not a reference, it's a copy.


I agree with the other poster. I was going to upvote you earlier, and torrented the book you refer to for verification (ironically because I don't believe material should be stolen like this), but apparently I never got around to it because I was able to downvote you just now. The reason I never finished upvoting you was that I didn't think I found the right book: Expert C Programming doesn't match your description (of not being a reference, but rather the article being a copy of it). I didn't find the diagrams, and the text was very different.

So it sounds like your accusation is way off-base, and you should specify that you were wrong. Maybe you were just reminded of that other book but didn't verify that the article was a copy of it?


What was copied? I flipped through my copy of Expert C Programming and didn't see either of the diagrams.

If instead by "straight from" you mean "similar to", well, there is only a handful of ways to explain this material so of course they are going to be similar.

Please don't throw around accusations of copying lightly. I would like to see more material like this shared, not less.


> straight from Expert C Programming book, including the diagrams

no it is not. in fact, this article cites the book that you mention as one of the references.


Also sizeof is different.

   char a[20];
   char *b;
sizeof a -> the size of the array (20)

sizeof b -> the size of the pointer (8 on my machine)
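The same comparison as a compilable sketch. The function names and the COUNT_OF macro are made up for illustration; the pointer size depends on the platform, so it is compared against sizeof(char *) rather than a fixed number:

```c
#include <stddef.h>

size_t array_size(void)
{
    char a[20];
    return sizeof a;             /* 20: the whole array */
}

size_t pointer_size(void)
{
    char a[20];
    char *b = a;                 /* a decays to a pointer to its first element */
    return sizeof b;             /* size of a char *, commonly 4 or 8 */
}

/* Hypothetical helper: element count, meaningful only for true arrays.
   Applied to a pointer it silently yields a wrong answer. */
#define COUNT_OF(arr) (sizeof (arr) / sizeof (arr)[0])

size_t count_demo(void)
{
    int v[7];
    return COUNT_OF(v);          /* 7 */
}
```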


I was really surprised that it wasn't mentioned since it is the most important difference and a cause of frustration for programmers just beginning to learn C.


Yes, to me this is the largest difference, since it will cause runtime errors instead of compile problems.


Funny how old articles resurface suddenly :-) It's great that HN still likes technical content, though.

FWIW I also wrote a (shorter) follow-up post a few months later about multi-dimensional arrays and their relation to pointers-to-pointers: http://eli.thegreenplace.net/2010/04/06/pointers-vs-arrays-i...


I like to think of them as being different and the equivalence - in some cases - only comes from what is known as the "array-pointer-decay" semantic:

http://www.lysator.liu.se/c/c-faq/c-2.html

This is why &a+n is not equivalent to a+n, something that a surprisingly large number of "C pointer tutorials" seem to get wrong.
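A small sketch of why &a+n and a+n differ, measuring how many bytes each expression advances for n = 1. The function names are hypothetical:

```c
#include <stddef.h>

/* Byte distance covered by adding 1 to a (decayed type: int *). */
size_t step_of_a(void)
{
    int a[4];
    return (size_t)((char *)(a + 1) - (char *)a);    /* sizeof(int) */
}

/* Byte distance covered by adding 1 to &a (type: int (*)[4]). */
size_t step_of_addr_a(void)
{
    int a[4];
    return (size_t)((char *)(&a + 1) - (char *)&a);  /* sizeof(int[4]) */
}
```

Both a and &a start at the same address, but pointer arithmetic scales by the pointed-to type, so &a + 1 skips the whole array while a + 1 skips one element.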


Excellent. Years ago (15-20?) I wrote a program with the bug he mentions at the end. Way before Stack Overflow was a Google away. I could not work out why it crashed. I ended up rewriting it and it worked, but never understood what the bug was (in my mind it has been logged as 'don't extern arrays' ever since). I just read that and ah-ha!


Another interesting thing to know about pointer and arrays is:

  #include <stdio.h>
  typedef int a_t[100];

  int main(void)
  {
     int  a[100];
     a_t* p = &a;

     // all printed pointer values are equal
     printf("%p %p %p %p\n", (void*)a, (void*)&a, (void*)p,  (void*)*p);
     return 0;
  }


Will I be correct if I say that in the end, the difference is that arrays are immutable (in the address that they point to) and pointers are not?


No. An array has size (N * sizeof(element)). Arrays are immutable in their address in the same sense that any lvalue is immutable in its address.

  int x;
  int y[2];
The two declarations above are similar, one happens to be an (int). The other an (int[2]).

The difference is that when you take the rvalue of an int, say x, you get an rvalue int. When you take the rvalue of an int[2], you get a degraded (int *) rvalue that points to its first element.

Until you convert the array object to an rvalue, it is no more related to an (immutable) pointer than the above int. Both have an immutable address, like all lvalues do.


In part. Pointers can be made constant in two ways, a) constant in that they cannot be used to change the memory they point to, and b) in that one cannot change what memory they point to.

  char const * ptr_a; /* You cannot write to the memory pointed to by ptr_a */

  char * const ptr_b; /* You cannot change what ptr_b points to */

  char const * const ptr_ab; /* You can change neither what ptr_ab points to nor the memory it points to - at least not through this pointer */
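A compilable sketch of the first two declarations in use. The function const_demo and its local arrays are invented for illustration; the two commented-out lines are the ones a conforming compiler should reject:

```c
/* Hypothetical demo; returns the first char of x after the writes. */
char const_demo(void)
{
    char x[] = "xy";
    char y[] = "pq";

    char const *ptr_a = x;   /* pointee is read-only through ptr_a */
    ptr_a = y;               /* fine: ptr_a itself may be re-pointed */
    /* *ptr_a = 'z';            rejected: write through pointer-to-const */

    char *const ptr_b = x;   /* ptr_b is fixed to x */
    *ptr_b = *ptr_a;         /* fine: writes 'p' into x[0] through ptr_b */
    /* ptr_b = y;               rejected: ptr_b is a const pointer */

    return x[0];
}
```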


Well, there are some other important differences

sizeof somearray vs sizeof somepointer are naturally different.

&somearray vs &somepointer are very different types.

Which also means that &somearray + 10 vs &somepointer + 10 yield very different results.


This is a well-presented version of http://c-faq.com/aryptr/.


[deleted]


Arrays are not necessarily located at compile-time-known positions.

The compiler translates all symbolic names to addresses, regardless of whether it is an array or not.


Isn't everything equivalent in C?


Why the downvotes? I don't think anyone who disagrees with this really knows C, or more accurately, how most computers work. Can the downvoters name something that can't be converted to anything else? Unless you are using a Harvard architecture, everything in memory is equivalent.


You can't just freely reinterpret memory in C without following certain rules, or you get undefined behavior. Also, types have sizes, and alignment restrictions, and on some systems, trap representations.
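One hedged illustration of "following certain rules": inspecting a float's bytes with memcpy instead of casting the pointer. The function name float_bits is made up, and the sketch assumes sizeof(float) == 4:

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f as a 32-bit integer.
   Assumes sizeof(float) == 4 on the target platform. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);    /* defined: copies bytes, no aliasing violation */
    /* *(uint32_t *)&f would read a float through an lvalue of an
       incompatible type, which is undefined behavior. */
    return u;
}
```

The bytes really are just bytes, but the language only promises a meaningful result when they are accessed through an allowed type, and the value you get back still depends on the platform's float format.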


Hmm, I am glad to know so many people stop at the boundary of the compiler's restrictions and abstractions. By the way, please do not misuse and misinterpret John's blog like that; he is one of my most respected professors.


C has types. Variables of different types generally aren't equivalent.


http://blog.regehr.org/archives/213

I would downvote you, too, if I could.


Ah Betteridge's law of headlines at work!

Arrays are aggregates of contiguously allocated objects.

Pointers are values indicating the storage locations of objects.

Thus, not equivalent.

Pointers are involved when you access an array. That doesn't make it an array. Just like scanning written text on a piece of paper using your index finger does not mean that "paper" and "your index finger" are equivalent. That's true even if you always instantiate an index finger before evaluating a piece of paper: co-occurrence is not equivalence.


Why is this comment being downvoted? The explanation is correct and the analogy quite good too!


Because it's explaining the wrong confusion. People who are confused about pointers and arrays in C don't think that the concept of a pointer is the same as the concept of an array - they think that in C, the two have the same semantics. That explanation and analogy do nothing to explain the semantic difference, and honestly come off as pretty condescending by essentially saying "Well they're different things, of course they're different!"

The poster probably did not help matters after that by calling the downvoters "clueless n00bs" or making a self-righteous statement about how they won't delete the comment just because of downvotes. That kind of toxic behavior just ensures that you'll get more downvotes.


Because the number of people who believe that arrays are pointers vastly outnumbers those who don't, and many of those clueless n00bs have karma >= 500 on HN, or whatever the level is to be able to downvote. Judging by one answer, some people don't believe that the comment could even be about C, so wrong it must be!

Nope; I simply refuse to delete a comment that is civil and correct just because of some fluffy downvotes. It stays and that's that.


I suspect it's downvoted for the mention of Betteridge's law of headlines, which (rightly) earns downvotes whenever it's mentioned.


Look, it even worked for your comment! Must be true.


The key part of the headline here is "in C". In C, it's known that "the name of the array is the address of its first element", which is one of the things that leads people into thinking that pointers and arrays are equivalent - in C.


No, the name of an array is not the address of its first element.

For instance, this equivalence is not required:

   (sizeof array) == (sizeof &array[0])
though it could be true by coincidence.

If array were the address of the first element, then it would always have the same size. Be careful with words like "is".

Oh, the above is "in C", by the way. We wouldn't want not to satisfy your key point!


Quite right.

A set of interesting equalities:

    (void*)&array   == (void*)&array[0]   // likely true
    (void*)&pointer == (void*)&pointer[0] // likely false (although it could be true by coincidence)
A pointer to an array is the address of the first element.

The name of an array can be trivially and implicitly converted to the address of its first element, but this conversion creates a (temporary) pointer; it does not make the array itself a pointer!

Neither the array nor its name is in and of itself storage for any such address. It is not a pointer, and as a result &array won't give you back a pointer-to-pointer, because there's no underlying 4-8 byte storage for a pointer: there's no permanent pointer to point to.


According to the C FAQ, Whenever an array appears in an expression, the compiler implicitly generates a pointer to the array's first element, just as if the programmer had written &a[0]. The exceptions are when the array is the operand of a sizeof or & operator, or is a string literal initializer for a character array.


Sure:

   (void*)&array   == (void*)&array[0]   // likely true
That is true by definition. The address of the first element is the same as that of the array, period. They cannot convert to different void pointers.


Which definition is that?

You probably thought of:

  array  == &array[0]
which is always true.

This:

  (void*)&array   == (void*)&array[0]
which simplifies to:

    (void*)&array   == (void*)array
is not.

C doesn't guarantee that the values of &array and array are the same. The types T (*)[N] and T * are not even compatible and those two pointers are allowed to have different sizes and representation.


The first element of an array is located at the address of an array. Even if the pointer-to-array and pointer-to-element types have different size and representation, they point to the same place. The resulting pointer-to-void converted values must point to that place, and have the same type.

This relationship also holds between a pointer to a struct and a pointer to its first member, and among pointers to the members of a union.


Granted, we've been presupposing C, but:

In C++ you have a simple out by overloading T::operator&, allowing you to alter the result of the RHS of that equation and break it.

There's also a little bit in the standard prohibiting comparing pointers to different arrays - a leftover from the 16-bit era of near and far pointers I suspect - but that appears to only apply to relational comparisons in the C++11 standard:

http://stackoverflow.com/questions/9086372/how-to-compare-po...

I either misremember this applying to == and != as well (likely), or this changed since the C++03 standard that I'm used to referencing (unlikely). I'm also unsure if C has an equivalent rule. A fun note from that SO link:

    std::less<T*>
guarantees more than

    operator<( T*, T* )
enabling

    std::map<T*,...>
et al.

(edit: formatting)


...and another one bites the dust...



