Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Stop using strncpy already (randomascii.wordpress.com)
37 points by UberMouse on April 4, 2013 | hide | past | favorite | 47 comments


It's good to see some publicity around this, my experience from Stack Overflow tells me few C programmers know about these details.

That said, if the article's code would have appeared on Stack Overflow, I wouldn't be able to prevent myself from also saying:

The author's suggested replacement, the C++ template function (!) strcpy_safe(), is not so hot in my opinion.

It uses strncpy(), which is record-oriented and thus will always fill all its n bytes. If you do:

    char bigbuf[1024];

    strncpy(bigbuf, "foo", sizeof bigbuf);
Then strncpy() will happily 0-fill all of that buffer space, which of course is completely pointless and just a waste of precious cycles. This is because it treats the buffer as a "record", and insists to initialize all of it.

This is the "flip side" of the logic that prevents it from 0-terminating if the string fills the buffer; since the buffer is a record with a known size, and not really a C string, it doesn't need to be terminated, right? Heh.

Also, POSIX reserves the namespace of functions whose names start with "str". I'm not 100% sure how well this applies to C++, but it seems prudent to avoid defining your own function with a name like that.


>> Then strncpy() will happily 0-fill all of that buffer space

Yes. And that feels like a really poor choice of the function name in my opinion. It should have been strcpy_pad_with_zeros() rather than strncpy(). To reflect the fact that it is going to waste precious cache lines and memory writes.


It probably made sense in an era when typing was slow and fixed-length records padded with zeroes were common.


It made sense when portability from C to C++ was important, as in now and for as long as both languages have constancy.


"Stop using strncpy already!"... or you might as well write an article called "Stop using null-terminated strings already!"... or since many commenters here have already pointed that out, and the author is talking about C++ anyway, perhaps "Stop fearing std::string already!"


Here's a better idea: Stop using the str* functions completely. Start using buffers with explicit length fields attached, and start using the mem* functions.


Each has its uses. The standard C and C++ library use C-strings for many things, so if you want to use, say, fopen you'd have to have a conversion routine set up (at additional memory and cpu cost). Just yesterday the ex-protobuf guy explained that in his new format (whose name I forgot) he uses null terminated strings exactly for that, even though he also stores the length of the string already.

But then, it's just C strings vs. Pascal strings all over again. It's your job as an engineer to use the right tool at the right time.

On the other end, it sucks that strlcpy is not as portable as the n counterparts. Last time I checked, the gnu libc still didn't have it for stupid bikeshed reasons.


> the gnu libc still didn't have it for stupid bikeshed reasons

Yep, still doesn't. Not so much for bikeshed reasons, but more for the Gnu people's stupid NIH reasons. OpenBSD offered it to them, reply was along the lines of "we don't see the need for it".



On Windows you should use the secure C API functions such as strcpy_s.

In C++ using these function is just plain nonsense. Use std::string or equivalent.


Null terminated strings are one of the worst failures of design in programming languages of all time. In C you're sort of stuck with turd polishing. It's never going to be good or right, it's always going to be partly broken in some way.

If I were writing a new code-base in C I'd standardize on using bstring but that too doesn't solve every problem.


The two biggest failures that C brought to computer security were null terminated strings and implicit decay of arrays into pointers.


So this safe function cannot be used with dynamically allocated buffers without casting them? That means it takes away the benefit of not passing in the size explicitly. It also does not deal with wide chars.

I think storing sizes explicitly is the better solution. If what you are dealing with are actually character strings, then you might store the byte size along with the char count.

If you are only dealing with ASCII (are you sure?), you could use memcpy and avoid the strncpy zero fill behavior.


From the article:

>In order to use these functions correctly you have to do this sort of nonsense.

    char buffer[5];
    strncpy(buffer, “Thisisalongstring”, sizeof(buffer));
    buffer[sizeof(buffer)-1] = 0;
Actually, its more like this:

    char buffer[5] = {0};
    strncpy(buffer, "This is a long string", sizeof buffer  - sizeof buffer[0]);
(edit: sizeof, not sizeof()! edit2: bugfix!)

But okay, maybe your compiler won't let you do that (it should). Oh. We've already touched on what Microsoft won't let you do .. maybe Microsoft won't let you do that. (In which case the answer should be: don't use Microsoft).

But anyway .. then the author says this:

>We are programmers, are we not? If the functions we are given to deal with strings are difficult to use correctly then we should write new ones.

Umm: NO! Learn to use your tools properly and stop re-inventing the wheel to fit your misunderstanding of the world! BILLIONS of lines of code out there use strncpy() and other n-fn variants, and guess what: even in safety-critical, life-threatening, embedded environments!

This article should really be titled: "If you are going to use strncpy(), and you're scared of it, THEN DON'T DEPLOY WITHOUT FULL CODE COVERAGE TESTING!"

NB: edit2 GOTCHA! Coverage, people.


Testing, even full coverage testing (a perversion of a good idea if ever I saw one), cannot and will not catch all errors. It is not a substitute for better programming techniques, and certainly not a reason to avoid using them.


Maybe, but it will catch bugs like this if the test is written for it, and anyone writing C in the 21st century professionally, with safety in mind, is a) writing tests, and b) getting 100% coverage.


Mmm... that will fail since the last character is not null. Change:

  char buffer[5] = {0};
  strncpy(buffer, "This is a long string", sizeof buffer);
to:

  char buffer[5] = {0};
  strncpy(buffer, "This is a long string", sizeof(buffer) - 1);
(I find sizeof() to be clearer than just sizeof in this case since "sizeof buffer - 1" looks awfully strange.


Pop quiz: is sizeof a function?

(Hint, no, its an operator. Like -- and &. Do you prefer this form: --(someint);


Regardless, it still looks better with parenthesis and removing them serves no real benefit other than proving that I know it is an unary operator.

Feel free to grep through your /usr/include for sizeof and see which way is more common.


There actually is a good reason not to use parens with sizeof, and it is in fact a means of introducing subtle bugs to your code if you don't know the difference:

    sizeof some_struct; //computed at *compile* time
    sizeof(some_struct); // also computed at *compile* time, but looks like its a runtime call
Anything that looks like something but isn't actually that thing, in my opinion - especially with a language like C - is room for programmer enlightement ..


The languages state that sizeof requires parentheses when the expression is the name of a type -- sizeof int is an error, sizeof(int) is not.


Just FYI - code: sizeof buffer - sizeof(buffer[0])

is looking unconventional. Couple of reasons... You generally want to avoid +1 / -1 in your code. You don't want to write sizeof() without parenthesis, - it reduces readability.


Pop quiz: what are possible results of running sizeof(char)? It's just another thing to note ;)


I think perhaps confusion on the use of parenthesis comes from the fact that you need them for types and not for objects.


I like the consistency of using parenthesis always. Just like some people insist of putting them in logical statements even if they do not affect the order of evaluation in that particular case.


It's an operator but nobody should be forced to remember that it's an operator when it doesn't matter.


It does matter. sizeof is computed at compile time, it is not computed at runtime. something() is a runtime invocation .. sizeof() 'looks' like that, but isn't.


1. A lot of math gets computed at compile time.

2. sizeof is not necessarily done at compile time. C99 allows variable-sized automatic arrays, forcing it to store and later look up the value at runtime if you use sizeof.


edit: parent didn't originally - 1 from sizeof.

Except you just failed to null-terminate your string, good job at proving the truth of the article :)

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      char buffer[5] = {0};
      strncpy(buffer, "This is a long string", sizeof(buffer));
      printf("%s\n", buffer);
  }

  $ ./test
  This ^`gh?


    $ cat /tmp/t.c
    #include <stdio.h>
    #include <string.h>

    char buffer[5] = {0};

    int main(int argc, char argv)
    {
	strncpy(buffer, "This is a long string", sizeof buffer - sizeof buffer[0]);
	printf("The string: [%s] The len: %ld\n", buffer, sizeof buffer);

    }
    $ /tmp/t
    The string: [This] The len: 5
    $ gcc -v
    Using built-in specs.
    COLLECT_GCC=gcc
    COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
    Target: x86_64-linux-gnu
    Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs --enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.6 --enable-shared --enable-linker-build-id --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
    Thread model: posix
    gcc version 4.6.3


edit: oops. Coverage before Coffee!


  The string: [This ] The len: 5
Doesn't it bother you that your 5 character long buffer that is supposed to be terminated by a null happens to have 5 characters in it "[This ]" instead of 4 visible "[This]"?

Your memory just happens to have a null 6 bytes after the pointer.


Run the code and see for yourself, parent fixed it.


This is an excellent post to show everyone who says that C is easy. Or that only sucky programmers creates segfaults and all you need is some discipline. Not even very experienced developers can be expected to keep track of all the nuances of the various ??str??cpy functions and their various faults. The solution is to stop using C already (unless you really, really have to).


You know, I can't remember a time, when I've got a segfault just because of poorly written strxxx code. It just doesn't happen this often in real life. In practice, it only happens in relatively complex code that interfaces to other languages, and in these cases usually you are facing a lot more trouble with memory management, so tracking of string sizes is peanuts in this case.

Real danger of C strings is that they create vulnerabilities in the network code. But unfortunately this is precisely the place, where latency is important, where everything is in C and where a solution to stop using C just wouldn't do.


C is easy. Keeping track of str* function nuances is as simple as opening the man page whenever you use a function you haven't used in a while (just a keypress away in vim, as easy as hitting F1 in an IDE). I fully intend to use C more. C makes things like static analysis and valgrind possible; you don't get those with Ruby (which I also use extensively). Loose typing comes with its own set of tricky invisible security issues. Use C for great glory.


The solution is to stop using C already (unless you really, really have to).

Use C for great glory.

Both of you are taking an extreme position.

C isn't very difficult. However, there are a lot of nuances that you have to remember while using it. Mostly this is a matter of experience. It eventually becomes something akin to muscle memory to match every malloc() with a free(), and to always have an acute awareness of buffer sizes.

When speed is critical, using C can often be a decisive advantage. Nothing else can match the raw horsepower that an effective C programmer can bring to bear on a problem. This implies that if you don't know C, then you're handicapping yourself as a developer. Consider that most of the really excellent programmers are also excellent C programmers: cperciva, guido, antirez, tlb, rtm, zed, etc. Either all of the excellent programmers are also delusional for using C, or you're missing something important by not knowing how to be effective with C.

On the flip side, C doesn't make sense for most use cases. For example, writing a distributed system in C when you could use Go, or writing Dropbox in C when you could use Python. Most people underestimate just how fast modern CPUs are, along with underestimating how much speed you can get from effectively profiling your app written in a high-level language like Lisp. Good profiling is more valuable than changing to a lower-level language, and it's also easier in the long run.

The key is balance, and a wide perspective.


I agree totally. My final sentence, referring to great glory, was tongue in cheek. The software I write for Nitrogen Logic includes C backends, Ruby webapps, and C extensions to Ruby, each used where appropriate.


Seems bit odd to recommend his own function when the only complaint about strlcpy is that "it isn’t natively available in VC++.". I don't see how rolling your own is an improvement over battle-tested external function.


Oh the joys of c...


After every few days of programming in Go I start to miss C a little bit, probably more out of nostalgia than anything else,- until 5 minutes later I think about something like this.


Am I missing something obvious? I don't profess to know C++, but surely the proposed solution only works when writing to an array? There is no way for the compiler to know how much space has been allocated for arbitrary pointers.


Some malloc implementations do provide functions for determining the size of an allocated block, but that is also very problematic and certainly doesn't work with sizeof.


There's nothing wrong with strncpy if you know what you are doing. The problem is that far too few people know what they are doing and don't how to check if what they're doing is correct or not.


asprintf already relieves this headache...

http://linux.die.net/man/3/asprintf


I'm sorry, but on platforms I need to support this is not available.


Yeesh, everyone is so angry in this thread...


No!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: