I would bet on the issue being in the init script itself rather than squid. (I'm assuming squid doesn't run as root by default in rhel) If that's true then it's another point for more sane process managers (upstart/supervisord/systemd/...)
set -o pipefail makes common idioms a pain. Consider head, which simply exits after it has read a few lines. In this case, the process feeding it gets a SIGPIPE and exits with a non-zero exit code (141 = 128 + 13, SIGPIPE):
Consider /tmp/test.sh:
set -o pipefail
yes foo | head
$ bash /tmp/test.sh >/dev/null
$ echo $?
141
From the same page: "rking's personal recommendation is to go ahead and use set -e, but beware of possible gotchas. It has useful semantics, so to exclude it from the toolbox is to give into FUD."
You can use set -e and turn it off (set +e) for code blocks and things that are problematic. You can also append '|| true' to a command, or use the colon builtin (:), to avoid problems at specific points without turning everything off. These are edge cases, and you can easily work around them if you are an advanced user.
If you are not an advanced user then you should certainly use -e.
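A minimal sketch of those escape hatches (the grep line is just a made-up stand-in for any command whose non-zero exit you expect and want to tolerate):

```shell
#!/bin/bash
set -e

# grep exits non-zero when there is no match; '|| true' keeps -e from
# killing the script at this point:
grep -q "no-such-line" /dev/null || true

# Or relax -e around a block of commands that are allowed to fail:
set +e
false
set -e

echo "still running"
```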
or check the variable before using it, like any other programming language:
[[ "$VAR" ]] && rm -rf "$VAR/*"
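Bash also has a built-in guard for exactly this pattern: the ${VAR:?message} expansion aborts a non-interactive shell when VAR is unset or empty, before anything destructive can run. A sketch (the check function and paths are hypothetical):

```shell
#!/bin/bash
# ${1:?...} aborts the (sub)shell if the argument is missing or empty,
# before any rm could ever see an empty path.
check() {
  local target="${1:?target directory not set}"
  echo "would remove: $target/*"
}

check /srv/cache    # prints: would remove: /srv/cache/*
# An empty argument aborts; run it in a subshell so the script survives:
( check "" ) 2>/dev/null || echo "refused to run with empty path"
```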
I think most of these issues stem from the fact that most developers that write shell scripts don't actually understand what they're doing, treating the script as a necessary annoyance rather than a component of the software.
If anyone understands shell scripts, it would be people writing init scripts at Red Hat :)
Anyway, that is nothing like other programming languages. Checking in that way is error prone and not really an improvement (nor is it equivalent to set -u).
[[ "$DAEMON_PATH" ]] && rm -rf "$DEAMON_PATH/*"
See what I did there? It's an rm -rf /* bug because "checking variables" is not the answer.
In other programming languages, if an identifier is mis-typed things will blow up. E.g., in ruby if I write:
daemon_path=1; if daemon_path; puts deamon_path; end
I get "NameError: undefined local variable or method `deamon_path`"
These issues do not always stem from bad developers. Bash's defaults are not safe in many ways and saying "people should just check the variable" isn't helpful here.
Shameless plug for my language "bish" (compiles to bash) which aims to solve many of these annoyances with shell scripting: https://github.com/tdenniston/bish
Bash also has the ability to flag use of an undefined variable as an error; it is just not on by default.
set -u
Man page quote: "Treat unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion. If expansion is attempted on an unset variable or parameter, the shell prints an error message, and, if not interactive, exits with a non-zero status."
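To see set -u catch exactly the typo from earlier in the thread (DAEMON_PATH vs DEAMON_PATH are hypothetical names), a small sketch:

```shell
#!/bin/bash
# Run the typo'd expansion under 'set -u' in a child shell and capture the result.
# Without -u, $DEAMON_PATH silently expands to "" and the path becomes "/*".
output=$(bash -uc 'DAEMON_PATH=/var/run/mydaemon; echo "$DEAMON_PATH/*"' 2>&1)
status=$?

echo "$output"    # bash: DEAMON_PATH: unbound variable
if [ "$status" -ne 0 ]; then
  echo "aborted before anything like rm could run"
fi
```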
Yeah, everyone always loves to shit on BAT (which is fair, it is terrible) and VBS (which is slightly less fair), but in spite of how many problems Bash has (not least the massive security issue last year), it gets off almost scot free.
These bugs are indicative of Bash's design problems. Why is it used for init scripts? And don't even get me started on how Bash interprets filenames as part of the arguments list when using * (e.g. file named "-rf").
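The "-rf" filename hazard is easy to demonstrate in a scratch directory; passing '--' (end of options) or prefixing globs with './' keeps expanded filenames from being parsed as flags. A sketch:

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"
touch -- -rf regular.txt    # create a file literally named "-rf"

# A bare 'rm *' here would expand to 'rm -rf regular.txt', turning the
# filename into flags. '--' marks the end of options, so both names are
# treated as plain files:
rm -- *

ls                          # prints nothing: both files are gone
cd /
rmdir "$dir"
echo "cleaned safely"
```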
Say what you will about Powershell, but having a typed language that can throw a null exception is useful for bugs like these. The filename isn't relevant, and a null name on a delete won't try to clear out the OS (it will just throw).
Not just scot free - during the Great systemd War of 2014, it was a talking point for the antis that using anything other than the pure, reliable simplicity of shell for service management was MADNESS!
I don't think that was the argument, as much as it was that if a shell script fouled up it was easier to get in and do field repairs because it was interpreted rather than compiled.
It could be the default for non-interactive shells without causing this problem. Or we could have a more nuanced rule, where -e means "stop executing the current sequence of commands as soon as there is an error", where a "sequence of commands" is a single line in an interactive shell (so "false; whoami" would print nothing), or the entire file in a script.
The real answer is that this has not been the default in the time between shells being invented and this comment being posted, and so the squillions of lines of shell script out there in the wild keeping the world turning have not been written with this in mind. Making it the default now would break a lot of things.
With the benefit of hindsight, though, i would say that yes, this should have been the default in scripts. Oh well.
That's not completely true. At least with the GNU tools, 'rm' won't delete the root directory unless you specifically give it the '--no-preserve-root' flag. Since that flag has no use other than deleting root, it's unlikely the script passes it. With that in mind, the script must be doing some kind of manual deletion for some reason.
I believe that "--preserve-root" applies only to / itself. That means `rm -rf /*` will expand to `rm -rf /bin /dev /etc /lib ...` and delete all anyway.
That's accurate. `rm -rf /*` will still work to delete everything. But that said, `rm -rf "$STREAMROOT/"` can't ever expand to that, and moreover, since the expansion is in double quotes, it won't be subject to path expansion by bash. So even "/*/", which would normally expand into "/bin/ /dev/ /etc/ ...", won't. You can see what I mean yourself, just use echo:
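For instance, a quick sketch with echo ($STREAMROOT set to a made-up path):

```shell
#!/bin/bash
STREAMROOT="/srv/stream"    # hypothetical path for illustration

echo "$STREAMROOT/*"   # quoted: the variable expands, the glob does not -> /srv/stream/*
echo "/*"              # quoted: prints the literal two characters /*
echo /*                # unquoted: the shell expands it to /bin /boot /dev ...
```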
This just happened to my coworker today. I'm sitting behind him telling him which commands to type (he's new to Linux...) when suddenly he jumps the gun and pushes enter just as I say "slash". My heart nearly stopped. I didn't even know preserve-root existed (plus I always reiterate not to log in as root). It was a snapshotted VM, but we still would have lost the day's work.
I feel like it would be a frighteningly common bug. I remember one like this from 2011 [1]. Install/packaging/utility scripts usually do not get as much attention and testing as the application code itself.
I'd say the fact that these bugs only very occasionally happen - relative to the huge number of shell scripts out there that are being executed every day - that it's not really "frighteningly common". You only hear about the ones that fail.
By the same logic, memory safety issues only happen rarely, right? Most programs/scripts that are part of a distribution will get tested and such errors removed. But without polling people, it'd be hard to know how many times this kind of thing has messed things up. I personally wiped out a production DB due to expanding an unset variable (fortunately immediately after taking a backup).
This is, as the bug notes, a regression, and I'm pretty sure you're right about it being in the init script. I used to be a very heavy Squid user and Squid developer, and I remember a very similar bug many years ago. It was in the cache_dir initialization code: it would read the configuration file, parse out the cache_dir lines, and if the directories didn't exist it would create them as part of the startup.
There were some circumstances where, if there was no cache_dir line configured, or if the cache_dir was a link or something (the details are very sketchy in my mind after so much time), it would end up destroying /.
No, but if an analogous bug happened (systemd forgot to set an internal squidroot variable before clearing the squidroot, for instance), it would be much, much harder to figure out what was going on. Which is really what everybody's complaints boil down to.
"systemd" and "sane" only ever go in the same sentence as "sane people don't use systemd".
It looks like a bug in the init script; running it as squid's own user wouldn't have destroyed the whole filesystem; likely just squid's config and anything under its /var.
I'll be the first to call out systemd for a lot of things, but not its core init idea. It's the same as daemontools, upstart, supervisord, and others do. Implementation is very different of course, but the idea is common - you run/kill services, not start/stop them. That's the reason we can leave the ugly and error-prone init scripts behind.
Which is what happens when you have every daemon writing their own PID handling code, running as root, in a language whose interpolation rules nobody really understands.