I would bet on the issue being in the init script itself rather than squid. (I'm assuming squid doesn't run as root by default in rhel) If that's true then it's another point for more sane process managers (upstart/supervisord/systemd/...)
set -o pipefail makes common idioms a pain. Consider head, which simply exits after it has read a few lines. In this case, the process feeding it gets a SIGPIPE and exits with a non-zero exit code (141 = 128 + 13, SIGPIPE):
Consider /tmp/test.sh:
set -o pipefail
yes foo | head
$ bash /tmp/test.sh >/dev/null
$ echo $?
141
From the same page: "rking's personal recommendation is to go ahead and use set -e, but beware of possible gotchas. It has useful semantics, so to exclude it from the toolbox is to give into FUD."
You can use set -e and turn it off (set +e) for code blocks and things that are problematic. You can also append '|| true' to a command, or use the colon builtin (:), to avoid problems at specific points without turning everything off. These are edge cases, and you can easily work around them if you are an advanced user.
If you are not an advanced user then you should certainly use -e.
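A minimal sketch of those escape hatches (the grep line is just a made-up stand-in for any command whose non-zero exit you expect and want to tolerate):

```shell
#!/bin/bash
set -e

# grep exits non-zero when there is no match; '|| true' keeps -e from
# killing the script at this point:
grep -q "no-such-line" /dev/null || true

# Or relax -e around a block of commands that are allowed to fail:
set +e
false
set -e

echo "still running"
```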
or check the variable before using it, like any other programming language:
[[ "$VAR" ]] && rm -rf "$VAR/*"
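Bash also has a built-in guard for exactly this pattern: the ${VAR:?message} expansion aborts a non-interactive shell when VAR is unset or empty, before anything destructive can run. A sketch (the check function and paths are hypothetical):

```shell
#!/bin/bash
# ${1:?...} aborts the (sub)shell if the argument is missing or empty,
# before any rm could ever see an empty path.
check() {
  local target="${1:?target directory not set}"
  echo "would remove: $target/*"
}

check /srv/cache    # prints: would remove: /srv/cache/*
# An empty argument aborts; run it in a subshell so the script survives:
( check "" ) 2>/dev/null || echo "refused to run with empty path"
```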
I think most of these issues stem from the fact that most developers that write shell scripts don't actually understand what they're doing, treating the script as a necessary annoyance rather than a component of the software.
If anyone understands shell scripts, it would be people writing init scripts at Red Hat :)
Anyway, that is nothing like other programming languages. Checking in that way is error prone and not really an improvement (nor is it equivalent to set -u).
[[ "$DAEMON_PATH" ]] && rm -rf "$DEAMON_PATH/*"
See what I did there? It's an rm -rf /* bug because "checking variables" is not the answer.
In other programming languages, if an identifier is mis-typed things will blow up. E.g., in ruby if I write:
daemon_path=1; if daemon_path; puts deamon_path; end
I get "NameError: undefined local variable or method `deamon_path`"
These issues do not always stem from bad developers. Bash's defaults are not safe in many ways and saying "people should just check the variable" isn't helpful here.
Shameless plug for my language "bish" (compiles to bash) which aims to solve many of these annoyances with shell scripting: https://github.com/tdenniston/bish
Bash also has the ability to flag use of an undefined variable as an error; it is just not on by default.
set -u
Man page quote: "Treat unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion. If expansion is attempted on an unset variable or parameter, the shell prints an error message, and, if not interactive, exits with a non-zero status."
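To see set -u catch exactly the typo from earlier in the thread (DAEMON_PATH vs DEAMON_PATH are hypothetical names), a small sketch:

```shell
#!/bin/bash
# Run the typo'd expansion under 'set -u' in a child shell and capture the result.
# Without -u, $DEAMON_PATH silently expands to "" and the path becomes "/*".
output=$(bash -uc 'DAEMON_PATH=/var/run/mydaemon; echo "$DEAMON_PATH/*"' 2>&1)
status=$?

echo "$output"    # bash: DEAMON_PATH: unbound variable
if [ "$status" -ne 0 ]; then
  echo "aborted before anything like rm could run"
fi
```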
Yeah, everyone always loves to shit on BAT (which is fair, it is terrible) and VBS (which is slightly less fair), but in spite of how many problems Bash has (not least the massive security issue last year), it gets off almost scot free.
These bugs are indicative of Bash's design problems. Why is it used for init scripts? And don't even get me started on how Bash interprets filenames as part of the arguments list when using * (e.g. file named "-rf").
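The "-rf" filename hazard is easy to demonstrate in a scratch directory; passing '--' (end of options) or prefixing globs with './' keeps expanded filenames from being parsed as flags. A sketch:

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"
touch -- -rf regular.txt    # create a file literally named "-rf"

# A bare 'rm *' here would expand to 'rm -rf regular.txt', turning the
# filename into flags. '--' marks the end of options, so both names are
# treated as plain files:
rm -- *

ls                          # prints nothing: both files are gone
cd /
rmdir "$dir"
echo "cleaned safely"
```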
Say what you will about Powershell, but having a typed language that can throw a null exception is useful for bugs like these. The filename isn't relevant, and a null name on a delete won't try to clear out the OS (it will just throw).
Not just scot free - during the Great systemd War of 2014, it was a talking point for the antis that using anything other than the pure, reliable simplicity of shell for service management was MADNESS!
I don't think that was the argument, as much as it was that if a shell script fouled up it was easier to get in and do field repairs because it was interpreted rather than compiled.
It could be the default for non-interactive shells without causing this problem. Or we could have a more nuanced rule, where -e means "stop executing the current sequence of commands as soon as there is an error", where a "sequence of commands" is a single line in an interactive shell (so "false; whoami" would print nothing), or the entire file in a script.
The real answer is that this has not been the default in the time between shells being invented and this comment being posted, and so the squillions of lines of shell script out there in the wild keeping the world turning have not been written with this in mind. Making it the default now would break a lot of things.
With the benefit of hindsight, though, i would say that yes, this should have been the default in scripts. Oh well.
That's not completely true. At least with the GNU tools, 'rm' won't delete the root directory unless you specifically give it the '--no-preserve-root' flag. Since that flag has no use other than deleting root, it's unlikely the script passes it. With that in mind, the script must be doing some kind of manual deletion for some reason.
I believe that "--preserve-root" applies only to / itself. That means `rm -rf /*` will expand to `rm -rf /bin /dev /etc /lib ...` and delete all anyway.
That's accurate. `rm -rf /*` will still work to delete everything. But that said, `rm -rf "$STREAMROOT/"` can't ever expand to that, and moreover, since the expansion is in double quotes, it won't be subject to path expansion by bash. So even "/*/", which would normally expand into "/bin/ /dev/ /etc/ ...", won't. You can see what I mean yourself, just use echo:
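For instance, a quick sketch with echo ($STREAMROOT set to a made-up path):

```shell
#!/bin/bash
STREAMROOT="/srv/stream"    # hypothetical path for illustration

echo "$STREAMROOT/*"   # quoted: the variable expands, the glob does not -> /srv/stream/*
echo "/*"              # quoted: prints the literal two characters /*
echo /*                # unquoted: the shell expands it to /bin /boot /dev ...
```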
This just happened to my coworker today. I'm sitting behind him telling him which commands to type (he's new to Linux...) when suddenly he jumps the gun and pushes enter just as I say "slash". My heart nearly stopped. I didn't even know preserve-root existed (plus I always reiterate not to log in as root). It was a snapshotted VM, but we still would have lost the day's work.
I feel like it would be a frighteningly common bug. I remember one like this from 2011 [1]. Install/packaging/utility scripts usually do not get as much attention and testing as the application code itself.
I'd say the fact that these bugs only very occasionally happen - relative to the huge number of shell scripts out there that are being executed every day - that it's not really "frighteningly common". You only hear about the ones that fail.
By the same logic, memory safety issues only happen rarely, right? Most programs/scripts that are part of a distribution will get tested and such errors removed. But without polling people, it'd be hard to know how many times this kind of thing has messed things up. I personally wiped out a production DB due to expanding an unset variable (fortunately immediately after taking a backup).
This is, as the bug notes, a regression, and I'm pretty sure you're right about it being in the init script. I used to be a very heavy Squid user and Squid developer, and I remember a very similar bug many years ago. It was in the cache_dir initialization code: it would read the configuration file, parse out the cache_dir lines, and if the directories didn't exist it would create them as part of the startup.
There were some circumstances where, if there was no cache_dir line configured, or if the cache_dir was a link or something (the details are very sketchy in my mind after so much time), it would end up destroying /.
No, but if an analogous bug happened (systemd forgot to set an internal squidroot variable before clearing the squidroot, for instance), it would be much, much harder to figure out what was going on. Which is really what everybody's complaints boil down to.
"systemd" and "sane" only ever go in the same sentence as "sane people don't use systemd".
It looks like a bug in the init script; running it as squid's own user wouldn't have destroyed the whole filesystem; likely just squid's config and anything under its /var.
I'll be the first to call out systemd for a lot of things, but not its core init idea. It's the same as daemontools, upstart, supervisord, and others do. Implementation is very different of course, but the idea is common - you run/kill services, not start/stop them. That's the reason we can leave the ugly and error-prone init scripts behind.
Which is what happens when you have every daemon writing their own PID handling code, running as root, in a language whose interpolation rules nobody really understands.