I hate it when blog posts do not have any date and time somewhere on the article page. The only indicator is that it was filed into "July 2016" in the archive sidebar.
It's kind of a gray area, but looking closely I find it leans heavily towards the buggy, "unexpected behavior" side of things.
Here's why: when encountering HTTP->HTTP redirects, wget already ignores the redirected file name and writes to the original file name that was provided in the URL. It was strangely inconsistent that it specifically didn't do this for HTTP->FTP redirects, and at the very least the behavior should be consistent by default. This is clarified in the linked release notes, and also explains why they had no hesitation in considering it a wget bug and fixing it immediately.
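To make the "consistent by default" policy concrete, here is a minimal Python sketch. It is not wget's actual code: `pick_local_name`, `last_segment`, and the `trust_redirect` parameter are hypothetical names, the URLs are placeholders, and the `index.html` fallback is only an assumed default.

```python
from urllib.parse import urlsplit
import posixpath


def last_segment(url: str, default: str = "index.html") -> str:
    """Return the final path segment of a URL, or a default if it is empty."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default


def pick_local_name(requested_url: str, final_url: str,
                    trust_redirect: bool = False) -> str:
    """Choose the local file name after redirects have been followed.

    By default the name comes from the URL the user typed, no matter
    whether the redirect target is http or ftp.
    """
    source = final_url if trust_redirect else requested_url
    return last_segment(source)


requested = "http://example.com/safe_file.txt"          # placeholder URL
redirected = "ftp://attacker.example/.bash_profile"     # placeholder URL

print(pick_local_name(requested, redirected))                       # safe_file.txt
print(pick_local_name(requested, redirected, trust_redirect=True))  # .bash_profile
```

The point of the sketch is only that the scheme of the redirect target never enters the decision; using the server's name is something the caller has to opt into.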
As for the question of whether using redirected file names is more expected and intuitive, IMO providing a URL without specifying an output filename is basically a shorthand that merges the two arguments into one.
Not at all. Though following redirects and not using the original "file" name by default may be a bad idea. The whole purpose of wget is to get a file from the web and store it locally, by default using the remote filename in the working directory.
It is important to distinguish between a remote filename supplied in a Content-Disposition header and the substring that can be extracted from the given URL by splitting at slashes.
wget by default uses the latter, appending the query string (?foo=bar) even when not intended. Same behavior for `curl -O`. You have to use `wget --content-disposition` or `curl -O -J` to explicitly trust the filename sent by the server.
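The two sources of a filename can be illustrated with a short Python sketch. This only mimics the behavior described above, it is not how wget or curl are implemented; `name_from_url` and `name_from_header` are hypothetical helper names, and the header parsing leans on the standard library's `email.message.Message`.

```python
from email.message import Message
from urllib.parse import urlsplit
import posixpath


def name_from_url(url: str, keep_query: bool = True) -> str:
    """Derive a name by splitting the URL path at slashes.

    With keep_query=True the query string is appended, as described above.
    """
    parts = urlsplit(url)
    name = posixpath.basename(parts.path)
    if keep_query and parts.query:
        name += "?" + parts.query
    return name


def name_from_header(content_disposition: str):
    """Extract the server-supplied filename from a Content-Disposition value.

    Returns None when the header carries no filename parameter.
    """
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    filename = msg.get_filename()
    # Drop any directory components the server may have included.
    return posixpath.basename(filename) if filename else None


print(name_from_url("http://example.com/download.php?foo=bar"))
print(name_from_header('attachment; filename="report.pdf"'))
```

Only the first name is under the user's control; the second is whatever the server chooses to send, which is why trusting it is an explicit opt-in.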
If you specifically issue the command as trying to download a file called "safe_file.txt", and then the actual file that gets downloaded is not only a different file but even a different file name, this is definitely unexpected and un-intuitive.
eh, the path the url contains isn't a file system path, it's a string of characters that a server can treat in a million different ways, so it's not exactly practical to rely on that
> A path, which contains data, usually organized in hierarchical form, that appears as a sequence of segments separated by slashes. Such a sequence may resemble or map exactly to a file system path, but does not always imply a relation to one. The path must begin with a single slash (/) if an authority part was present, and may also if one was not, but must not begin with a double slash.
You can rely on the path to be made up of parts that will not begin with a double slash, and will be separated by slashes, and be defined by a specific character set. URLs also contain a query and a fragment after the path, and the client can choose whether to keep those in the file name or not.
Wget has many options which define how file names are created, mainly to better serve in mirroring large complex websites. But in general, whatever the "path" was, the final part of the path will be the file name, or a default one is provided. There are options for enforcing suffixes, creating directory structures recursively, changing accepted character sets for file names, and hundreds more options.
You are taking my comment a bit too literally (especially since the only thing I was trying to convey was: url files != filesystem files). I don't mean defined behavior, I mean what is expected, since that is what you were talking about.
If we go with defined behavior, then wget using the server-provided name just like a web browser does (resolving the redirects, and then explicitly noting that the file is being saved with that new file name), is completely expected, no?
Web browsers have [generally] two settings for saving files: prompt the user, or save automatically. In either case they [generally] save to a "Downloads" folder rather than the home directory. So we can expect whatever happens to happen in the Downloads folder.
Wget downloads to the current directory by default, which is the home directory in a freshly opened shell unless you change directories first, and Wget almost never prompts the user for where or how to save the file. (I think I've seen Wget give a prompt once under a particularly strange set of circumstances, but that was while mirroring a huge, complex site.)
As an example of how this changes based on interpretation, a while back there was a "bug" introduced by systemd. It was mounting a logical filesystem read-write, which caused [in some circumstances] whole systems to be bricked due to a bug in some firmware drivers. Systemd's policy was "our software is working fine!" and refused to fix or prevent the behavior, even though the fix would not have negatively affected systemd's operation at all. In the current case, Wget sees the behavior as potentially dangerous and has thus fixed it to prevent people getting fucked over by unexpected behavior.
Another way to look at it: if this was expected, no user would ever download a file without the "-O" option, because to do so would potentially put them at risk.
> a while back there was a "bug" introduced by systemd. It was mounting a logical filesystem read-write
This is rather misleading. The efivarfs filesystem was mounted by the boot scripts of every Linux distro that supports booting from EFI firmware, because it is the only supported interface to install an EFI bootloader.
> refused to fix or prevent the behavior,
Neither have any of the Linux distros that don't use systemd.
> even though the fix would not have negatively affected systemd's operation at all.
But it would have broken various other tools that system administrators might want to run, as detailed by the author of efivarfs in these comments:
It's not a security hole and it's not a bug. Wget even prints out all the things that are happening. Everybody seems to be grasping at straws right now just to be able to make the next branded vulnerability.
The wget developers disagree; they call it a security vulnerability. I'm inclined to agree because, knowing that HTTP → HTTP redirections don't have this behaviour, it's dangerous for HTTP → FTP to have it.
Whether wget prints it out is irrelevant, as it's often used in scripts.
Well, "good" to see a bug that wouldn't have been solved by using Rust.
I had never really thought about this attack vector though. Writing to `.bash_profile` is effectively the same as writing executable code.
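For scripts that save server-named files, a defensive check along these lines is one option. This is a rough Python sketch under my own assumptions, not something wget does: `is_risky_name` is a hypothetical helper, and treating every dotfile as risky is a deliberately blunt rule.

```python
def is_risky_name(name: str) -> bool:
    """Return True if saving under this name in $HOME could be dangerous."""
    if name in ("", ".", ".."):
        return True
    if "/" in name or "\\" in name:
        return True  # could escape the intended directory
    # Hidden files include shell startup files such as .bash_profile.
    return name.startswith(".")


for candidate in ("safe_file.txt", ".bash_profile", "../.ssh/authorized_keys"):
    print(candidate, is_risky_name(candidate))
```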
Though I like the whole "everything is a file" concept (I know this isn't exactly that but...), it seems like we should be striving for something safer. For example, I guess in something like Plan9 you could have wget start playing sound out of your speakers? Crazy.
File handles shouldn't cause RCE vulnerabilities, right?
I suppose sandboxing is one way of dealing with this. Another is having to explicitly "give a file" to wget and co. for editing. None of this "oh, here you go program, write to anything in my home directory".
Peppering your description with 100 uses of the word "arbitrary" and "crafted" does not cause expected behaviour to become a security vulnerability! That's not a "crafted" location header, it's a perfectly normal location header.
There are better places than directly in `$HOME`. One is some abstracted service like windows registry. Another is just separating the configs better, like `$HOME/.config`. Both are good ideas.
I think you're wrong on both counts. Specifically:
1. It doesn't have to be writable for everything. The $HOME/.config gives a nice abstract separation of what's "mostly readonly configuration" and what's not. (home files, runtime data, etc.) While it doesn't buy anything right now, just separating the configs into a hierarchy allows us to do interesting things like system-wide notification/approvals of .config writes. It's not impossible in the flat world of "all dot-files live in $HOME", but it could make generic solutions easier.
2. Windows registry is just one implementation of an idea. While it cannot provide more protection than files at the moment, it does have ACL capabilities, so there's no reason a different implementation cannot extend this idea to checking process identifiers in some OS-specific way. For example imagine linux registry as a dbus service which can have calls filtered at selinux level. You can access /usr/bin/foo settings, /usr/bin/foo can read those settings, but /tmp/malicious/foo can neither read nor write them, even though it runs as your user.
> 1. It doesn't have to be writable for everything. The $HOME/.config gives a nice abstract separation of what's "mostly readonly configuration" and what's not. (home files, runtime data, etc.) While it doesn't buy anything right now, just separating the configs into a hierarchy allows us to do interesting things like system-wide notification/approvals of .config writes. It's not impossible in the flat world of "all dot-files live in $HOME", but it could make generic solutions easier.
If you're running as a regular system user, there is nothing you can do. All files you create are owned by you, readable by you, and modifiable by you. If we're talking about some kernel enhancements that do not yet exist, then yes, just about anything is possible. I was under the impression we were discussing things that are actually possible today on any old Linux install.
> 2. Windows registry is just one implementation of an idea. While it cannot provide more protection than files at the moment, it does have ACL capabilities, so there's no reason a different implementation cannot extend this idea to checking process identifiers in some OS-specific way. For example imagine linux registry as a dbus service which can have calls filtered at selinux level. You can access /usr/bin/foo settings, /usr/bin/foo can read those settings, but /tmp/malicious/foo can neither read nor write them, even though it runs as your user.
The linux filesystem also has ACL capabilities, much like the windows filesystem and registry. They all work on the user level, not on the application level. If the application /usr/bin/foo runs as my user, and somehow has some extra abilities than any other process running as my user, then I will just modify the memory of /usr/bin/foo after executing it, and leverage that process to access those "protected" files. This is a prime example of security through obscurity.
> I was under the impression we were discussing things that are actually possible today on any old Linux install.
Both apparmor/selinux (process/label security) and fusefs (virtual per user/group/process filesystem) already exist. They're not put together for this specific purpose, but it just needs some userland glue, not kernel modifications.
> They all work on the user level, not on the application level
That's true for ACLs as implemented by the filesystem. Not for LSMs, which see both the application and the user (and many other things)
> has some extra abilities than any other process running as my user, then I will just modify the memory of /usr/bin/foo after executing it
No you won't. Extra abilities usually translate to suid, which you cannot ptrace. (default in kernel) Without extra abilities, you can restrict ptrace/memory access as well. (sysctl kernel.yama.ptrace_scope and others)
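To see which ptrace restriction a given machine is running with, the Yama setting can be read from `/proc`. The Python sketch below assumes a Linux system with the Yama LSM enabled; the helper names are mine, and the descriptions paraphrase the kernel's Yama documentation.

```python
from pathlib import Path

PTRACE_SCOPE_FILE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Meanings paraphrased from the kernel's Yama documentation.
MEANINGS = {
    0: "classic: a process may attach to any other process of the same user",
    1: "restricted: a process may attach only to its descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled",
}


def describe_ptrace_scope(value: int) -> str:
    """Map a ptrace_scope value to a short description."""
    return MEANINGS.get(value, "unknown value")


def read_ptrace_scope(path: Path = PTRACE_SCOPE_FILE):
    """Return the current setting, or None if Yama is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


scope = read_ptrace_scope()
if scope is None:
    print("Yama ptrace_scope not available on this system")
else:
    print(scope, describe_ptrace_scope(scope))
```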
Also, you're talking about a different threat model. I described a system where you can change `foo`s and `bar`s configuration, but `foo` cannot change `bar`s config without going through external process (with possible approvals). Sure, you could create a `foo` which can mess with `bar` if you want - you're allowed to do that. What this system protects from is someone taking over the `foo` process (via exploit in it) and modifying it at runtime - this is not security through obscurity - this is normal ACL enforcement where process / execution domain is defined by both the binary and the user who runs it, not just the user.
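As a toy model of that idea, here is a Python sketch in which the access decision depends on both the user and the binary. Everything in it is hypothetical: the `Domain` type, the `POLICY` table, the user `alice`, and the `/usr/bin/config-editor` tool are made up for illustration, and a real system would enforce this in an LSM or a broker service rather than in a dictionary.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    """An execution domain: who runs the process and which binary it is."""
    user: str
    binary: str


# Hypothetical policy: (app whose settings are touched, domain) -> allowed actions.
POLICY = {
    ("foo", Domain("alice", "/usr/bin/foo")): {"read"},
    ("bar", Domain("alice", "/usr/bin/bar")): {"read"},
    ("foo", Domain("alice", "/usr/bin/config-editor")): {"read", "write"},
    ("bar", Domain("alice", "/usr/bin/config-editor")): {"read", "write"},
}


def allowed(domain: Domain, app: str, action: str) -> bool:
    """Check an action against the policy; anything unlisted is denied."""
    return action in POLICY.get((app, domain), set())


foo = Domain("alice", "/usr/bin/foo")
fake_foo = Domain("alice", "/tmp/malicious/foo")
editor = Domain("alice", "/usr/bin/config-editor")

print(allowed(foo, "foo", "read"))        # True: foo reads its own settings
print(allowed(foo, "bar", "write"))       # False: foo cannot change bar's config
print(allowed(fake_foo, "foo", "read"))   # False: same user, different binary
print(allowed(editor, "bar", "write"))    # True: the user's approved editing path
```

The same user appears in every row; only the binary differs, which is the distinction plain file ownership cannot express.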
If a binary has the suid flag set then it's not running as your own user, now is it?
> I described a system where you can change `foo`s and `bar`s configuration, but `foo` cannot change `bar`s config without going through external process (with possible approvals). Sure, you could create a `foo` which can mess with `bar` if you want - you're allowed to do that. What this system protects from is someone taking over the `foo` process (via exploit in it) and modifying it at runtime - this is not security through obscurity - this is normal ACL enforcement where process / execution domain is defined by both the binary and the user who runs it, not just the user.
Except it doesn't protect against that. Someone could exploit foo, have foo modify bar's memory and merrily carry on from there. Unless the two binaries run as different users, or are sandboxed. But at this point we are so far from your initial point of simply moving files from $HOME that I'm not even sure what we're discussing.
> Someone could exploit foo, have foo modify bar's memory and merrily carry on from there.
I recommend doing some reading about LSM (selinux/apparmor) and/or grsec. This is exactly what they can prevent, even if you run foo and bar as the same user.
> But at this point we are so far from your initial point of simply moving files from $HOME that I'm not even sure what we're discussing.
Per-process / process group configs. You can (today, with tools you likely have already installed on an everyday linux distro) implement a basic system where the view into .config is limited per process and processes can read-but-not-modify settings, even though you (user who executes those processes) can read and modify.
It's possible in flat $HOME, but hard. $HOME/.config makes it easier. Configuration as a registry service makes it trivial.
> I recommend doing some reading about LSM (selinux/apparmor) and/or grsec. This is exactly what they can prevent, even if you run foo and bar as the same user.
Thank you, but I am aware of these technologies. And I just checked on my Ubuntu laptop, and I can modify the memory of any process running under my user. I'm talking about distros that are actually in use by people today, without any exotic modifications. I am aware there are ways to block all of this (making your life hell in the process) using tools that may or may not already be available, but that is a little bit more invasive than your initial suggestion of "Let's move files out of $HOME". And now you agree that moving them out of $HOME is not even necessary. So it seems you've changed your point from "Let's move all configuration files out of $HOME" to "Let's bake in a lot more security". Can't argue against that.
> And I just checked on my Ubuntu laptop, and I can modify the memory of any process running under my user.
Yes, you can. And that wouldn't change. That's not the point of the modification.
But you're wrong in saying this is exotic or would make life hell. You're running Ubuntu. This is already happening and it seems you're not even aware of it. Run `apparmor_status` and see what profiles are already enforced. I'm pretty sure that you have /usr/bin/evince listed in there. (It's installed by default.)
Now, try to open any pdf in evince and save a copy to `~/.ssh`. Or `~/.mozilla`. Or `~/.config/chromium`. Or `~/.gnupg`. You have access to those directories as a user - so guess why it's failing? Apparmor does exactly what I described. You can still modify the memory of evince - do whatever you want to it, I'll wait. But it won't change the result - you can't write to those directories from evince itself. You can still save that pdf as anything in your $HOME however.
I didn't change my point. Security is not binary - there's no secure and insecure, but a gradient in between. Moving configuration out of home is one step of the big process which will make better security easier to apply.
$HOME/.config would somewhat mitigate this exploit. A user running wget is much more likely to be in their home directory than in .config, and a hidden script file is less dangerous if there's nothing around which will execute it.
This would still need fixing, but it would be a bit less dangerous.