Yes, their symlink handling is absurd, and they have the usual sanctimonious defense when questioned about this. To be fair, most commercial cloud services overthink this, and also get it wrong.
A symlink is just a file, like any other. It is a user's responsibility to ensure that a symlink will work elsewhere; cloud services should just copy the file, just like any other. The users who expect a symlink to trick a cloud service into special behaviour (like syncing folders elsewhere) are also wrong. You no more want a cloud service going off and "thinking" about a symlink than you want it going off and "thinking" about porn it finds on your computer. These are just files; their contents or meaning are absolutely not any cloud service's concern.
For example, a macOS application bundle typically includes internal symlinks that most users are unaware even exist. Yes, Dropbox breaks these, too. They tell you that their service is not intended for synchronizing entire file systems, as if properly handling an arbitrary file is somehow rocket science. I'll grant them that this is over their heads, but it's not hard.
There's exactly one sync program that gets it all right: Unison, written by Benjamin Pierce, a world-famous computer scientist. Yes, he actually thinks about this more clearly than any of the programmers of commercial services, and the best community ideas are adopted.
Unison handles symlinks properly.
Atomic directories are a relatively new feature in Unison: One can declare a directory atomic, forcing the user to choose at the directory level when there’s a conflict.
I declare .git directories atomic. A better example: A macOS .sparsebundle disk image appears on disk as a directory of many files (bands), but is intended to be seen by the user as an atomic file, not a directory. The band layout has the advantage of more efficient backups: If one makes a minor change to a large mounted disk image (say, a few MB to a multi-GB disk image) then backup software isn’t forced to make a new copy of the entire multi-GB disk image.
If one makes minor changes to the same mounted disk image on two machines, and then does a two-way sync, one could buy the farm. Most likely, there will be a conflicting root file alerting one to the problem. Far cleaner to simply be forced to choose one disk image directory over the other. Functionally, the entire disk images are in conflict, not specific files within.
The ability to declare atomic directories is not a feature of any other two-way sync software, and it should be. A good heuristic: If a naive user can’t easily open a folder to reveal its contents (say, a Mac application bundle, or a sparse disk image) then the supporting directory should be treated as atomic by default.
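To make the idea concrete, here is a minimal Python sketch of how a two-way sync tool could lift per-file changes to the nearest atomic ancestor before comparing the two sides. This is not Unison's implementation; `atomic_root`, `conflicts`, and the name and suffix rules are hypothetical and only illustrate the heuristic described above.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical rule: a directory is atomic if it is named ".git"
# or its name ends in one of these bundle-style suffixes.
ATOMIC_NAMES = {".git"}
ATOMIC_SUFFIXES = (".sparsebundle", ".app")


def atomic_root(path: str) -> str:
    """Return the outermost atomic ancestor of path, or path itself."""
    parts = PurePosixPath(path).parts
    for i, name in enumerate(parts):
        if name in ATOMIC_NAMES or name.endswith(ATOMIC_SUFFIXES):
            return str(PurePosixPath(*parts[: i + 1]))
    return path


def conflicts(changed_a: set[str], changed_b: set[str]) -> set[str]:
    """Paths changed on both sides, compared at atomic granularity."""
    roots_a = {atomic_root(p) for p in changed_a}
    roots_b = {atomic_root(p) for p in changed_b}
    return roots_a & roots_b


# Different bands changed on each machine: no single file conflicts,
# but the disk image as a whole does.
side_a = {"disk.sparsebundle/bands/1a", "notes.txt"}
side_b = {"disk.sparsebundle/bands/2f", "todo.txt"}
print(conflicts(side_a, side_b))
```

With a plain per-file comparison the two sides would look mergeable, which is exactly the case where merging bands from two machines corrupts the image.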
You are taking strong positions on what is right or wrong when there is no objective answer and there are tradeoffs in both directions.
> It is a user's responsibility to ensure that a symlink will work elsewhere; cloud services should just copy the file, just like any other. The users who expect a symlink to trick a cloud service into special behaviour (like syncing folders elsewhere) are also wrong. You no more want a cloud service going off and "thinking" about a symlink than you want it going off and "thinking" about porn it finds on your computer. These are just files; their contents or meaning are absolutely not any cloud service's concern.
Unison seems to be more designed for personal sync'ing than collaboration. That informs decisions.
If you take the position that symlinks should be sync'd opaquely (as a file with a symlink target), collaborative relationships that involve symlinks break without much warning when a collaborator uses Windows XP (the most popular OS when Dropbox came out!).
At least following them (which Unison does allow with a setting) ensures Windows collaborators can see them.
Dropbox could probably make the change now (few XP users remain), but you have legacy issues of existing users relying on the "follow" behavior to sync content outside of their Dropbox.
> One can declare a directory atomic, forcing the user to choose at the directory level when there’s a conflict.
Not forcing users to make choices before upsync occurs is an explicit design decision of Dropbox. It ensures that if I power on my computer after being offline, things will quickly sync to the cloud -- I don't need to spend time making decisions.
>symlinks to Windows (or at least WinXP since they didn't exist until Vista)
The version of NTFS in Windows XP and the version of Windows Explorer both supported symlinks just fine. There was just no user-mode API to create them. You could use a kernel driver to make them, or even mount the disk offline and make them.
I tried to use Unison for serious production work a few years ago, and it fell short. It would fail or become incredibly slow with large (but not huge) amounts of data. A pity because the concept is great. I wonder if it has improved since.
I played with borg some in the early days and was unimpressed with some of the methodology and code quality displayed on public forums like GitHub. This kind of archival work MUST be correct and carefully designed due to its sensitive nature. I hope I'm not giving the project an unfair shake here, but I checked it again recently and the first several lines of the GitHub page list 3-4 fairly recent versions that have caveats around data corruption and the like ...
This is not really isolated to borg, so I don't want to pick on them too much (plz shield your eyes in the direction of rclone...), but calling it the "holy grail" is a bit rich IMO. This kind of stuff is simply not industrial grade software.
Borg has a lot going for it, but I'm hesitant to call something a holy grail or one true way if it can't do compact diffs. Especially when it needs active server software, which makes pruning much easier.
I have always been intrigued by your service, and will probably try it at some point for a smallish setup for important data, but whenever I think of backups in the TB range, it gets way out of budget (even after accounting for borg or other discounts). Do you have any plans to compete on pricing with other popular cloud/storage providers? I understand that this is a different product for a different audience.
"Do you have any plans to compete on pricing with other popular cloud/storage providers?"
We try to keep our pricing very roughly in line with Amazon S3. If you know to look for the discount signup rates it should be very slightly cheaper than S3.
We can't promise customers support from real UNIX engineers and also match B2 pricing. I'm happy with that and plan to continue on that path ...
> The users who expect a symlink to trick a cloud service into special behaviour (like syncing folders elsewhere) are also wrong.
I use neither Dropbox nor Unison, but on the treatment of symbolic links, every program that somehow manages files has some sort of configuration option to decide whether or not to follow symbolic links. It's not about tricking a service into special behaviour. It's just that following and not following the links are each appropriate behaviour in different situations.
In the case of Unison, it chooses not to follow by default, but it also can be configured to follow as described here:
Furthermore, when the destination host is a Windows system, Unison refuses not to follow a symbolic link, since Windows doesn't support symbolic links.
A file for which you exclusively use lstat() & readlink() instead of all the other usual functions in the standard library you use for regular files. What about named pipes? Or sockets? Or whiteout? Are they too just files since they have a filename?
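The difference is easy to show in Python, whose `os.lstat` and `os.readlink` wrap the calls named above. This is only a sketch: `copy_entry` and its `follow` flag are hypothetical names, not any sync tool's API, and it assumes a POSIX system where creating symlinks is allowed.

```python
import os
import shutil
import stat


def copy_entry(src: str, dst: str, follow: bool = False) -> str:
    """Copy one directory entry; return a label for what was done."""
    mode = os.lstat(src).st_mode  # lstat inspects the link itself
    if stat.S_ISLNK(mode) and not follow:
        # Opaque handling: recreate the link with the same target string,
        # whether or not that target exists on the destination.
        os.symlink(os.readlink(src), dst)
        return "symlink"
    if stat.S_ISLNK(mode):
        # stat follows the link; it raises FileNotFoundError if it dangles.
        mode = os.stat(src).st_mode
    if stat.S_ISREG(mode):
        shutil.copyfile(src, dst)  # copyfile follows symlinks by default
        return "regular"
    # Directories, FIFOs, sockets and devices are not handled in this sketch.
    return "skipped"
```

Even in this toy version the symlink needs its own branch, and FIFOs, sockets and devices have no sensible "copy the bytes" meaning at all, which is the point of the question above.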
I chuckled at the "TRAILER!!!". I wonder what other file formats have silly things like this.
From `man 5 cpio`:
> The end of the archive is indicated by a special record with the pathname “TRAILER!!!”.
This makes me wonder if this means that the cpio format can't reliably be used for files with the path "TRAILER!!!". Why would the format even need a special record to indicate the end of the archive? Is there any reason why one wouldn't be able to rely on the end of the file to indicate the end of the archive?
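A minimal Python sketch of a reader for the old portable ASCII (odc) layout documented in cpio(5) shows where the ambiguity would come from. `pack_entry` and `read_names` are made-up helper names, and stopping at the first entry with that name is an assumption about a naive reader, not a claim about what GNU or BSD cpio actually do. As for relying on end of file: as far as I know cpio pads its output to a block size, so the bytes after the last entry are not necessarily the end of the stream.

```python
import io

MAGIC = b"070707"
TRAILER = "TRAILER!!!"
# (field name, width in bytes) for the 76-byte odc header, all octal ASCII.
FIELDS = [("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6), ("uid", 6),
          ("gid", 6), ("nlink", 6), ("rdev", 6), ("mtime", 11),
          ("namesize", 6), ("filesize", 11)]


def pack_entry(name: str, data: bytes = b"", mode: int = 0o100644) -> bytes:
    """Build one odc entry; fields other than those set below are zeroed."""
    raw_name = name.encode() + b"\0"  # namesize counts the trailing NUL
    values = {"magic": 0o070707, "mode": mode, "nlink": 1,
              "namesize": len(raw_name), "filesize": len(data)}
    header = "".join(f"{values.get(field, 0):0{width}o}"
                     for field, width in FIELDS)
    return header.encode() + raw_name + data


def read_names(stream):
    """Yield entry names until a record named TRAILER!!! is seen."""
    while True:
        header = stream.read(76)
        if len(header) < 76 or header[:6] != MAGIC:
            raise ValueError("truncated or non-odc archive")
        namesize = int(header[59:65], 8)   # offset 59, width 6
        filesize = int(header[65:76], 8)   # offset 65, width 11
        name = stream.read(namesize)[:-1].decode()  # drop the NUL
        if name == TRAILER:
            return  # whatever follows is treated as padding
        stream.read(filesize)  # skip the file data
        yield name


archive = (pack_entry("a.txt", b"hi")
           + pack_entry(TRAILER, mode=0)
           + pack_entry("after.txt", b"x"))
print(list(read_names(io.BytesIO(archive))))
```

A reader written this way never sees `after.txt`, so a real file with that path would cut the listing short unless the implementation special-cases it.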
GNU tar uses entries where other tar implementations will overwrite previous files on extraction, AIX tar uses special four character names for binary blobs (plus xattrs as names, both after the affected entry), Solaris cpio uses another mode bit for sparse files, ACLs, and xattrs (I actually really like this, it extracts as a simple to parse text file[0] on other implementations), and most pax commands on Linux can't read/write PAX archive files.
FWIW I make use of TRAILER!!! to let me know in a pipeline that everything worked (imagine a command earlier in the pipe fails early and not every entry is processed but the output ends on a boundary), but here the two pages of zeros that tar uses make more sense (about the only thing tar does better than cpio).
    $ PAGER='col -bx' man 5 cpio | awk '$1 == "mode" { print; while (getline) print }' | head
    mode    The mode specifies both the regular permissions and the file
            type.  It consists of several bit fields as follows:
            0170000  This masks the file type bits.
            0140000  File type value for sockets.
            0120000  File type value for symbolic links.  For symbolic links,
                     the link body is stored as file data.
            0100000  File type value for regular files.
            0060000  File type value for block special devices.
            0040000  File type value for directories.
            0020000  File type value for character special devices.
    $ echo /dev/urandom | cpio -o | dd bs=6 skip=3 count=1 | od -tc
    1 block
    1+0 records in
    1+0 records out
    6 bytes transferred in 0.000025 secs (239675 bytes/sec)
    0000000    0   2   0   6   6   6
    0000006
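The six digits printed by `od` are the header's mode field for /dev/urandom. A short Python sketch decodes them with the standard `stat` module, assuming the usual POSIX bit layout that the man page excerpt lists; the variable names are only illustrative.

```python
import stat

field = "020666"                # the six octal digits from the od output
mode = int(field, 8)

file_type = stat.S_IFMT(mode)   # same as mode & 0o170000
perms = stat.S_IMODE(mode)      # permission bits only

print(oct(file_type))           # 0o20000
print(stat.S_ISCHR(mode))       # True
print(oct(perms))               # 0o666
print(stat.filemode(mode))      # crw-rw-rw-
```

So the archive records a character special device with mode 0666, not the device's contents.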
Yes, I love syncthing for many reasons. Those complaining about UI support will not love syncthing any more than other replacements. Also, setting up ST on phone, AWS, and my desktop was an exercise in patience. I now have a config I'm happy with, but it took a lot of effort that I wouldn't expect from just anyone.