
Yes, their symlink handling is absurd, and they have the usual sanctimonious defense when questioned about this. To be fair, most commercial cloud services overthink this, and also get it wrong.

A symlink is just a file, like any other. It is a user's responsibility to ensure that a symlink will work elsewhere; cloud services should just copy the file, just like any other. The users who expect a symlink to trick a cloud service into special behaviour (like syncing folders elsewhere) are also wrong. You no more want a cloud service going off and "thinking" about a symlink than you want it going off and "thinking" about porn it finds on your computer. These are just files; their contents or meaning are absolutely not any cloud service's concern.

For example, a macOS application bundle typically includes internal symlinks that most users are unaware even exist. Yes, Dropbox breaks these, too. They tell you that their service is not intended for synchronizing entire file systems, as if properly handling an arbitrary file is somehow rocket science. I'll grant them that this is over their heads, but it's not hard.

There's exactly one sync program that gets it all right: Unison, written by Benjamin Pierce, a world-famous computer scientist. Yes, he actually thinks about this more clearly than any of the programmers of commercial services, and the best community ideas are adopted.

Unison handles symlinks properly.

Atomic directories are a relatively new feature in Unison: one can declare a directory atomic, forcing the user to choose at the directory level when there’s a conflict.

I declare .git directories atomic. A better example: a macOS .sparsebundle disk image appears as many files (bands) inside a directory, but is intended to be seen by the user as an atomic file, not a directory. The band layout makes backups more efficient: if one makes a minor change to a large mounted disk image (say, a few MB to a multiple GB disk image), then backup software isn’t forced to make a new copy of the entire multiple GB disk image.

If one makes minor changes to the same mounted disk image on two machines, and then does a two-way sync, one could buy the farm. Most likely, there will be a conflicting root file alerting one to the problem. Far cleaner to simply be forced to choose one disk image directory over the other. Functionally, the entire disk images are in conflict, not specific files within.

The ability to declare atomic directories is not a feature of any other two-way sync software, and it should be. A good heuristic: If a naive user can’t easily open a folder to reveal its contents (say, a Mac application bundle, or a sparse disk image) then the supporting directory should be treated as atomic by default.



You are taking strong positions on what is right or wrong when there is no objective answer and there are tradeoffs in both directions.

> It is a user's responsibility to ensure that a symlink will work elsewhere; cloud services should just copy the file, just like any other. The users who expect a symlink to trick a cloud service into special behaviour (like syncing folders elsewhere) are also wrong. You no more want a cloud service going off and "thinking" about a symlink than you want it going off and "thinking" about porn it finds on your computer. These are just files; their contents or meaning are absolutely not any cloud service's concern.

Unison seems to be more designed for personal sync'ing than collaboration. That informs decisions.

First off, Unison cannot sync symlinks to Windows (or at least WinXP since they didn't exist until Vista) (http://www.cis.upenn.edu/~bcpierce/unison/download/releases/...).

If you take your position that symlinks should be sync'd opaquely (as a file with a symlink target), collaborative relationships that involve symlinks are broken without much warning when a collaborator uses Windows XP (the most popular OS when Dropbox came out!).

At least following them (which Unison does allow with a setting) ensures Windows collaborators can see them.

Dropbox could probably make the change now (few XP users remain), but you have legacy issues of existing users relying on the "follow" behavior to sync content outside of their Dropbox.

> One can declare a directory atomic, forcing the user to choose at the directory level when there’s a conflict.

Not forcing users to make choices before upsync occurs is an explicit design decision of Dropbox. It ensures that if I power on my computer after being offline, things will quickly sync to the cloud -- I don't need to spend time making decisions.


>symlinks to Windows (or at least WinXP since they didn't exist until Vista)

The version of NTFS in Windows XP and the version of Windows Explorer both supported symlinks just fine. There was just no user-mode API to create them. You could use a kernel driver to make them, or even mount the disk offline and make them.


Great point (found tons of resources here: http://schinagl.priv.at/nt/hardlinkshellext/hardlinkshellext...)

Still, there are a lot of restrictions here that would make me, in Dropbox's shoes circa 2008, not attempt to support them:

1. Kernel driver requirement requires admin rights which hurts installing ability

2. General instability (looks like applications might not respect symlinks right -- e.g. deletes could recursively delete contents within the symlink)

3. Won't work with users running older NTFS or FAT32 (e.g. XP upgrades) -- not sure how common that was in 2008 though.


> windows XP (the most popular OS when Dropbox came out!).

Symlinks actually worked in Dropbox when I used to use it in 2012...


Define "work"? Dropbox has never sync'd a symlink as a symlink - it follows them on the client machine and a copy is created on the server.

(Source: worked there)


When did you work there?


Hmm I'm pretty sure it used to. I used it and if it had followed symlinks that would have broken things for me and I wouldn't have used it.


"To be fair, most commercial cloud services overthink this, and also get it wrong."

  ssh user@rsync.net ls -asl some/dir
... looking good ...

"There's exactly one sync program that gets it all right: Unison, written by Benjamin Pierce"

  ssh user@rsync.net unison

  Usage: unison [options]
  or unison root1 root2 [options]
  or unison profilename [options]
(Ask about the HN readers' discount)


I tried to use unison for serious production work a few years ago, and it fell short. It would fail or become incredibly slow with large (but not huge) amounts of data. A pity because the concept is great. I wonder if it has improved since.


"I wonder if it has improved since."

Unison is very interesting and it is, indeed, very special in that it solves the very specific use-case of my parent post.

However, my own opinion, and that of just about everyone who cares about backup tools is that 'borg' is the "one true way":

https://www.stavros.io/posts/holy-grail-backups/


I played with borg some in the early days and was unimpressed with some of the methodology and code quality displayed on public forums like GitHub. This kind of archival work MUST be correct and carefully designed due to its sensitive nature. I hope I'm not giving the project an unfair shake here, but I checked it again recently and the first several lines of the GitHub page list 3-4 fairly recent versions that have caveats around data corruption and the like ...

This is not really isolated to borg, so I don't want to pick on them too much (plz shield your eyes in the direction of rclone...), but calling it the "holy grail" is a bit rich IMO. This kind of stuff is simply not industrial grade software.


Any idea if this benchmark is still accurate? https://github.com/gilbertchen/benchmarking

('attic' on there is actually borg)

Borg has a lot going for it, but I'm hesitant to call something a holy grail or one true way if it can't do compact diffs. Especially when it needs active server software, which makes pruning much easier.


What about restic?

https://restic.net/


Yes, we support restic. I don't have anything interesting to say about borg vs. restic ...


You can work around failures of large initial unison syncs by creating a partial directory tree on the target manually.

It treats the creation of a new directory as an atomic operation, and rolls it back on failure.

(So, if the initial sync will take a week, then precreate the top level or two of the directory hierarchy...)

It’s an annoying problem, but after the initial sync, partial syncs are fast and reliable.
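
For anyone wanting to try that workaround, here is a rough sketch of pre-creating the top levels of the tree. It assumes both roots are reachable as local paths (for a remote root you would have to run the mkdir side over ssh yourself), the helper name precreate_dirs is made up, and -maxdepth is a GNU/BSD find extension rather than POSIX.

```shell
# Hypothetical helper: mirror only the first DEPTH levels of directories
# from SRC into DST, so the initial sync has smaller units to roll back.
precreate_dirs() {
    src=$1 dst=$2 depth=$3
    ( cd "$src" && find . -maxdepth "$depth" -type d ) |
    while IFS= read -r dir; do
        mkdir -p "$dst/$dir"
    done
}

# Example with throwaway directories: only two levels get created.
work=$(mktemp -d)
mkdir -p "$work/src/photos/2019/raw" "$work/dst"
precreate_dirs "$work/src" "$work/dst" 2
find "$work/dst" -type d | sort
rm -rf "$work"
```

Only directories are created, no file contents, so the sync tool still transfers and verifies every file itself.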


I have always been intrigued by your service, and will probably try it at some point for a smallish setup for important data, but whenever I think of backups in the TB range, it gets way out of budget (even after accounting for borg or other discounts). Do you have any plans to compete on pricing with other popular cloud/storage providers? I understand that this is a different product for a different audience.


"Do you have any plans to compete on pricing with other popular cloud/storage providers?"

We try to keep our pricing very roughly in line with Amazon S3. If you know to look for the discount signup rates it should be very slightly cheaper than S3.

We can't promise customers support from real UNIX engineers and also match B2 pricing. I'm happy with that and plan to continue on that path ...

We would be very happy to have you.


> If you know to look for the discount signup rates it should be very slightly cheaper than S3.

I'm interested. Do you mean the special borg account or should I look more? Thanks!


> The users who expect a symlink to trick a cloud service into special behaviour (like syncing folders elsewhere) are also wrong.

I use neither Dropbox nor Unison, but on the treatment of symbolic links: every program that somehow manages files has some sort of configuration option to decide whether or not to follow symbolic links. It's not about tricking a service into special behaviour. Following the links and not following them are each appropriate behaviour in different situations.

In the case of Unison, it chooses not to follow by default, but it also can be configured to follow as described here:

https://www.cis.upenn.edu/~bcpierce/unison/download/releases...

Furthermore, when the destination host is a Windows system, Unison refuses not to follow a symbolic link, since Windows doesn't support symbolic links.


"symlink is just a file…"

A file for which you exclusively use lstat() & readlink() instead of all the other usual functions in the standard library you use for regular files. What about named pipes? Or sockets? Or whiteout? Are they too just files since they have a filename?
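
To make the distinction concrete, here is a small sketch of how a tool might tell these file types apart without following links. The function name classify is made up for illustration. The symlink test has to come first, because the other test operators follow symlinks and would report on the target instead.

```shell
# Hypothetical helper: classify a path without following symlinks.
classify() {
    if   [ -L "$1" ]; then echo "symlink -> $(readlink "$1")"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "fifo"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -c "$1" ]; then echo "character device"
    elif [ -b "$1" ]; then echo "block device"
    elif [ -f "$1" ]; then echo "regular file"
    else echo "unknown"
    fi
}

# Throwaway examples: a regular file, a symlink to it, and a named pipe.
demo=$(mktemp -d)
echo data > "$demo/regular"
ln -s regular "$demo/link"
mkfifo "$demo/pipe"
for p in "$demo/regular" "$demo/link" "$demo/pipe"; do
    printf '%s: %s\n' "${p##*/}" "$(classify "$p")"
done
rm -rf "$demo"
```

Only the first two have contents a sync tool could meaningfully copy; what to do with the pipe is a policy decision either way.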


"Please wait: backing up /dev/urandom - 0% of NaN"

(To be fair, anyone who tries to back up /dev deserves whatever they get...)


  $ echo /dev/urandom | cpio -o | xxd -a
  1 block
  00000000: 3037 3037 3037 3737 3737 3737 3030 3030  0707077777770000
  00000010: 3031 3032 3036 3636 3030 3030 3030 3030  0102066600000000
  00000020: 3030 3030 3030 3030 3031 3737 3737 3737  0000000001777777
  00000030: 3133 3436 3430 3530 3035 3730 3030 3031  1346405005700001
  00000040: 3530 3030 3030 3030 3030 3030 2f64 6576  500000000000/dev
  00000050: 2f75 7261 6e64 6f6d 0030 3730 3730 3730  /urandom.0707070
  00000060: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
  00000070: 3030 3030 3030 3030 3030 3030 3030 3030  0000000000000000
  00000080: 3030 3130 3030 3030 3030 3030 3030 3030  0010000000000000
  00000090: 3030 3030 3030 3030 3133 3030 3030 3030  0000000013000000
  000000a0: 3030 3030 3054 5241 494c 4552 2121 2100  00000TRAILER!!!.
  000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  *
  000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  $ echo $?
  0


I chuckled at the "TRAILER!!!". I wonder what other file formats have silly things like this.

From `man 5 cpio`:

> The end of the archive is indicated by a special record with the pathname “TRAILER!!!”.

This makes me wonder if this means that the cpio format can't reliably be used for files with the path "TRAILER!!!". Why would the format even need a special record to indicate the end of the archive? Is there any reason why one wouldn't be able to rely on the end of the file to indicate the end of the archive?


Seems cpio really can't deal with "TRAILER!!!" files:

  $ cd $(mktemp -d)
  $ echo foo > 'TRAILER!!!'     
  $ echo bar > barfile
  $ echo 'TRAILER!!!' | cpio -o | xxd -a
  1 block
  00000000: c771 2a00 31cc a481 e803 e803 0100 0000  .q*.1...........
  00000010: 025d ef8c 0b00 0000 0400 5452 4149 4c45  .]........TRAILE
  00000020: 5221 2121 0000 666f 6f0a c771 0000 0000  R!!!..foo..q....
  00000030: 0000 0000 0000 0100 0000 0000 0000 0b00  ................
  00000040: 0000 0000 5452 4149 4c45 5221 2121 0000  ....TRAILER!!!..
  00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  *
  000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
  $ echo 'TRAILER!!!' | cpio -o | cpio -t
  1 block
  1 block
  $ echo barfile | cpio -o | cpio -t
  1 block
  barfile
  1 block
  $ echo barfile | cpio -o | cpio -i --to-stdout barfile 
  1 block
  bar
  1 block
  $ echo 'TRAILER!!!' | cpio -o | cpio -i --to-stdout 'TRAILER!!!'
  1 block
  1 block


tar has two blocks of zeros at the end.

GNU tar uses entries where other tar implementations will overwrite previous files on extraction, AIX tar uses special four character names for binary blobs (plus xattrs as names, both after the affected entry), Solaris cpio uses another mode bit for sparse files, ACLs, and xattrs (I actually really like this, it extracts as a simple to parse text file[0] on other implementations), and most pax commands on Linux can't read/write PAX archive files.

FWIW I make use of TRAILER!!! to let me know in a pipeline that everything worked (imagine a command earlier in the pipe fails early and not every entry is processed but ends on a boundary), but here the two pages of zeros that tar uses makes more sense (about the only thing tar does better than cpio).

0: https://www.mail-archive.com/opensolaris-arc@mail.opensolari...
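
The tar end-of-archive marker mentioned above is easy to check for yourself. This is just a sketch assuming whatever tar is on your PATH; the temporary file names are arbitrary.

```shell
demo=$(mktemp -d)
echo hello > "$demo/file"
tar -cf "$demo/a.tar" -C "$demo" file

size=$(wc -c < "$demo/a.tar")
# Strip spaces, newlines and zero digits from a hex dump of the last
# 1024 bytes; anything left over means a non-zero byte was present.
residue=$(tail -c 1024 "$demo/a.tar" | od -A n -v -t x1 | tr -d ' \n0')

if [ $(( size % 512 )) -eq 0 ]; then
    echo "archive size is a multiple of 512"
fi
if [ -z "$residue" ]; then
    echo "last two 512-byte blocks are all zeros"
fi
rm -rf "$demo"
```

Implementations usually pad further, up to a full record, so the file tends to end in more than two zero blocks.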


How does this work? Looks like a bug in xxd to me (o.O)


Nope xxd is fine, read the man page.

  $ PAGER='col -bx' man 5 cpio | awk '$1 == "mode" { print; while (getline) print }' | head
  mode    The mode specifies both the regular permissions and the file
          type.  It consists of several bit fields as follows:
          0170000  This masks the file type bits.
          0140000  File type value for sockets.
          0120000  File type value for symbolic links.  For symbolic links,
                   the link body is stored as file data.
          0100000  File type value for regular files.
          0060000  File type value for block special devices.
          0040000  File type value for directories.
          0020000  File type value for character special devices.
  $ echo /dev/urandom | cpio -o | dd bs=6 skip=3 count=1 | od -tc
  1 block
  1+0 records in
  1+0 records out
  6 bytes transferred in 0.000025 secs (239675 bytes/sec)
  0000000    0   2   0   6   6   6                                        
  0000006
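
If you'd rather not do the octal in your head, here is a sketch that decodes that mode field using the bit masks from the man page excerpt. The helper name decode_mode is made up, and it only knows the file types listed above.

```shell
# Hypothetical helper: decode the octal mode field of a cpio header.
decode_mode() (
    mode=$(( 0$1 ))                             # leading 0 makes it octal
    fmt=$(printf '%o' $(( mode & 0170000 )))    # file type bits
    perm=$(( mode & 07777 ))                    # permission bits
    case $fmt in
        140000) kind="socket" ;;
        120000) kind="symbolic link" ;;
        100000) kind="regular file" ;;
        60000)  kind="block device" ;;
        40000)  kind="directory" ;;
        20000)  kind="character device" ;;
        *)      kind="unknown" ;;
    esac
    printf '%s %04o\n' "$kind" "$perm"
)

decode_mode 020666    # the mode field from the /dev/urandom header
```

For 020666 this prints "character device 0666", which is why cpio stored only a header and no data for /dev/urandom.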


SyncThing also gets this right.


Yes, I love syncthing for many reasons. Those complaining about UI support will not love syncthing any more than other replacements. Also, setting up ST on phone, AWS, and my desktop was an exercise in patience. I now have a config I'm happy with, but it took a lot of effort that I wouldn't expect from just anyone.


So a file structure of a cyclic graph of symlinks is wrong and shall simply just break and blame the user?

Or one where you have one large file and symlinks of it all over the place? Should it just explode in size unexpectedly?
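
For reference, a quick sketch of the two behaviours being argued about, on a two-link cycle. The helper copy_link is made up and stands in for "sync the link as a link"; it is not how any particular service does it.

```shell
demo=$(mktemp -d)
ln -s b "$demo/a"     # a -> b
ln -s a "$demo/b"     # b -> a, completing the cycle

# Following the link: resolution never reaches a real file, so it fails.
if cat "$demo/a" >/dev/null 2>&1; then
    followed=ok
else
    followed=failed
fi

# Copying the link itself: read its target text and recreate it verbatim.
copy_link() { ln -s "$(readlink "$1")" "$2"; }
copy_link "$demo/a" "$demo/a.copy"
copied=$(readlink "$demo/a.copy")

echo "follow: $followed, opaque copy points to: $copied"
rm -rf "$demo"
```

The opaque copy is a few bytes of target text no matter how large the target is, so the cycle and the size blow-up only arise when links are followed.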


I totally agree with this. Perforce gets it right for what it is worth.


I'll take "sentences I never thought I'd read" for $800, Alex!


This is a perfect description of my issue with their symlink handling, and their arrogant response to the complaint about it.


Ideally all these choices you mention would be configurable settings...



