> Like all my apps, Hyperspace is a bit difficult to explain. I’ve attempted to do so, at length, in the Hyperspace documentation. I hope it makes enough sense to enough people that it will be a useful addition to the Mac ecosystem.
Am I missing something, or isn't it a "file de-duplicator" with a nice UI/UX? Sounds pretty simple to describe, and tells you why it's useful with just two words.
No, because it isn't getting rid of the duplicate; it's using a feature of APFS that allows duplicates to exist separately but share the same internal data.
My understanding is that it is a copy-on-write clone, not a hard link. [1]
> Q: Are clone files the same thing as symbolic links or hard links?
> A: No. Symbolic links ("symlinks") and hard links are ways to make two entries in the file system that share the same data. This might sound like the same thing as the space-saving clones used by Hyperspace, but there’s one important difference. With symlinks and hard links, a change to one of the files affects all the files.
> The space-saving clones made by Hyperspace are different. Changes to one clone file do not affect other files. Cloned files should look and behave exactly the same as they did before they were converted into clones.
What kind of changes could you make to one clone that would still qualify it as a clone? If there are changes, it's no longer the same file. Even after reading the How It Works[0] link, I'm not grokking how it works. Is it making some sort of delta/diff that is applied to the original file? That's not possible for every file format, like large media files. I could see that being interesting for text-based files, but that gets complicated for complex files.
If I understand correctly, a COW clone references the same contents (just like a hardlink) as long as all the filesystem references are pointing to identical file contents.
Once you open one of the reference handles and modify the contents, the copy-on-write process is invoked by the filesystem, and the underlying data is copied into a new, separate file with your new changes, breaking the link.
Comparing with a hardlink, there is no copy-on-write, so any changes made to the contents when editing the file opened from one reference would also show up if you open the other hardlinks to the same file contents.
Almost, but the difference is that if you change one of the hardlinked files, you change "all of them". (It's really the same file but with different paths.)
With a hard link, the content of each of the two 'files' is identical in perpetuity.
With APFS Clones, the contents start off identical, but can be changed independently. If you change a small part of a file, those block(s) will need to be created, but the existing blocks will continue to be shared with the clone.
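To make the block-sharing idea concrete, here is a toy model in Python. It is only a sketch of the concept, not how APFS stores anything on disk: `ToyFile`, `BLOCK_SIZE`, and `shared_blocks` are made-up names, and a "block" is just a small `bytes` object. A clone copies the list of block references, and a write swaps in one new block while the rest stay shared.

```python
BLOCK_SIZE = 4  # tiny, hypothetical block size so the example is easy to follow


class ToyFile:
    """A file modeled as a list of references to immutable blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)

    @classmethod
    def from_bytes(cls, data):
        # Split the data into fixed-size blocks.
        return cls(data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE))

    def clone(self):
        # A clone copies only the list of block references, not the blocks.
        return ToyFile(self.blocks)

    def write_block(self, index, data):
        # Copy-on-write: this file now points at a new block;
        # any other file still references the old one.
        self.blocks[index] = bytes(data)

    def read(self):
        return b"".join(self.blocks)


def shared_blocks(a, b):
    """Count positions where both files reference the very same block object."""
    return sum(1 for x, y in zip(a.blocks, b.blocks) if x is y)
```

With a three-block file, cloning it and then rewriting the middle block of the clone leaves the original unchanged and two of the three blocks still shared between the two files.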
It’s not the same because clones can have separate metadata; in addition, if a cloned file changes, it stores a diff of the changes from the original.
Replacing duplicates with hard links would be extremely dangerous. Software which expects to be able to modify file A without modifying previously-identical file B would break.
Right, but the concept is the same, "remove duplicates" in order to save storage space. Whether it's using reflinks, softlinks, APFS clones or whatever is more or less an implementation detail.
I know that internally it isn't actually "removing" anything, and that it uses fancy new technology from Apple. But in order to explain the project to strangers, I think my tagline gets the point across pretty well.
> Right, but the concept is the same, "remove duplicates" in order to save storage space.
The duplicates aren't removed, though. Nothing changes from the POV of users or software that use those files, and you can continue to make changes to them independently.
De-duplication does not mean the duplicates completely disappear. If I download a deduplication utility I expect it to create some sort of soft/hard link. I definitely don’t want it to completely remove random files on the filesystem, that’s just going to wreak havoc.
But it can still wreak havoc if you use hardlinks or softlinks, because maybe there was a good reason for having a duplicate file! Imagine you have a photo, “foo.jpg.” You make a copy of it, “foo2.jpg.” You’re planning on editing that file, but right now, it’s a duplicate. At this point you run your “deduper” that turns the second file into a hardlink. Then a few days later you go and edit the file, but wait, the original “backup” file is now modified too! You lost your original.
That’s why copy-on-write clones are completely different from hardlinks.
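The foo.jpg scenario can be reproduced with a few lines of Python. This is a sketch under some assumptions: the function name is made up, the "photo" is just a short byte string, and the edit is done in place (an editor that saves by writing a new file and renaming it over the old one would break the link instead of modifying the shared data). It only shows the hard-link failure mode; making a real copy-on-write clone needs filesystem support such as APFS and is not attempted here.

```python
import os
import shutil
import tempfile


def dedupe_with_hardlink_then_edit():
    """Return (same_file, original_bytes) after editing the hard-linked duplicate."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "foo.jpg")
        working = os.path.join(tmp, "foo2.jpg")

        with open(original, "wb") as f:
            f.write(b"original pixels")
        shutil.copyfile(original, working)  # the user's deliberate duplicate

        # What a naive hard-link "deduper" would do: drop the duplicate and
        # re-create its name as a second directory entry for the same file.
        os.remove(working)
        os.link(original, working)
        same_file = os.path.samefile(original, working)

        # Days later: edit the "working copy" in place.
        with open(working, "r+b") as f:
            f.write(b"EDITED!!")

        # Read back the file the user thought was an untouched backup.
        with open(original, "rb") as f:
            return same_file, f.read()
```

Calling `dedupe_with_hardlink_then_edit()` shows that both names refer to the same file and that the bytes read through `foo.jpg` now contain the edit made through `foo2.jpg`.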
The author of the software is a file system enthusiast (so much so that the podcast he's part of has a dedicated sound effect for every time "filesystem" comes up), a long-time blogger, and a macOS reviewer. Seen in that context, documenting every bit of it, including the technical details behind it, is important to him, even if that runs longer than a tagline on a landing page.
In times where documentation is often an afterthought, and technical details get hidden away from users all the time ("Ooops some error occurred") this should be celebrated.
Judging by this sub-thread, the process really is harder to explain than it appears on the surface. The basic idea is simple but the implementation requires deeper knowledge.
But why would you discuss the implementation with end users who probably wouldn't even understand what "implementation" means? The discussion you see in this subthread is not one that would appear on less-technical forums, and I wouldn't draw any broader conclusions from HN conversations in general.
Because the implementation leaks to the user experience. The user at least needs to know whether after running the utility, the duplicate files will be gone, or whether changing one of the files will change the other.
Symbolic links, hard links, ref links are all part of the file system interface, not the implementation.