> inside of Google, they are using databases to store filesystems. I'm strugglin...

marcyb5st · on April 15, 2021

Googler here. I think OP is referring to the relationship between Colossus and Bigtable [1]. Basically, Colossus uses Bigtable to store its metadata. So when you issue a command like `ls /some-cluster-directory/myfolder/mysubfolder`, you are really querying Bigtable rather than running a distributed ls on a cluster fs.

It is much more complicated than that, but the idea is that on top of Bigtable you build something that resembles a fs and, for the most part, feels like one to use.
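To make the `ls` point concrete, here is a minimal sketch of a directory listing served from a metadata table. It uses an in-memory sqlite table as a stand-in for Bigtable; the `meta` schema and the `make_store`/`add_file`/`ls` helpers are made up for illustration and say nothing about how Colossus actually lays out its metadata.

```python
import sqlite3


def make_store():
    """Create a toy metadata store: one row per file, keyed by (parent, name)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE meta ("
        " parent TEXT NOT NULL,"
        " name   TEXT NOT NULL,"
        " size   INTEGER NOT NULL,"
        " PRIMARY KEY (parent, name))"
    )
    return db


def add_file(db, path, size):
    """Record metadata for a file; no file contents are stored here."""
    parent, _, name = path.rpartition("/")
    db.execute("INSERT INTO meta VALUES (?, ?, ?)", (parent or "/", name, size))


def ls(db, directory):
    """'ls' becomes a keyed lookup on the metadata table, not a disk walk."""
    parent = directory.rstrip("/") or "/"
    rows = db.execute(
        "SELECT name FROM meta WHERE parent = ? ORDER BY name", (parent,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = make_store()
    add_file(db, "/some-cluster-directory/myfolder/mysubfolder/b.dat", 2048)
    add_file(db, "/some-cluster-directory/myfolder/mysubfolder/a.dat", 1024)
    print(ls(db, "/some-cluster-directory/myfolder/mysubfolder"))
```

The sketch only tracks files, not intermediate directories, and the file contents would live somewhere else entirely; the table answers "what is in this folder?" and nothing more.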

[1] http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Ke...

eru · on April 15, 2021

Yes, that's something like what I was going for.

(I'm an ex-Googler, but I am glad that my spotty memory gives me ample protection against accidentally giving out company secrets.)

I suspect Google Drive is also backed by something that's not a traditional file system, even though Google Drive tries to look a bit like a file system to the user.

iudqnolq · on April 15, 2021

From the outside, one clue is that you can have two files with the same name; another is that the URL to a file doesn't change when you move it. Also, it's dog slow to list a directory, like 30sec for 50 files.
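Both clues are what you would expect if files are keyed by an opaque ID and the name and folder are just attributes on the record. A minimal sketch of that idea follows; the `IdStore` class, its methods, and the `example.invalid` URL are all hypothetical, and this is a guess at the general shape, not a description of how Drive is built.

```python
import itertools


class IdStore:
    """Toy store where identity is an opaque ID, not a path."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.files = {}  # file id -> {"name": ..., "parent": ...}

    def create(self, name, parent):
        # Nothing forces (parent, name) to be unique; only the ID is.
        fid = f"f{next(self._ids)}"
        self.files[fid] = {"name": name, "parent": parent}
        return fid

    def move(self, fid, new_parent):
        # Moving only rewrites an attribute; the ID stays the same.
        self.files[fid]["parent"] = new_parent

    def url(self, fid):
        # The link is derived from the ID alone (hypothetical URL scheme).
        return f"https://example.invalid/file/{fid}"

    def list_dir(self, parent):
        # Listing is a scan over all records filtered by the parent attribute.
        return sorted(
            (fid, meta["name"])
            for fid, meta in self.files.items()
            if meta["parent"] == parent
        )


if __name__ == "__main__":
    store = IdStore()
    first = store.create("notes.txt", "root")
    second = store.create("notes.txt", "root")  # same name, same folder
    link = store.url(first)
    store.move(first, "archive")
    print(first != second, store.url(first) == link)
```

Note that `list_dir` here walks every record, so listing is the expensive operation in this layout. Whether anything like that explains the slow listings in practice is pure speculation.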

hansvm · on April 15, 2021

You can experiment with something kind of like this yourself pretty cheaply. Here's an example [0] user-space filesystem that stores its data in an in-memory sqlite database.

You could just as easily replace those in-memory calls with a networked DB (perhaps with speculative pre-fetching or something, I dunno, I probably wouldn't try to make a python filesystem too performant).

The salient detail here is that as far as your kernel is concerned a filesystem is an API for interacting with data (whether that's with a daemon process like the linked example or with raw function calls built into the kernel). Those APIs can and often do interact with structures physically stored on a local disk, but that isn't a requirement.
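As a rough illustration of "a filesystem is an API", here is a sketch of a few filesystem-style operations backed by sqlite instead of disk blocks. It does not mount anything and does not use FUSE; the `SqliteFS` class and its method names are invented for this example (loosely in the spirit of the callbacks a FUSE daemon implements), and the flat one-row-per-path schema is an assumption chosen for brevity.

```python
import sqlite3


class SqliteFS:
    """Filesystem-shaped API whose storage happens to be a database."""

    def __init__(self, db_path=":memory:"):
        # Swap db_path (or this whole connection) for a networked DB client
        # and the callers of this class would not need to change.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " path TEXT PRIMARY KEY,"
            " data BLOB NOT NULL)"
        )

    def _load(self, path):
        row = self.db.execute(
            "SELECT data FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return row[0]

    def create(self, path):
        self.db.execute("INSERT INTO files VALUES (?, ?)", (path, b""))

    def write(self, path, offset, buf):
        data = bytearray(self._load(path))
        if len(data) < offset:
            # Writing past the end zero-fills the gap, like a sparse file.
            data.extend(b"\0" * (offset - len(data)))
        data[offset:offset + len(buf)] = buf
        self.db.execute(
            "UPDATE files SET data = ? WHERE path = ?", (bytes(data), path)
        )
        return len(buf)

    def read(self, path, offset, size):
        return bytes(self._load(path)[offset:offset + size])

    def readdir(self, directory):
        # Directories are implied by path prefixes; return immediate children.
        prefix = directory.rstrip("/") + "/"
        names = set()
        for (path,) in self.db.execute("SELECT path FROM files"):
            if path.startswith(prefix):
                names.add(path[len(prefix):].split("/", 1)[0])
        return sorted(names)


if __name__ == "__main__":
    fs = SqliteFS()
    fs.create("/docs/hello.txt")
    fs.write("/docs/hello.txt", 0, b"hello")
    print(fs.read("/docs/hello.txt", 0, 5), fs.readdir("/docs"))
```

A real user-space filesystem would additionally translate kernel requests (inodes, file handles, permissions, and so on) into calls like these, which is the part the linked example handles.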

[0] http://www.rath.org/pyfuse3-docs/example.html#in-memory-file...

c22 · on April 15, 2021

You used to be able to store your own filesystems on Google's databases [0]. No idea if this still works, though.

[0] http://sr71.net/projects/gmailfs/