As someone who works in storage, I really hate it when people (unknowingly or deliberately) misuse this field's terminology.
Whoever Xata is, they aren't storing files in their database. They are storing blobs, or objects... depends on how you look at it, but definitely not files.
A simple way to see that it's not true: can they store write-only files in their database (e.g. /dev/null)? Can they store UNIX socket files? What about device files? Can they tell that a file has the setuid bit set? And so on... practically none of the useful attributes of files are stored in their database.
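For readers who haven't looked at these attributes directly, here is a rough Python sketch of what a POSIX filesystem records beyond the bytes: file type, permission bits, setuid/setgid, owner. It uses only `os.lstat` and the standard `stat` module; the `describe` helper is hypothetical, and paths like /dev/null assume a Linux or macOS machine.

```python
import os
import stat

def describe(path: str) -> dict:
    """Hypothetical helper: report attributes a plain blob column would not keep."""
    st = os.lstat(path)  # lstat: report on a symlink itself rather than its target
    mode = st.st_mode
    if stat.S_ISREG(mode):
        kind = "regular"
    elif stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISCHR(mode):
        kind = "char-device"   # e.g. /dev/null
    elif stat.S_ISBLK(mode):
        kind = "block-device"
    elif stat.S_ISSOCK(mode):
        kind = "socket"        # UNIX domain socket
    elif stat.S_ISFIFO(mode):
        kind = "fifo"
    else:
        kind = "unknown"
    return {
        "kind": kind,
        "permissions": stat.filemode(mode),  # ls-style string such as "-rw-r--r--"
        "setuid": bool(mode & stat.S_ISUID),
        "setgid": bool(mode & stat.S_ISGID),
        "owner_uid": st.st_uid,
        "size": st.st_size,
    }

if __name__ == "__main__":
    print(describe("/dev/null"))
```

Uploading a file's contents to a blob store typically preserves only the `size` field's worth of information (the bytes); everything else in the returned dictionary is lost unless the service chooses to record it separately.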
They obviously didn't even want to store files... It just feels like some kind of marketing trick (especially since the article is excessively peppered with self-congratulatory quotes from people who have allegedly found the product useful). Just call it what it is: blob attachments.
I'm no expert in this field, and I presume you are technically correct, but you are probably also too focused on technical details.
If I send a file or a path to a DB server in an INSERT query (INSERT INTO users (email, pwhash, avatar) VALUES (?, ?, ?)), and the server then handles the actual storage, all abstracted away from the user (programmer) of this system, then to me that is "storing the file in the DB". If I can then do "SELECT email, variation('tiny', avatar)::cdn_url FROM users", that is "getting the image from the DB". Well, technically it's getting the email and a URL where the file can be retrieved.
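A minimal sketch of the developer-facing view described above, using Python's built-in `sqlite3` module and an in-memory database. The `variation()` function and `::cdn_url` cast from the comment are not standard SQL and are not modeled here; the `avatar_mime` column and the helper names are hypothetical, chosen only to show "a file" as bytes plus whatever metadata columns you add.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the avatar's bytes go in a BLOB column, and the MIME
    # type is the only piece of "file" metadata we choose to keep.
    conn.execute(
        "CREATE TABLE users (email TEXT PRIMARY KEY, pwhash TEXT, "
        "avatar BLOB, avatar_mime TEXT)"
    )
    return conn

def add_user(conn: sqlite3.Connection, email: str, pwhash: str,
             avatar: bytes, mime: str) -> None:
    # Parameter binding: Python bytes are stored as a SQLite BLOB.
    conn.execute(
        "INSERT INTO users (email, pwhash, avatar, avatar_mime) VALUES (?, ?, ?, ?)",
        (email, pwhash, avatar, mime),
    )

def get_avatar(conn: sqlite3.Connection, email: str):
    row = conn.execute(
        "SELECT avatar, avatar_mime FROM users WHERE email = ?", (email,)
    ).fetchone()
    # BLOB columns come back as bytes; return None for an unknown user.
    return None if row is None else (row[0], row[1])

if __name__ == "__main__":
    db = make_db()
    stand_in = b"\x89PNG\r\n\x1a\n"  # PNG signature bytes only, not a full image
    add_user(db, "alice@example.com", "hash", stand_in, "image/png")
    data, mime = get_avatar(db, "alice@example.com")
    print(mime, len(data))
```

Only the bytes and the columns you define survive the round trip; mode bits, ownership and file type do not, which is exactly the distinction the parent comment is drawing.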
To me, as a user of such a database, it matters not if some engine stores it as a blob, base64, pointer, or just a URI to an S3 block. What matters is that it's in the same layer of abstraction.
> To me, as a user of such a database, it matters not if some engine stores it as a blob, base64, pointer, or just a URI to an S3 block. What matters is that it's in the same layer of abstraction.
By "user of such a database" you mean the end-user of an application, right? As opposed to the devs, DBAs, SREs, et cetera? Because those users absolutely care about how this feature works and absolutely do not want anything to be abstracted away, because that's the dif
Also, data: URIs need to die. It started off with lazy JS devs using <canvas>'s toDataURL() (instead of toBlob()), and now it's infecting the entire dev ecosystem: just go to StackOverflow and there'll be a post every few hours where some asker genuinely believes we're supposed to store large, gigabyte-sized files in a Postgres or MSSQL table as a Base64 string in a varchar column with a text collation (aiiieeeeee).
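To put a number on the Base64 complaint: Base64 maps every 3 input bytes to 4 ASCII characters (padding the final group), so a Base64 text column holds roughly a third more than the raw binary would. A small check with Python's standard `base64` module; `encoded_size` is a hypothetical helper, and the 1 MiB payload is arbitrary sample data.

```python
import base64
import math

def encoded_size(n_bytes: int) -> int:
    """Length of the standard (padded) Base64 encoding of n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)

payload = bytes(range(256)) * 4096  # 1 MiB of sample binary data
encoded = base64.b64encode(payload)
assert len(encoded) == encoded_size(len(payload))
print(len(payload), len(encoded), round(len(encoded) / len(payload), 3))
```

A gigabyte-sized file therefore becomes roughly 1.33 GB of text, before any per-row or collation overhead the database adds on top.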
I did not mean SREs and DBAs, as those typically need to know far too much of the innards of the systems they are maintaining.
But a developer? I honestly don't really care that "(BIG) TEXT" is stored using something called TOAST[1] in Postgres. All I care about is that I can store large texts of varying lengths.
The original argument, to me, sounded very much like someone with a lot of knowledge coming in and explaining "no, VARCHAR and TEXT are very different!". Sure, for someone hacking away on the storage engine, or even someone tuning pg, they are. But for "the average developer"? They are the same.
Sure, I'm aware that all abstractions are leaky. But TEXT in pg is a good example: I really didn't have to care, because it just worked the same as storing ints, strings, etc. Until it didn't, and I had to go down that rabbit hole; that's why I do know about it, as a software dev, i.e. as a user of Postgres.
And about URIs: I didn't mean you'd get a data: URI. I didn't even imply that. I really meant a URI pointing to where the asset can be found online: some S3 URI or similar.
"practically no useful attributes of files are stored in their database."
Probably because you work in the field, you miss what the layperson wants to see. The most useful attribute for me is the data. The other use cases you cited are legitimate; it's just that I, as a layperson, don't think about them when we talk about files.
That's part of the reason why experts exist: to tell laypeople how to correctly use the language.
In other words: why should laypeople (who have no clue about the technology) decide what to call things? And why should someone who dedicated substantial effort to understanding, cataloguing and organizing the terminology give any thought to how non-experts, who haven't spent nearly as much time and effort, decide to go about it?
Imagine going to a medical doctor and demanding that they switch the terminology they use in medical reports or pharmaceutical nomenclature to follow "rules" established by people who have no knowledge of, say, anatomy or pharmaceutics. Say you decide it's convenient for you to call all pills "paracetamol": do you think a doctor would humor such an "initiative"?
So why should anyone in the filesystem-making business humor laypeople's concepts of files?
Early operating systems did not support directory trees, and I'm sure some didn't even support timestamps and such. Would you say that those systems do not store files at all? What even is the defining characteristic of a "file"?
> What even is the defining characteristic of a "file"?
This is where you have to start when you read BS articles like the one in the OP.
But, to answer your previous question, let's first look at a different example: the word "car". Back in the day, before we had horseless carriages, "car" was, basically, a contraction of "carriage". I.e. two hundred years ago it would be completely natural to picture cars as being pulled by horses, and cars that weren't pulled by horses or other beasts of burden would be the stuff of fairy tales.
Were English speakers two hundred years ago stupid to so grandly misuse the word "car"? I don't think so. In their context it was perfectly fine. That context is gone now, so whoever calls a horse-drawn buggy a car today is not using the language correctly.
The same goes for early filesystems. By today's standards they wouldn't qualify for that name; they probably more closely resemble the key-value stores we use today. The discrepancy arose from semantic drift caused by changes in technology.
Similarly to how you wouldn't use the word "car" to describe a buggy today, you shouldn't use the word "filesystem" to describe a key-value store today. Well, unless you make sure the readers understand that you are talking about what happened some 50-60 years ago.
And the people who get to decide what is and isn't called a filesystem are the people who make filesystems (and I'm one of them).
If the people designing the storage system get to decide what is and what isn't a file, then the database does store files, because the developers of the database system are telling you that it does. That those files have properties unlike the files on the file system you have worked on is entirely irrelevant.
At my job, there's a thing where owners of internal systems always say "X is not a database" or "X is not a filesystem" in reference to things that anyone else would call databases or filesystems, to the point of it becoming an inside joke. I don't mind if people misuse terminology from my field (databases). Like I'm not going to stand up every time someone says "Postgres is a type of database" instead of saying "DBMS."
Can you send a write only file over the network? UNIX socket file? Device file? Setuid bit? What does setuid even mean when you push the file onto a Windows box?
Most people are only concerned with blob attachments when they say file.
> Can you send a write only file over the network?
Yes. NFS can do that, for example.
> UNIX socket file?
I can't think of a system off the top of my head that does this, but rsync could possibly preserve the file type given some flags.
> Device file?
Not immediately, but sort of. E.g. device files under /dev are created by udev based on the information it reads from a socket. In principle, you could have the user-space side of udev running on a machine other than the one physically attached to the device being exposed.
Ultimately, however, you can do all of the above over the network if you use iSCSI, or a driver (similar to Ceph, Lustre, BeeGFS, etc.) that exposes a filesystem over the network.
Most importantly, however, I don't see how your question is relevant to anything I wrote. Suppose you couldn't do any of the things you naively thought you cannot do... then what? Why is sending something over the network an indication of anything?
> Most people are only concerned with blob attachments when they say file.
So what? Most "people" (by which, I think, you mean "programmers") are irredeemably dumb. It doesn't matter what the majority of a group of idiots thinks. But even if they were smart, again, who cares what the majority of smart people think? A majority isn't what establishes the truth.
You're stuck in the UNIX-ism that everything is a file. Some things are files in Unix that are not files in other operating systems. COM1 is a file in Windows, but it doesn't make sense to transfer COM1 over the network. You can't store it. You can't send /dev/null over the network either. You can send a representation of it via NFS, but anything you write to it via NFS isn't written to the file; you're sending it back to the original server you mapped the NFS share from. The representation isn't the file, just like a map isn't the terrain. If I map a filesystem over the network, I'm still not sending the file over the network; I'm just sending a hook so that the server being mapped knows what's going on.
When I say most people, I mean most users. Most users aren't concerned with socket files, write-only files, or device files, because those aren't things they ever deal with. They think of a file as a blob of data, with maybe some metadata about who can access it (etc.). So, for the purposes of storing files, a file is a blob of data, not something exposed across the network through special drivers.