X11 Universal File Opener and XDG Mess (vermaden.wordpress.com)
89 points by vermaden on April 22, 2021 | 85 comments


Is the OP's install missing a bunch of MIME entries or something? I can't reproduce the examples given for xdg-mime on Fedora 34, e.g.

  $ xdg-mime query filetype file.docx
  application/vnd.openxmlformats-officedocument.wordprocessingml.document
  $ xdg-mime query filetype file.xls
  application/vnd.ms-excel


Can't reproduce any of these on Arch either, mine is definitely checking the contents of the file as it saw an empty file titled "file.pdf" as "text/plain" and a .doc as "application/msword".


Fedora 34 is as bleeding edge desktop as it can be.

Maybe you have a more recent xdg-utils version?

I run 1.1.3 here.


I also have 1.1.3.

Maybe what's missing is the libreoffice package? It seems like all these types are provided by /usr/share/mime/packages/libreoffice.xml on my system (which is read by update-mime-database).

Arguably this is a common enough format that you shouldn't really need libreoffice installed to have it, though I would think any other office program would have it. I think /etc/mime.types serves as the ultimate fallback, though it doesn't have entries for libreoffice file extensions on my system.


I will look into that, thanks.


xdg-utils does not use /etc/mime.types; that file is used by Apache httpd, CUPS, and various other programs though.


OP is missing definitions that associate *.docx files, etc., with the appropriate MIME type.

The Garbage In, Garbage Out principle applies here.

For anyone interested in how the plumbing works, here are the specs:

shared-mime-info-spec[0] is for mapping glob patterns, magic numbers, string fragments, etc. to MIME types; and desktop-entry-spec[1] is for defining what an application is, its icon, how to launch it, what MIME types it handles, etc.

[0] https://freedesktop.org/wiki/Specifications/shared-mime-info...

[1] https://www.freedesktop.org/wiki/Specifications/desktop-entr...

On my (Debian) system, the following packages provide the definitions that map *.docx files to the appropriate MIME type:

    $ grep -l '\*.docx' /usr/share/mime/packages/*.xml | xargs dpkg -S
    shared-mime-info: /usr/share/mime/packages/freedesktop.org.xml
    libreoffice-common: /usr/share/mime/packages/libreoffice.xml
... and subsequently:

    $ xdg-mime query filetype Downloads/something.docx
    application/vnd.openxmlformats-officedocument.wordprocessingml.document

    $ xdg-mime query default application/vnd.openxmlformats-officedocument.wordprocessingml.document
    libreoffice-writer.desktop*
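
To make the glob/magic split concrete, here is a minimal Python sketch of the kind of lookup shared-mime-info describes. The two tables are a tiny hand-written stand-in for what lives under /usr/share/mime, `guess_mime` is a made-up name, and trying globs before magic is a simplification (the real database also carries priorities and weights).

```python
import fnmatch

DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Hand-written stand-ins for the real database; illustrative only.
GLOBS = [("*.docx", DOCX), ("*.pdf", "application/pdf"), ("*.png", "image/png")]
MAGIC = [(b"%PDF-", "application/pdf"), (b"\x89PNG\r\n\x1a\n", "image/png")]


def guess_mime(filename, head=b""):
    """Guess a MIME type from the file name, then from the first bytes."""
    lowered = filename.lower()
    for pattern, mime in GLOBS:
        if fnmatch.fnmatchcase(lowered, pattern):
            return mime
    for prefix, mime in MAGIC:
        if head.startswith(prefix):
            return mime
    return "application/octet-stream"  # nothing matched: type unknown


print(guess_mime("something.docx"))
print(guess_mime("scan", b"%PDF-1.7\n"))
```

Delete the `*.docx` row and the first call falls through to the fallback, which is roughly the situation the OP seems to be in.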


Having lived through the same frustrations I've released https://github.com/chmln/handlr a year ago. It comes with lots of nifty scripting tools and significant improvements over xdg-open


How does it decide what mime-type a file has?


It uses the shared mime database, which in turn relies on both file type and content to detect the mime.


Doesn't seem like it does. You set it yourself.


Thanks - will look into that.


File associations are certainly an area where I feel desktop systems in general have failed a bit. Both magic sniffing and file extensions are very crude ways of managing the associations. Mime type in xattrs (or similar) is slightly better, but does anyone actually use that? And of course they are very platform specific and well hidden in general.


I don't think storing the mime type separately is better: That requires every system which you might use to transfer or store your documents to support storing the mime type separately as well.


This is already how HTTP works.

Storing the MIME type separately is definitely a good thing. This isn't the only way you do it. You start by checking the MIME type and then fall back to one of our crude detection mechanisms, like guessing what the file type is based on extension.

We already store plenty of metadata with files--permissions and timestamps for starters. The metadata is sometimes lost when you transfer to another system. That's OK... you can repair the metadata.

Given that this metadata is already present in the HTTP protocol, it's present in files embedded in email messages, and given that extended attributes are supported by many different filesystems, I think this solution is completely workable. You could make the argument that since we spend so much time interacting with files over HTTP, that having a MIME type in the metadata is actually the common way to do things.
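
A rough Python sketch of "check the stored type first, then fall back to guessing", assuming the type is kept in an extended attribute called `user.mime_type` (that attribute name and both function names are assumptions for illustration, and `os.getxattr` is Linux-only):

```python
import mimetypes
import os

XATTR_NAME = "user.mime_type"  # assumed attribute name


def stored_mime(path):
    """Return the MIME type recorded in the file's xattrs, or None."""
    try:
        return os.getxattr(path, XATTR_NAME).decode("ascii")
    except (OSError, AttributeError, UnicodeDecodeError):
        # Attribute not set, xattrs unsupported, file missing,
        # or os.getxattr unavailable on this platform.
        return None


def content_type(path):
    """Prefer the stored type; fall back to guessing from the extension."""
    return (
        stored_mime(path)
        or mimetypes.guess_type(path)[0]
        or "application/octet-stream"
    )
```

If the attribute was lost in transit, `content_type` degrades to the extension guess, so nothing gets worse than it is today.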


What is it that makes the MIME type good and reliable and the file extension crude and unreliable? In my experience it's the opposite - the file extension is much more user-visible, much more likely to be preserved when the file is transferred, and so more reliable than any other way of storing the file type. HTTP and email were overly influenced by the unix tradition in which file extensions are seen as less important, but nowadays the file extension is the most widely supported "extended attribute" that a file might have - why use a less reliable version of the same thing?


One huge problem is that the space of available prefixes is just too damn small. It's tiny. This may be fine for you if you only deal with a few mundane filetypes like plain text, PNG, etc. However, it's very easy to end up in a situation where multiple applications share the file extension. I've run into it all too often.

So you need something to disambiguate.

I can understand that if you've never personally run into this problem, it may seem unimportant. It is extremely frustrating to work with two different files with the same extension on a regular basis.


> However, it's very easy to end up in a situation where multiple applications share the file extension. I've run into it all too often.

Care to share some examples? I've very rarely had this problem when some specific program chose to use something like .dat as the extension, but I doubt someone making this decision would have picked a sensible mime type either.

Then you mention http already solves this problem. Great. Now you only need to fix the remaining five trillion protocols used to transfer files that don't.

If you're old enough you might know the unix haters handbook. It also made fun of this problem and smarty-pants explained how the next version of macos will strictly store any metadata, such as mime type, separately in the file system meta data and not in the file(name) anymore. I think they even gave the size of an image as an example. Thirty years later and this is still what macos does; Jpeg files still end in jpg and contain exif data.

Metadata goes in the file, file type is the same. That's the only sane way to get this to work in an interoperable way. It's the least common denominator, since you always have these two, no matter the environment. For anything else, the ship has sailed.


File extensions are flawed. The .spc file extension is used for both cryptographic certificates and SPC700 audio files. Every time I set up a Windows machine and want to play .spc music, I have to reassociate .spc with a music player. Every time I set up a Linux KDE machine and want to play .spc music, I associate .spc to a new MIME type in my KDE settings, and file picks up this new file type, but xdg-open and KDE insist on treating it as application/pkcs7-mime.

MIME types are flawed in a world that only stores file extensions. On Android, you can only register apps as handling specific file MIME types, not file extensions. And for a non-mainstream file extension like .spc, every file browser app synthesizes a different MIME type (which may be blank in some cases).
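
For what it's worth, the two .spc formats are easy to tell apart by content. A hedged Python sketch: it assumes SPC700 dumps start with the ASCII header "SNES-SPC700 Sound File Data" and that the certificate files are DER-encoded (so they start with a 0x30 SEQUENCE byte). `audio/x-spc` is a made-up type name, and `classify_spc` is a hypothetical helper.

```python
SPC_AUDIO_MAGIC = b"SNES-SPC700 Sound File Data"  # assumed header text


def classify_spc(head):
    """Disambiguate a *.spc file from its first bytes."""
    if head.startswith(SPC_AUDIO_MAGIC):
        return "audio/x-spc"  # hypothetical name, not a registered type
    if head[:1] == b"\x30":  # DER SEQUENCE tag; assumes a DER-encoded file
        return "application/pkcs7-mime"
    return "application/octet-stream"


print(classify_spc(b"SNES-SPC700 Sound File Data v0.30"))
```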


Not to diminish the argument about file extensions having a too-small namespace, but... why not just save your SNES music in wav or flac files or something? Wouldn't that be a simpler solution to manage in many ways (not just because of file extensions)?


You can't loop a .wav infinitely (in most music players), you can't mute or isolate channels, change playback speed or pitch or sample interpolation, mute or replace samples, extract score data from the game engine, study how the game engine drives the SPC700 hardware...


In the same sense there are a lot of advantages to storing your traditional music in the format of its original DAW project files, but wouldn't you agree that's impractical for listening purposes?


It's not impractical, there are certain music players (winamp/foobar2000 plugins, some mobile players) which make managing a library of emulated music as easy as managing regular music in those programs, with orders of magnitude smaller files, and added features like endless looping (which is an important feature to looped game music, but less so in commercial music). One reason it works well is because console sound dumps are self-contained and take up two full orders of magnitude less space than the recordings (because they're taken from consoles with small memories), whereas DAW project files often depend on large sample libraries and external VST plugins, taking up more space than the recordings and often come with setup/DRM issues.

There are flaws, like PS1 sound dumps often coming with wrong volumes and echo, but then again most recorded versions of those songs were also generated through inaccurate emulation, then fed through lossy codecs, and some have playback/recording glitches.


Those kind of files aren't the rendered audio, they're more akin to MIDI files, but specific to the SNES synthesizers. So they have a vastly smaller file size, and contain more semantic information, than a wav/flac/etc.


Yes they are flawed. Also they are the only thing that works for reasons I gave above. You simply lose interop since we can never have the whole world switch to a consistent mandatory-mimetype-as-file-metadata model at the same time.


> One huge problem is that the space of available prefixes is just too damn small. It's tiny.

Can't extensions be long? We're not stuck with 8.3 anymore; AFAIK, "foo.mycustomprogramnamehere" is a 100% valid name.


They can, but plenty of programs still choose to have a 3 or 4 character extension for their custom formats.

The number of video games that have a different .tex for textures ...


I hear you, but they would still say fuck it. They will not think of a proper mime type. What would it be? application/x-studio-game-texture-format-3 If one does not care enough with file extensions they will certainly not care with a mime type.


That's not really the point. They likely wouldn't have a content type. File extensions on linux are treated as more of a hint.

The point is that if content types were actually stored separately, it would be possible to not rely on detection. Furthermore, it would also be possible to know about a file's content type without it being present on the user's system.

You see, in xdg, there is a core database of content types and how to detect them, and more can be installed with programs if they care to do that. But if a file's content type is not in that db, it's unknown to your system entirely. So your system doesn't know that it doesn't know. It's very inconvenient.


At best the mime type in the file attributes would also be a hint (since there are many situations where it could be lost or not specified at all).

So, what makes it a better hint than file extensions? If you see an extension you don't recognize, why is that any worse of a signal than a mime type you don't recognize?


Why would adding a new custom mimetype and marking those files as using it be any easier than adding a new custom file extension and renaming those files to use it?


I don't think we're talking at the same level. Implementing content types at the file attribute level simply allows more flexibility in terms of handling file types.

I might write a proper blog post about this if I get some time to get my thoughts together. I worked for a year or so on this and with the xdg spec, it's all implemented in pretty frustrating ways.


And where does the MIME type originate? With many web servers, it's based on file extensions or magic numbers...


Sure, static web servers serving files from a standard filesystem.

Does it apply to dynamically generated content? No. Does it apply to key-value stores like S3 and GCS? No.


Seems simpler just to use extensions or magic #s. Most normal users don't understand metadata.


Most users don’t understand file extensions.


MIME type feels as if it is as intrinsic a property as something like file size. It is a property of the data itself, not a property that a person chooses for the file like, say, the file name.

Could you give an example where one would want to choose the MIME type of some data, from a selection of options?


> It is a property of the data itself, not a property that a person chooses for the file like, say, the file name.

I entirely disagree. If I have an empty .json or empty .bin or empty .txt I expect a different behaviour in the software that I am opening the file with, even if in all cases the actual data is zero bytes.

More generally file formats can be polyglots: https://github.com/Polydet/polyglot-database


"It is a property of the data itself," meaning, it's metadata. I'm in favor of preserving this metadata.

You will sometimes need to change the MIME type of something. For example, if you use a text editor to create an HTML file, you'll want to change the MIME type to text/html. It's also nice to be able to mark *.h files as containing C or C++ code.


It seems like you’ve misunderstood me? MIME-type is an intrinsic property which can be derived from the data itself.

I have a cat. I can derive that it is a cat when the situation requires it. I don’t need to have a metadata label attached to it, in my own handwriting, saying “cat”.


I argue that it’s not possible to determine the mime type of a file from its contents. For example the content could be malformed. An editor or IDE should still highlight it so the user can find the syntax error.

Of course, for some files it is possible. But we need the metadata for the other files.


Well, they are bytes, not a cat. It can be interpreted as an image (in multiple formats) or text or whatever, so I don’t think this metaphor is apt.


It does, but that seems like something those system must do, if you want a robust solution. If the systems in question are only capable of transferring untyped blobs of bytes, then you get untyped blobs of bytes out the other end of that pipe.

And, if it happens that the file goes through such a system that doesn't understand mimetypes, you're only back to the status quo of having to sniff the type from hints like extension or content.


> ... that seems like something those system must do, if you want a robust solution.

Would you say that all the common desktop filesystems, flash memory filesystems, optical filesystems, archive formats, etc. today all aren't robust since they don't do this?

> ... you're only back to the status quo of having to sniff the type from hints like extension or content.

Combining multiple methods does solve that problem, but doesn't it kind of negate the benefits too?

Also, I still think there are some issues. For example, it would need the creation of new mime type changing UI, which might be a hard thing to teach laypeople (they already barely understand extensions). And what do you do if the internal mime type disagrees with the extension? Aside from just being confusing, that could be purposely abused by malware.


> Would you say that all the common desktop filesystems, flash memory filesystems, optical filesystems, archive formats, etc. today all aren't robust since they don't do this?

I mean, yeah. They all guess at the mimetype, all the time. They all fall on their face fairly often in that regard.

> Combining multiple methods does solve that problem, but doesn't it kind of negate the benefits too?

No? A system that just stores a two-tuple of (type, data) and doesn't have to guess — when it knows the type — is strictly better than a system that always guesses. Where it integrates with other systems that understand how to type the incoming data, it would work flawlessly, every time. Where it integrates with systems that send untyped bytes, there is again no choice but to guess: the data simply isn't there.

> For example, it would need the creation of new mime type changing UI, which might be a hard thing to teach laypeople (they already barely understand extensions).

Yes, I agree. But people are only going to hit trouble where the mimetype isn't known and the sniffing fails, which is the same issue they'd hit today. I'd argue setting a mimetype has a better shot at being a good UX than trying to get them to set the file extension ever will though.

E.g., Right click → Set file type → prompt with different types (use friendly names, if at all possible). E.g.,

  This file looks like it is probably one of the following:

    JPEG image
      (can be opened with GIMP, Photoshop.)
    PDF document
      (can be opened with Adobe Acrobat.)

  > See all options
    (accordion dropdown to show all options)
Yes, that still requires the user to know the difference between a JPEG and a PNG, and to a layman that's considerably sub-optimal. But we're at the point where the system didn't record the mimetype, and can't guess it correctly, so there's not much left, really: some human has to make a call.

A form of this exists today in most systems, with an "Open with" context menu, and generally with an option to "always open files of this type with this application". (But that's more about the binding between the mimetype — still determined currently through sniffing or extension — and the handler for that mimetype.)

But ideally, if the interfaces were there to allow the process to be deterministic & obvious when the data is known, things could or would grow to adopt those interfaces. (Though it'll likely be a looong time.) Unix screwed up, in that regard, in that it set us down the path of "everything is untyped bytes", vs. "everything is strongly typed bytes". With the latter, the system can start figuring out correct or incorrect actions. With the former, it simply lacks the information to make a decision.

> Aside from just being confusing, that could be purposely abused by malware.

OS X essentially just marks, in the metadata of the file, that it came from the Internet. It could keep doing that. Whatever process "this file could harm your machine" happens with in the browser today could keep on happening that same way.

Granted, there is the possibility of "foo.jpg" being sent with a "application/executable" mimetype. That's a real concern. I think this comes down to the system being clear about what you're dealing with, and the consequences of actions. This problem already exists today: people have crafted ".pdf" files that are valid executables. Having better security controls on apps (not having desktop apps run with the same privilege as the user) would help (limits the damage).

We've learned, repeatedly, that strong typing results in more robust systems. I don't think the answer is any different for the bag of bytes a file comes with: knowing the type is better than guessing it. We've also learned, I think, that sniffing almost always leads to loopholes…

(I'm not a fan of the "or guess" bit of the proposed idea in my comment; I think a simple "the file carries the mime and that's that" would be better, but I suspect that the roughness of integrating with legacy code that can't communicate what type of data it is reading/writing would hamper that.)


I miss the old Mac system of explicit file type and associated application fields in every file metadata.


I seem to recall that system was a bit annoying if the file type matched but the creator didn't.


AFAIR the creator code determined which program would be used when you opened the file, as well as the file's icon.

The file type determined which applications could be used to open a file (by drag-and-drop onto the application, or via the application's Open dialog box).


The system is still there, but it's a bit vestigial.


I honestly don't see a problem with using file extensions to detect file type. Windows gets a lot of shit from Linux users for this but I don't know why.


That may be true in terms of possibility but not relatively. Compare it to the situation on mobile, where you're lucky if you get any choice at all about what programs open which files.


The problem with mime type in xattr is that you lose it as soon as you copy it to another filesystem. It also assumes that every system names its MIME types identically.


The original Mac really had something with the type and creator codes.

Applications could register to open a type, I think it would default to the creator.

Of course, that wound up with things like resource forks, and that didn't really play well with anyone else's file system.


Types and creator codes were stored in the file system, not in resource forks.

What was stored in a resource fork (of each application) was the declaration “I am [four byte creator code]; I can open these [four byte file types]”. That’s the information that the Finder built its databases from (https://www.folklore.org/StoryView.py?project=Macintosh&stor...)

That easily could have been moved into the file proper (as AppleSingle (https://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_fo...) did), but another (and IMO better for this purpose) option would have been to move that info into a special code segment in the data fork (in Unix parlance; old-style Mac OS programs stored their code in code segments in the resource fork)

Also, I think Mac OS still uses https://en.wikipedia.org/wiki/Uniform_Type_Identifier, which are an improvement over file types and creator codes (you can, for example, express the fact that every html file is a text file with it)
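
The "every html file is a text file" part is just a walk up a conformance graph. A small Python sketch with a hand-picked subset of identifiers (the table is illustrative, not read from any real system, and `conforms_to` is a made-up helper):

```python
# Illustrative subset of a conformance graph; not read from any real system.
CONFORMS_TO = {
    "public.html": ["public.text"],
    "public.plain-text": ["public.text"],
    "public.text": ["public.data", "public.content"],
    "public.jpeg": ["public.image"],
    "public.image": ["public.data", "public.content"],
}


def conforms_to(uti, ancestor):
    """True if `uti` is `ancestor` or inherits from it, directly or not."""
    if uti == ancestor:
        return True
    return any(conforms_to(parent, ancestor) for parent in CONFORMS_TO.get(uti, []))
```

So an editor that registers for `public.text` is offered HTML files too, which a flat four-byte file type cannot express.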


Yah, details are hazy, it's been a long while since I was truly annoyed at this when it happened.


Here's my attempt at solving the same kind of problem for just URIs, I try to defer to .mailcap for file based auto-opening.

Never had much luck with xdg-open

https://github.com/mjsir911/browser


I like that (browser).

Will definitely dig into it to either use it directly or at least try to implement it in my see.sh.

Also IMHO you should add short 'youtu.be' regex handler :)

Regards.


For a command-line relative, note that you can write a custom .lessfilter to decide the beginning of your paging pipeline. When syntax-highlighting, be sure to call less with -R.


One should be careful around lesspipe -- many of the tools it invokes by default have less than stellar record of handling untrusted input. https://news.ycombinator.com/item?id=8650952


That’s a good point. I wonder if xdg-open relies on any of the same routines.


I've run into too many problems with .lessfilter and I've had to disable it. It's hard to write one that works on anything.


Looks like mine relies heavily on pandoc, unoconv, xxd, jq, and OpenSSL. Apparently, some packagers include an enhanced OS customization with high-risk information retrieval tools (see seclists comment).

pdftotext seems reasonable and looks to be part of the Debian lessfilter.


Thanks - I will check that out.


xdg-open is so frustrating. Its defaults are wrong and it's a pain to reconfigure. I replaced xdg-open with this shell script that simply copies the url and posts a desktop notification saying it did so (I got the idea from a post on twitter). I'm much happier to make the decision on my own what to do with external file handlers.

    #!/bin/sh
    echo -n "$1" | xclip -selection primary; notify-send "URL Copied" "$1"


It should be noted that the behavior of `echo` in this case is notably not standardized with regard to what should happen if `$1` starts with a dash, and whether it should interpret several special control sequences or not.

It is for this reason that `echo` is generally frowned upon for programmatic input, and `printf %s "$1"` is recommended; as this is guaranteed to not interpret, and copy dash-leading strings to the output verbātim.


Pretty sure the behavior of echo is standardized on all machines that have notify-send installed.


You would be wrong as pointed out below.

Modern shells also typically have `echo` as a builtin and can have quite different behavior.

https://stackoverflow.com/questions/33784892/bash-vs-dash-be...


notify-send works on Arch Linux, FreeBSD, and Alpine Linux, each of which uses a different echo(1).


This reminds me of Plan 9’s plumber: https://9p.io/wiki/plan9/using_plumbing/index.html


The XDG scripts are strange. Some are really simple while some aren't. A while back I wanted "dropbox shortcuts" to work on Linux. So I copied the xdg-open examples and wrote a Nim utility to parse and then open the file [1]. Then it was easy to set the KDE file browser to use the program for dropbox shortcuts. It's nice having a simple language that can compile to a little static binary or be run as a shell script.

1: https://github.com/elcritch/open_on_dropbox


Lol I was frustrated by this with my xfce4, where I could not setup the "Browser" options correctly (had to tweak DPI/scale options for chrome, and this was only doable through command-line). Then I've found out about "exo-open" apart from "xdg-open" and seems like more - https://twitter.com/__malkia__/status/1383898991864147974


I use xdg-open (alias xo) and it works for me. If file associations are not right, I can set them in the file manager (or in the launcher I use).


I find vermaden's blog (especially the FreeBSD series) extremely useful and informative. Thanks for this and please keep it up this way!


Thanks, I intend to.

Any topics you would like to be covered?


(self-promotion) https://github.com/bAndie91/mimeopen-gui

it shows a dialog with all the associated applications for the user to choose from


Next time I start thinking it might be time to see if any of the 'desktop' guis have gotten any better, I'll reread this and save myself the time.


do that! It will spare you from suffering through the seemingly nowhere-realized-in-full-menu-standard and the awful editors that put lipstick on it.


I'm grateful for this article, and hope I will one day figure out how to change my URL opener for xdg-open :)


Each DE can have slightly different paths, but in general find the file mimeapps.list. It's typically looked for in the traditional manner of /etc/, /usr/share/ and /usr/local/share/ and $HOME to find global, local and personal configurations. In your $HOME it's usually something like this:

    ~/.local/share/applications/mimeapps.list
    ~/.config/mimeapps.list
In here you find the MIME type pointing at a .desktop file (app launcher), it's a matter of changing that. Mine for example related to using Firefox:

    [Default Applications]
    x-scheme-handler/http=firefox.desktop
    x-scheme-handler/https=firefox.desktop
Find the .desktop file you want to launch and just update that bad boy. Most DEs have a GUI tool to manage this for you without having to resort to manual editing...
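
If you'd rather script the lookup than eyeball the file, here is a minimal Python sketch that reads the `[Default Applications]` group. It only parses text you hand it; `default_handler` and `SAMPLE` are names made up for this example, and it ignores the precedence rules between the different mimeapps.list locations.

```python
import configparser

SAMPLE = """\
[Default Applications]
x-scheme-handler/http=firefox.desktop
x-scheme-handler/https=firefox.desktop
"""


def default_handler(mimeapps_text, mime_type):
    """Return the first .desktop file listed for mime_type, or None."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(mimeapps_text)
    value = parser.get("Default Applications", mime_type, fallback="")
    # A value may be a semicolon-separated list; take the first entry.
    candidates = [entry for entry in value.split(";") if entry]
    return candidates[0] if candidates else None


print(default_handler(SAMPLE, "x-scheme-handler/https"))
```

To run it against your own configuration, pass in the contents of one of the files listed above instead of `SAMPLE`.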


What does it mean that I don't have the desktop file?


Then you don't have a program that can handle the URL scheme, at least according to desktop-entry-spec[0], which is the source of data that xdg-utils consults.

[0] https://www.freedesktop.org/wiki/Specifications/desktop-entr...

On my (Debian) system, firefox and chromium ship their .desktop files.


nod and the .desktop file is just an INI-style file that one can hand create for anything at all, the launched app just needs to accept the URL as a parameter input as if typed on the CLI for this scheme.
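
As a rough illustration of that last point, a Python sketch that takes a hand-made entry and builds the command line for a URL. The entry, `example-browser` and `build_command` are all invented for the example; `shlex` only approximates the spec's quoting rules, and field codes other than %u/%U/%f/%F are passed through untouched.

```python
import configparser
import shlex

SAMPLE_ENTRY = """\
[Desktop Entry]
Type=Application
Name=Example Browser
Exec=example-browser --new-window %u
MimeType=x-scheme-handler/http;x-scheme-handler/https;
"""


def build_command(desktop_text, url):
    """Turn the Exec= line of a .desktop entry into an argv list for url."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # .desktop keys are case-sensitive
    parser.read_string(desktop_text)
    exec_line = parser["Desktop Entry"]["Exec"]
    argv = []
    for token in shlex.split(exec_line):
        # Replace the URL/file placeholders with the actual argument.
        argv.append(url if token in ("%u", "%U", "%f", "%F") else token)
    return argv


print(build_command(SAMPLE_ENTRY, "https://example.org"))
```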


Thanks.

I use BROWSER environment variable for that.

It seems to even override XDG settings.

    % xdg-settings set default-web-browser firefox.sh.desktop
    xdg-settings: $BROWSER is set and can't be changed with xdg-settings
    
    % echo ${BROWSER}
    firefox
Hope that helps.

EDIT: You also gave me a nice idea - to add http(s):// and ftp:// support to my see.sh opener :)


For anyone proposing a replacement for xdg-open: It is already a wrapper around kde-open, dde-open, gvfs-open, gnome-open, mate-open, and exo-open. We already have a classic xkcd 927 situation.
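
The wrapper logic boils down to "work out the desktop, then hand off to its opener". A loose Python sketch of that dispatch: the helper names are the ones listed above, but which desktop maps to which helper is my assumption, and `pick_opener` is a made-up function, not how the real script is written.

```python
import shutil

# Assumed mapping from XDG_CURRENT_DESKTOP values to helpers; the helper
# names come from the list above, the pairing is a guess for illustration.
HELPERS = {
    "KDE": ["kde-open"],
    "DEEPIN": ["dde-open"],
    "GNOME": ["gvfs-open", "gnome-open"],
    "MATE": ["gvfs-open", "mate-open"],
    "XFCE": ["exo-open"],
}


def pick_opener(current_desktop, is_available=shutil.which):
    """Return the first installed helper for the given desktop, or None."""
    # XDG_CURRENT_DESKTOP may hold several colon-separated names.
    for name in current_desktop.upper().split(":"):
        for helper in HELPERS.get(name, []):
            if is_available(helper):
                return helper
    return None
```

Typical use would be `pick_opener(os.environ.get("XDG_CURRENT_DESKTOP", ""))`, with a generic fallback when it returns `None`.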




