Just like the author, I too am quite particular about empty space in my MP3 files.
When I was ripping my CD collection, I religiously tagged the MP3 files with ID3v1. Later on, I had some tracks whose titles were just a tad longer than 30 chars so I had to use ID3v2 for those and I noticed the file size grew by _a lot_.
Frustrated by this, I opened them in a hex editor and learnt that ID3v1 was a fixed format of 128 bytes, but v2 was variable. I also found out that the software had added a 4KB zero-byte padding to the v2 tag, which was "necessary" because the tag is now at the front of the file, and this padding allows more tag data to be added easily later on.
I tried various ID3 tagging software at that time and all of them added a padding. So I learned about the tag format and wrote a tagger myself that didn't add any padding. It was a great learning experience, and I managed to shave those useless zero bytes from my MP3s.
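For anyone who wants to see how much space a v2 tag claims without opening a hex editor, here is a minimal sketch. It relies on the ID3v2 header layout ("ID3", two version bytes, one flags byte, then a 4-byte "synchsafe" size using 7 bits per byte); the function name and the sample header are made up for illustration, and the optional footer is ignored.

```python
def id3v2_tag_size(header: bytes) -> int:
    """Return the on-disk size of an ID3v2 tag, given the first 10 bytes of a file."""
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 header found")
    size = 0
    for byte in header[6:10]:
        if byte & 0x80:
            # Synchsafe integers never set the top bit of any byte.
            raise ValueError("invalid synchsafe size byte")
        size = (size << 7) | byte
    # The declared size covers frames plus padding, but not the 10-byte header.
    return size + 10


# Hypothetical header declaring a 4096-byte body (frames plus zero padding).
example_header = b"ID3\x03\x00\x00" + bytes([0x00, 0x00, 0x20, 0x00])
total = id3v2_tag_size(example_header)
print(total, "bytes for the v2 tag, versus a fixed 128 for ID3v1")
```

The declared size includes any padding, so comparing it against the bytes actually used by frames shows how much of the tag is empty.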
Unless I had a very large library or very limited storage space and no way to expand, I'm having a hard time understanding why this would objectively really matter at the scale of a personal MP3 library. At 4KB of padding per file, we're talking about roughly 4GB of additional data per million individual tracks.
Must have been pretty fun tracking this down, but I'm curious as to why you'd go through all this trouble. Storage is cheap, my time is not haha.
I think I was trying to get my entire music collection to fit onto my iPod, which at that time only had 10GB of space.
And also, I love to optimize things, so it made no sense to me why squeezing 1-2 more words in the track title would inflate the file size by 4KB. As a teenager, I probably had more time on my hands back then.
OP specified reencoding, which probably means converting lossy to lossy. Despite the fact that modern codecs (and even mp3, LAME is transparent at V2~192kbps for most tracks and everything except killer samples at V0~256kbps) are generally very transparent at low bitrates, the same guarantee does not apply when encoding lossy to lossy.
Think about it: if transparency guarantees applied to lossy transcodes, you could encode mp3 -> mp3 thousands of times with no generational loss. This clearly isn't possible.
Of course, you could rip all your files again in a better codec, but the time required to do that is pretty intense.
In all fairness, it wasn't that long ago Apple were still selling iPod Touch with 8GB base storage, having tons of storage is a relatively new thing for most people. But with that specific 10GB metric, I'd assume we're speaking about those earlier scrollwheel iPods.
It's all about how you value your time. These kinds of quests are usually a very valuable learning tool, and fun. I have learned a lot from hobby projects which didn't create any value as end products but were valuable to me.
There is also a cumulative value in your knowledge and understanding when hacking on things for such specific personal niggles - which over time makes future efforts more likely to reap some combination of higher rewards, faster execution and less effort... and more often than not what we learn can be applied to our day jobs in unforeseen ways.
Having a problem organically emerge in front of us that affects us personally is just a really easy thread to pull on for learning things.
Implementing this would be a good learning experience that will greatly increase your understanding of fopen, fseek and how block storage works. It would be an interesting experiment to benchmark reading a million files with tags at the start vs the end, and maybe compare using spinning rust storage to a modern NVMe drive.
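As a starting point for that kind of experiment, here is a rough sketch of the "tag at the end" case: an ID3v1 tag is the last 128 bytes of the file, so it can be read with a single seek. The function name and the demo file contents are invented for illustration; the field layout (3-byte "TAG" marker, then 30-byte title, artist and album fields and a 4-byte year) is the ID3v1 one.

```python
import os
import tempfile

ID3V1_SIZE = 128


def read_id3v1(path):
    """Read an ID3v1 tag by seeking to the last 128 bytes; return None if absent."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, os.SEEK_END)  # jump straight to the tag, skipping the audio
        block = f.read(ID3V1_SIZE)
    if block[:3] != b"TAG":
        return None

    def text(raw):
        # Fields are fixed width, padded with NULs (or sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
    }


# Demo with a fake file: placeholder "audio" bytes followed by a 128-byte tag.
fake_audio = b"\xff\xfb" * 500
tag_block = (
    b"TAG"
    + b"Example Title".ljust(30, b"\x00")
    + b"Example Artist".ljust(30, b"\x00")
    + b"Example Album".ljust(30, b"\x00")
    + b"1999"
    + bytes(30)      # comment field
    + bytes([255])   # genre byte
)
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
    tmp.write(fake_audio + tag_block)
demo_path = tmp.name
tag = read_id3v1(demo_path)
os.remove(demo_path)
print(tag)
```

Timing this over a large directory, against a version that reads a tag from the front of each file, would give the start-versus-end comparison on whatever storage is at hand.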
I worked with the FLAC format and it also recommends padding to make it easier to edit metadata. I think libflac reserves 4 KB by default.
> PADDING: This block allows for an arbitrary amount of padding. The contents of a PADDING block have no meaning. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a PADDING block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the right place in the existing file (which would normally require rewriting the entire file).
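To check how much padding a given FLAC file actually carries, a small scan of the metadata block headers is enough. This is a sketch under a few assumptions: the whole metadata section is already in memory, the function name is made up, and the sample bytes are a synthetic stream rather than real audio. It relies on the FLAC layout of a "fLaC" marker followed by blocks whose 4-byte header holds a last-block flag, a 7-bit type (1 means PADDING) and a 24-bit length.

```python
PADDING_BLOCK_TYPE = 1  # block type number for PADDING in the FLAC format


def flac_padding_bytes(data: bytes) -> int:
    """Sum the payload sizes of all PADDING metadata blocks in a FLAC stream."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    padding = 0
    while True:
        if pos + 4 > len(data):
            raise ValueError("truncated metadata section")
        is_last = bool(data[pos] & 0x80)   # top bit marks the final metadata block
        block_type = data[pos] & 0x7F      # low 7 bits are the block type
        length = int.from_bytes(data[pos + 1:pos + 4], "big")
        if block_type == PADDING_BLOCK_TYPE:
            padding += length
        pos += 4 + length
        if is_last:
            return padding


# Synthetic stream: a 34-byte STREAMINFO block, then a final 4096-byte PADDING block.
sample = (
    b"fLaC"
    + bytes([0x00]) + (34).to_bytes(3, "big") + bytes(34)
    + bytes([0x81]) + (4096).to_bytes(3, "big") + bytes(4096)
)
reserved = flac_padding_bytes(sample)
print(reserved, "bytes of padding reserved")
```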
The article mentions this. It's just that half a megabyte seems like a lot for metadata, especially since it's been doing this for so many years, back when storage was more expensive.
The article explained the reason for the AAC file padding. I'm pointing out that other formats like FLAC (not mentioned in the article) also adopt this strategy.
Honestly 0.5MB of blank space is never a good tradeoff. If you're adding album art that's a significant fraction of a megabyte, then just rewrite the file.
> which can easily exceed 0.5MB
If that happens then you were better off never reserving space at all.
Yeah this is the correct take on all this. It's simply not reasonable to add more than 10% to the size of a file on the off chance you need to perform a tiny manipulation to the file's metadata later.
If it's album art then it doesn't seem big _enough_. Either way it's a bit odd of a size. Maybe it was for album art back when resolutions were lower and it was enough.
The standard method for including album art (outside of iTunes world, I don't know what things are like there) has always been folder.jpg or folder.png in the folder. (And several other common names.) It can be as big as you like. Software should always copy the art, and of course dragging and dropping the album in your file manager makes this happen automatically.
It's absurd to spend the same 0.5MB or more on the same piece of art over and over on every tiny lossy-encoded track. Even with FLAC it's pretty questionable. Better to include a tiny 100x100 jpg or nothing at all.
I actually recently encountered a bug with a music encoding tool where it failed to resize my album art when creating Opus files. I ended up with an Opus-encoded album that was bigger than the FLAC version because every single song had a 30 MB high res scan of the vinyl album art embedded in it.
Ripping from cd reserves 0.05% (about 5kb) for metadata. I can imagine some communication breakdown where someone thought they meant 5% (about 500kb) when specifying the number 0.05 with a % next to it. You see percentage mixups all the time.
In the article he describes how the 500kb was originally used for album art but when Apple started storing that separately they didn't remove the space for it in the music files
I see this type of bloat all over with Apple and coincidentally their devices are, broadly speaking, priced differentiated in large part by disk size and difficult to expand storage. I actually don't believe it's a coincidence at all.
But there are things they've done to save space, with a much larger effect than this, that they could simply have... not done, if that's what they were aiming for. Much more likely they're just sloppy sometimes, like most companies.
From my understanding, 0.05% is just a stat from the files the author collected, not a sign that Apple's encoder (which is used to encode these CD rips) actually reserves X percent based on file size. It's more likely that it reserves a static, absolute number of bytes for every audio file.
Apple’s album art has been janky for me for 20 years. I’m sure it would be less noticeable if I had mostly mainstream music tastes, but after years of doing album art manually, Apple screwed it all up multiple times and I stopped caring as much about fighting around the time I gave in and started streaming more music.
This is going to sound weird, but is your wife's phone set to a non-US English language (even something like Australian or British English)?
I found the same problem with my phone a long time ago on a non-US English account. Once I had that configured differently from my App Store setting, it would do all kinds of stupid things with the artwork.
I feel your pain: I have a large and meticulously tagged collection, and when I upgraded to a new iPhone, it seems to have hit "shuffle" on every single album art, even after I nuked it all and did a fresh sync. It's incredibly bizarre.
Yes, one of my favorite jankinesses is that iTunes would put a timestamp in the jpeg header of the embedded artwork for every m4a file, ensuring that a hash of the artwork (for a library system that tried to minimize duplicate artwork) would always be unique. So you’d have a visually-identical but duplicate artwork file for every song on the album.
If only Apple had some way to store metadata for a file separately from the content. Something like files with a "data fork" and a "resource fork", maybe.
Or even if there was a way for them to store an image, like album art, as a file in a filesystem. They could keep a folder with multiple files...
That might be confusing, though. Too bad they don't have a way to show a directory as a single "package" in the Finder. I can imagine that would be useful for a lot of things.
They ditched the concept about ten years ago. Being out of the loop, I had to look up whether APFS even supported them anymore, and in doing so found opinions on two ends of the spectrum: the concept was the best thing ever, or one of the most detestable features. I guess no one really understood how to use them anymore, which added to its demise.
Resource forks are still used for transparent compression, which is heavily used by macOS system files and when decompressing certain archives (e.g. Xcode XIPs).
How often do users change the metadata of music bought from Apple Music? When ripping stuff by hand, this is understandable: you make a typo, want to add a year, make some locale changes to add local characters, etc. So in that case I see a possible need to waste 5 kB of space to save a couple of seconds after an edit.
But with downloaded music with all the metadata (hopefully) correctly set, even 5 kB is a waste of space.
I always edit the genre so that the tracks work correctly with my various different smart playlists. In rare instances I'll also need to tweak the artist name to fit with the existing tracks in my library.
Sometimes the album or track names have extra info that I don't care about and will delete.
And TV show metadata is usually a mess and needs quite a bit of editing.
I can't speak for Apple Music, but some online stores are terrible at providing metadata (e.g. JunoDownload often has ALL-CAPS artist names, & turns into &amp;, etc.). Often it's simpler to just throw everything into Mp3tag and get new metadata from Discogs or MusicBrainz instead.
There are many artists/albums/songs on Apple Music which have all caps too.
I'm not talking about less-known artists too, two examples of this in my collection are Steven Wilson and Linkin Park.
Oh BTW talking about less-known artists, I've even seen the whole song being incorrect (duplicate of another song on same album) on Apple Music after switching part of my old MP3 collection to Apple Music native.
I edit metadata for Apple Music stuff all the time, so it is consistent with the other non-Apple Music stuff I have in my library. I also fix capitalization (this is more common in non-English songs, which inexplicably adopt English capitalization standards), the occasional error in old songs (by comparing to scans of physical copies), useless metadata on song/album titles (looking at you, “(2013 Remaster)”), and sometimes the year (for re-releases, I put in the original release year). Then of course sometimes Apple Music overrides my corrections because someone at a record label decided to change the metadata on some song, so I also have to keep track of it using a separate app (“Music Tracker”) that tracks changes to my library (which is also useful to know when something disappears from Apple Music).
I don’t expect normal people to do this, but for me the ability to edit metadata and mix streaming songs with ripped songs is a huge advantage of Apple Music over other streaming services. Unfortunately Apple Music has a lot of other issues, especially with syncing, and I still can’t understand why there’s no dedicated “featuring” metadata field (everyone pollutes the song title: Spotify solved this), but for me there’s no other possible alternative.
I might be misremembering things, but I think digital "box sets" came/come with something like the album and album art all set to the box set instead of the individual albums.
So it's cheaper than buying the albums individually, but then you have to spend some time unpicking them again and restoring the original albums. At least the track/disc numbering is still correct, so that helps a little.
Judging from some reviews left on some box sets, this wasn't always the case, i.e. originally even box sets had the proper album tagging…
> How often do users change the metadata of music bought from Apple Music?
I have to change the genre on all of the Christmas songs my wife buys from Apple Music from "Holiday" to "Christmas" because there are holidays other than Christmas.
Otherwise, a "Holiday" smart playlist transitions from Silent Night to Monster Mash.
> If you rip a CD with Apple Music using the Apple AAC Encoder (the default option), it will reserve approximately 5 kB of free space inside each file for this purpose.
I was saying that even this is too much, and 500kB is 'way way too much'.
You can’t make changes to the metadata (e.g. change the spelling of an artist’s name) without reassembling the entire file to recalculate the offsets.
Split the file after the metadata block; change the data inside the metadata block (possibly changing its size); add the difference between the old and new size to all the offsets in it; recat the two pieces together.
You could even do this "streamingly", to inject album art etc., since you know what size the added content will be. Simply add the diff-offsets to the right fields as they get streamed out.
The above solutions didn't take much thought to come up with. I wonder why no one at Apple thought of that (especially the latter)?
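The "add the difference to all the offsets" step can be sketched for MP4-family files, where absolute chunk offsets live in an `stco` box (a 4-byte version/flags word, a 4-byte entry count, then one 32-bit big-endian offset per chunk). This is only an illustration: the function name is made up, locating the box inside `moov` is not shown, and the 64-bit `co64` variant is not handled.

```python
import struct


def shift_stco_entries(payload: bytes, delta: int) -> bytes:
    """Return an 'stco' box body with every chunk offset moved by delta bytes.

    payload is the box body, i.e. everything after the 8-byte size/type header.
    """
    version_flags, count = struct.unpack_from(">II", payload, 0)
    offsets = struct.unpack_from(f">{count}I", payload, 8)
    # struct.pack raises struct.error if a shifted offset no longer fits in
    # 32 bits or goes negative, which is the right failure mode for a sketch.
    shifted = [offset + delta for offset in offsets]
    return struct.pack(f">II{count}I", version_flags, count, *shifted)


# Hypothetical case: the metadata block grew by 37 bytes, so the audio moved by 37.
old_body = struct.pack(">II3I", 0, 3, 1000, 5000, 9000)
new_body = shift_stco_entries(old_body, 37)
new_offsets = struct.unpack_from(">3I", new_body, 8)
print(new_offsets)
```

Because the table is the same length before and after, this patch can also be applied while streaming the file out, as long as the size of the inserted data is known up front.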
I would guess the original file structure was designed for hard drives, and was never changed. When storing on a hard drive, you want the data in a single file to be located physically close together, which becomes more difficult if you're splitting up the file when you make metadata changes.
Or, if the article's speculation is correct, Apple designed their formats assuming only 5kb of empty space would be allocated, which was insignificant even for the 1st Gen iPod's 5 GB hard drive. No reason to do more complicated stuff if a simple solution works. It only becomes a problem when people don't bother to keep track of how much empty space is being added automatically.
This is the sort of thing that also seems crazy rare to need to fix. OK, so you need to reassemble a file and recalculate offsets when an artist changes their name... so what? How often does that actually happen? Do people who download your music CARE that the metadata reflects an old/wrong spelling?
Even the argument of "Well, they need to download the whole file before they can see the metadata" seems really weak. If someone is streaming the music, why not stream the metadata separately from the data? Why do those two things HAVE to be together?
Heck, why not have a metadata "db" (ala plex) instead of insisting that stuff be bundled on the media itself?
> Heck, why not have a metadata "db" (ala plex) instead of insisting that stuff be bundled on the media itself?
Because that isn't portable, and then a different faction will be screaming "vendor lock-in". The metadata inside the media file itself is in a standard format that anybody can read, and storing it inside the file also ensures that it cannot become separated from the file it belongs to.
Besides, any reasonable kind of software involving any kind of media library will already be keeping its own metadata database (and that includes iTunes/Apple Music), because rescanning the whole library on each program startup would be ridiculously inefficient (all the more so historically when hard disks were more common). But it can't be the sole source of truth because see above – people who still keep a local library of music around might want their media file collection to be interoperable with other software, too.
In 2022, we shouldn't be reserving any area of a file as 'spare space' like this.
On the very rare occasion you adjust the artist's name in a music file, the user expects the whole file to be rewritten. So just rewrite the whole file.
What's next? Photoshop only writing out a quarter of an image when I edit my ex out of a nice photo? MS Word only touching a few bytes of a 100-page document when I make a heading bold?
> In 2022, we shouldn't be reserving any area of a file as 'spare space' like this.
In 2022, we should have file formats that are not naive of the underlying filesystem. I store my text notes in Git, but I also have two LibreOffice documents in there because org-mode doesn't handle their contents well. It would be nice if only 5% of the file had to be rewritten when I change a single word, rather than the current 91% of the file. A 5% hit to storage for a 91% reduction each time I update it would be a great trade-off.
Under the hood, odt and docx files are zip files.[0] The “91%” change has nothing to do with rewriting the file; it mostly has to do with zip files and how poorly git handles binary diffing. Plaintext files are rewritten on disk every time you save them in virtually every editor and that doesn’t cause problems for git.
I’ve never tried this, so take it with a heap of salt, but you might have better luck with the diffing if you developed a process that unzipped the file contents into your repo after you’re done editing it, and then zipped the contents back up with the right extension when you want to edit it again.
Alternatively, you could just switch to a text-based format like rtf, as long as you don’t need any specific features from odt or docx.
(EDIT: someone else mentions a promising sounding “flat xml” format, which I’ve never encountered before.)
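The unzip-into-the-repo idea could look something like the sketch below. The helper names and the demo file are hypothetical, and two assumptions are baked in: the document comes from a trusted source (it is extracted as-is), and on repacking the `mimetype` entry is written first and uncompressed, which is what OpenDocument packages expect. Empty directories inside the archive are not preserved.

```python
import os
import tempfile
import zipfile


def unpack_document(doc_path, out_dir):
    """Extract a zip-based document (.odt, .docx, ...) into a directory tree."""
    with zipfile.ZipFile(doc_path) as zf:
        zf.extractall(out_dir)


def repack_document(src_dir, doc_path):
    """Zip a directory tree back into a document, with 'mimetype' first and stored."""
    names = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            full = os.path.join(root, name)
            names.append(os.path.relpath(full, src_dir).replace(os.sep, "/"))
    names.sort(key=lambda n: (n != "mimetype", n))  # mimetype first, then stable order
    with zipfile.ZipFile(doc_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname in names:
            method = zipfile.ZIP_STORED if arcname == "mimetype" else zipfile.ZIP_DEFLATED
            zf.write(os.path.join(src_dir, arcname), arcname, compress_type=method)


# Demo on a tiny stand-in document (not a complete, valid .odt).
with tempfile.TemporaryDirectory() as work:
    original = os.path.join(work, "notes.odt")
    with zipfile.ZipFile(original, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", "<doc>hello</doc>")
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    tree = os.path.join(work, "notes_unpacked")
    unpack_document(original, tree)       # commit this directory to git
    rebuilt = os.path.join(work, "rebuilt.odt")
    repack_document(tree, rebuilt)        # rebuild before opening in the editor
    with zipfile.ZipFile(rebuilt) as zf:
        first_entry = zf.infolist()[0]
        first_name = first_entry.filename
        first_is_stored = first_entry.compress_type == zipfile.ZIP_STORED
        round_trip_xml = zf.read("content.xml").decode("utf-8")
```

Whether the resulting diffs are readable still depends on how stable the editor's XML output is between saves.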
>I’ve never tried this, so take it with a heap of salt, but you might have better luck with the diffing if you developed a process that unzipped the file contents into your repo after you’re done editing it, and then zipped the contents back up with the right extension when you want to edit it again.
I have a current project where one of the other developers has tried this. It might work for a solo project, but with multiple developers on Windows (Excel) and Linux (Open Office), I find that Open Office feels a need to continually fuck with unrelated fields. Editing a single field in an xlsx file results in changes in a dozen files in the unpacked version. Making any more complicated changes results in a diff that's impossible to manage.
Do you actually mean OpenOffice, or do you mean LibreOffice? I would hope everyone is using the much improved LibreOffice by now instead of OpenOffice.
Still, interesting to hear some feedback on how well (or not) that idea has worked for some.
The Flat XML option seems to be the best approach if you need to edit documents more complex than what can be represented in either markdown or rtf (rich text format).
In your case, LibreOffice allows you to save the file as Flat XML (e.g., .fodt), and the document will be written as a single uncompressed file. Unfortunately, few apps have a feature like this.
This is perfect for my use case, thank you! The .fodt file is 150K, vs 117K for the .odt file. But after changing a word less than 20 of the 1983 lines in the file have changed.
Wow, thanks for the Flat OpenDocument file type! Files like this are much easier to handle with git (human-readable diffs, for instance). Unfortunately, they tend to get very large (200kB ODS vs. 7MB FODS). What a pity the XML itself is not less verbose.
I dunno if that will help. I know Dia lets you store the uncompressed XML, but I have seen it sometimes have a 100% diff due to the serialiser moving *everything* around when all that happened was that I rearranged half the blocks on the diagram.
One option is to store files as a database like SQLite, which I assume is designed around not rewriting the whole file on every change. For example Audacity 3.x does this.
However SQLite stores multiple files on disk for crash persistence, complicating matters. But then again so does Audacity 2, and Microsoft Office to an extent (file locking rather than data integrity).
Microsoft Word used to only append changes to the document on save rather than rewriting the whole file, but this was disabled since if you deleted text and saved the file, it remained in the document. https://www.cnet.com/culture/microsoft-disabling-word-2003s-...
Modern file systems address all of that and more - without the added complexity of a whole freaking RDBMS system! If you are going to change an application it makes far more sense to just update it to leverage the features of a modern file system.
Can't understand why this comment is so far down on the page. Apple Music has 98M users [1]. Let's say conservatively users listen to one song per day and update metadata on one song per month. That's 30 * 0.5 MB * 98M ~= 1.5 petabytes extra per month sent over the wire to save ~4 MB * 98M ~= 392 terabytes per month from being written to disk.
Those 0.5 MB presumably also need to be written to disk, making this a trade of 392 TB in disk writes in exchange for 1.5 PB over the wire + 1.5 PB in disk writes? I'm sure this is overly simple and overlooks a lot of technical details, but I can't see how this can be worth it.
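Plugging the comment's own assumptions into a few lines makes the comparison easy to rerun with different guesses. The inputs are the commenter's estimates (98M users, one play per day, one metadata edit per month, 0.5 MB of padding, a 4 MB song), not measurements, and decimal units are used throughout.

```python
USERS = 98_000_000
PLAYS_PER_MONTH = 30          # one song per day
EDITS_PER_MONTH = 1           # one metadata change per month
PADDING_BYTES = 500_000       # 0.5 MB of reserved space per file
SONG_BYTES = 4_000_000        # ~4 MB rewritten if there were no padding

# Cost of the padding: it travels with every song that is delivered.
extra_transfer = USERS * PLAYS_PER_MONTH * PADDING_BYTES
# Benefit of the padding: full-file rewrites avoided on metadata edits.
saved_writes = USERS * EDITS_PER_MONTH * SONG_BYTES

print(f"extra transfer per month: {extra_transfer / 1e15:.2f} PB")
print(f"disk writes avoided per month: {saved_writes / 1e12:.0f} TB")
print(f"ratio: {extra_transfer / saved_writes:.2f}x")
```

Under these assumptions the padding costs about 1.47 PB of transfer per month against roughly 392 TB of writes avoided, a ratio of 3.75 to 1.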
It wouldn't be. This spare space is entirely optional in the file format specification, and it's up to the tool that makes the file if it wants to include it or not.
> For example, placing the metadata block at the end of the file means you must download (or read from local storage) the entire song before you can play it. To avoid this, encoders tend to place it before the much larger multimedia stream block. The metadata block refers to absolute positions inside the multimedia stream block measured as an offset from the start of the file. You can’t make changes to the metadata (e.g. change the spelling of an artist’s name) without reassembling the entire file to recalculate the offsets.
I know local storage can seek to a location in a file without reading the whole thing (ZIP files keep their directory at the end of the file and readers seek straight to it, IIRC), so then all you need is a download system that lets you grab the first block of the file, and then ask for a later block without the middle blocks.
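Over HTTP, that "grab one block, then a later block" pattern is what Range requests provide. Here is a minimal sketch of building such requests with the standard library; the helper names and the example.com URL are placeholders, nothing is sent over the network, and it assumes a server that honours ranges (one that does not will reply with the whole file).

```python
import urllib.request


def build_range_request(url, first_byte, last_byte):
    """Request an inclusive byte range, e.g. the first 64 KiB of a file."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={first_byte}-{last_byte}")
    return req


def build_suffix_request(url, tail_bytes):
    """Request only the last tail_bytes of a file, e.g. trailing metadata."""
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes=-{tail_bytes}")
    return req


# Placeholder URL; passing either request to urllib.request.urlopen would fetch it.
song_url = "https://example.com/song.m4a"
head_request = build_range_request(song_url, 0, 65535)
tail_request = build_suffix_request(song_url, 4096)
print(head_request.get_header("Range"), tail_request.get_header("Range"))
```

A server that supports ranges answers with status 206 and only the requested bytes, so a player needs at least two round trips when the metadata sits at the end.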
Back in the HDD days, loading music metadata was far from instantaneous even with the metadata at the front. Putting it at the back would have doubled the time needed for the UI to update. Now that SSDs are more ubiquitous, though, it might be time to revisit that decision.
This requires multiple requests (or did before QUIC, things may be better now) and a server expecting range requests. It's much more reliable, considering the general mess of web requests, to put everything at the front. It was even more reliable in 2005 to do so.
Yeah, PDF suffered for this because the catalog was at the end of the PDF file. My understanding is that "streamable PDFs" are something of a hack where only the first page is front-loaded. At least that used to be the case.
It could be a form of "look, this file is X large for Y minutes, that's much more than Z, so it must be better!".
I have a rising suspicion that some game companies with fluid ethics do this: declining to optimize and strip unused content in order to inflate game size.
"Wow, it's 102 GB, the game must be huge with boatloads of content!" (i.e. huge == higher chance of getting my money's worth). Which in turn becomes a problem with the latest generation of consoles, which have quite little drive space (PlayStation ca 625 GB).
> I have a rising suspicion that some game companies with fluid ethics do this
Occam's razor says that tooling, or developers, just aren't great at shaking unreferenced content from production builds.
I'm not a game developer, but it's common to still ship unused code or CSS in websites, just because someone forgets to remove it, or it's hard to tell if it's still used. I can only imagine this is more of a problem in more complicated game dev, especially when time to focus on non-functional requirements like that can be hard to come by.
I have a few apps built for macOS which I sell on the App Store. I really like to keep app sizes small. If I tried a library and saw the app size blow up from 5MB to 50MB, I would not use it. So basically I don’t use any libraries and keep my apps at 3-5MB.
At the same time, I saw some comments and reviews about my apps saying that they probably would not be any good just because the app sizes are too small.
I used to work very hard to push game binary sizes down because it limits the ability for our customers to actually play the game if the patch sizes are too large (though, it must be noted a large part of the company is entirely apathetic to this), but nobody is inflating the sizes intentionally.
As much as people like to hate on console gamers, it's predominantly console "TRCs" which put limits on the size of games; that's the only time the company really cares about the sizes of games.
Download size is a "tragedy of the commons" situation. The overall size usually isn't owned by anyone and each contribution to increasing the size is usually small so monitoring it daily or weekly doesn't help. It is only when you are able to compare the size 1 month, 6 months, 12 months ago that a large jump can be observed - one accomplished by lots of tiny 0.1% increments.
But even if you do notice that and file bugs what team in their right mind would spend time trying to reduce their own component's size by 2% when they could spend that time on features or bug fixing instead? And if you asked the majority of their users would agree - "F'k the 20MB, fix bug XYZ/finally implement feature ABC".
Despite each individual decision along the way being arguably the correct decision the product still adds up to +30% every year.
"Never do any download size optimizations" is just as silly as "shave off 20MB at the expense of implementing a much-needed feature."
Usually there's some ultra low-hanging fruit that everyone knows about, and if download size has become an issue, a bit of time should absolutely be allocated to basic housekeeping tasks as long as the product is supported.
Almost nobody cares about the size, and that's why it's big. It's not so much a purposeful game: there's no extra money to be made by bloating your own software, you'll do that fine on your own.
And asset duplication on older console generations. The consoles only had HDDs, and so to ensure a level's assets would stream from disk quick enough they packaged copies of all those assets together. So the exact same game can be smaller on PS5 than PS4 as the SSD makes fragmentation irrelevant.
Often the developers also forget to remove unused files and cut content, but I don't think this has a large effect on the game size.
>Wow, it's 102 GB, the game must be huge with boatloads of content!
Does anyone actually think this way? SNES games were some megabytes and often have more content than I can get through (I still haven't beaten the SNES zelda... maybe I should pick it back up.) Once you're in the Gigabyte range anything beyond is likely waste.
I would guess a lot of people, maybe more non technical people think that.
Every console generation the file size of games gets bigger because of bigger textures, better models, etc.
So it's easy to make the relation between huge file size and quality, which isn't always true. The difference between 5GB and 80GB should be noticeable, but now with 150GB games I don't think it's all content and more bad file optimization because it's cheaper this way and people don't (really) care.
I've always wondered.. Was a game like GTA5 80GB+ because it contained the audio for localized in-game speech for english/french/spanish/italian/etc.? Does this really happen? Aren't we at the point yet where the audio is only downloaded on demand for the localization of the PC unless opted-into later? Also. Don't need your textures for playing on an 8k monitor if I'm playing at 1440p.. unless I want them.
At least on Steam there is the _option_ for developers to publish individual depots for different locales, and a single game download can be made of multiple depots (so you could have a global base depot, and then put localised content in depots per locale). https://steamdb.info/sub/398272/depots/
Because I was trying to distract myself from stuff I looked at some files in a hex editor, and really old DRM AAC / M4P files from iTunes (still have a few around...) don't seem to have the same huge "free" padding block at the start of them (you can still see smaller "free" chunks, so they don't seem to be hidden by the encryption). Which is weird.
DRM iTunes Store files were 128 kbps, whilst non-DRM ones are 256 kbps, as beyond removing DRM one of the selling points of what was called iTunes Plus at the time was better quality. So it's possible that back in circa 2009 someone made a fuckup with the encoding settings and no one has noticed since. Or it's some extremely weird conspiracy to sell larger iPods.
Or perhaps they wanted to make the filesizes just a bit bigger so that iTunes Plus files felt like quality, like the way expensive products have metal weights added just to make them feel heavier and more solid.[1]
[1] For avoidance of doubt, this last suggestion may be what is known as "humour".
You're not going to further compress the AAC track. Also the iTunes DRM operated at the track level. The audio track in the otherwise normal MP4 file was encrypted AAC data. All the metadata atoms and file header were all unencrypted.
The key to understanding this issue seems to be "-movflags +faststart". The TLDR is that Apple apparently isn't "fast-starting" their MPEG-4 files before distribution.
This became a thing when distributing QuickTime Movies on the internet became a thing (it's not an issue with random-access media), because one needed Movie metadata at the front of the file in order to support progressive playback. Because the MPEG-4 file format is effectively the QuickTime Movie file format, the need to put metadata at the front of the file continues if you want viewers/listeners to be able to play files as they're downloading.
That Apple isn't performing this extremely trivial pre-distribution process is extremely curious. It makes me wonder if this "common knowledge" was lost along the way, or if the people who would know this kind of super-obvious production step just aren't the same people in charge of Apple Music standards.
For anyone curious about what fast-start implementations look like:
What? Faststart just means the metadata is at the front of the file, which these 500KB padded m4a files do have. They just have excessive paddings between the metadata and the content stream.
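Both halves of that claim can be checked on a file by walking its top-level boxes: is `moov` ahead of `mdat`, and how many bytes sit in `free`/`skip` boxes? The sketch below assumes the whole file is in memory and only looks at the top level, so padding nested inside `moov` would not be counted; the function names and the synthetic sample are made up for illustration.

```python
import struct


def top_level_boxes(data: bytes):
    """Return (type, offset, size) for each top-level box in an MP4/M4A file."""
    boxes = []
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > len(data):
                raise ValueError("truncated 64-bit box header")
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - pos
        if size < header or pos + size > len(data):
            raise ValueError("corrupt or truncated box")
        boxes.append((kind.decode("latin-1"), pos, size))
        pos += size
    return boxes


def describe_layout(data: bytes):
    """Return (moov_before_mdat, bytes_in_top_level_free_boxes)."""
    boxes = top_level_boxes(data)
    kinds = [kind for kind, _, _ in boxes]
    moov_first = (
        "moov" in kinds and "mdat" in kinds
        and kinds.index("moov") < kinds.index("mdat")
    )
    free_bytes = sum(size for kind, _, size in boxes if kind in ("free", "skip"))
    return moov_first, free_bytes


def make_box(kind: bytes, payload: bytes) -> bytes:
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


# Synthetic file: metadata first, then 1000 bytes of padding, then the media data.
sample = (
    make_box(b"ftyp", b"M4A \x00\x00\x00\x00")
    + make_box(b"moov", bytes(32))
    + make_box(b"free", bytes(1000))
    + make_box(b"mdat", bytes(64))
)
layout = describe_layout(sample)
print(layout)
```

On the sample, the metadata is at the front and 1008 bytes (padding plus its 8-byte header) are empty, which is the same shape as the padded files described above, just smaller.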
The author used this flag in FFMPEG because, although the main goal is to shrink the padding, you still want to have the metadata at the beginning.
> Faststart just means the metadata is at the front of the file, which these 500KB padded m4a files do have.
You're correct in the sense that having metadata at the start of the file means that these files are ready for progressive download/playback, albeit with a chunk of unnecessary data transfer.
But the other important part of the "faststart" process is that you get what used to be called a "flattened" file, where the data is contiguous — metadata is followed immediately by compressed media data. (Tools for QuickTime Movies also compressed the 'moov' header, but I'm not sure if that's supported in MPEG-4.)
The article already detailed why Apple (or most AAC encoders) adds this padding, and a reasonable amount of it is useful. It just needs to be smaller.
The author doesn't actually need it to be "contiguous"; it's just that an FFmpeg stream copy is the easiest way to do so.
In this sense, Apple don't need to "flatten" their files, they just need to have smaller padding just as what iTunes is already doing when ripping user-provided CDs.
You're talking about the space reserved to allow for in-place metadata updates on songs that you've ripped. As you've noted, it's not much and doesn't "need" to be smaller.
TFA and I are talking about music files purchased from and delivered by the Apple Music Store, which include 500KB of wasted space. The solution the article suggests (and that I added a bit more background to in my post) is to do what any production workflow should do before distributing MPEG-4/Movie files: fast-start those suckers.
0.5 MB is not much, but multiplied by the billions of tracks Apple sells and streams in a year, pretty soon we're talking about a significant amount of wasted storage and transfer.
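For a rough sense of scale, the multiplication can be written out. This is only back-of-the-envelope arithmetic: the 500,000-byte padding figure comes from the thread, while the track counts are hypothetical round numbers, not Apple's actual volumes.

```python
PADDING_BYTES = 500_000  # roughly 0.5 MB of padding per track, as discussed in the thread


def wasted_gigabytes(tracks: int, padding: int = PADDING_BYTES) -> float:
    """Padding carried by `tracks` files, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return tracks * padding / 1e9


# A personal library of 2,500 purchased songs (hypothetical size).
print(wasted_gigabytes(2_500))
# One billion delivered tracks (hypothetical round number).
print(wasted_gigabytes(1_000_000_000))
```

Under those assumptions a 2,500-song library carries about 1.25 GB of padding, and a billion deliveries move about 500 TB of it.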
> In this sense, Apple don't need to "flatten" their files, they just need to have smaller padding just as what iTunes is already doing when ripping user-provided CDs.
What's the benefit to shipping any empty KBs with every song purchased or streamed?
>What's the benefit to shipping any empty KBs with every song purchased or streamed?
For (quicker) in-place metadata updates? You just said it. You can do so with the purchased music in iTunes.
Not sure about the streaming services, but the article isn't about them at all (nor does it say whether streamed files actually have this 500 KB padding to begin with).
Here's my slightly educated guess (I worked at Apple and was, for a time, iTunes-adjacent): the empty space is for file watermarking on edge servers.
All iTunes (Apple Music) content is served via CDN. By leaving some empty space in the file the edge server can just write the owner's account ID fingerprint without having to remux the whole file. Apple can just send a virgin copy of a song to the edge servers and they can do the customization inline with serving it. That will be computationally cheaper than remuxing it and require no extra storage since the file is tweaked in flight. Leaving a lot of space provides enough room for some worst case size of watermark data without producing invalid files.
But that's only my guess. I haven't looked at any files in question.
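If that guess were right, the appeal would be that stamping a reserved region never changes the file length, so nothing after it has to move. A minimal sketch of the idea, with an entirely made-up layout (4-byte header, 32 bytes of reserved zeros, then media) and a made-up payload; none of this reflects how Apple's servers actually work:

```python
def stamp_padding(buf: bytearray, pad_offset: int, pad_size: int, payload: bytes) -> None:
    """Overwrite the start of a reserved padding region in place; the length never changes."""
    if len(payload) > pad_size:
        raise ValueError("payload does not fit in the reserved space")
    # Equal-length slice assignment: bytes are replaced, nothing is inserted.
    buf[pad_offset:pad_offset + len(payload)] = payload


# Hypothetical layout: header, reserved zero bytes, then the media payload.
header = b"HEAD"
media = b"AUDIO-DATA"
pad_size = 32
pristine = bytearray(header + bytes(pad_size) + media)

# Per-customer copy made at serving time; the pristine copy stays untouched.
served = bytearray(pristine)
stamp_padding(served, len(header), pad_size, b"account-id:12345")

print(len(pristine), len(served))
print(bytes(served[-len(media):]))
```

Because the media bytes stay at the same offsets, no chunk offset tables would need recalculating, which is the "no remux" property the guess relies on.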
Good point about the watermarking, although 500 kB still seems rather generous for that: As far as I'm aware, the "watermark" only encompasses a few IDs plus name and e-mail of your Apple ID (compare https://github.com/avibrazil/un-istore).
You may feel it's a very small delay, but if you batch edit your songs often (which I do) it's actually pretty noticeable. Try selecting 10 tracks and updating the album name, and feel the difference.
Most of the time wasted on remuxing the file comes from rewriting the whole thing on disk, which is more noticeable if you keep your songs on an HDD.
Is it really that bad? No. But it's nice to have some padding.
Same goes for MP3s and especially FLACs (where it's even more noticeable because the files are much larger). Some of the digital distributors of hi-res (high-resolution) music do use "flattened" files, and it's quite annoying to edit them.
> But the other important part of the "faststart" process is that you get what used to be called a "flattened" file, where the data is contiguous — metadata is followed immediately by compressed media data.
According to who?
Do any media players care if there's a couple kilobytes empty there?
This is what happens when you teach developers 'storage/hardware/compute is cheap' if you ask me. That plus the lack of interest in Apple Music development in general. (I cannot imagine that shitty music player received much love over the past years. Still better than Spotify though.)
Funny side note: I recently made its frontend crash. Turns out it _is_ a JS frontend after all. I suspected as much because of how slow & sluggish it is, but I got an actual JS error & it fell back to the old table interface.
Surely you are talking about different clients than I have on i/macOS? With Apple Music running on my MacBook, I couldn’t even control the player on the respective iOS app from the couch. It was insultingly trashy UX, and I’m deep, deep into Apple’s ecosystem. My reaction was “what a joke, guess I have to use Spotify”.
Apple Music could lack 90% of Spotify’s features, but Spotify will never get me back to their awful desktop UI. It looks like it’s made for toddlers. Full screen on an artist page I don’t even see 5 songs because they have this stupid big banner section taking up 2/3 of the screen. Throw in their spamming of podcasts to me and the whole desktop experience is a mess. Apple Music has its issues but at least the UI is kind of ok. That’s how low Spotify has set the bar which is a shame because their UI was great for about a decade and then they just set it on fire.
Remember when Spotify made headlines last year for having a button to listen to an album in order? That's how low the bar is.
I switched to Apple Music as well. It's not perfect but at least it's not trying to shove Podcasts into my music.
Spotify also likes to complain about Apple being anti-competitive by locking them out of features, then when Apple does let 3rd party apps do things like Apple Watch offline background music playback, Spotify takes two and a half years to actually use it.
> then when Apple does let 3rd party apps do things like Apple Watch offline background music playback
Entirely Apple's fault: by the time they enabled the possibility of the feature, it had become apparent that the Watch as a platform wasn't really worth focusing on, so it's on the backburner now.
If that feature was possible day one, it would have happened. But instead Apple has this attitude of locking things out from 3rd parties and only moving when it starts to make their products look bad.
Offline music is really about fitness users, if you’re not a runner I can see how it looks useless. Anyone who wants it has switched off of Spotify by now, so at this point it probably doesn't matter if they ever get around to it.
More or less, I should have said runners. No need for offline music if you're weightlifting or sitting on an exercise bike. So it's a subset of the fitness userbase. But a big enough one that Spotify did eventually get this done.
But I sorta wonder if their main motivation was that it was a bad look to continue complaining about Apple not letting them make a watch app on their "Apple is cheating" timeline, while also choosing to not make a watch app. If they actually cared about doing it, why take so long?
Yes this is big, with Apple Music I much less frequently have anything promoted at me unless I’m specifically looking for it.
Apple doesn’t twiddle with its UI nearly as much as Spotify does which is nice too. It has its shortcomings, but the consistency means that workarounds stick.
Finally, Apple is surprisingly more permissive with Apple Music’s SDK than Spotify is, even allowing full streaming/playback capabilities (a capability Spotify removed from its SDKs several years ago), which means that there are now several alternative front ends for Apple Music available that can play music themselves — alternative Spotify front ends can only ever control the official Spotify app and Spotify Connect devices which is a real shame.
> alternative Spotify front ends can only ever control the official Spotify app and Spotify Connect devices which is a real shame
I haven’t used Spotify for a long time since switching from Spotify Premium to Apple Music, but I know of a third party open source library that might be of interest to people who have Spotify Premium and would like to explore alternative frontends:
> librespot is an open source client library for Spotify. It enables applications to use Spotify's service to control and play music via various backends, and to act as a Spotify Connect receiver.
> The above command will create a receiver named Librespot, with bitrate set to 320 kbps, initial volume at 75%, with volume normalisation enabled, and the device displayed in the app as an Audio/Video Receiver. A folder named cache will be created/used in the current directory, and be used to cache audio data and credentials.
librespot is a great project, but I haven’t used it to resurrect the native Spotify client I had been working on because it’s technically reverse engineered, which means users of my client risk having their accounts banned (however unlikely that may be).
My favorite part of Apple Music has been the ability to share playlists through iMessage to my friends and family although I don't do that enough.
Then, I love the Essentials playlists, basically, if you search for any large enough artist they'll have a curated playlist always called Essentials which makes it easy to find. The curation is usually spot on although there's been one or two that felt like misses out of maybe fifteen or twenty that I've tried.
Finally, the Apple premium plan for 30 bucks a month, five members, 1TB online storage, Apple Music and TV (couldn't really care less about sharing my workouts or Arcade, but those are there too) is a pretty sweet deal for my family and I.
Dunno if Spotify or Tidal have things similar. And there is some funkiness around sharing occasionally as I'm not sure if my dad ever got access to Music, his account may have been in a weird state since he already had a subscription, unsure, will ask later.
My dad likes that for his own music that Apple doesn't have, if he imports it to his Music app on his computer, Apple will auto upload to the cloud so he can stream it from other devices. I haven't used this much but it sounds pretty neat.
It’s a bit odd, but Music on Mac and iTunes on Windows need to be controlled through Remote[0]. I tested it just now and it works fine. I agree that it’s a tad odd that remote control of the desktop version isn’t as integrated as it is for the Mac/Windows version.
That's interesting to know, but just not going to fly with me. While I agree with the other commenters here that Spotify's desktop client is terrible, I don't miss any functionality and I don't want to use an additional app to control the music playing on the Mac when Spotify does it natively.
(That, and all my friends use spotify, too, so sharing/group sessions etc. work out of the box)
I believe they just fixed this (after how many years?) by moving Music.app away from webviews to native views in macOS 12.2. I wouldn't be surprised to still see a JavaScript error in there though, or maybe a Java error because it probably still runs on WebObjects.
The iTunes Store launched in 2003, when most users were on dialup. File size was critical to the success of the service. There is no way they would have had 500 kilobytes of empty space back then.
The earliest confirmed case is 2010. The author/I couldn’t get hold of any earlier sample files. Do let me know if you happen to have a local copy of some older files.
Glad to hear I'm not the only one who stuffs as much music into their phone as possible :) Not an apple user, but I'd also enjoy some sudden extra 6% for sure.
One is that it stores a "play count", the number of times you've played a song, directly in that song file. So it's always re-writing the source file of the track in the library.
That would be only mildly irritating -- radically blowing up the size of system backups, for instance -- but for the fact that it routinely corrupts the new song file stream.
I see so much bit rot in my iTunes audio files -- and nowhere else -- that the stupidity of not simply keeping some small database of play count veers over into uselessness.
-
The second complaint is that they decided that any files I had ripped from my CDs were pirated, and therefore would require an annual fee for me to keep. I understand that's a concession they made to the music industry in order to roll out their "iTunes Cloud" stuff.
But yes, I bought more than 400 CDs new and yes, I still have all of them, and indeed, I never loaned them out to anyone else (no one asked).
-
After a while, I just set my iTunes audio files to read-only, and a to-do item from ten years ago is to go through and re-encode my physical CDs. Still haven't done that, yet.
Stopped using iTunes, instead. Still need to rip them discs...
> One is that it stores a "play count", the number of times you've played a song, directly in that song file. So it's always re-writing the source file of the track in the library.
It writes ratings to the file, but not play count. I don't believe any player does that.
Does it? My personal experience is that both play count and ratings are only stored inside the iTunes database and not inside the media files, too.
(And a cursory search does indeed bring up various people searching for workarounds to that in order to transfer the rating from iTunes to some different software…)
It’s the SI and ISO standard decimal separator. It’s also the world’s most commonly used decimal separator. (Everywhere but the US uses space as the thousand separator.)
I do wonder if this was done because Apple noticed people change Album artwork more than any other metadata, and wanted to leave room so such an action would generally be quick, even on slower drives.
At first I thought it was some sort of fingerprinting for anti-piracy (which seems extreme), but it seems like it's just to store potential future album art :)
> You can’t make changes to the metadata (e.g. change the spelling of an artist’s name) without reassembling the entire file to recalculate the offsets.
Isn't this a really, really infrequent operation? And aren't hard drives fast enough these days that reassembling the entire file doesn't even take that long?
With the advent of streaming music as popular as it is (and surely Apple had to see that becoming the trend), this seems largely like a non-issue from a storage perspective. In fact, it's really only costing Apple more money in bandwidth because they still have to transmit that data - though I'd bet it compresses extremely well.
And local music apps already build their own library so they don't have to scan all files for metadata, I don't see how the location in the file changes anything.
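For context on the quoted "recalculate the offsets" point: MP4 chunk offset tables hold absolute file positions, so if the metadata in front of the media grows by more than the available padding, every one of those offsets has to be shifted and the whole file rewritten. A rough sketch of that decision follows. The function and its parameters are hypothetical, `padding_len` means the total size of the 'free' atom including its 8-byte header, and a real tagger would also have to update the sizes of 'moov' and its parent atoms.

```python
FREE_HEADER = 8  # a 'free' atom needs at least its own 8-byte header


def plan_update(old_meta_len: int, new_meta_len: int, padding_len: int, chunk_offsets):
    """Decide whether a metadata edit fits in place or forces a full rewrite."""
    growth = new_meta_len - old_meta_len
    remaining = padding_len - growth
    if remaining == 0 or remaining >= FREE_HEADER:
        # The padding absorbs the change: media bytes stay put, offsets stay valid.
        return {"rewrite": False, "padding": remaining, "chunk_offsets": list(chunk_offsets)}
    # Otherwise drop the padding, move the media, and shift every absolute offset.
    shift = growth - padding_len
    return {
        "rewrite": True,
        "padding": 0,
        "chunk_offsets": [offset + shift for offset in chunk_offsets],
    }


# 20 extra bytes of tag data with 500 bytes of padding: edited in place.
print(plan_update(100, 120, 500, [1000, 2000]))
# 600 extra bytes (say, embedded artwork) with 500 bytes of padding: full rewrite.
print(plan_update(100, 700, 500, [1000, 2000]))
```

Whether the rewrite is slow enough to matter is exactly what this subthread disagrees about; the sketch only shows why padding makes the common case a small in-place write.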
Devil’s advocate: 1.25 GB of additional storage for 2500 songs isn’t really worth the effort of scamming people, particularly when you’re already one of the most profitable companies in the world.
The other advocate: that’s exactly how they’re one of the most profitable companies in the world. Fleecing customers by making them buy chargers, earphones (ensuring the wire gets dirty faster than any other), extra storage, and devices hostile to open repair. The list is quite long, really.
Besides that’s not the only place they eat storage. Actually it’s so opaque you don’t really know what’s “really” happening.
I'm still using the charger I got with my 2008 (pre-unibody) MacBook Pro for my 2014 MacBook Pro, which is perfectly operational in 2021.
I'm using a Spigen 48W USB-C + USB-A charger (and a 60W, 2m Baseus USB-C to USB-C cable) to charge my Macbook Air M1 2020.
I'm using a Belkin wireless charging mat for iPhone X.
So, no.
> earphones (ensuring its wire gets dirty faster than any other)
Making rugged plastics requires a lot of nasty chemicals. Either be nasty and durable, or be less nasty and be less durable.
Also none of my Apple devices' cables have frayed (incl. a 30 pin connector cable for my now lost iPod Nano 2G). How come?
> extra storage
I think their 200G plan has good bang for the buck, and even I can't fill it up with normal usage.
> Actually it’s so opaque you don’t really know what’s “really” happening.
System preferences -> iCloud provides pretty good rundown of which application is saving what, including the ones you can't see in your drives (i.e. game saves, application stuff, whatnot).
So while Apple is not the best company out there, it's not that bad either.
P.S.: Repairability is a problem. I have nothing to say about that, but at least battery change program is very good. They've even found and changed some damaged parts in my phone, for free, when I sent it in just for a battery change.
For chargers I believe they're referring to the fact that apple stopped including chargers with new iphones.
> Either be nasty and durable, or be less nasty and be less durable.
In my experience apple cables are both nasty and less durable than most other cables I own. I have multiple old 30 pin cables that are frayed at the connector, my old ipod headphones had electrical tape on the cable where the rubber was breaking, and all my apple cables are a gross yellow color, while similarly old non-apple cables I own are all still perfectly white.
> I think their 200G plan has good bang for the buck
It's average at best. It's the same price as google for monthly payments, but google is cheaper if you pay yearly.
I'd argue, though, in my experience working at a large tech company, the impact of a particular decision on the company's overall bottom line rarely comes into play for a decision like this. Instead, it's a mid-level manager who has been given a goal to increase a particular number, and will be rewarded based on quarterly performance against said arbitrary number.
Which manager put in their OKRs for the year to grow usage of Apple's Music Library using a primary metric of total storage consumed? Now, they didn't necessarily cause the insertion of blank space into the files, but they'll be damned if they let some engineer make a change that makes it significantly harder to hit their growth goal for the year.
Organizations don't cause things to be the way they are: the dynamics between people inside of organizations cause things to be the way that they are.
Having briefly worked (as a grunt with no visibility into exec decisions) in the maelstrom that is Apple, I think the only sinister thing that could be happening is that no one has an incentive to fix things like this; they only get exposure / promotions for shipping new features or fixing large quality bugs. And so if they weren't bothered by it, and no execs were bothered by it, it would just go unfixed.
I'd say "dangerous" more than "useless." Big companies have plenty of incompetence to go around, but they also have plenty of malice disguised as incompetence, because children learn how to play dumb before they can talk and don't get worse at it when they grow up to become megacorp employees.
Sure this 1.25gb:2500files is not massive - But if this little conspiracy theory is true it is easy to imagine them doing this sort of file padding in other systems as well. A little bit here and there adds up quick.
"When you do things right, people won't be sure you've done anything at all."
Personally, I think it is likely the padding was created for future use. Maybe tagging or some other meta data.
My vague understanding (just from reading blog articles) is that iTunes became an insane metamorphization of the Eye of Sauron, an infinite pit of quicksand (blub blub) and Yes™, Inc (all things to all people).
Y'know, iPod (and later iPhone) sync, music browsing and purchasing, (payment handling), intelligent media organization (or some approximation thereof), working with giant media libraries...
I get the idea that everyone just feature-piled on top of the SoundJam code without fully scaling the design to the point of being able to coherently load-bear the additional functionality. So basically failure to adequately PM where it counts. That requires PM that understands engineering, which is sadly a tall order out of the gate and generally the most crucial before projects get big and the need for support becomes obvious.
*IFF* that happened in this case (I have absolutely no concrete idea), my question would be a) where "Steve really wants this - put a good PM on it" and/or b) where "this code explosion is killing everyone, we need PM help", ended up.
iTunes history anybody? (Genuinely curious, wondered for a while)
Curious as well. All I can add is that iTunes also needed to run on Windows (and I believe QuickTime had, in effect, the old MacToolbox running cross-platform). So throw that on the pile.
iTunes had to move fast. There was no time, from what I understand, to step back and refactor it in any meaningful way.
Adding on to this, I heard a rumor (probably here) that iTunes for Windows was ported by Apple porting Objective-C and Cocoa to Windows, which is part of the reason it acted so funny. I have no idea if that's true though.
Safari for Windows was a side effect of the porting work that was happening on the Store to move away from a custom renderer to just using WebKit. And the reason iTunes on Windows was the way it was, and indeed, why iTunes on Mac stayed in this hellacious status for so long, is that it was built on _QuickTime_, which brought most of the Mac Toolbox over to Windows, and then -- eventually -- Carbon.
Objective-C and Cocoa for Windows dates back to the NeXTSTEP/OPENSTEP/Rhapsody days, but yes it’s true. Safari for Windows also utilized this, perhaps to a greater extent, because its UI looked significantly more like its Mac counterpart than iTunes did, to the point that even the text rendering was the same between Windows and OS X.
Yeah OPENSTEP Enterprise (OSE) was a commercial product sold by NeXT and briefly by Apple. Also later versions of WebObjects that ran on Windows provided much of the same tools.
Yea, the idea this is some concerted effort to sell more iCloud storage is kind of funny to me.
I'd love to see those discussions - "The OKR was to sell more iCloud storage so the Music team added half a meg of junk to every song file"
"Great!"
... I mean... weird shit has happened, but... I dunno, I've worked at quite a few companies and I just don't get the sense that that's Apple's MO. The bad publicity if it became public that they actually did this on purpose would be terrible, it seems pretty conspiratorial in this thread.
Also, Music is a real shit, buggy app that constantly annoys me.
It also falls apart any time there is a legitimate responsibility to do something. Negligence is malicious even if it’s convenient to blame incompetence.
Incompetent management wouldn’t be able to be consistently incompetent. That’s the big issue with incompetence.
Apple is unique in that their product marketing has a big role in the product design process. If you step back and look at what they do objectively, there are a few journey mapped “happy paths” that Apple designs to. They actively remove features that distract from the true paths.
It’s noticeable with them as Apple has no sympathy for legacy. But you can see it with other companies. An easy one is Google Drive/Docs - they make it easy to find a file, but didn’t consider that you may want to see the folder context of where the file is.
Sorta. I'm an iTunes Match person. They sort of integrated it as Apple Music lite.
They wrecked the UI completely. Which makes sense - I imagine that the program management of iTunes Match is probably a low priority. Basically, I use search history as a sort of playlist.
If all of the individuals are acting without malice, how could the institution act with malice? Isn't the institution just the collective will of the individuals?
I could see it playing out like this, incompetence causes the problem, incompetence fails to identify the problem, incompetence fails to fix the problem.
> If all of the individuals are acting without malice, how could the institution act with malice? Isn't the institution just the collective will of the individuals?
I think we are on an unproductive tangent, but these are quite important questions. An institution doesn't have its own free will, so I don't think I can claim it can have "intent to cause harm", but it can certainly cause harm.
An institution is not just a collection of people, it is also the business processes and policies. While these are created by people within the institution, they stick around over time. As the world changes around the organization, well intentioned policy can begin to harm.
Individual employees following policy and normal business practice (i.e. just doing their jobs) can behave maliciously without their own intent to do so (sorry, my hands are tied).
An institution needs not only non-malicious members, it needs to constantly re-evaluate its practices. Importantly, when harmful outcomes are observed the institution must re-evaluate and adjust, otherwise the harm will continue.
It's not enough for members to conduct their own behavior without malice, they must also observe outcomes and actively fight against the inertia of existing policy.
Not really. The article posits that this space was originally reserved for album art, and that when the art was moved to external storage nobody remembered to remove the reserved space. OP is suggesting Apple did this intentionally for some sort of profit motive.
> I believe the 500 kB block was originally reserved for album art. Apple’s servers would inject it on-the-fly into the metadata at the time of delivery.
This absolutely seems like the most reasonable explanation.