An even slimmer pdf.js (blog.mozilla.org)
166 points by AndrewDucker on June 16, 2014 | 72 comments


The reduction in memory usage seems impressive for large PDFs, but peak usage of 695MiB to display a 14-page pdf [1] still seems like loads of memory. On my machine, a native reader (Foxit) [2] takes ~30MiB.

[1] http://cdn.mozilla.net/pdfjs/tracemonkey.pdf

[2] http://www.foxitsoftware.com/Secure_PDF_Reader/


1. You're using a completely different system (at all levels of resolution), so the comparison hardly makes sense

2. The browser measurement includes the browser's overhead. On my non-retina MBP a freshly started Firefox 30 yields an RSS of 240MB (Preview — the system's default image and PDF viewer — is 16MB and grows to 44MB on opening the Tracemonkey PDF). And that's with just Firebug and the WD toolbar; we've got no idea what manner of extensions or build options OP has. And for reference, opening the Tracemonkey PDF grows Firefox's RSS from 240 to 350MB.

3. The article's measurements include rendered and cached pages. Opening the TraceMonkey PDF grows my Firefox's RSS to 350MB, but browsing to the end (and ensuring every page is loaded) ends up with a 480MB RSS, a behaviour which also exists (to a lesser extent) in Preview, which grows from 44MB RSS to 85MB (then back down to 62MB)

4. Finally, the article specifically notes that "pdf.js still uses substantially more memory than native PDF viewers."


It does make sense when you look at it as a user. The result is the same; the requirements are not. With one you need a dedicated application, with the other you need excessive amounts of memory.

Also, there's about:memory available in Firefox for a quite detailed memory breakdown.


> It does make sense when you look at it as a user.

No. You misunderstand point 1, I'm not saying comparing Firefox to Foxit as PDF viewers is nonsensical[0], I'm saying GP comparing the number he gets on his machine to the numbers provided by OP is not sensible: OP uses 64b Firefox on 64b OSX on an rMBP (the retina part quadrupling the size of the render target if software is compatible) while GP uses unknown bitness Foxit on unknown bitness Windows on unknown hardware.

That's a fractally nonsensical comparison; just about everything which can be changed is changed. A somewhat real comparison would require that GP install a FF33 nightly on his machine and look at what happens there, and only look at the difference between browser open and browser open + PDF. That would give the browser's and PDF.js's overhead over opening PDFs in a native client.

> Also, there's about:memory available in firefox for quite detailed memory break-down.

Which is not what's in OP's graphs, not what GP compares his numbers to, and not relevant to my comment.

[0] although it kind-of is, firefox bundles a PDF viewing utility for convenience, it's not a PDF viewer (unless you're in firefox os maybe?)


<fanboy>With MuPDF it peaks at 15MiB. http://www.mupdf.com/ </fanboy>


And it is also blazing fast.


How's compatibility?


All I can speak to is that I've never had a problem reading. I didn't even realize the recent versions support forms until this thread.


Good to know! I'll try it for a while, see how I like it.


I don't want to sound negative and I certainly don't want to start a war here. I also really appreciate what Mozilla is doing and I am an otherwise happy user of FF on both desktop and mobile.

That said, FF memory usage (not just pdf.js) is sometimes just insane. Granted, I only have 2GB on this Linux box, but FF invariably eats most of it (with just 10-20 tabs) if left running for long periods. Interestingly enough, once I restart FF and reopen the same tabs, memory usage is far lower. Memory leak? Inefficient caching? Who knows...

Every now and then there is a version of FF that claims to use less memory, but in my experience the differences have been negligible. So as far as I am concerned, memory-friendly FF is something like fast Java... I am still waiting to meet one of these beasts in the wild. :)


Sorry to hear you're having problems. We need more data to take any kind of action. Are there particular sites that cause problems? Do you have any extensions installed?

Even better: can you visit about:memory and use the "Measure and save..." button to get a snapshot of memory usage when it gets high? You can then either email it to me or (preferably) file a bug at bugzilla.mozilla.org and CC me. Bugzilla can be intimidating, but don't worry too much about getting every field right. Just make sure the description is clear.

If all that is too hard (hopefully not!), you could try just resetting Firefox: https://support.mozilla.org/en-US/kb/reset-firefox-easily-fi.... That's not a guaranteed fix, but it does help in a lot of cases.


Thanks for taking the time to answer; I will try to provide as much info as possible.

I didn't notice any particular pages which would cause problems, but OTOH there must be a pattern to the pages I visit. I am using FireBug, Abduction, Add to search bar, Greasemonkey (disabling now - didn't know I had it enabled), NoScript and Pencil.

Since this happens during normal course of work I would rather send memory usage patterns privately if that's OK with you. The challenge will be to catch FF when the machine hasn't started swapping yet (by then I can count myself lucky if I can still issue killall firefox). Will try to send you "measure and save" output as soon as possible. Thanks!


Greasemonkey scripts often cause bad behaviour, though it of course depends on the particular scripts enabled. Firebug can also slow things down a lot, even if it's not enabled. (Apparently Firebug 2.0 is better in this regard.)

Selectively disabling extensions isn't the most fun game in the world, but it's often effective in working out if one is causing problems.

And private email for the about:memory data is fine. For my email address, take my HN username and append @mozilla.com.


I second other people. A while back FF ate RAM, but nnethercote and other Mozillians have constantly chased sources of memory waste and changed memory-reclaim heuristics to the point where, to my own surprise, Chrome is now the most memory-demanding browser (for my usage, lots of tabs).


Indeed, tabs have much higher overhead in Chrome than Firefox.

Firefox is possibly the most memory efficient browser out there nowadays. It's just that all browsers eat tons of memory, out of necessity.


You can just use the latest Nightly or Aurora to see how fast these things are compared to your version. I had the same problem, Firefox being slow and clunky, so I switched to Nightly and it really did increase the speed.

I think Firefox is getting faster, but the fact is that users also acclimate to the new speed and expect it to be faster still.

EDIT: Right now with about 10 tabs open FF is eating around 580 MB of RAM.


Most browsers use this amount of memory. Between JITs, lots of overlapping images, and dynamic content, webpages do actually take a lot of memory to display. In "about:memory", there is a "Minimize Memory" option, which can work quite well.

Personally I have about 70 tabs open, with 1.6GB of memory. This is quite good compared to other browsers I've used.


Thanks for your realism and understanding! It's nice to hear something other than complaints every once in a while.

The things you mentioned all contribute, but most of all, people greatly underestimate the amount of memory required to run JavaScript code. Not so much the JITs, which don't actually use that much, but just all the objects and arrays and strings and closures.

Like all high-level, dynamically-typed, garbage collected languages, it uses a fair amount of memory. And the amount of JS code running on many sites is enormous. But people always ask "why is the browser using so much memory", not "why is this site using so much memory". Maybe because per-site memory measurements aren't prominent enough.
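The per-object cost is easy to underestimate. Here's a rough illustration using Python's tracemalloc (Python rather than JS only because it's easy to measure from a script; the point about high-level, garbage-collected languages is the same, and exact numbers vary by runtime and version):

```python
import tracemalloc

# Measure how much memory half a million small objects actually take
# in a high-level GC language. The object shape here is made up.
tracemalloc.start()
objs = [{"id": i, "name": "item%d" % i} for i in range(500000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Hundreds of thousands of small dicts and strings add up to tens of MiB.
print(f"~{current / 2**20:.0f} MiB for 500k small objects")
```

The same kind of accounting applies to the objects, arrays, strings, and closures a busy JS-heavy site allocates.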


Sorry if it sounded like I was complaining; the fact that I still use FF even though it cripples my VM from time to time shows that I am a fan. ;) It is a bit annoying though.

> But people always ask "why is the browser using so much memory", not "why is this site using so much memory". Maybe because per-site memory measurements aren't prominent enough.

Do you mean to say these measurements exist? I couldn't find them in about:memory (per-site, that is). That would be an awesome feature! I could easily kill the tabs which are eating up my RAM if only I knew which ones were doing it. Better yet, I could see if I am being reckless with memory in my own webapps.

The issue I am having is not that the browser uses too much RAM. The issue is that I (as a user) am not aware of high memory usage until it is too late. Even worse, I have no clue what caused this state and have no way to learn from previous "mistakes" (if some site is using too much RAM, I could avoid it or close it when not needed anymore). So yes, per-site (per-tab?) memory usage indicators would solve my issue completely.

Thanks again for taking your time to answer, and keep up the good work! You have a great product there.


In fact, "about:memory" does show separate memory allocations per tab. Look under "explicit", then "window", and you'll find a thing named "tiny", which contains most of the windows (it's a tree; click to expand). There are also a number of tabs not in the tiny branch, which are considered a little too big.


Thanks, but if I understand correctly, not all tabs are there and this is not their total memory usage. If this is so, then this is not what I meant. I meant a single indicator of tab memory usage in a similar way PSS is calculated [1]. That is: all the memory resources this tab uses (including JS) plus appropriate share of the shared resources.

From the user's point of view I don't care HOW the page/tab uses my memory - I just want to know how much of it it uses.

[1] http://lwn.net/Articles/230975/
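For illustration, the PSS idea from [1] charges a process its private memory plus a proportional share of every shared region. A toy sketch (all sizes and sharer counts below are made up):

```python
# Sketch of proportional set size (PSS): private memory plus each
# shared region divided by the number of processes sharing it.

def pss_kb(private_kb, shared_regions):
    """shared_regions: list of (region_size_kb, number_of_sharers)."""
    return private_kb + sum(size / sharers for size, sharers in shared_regions)

# A hypothetical tab: 30 MiB private, sharing a 100 MiB runtime with
# 9 other tabs and a 20 MiB image cache with 4 other tabs.
tab_pss = pss_kb(30 * 1024, [(100 * 1024, 10), (20 * 1024, 5)])
print(f"{tab_pss / 1024:.1f} MiB")  # 30 + 10 + 4 = 44.0 MiB
```

A per-tab indicator computed like this would sum to the browser's total, which is exactly the "fair share" property that makes PSS useful.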


All the tabs are there. Look at the "top(...)" entries just below "window-objects".

Most of the per-tab memory is correctly reported there. Images are the main thing that isn't reported on a per-tab basis.


Uninstall your extensions, particularly Ad Block Plus, and see if the problem recurs:

https://blog.mozilla.org/nnethercote/2014/05/14/adblock-plus...


At which point, I have no reason to be using FF!

FF's plugin system is one of its huge advantages. With IE11 at a "good enough" state now, if I'm not installing ABP I'm not going to bother switching away from IE on my machines!


I'm not saying uninstall forever, simply to do so before making claims about Firefox's performance, stability, memory usage, etc. – basic debugging 101 stuff about removing variables.

That said, extensions are at the bottom of my list of reasons to use Firefox: support for the latest web standards, performance, security, text rendering quality, usability of the core UI, quality of the developer tools, etc.


Keep in mind that Adblock Edge is (at least from my point of view) better than ABP


Honestly, Mozilla should put together an official version of this plugin, since this feature is absolutely essential to browsers nowadays.


I don't use it at all and rarely notice a problem. Disabling Flash solves the performance / stability issues and I simply don't visit ad infested sites. Ad blockers are a losing game, as demonstrated by the rise of increasingly spammy content, and the only way to change that is stop supporting sites which don't respect you.


Or just create and try a brand new profile for some time.


I also have this problem, but on a Windows laptop with 2GB. However, Firefox has recently been crashing several times per day, which sorted out the long-running memory problem but rather inconveniences usage.

It's a shame because I really do prefer it to the other browsers, for a variety of reasons.


As I said to the parent: please report your problems. Please include about:memory data. Vague complaints in online forums don't help us fix things.


Is firefox actually using memory in such a way that other programs can't use it, or is it simply making efficient use of memory but caching everything it can?


VM starts swapping so much that it is not usable anymore.


Ah, okay. By the way, I was just fiddling around and found "about:memory", which contains a lot of interesting information about memory usage and a couple tools for cleaning up memory. Might be interesting to look at when the browser starts bogging down your system.


"about:memory" is probably great, but it is too detailed for someone who is not familiar with FF's internal architecture. The way I see it, it is mainly useful for FF developers.

I like "about:addons-memory" (available if you install https://addons.mozilla.org/en-US/firefox/addon/about-addons-...).


> pdf.js uses HTML canvas elements to render each PDF page

If it's doing this for text or vector graphics content, then I'm of the strong opinion that they're doing it very wrong - HTML already has facilities to render text, and browsers support SVG for vector graphics. The canvas elements should be used for bitmap images only, since that's what they were designed for.

Edit: this thought occurred to me because the Chinese site Baidu has an online document viewer which basically converts uploaded PDFs into HTML and does it without needing to canvas everything; here's an example:

http://wenku.baidu.com/view/544340cea1c7aa00b52acb38.html

In my Chrome it takes ~60M sitting on the first page, complete with all the other pieces of the page (including ads). Scrolling through all the pages of the document, it ends up at ~130M.

Here is the original document:

http://www.promelec.ru/pdf/MBI5030%20Preliminary.pdf

When it is opened in PDF.js (v1.0.277) to the first page, I see ~95M and scrolling through all the pages makes it peak at ~270M; and this is just for the PDF viewer only, so there is clearly much room for improvement.


Sorry, but this is definitely a case where code speaks louder than words. PDF.js was a miracle for its time, and the guys behind it definitely aren't amateurs.

The PDF standard is far richer than the SVG object model, and SVG performance in just about every browser except for Chrome has always sucked balls.

Using canvas seems a solid trade off - pay upfront with RAM in order to consistently render and cache a page once, or try to jerry-rig some nasty translation to SVG that's painfully slow to zoom or scroll, and no longer renders consistently across browsers.


I'm not doubting that PDF.js is an amazing feat of effort - they have essentially replicated an entire PDF rendering engine - but suggesting that a different approach, with a different set of tradeoffs, could yield a solution requiring far less memory and still have practical use.

When considering memory, keep in mind that using too much - combined with every other process the user could be running - means a higher chance of needing to use the swapfile, and then any performance advantage otherwise gained disappears rather quickly. It should be noted that browsers already cache rendered page images (from HTML) in memory and are very good at that, along with rendering HTML.


> SVG performance in just about every browser except for Chrome has always sucked balls.

That seems like a self-fulfilling prophecy to me. If it's broke, fix it. Don't do clunky workarounds that mean that it's less likely to be fixed in the future.


I don't agree with you. HTML has some facilities to render text, but I expect it's an intractable problem to use them to accurately render a PDF. SVGs might be easier, but I'm not sure they have the required facilities to allow for accurate PDF rendering either.

IMO, if Mozilla chose to use Canvas then there's a good reason for it.


So your argument is that because Mozilla does it they must have a good reason for it?


No - the argument is that it's likely pretty difficult to implement an accurate PDF renderer with HTML and SVG. There might be ways around that, but the folks at Mozilla are pretty smart and very aware of the limits of their technology, so if they decided that Canvas was better, I have no reason to argue with that - especially given that there are obvious reasons why that might be the case.


The argument is that you shouldn't unthinkingly assume they don't have a good reason for it.


It's a good thought, and it does seem a much better way to go at face value. However, I strongly suspect there are good technical reasons why you basically need to use canvas, to get the complete, pixel-perfect control over the layout and font rendering that you'd need for PDFs.


> to get the complete, pixel-perfect control over the layout and font rendering that you'd need for PDFs.

I suspect this is part of the problem too - the desire to render every pixel of fonts the same as the native application. But honestly, 99.9% of the time I'm reading a PDF, I don't care if each pixel is in exactly the same place as it would be in some other reader application so long as I can read the content, and would be willing to sacrifice that if it meant a drastic decrease in memory consumption (and possibly better performance.) That doesn't mean a "precise mode" is a bad idea, so maybe make this an option - you can either have pixel-perfect rendering but use more memory and CPU, or have a "close enough" rendering that doesn't.


"But honestly, 99.9% of the time I'm reading a PDF, I don't care if each pixel is in exactly the same place as it would be in some other reader application so long as I can read the content"

No, you don't actually want that. I've seen this in Linux where a PDF gets rendered with the wrong font because the correct one wasn't on my system or embedded in the PDF. It becomes illegible. PDFs are deeply pixel perfect in specification; you can't just decide to not worry about that anymore and expect anything legible to come out the other side. It is not HTML under the hood; it's a printer format under the hood.


Spot on. Pixel-perfection is probably the single most compelling thing about PDF, and it certainly is a fundamental design goal of pdf.js.


If you don't care you should feel free to replace pdf.js with a degraded version, but I'd expect something which claims to render PDFs to actually do that.


Seems to me that the development time spent maintaining two different modes might be better spent simply improving the efficiency of the one correct version.


It’s worth pointing out that an accessible content structure is presented to assistive technology, and you can also find-in-page without any problem.


It already appears to use html to display text. View a pdf.js demo (google pdf.js and click demo button on project page). Select some text and inspect it in your browser dev tools. It's a div.


No, it isn't. The div that you see is probably inside an element named textLayer, which is there to enable copy-pasting. The actual rendering is on a canvas.


The example is quite illustrative of the difference in what is possible with either method. Figure 1 is not displayed correctly in Baidu's viewer, and it's not just a matter of details like line widths (which are also wrong, quite obviously), but it also fails to display some content (vertically oriented text).

See http://imgur.com/a/khf0F


I don't see any reason why canvases should use more memory than content in general. Content uses content layers to store pixel data; canvases use canvas layers. Either way, the image data is only stored once.

It comes down to tuning more than anything fundamental.


The current release is v1.0.68, v1.0.277 is a pre-release, so this will probably be in the next release or the one after that. ( https://github.com/mozilla/pdf.js/releases )

I have recently switched my project from v0.8.1013 (from feb 2014) to v1.0.68 and rendering times have improved enormously. Good to see that there are more improvements coming.


Useful reminder: you can set "pdfjs.disabled" in about:config to turn it off and redirect PDFs to a native app of your choice.

The concept of rendering PDFs in the browser isn't fundamentally bad, but large files crashing the browser is a disaster. This happened with Adobe as a plugin and it continues to happen.

Now if someone can provide a means of un-embedding streaming video as well, that would be very useful.


This is pretty risky as native PDF readers are among the most targeted malware infection vectors. At least make sure you don't have Adobe Reader if you decide to risk this.


> you can set "pdfjs.disabled" in about:config to turn it off

Already do that. Bonus: I'm on a laptop intermittently without internet access, and can open up PDFs to read offline without having them be lost when my browser crashes.

> Now if someone can provide a means of un-embedding streaming video as well, that would be very useful.

Really hard to do, unfortunately. For HTML5 video, maybe (and I'd love it if someone came up with something of the sort), but for flash video the problem is that there isn't really any standard for flash video players, so you can't really parse the flash file to figure out what video to play without running it.

That being said, maybe some sort of local proxy that recognizes when a video file is requested by flash and reports a 404 to it, with a popup "do you wish to open this link in a local player". But that way lies madness.
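A minimal sketch of that proxy idea, using Python's standard http.server (the extension list and behaviour are assumptions for illustration; a real tool would also have to handle HTTPS/CONNECT and actually prompt the user):

```python
import http.server

# Made-up list of extensions to block; real flash players often hide
# the stream URL, which is exactly the problem described above.
VIDEO_EXTS = (".flv", ".mp4", ".webm")

class BlockVideo(http.server.BaseHTTPRequestHandler):
    """Refuse requests for video files; pass everything else through."""

    def do_GET(self):
        if self.path.lower().endswith(VIDEO_EXTS):
            # The embedded player just sees a 404; a real tool would
            # now prompt "open this link in a local player?".
            self.send_error(404, "Video blocked")
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"passed through")

    def log_message(self, *args):
        pass  # keep the sketch quiet

# To try it:
# http.server.HTTPServer(("127.0.0.1", 8404), BlockVideo).serve_forever()
```

This only covers plain HTTP GETs, which is part of why "that way lies madness".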


This could, perhaps, initially be implemented as a plugin that inspected/listened for link clicks and forwarded them to your application of choice. I believe VLC is capable of opening streams without too much hassle (not sure about other players).

Sounds like a fun weekend project, at any rate!


VLC's stream handling isn't the best, though.

(VLC uses a fixed-length buffer, and requires the buffer to be full before it starts playing. Not the best for an iffy connection.)

That being said, worth a shot! (Not to mention that a plugin could redirect to an arbitrary application without too too much hassle)


Just a hint: "pdfjs.disabled" is a hard switch-off, and it makes it harder to recover native PDF plugin viewing capability. Switching via Preferences/Options -> Applications is easier. Try it and stop complaining that you don't have a choice :)


So... the author is focusing on metrics - the memory usage. Basically what's being done is releasing the cache sooner. Well, what if a user's usage scenario is such that he's looking at two pages, one after the other, a few times, with 10 pages in between? This would cause a lot more re-renderings than before, making CPU usage go up. I'm not sure I want to trade memory for CPU when running on battery.
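The trade-off can be sketched with a tiny LRU cache of rendered pages (cache sizes and the scroll pattern below are made up for illustration; this is not pdf.js's actual eviction logic):

```python
from collections import OrderedDict

class PageCache:
    """Tiny LRU cache of rendered pages; counts how many renders occur."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.renders = 0

    def view(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)         # cache hit: no work
        else:
            self.renders += 1                    # cache miss: re-render page
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict least recently viewed

# Scroll from page 1 to 12 and back, twice (the "two pages with 10
# pages in between" scenario, reached by scrolling).
scrolls = (list(range(1, 13)) + list(range(11, 0, -1))) * 2
small, large = PageCache(3), PageCache(15)
for p in scrolls:
    small.view(p)
    large.view(p)
print("small cache renders:", small.renders)
print("large cache renders:", large.renders)
```

A large cache renders each of the 12 pages once; the small cache re-renders on every pass, which is the CPU/battery cost being worried about here.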


Software is full of trade-offs. If slightly pessimizing an unusual case is the cost of drastically improving an extremely common case, then I'm happy to do that.


Switching between two non-consecutive pages in a pdf is not an unusual case.


Switching by scrolling, or by jumping directly to the pages using a page selector? The latter case will be handled fine by a 10 page cache.

Anyway, I think this fear is overblown. The CPU cost of rendering a single page isn't that high.


Maybe the Chrome team should have this MemShrink meeting every once in a while. As it currently stands it will eat all available memory, be it an 8GB or a 16GB system.


I wonder if it is feasible to compress cached canvases, either as part of pdf.js or natively in firefox.
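As a back-of-the-envelope sketch (raw RGBA buffer compressed with zlib; not necessarily what Firefox would actually use), a mostly-blank rendered page is highly compressible:

```python
import zlib

# A rendered page cached as a raw RGBA bitmap vs. the same buffer
# zlib-compressed. An all-white page stands in for the mostly-blank
# pages typical of PDFs; dimensions are illustrative (~A4 at 150 DPI).
width, height = 1240, 1754
raw = b"\xff" * (width * height * 4)  # all-white RGBA buffer

compressed = zlib.compress(raw, level=6)
print(f"raw: {len(raw) / 1024:.0f} KiB, "
      f"compressed: {len(compressed) / 1024:.0f} KiB")
```

The catch, of course, is the CPU cost of decompressing before the page can be blitted again.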


Out of curiosity about the process: why wouldn't something like this show up in an earlier release?


Likely because it follows the highly organized release cycle Firefox adopted a couple of years ago. The new pdf.js needs to be included in Nightly first, in order to verify its integration in the browser and its interaction with other elements, and to ensure a long enough test phase on rendering before it lands on the release channel. So if it is right now in Nightly, let us do the math:

* Firefox (release channel) is now 30
* Beta is 31
* Aurora is 32
* Nightly is 33

That's why.



This incident should serve as a very good lesson to those in the Firefox community who often claim that memory and performance problems don't exist, even after numerous users report experiencing such problems.

Such memory and CPU consumption problems are often blamed on vague "third-party extensions", even when they happen with fresh installations on systems that have never before had Firefox installed. Or they're otherwise blamed on the user somehow, even when the user is engaging in a perfectly reasonable workflow.

And even if the problems don't happen on one system, they very well could be happening on another system, as this very incident shows quite well.

This part of the article is particularly relevant: "Shortly after that, while on a plane I tried pdf.js on my Mac laptop. I was horrified to find that memory usage was much higher than on the Linux desktop machine on which I had done my optimizations. The first document I measured made Firefox’s RSS (resident set size, a.k.a. physical memory usage) peak at over 1,000 MiB. The corresponding number on my Linux desktop had been 400 MiB!"

This matches very well with what so many Firefox users describe as happening to them. The memory consumption ends up skyrocketing to well above what it reasonably should be. Gigabytes of memory are unjustifiably consumed.

Perhaps now, instead of ridiculing or ignoring people who report such issues, those in the Firefox community will do the responsible thing and take them seriously. It should be very obvious now that memory consumption problems can happen on one system while not happening on another.


Hi, Pacabel!

Can you give specific examples of this ridicule? While I don't claim that everything is perfect, my experience is that for the past few years Mozilla has taken Firefox performance issues extremely seriously.

For example, I started a project called MemShrink exactly three years ago to reduce Firefox's memory consumption. In fact, I even wrote a blog post today that discussed the major improvements from the past year, and what areas we still fall short on: https://blog.mozilla.org/nnethercote/2014/06/16/memshrinks-3.... And if you want more detail about this particular effort, you can read the 70+ status reports I've written in those three years here: https://wiki.mozilla.org/Performance/MemShrink.

And since you mention extensions, you could also read about how we solved the vast majority of the memory leaks that involved extensions here: https://blog.mozilla.org/nnethercote/2012/07/19/firefox-15-p.... That was almost two years ago.

As for pdf.js, here's the bug I filed last year about it using too much memory: https://bugzilla.mozilla.org/show_bug.cgi?id=881974. The title of the bug is "pdf.js uses too much memory". No ridicule or ignoring the problem there.

MemShrink is just one of numerous performance-related projects that have been undertaken at Mozilla over the past few years. I'm sure with a little Googling you could find out about some of the others.



