Benchmark: How Reliable is PDF.js Versus a PDF.js Alternative?

By Adam Pez | 2019 Sep 13

Read time: 7 min

If your users share documents in a high-stakes, fast-paced environment, they may require their PDFs to open quickly and flawlessly. PDF.js, Mozilla’s open-source JavaScript library, offers one way for them to open and view their PDF files seamlessly in a website using a PDF viewer.

However, we recently surveyed 57 unique organizations that tried PDF.js and later decided to look for an alternative. Of those, 15.8% (9 of 57) cited failure to open files or browser crashes as a reason for switching.

To help you avoid making the same mistakes, we wanted to find out: what types of PDFs does PDF.js crash on?

Our research involved opening 1,663 PDF files in PDF.js. These documents included random PDFs from Google, as well as business documents, financial statements, construction drawings, college textbooks, and more.

What we found is that PDF.js will open 98.6% of PDFs found in the wild. However, some types of PDFs crashed or froze the browser more than others. While simple PDFs, like invoices, performed well enough, graphics-heavy documents tended to have higher failure rates.

Pie chart of PDF.js document open rate

(For more information on other topics from performance to supported functionality, read our comprehensive guide to PDF.js.)

Background

PDFs are an incredibly complex file format; this is especially so given that a PDF can be generated a hundred different ways, all of which a renderer needs to handle gracefully.
– Developer, LinkedIn

PDFs found in the wild come in all different sizes and compositions, from small and simple invoices to massive reports and intricate designs shared in workflows across government and enterprise settings.

While simple PDFs may use few PDF features, more complex documents may make full use of the PDF specification to embed images with different compression types, optional content groups, transparencies, gradients, patterns, and more. Not all PDFs are equal, and therefore, they won’t all behave the same when opened in a JavaScript PDF viewer like PDF.js.

PDF is an incredibly complex file format—the specification is more than a thousand pages long, not including the extensions and supplements.
– Senior Developer, Dropbox

Another consideration is that your risk tolerance for documents crashing or freezing the browser will vary with the requirements of your users. For example, our AEC customers tell us that a JavaScript PDF viewer needs to be 100% reliable as their users are quick to reject anything less.

If at the 11th hour your takeoff system is not reading that last file, even if you rendered the last 10 out of 11 files perfectly, you can’t get your estimate. That person can’t do their job. Effectively, you could lose business by not being able to open one of these PDFs… It’s like getting it 99% right and 1% wrong⁠ — and you actually fail.
– Tony Cornwall, Construction Computer Software

As soon as your tool is seen as not 100% reliable, even if it’s still 99% reliable, the customer is going to switch off and default to the next-lowest common denominator⁠ — the Adobe Acrobats or pen and paper.
– CEO, AEC Project Collaboration Software

What Our Customers Said

Our customers reported reliability issues with PDF.js that caused them to seek an alternative.

We also tried PDF.js to render pdf using a blob object. It is working on iPad and iPhones with a few limitations like it is not able to open PDFs bigger than 100MB, and it doesn’t support pinch zoom.
– Developer, Fortune 50 Company

We are using PDF.js now as an embedded viewer for PDF documents in a single page application, and we are having some issues with crashing browsers and suspect issues with the viewer.
– CTO, Training & Compliance Software

At present, we’re working with open-source PDF.js which is great for the 95% of PDFs, but the other 5% is critical. Larger PDFs are tricky.
– Co-founder, eDiscovery Software

Evaluating PDF.js Reliability

To understand these issues better, we opened 1,663 PDFs using Chrome 76 on a new laptop and the latest version of the PDF.js demo viewer (v2.3.146).

These PDFs included:

  • AEC drawings submitted by developers as part of the project permitting process for the City of Vancouver.
  • Geospatial PDFs downloaded from the US Geological Survey.
  • Fortune 100 text-based filings for the Securities and Exchange Commission available on SEDAR.
  • Government forms, including court forms, police forms, and tax documents from the UK, Canada, and the US.
  • Magazines downloaded from freemagazinepdf.com.
  • Scientific documents from the open-science repository Zenodo.org.
  • Model renderings, including CAD-based PDFs from Grabcad.com.
  • College textbooks from a variety of websites.
  • 200+ random PDFs taken from Google via a “filetype: PDF” search.

Note: Even though PDF.js may open documents, it may not render content quickly or accurately. For this benchmark, we only looked at whether documents would crash or hang the browser.

The Results

PDF open rate by document type

Documents such as text-based financial filings, government forms, e-magazines, textbooks, and scientific reports opened in PDF.js without any apparent difficulty.

Other documents did not perform as well, particularly graphics-heavy documents.

For example, architecture, engineering, and construction (AEC) drawings showed a 1% failure rate, while PDFs from Grabcad.com performed the worst, with as many as 1 in 10 (10%) failing to open or crashing the browser. These were PDFs generated from models using a variety of CAD applications.

Random PDFs found on Google also had a failure rate of 1%. These findings are consistent with an older yet similar PDF.js benchmark, published on Mozilla Hacks.

That study looked at about 7,000 PDFs taken from Google and found that 0.8% (roughly 1 in 100) would crash the browser with PDF.js. It also noted that 2.8% of documents produced a “less-than-optimal” UX and that PDF.js had difficulty with graphics-heavy documents.
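To put these per-document rates in perspective, it helps to ask how likely it is that at least one file in a batch fails. The sketch below is a rough illustration, not part of the benchmark: it assumes each document fails independently with the same probability, which real document sets may not satisfy. The function name and the batch size of 11 (borrowed from the customer quote earlier in this post) are our own choices for the example.

```typescript
// Probability that at least one of n documents fails to open,
// ASSUMING each document fails independently with probability p.
function probAtLeastOneFailure(p: number, n: number): number {
  if (p < 0 || p > 1 || n < 0) {
    throw new RangeError("p must be in [0, 1] and n must be non-negative");
  }
  return 1 - Math.pow(1 - p, n);
}

// Failure rates reported above for AEC drawings and CAD-generated PDFs.
const aecFailureRate = 0.01;
const cadFailureRate = 0.10;
// Illustrative batch size only (not a measured value).
const filesInBatch = 11;

console.log(probAtLeastOneFailure(aecFailureRate, filesInBatch).toFixed(3)); // 0.105
console.log(probAtLeastOneFailure(cadFailureRate, filesInBatch).toFixed(3)); // 0.686
```

Under these assumptions, even a 1% per-file failure rate gives roughly a 1 in 10 chance that an 11-file job hits at least one problem document.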

What Failure Looks Like

Documents that crashed PDF.js did so in a couple of common ways.

The first involved corrupted documents. PDF.js would throw an exception and close them right away:

Documents that crashed PDF.js would do so in a couple common ways. One is shown here.

Many other documents, however, crashed due to memory issues. PDF.js simply could not allocate memory efficiently enough, especially for graphics-heavy PDFs.

As a result, the browser would throw an exception after trying to load the file:

The browser throws an exception after trying to load the file

Other times, PDF.js would open the file, only to hang indefinitely when rendering the page:

PDF.js opens the file, only to hang indefinitely when rendering the page

Explaining Memory Issues

As illustrated on the PDF.js GitHub and elsewhere, PDF.js may not allocate memory efficiently, especially on certain browsers. This can happen when it needs to render a large embedded JPEG or an especially large and complicated page.

There are a couple of reasons why this may happen.

Lack of Support for Canvas Tiling

First, large canvases are essentially huge bitmaps and thus consume lots of memory. This is especially true when one interacts with (zooms into, pans, and scrolls across) a document, and PDF.js is forced to re-render complicated canvases at a larger size and higher resolution.

Lack of support for canvas tiling described in pdf.js community post

Because it lacks support for canvas tiling, which would break up rendering into smaller, manageable pieces, PDF.js renders page content all at once onto a single large canvas. In some cases, that canvas may be larger than the browser permits or may consume too much memory.

PDF.js therefore struggles to handle larger design documents, maps, and blueprints, especially on mobile browsers where memory constraints are tightest.
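A back-of-the-envelope calculation shows why a single large canvas becomes a problem. The sketch below is only the arithmetic, not how PDF.js or any particular viewer is implemented. It assumes 4 bytes per pixel (RGBA) and that one PDF point maps to one pixel at 100% zoom. The sheet size, zoom level, tile size, and function names are all hypothetical choices for illustration.

```typescript
// ASSUMPTION: an RGBA backing store at 4 bytes per pixel. Real browsers may
// add overhead, and each browser enforces its own maximum canvas size.
const BYTES_PER_PIXEL = 4;

interface Size {
  width: number;  // pixels
  height: number; // pixels
}

// Pixel dimensions of a page at a given zoom factor.
// Page dimensions are in PDF points (1/72 inch); we assume 1 point = 1 pixel at zoom 1.
function renderedSize(pageWidthPt: number, pageHeightPt: number, zoom: number): Size {
  return {
    width: Math.ceil(pageWidthPt * zoom),
    height: Math.ceil(pageHeightPt * zoom),
  };
}

// Memory needed to hold one canvas of the given size.
function canvasBytes(size: Size): number {
  return size.width * size.height * BYTES_PER_PIXEL;
}

// Number of square tiles of edge `tileEdge` needed to cover the whole page.
function tileCount(size: Size, tileEdge: number): number {
  return Math.ceil(size.width / tileEdge) * Math.ceil(size.height / tileEdge);
}

const MIB = 1024 * 1024;

// Hypothetical example: a 36 x 24 inch drawing sheet viewed at 400% zoom.
const sheet = renderedSize(36 * 72, 24 * 72, 4);
const fullCanvasMiB = canvasBytes(sheet) / MIB;
const singleTileMiB = canvasBytes({ width: 1024, height: 1024 }) / MIB;

console.log(`${sheet.width} x ${sheet.height} px`);            // 10368 x 6912 px
console.log(`single canvas: ${fullCanvasMiB} MiB`);            // single canvas: 273.375 MiB
console.log(`tiles to cover page: ${tileCount(sheet, 1024)}`); // tiles to cover page: 77
console.log(`one 1024 px tile: ${singleTileMiB} MiB`);         // one 1024 px tile: 4 MiB
```

A tiled renderer only needs to keep the tiles that intersect the visible viewport in memory, each a few megabytes here, rather than one bitmap of several hundred megabytes.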

Lack of Support for OCG Layers

Another issue pertains to large PDFs with many layers, such as the geospatial PDFs, which showed a 3% failure rate.

Geospatial PDFs, for example, may include a street or topographic vector layer on top of a satellite imagery raster background. The latter is switched off by default to ensure readability and performance.

But since PDF.js does not support OCG layers, it will render every layer, including layers switched off by default.

PDF.js renders layers switched off by default

And with an especially big and complex map, it can quickly hit a wall.
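The effect of ignoring default layer visibility can be sketched with a toy model. Everything in the example below is hypothetical: the `Layer` shape, the layer names, and the relative cost numbers are invented for illustration and are not measurements or part of any PDF.js API.

```typescript
// Hypothetical model of a layered (OCG) document. In a real PDF, default
// layer visibility comes from the document's optional content configuration.
interface Layer {
  name: string;
  visibleByDefault: boolean;
  relativeCost: number; // made-up unit of rendering work, for illustration only
}

// A viewer that honors defaults draws only the visible layers;
// one that ignores them draws everything.
function layersToRender(layers: Layer[], honorDefaults: boolean): Layer[] {
  return honorDefaults ? layers.filter((layer) => layer.visibleByDefault) : layers;
}

function totalCost(layers: Layer[]): number {
  return layers.reduce((sum, layer) => sum + layer.relativeCost, 0);
}

// Illustrative map: two light vector layers plus a heavy raster layer
// that the author of the document switched off by default.
const exampleMap: Layer[] = [
  { name: "streets (vector)", visibleByDefault: true, relativeCost: 1 },
  { name: "topographic contours (vector)", visibleByDefault: true, relativeCost: 2 },
  { name: "satellite imagery (raster)", visibleByDefault: false, relativeCost: 20 },
];

console.log(totalCost(layersToRender(exampleMap, true)));  // 3
console.log(totalCost(layersToRender(exampleMap, false))); // 23
```

In this toy model, the one layer the document author deliberately hid accounts for most of the rendering work.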

How You Can Test Your Documents

We encourage customers to test their own PDFs in a JavaScript PDF viewer before making a decision, as your experience will vary considerably with your documents and across different browsers, including mobile.

First, grab a selection of representative files. (Our customer Construction Computer Software gathered over 150 demanding AEC drawings from their users which they later used to evaluate different PDF viewers for JavaScript, including Apryse SDK.)

You’ll want to see whether your files open in the PDF.js demo viewer on the browsers and devices you expect your users will prefer. You’ll also want to interact with these documents to test whether PDF.js viewer options deliver the desired UX.

Try scrolling and panning across a document, and zooming into and out of areas where you expect users will want to read small text or perform measurements.

If, after 20 to 30 seconds of heavy interaction, performance is still relatively smooth and the browser hasn’t crashed, then PDF.js may work for your PDFs.

However, if your browser hangs or crashes, or if the UX degrades considerably, you may wish to consider alternatives.
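If you have more than a handful of files, you may want to script part of this check. The sketch below is one possible shape for such a harness, not a tool we used in the benchmark. The `open` callback is supplied by you; in a browser test page it might wrap PDF.js's `getDocument(url).promise` and a first-page render. The function names and the timeout value are assumptions. Note that an in-page timer can only catch hangs that leave the event loop running. A hard browser crash or a frozen main thread needs an external driver watching the page.

```typescript
type Outcome = "opened" | "error" | "timeout";

// Runs one open attempt and classifies the result.
// `open` is a caller-supplied function (hypothetical here) that resolves
// when the document has loaded and rendered, and rejects on failure.
async function classifyOpen(
  open: () => Promise<unknown>,
  timeoutMs: number,
): Promise<Outcome> {
  let cancelTimer = (): void => {};
  const timedOut = new Promise<Outcome>((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), timeoutMs);
    cancelTimer = () => clearTimeout(timer);
  });
  // Promise.resolve().then(open) turns a synchronous throw into a rejection too.
  const attempt: Promise<Outcome> = Promise.resolve()
    .then(open)
    .then((): Outcome => "opened", (): Outcome => "error");
  try {
    return await Promise.race([attempt, timedOut]);
  } finally {
    cancelTimer(); // avoid leaving a pending timer behind
  }
}

// Share of attempts that did not open cleanly (errors plus timeouts).
function failureRate(outcomes: Outcome[]): number {
  if (outcomes.length === 0) return 0;
  const failed = outcomes.filter((outcome) => outcome !== "opened").length;
  return failed / outcomes.length;
}

// Example with hand-written outcomes rather than real measurements.
const sampleOutcomes: Outcome[] = ["opened", "opened", "error", "timeout"];
console.log(failureRate(sampleOutcomes)); // 0.5
```

Running the same file set on each browser and device your users prefer, and comparing the resulting rates, gives a more representative picture than a single desktop run.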

Next Steps

If your tests ran without problems, PDF.js may be a good fit. If you encountered issues that concern you, however, you could consider a more robust commercial solution, like Apryse WebViewer.

We always appreciate feedback on our blog. If you have any questions, don’t hesitate to contact us directly.
