Blog

OMG-WTF-PDF Dénouement

You may have heard something in the news about PDF recently… By the power of Google!

I recently gave this presentation at the 27th Chaos Computer Congress in Berlin.
For some reason, the slides never made it from Pentabarf to the Fahrplan.
(They should be here: http://events.ccc.de/congress/2010/Fahrplan/attachments/1796_27C3_Julia_Wolf_OMG-WTF-PDF.pdf Curerntly 404, not by intent.) So first order of business, here are the long sought after slides:
27C3_Julia_Wolf_OMG-WTF-PDF.pdf
(I have had so many requests for these.)

The talk was approximately a twenty minute summary of about 2,500 pages of dense technical documentation, twenty minutes of I wonder what Acrobat does if I feed it this., and a little bit of (incomplete) stuff about A/V programs —
because every talk like this has to mention A/V because everyone asks about it. Speaking of which: if anyone would like to perform a rigorous test of how well A/V detects obfuscated PDFs, please go right ahead; I have neither the time nor interest.

(That 2,500 is just for the ISO spec, PDF32000_2008.pdf; the Javascript Reference 8.1 from the SDK, js_api_reference.pdf ; and the XML Forms Architecture Specification 2.6, xfa_spec_2_6.pdf. I’m not counting the 3D Annotation specification, XMP specification, nor any of the font specifications.)

I was inspired by the 2009 work of Meredith L. Patterson, Len Sassaman, Moxie Marlinspike, Dan Kaminsky, Sergey Bratus, and any one else I may have forgotten, on ambiguities in ASN.1 and X.509 parsing.
As I read through the PDF ISO specification two years ago, certain oddities jumped out at me. I can’t possibly be the first person to have read the specification and noticed this stuff, right? Right?
Can you really just execute arbitrary programs from a PDF? [Answer: Yes (And more than one PDF reader does this!)]

While reading the ISO 32000-1 [PDF] document – or really any technical specification – what you really need to pay the most attention to, is what is not said. Not only is ISO 32000-1 absent of any formal language definition (BNF, etc.) but many possible glosses which can be formed that are not defined. (As it says right at the very beginning of the ISO 32000-1, there’s nothing in this document that defines whether or not a PDF file is well-formed or not.

It’s called Adobe Acrobat because it’ll bend over backwards!

Since I started presenting this, several unrelated people have told me that they were using foo-trick, or bar-trick for years, but I seem to be the first person to stand in front of a room, and tell people about it for an hour. (Well, that and the first to make hybridized PDF files.)

Considering the existence of PDF/A, a committee somewhere must have had the same ideas about tightening up the PDF specification.

Part of the original brainstorm for this talk, was to create a chart of features/quirks across several different PDF readers and versions.
That could have taken way more time, than I could have invested at the time.
If anyone would like to actually perform these tests, I encourage you to please do so, and publish your results. I’ll be posting the test files I used for my talk soon.


Live in Person



iSec Open Forum

I’m speaking at the iSEC Open Forum Bay Area next week. This is the info I have on it:


iSEC Open Forum Bay Area



DATE:

Thursday, February 3, 2011

TIME:

6:00pm-9:00pm

LOCATION:

Intuit Building 9, Cook Conference Room

2600 Casey Ave

Mountain View, CA 94043

Please visit http://www.meetup.com/iSECOpenForums/ or RSVP to rsvp @ isecpartners.com if you wish to attend!

***technical managers and engineers only please***

***food and beverage provided***


Troopers

I’m also speaking at TROOPERS in Heidelberg, Germany on Mar 28-Apr 1, 2011. It will be about PDF, but not the same stuff as 27C3.


Corrigenda



Adobe Reader X

I wrote the bulk of this talk back in May for PH-Neutral 0x7DA. Adobe Acrobat X
hadn’t even been announced yet. The CCC submission deadline was three months ago.
A month before my talk, Adobe Reader X is released.
So, I should have updated my talk to mention Acrobat X more prominently, rather than in passing at the end. However, I’ve done no testing with it at all.

If you haven’t heard yet, Adobe Acrobat X is running (most) of itself within a sandbox. Which is probably the only feasible way for Adobe to secure Acrobat. I tested my tiny Javascript-launching PDF test file in it, and still worked just like 9.0, and that’s about all I know currently about its parser.


About PDF Printer Engines

So, I did not say that you can scan a network from a printer with a PDF. I was speculating about just how much of the PDF spec a printer may implement.
If it did implement the whole thing, then crazy stuff like scanning a network via a printer would be possible, and I’d be very surprised if any printer maker would ever do that.

I’ve been informed by someone familiar with HP printers, that the HP PDF engine does not execute Javascript. I have no information about any other printer.


Not OpenGL

I reread the ISO specification, and found that it says that 3D data is encoded in the Universal 3D format (ECMA-363); Not OpenGL like I said.

Error in Presentation Tests

Paul Baccas has a pretty good summary of most public research on this kind of thing here: http://nakedsecurity.sophos.com/2011/01/24/review-omg-wtf-pdf/

He’s quite possibly the only other person on Earth who’s actually read through my test files, and actually found an error with one.

Most of my tests were written in one day. I think I spent maybe five minutes on the duplicate object tests, so messing up the startxref is no surprise.

This means that Slide 123, and Slide 124 in the
27C3 [PDF] talk
are the exact opposite of what they should say. The first Object will be used if xref points to it, otherwise if ref is broken, the last Object defined is used.
(This actually seems much more consitant with Acrobat’s other behaviors.)


Stuff I forgot to mention

PDF syntax seems to have been influenced by TeX

Office, OpenOffice, and iWork documents are ZIP files; Java ARchives are ZIP files. Use your imagination.


PDF/A

I can barely fit the information I’ve got into 50 minutes. Acrobat has a lot of code, I mean A LOT, A LOT. Load it in gdb sometime and just list the symbols. It’ll take about half an hour on a 2GHz Macbook.
That said, I should have at least mentioned PDF/A, and it’s cousin PDF/X.

Short summary (what I’d say in my talk): PDF/A is a stripped down version of the PDF-1.4 spec, with mandatory font embedding, and without Javascript and all that nonsense,
and tighter requirements on the PDF syntax itself. For example, the first byte of “%PDF-” must be at file offset zero. I don’t know how many readers enforce these things in practice.

The official ISO documentation is available for sale here: http://www.iso.org/iso/catalogue_detail?csnumber=38920

Currently, it costs about US$125 if you’d like to buy a copy. There are actually several documents, but I think this is the main one. (I haven’t read it.)

What of PDF/A? Attackers won’t use it. Currently, an attacker can buy advertising which points to some HTML/JS which launches Acrobat in the browser to display a malicious PDF. And then the user gets 0wned. Happens thousands of times a day. Well, that and you don’t need Javascript or other interactive features to exploit a PDF reader.

PDF/X is a stripped down (e.g. no Javascript) version of PDF-1.4 which has been created for the publishing industry. So there’s lots of stuff about CMYK color space, embedded fonts, and trim area verses bleed area. There are several ISO documents which describe PDF/X

PDF/UA is for Universal Accessibility, screen readers and the like. It should also make it possible to easily convert a PDF into a plain text document.

PDF/E PDF for Engineers, just read the Wiki page on it.

References:

http://en.wikipedia.org/wiki/PDF/A

http://en.wikipedia.org/wiki/PDF/X

http://en.wikipedia.org/wiki/PDF/E

http://en.wikipedia.org/wiki/PDF/UA

http://xmlgraphics.apache.org/fop/0.95/pdfa.pdf [PDF]

http://wiki.apache.org/xmlgraphics-fop/PDFA1ConformanceNotes

http://www.pdfa.org/doku.php

http://en.wikipedia.org/wiki/Portable_Document_Format


One More Thing

While Googling around to write this blog post, I discovered this: http://hul.harvard.edu/jhove/pdf-hul.html It’s a PDF Syntax Validator. I don’t know anything else about it other than what it says on that web page.


Frequently Asked Questions

I’ve got this dialogue going on in my head about many of the comments I’ve heard…


DIALOGO
SOPRA I
DOCUMENTO
FORMATO
PORTATILE


The Firſt Dialogue

INTERLOCVTORS.

Salviati, Sagredo, and Simplicio.





Sagredo: It has been evidently demonstrated that PDF sekurite is not good; What may be done in time to come? And forgive me if I continue in Modern English in order to save time.

Simplicio: You will be safe long as you keep patching your PDF software. And besides it will take a long time to learn how to use a new PDF reader.

Salviati: You will not be safe by just patching. Acrobat and Flash have had an actively-used-in-the-wild to install malware, zero-day, about every two months for the last three years. Sure, all PDF readers have vulnerabilities, and switching software is just exchanging one set of problems for another. However, most PDF readers won’t run Javascript or play Flash upon open; The attack surface is much much smaller. And really, a PDF reader is not hard to use. (Open file, read stuff, quit.)

Simplicio: Users should only open documents from people they trust.

Salviati: Most attacks performed via PDFs are drive-by attacks, that is, a malicious web page (sometimes a modified legitimate one, or a paid advertisement) instructed your web browser to open a PDF file in Acrobat. You don’t have a choice in opening the malicious PDF file.

Simplicio: Everyone should switch to using PDF/A!

Salviati: Yes, let’s have all the malware authors use PDF/A from now on!
This doesn’t help mitigate vulnerabilities like integer overflows in embedded image handlers.
You could get everyone to reject old-style PDF documents, and only accept PDF/A…
You’d need to convert every single old PDF document on earth, but keep in mind that documents that accept form input won’t function as a PDF/A. And you need everyone to throw out their old vulnerable PDF readers.

Sagredo: What’s a good PDF reader to use that’s not Adobe Acrobat?

Salviati: http://en.wikipedia.org/wiki/List_of_PDF_software

[Lacuna]




Appendix A

This is a list of every version of my talk presented to more than 20 people at a time.
It’s mostly the same talk each time, except with some new stuff added from the last time I presented it.

May 29, 2010 ph-neutral 0x7da     PH-Neutral2010.pdf
Sep 09, 2010 SEC-T 2010           Sec-T_2010_Julia_Wolf_final.pdf
[Video] Julia Wolf - PDF.mp4
Oct 24, 2010 ToorCon 12 San Diego Julia_Wolf_ToorCon12_OMG_WTF.pdf
Oct 27, 2010 SecTor 2010          Julia_Wolf_SecTor_OMG_WTF.pdf
[Video] SecTor 2010 - Julia Wolf - OMG-WTF-PDF.wmv
Dec 11, 2010 BayThreat 2010       BayThreat2010.pdf
Dec 29, 2010 (BerlinSides was almost identical to the 27C3 talk.)
Dec 30, 2010 27th Chaos Communication Congress 27C3_Julia_Wolf_OMG-WTF-PDF.pdf
[Video] http://media.ccc.de/browse/congress/2010/27c3-4221-en-omg_wtf_pdf.html
[Direct Link] http://ftp.ccc.de/congress/2010/mp4-h264-HQ/27c3-4221-en-omg_wtf_pdf.mp4
[Part 1/4] http://www.youtube.com/watch?v=4F2xMw3987I
[Part 2/4] http://www.youtube.com/watch?v=2zbWTUkrfJM
[Part 3/4] http://www.youtube.com/watch?v=fvUB3xcMavw
[Part 4/4] http://www.youtube.com/watch?v=k9gyaYebjyQ
[Live Recording] http://achtbaan.nikhef.nl/27c3-stream/releases/mkv/%5b4221%5d%20OMG%20WTF%20PDF/ [Matroska]



Appendix B

On Sep 9, 2010, just after my presentation at Sec-T, I was interviewed by a reporter. This was the result:
http://www.idg.se/2.1085/1.339673/adobe-reader-ar-en-oerhord-risk

Included here by permission of the copyright holder.

Adobe Reader is a tremendous risk

by Frida Sundkvist

Do you have Adobe Reader installed? Then neither your documents nor passwords are protected completely. Expert advice is to replace the PDF reader while Adobe solves the problems.

The last year has highlighted many security problems with PDF files. Often it’s enough that a user has a PDF reader installed to be targeted by an attack.

– PDF is the biggest threat today. You can do almost anything without being detected. Adobe Reader is a tremendous risk, says Julia Wolf, security researcher at Fire Eye.

Julia Wolf is in Stockholm to speak at the Sec-T Security Conference. She says that despite that, it is good that people are discovering the problems with Adobe Reader or other PDF readers.

– My advice is to uninstall Adobe Reader and select another PDF reader,
she says

There are major weaknesses in the pdf format which means that infected
files are difficult to detect and there are also vulnerabilities in Adobe
Reader. Since PDF reader is by far the most popular, hackers have put most
of their resources into targeting Adobe.

A pdf-attack often goes unnoticed. If the user has a PDF reader installed, it may be enough to accidentally go to the wrong website. Seconds later, your personal documents uploaded to a waiting third party. Your keystrokes are recorded and thus more people than you may know your password.

– I have heard many different amounts on how much money companies lose. The conclusion is that there is a lot, she says.

In May Julia Wolf experimented with rewriting a pdf file to see if antivirus software detected the infected file. Only nine of 41 anti-virus detected it. The remaining 32 anti-virus programs, didn’t detect it. No surprise, says Julia Wolf.

Thomas Kristensen, security expert at Secunia, is on the same track. If users in a company do not need very advanced PDF files, the company should
consider changing pdf readers.

– If you are a criminal and want to cause as much harm as possible, which route do you choose? In almost all cases, you choose Adobe Reader before for example Foxit, he says.

Two things will help if you still want to keep Adobe Reader. The first is according to Thomas Kristensen to disable two things: the Flash Plugin and Javascript support. The second he says is instructing employees to only open files they need in their work.

Per Hellqvist, security expert at Symantec, does not think that the solution is to uninstall Adobe Reader, but he understands that line of thinking. He argues instead that it’s most important to patch, that is to say to update the security settings in Reader.

– All PDF readers have security holes and it can take a long time to train users to use a new program, he says.

The only way to be protected from PDF attacks is to not have a PDF reader installed. Towards the end of the year, comes the next update for Adobe
Reader. This version will have isolated the program itself within a “sandbox” (see article above).

Photo Caption: [Some kind of Swedish idiom about poking fun at security, or saws, or something] Problems with Adobe Reader have been known for a
long time. “You can do almost anything without being detected,” said Julia Wolf, security researcher at Fire Eye, which calls on companies to use a
different PDF reader until Adobe sorts out its program.


Appendix C

I can read French better than Google Translate, so this is the relevant excerpt from http://blog.scrt.ch/2011/01/07/27c3-we-come-in-peace/.

The vulnerabilities of PDF


This last decade, the Portable Document Format, or PDF, is now clearly the most common format for the publication of electronic documents.


It is used in the publishing industry as a reference format as much for printing as for the publication of ebooks.


And the PDF reference implementation is still and always the popular Adobe Acrobat, in both versions, Writer or simply “Reader”.


Unfortunately, as Julia Wolf has mentioned in her presentation, PDF is a standard that has more or less been agglomerated over the needs of it’s original version, and Acrobat is the result of this organic evolution: more than 15 million lines of code, Acrobat is bigger than Mozilla firefox, bigger than the Linux kernel and the majority of the code was written in the ’90s, in Adobe’s secret laboratories.


The result is that it’s possible to easily fool Acrobat.


The syntax of PDF and along with other reasons that the standard has no explicitly described method for validation of PDF documents allows the creation of PDF documents that are at the same time executables, which contain malicious code or that pose as any of a number of data formats.


In a PDF, one can call system commands, execute arbitrary programs, form documents that display different contents depending on the PDF reader program used, and so on.


Julia Wolf is focused on Acrobat, because it is still today the most common PDF client, but the standard on on which these programs are based is sufficiently confusing and complex that the exploitation of vulnerabilities is possible with each independent implementation.


Appendix D

Stuff that should be here, but isn’t:

  • Testing Samples
  • PDF Portmanteau technical explanation
  • PDF Quine, with explanation
  • Misc. stuff, like that forwards-backwards-pages PDF file (It’s in the Test Samples)

Once I’ve organized it all, I’ll publish it on this blog. But I’m rushing to get this one [the blog post you're now reading] published.



FireEye Malware Intelligence Lab

Creative Commons License




Send Questions and Comments to:

researchfireeyecom


2 thoughts on “OMG-WTF-PDF Dénouement

  1. Dear Julia Wolf,
    I’ve just finished to read your presentation
    [PDF Obfuscation] Dec 30 2010 27th Chaos Computer Congress
    May I say that you made a great job? It was really interesting and well done: clear and exhaustive.
    clap, clap, clap
    Best regards

  2. A little over my head…. but going through the slides I laughed out loud at “Water is wet.” Thank you!
    I’ve been leery of Adobe since delving into page description languages back when a friend got an Atari laser printer. Haven’t paid much attention in years – you have given real-world reasons for my suspicions.
    It’s amazing that they’ve been able to cram all that vuln crap into only 15 million lines of code. [snicker]

Comments are closed.