J Comput Virol (2010) 6:289–315
DOI 10.1007/s11416-009-0128-2
ORIGINAL PAPER
Malicious origami in PDF
Frédéric Raynal · Guillaume Delugré · Damien Aumaitre
Received: 4 January 2009 / Accepted: 30 July 2009 / Published online: 26 August 2009
© Springer-Verlag France 2009
Abstract
People have now come to understand the risks associated with MS Office documents, whether those risks stem from macros or from associated vulnerabilities. PDF documents, by contrast, seem much more secure and reliable. This false sense of security comes mainly from the fact that these documents appear to be static. The widespread use of Acrobat Reader, to the detriment of software that modifies PDFs, most likely also contributes to this perception. As a consequence, PDF documents are perceived as images rather than active documents. And as everyone knows, images are not dangerous, so PDFs aren’t either. In this article we present the PDF language and its security model, then the market-leading PDF software, Acrobat Reader. Finally, we show how this format can be used for malicious purposes.
F. Raynal (✉)
MISC, Paris, France
e-mail: frederic.raynal@sogeti.com; fred@security-labs.org

F. Raynal · G. Delugré · D. Aumaitre
Sogeti/ESEC, Paris, France

G. Delugré
e-mail: guillaume.delugre@ensiie.fr; guillaume@security-labs.org

D. Aumaitre
e-mail: damien.aumaitre@sogeti.com; damien@security-labs.org

1 Introduction

User awareness of the issues raised by macros has increased following several viral attacks and repeated critical flaws in all the software of the office suite: people have naturally grown suspicious of MS Office documents.

PDF files, on the contrary, are considered secure and reliable. Indeed, «they do not contain any macros», «they don’t connect to the Internet» and «documents are completely static»: no risk whatsoever!

What if this weren’t exactly true?

Moving on: what is an origami? The term comes from the Japanese for the art of folding (oru) paper (kami). The idea is to give shape to a piece of paper by folding it, preferably without using any glue or scissors. Origami designs are based on just a few different folds, but these can easily be combined to obtain a huge variety of shapes (Fig. 1).

Fig. 1 An origami X-Wing craft

The same applies to PDFs. When we consider the way Readers manage this format, and the intrinsic functionalities of the language, a new horizon appears: use PDF against PDF. This process is more tedious and takes more time and effort than discovering a 0-day attack, but 0-day flaws are corrected much more quickly. Attacks on the design, by contrast, are long-lived, and sometimes they cannot be corrected at all.¹

From a historical point of view, it is interesting to see that this format is spreading more and more, mostly supported by Adobe. Software associated with PDF has often been searched for flaws, yet the very first study of the risks implied by this format/language appeared in 2008 [1, 2]. Its authors first propose a simple phishing attack by sending an email that reproduces a bank portal, and then a targeted attack divided into two steps, using a k-ary code. Soon after these results were published, we began our own investigation on this topic. Our approach differs in that we studied the standard while asking five questions, reproducing the logic of an attacker:

– How can a PDF-led attack be hidden?
– How can a denial of service be generated using PDF?
– How can a PDF-based communication channel be set up between a target and an attacker?
– How can the PDF format be used to read/write on a target?
– How can arbitrary code be executed on the target using the PDF format?

Along these lines, we managed to propose two operational scenarios built around the PDF format/language. This led to a publication at the end of 2008 [3].

¹ In this respect, the DNS attack of summer 2008 is typical.

Fig. 2 General structure of a PDF file (header, objects, cross-reference table, trailer)
We have since continued these studies. First, we revised the five questions that guided our thinking, which evolved over time. We also took into account the changes to the standard published in November 2008 [4, 5]. Then, whereas we had previously reasoned without considering the reader used to view the PDF file, this time we dug into the universe of the market leader, Adobe, and its main product, Reader. We also distinguished the viewing environment, to investigate the changes in behaviour that occur when a PDF file is viewed in a Web browser’s plug-in. This time we want to present attacks that are based on PDF, but not limited to it. This is why we wanted to understand the «environment surrounding the PDF», to make our actions as simple and efficient as possible.

This article summarises all of this research. Some results have been presented elsewhere but are also found in these pages: where they previously appeared only in slides, the complete details are available here.

To lead you into the PDF universe, we start with a presentation of the main issues, from the file format to the nature of the elements it is made of (objects). In the second part, we tackle security management from the PDF reader’s point of view. We then delve into the standard by examining the potential of the language from an attacker’s point of view. The format, however, is nothing without the tools for processing files, which is why the subsequent section is dedicated to Adobe’s world (or at least a small part of it), and mostly focuses on its software Reader. Finally, two offensive scenarios using the format are presented.²
2 PDF: an overview
The first version of the standard describing the PDF format dates back to 1991. Ever since, new functionalities have been added to the standard every two or three years, and we may ask: are they really useful? From a security point of view, the addition of a JavaScript interpreter (1999), a 3D engine (2005) and Flash support (2007) is puzzling.
2.1 Structure of a PDF file
PDF files are broken down into 4 sections (see Fig. 2):

– the header, which indicates the version of the standard that the file uses,
– the body of the file, composed of a collection of objects, each object describing a character font, a size or even an image,
– the reference table, which makes it possible for the software in charge of displaying the file to quickly find the objects that are necessary for processing,
– and finally, the trailer, which contains the addresses of elements in the file that are important for reading, such as the address of the catalogue indicated by the entry Root.³
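The four sections can be made concrete by assembling a very small file by hand. The Python sketch below is only an illustration under simplifying assumptions: the three objects (catalogue, page tree, one empty page) and the name `build_minimal_pdf` are illustrative choices rather than material from this paper, and real files carry many more entries.

```python
def build_minimal_pdf() -> bytes:
    """Assemble header, body, reference table and trailer by hand."""
    # Body: three objects chosen for illustration (catalogue, page tree, page).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")  # header: version of the standard
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset later stored in the table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Reference table: one 20-byte entry per object, plus the free entry 0.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: Root points at the catalogue, startxref at the table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the returned bytes to disk should give a one-page blank document; the point of the sketch is that the trailer leads to the table, and the table leads to every object by byte offset.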
² In March 2009 Adobe issued version 9.1 of its reader. This version corrects the bug that we exploited for our scenarios, so they no longer work with this latest version of the reader, but still do with the previous ones.

³ Like everything in PDF, the catalogue is itself an object, a dictionary object.
2.2 Thinking in PDF
Thinking in PDF requires us to distinguish several elements:

– Objects make up the majority of the elements in a PDF file,
– The file structure governs the way the objects are stored, as far as their organisation (see the previous section on the way a file is structured) and properties (file encryption or signature management, for instance) are concerned,
– The document structure governs the logical organisation of objects for display (e.g. breaking down into pages, chapters, annotations, etc.),
– Content streams are particular objects that describe the appearance of a page or, more generally, of a graphic entity.
As a consequence, two views of a PDF file can be singled out (see Fig. 3):

– The physical view, i.e. the succession of objects, corresponding to the file stored on a device,
– The logical view, i.e. the references from one object to other objects, which corresponds to the semantics of the file.
No matter which element we refer to, it is always described
as an object. In addition, because an object is always
described by a unique number in the file, indirect references
between objects can be used.
2.3 At the heart of PDFs: objects
In PDF, the objects are the entities that define everything: text or images, their layout, actions. They are at the heart of the PDF. Their structure (see Fig. 4) is the same, no matter what they represent:

– It always starts with a reference number and a generation number,
– The definition of the object is delimited by obj << … >> endobj,
– The key words used to describe the object depend on its nature,
– Those key words can use references to point to other objects (e.g. a font is defined once, and this reference is reused by any element that uses the font).

Fig. 3 Organisation of objects in PDF: (a) physical view, (b) logical view
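The relation between the physical view (objects one after the other) and the logical view (references between them) can be sketched with two regular expressions. This is a rough illustration, not a parser: it assumes an uncompressed, well-formed file without binary stream data, and the names `logical_view` and `SAMPLE` are hypothetical.

```python
import re

# "N G obj ... endobj": reference number, generation number, definition.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# "N G R": indirect reference to another object.
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")


def logical_view(data: bytes) -> dict:
    """Map each (number, generation) pair to the objects it references."""
    graph = {}
    for match in OBJ_RE.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        body = match.group(3)
        graph[key] = [(int(n), int(g)) for n, g in REF_RE.findall(body)]
    return graph


# Physical view: three objects stored one after the other.
SAMPLE = b"""1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R >>
endobj
"""
```

Calling `logical_view(SAMPLE)` yields the reference graph: the catalogue points to the page tree, the page tree to the page, and the page back to its parent. A real tool would need a proper tokenizer, since the same byte patterns can also occur inside strings and streams.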
There are several elementary types, such as integers and real numbers, Booleans, arrays and dictionaries (associative arrays). The most peculiar object is the stream (see Fig. 5): a dictionary followed by raw data that must be processed before its final form can be obtained.
The same key words apply to the definition of every stream:

– Subtype specifies the kind of stream,
– Filter indicates the transformations to be applied to the data, such as decompression. Note that these operations can be chained one after the other,
– DecodeParms contains additional parameters that are necessary for the filter.
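The chaining of filters can be illustrated with two of the standard filter names, ASCIIHexDecode and FlateDecode. The sketch below is a simplified assumption of how a reader undoes them in the order given by Filter; it ignores DecodeParms entirely, and the names `DECODERS` and `decode_stream` are hypothetical.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Undo ASCIIHexDecode: hex digits, optional whitespace, '>' as end marker."""
    hex_digits = b"".join(data.split(b">")[0].split())
    if len(hex_digits) % 2:
        hex_digits += b"0"  # an odd final digit is treated as followed by 0
    return binascii.unhexlify(hex_digits)


# Only two filters are covered here; the standard defines several more.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
}


def decode_stream(raw: bytes, filters: list) -> bytes:
    """Apply each filter in turn, in the order listed under Filter."""
    for name in filters:
        raw = DECODERS[name](raw)
    return raw
```

For example, data that was compressed with zlib and then written as hexadecimal text is recovered with `decode_stream(raw, ["ASCIIHexDecode", "FlateDecode"])`. Because filters can be stacked freely, the bytes stored in the file may look nothing like the content that is finally processed.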
Fig. 4 Objects in PDF (labels: reference number, generation number, object delimiter, keywords specific to each type of object, reference to another object)

Fig. 5 Streams in PDF (labels: type, transformation filter, filter parameters, raw data to be filtered)

3 PDF security model

Over time, Adobe introduced dynamic elements into its standard so as to make documents interactive. Really dangerous functionalities appeared this way, JavaScript and PDF actions in particular. Reader proposes, or imposes, configuration parameters aiming to restrict their functions. However, we will see that most of these restrictions rely on a blacklist model: all that is not forbidden is authorised, which in terms of security is always a problem.

In this respect, we shall consider the security parameters that are available to the readers,⁴ and describe the idea of trust that is the foundation of security.

3.1 Basic functionalities

3.1.1 PDF actions

Interaction between a document and the user is mainly achieved by way of actions. An action is a PDF object that enables the activation of dynamic content.

There are a limited number of actions:

– GoTo* to move throughout the document view, or go to a page of another document;
– SubmitForm to send a form to an HTTP server;
– Launch to launch an application on the system;
– URI to connect to a URI via the system’s browser;
– Sound to play a sound;
– Movie to play a video;
– Hide to hide or display annotations on the document;
– Named to launch a predefined action (print, next page…);
– SetOCGState to manage the display of optional content;
– Rendition to control the playback of multimedia content;
– Trans to manage the display between actions;
– GoTo3DView to display 3D content;
– JavaScript to launch a JavaScript.

Actions are triggered by events. While some events are performed on purpose by the user (clicking, moving the mouse in the document…), other events are linked to indirect interventions. For instance, a JavaScript can be executed when opening the document, or a specific page in the document.

Of course, even though actions can be activated involuntarily, most of them lead to the display of an alert box. Most of these alerts can however be configured in the user security profile.

The following code causes a document to be printed:

    /OpenAction <<
      /S /Launch
      /Win << /O (print) /F (C:\\secret.pdf) >>
    >>

The only restriction is that the path to the file to be printed must be known (here, secret.pdf). From there, an attacker could send a malicious PDF containing several such commands that differ only in the path; only those with a valid path will start printing.
Figure 6a shows that the warning message does not say that the document is being printed, only that Acrobat Reader has been launched. A standard user could believe this is normal because he/she is already viewing a PDF: this user will validate without asking any more questions, and the document will be sent for printing without his/her knowledge.
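Since every action dictionary declares its type under the /S key, a document can be screened for action types before it is opened. The following Python sketch is a rough heuristic, not a detector: it assumes the action dictionaries are stored uncompressed and unobfuscated, it spells the action types as the PDF standard names them for the /S entry, and the `RISKY` selection as well as the names `count_actions`, `risky_actions` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Action type names as the PDF standard spells them for the /S entry.
ACTION_TYPES = {
    "GoTo", "GoToR", "GoToE", "Launch", "Thread", "URI", "Sound", "Movie",
    "Hide", "Named", "SubmitForm", "ResetForm", "ImportData", "JavaScript",
    "SetOCGState", "Rendition", "Trans", "GoTo3DView",
}
# Illustrative selection of types worth a closer look; not an official list.
RISKY = {"Launch", "JavaScript", "SubmitForm", "URI"}

# /S is also used by other dictionaries, hence the filter on known names.
S_ENTRY = re.compile(rb"/S\s*/([A-Za-z0-9]+)")


def count_actions(data: bytes) -> Counter:
    """Count the /S entries whose value is a known action type."""
    names = (m.group(1).decode("ascii") for m in S_ENTRY.finditer(data))
    return Counter(name for name in names if name in ACTION_TYPES)


def risky_actions(data: bytes) -> set:
    """Return the risky action types present in the raw bytes."""
    return RISKY & set(count_actions(data))


# A made-up fragment modelled on a Launch action that prints a file.
SAMPLE = rb"<< /OpenAction << /S /Launch /Win << /O (print) /F (C:\\secret.pdf) >> >> >>"
```

On `SAMPLE`, `risky_actions` reports only `Launch`. An action hidden inside a compressed object stream would escape this scan, so a clean result proves nothing; the sketch merely shows how visible these dictionaries are when no effort is made to hide them.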
If the attacker, instead of requesting the printing of secret.pdf, asks for secret.doc to be printed, a new alert comes up. This alert lets the user know that Wordpad has been started (see Fig. 6b), which is a lot more suspicious. In this case, as the user can see, Wordpad is only briefly launched; just long enough for printing to start.

Finally, it must be noted that these actions are only available in MS Windows.

Fig. 6 Using /Launch to print

3.1.2 JavaScript

Adobe Reader uses a modified SpiderMonkey⁵ engine to run the scripts. There are two different execution contexts: privileged and non-privileged.

Non-privileged is the default mode for any script embedded in a document. The script can only access functions that modify the document’s layout or update data in the fields of a form. Privileged mode grants access to many more potentially dangerous functions such as sending customised HTTP requests, saving documents, etc.

Several methods can be used to execute a script:

– If it is in Reader’s configuration directory, it is executed every time Reader starts and benefits from privileged rights;
– If it is embedded in the PDF document, it is executed in a non-privileged mode, except if the document is considered trustworthy (see Sect. 5.3).

3.2 Security parameters

In Adobe Reader, security parameters are mostly stored in files to which the user has write access. In other words, each user is responsible for the security of his account. The security configuration can therefore be modified with simple user rights. Table 1 shows the location of these different elements.

This section goes on to present the most critical functionalities with regard to security. They can be found in the configuration, which, by default, can be modified by the user himself. As we’ll see further on, this can lead to serious breaches. A PDF file, however, cannot modify this configuration by itself without certain privileges. Nevertheless, it would seem reasonable to protect this configuration by means of a third party (for instance, an access control system).

3.2.1 Filtering attachments

Files can be embedded in PDF documents (they are called attachments or embedded files). The content of the attachment is placed in a stream inside the document. It can be extracted and saved on the user’s hard drive, after he is warned by a message box (cf. Fig. 7). We will also see that an attachment can be executed on the system.

Adobe decided to set up a filtering procedure for these embedded files. To do so, a blacklist of file name extensions is registered in Reader’s configuration. Any attachment whose name ends with exe, com, pif or about twenty other extensions is considered dangerous and Reader will not export it to the disk. Attachments with whitelisted⁶ extensions will be executed without the user’s intervention. If an extension is unknown, a dialog box is displayed, and user confirmation is required.

This security model suffers from two flaws.

First, the file name extension does not say anything about the true nature of its content. On a Unix-type system, this kind of protection is useless.

Second, this blacklist model is limited because the lists are seldom exhaustive.⁷ All there is to do is find an extension that has been forgotten, and then move it to the whitelist. All the attachments using this extension will be executed silently,

⁴ Reader and Foxit; the other readers (Preview on Mac, and those built on poppler, such as xpdf) do not implement dangerous functionalities, that is to say JavaScript, attachments and forms for the most part.
⁵ Adobe’s Website announces that its modifications will be made public, in conformity with the SpiderMonkey licence… The announcement was published 3 years ago!

⁶ By default pdf, fdf only.

⁷ Not to mention that they are fixed in time, and do not take into account system evolutions: for instance, the appearance of python, ruby, php scripts, etc.
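The extension-based filtering of Sect. 3.2.1 can be summarised as a three-way decision. The Python sketch below models the logic as the text describes it, not Reader's actual implementation: the blacklist holds only the three extensions named in the text (the real list contains about twenty more), the whitelist holds the defaults pdf and fdf, and both the case-insensitive comparison and the name `attachment_policy` are assumptions.

```python
# Only the three extensions quoted in the text; the real list is longer.
BLACKLIST = {"exe", "com", "pif"}
# Default whitelist according to the text: pdf and fdf only.
WHITELIST = {"pdf", "fdf"}


def attachment_policy(filename: str) -> str:
    """Decide what happens to an attachment from its file name alone."""
    # Assumption: extensions are compared without regard to case.
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in BLACKLIST:
        return "refuse"
    if ext in WHITELIST:
        return "open silently"
    return "ask user"
```

Both flaws are visible here. The decision never looks at the content, so an executable renamed `report.pdf` is opened silently; and an extension missing from the blacklist, as in `attachment_policy("payload.py")`, only triggers a prompt, which disappears as soon as that extension is added to the whitelist.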