This article describes my investigation of the JavaScript obfuscation currently used by the Blackhole Exploit Kit. My intent was to write some small scripts that deobfuscate the JavaScript automatically without using JavaScript itself. Using JavaScript with a little manual help would be an easy task, because you only have to find the point where the eval function is called and replace it with an alert or document.write call. My goal was to fully automate the deobfuscation so that it can be used by a mail scanner or a proxy server.
In a former article I investigated the LinkedIn spam and showed an example of a landing page, assure_numb_engineers.php, which I downloaded from the server. Thanks to the daily spam and some websites listing the active servers I could get my hands on some more landing pages. Most of them looked like the example below.
In the next few paragraphs I will explain how the encoding and decoding work and develop a solution for decoding a certain class of these scripts without deciphering the JavaScript itself.
What the pages look like
This is a modified example to explain the general layout of the PHP page delivered by the landing server.
1 <html>
2 <head>
3 <title>
4 </title>
5 </head>
6 <body>
7 <div dqa="asd">
8 </div>
9 <script>
10 dd="div";
11 asd=function(){
12 a=a.replace(/[^012a-z3-9]/g,"");
13 };
14 ss=String.fromCharCode
15 </script>
16 <div
17 41=".." 38=".." 63=".." 50=".." 73=".." 84=".." 26=".." 66=".." 77=".." 81=".."
18 56=".." 75=".." 55=".." 17=".." 5=".." 70=".." 58=".." 21=".." 1=".." 91=".." 76=".."
19 57=".." 19=".." 31=".." 34=".." 45=".." 22=".." 15=".." 52=".." 9=".." 88=".." 27=".."
20 10=".." 90=".." 28=".." 65=".." 30=".." 53=".." 67=".." 62=".." 3=".." 54=".." 61=".."
21 39=".." 43=".." 89=".." 36=".." 85=".." 42=".." 32=".." 78=".." 0=".." 24=".." 47=".."
22 44=".." 23=".." 92=".." 29=".." 87=".." 60=".." 59=".." 49=".." 83=".." 6=".." 46=".."
23 74=".." 80=".." 51=".." 40=".." 4=".." 20=".." 13=".." 71=".." 69=".." 8=".." 33=".."
24 72=".." 11=".." 82=".." 68=".." 14=".." 35=".." 16=".." 2=".." 64=".." 25=".." 18=".."
25 12=".." 48=".." 37=".." 86=".." 79=".." 7="..">
26 </div>
27 <script>
28 if(020==0x10)a=document.getElementsByTagName(dd)[1];
29 s="";
30 for(i=0;;i++){
31 if(window.document)r=a.getAttribute(i);
32 if(r){s=s+r;}else break;
33 }
34 a=s;
35 asd();
36 s="";
37 for(i=0;i<a.length;i+=2){
38 s+=ss(parseInt(a.substr(i,2),31));
39 }
40 c=s;
41 e=window["ev"+"a"+"l"];
42 try{("321".substr+"zxc")();}catch(gdsgdsg){e(c);}
43 </script>
44 </body>
45 </html>
I replaced the long strings with two dots and inserted some line breaks, line numbers and color markings for better readability. Even with this formatting you can see that the obfuscation is rather sloppy. Better for me. Other landing pages are a little trickier and harder to decipher.
The data is stored in the attributes of a tag
Let's start with the div tag in line 16. The tag has nearly 100 attributes, each named with a number, and every attribute is assigned a very long string. The browser has no use for them, but attributes like these are an easy place to store data. The numbers are not in order, and they do not need to be, because the script reads the attributes by name. The attribute values look like this:
0="!3n3l3s3u3p^343l112i3f_3o3a3c3h26/383n38363n(1u3u3p383l@3m3c3i3h1r#131h1f1o1f..
It looks a bit complicated to crack. Strip off the JavaScript and it would be a nice task for a codebreaker. But if the script can do it, we can do it.
The strings contain the data for an encoded script, which is passed to the eval function in line 42. How is it decoded?
Cleaning the string from garbage
Let's start at the beginning of the script. In line 12 you see a regular expression replace(/[^012a-z3-9]/g,"") which deletes all characters except '0' to '9' and 'a' to 'z' from a string. Guess which one! When we apply this to the attribute values of the div tag, they look much prettier.
3n3l3s3u3p343l112i3f3o3a3c3h26383n38363n1u3u3p383l3m3c3i3h1r131h1f1o1f
In the for loop starting at line 30 all attribute values are joined into one long string, in the numerical order of their names. After that, the call in line 35 deletes all unneeded characters from the resulting string.
Decoding of the data to the script
The next for loop, starting in line 37, decodes the string.
Unobfuscated, the loop looks like this:
for ( i=0; i < a.length; i+=2 ) {
s+=String.fromCharCode(parseInt(a.substr(i,2),31));
}
This loop walks through the string from start to end, takes two characters at a time, interprets them as an integer in radix 31 and appends the corresponding character to a new string.
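The same conversion can be tried outside the browser. The following C sketch assumes that the standard function strtol, which accepts bases from 2 to 36, treats a clean lowercase pair the way parseInt does. The helper name decode_pairs is made up for this illustration, and a trailing single character is simply ignored.

```c
#include <stdlib.h>
#include <string.h>

/* Decodes a cleaned string two characters at a time, like the loop above.
 * out must have room for strlen(in) / 2 + 1 bytes.
 * A trailing single character (odd length) is ignored in this sketch. */
void decode_pairs(const char *in, int radix, char *out)
{
    size_t len = strlen(in);
    size_t n = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        char pair[3] = { in[i], in[i + 1], '\0' };
        /* strtol plays the role of parseInt(pair, radix) */
        out[n++] = (char)strtol(pair, NULL, radix);
    }
    out[n] = '\0';
}
```

With radix 31 the first pairs of the cleaned sample, 3n 3l 3s 3u, decode to try{ because, for example, 3n is 3 * 31 + 23 = 116, the character code of t.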
How could we do this without JavaScript?
First step: Extracting the data
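First we must extract the n=".." strings from the page. That can be done with a combination of sed and grep: sed puts every attribute on a line of its own and grep keeps only those lines. sort -n puts them into numerical order, a second sed strips the attribute names and quotes, and tr deletes the unneeded characters.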
These thoughts lead us to the following script.
# Blackhole decode Part 1
sed 's# \([0-9]\{1,2\}="[^"]*"\)#\n\1\n#g;' \
| grep '^[0-9]*=' \
| sort -n \
| sed 's#^[0-9]*="##; s#"$##;' \
| tr -dc '0-9a-z' \
| tr -d '\n'
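For comparison, the same extraction could also be done in C on a page that is already in memory. This is only a sketch under a few assumptions: attribute names run from 0 to 99, every value is enclosed in straight double quotes, and collection stops at the first missing number, as the loop in the page does. The names collect_attributes and MAX_ATTR are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ATTR 100   /* assumption: attribute names are 0..99 */

/* Joins the values of the attributes 0, 1, 2, ... in numeric order and
 * keeps only the characters 0-9 and a-z. Returns a malloc'ed string
 * (the caller frees it) or NULL if memory runs out. */
char *collect_attributes(const char *html)
{
    const char *start[MAX_ATTR] = { 0 };
    size_t length[MAX_ATTR] = { 0 };
    size_t total = 0;
    const char *p = html;

    while (*p != '\0') {
        if (isspace((unsigned char)p[0]) && isdigit((unsigned char)p[1])) {
            char *end;
            long name = strtol(p + 1, &end, 10);
            if (end[0] == '=' && end[1] == '"') {
                const char *close = strchr(end + 2, '"');
                if (close == NULL)
                    break;                /* unterminated value: stop scanning */
                if (name < MAX_ATTR && start[name] == NULL) {
                    start[name] = end + 2;
                    length[name] = (size_t)(close - (end + 2));
                    total += length[name];
                }
                p = close + 1;
                continue;
            }
        }
        p++;
    }

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    /* Like the loop in the page: stop at the first missing attribute. */
    for (int i = 0; i < MAX_ATTR && start[i] != NULL; i++) {
        for (size_t k = 0; k < length[i]; k++) {
            char c = start[i][k];
            if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'z'))
                out[n++] = c;
        }
    }
    out[n] = '\0';
    return out;
}
```

The returned string corresponds to what the pipeline above writes to standard output.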
Second step: decoding the data
The second step needs a bit more thinking. We need a script or program that does the work of parseInt. But wait a minute: what if the radix changes every time the landing page is requested? parseInt accepts a radix from 2 to 36. Since two digits are paired to form one character, only values from 16 to 36 are useful. We could easily try all 21 possibilities and select the right one manually, but the solution should not require manual intervention. What can we do?
If the string is long enough, we can assume that the highest digit (the counterpart of the '9' in the decimal system) occurs at least once. With command-line tools we could find it by splitting the string into lines after each digit, sorting the result uniquely and looking at the last line. The radix is then the value of that digit plus one. This would be a pretty good guess.
I decided not to go this way. It would be a slow, resource-consuming solution. To gain speed I decided to write a C program to guess the radix. The program should check the result against … what?
There are some possible tests.
- The lowest printable character would be the blank if there were no line breaks. In my examples there are no characters below the blank, but this could be adjusted. The pair 0a would encode the line feed (character code 10) for every radix greater than 10.
- The brackets in a JavaScript program are balanced, as long as they are not used inside strings. Every opening '(' or '{' must be closed by ')' or '}'. The counts of opening and closing brackets should therefore be equal.
This leads to the following program to guess the radix.
/* (c) 2012 by Thomas Arend, 2012/10/25
 * Purpose: Guess the radix for parseInt from the input
 * Assumption: Highest digit is used at least once
 * Input: parseInt coded file
 * Output: possible radix for decoding with parseInt
 * Return-Codes:
 * 0 everything well
 * 1 input was not tidy
 * 2 with radix r the blank is not the lowest character
 * 4 opening ( not as much as closing )
 * 8 opening { not as much as closing }
 *
 * The Toolkit Blackhole codes a Java-Script
 * in the attribute values of a tag.
 * Every two characters are interpreted as an integer and
 * parsed with parseInt and fromCharCode into an new character
 * The radix for parseInt is obfuscated in the calling script.
 * Because we don't want to reassemble the obfuscated script
 * we have to guess the radix from the input.
 *
 * We assume that the highest digit is used in the input.
 *
 * That the highest digit is not used has a very low probability in
 * a large javascript. The useful range for the radix is 16 til 36.
 *
 * $Id: $
 * $Log:$
 */
#include <stdlib.h>
#include <stdio.h>

#define MAXRADIX 36
#define MAXCHAR 256

char validdigits[MAXRADIX + 1] = "0123456789abcdefghijklmnopqrstuvwxyz";
long usedchar[MAXCHAR] = { 0 };
long statistic[MAXRADIX][MAXRADIX] = { 0 };

int validdigit(int digit)
{
    if ('0' <= digit && digit <= '9') return 1;
    else if ('a' <= digit && digit <= 'z') return 1;
    else if (digit == 10) return 1;   // a line feed is tolerated
    else return 0;
}

int digitindex(int digit)
{
    if ('0' <= digit && digit <= '9') return (digit - '0');
    else if ('a' <= digit && digit <= 'z') return (digit - 'a' + 10);
    else return 255;
}

// Count of the digit pair that encodes character c with the given radix.
// Returns 0 if the pair lies outside the table.
long paircount(int c, int radix)
{
    int i = c / radix;
    int j = c % radix;
    if (i >= MAXRADIX || j >= MAXRADIX) return 0;
    return statistic[i][j];
}

// Check if the input file consisted only of 0-9, a-z
int check_tidy_charset()
{
    int isdirty = 0;
    int dirty = 0;
    for (dirty = 1; dirty < MAXCHAR; dirty++) {
        if (!validdigit(dirty) && usedchar[dirty]) {
            isdirty++;
        }
    }
    if (isdirty > 0) {
        printf("Dirty characters %d\n", isdirty);
        return 1;
    }
    else return 0;
}

// If the code contains blanks then the ' '
// should be the lowest character.
int blank_check(int radix)
{
    int found = 0;
    int i = 0, j = 0;
    for (i = 0; i < MAXRADIX && !found; i++) {
        for (j = 0; j < MAXRADIX && !found; j++) {
            found = statistic[i][j] > 0;
        }
    }
    if ((i - 1) * radix + j - 1 != ' ') {
        printf("Blank check failed at [%d,%d] = %ld\n", i - 1, j - 1, statistic[i - 1][j - 1]);
        return 2;
    }
    else return 0;
}

// The characters ( and ) should have equal counts.
int partentheses_check(int radix)
{
    if (paircount('(', radix) != paircount(')', radix)) {
        printf("Parentheses '()' check failed with %ld,%ld\n",
               paircount('(', radix), paircount(')', radix));
        return 4;
    }
    else return 0;
}

// The characters { and } should have equal counts.
int curly_brace_check(int radix)
{
    if (paircount('{', radix) != paircount('}', radix)) {
        printf("Curly brace '{}' check failed with %ld,%ld\n",
               paircount('{', radix), paircount('}', radix));
        return 8;
    }
    else return 0;
}

int main(int argc, char *argv[])
{
    int figure = 0;
    int previous = 0;
    int paired = 0;
    int radix = 0;
    int i = 0, j = 0;
    int error = 0;

    // Count all characters and all pairs of digits
    while ((figure = getchar()) != EOF) {
        usedchar[figure]++;
        if (paired) {
            i = digitindex(previous);
            j = digitindex(figure);
            if (i < 255 && j < 255) { statistic[i][j]++; }
            paired = 0;
        } else {
            paired = 1;
            previous = figure;
        }
    }

    // Seek highest character
    for (figure = 255; (figure > 0) && (usedchar[figure] == 0); figure--) {}

    // Print radix
    radix = digitindex(figure) + 1;
    printf("%d\n", radix);

    // Check input and guess
    error += check_tidy_charset();
    error += blank_check(radix);
    error += partentheses_check(radix);
    error += curly_brace_check(radix);
    return error;
}
Listing: piRadix
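The checks in piRadix never decode the input. They only look up how often the digit pair for a given character occurs, using the quotient and the remainder of the character code as table indices. The following sketch shows that mapping in isolation. The function name char_to_pair is hypothetical and not part of the program.

```c
static const char pair_digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";

/* Writes the two-digit representation of character code c in the given
 * radix into pair (3 bytes including the terminator).
 * Returns 0 on success, -1 if the radix is outside 2..36 or c does not
 * fit into two digits. */
int char_to_pair(int c, int radix, char pair[3])
{
    if (radix < 2 || radix > 36 || c < 0 || c >= radix * radix)
        return -1;
    pair[0] = pair_digits[c / radix];   /* first index into the pair table  */
    pair[1] = pair_digits[c % radix];   /* second index into the pair table */
    pair[2] = '\0';
    return 0;
}
```

For radix 31 the blank (code 32) becomes 11, a pair that is visible in the cleaned sample string, while ( and ) become 19 and 1a. Counting those two pairs is enough for the parentheses test.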
Decoding the data
I wrote a second C program, piDecode, to decode the data.
/* (c) 2012 by Thomas Arend, 2012/10/25
 *
 *
 * Purpose: Decode parseInt encoded input file
 * Parameter: radix for parseInt
 * Input: parseInt encoded file
 * Output: decoded file
 *
 * The Toolkit Blackhole codes a Java-Script
 * in the attribute values of a tag.
 * Every two characters are interpreted as an integer and
 * parsed with parseInt and fromCharCode into an new character
 *
 * The radix can be guessed with the program piRadix
 * $Id: $
 * $Log:$
 */
#include <stdlib.h>
#include <stdio.h>

int digittoint(int digit)
{
    if ('0' <= digit && digit <= '9') return (digit - '0');
    else if ('a' <= digit && digit <= 'z') return (digit - 'a' + 10);
    else return 255;
}

int char2toint(int z1, int z2, int radix)
{
    return (digittoint(z1) * radix + digittoint(z2));
}

int main(int argc, char *argv[])
{
    int character1 = 0;
    int character2 = 0;
    int code = 0;
    int radix = 16;

    // The radix is the first parameter, the default is 16
    if (argc > 1) {
        radix = atoi(argv[1]);
    }

    while ((character1 = getchar()) != EOF) {
        if ((character2 = getchar()) == EOF) break;
        code = char2toint(character1, character2, radix);
        if (code < 256) {
            putchar(code);
        } else {
            // Codes that do not fit into one byte are written
            // as two bytes, high byte first
            putchar(code >> 8);
            putchar(code & 255);
        }
    }
    printf("\n");
    return 0;
}
Listing: piDecode
This program is pretty simple. It could be improved and lacks error checks, but it works when the input is clean.
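The branch for codes of 256 and above is there because two digits can express more than one byte as soon as the radix exceeds 16. A small sketch of the arithmetic follows. It assumes that the "useful" radix values mentioned earlier are those whose two-digit range covers all byte values from 0 to 255, which is my reading rather than something the page itself states. The name max_pair_value is hypothetical.

```c
/* Largest value that two digits can express in the given radix:
 * (radix - 1) * radix + (radix - 1) = radix * radix - 1. */
int max_pair_value(int radix)
{
    return radix * radix - 1;
}
```

Radix 15 reaches only 224, radix 16 reaches exactly 255, and radix 36 reaches 1295.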
The "blackhole" script part 2
This leads to the final version of the blackhole script. It has to be called twice: the first run, without a parameter, guesses the radix, and the second run decodes the input with the given radix.
#!/bin/sh
# Blackhole decode Part 1
if [ -z "$1" ]
then
    CMD="piRadix"
else
    CMD="piDecode"
fi
sed 's# \([0-9]\{1,2\}="[^"]*"\)#\n\1\n#g;' \
| grep '^[0-9]*=' \
| sort -n \
| sed 's#^[0-9]*="##; s#"$##;' \
| tr -dc '0-9a-z' \
| tr -d '\n' \
| $CMD $1
Listing: blackhole.sh
Example:
thomas@x1:~> blackhole.sh <term_covering.php
30
thomas@x1:~> blackhole.sh 30 <term_covering.php
try{var PluginDetect={version:"0.7.8",name:"PluginDetect", …
The relaying pages use nearly the same obfuscation. They have to be handled differently, but the two C programs do their work once the data is extracted. Here is an example from a relaying page.
try{dsfsd++}
catch(wEGWEGWEg){
try{(v+v)()}
catch(fsebgreber){
try{v["document"]["body"]="123"}
catch(gds){m=123;if((alert+"").indexOf("native")!==-1)ev=window["e"+"val"];
}
}
n=["53","45","4m","23","2f","26",..,..,..,];
h=2;
s="";
if(m)for(i=0;i-105!=0;i++){
k=i;
if(window["document"])s+=String["fro"+"mCharCode"](parseInt(n[i],23));
}
try{febwnrth--}
catch(bterste){
alert(s);
}
}
Modifying the blackhole shell script to detect this encoding and extract the data should not be too difficult.
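As a rough illustration of how little changes, here is a sketch in C that decodes the six array entries visible in the example with radix 23, the value passed to parseInt there. It assumes that strtol handles each entry like parseInt does for clean input. The function name decode_tokens is hypothetical, and the elided entries are not guessed.

```c
#include <stdlib.h>

/* Converts count string tokens to characters using the given radix.
 * out must have room for count + 1 bytes. */
void decode_tokens(const char *const tokens[], size_t count, int radix, char *out)
{
    for (size_t i = 0; i < count; i++) {
        /* strtol plays the role of parseInt(n[i], radix) */
        out[i] = (char)strtol(tokens[i], NULL, radix);
    }
    out[count] = '\0';
}
```

The six visible entries 53, 45, 4m, 23, 2f and 26 decode to var1=4, so 53 is 5 * 23 + 3 = 118, the character code of v.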
Attached you will find the Blackhole-decode source code in a ZIP archive.
Use at your own risk. Oops, I forgot the license info. I will add it tomorrow. It will be GPL.
Good night!
My programs read the file character by character. Their programs load the whole file into a string. With their approach it should be easy to calculate the radix by scanning the string for the highest digit before decoding it.
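Scanning a string that is already in memory could look like the following sketch. It rests on the same assumption as piRadix, namely that the highest digit occurs at least once. The function name guess_radix is hypothetical.

```c
/* Returns the guessed radix: the value of the highest digit found plus one.
 * Characters other than 0-9 and a-z are ignored.
 * Returns 0 if the string contains no digit at all. */
int guess_radix(const char *s)
{
    int highest = -1;

    for (; *s != '\0'; s++) {
        int value = -1;
        if (*s >= '0' && *s <= '9')
            value = *s - '0';
        else if (*s >= 'a' && *s <= 'z')
            value = *s - 'a' + 10;
        if (value > highest)
            highest = value;
    }
    return highest + 1;
}
```

For the cleaned sample string from the landing page the highest digit is u, which has the value 30, so the guess is 31.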