From simple to complex

Last week Elias ran a malware sample in the Bochs emulator, and I was curious to see what exactly it does. So I took the unpacked version of the malware and fed it to the decompiler. It turned out to be a pretty short downloader (different AV vendors give it different names: Lighty after the compression method, or FraudLoad, or FakeAlert, etc.). Such simple code is very easy to decompile. I renamed some functions and added some comments. The final text looks like this:

  download_winivstr();  // Download a program from the internet
                        // It will be launched later
                        // Create a thread to scare the user
  icon_thread = CreateThread(0, 0, icon_thread_entry, 0, 0, &Data);
  Sleep(1000u);         // 1 second
  while ( 1 )
  {
    if ( check_security_guards() == GUARD_EXISTS )
      SendMessageA(main_hwnd, WM_DESTROY, 0, 0);
    else
      scare_user();    // Tell the user that his computer is infected
    Sleep(180000u);    // 3 minutes
  }

The original listing is 118KB and the decompilation result is 23KB. The output is about five times smaller, which is a good ratio for such a simple program. The assembler listing and the decompilation output can be downloaded here:
http://www.hex-rays.com/decompilation/files/lighty_fraudload.zip

Next time I’ll try to find something more complicated to show you more advanced features of the decompiler: something really difficult to understand even for seasoned reverse engineers. For example, can you make sense of the following code? How long does it take you?

    push ebp
    mov ebp, esp
    sub esp, 8
    mov edx, [ebp+arg_C]
    mov eax, [ebp+arg_8]
    push edx
    push eax
    fild [esp+10h+var_10]
    add esp, 8
    test edx, edx
    js short loc_2A23
    fstp [ebp+var_8]
    fld [ebp+var_8]
    fld [ebp+arg_0]
    leave
    fucompp
    fnstsw ax
    sahf
    setnz al
    setp dl
    or al, dl
    movzx eax, al
    retn
  loc_2A23:
    fadd ds:flt_AEEC
    fstp [ebp+var_8]
    fld [ebp+var_8]
    fld [ebp+arg_0]
    leave
    fucompp
    fnstsw ax
    sahf
    setnz al
    setp dl
    or al, dl
    movzx eax, al
    retn

This mess was originally a one-line statement:

  bool cmpeq_double_longlong(double a, unsigned __int64 b)
  {
    return a == b;
  }
(You knew that it would be that simple, didn’t you? 😉)
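Most of the listing is the conversion of the unsigned 64-bit argument to a floating point value: fild can only load a signed integer, so the code loads the bits as signed and, when the high dword is negative, adds a correction constant. Here is a minimal sketch of that part in C. The helper names are hypothetical, flt_AEEC is assumed to hold 2^64, and long double is assumed to be the x87 80-bit format; none of this is decompiler output, and the flag handling after fucompp is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the conversion part of the listing.
   Assumes long double is the x87 80-bit format, so the intermediate
   sum is exact; with a 64-bit long double the value is rounded twice
   and may differ from the compiler's conversion in rare tie cases. */
static double u64_to_double_like_listing(uint64_t b)
{
    /* fild reads the 64 bits as a signed integer (two's complement;
       the cast is implementation-defined but behaves this way on
       mainstream compilers) */
    int64_t as_signed = (int64_t)b;
    long double x = (long double)as_signed;
    if (as_signed < 0)                    /* test edx, edx / js short loc_2A23 */
        x += 18446744073709551616.0L;     /* fadd ds:flt_AEEC, assumed to be 2^64 */
    return (double)x;                     /* fstp/fld through the 64-bit var_8 slot */
}

/* A few spot checks against the compiler's own conversion. */
static void check_examples(void)
{
    assert(u64_to_double_like_listing(0) == 0.0);
    assert(u64_to_double_like_listing(1) == 1.0);
    assert(u64_to_double_like_listing(UINT64_C(1) << 63) == 9223372036854775808.0);
    assert(u64_to_double_like_listing(UINT64_MAX) == (double)UINT64_MAX);
}
```

Calling check_examples() from a test program exercises both branches of the listing: the values below 2^63 take the fall-through path, the others go through loc_2A23.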
As you can see, we are playing with floating point arithmetic now. Who knows, maybe the decompiler will handle it in the near future. Do not hold your breath yet: there is a long way ahead and many problems to solve. The above listing is from our sample test file. The test file has ~750 trivial functions and we compile it with 3 different compilers in optimized and non-optimized modes. So we ‘just’ need to make sure that all 750*3*2=4500 functions decompile correctly and we will have the first decompilation step over. Then we will need to make sure that all possible combinations of integer and floating point arithmetic decompile well, that type conversions do not spoil the result, and that the output is generated correctly. For integer arithmetic, a similar test file took more than one month, and I bet that floating point will take longer… But we will eventually get there, with a good result (it must be portable too, since we do not plan to stay with x86 forever). Stay tuned! 🙂
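To give an idea of what a file of trivial functions can look like, here is a small sketch in C. The macro and all function names are made up for illustration and are not taken from our actual test file; each generated function is a one-liner of the same kind as the comparison shown earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical generator macro: one trivial function per
   operator and operand type pair. */
#define DEFINE_CMP(name, op, ta, tb) \
    bool name(ta a, tb b) { return a op b; }

DEFINE_CMP(cmpeq_double_u64, ==, double, uint64_t)
DEFINE_CMP(cmpne_double_u64, !=, double, uint64_t)
DEFINE_CMP(cmplt_double_i32, <,  double, int32_t)
DEFINE_CMP(cmpeq_float_i64,  ==, float,  int64_t)

/* Sanity checks on the generated functions. */
static void check_generated(void)
{
    assert(cmpeq_double_u64(2.0, 2));
    assert(!cmpne_double_u64(2.0, 2));
    assert(cmplt_double_i32(1.5, 2));
    assert(cmpeq_float_i64(3.0f, 3));
}
```

Compiling such a file with several compilers and optimization settings gives many different machine code variants of the same source line, which is what makes it useful as decompiler input.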
