Hex Rays
Hex Rays Blog —  State of the art code analysis

From simple to complex

Last week Elias ran a malware sample in the Bochs emulator, and I was curious to see what exactly it does. So I took the unpacked version of the malware and fed it into the decompiler. It turned out to be a pretty short downloader (different AV vendors give it different names: Lighty, after the compression method, or FraudLoad, or FakeAlert, etc.). Such simple code is very easy to decompile. I renamed some functions and added some comments. The final text looks like this:
  download_winivstr();  // Download a program from the internet
                        // It will be launched later
                        // Create a thread to scare the user
  icon_thread = CreateThread(0, 0, icon_thread_entry, 0, 0, &Data);
  Sleep(1000u);         // 1 second
  while ( 1 )
  {
    if ( check_security_guards() == GUARD_EXISTS )
    {
      SendMessageA(main_hwnd, WM_DESTROY, 0, 0);
      scare_user();    // Tell the user that his computer is infected
    }
    Sleep(180000u);    // 3 minutes
  }
The original listing is 118KB and the decompilation result is 23KB. That is about five times smaller, which is a good ratio for such a simple program. The assembler listing and the decompilation output can be downloaded here: http://www.hex-rays.com/decompilation/files/lighty_fraudload.zip

Next time I’ll try to find something more complicated to show you more advanced features of the decompiler. Something really difficult to understand even for seasoned reverse engineers. For example, can you make sense of the following code? How much time does it take you?

    push    ebp
    mov     ebp, esp
    sub     esp, 8
    mov     edx, [ebp+arg_C]
    mov     eax, [ebp+arg_8]
    push    edx
    push    eax
    fild    [esp+10h+var_10]
    add     esp, 8
    test    edx, edx
    js      short loc_2A23
    fstp    [ebp+var_8]
    fld     [ebp+var_8]
    fld     [ebp+arg_0]
    leave
    fucompp
    fnstsw  ax
    sahf
    setnz   al
    setp    dl
    or      al, dl
    movzx   eax, al
    retn
  loc_2A23:
    fadd    ds:flt_AEEC
    fstp    [ebp+var_8]
    fld     [ebp+var_8]
    fld     [ebp+arg_0]
    leave
    fucompp
    fnstsw  ax
    sahf
    setnz   al
    setp    dl
    or      al, dl
    movzx   eax, al
    retn

This mess was originally a one-line statement:

  bool cmpeq_double_longlong(double a, unsigned __int64 b)
  {
    return a == b;
  }

(You knew that it would be that simple, didn’t you? 😉)

As you see, we are playing with floating point arithmetic now. Who knows, maybe the decompiler will handle it in the near future. Do not hold your breath yet: there is a long way ahead and many problems to solve.

The above listing is from our sample test file. The test file has ~750 trivial functions, and we compile it with 3 different compilers in optimized and non-optimized modes. So we ‘just’ need to make sure that all 750*3*2=4500 functions decompile correctly, and we will have the first decompilation step over. Then we will need to make sure that all possible combinations of integer and floating point arithmetic decompile well, that type conversions do not spoil the result, and that the output is generated correctly.
For integer arithmetic, a similar test file took more than one month, so I bet that floating point will take longer… But we will eventually get there, with a good result (it must be portable too, since we do not plan to stay with x86 forever). Stay tuned! 🙂
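A footnote on the assembly listing above, for readers wondering why it has two branches. The fild instruction loads only signed integers, so the unsigned 64-bit argument appears to be loaded as if it were signed and then corrected when its sign bit is set. The following C code is a rough sketch of that reading, not the compiler's actual output. The function names are hypothetical, and the value stored at flt_AEEC is not shown in the listing: the sketch assumes it is 2^64, the usual correction constant for this idiom. The final comparison is written as in the one-line source and does not model the flag handling.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert an unsigned 64-bit value to double using
   only a signed integer load, as the fild-based listing seems to do. */
static double u64_to_double_via_signed(uint64_t b)
{
    /* Reinterpret the bit pattern as signed (what fild sees).
       This assumes a two's complement target, as on x86. */
    int64_t as_signed = (int64_t)b;

    /* long double stands in for the x87 register; on compilers where
       long double is only 64 bits wide the rounding may differ slightly. */
    long double x = (long double)as_signed;

    /* "test edx, edx / js": the high dword is negative, so the signed
       load came out 2^64 too small. */
    if (as_signed < 0)
        x += 18446744073709551616.0L;   /* assumed value of flt_AEEC: 2^64 */

    /* "fstp [ebp+var_8] / fld [ebp+var_8]": round to a 64-bit double. */
    return (double)x;
}

/* Hypothetical stand-in for the one-line source function. */
static bool cmpeq_sketch(double a, uint64_t b)
{
    return a == u64_to_double_via_signed(b);
}
```

With these definitions, cmpeq_sketch(9223372036854775808.0, 0x8000000000000000ULL) takes the sign-bit branch and returns true, while small values such as cmpeq_sketch(1.0, 1) go through the branch that needs no correction.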