Decompilation vs. disassembly

Welcome to the Hex-Rays comparison page! Below you will find side-by-side comparisons of disassembly and decompilation outputs. Please maximize the window to see both columns simultaneously.

The following exhibits are displayed on this page:

  1. Division by two
  2. Simple enough?
  3. Where's my variable?
  4. Arithmetic is not rocket science
  5. Sample window procedure
  6. Short-circuit evaluation
  7. Inlined string operations
Division by two
; =============== S U B R O U T I N E =======================================
; Attributes: bp-based frame
; mod_ll(long long)
public __Z6mod_llx
__Z6mod_llx proc near
var_10 = dword ptr -10h
var_C = dword ptr -0Ch
arg_0 = qword ptr 8
push ebp
mov ebp, esp
push ebx
sub esp, 0Ch
mov ecx, dword ptr [ebp+arg_0]
mov ebx, dword ptr [ebp+arg_0+4]
mov eax, ecx
mov edx, ebx
mov eax, edx
mov edx, eax
sar edx, 1Fh
sar eax, 1Fh
mov eax, edx
mov edx, 0
shr eax, 1Fh
add eax, ecx
adc edx, ebx
shrd eax, edx, 1
sar edx, 1
mov [ebp+var_10], eax
mov [ebp+var_C], edx
mov eax, [ebp+var_10]
mov edx, [ebp+var_C]
shld edx, eax, 1
add eax, eax
sub ecx, eax
sbb ebx, edx
mov [ebp+var_10], ecx
mov [ebp+var_C], ebx
mov eax, [ebp+var_10]
mov edx, [ebp+var_C]
add esp, 0Ch
pop ebx
pop ebp
retn
__Z6mod_llx endp
__int64 __cdecl mod_ll(__int64 a1)
{
  return a1 % 2;
}
Just note the difference in size! To read the disassembly output, you not only have to know that compilers generate such convoluted code for signed division and modulo operations, you also have to spend your time recognizing the patterns. The decompiler makes things really simple.
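To make the pattern less mysterious, here is a minimal sketch of the same computation written out in C: bias the value by its sign bit, shift right by one, multiply back by two, and subtract. The function name mod2_by_hand is made up for this illustration, and the sketch assumes that right-shifting a negative value is an arithmetic shift, which the C standard leaves to the implementation.

```c
#include <stdint.h>

/* Remainder of a signed 64-bit value divided by 2, computed step by step
 * in the same order as the listing: bias, shift, multiply back, subtract.
 * Hypothetical helper, not part of the original sample.
 * Assumption: >> on a negative int64_t is an arithmetic shift. */
int64_t mod2_by_hand(int64_t a)
{
    int64_t bias = (int64_t)((uint64_t)a >> 63); /* 1 if a < 0, else 0 */
    int64_t quot = (a + bias) >> 1;              /* a / 2, rounded toward zero */
    return a - quot * 2;                         /* a - (a / 2) * 2 == a % 2 */
}
```

For example, mod2_by_hand(-7) adds the bias to get -6, halves it to -3, and returns -7 - (-6), the same value as -7 % 2.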
Simple enough?
; =============== S U B R O U T I N E =======================================
; int __cdecl sub_4061C0(char *Str, char *Dest)
sub_4061C0 proc near ; CODE XREF: sub_4062F0+15p
; sub_4063D4+21p ...
Str = dword ptr 4
Dest = dword ptr 8
push esi
push offset aSmtp_ ; "smtp."
push [esp+8+Dest] ; Dest
call _strcpy
mov esi, [esp+0Ch+Str]
push esi ; Str
call _strlen
add esp, 0Ch
xor ecx, ecx
test eax, eax
jle short loc_4061ED
loc_4061E2: ; CODE XREF: sub_4061C0+2Bj
cmp byte ptr [ecx+esi], 40h
jz short loc_4061ED
inc ecx
cmp ecx, eax
jl short loc_4061E2
loc_4061ED: ; CODE XREF: sub_4061C0+20j
; sub_4061C0+26j
dec eax
cmp ecx, eax
jl short loc_4061F6
xor eax, eax
pop esi
retn
; ---------------------------------------------------------------------------
loc_4061F6: ; CODE XREF: sub_4061C0+30j
lea eax, [ecx+esi+1]
push eax ; Source
push [esp+8+Dest] ; Dest
call _strcat
pop ecx
pop ecx
push 1
pop eax
pop esi
retn
sub_4061C0 endp

signed int __cdecl sub_4061C0(char *Str, char *Dest)
{
  int len; // eax@1
  int i; // ecx@1
  char *str2; // esi@1
  signed int result; // eax@5

  strcpy(Dest, "smtp.");
  str2 = Str;
  len = strlen(Str);
  for ( i = 0; i < len; ++i )
  {
    if ( str2[i] == 64 )
      break;
  }
  if ( i < len - 1 )
  {
    strcat(Dest, &str2[i + 1]);
    result = 1;
  }
  else
  {
    result = 0;
  }
  return result;
}
Questions about the function, such as what it does or which values it can return, can be answered almost instantaneously by looking at the decompiler output. Needless to say, it looks better because I renamed the local variables. In the disassembler, registers are renamed very rarely because renaming hides the register use and can lead to confusion.
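As one possible reading of the decompiled code, the function builds a host name from an e-mail address: it copies "smtp." and appends whatever follows the first '@' (character code 64). The sketch below restates that logic with strchr; the name make_smtp_host and the parameter names are hypothetical, since the sample only gives us sub_4061C0.

```c
#include <string.h>

/* Hypothetical name; the original is only known as sub_4061C0.
 * Writes "smtp." followed by the part of addr after the first '@' to dest.
 * Returns 1 on success, 0 if addr has no '@' or nothing follows it
 * (dest then holds just "smtp."). As in the original, the caller must
 * supply a dest buffer that is large enough. */
int make_smtp_host(const char *addr, char *dest)
{
    const char *at;

    strcpy(dest, "smtp.");
    at = strchr(addr, '@');              /* 64 in the listing is '@' */
    if (at == NULL || at[1] == '\0')
        return 0;
    strcat(dest, at + 1);
    return 1;
}
```

Calling make_smtp_host("user@example.com", buf) leaves "smtp.example.com" in buf and returns 1, while an address without '@' returns 0.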
Where's my variable?
; =============== S U B R O U T I N E =======================================
; int __cdecl myfunc(wchar_t *Str, int)
myfunc proc near ; CODE XREF: sub_4060+76p
; .text:42E4p
Str = dword ptr 4
arg_4 = dword ptr 8
mov eax, dword_1001F608
cmp eax, 0FFFFFFFFh
jnz short loc_10003AB6
push offset aGetsystemwindo ; "GetSystemWindowsDirectoryW"
push offset aKernel32_dll ; "KERNEL32.DLL"
call ds:GetModuleHandleW
push eax ; hModule
call ds:GetProcAddress
mov dword_1001F608, eax
loc_10003AB6: ; CODE XREF: myfunc+8j
test eax, eax
push esi
mov esi, [esp+4+arg_4]
push edi
mov edi, [esp+8+Str]
push esi
push edi
jz short loc_10003ACA
call eax ; dword_1001F608
jmp short loc_10003AD0
; ---------------------------------------------------------------------------
loc_10003ACA: ; CODE XREF: myfunc+34j
call ds:GetWindowsDirectoryW
loc_10003AD0: ; CODE XREF: myfunc+38j
sub esi, eax
cmp esi, 5
jnb short loc_10003ADD
pop edi
add eax, 5
pop esi
retn
; ---------------------------------------------------------------------------
loc_10003ADD: ; CODE XREF: myfunc+45j
push offset aInf_0 ; "\\inf"
push edi ; Dest
call _wcscat
push edi ; Str
call _wcslen
add esp, 0Ch
pop edi
pop esi
retn
myfunc endp

size_t __cdecl myfunc(wchar_t *buf, int bufsize)
{
  int (__stdcall *func)(_DWORD, _DWORD); // eax@1
  wchar_t *buf2; // edi@3
  int bufsize; // esi@3
  UINT dirlen; // eax@4
  size_t outlen; // eax@7
  HMODULE h; // eax@2

  func = g_fptr;
  if ( g_fptr == (int (__stdcall *)(_DWORD, _DWORD))-1 )
  {
    h = GetModuleHandleW(L"KERNEL32.DLL");
    func = (int (__stdcall *)(_DWORD, _DWORD))
           GetProcAddress(h, "GetSystemWindowsDirectoryW");
    g_fptr = func;
  }
  bufsize = bufsize;
  buf2 = buf;
  if ( func )
    dirlen = func(buf, bufsize);
  else
    dirlen = GetWindowsDirectoryW(buf, bufsize);
  if ( bufsize - dirlen >= 5 )
  {
    wcscat(buf2, L"\\inf");
    outlen = wcslen(buf2);
  }
  else
  {
    outlen = dirlen + 5;
  }
  return outlen;
}
IDA highlights the current identifier. This feature turns out to be much more useful with high-level output. In this sample, I tried to trace how the retrieved function pointer is used by the function. In the disassembly output, many unrelated eax occurrences are highlighted, while the decompiler did exactly what I wanted.
Arithmetic is not rocket science
; =============== S U B R O U T I N E =======================================
; Attributes: bp-based frame
; sgell(__int64, __int64)
public @sgell$qjj
@sgell$qjj proc near
arg_0 = dword ptr 8
arg_4 = dword ptr 0Ch
arg_8 = dword ptr 10h
arg_C = dword ptr 14h
push ebp
mov ebp, esp
mov eax, [ebp+arg_0]
mov edx, [ebp+arg_4]
cmp edx, [ebp+arg_C]
jnz short loc_10226
cmp eax, [ebp+arg_8]
setnb al
jmp short loc_10229
; ---------------------------------------------------------------------------
loc_10226: ; CODE XREF: sgell(__int64,__int64)+Cj
setnl al
loc_10229: ; CODE XREF: sgell(__int64,__int64)+14j
and eax, 1
pop ebp
retn
@sgell$qjj endp

bool __cdecl sgell(__int64 a1, __int64 a2)
{
  return a1 >= a2;
}
Arithmetic is not rocket science, but it is always better if someone handles it for you. You have more important things to focus on.
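For the curious, the listing compares the two 64-bit values in 32-bit halves: the high halves are compared as signed numbers (setnl), and only if they are equal are the low halves compared as unsigned numbers (setnb). A minimal sketch of that scheme in C follows; the function name sge64_by_halves is made up, and the conversion of the high half to int32_t assumes the usual two's complement wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* a >= b for signed 64-bit values using only 32-bit comparisons.
 * Hypothetical helper that mirrors the listing:
 *   high halves differ -> signed compare of the high halves (setnl)
 *   high halves equal  -> unsigned compare of the low halves (setnb)
 * Assumption: converting uint32_t to int32_t wraps (two's complement). */
bool sge64_by_halves(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint32_t a_lo = (uint32_t)ua;
    uint32_t b_lo = (uint32_t)ub;
    int32_t  a_hi = (int32_t)(uint32_t)(ua >> 32);
    int32_t  b_hi = (int32_t)(uint32_t)(ub >> 32);

    if (a_hi != b_hi)
        return a_hi >= b_hi;   /* the sign lives in the high half */
    return a_lo >= b_lo;       /* the low half carries no sign */
}
```

The unsigned comparison of the low halves matters: with equal high halves, a low half of 0 must compare below 0xFFFFFFFF, which a signed comparison would get wrong.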
Sample window procedure
; =============== S U B R O U T I N E =======================================
wndproc proc near ; DATA XREF: sub_4010E0+21o
Paint = tagPAINTSTRUCT ptr -0A4h
Buffer = byte ptr -64h
hWnd = dword ptr 4
Msg = dword ptr 8
wParam = dword ptr 0Ch
lParam = dword ptr 10h
mov ecx, hInstance
sub esp, 0A4h
lea eax, [esp+0A4h+Buffer]
push 64h ; nBufferMax
push eax ; lpBuffer
push 6Ah ; uID
push ecx ; hInstance
call ds:LoadStringA
mov ecx, [esp+0A4h+Msg]
mov eax, ecx
sub eax, 2
jz loc_4013E8
sub eax, 0Dh
jz loc_4013B2
sub eax, 102h
jz short loc_401336
mov edx, [esp+0A4h+lParam]
mov eax, [esp+0A4h+wParam]
push edx ; lParam
push eax ; wParam
push ecx ; Msg
mov ecx, [esp+0B0h+hWnd]
push ecx ; hWnd
call ds:DefWindowProcA
add esp, 0A4h
retn 10h
; ---------------------------------------------------------------------------
loc_401336: ; CODE XREF: wndproc+3Cj
mov ecx, [esp+0A4h+wParam]
mov eax, ecx
and eax, 0FFFFh
sub eax, 68h
jz short loc_40138A
dec eax
jz short loc_401371
mov edx, [esp+0A4h+lParam]
mov eax, [esp+0A4h+hWnd]
push edx ; lParam
push ecx ; wParam
push 111h ; Msg
push eax ; hWnd
call ds:DefWindowProcA
add esp, 0A4h
retn 10h
; ---------------------------------------------------------------------------
loc_401371: ; CODE XREF: wndproc+7Aj
mov ecx, [esp+0A4h+hWnd]
push ecx ; hWnd
call ds:DestroyWindow
xor eax, eax
add esp, 0A4h
retn 10h
; ---------------------------------------------------------------------------
loc_40138A: ; CODE XREF: wndproc+77j
mov edx, [esp+0A4h+hWnd]
mov eax, hInstance
push 0 ; dwInitParam
push offset DialogFunc ; lpDialogFunc
push edx ; hWndParent
push 67h ; lpTemplateName
push eax ; hInstance
call ds:DialogBoxParamA
xor eax, eax
add esp, 0A4h
retn 10h
; ---------------------------------------------------------------------------
loc_4013B2: ; CODE XREF: wndproc+31j
push esi
mov esi, [esp+0A8h+hWnd]
lea ecx, [esp+0A8h+Paint]
push ecx ; lpPaint
push esi ; hWnd
call ds:BeginPaint
push eax ; HDC
push esi ; hWnd
call my_paint
add esp, 8
lea edx, [esp+0A8h+Paint]
push edx ; lpPaint
push esi ; hWnd
call ds:EndPaint
pop esi
xor eax, eax
add esp, 0A4h
retn 10h
; ---------------------------------------------------------------------------
loc_4013E8: ; CODE XREF: wndproc+28j
push 0 ; nExitCode
call ds:PostQuitMessage
xor eax, eax
add esp, 0A4h
retn 10h
wndproc endp
LRESULT __stdcall wndproc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)
{
  LRESULT result; // eax@4
  HWND h; // esi@10
  HDC dc; // eax@10
  CHAR Buffer; // [sp+40h] [bp-64h]@1
  struct tagPAINTSTRUCT Paint; // [sp+0h] [bp-A4h]@10

  LoadStringA(hInstance, 0x6Au, &Buffer, 100);
  switch ( Msg )
  {
    case 2u:
      PostQuitMessage(0);
      result = 0;
      break;
    case 15u:
      h = hWnd;
      dc = BeginPaint(hWnd, &Paint);
      my_paint(h, dc);
      EndPaint(h, &Paint);
      result = 0;
      break;
    case 273u:
      if ( (_WORD)wParam == 104 )
      {
        DialogBoxParamA(hInstance, (LPCSTR)0x67, hWnd, DialogFunc, 0);
        result = 0;
      }
      else
      {
        if ( (_WORD)wParam == 105 )
        {
          DestroyWindow(hWnd);
          result = 0;
        }
        else
        {
          result = DefWindowProcA(hWnd, 0x111u, wParam, lParam);
        }
      }
      break;
    default:
      result = DefWindowProcA(hWnd, Msg, wParam, lParam);
      break;
  }
  return result;
}
The decompiler recognized a switch statement and nicely represented the window procedure. Without this little help, the user would have to calculate the message numbers herself. Nothing particularly difficult, just time-consuming and boring. And what if she makes a mistake?
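The calculation in question is a running sum. The disassembly tests the message with a chain of sub eax, K / jz pairs, so each branch is taken when the original value equals the total of all constants subtracted so far. A small sketch of that bookkeeping follows; chain_to_cases is a hypothetical helper written for this page, not something IDA provides.

```c
#include <stddef.h>

/* Recover switch case values from a chain of "sub eax, K / jz label" steps.
 * Each jz fires when the running value reaches zero, so the matching
 * input is the sum of every constant subtracted up to that point.
 * Hypothetical helper: deltas holds the K values, cases receives n results. */
void chain_to_cases(const unsigned *deltas, size_t n, unsigned *cases)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        sum += deltas[i];
        cases[i] = sum;
    }
}
```

Feeding it the constants from the listing, 2, 0Dh and 102h, yields 2, 15 and 273, which are the case labels in the decompiler output and the values of WM_DESTROY, WM_PAINT and WM_COMMAND.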
Short-circuit evaluation
loc_804BCC7: ; CODE XREF: sub_804BB10+A42j
mov [esp+28h+var_24], offset aUnzip ; "unzip"
xor eax, eax
test esi, esi
setnz al
mov edx, 1
mov ds:dword_804FBAC, edx
lea eax, [eax+eax+1]
mov ds:dword_804F780, eax
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jz loc_804C4F1
loc_804BCFF: ; CODE XREF: sub_804BB10+9F8j
mov eax, 2
mov ds:dword_804FBAC, eax
loc_804BD09: ; CODE XREF: sub_804BB10+9FEj
mov [esp+28h+var_24], offset aZ2cat ; "z2cat"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jz loc_804C495
loc_804BD26: ; CODE XREF: sub_804BB10+99Cj
; sub_804BB10+9B9j ...
mov eax, 2
mov ds:dword_804FBAC, eax
xor eax, eax
test esi, esi
setnz al
inc eax
mov ds:dword_804F780, eax
.............................. SKIP ............................
loc_804C495: ; CODE XREF: sub_804BB10+210j
mov [esp+28h+var_24], offset aZ2cat_0 ; "Z2CAT"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BD26
mov [esp+28h+var_24], offset aZcat ; "zcat"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BD26
mov [esp+28h+var_24], offset aZcat_0 ; "ZCAT"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BD26
jmp loc_804BD3D
; ---------------------------------------------------------------------------
loc_804C4F1: ; CODE XREF: sub_804BB10+1E9j
mov [esp+28h+var_24], offset aUnzip_0 ; "UNZIP"
mov eax, ds:dword_804FFD4
mov [esp+28h+var_28], eax
call _strstr
test eax, eax
jnz loc_804BCFF
jmp loc_804BD09
dword_804F780 = 2 * (v9 != 0) + 1;
if ( strstr(dword_804FFD4, "unzip") || strstr(dword_804FFD4, "UNZIP") )
  dword_804FBAC = 2;
if ( strstr(dword_804FFD4, "z2cat")
  || strstr(dword_804FFD4, "Z2CAT")
  || strstr(dword_804FFD4, "zcat")
  || strstr(dword_804FFD4, "ZCAT") )
{
  dword_804FBAC = 2;
  dword_804F780 = (v9 != 0) + 1;
}
This is an excerpt from a big function to illustrate short-circuit evaluation. Complex things happen in long functions, and it is very handy to have the decompiler represent them in a human-readable way. Please note how the code that was scattered over the address space is concisely displayed in two if statements.
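As a reminder of what short-circuit evaluation means in C, the sketch below has the same shape as the second if statement and counts how many searches actually run. The names search_calls, has and is_zcat_name are invented for this illustration.

```c
#include <string.h>

int search_calls;  /* counts how many strstr searches actually ran */

/* Hypothetical wrapper: returns 1 if needle occurs in haystack. */
int has(const char *haystack, const char *needle)
{
    ++search_calls;
    return strstr(haystack, needle) != NULL;
}

/* Same shape as the second if statement in the decompiled excerpt.
 * The || operator evaluates left to right and stops at the first
 * operand that is true, so later searches may never run. */
int is_zcat_name(const char *name)
{
    return has(name, "z2cat") || has(name, "Z2CAT")
        || has(name, "zcat")  || has(name, "ZCAT");
}
```

With the name "z2cat" only the first search runs. With "gzip" all four run and the result is 0. Each skipped search corresponds to a jump past the remaining strstr calls in the disassembly.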
Inlined string operations
mov eax, [esp+argc]
sub esp, 8
push ebx
push ebp
push esi
lea ecx, ds:0Ch[eax*4]
push edi
push ecx ; unsigned int
call ??2@YAPAXI@Z ; operator new(uint)
mov edx, [esp+1Ch+argv]
mov ebp, eax
or ecx, 0FFFFFFFFh
xor eax, eax
mov esi, [edx]
add esp, 4
mov edi, esi
repne scasb
not ecx
dec ecx
cmp ecx, 4
jl short loc_401064
cmp byte ptr [ecx+esi-4], '.'
jnz short loc_401064
mov al, [ecx+esi-3]
cmp al, 'e'
jz short loc_401047
cmp al, 'E'
jnz short loc_401064
loc_401047: ; CODE XREF: _main+41j
mov al, [ecx+esi-2]
cmp al, 'x'
jz short loc_401053
cmp al, 'X'
jnz short loc_401064
loc_401053: ; CODE XREF: _main+4Dj
mov al, [ecx+esi-1]
cmp al, 'e'
jz short loc_40105F
cmp al, 'E'
jnz short loc_401064
loc_40105F: ; CODE XREF: _main+59j
mov byte ptr [ecx+esi-4], 0
loc_401064: ; CODE XREF: _main+32j _main+39j ...
mov edi, esi
or ecx, 0FFFFFFFFh
xor eax, eax
repne scasb
not ecx
add ecx, 3
push ecx ; unsigned int
call ??2@YAPAXI@Z ; operator new(uint)
mov edx, eax
v4 = operator new(4 * argc + 12);
v5 = *argv;
v77 = strlen(*argv);
v3 = v77 - 1;
if ( (signed int)(v77 - 1) >= 4 )
{
  if ( v5[v3 - 4] == '.' )
  {
    chr = v5[v3 - 3];
    if ( chr == 'e' || chr == 'E' )
    {
      v7 = v5[v3 - 2];
      if ( v7 == 'x' || v7 == 'X' )
      {
        v8 = v5[v3 - 1];
        if ( v8 == 'e' || v8 == 'E' )
          v5[v3 - 4] = 0;
      }
    }
  }
}
v9 = operator new(strlen(v5) + 3);
The decompiler tries to recognize frequently inlined string functions such as strcmp, strchr, strlen, etc. In this code snippet, calls to the strlen function have been recognized.
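The inlined strlen in the listing is the classic or ecx, 0FFFFFFFFh / xor eax, eax / repne scasb / not ecx / dec ecx sequence. The following sketch emulates it step by step in C to show why it yields the string length; strlen_scasb is a made-up name, and the emulation ignores the case where the counter runs down to zero, which would require a string of about 4 GB.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulates: or ecx, -1 / xor eax, eax / repne scasb / not ecx / dec ecx.
 * Hypothetical helper for illustration only. */
size_t strlen_scasb(const char *s)
{
    uint32_t ecx = 0xFFFFFFFFu;   /* or ecx, 0FFFFFFFFh */
    const char *edi = s;          /* mov edi, esi */

    for (;;) {                    /* repne scasb with al == 0 */
        char c = *edi++;
        --ecx;                    /* one decrement per byte scanned */
        if (c == '\0')
            break;
    }
    ecx = ~ecx;                   /* not ecx: bytes scanned, terminator included */
    --ecx;                        /* dec ecx: drop the terminator */
    return ecx;
}
```

For "abc" the scan touches four bytes, so ecx ends at 0FFFFFFFBh, not ecx gives 4, and dec ecx gives 3.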

This is all for the moment! Please come back for more examples!