Hex-Rays ARM Decompiler v1.0 Comparison Page

Welcome to the Hex-Rays ARM Decompiler v1.0 comparison page! Below you will find side-by-side comparisons of disassembly and decompiler. Please maximize the window too see both columns simultaneously.

The following original exhibits are displayed on this page:

  1. Simple case
  2. 64-bit arithmetics
  3. Conditional instructions
  4. Conditional instructions - 2
  5. Complex instructions
  6. Compiler helper functions
  7. Immediate constants
  8. Position independent code
Simple case
; struct_result *__fastcall sub_210DC(struct_result *result) var_10 = -0x10 var_4 = -4 MOV R12, SP STMFD SP!, {R0} STMFD SP!, {R12,LR} SUB SP, SP, #4 LDR R2, [SP,#0x10+var_4] MOV R3, #0 STR R3, [R2] LDR R3, [SP,#0x10+var_4] ADD R2, R3, #4 MOV R3, #0 STR R3, [R2] LDR R3, [SP,#0x10+var_4] ADD R2, R3, #8 MOV R3, #0 STR R3, [R2] LDR R3, [SP,#0x10+var_4] STR R3, [SP,#0x10+var_10] LDR R0, [SP,#0x10+var_10] ADD SP, SP, #4 LDMFD SP, {SP,LR} BX LR ; End of function sub_210DC struct_result *__fastcall sub_210DC(struct_result *result) { result->dword0 = 0; result->dword4 = 0; result->dword8 = 0; return result; }
Let's start with a very simple function. It accepts a pointer to a structure and zeroes out its first three fields. While the function logic is obvious by just looking at the decompiler output, the assembly listing has too much noise and requires studying it.

The decompiler saves your time and allows you to concentrate on more exciting aspects of reverse engineering.
64-bit arithmetics
; bool __cdecl uh_gt_uc() EXPORT _uh_gt_uc__YA_NXZ _uh_gt_uc__YA_NXZ ; DATA XREF: .pdata:$T7452o var_2C = -0x2C var_28 = -0x28 var_24 = -0x24 var_20 = -0x20 var_1C = -0x1C var_18 = -0x18 var_14 = -0x14 var_10 = -0x10 var_C = -0xC var_8 = -8 var_4 = -4 STR LR, [SP,#var_4]! ; $M7441 ; [email protected]_gt_uc SUB SP, SP, #0x28 $M7449 BL uh STR R1, [SP,#0x2C+var_24] STR R0, [SP,#0x2C+var_28] BL uc STRB R0, [SP,#0x2C+var_20] LDRB R3, [SP,#0x2C+var_20] STR R3, [SP,#0x2C+var_1C] LDR R1, [SP,#0x2C+var_1C] LDR R3, [SP,#0x2C+var_1C] MOV R2, R3,ASR#31 LDR R3, [SP,#0x2C+var_28] STR R3, [SP,#0x2C+var_18] LDR R3, [SP,#0x2C+var_24] STR R3, [SP,#0x2C+var_14] LDR R3, [SP,#0x2C+var_18] STR R3, [SP,#0x2C+var_10] STR R1, [SP,#0x2C+var_C] LDR R3, [SP,#0x2C+var_14] CMP R3, R2 BCC $LN3_8 loc_6AC BHI $LN5_0 loc_6B0 LDR R2, [SP,#0x2C+var_10] LDR R3, [SP,#0x2C+var_C] CMP R2, R3 BLS $LN3_8 $LN5_0 MOV R3, #1 STR R3, [SP,#0x2C+var_8] B $LN4_8 ; --------------------------------------------------------------------------- $LN3_8 ; uh_gt_uc(void)+68j MOV R3, #0 STR R3, [SP,#0x2C+var_8] $LN4_8 LDR R3, [SP,#0x2C+var_8] AND R3, R3, #0xFF STRB R3, [SP,#0x2C+var_2C] LDRB R0, [SP,#0x2C+var_2C] ADD SP, SP, #0x28 LDR PC, [SP+4+var_4],#4 ; End of function uh_gt_uc(void) bool __fastcall uh_gt_uc() { unsigned __int64 v0; // [email protected] v0 = uh(); return v0 > uc(); }
Sorry for a long code snippet, ARM code tends to be longer compared to x86 code. This makes our comparison even more impressive: look at how concise is the decompiler output!
Conditional instructions
; int __cdecl ReadShort(void *, unsigned __int32 offset, int whence) ReadShort whence = -0x18 var_A = -0xA var_8 = -8 STMFD SP!, {R4,LR} SUB SP, SP, #0x10 ; whence MOV R4, #0 ADD R3, SP, #0x18+var_8 STRH R4, [R3,#-2]! STR R2, [SP,#0x18+whence] ; whence MOV R2, R3 ; buffer MOV R3, #2 ; len BL ReadData CMP R0, R4 MOVNE R0, R4 LDREQSH R0, [SP,#0x18+var_A] ADD SP, SP, #0x10 LDMFD SP!, {R4,PC} ; End of function ReadShort int __cdecl ReadShort(void *a1, unsigned __int32 offset, int whence) { int result; // [email protected] __int16 v4; // [sp+Eh] [bp-Ah]@1 v4 = 0; if ( ReadData(a1, offset, &v4, 2u, whence) ) result = 0; else result = v4; return result; }
The ARM processor has conditional instructions that can shorten the code but require high attention from the reader. The case above is very simple, just note that there is a pair of instructions: MOVNE and LDREQSH. Only one of them will be executed at once. This is how simple if-then-else looks in ARM.

The pseudocode shows it much better and does not require any explanations.

A quiz question: did you notice that MOVNE loads zero to R0? (because I didn't:)
Also note that in the disassembly listing we see var_8 but the location really used is var_A, which corresponds to v4.
Conditional instructions - 2
; signed int __fastcall get_next_byte(entry_t *entry) get_next_byte ; DATA XREF: sub_3BC+30o ; LDR R2, [R0,#4] CMP R2, #0 LDRNE R3, [R0] LDRNEB R1, [R3],#1 CMPNE R1, #0 MOVEQ R1, #1 STREQ R1, [R0,#0xC] MOVEQ R0, 0xFFFFFFFF MOVEQ PC, LR SUB R2, R2, #1 STR R2, [R0,#4] STR R3, [R0] MOV R0, R1 RET ; End of function get_next_byte signed int __fastcall get_next_byte(entry_t *entry) { signed int chr; // [email protected] unsigned __int8 *ptr; // [email protected] int count; // [email protected] char done; // [email protected] signed int result; // [email protected] count = entry->count; done = count == 0; if ( count ) { ptr = entry->ptr + 1; chr = *entry->ptr; done = chr == 0; } if ( done ) { entry->done = 1; result = -1; } else { entry->count = count - 1; entry->ptr = ptr; result = chr; } return result; }
Look, the decompiler output is longer! This is a rare case when the pseudocode is longer than the disassembly listing, but it is a for a good cause: to keep it readable. There are so many conditional instructions here, it is very easy to misunderstand the dependencies. For example, did you notice that the first MOVEQ may use the condition codes set by CMP? The subtle detail is that CMPNE may be skipped and the condition codes set by CMP may reach MOVEQs.

The decompiler represented it perfectly well. I renamed some variables and set their types, but this was an easy task.
Complex instructions
; void __fastcall sub_2A38(list_t *ptr, unsigned int a2) sub_2A38 ; CODE XREF: sub_5C8+48p ; sub_648+5Cp ... MOV R2, #0 STMFD SP!, {LR} MOV R3, R2 MOV R12, R2 MOV LR, R2 SUBS R1, R1, #0x20 loc_2A50 ; CODE XREF: sub_2A38+24j STMCSIA R0!, {R2,R3,R12,LR} STMCSIA R0!, {R2,R3,R12,LR} SUBCSS R1, R1, #0x20 BCS loc_2A50 MOVS R1, R1,LSL#28 STMCSIA R0!, {R2,R3,R12,LR} STMMIIA R0!, {R2,R3} LDMFD SP!, {LR} MOVS R1, R1,LSL#2 STRCS R2, [R0],#4 MOVEQ PC, LR STRMIH R2, [R0],#2 TST R1, #0x40000000 STRNEB R2, [R0],#1 RET ; End of function sub_2A38 void __fastcall sub_2A38(list_t *ptr, unsigned int a2) { char copybig; // [email protected] unsigned int size; // [email protected] list_t *v4; // [email protected] int remains; // [email protected] int final; // [email protected] copybig = a2 >= 0x20; size = a2 - 32; do { if ( !copybig ) break; ptr->dword0 = 0; ptr->dword4 = 0; ptr->dword8 = 0; ptr->dwordC = 0; v4 = ptr + 1; v4->dword0 = 0; v4->dword4 = 0; v4->dword8 = 0; v4->dwordC = 0; ptr = v4 + 1; copybig = size >= 0x20; size -= 32; } while ( copybig ); remains = size << 28; if ( copybig ) { ptr->dword0 = 0; ptr->dword4 = 0; ptr->dword8 = 0; ptr->dwordC = 0; ++ptr; } if ( remains < 0 ) { ptr->dword0 = 0; ptr->dword4 = 0; ptr = (list_t *)((char *)ptr + 8); } final = 4 * remains; if ( copybig ) { ptr->dword0 = 0; ptr = (list_t *)((char *)ptr + 4); } if ( final ) { if ( final < 0 ) { LOWORD(ptr->dword0) = 0; ptr = (list_t *)((char *)ptr + 2); } if ( final & 0x40000000 ) LOBYTE(ptr->dword0) = 0; } }
Conditional instructions are just part of the story. ARM is also famous for having a plethora of data movement instructions. They come with a set of possible suffixes that subtly change the meaning of the instruction. Take STMCSIA, for example. It is a STM instruction, but then you have to remember that CS means "carry set" and IA means "increment after".

In short, the disassembly listing is like Chinese. The pseudocode is longer but requires much less time to understand.
Compiler helper functions
EXPORT op_two64 op_two64 ; CODE XREF: refer_all+31Cp ; main+78p anonymous_1 = -0x28 var_20 = -0x20 anonymous_0 = -0x18 var_10 = -0x10 arg_0 = 4 000 MOV R12, SP 000 STMFD SP!, {R4,R11,R12,LR,PC} 014 SUB R11, R12, #4 014 SUB SP, SP, #0x18 02C SUB R4, R11, #-var_10 02C STMDB R4, {R0,R1} 02C MOV R1, 0xFFFFFFF0 02C SUB R12, R11, #-var_10 02C ADD R1, R12, R1 02C STMIA R1, {R2,R3} 02C LDR R3, [R11,#arg_0] 02C CMP R3, #1 02C BNE loc_9C44 02C MOV R3, 0xFFFFFFF0 02C SUB R0, R11, #-var_10 02C ADD R3, R0, R3 02C SUB R4, R11, #-var_10 02C LDMDB R4, {R1,R2} 02C LDMIA R3, {R3,R4} 02C ADDS R3, R3, R1 02C ADC R4, R4, R2 02C SUB R12, R11, #-var_20 02C STMDB R12, {R3,R4} 02C B loc_9D04 ; --------------------------------------------------------------------------- loc_9C44 ; CODE XREF: op_two64+30j 02C LDR R3, [R11,#arg_0] 02C CMP R3, #2 02C BNE loc_9C7C 02C MOV R3, 0xFFFFFFF0 02C SUB R0, R11, #-var_10 02C ADD R3, R0, R3 02C SUB R4, R11, #-var_10 02C LDMDB R4, {R1,R2} 02C LDMIA R3, {R3,R4} 02C SUBS R3, R1, R3 02C SBC R4, R2, R4 02C SUB R12, R11, #-var_20 02C STMDB R12, {R3,R4} 02C B loc_9D04 ; --------------------------------------------------------------------------- loc_9C7C ; CODE XREF: op_two64+68j 02C LDR R3, [R11,#arg_0] 02C CMP R3, #3 02C BNE loc_9CB8 02C MOV R3, 0xFFFFFFF0 02C SUB R0, R11, #-var_10 02C ADD R3, R0, R3 02C SUB R2, R11, #-var_10 02C LDMDB R2, {R0,R1} 02C LDMIA R3, {R2,R3} 02C BL __muldi3 02C MOV R4, R1 02C MOV R3, R0 02C SUB R12, R11, #-var_20 02C STMDB R12, {R3,R4} 02C B loc_9D04 ; --------------------------------------------------------------------------- loc_9CB8 ; CODE XREF: op_two64+A0j 02C LDR R3, [R11,#arg_0] 02C CMP R3, #4 02C BNE loc_9CF4 02C MOV R3, 0xFFFFFFF0 02C SUB R0, R11, #-var_10 02C ADD R3, R0, R3 02C SUB R2, R11, #-var_10 02C LDMDB R2, {R0,R1} 02C LDMIA R3, {R2,R3} 02C BL __divdi3 02C MOV R4, R1 02C MOV R3, R0 02C SUB R12, R11, #-var_20 02C STMDB R12, {R3,R4} 02C B loc_9D04 ; --------------------------------------------------------------------------- loc_9CF4 ; CODE XREF: op_two64+DCj 02C MOV R3, 0xFFFFFFFF 02C MOV R2, 0xFFFFFFFF 02C SUB R4, R11, #-var_20 02C STMDB R4, {R2,R3} loc_9D04 ; CODE XREF: op_two64+5Cj ; op_two64+94j ... 02C SUB R12, R11, #-var_20 02C LDMDB R12, {R0,R1} 02C SUB SP, R11, #0x10 014 LDMFD SP, {R4,R11,SP,PC} ; End of function op_two64 signed __int64 __fastcall op_two64(signed __int64 a1, signed __int64 a2, int a3) { signed __int64 v4; // [sp+0h] [bp-28h]@2 switch ( a3 ) { case 1: v4 = a2 + a1; break; case 2: v4 = a1 - a2; break; case 3: v4 = a1 * a2; break; case 4: v4 = a1 / a2; break; default: v4 = -1LL; break; } return v4; }
Sorry for another long code snippet. Just wanted to show you that the decompiler can handle compiler helper functions (like __divdi3) and handles 64-bit arithmetic quite well.
Immediate constants
loc_110D6 ; CODE XREF: sub_10E38+43Cj ; sub_10E38+442j ... LDR R1, =(tmin_ptr - 0x1CDB8) LDR R2, =(tmax_ptr - 0x1CDB8) LDR R0, =(aRttMinAvgMaxMd - 0x1CDB8) LDR R6, [R7,R1] LDR R5, [R7,R2] MOVS R3, #0xFA LDR R4, [R6] LSLS R1, R3, #2 LDR R6, [R5] ADDS R5, R7, R0 ; "rtt min/avg/max/mdev = %ld.%03ld/%lu.%0"... MOVS R0, R4 BLX __aeabi_idiv MOV R8, R0 MOVS R0, R4 MOVS R4, #0xFA LSLS R1, R4, #2 BLX __aeabi_idivmod LDR R3, =0 LDR R2, =0x3E8 MOVS R4, R1 LDR R0, [SP,#0x78+var_40] LDR R1, [SP,#0x78+var_40+4] BLX __aeabi_ldivmod LDR R3, =0 LDR R2, =0x3E8 STR R0, [SP,#0x78+var_50] STR R1, [SP,#0x78+var_4C] LDR R0, [SP,#0x78+var_40] LDR R1, [SP,#0x78+var_40+4] BLX __aeabi_ldivmod MOVS R1, #0xFA MOVS R0, R6 LSLS R1, R1, #2 STR R2, [SP,#0x78+var_78] BLX __aeabi_idiv STR R0, [SP,#0x78+var_74] MOVS R0, R6 MOVS R6, #0xFA LSLS R1, R6, #2 BLX __aeabi_idivmod MOVS R2, #0xFA STR R1, [SP,#0x78+var_70] LDR R0, [SP,#0x78+var_38] LSLS R1, R2, #2 BLX __aeabi_idiv MOVS R3, #0xFA STR R0, [SP,#0x78+var_6C] LSLS R1, R3, #2 LDR R0, [SP,#0x78+var_38] BLX __aeabi_idivmod MOVS R0, R5 ; format STR R1, [SP,#0x78+var_68] MOVS R2, R4 MOV R1, R8 LDR R3, [SP,#0x78+var_50] BLX printf printf( "rtt min/avg/max/mdev = %ld.%03ld/%lu.%03ld/%ld.%03ld/%ld.%03ld ms", tmin / 1000, tmin % 1000, v27 / 1000, v27 % 1000, tmax / 1000, tmax % 1000, v28 / 1000, v28 % 1000);
Since ARM instructions can not have big immediate constants, sometimes they are loaded with two isntructions. There are many 0xFA (250 decimal) constants in the disassembly listing, but all of them are shifted to the left by 2 before use. The decompiler saves you from these petty details.

Also a side: the decompiler can handle ARM mode as well as Thumb mode instructions. It just does not care about the instruction encoding because it is already handled by IDA.
Position independent code
sub_65768 ; DATA XREF: .data:007E37A4o var_18 = -0x18 var_14 = -0x14 var_10 = -0x10 arg_0 = 0 PUSH {LR} LDR.W R12, =aResponsetype ; "responseType" SUB SP, SP, #0x14 ADR.W LR, loc_65774 loc_65774 ; DATA XREF: sub_65768+8o ADD R12, LR LDR.W LR, [SP,#0x18+arg_0] STR.W LR, [SP,#0x18+var_18] MOV.W LR, #0x10 STR.W LR, [SP,#0x18+var_14] LDR.W LR, =0xFFF0883C ADD R12, LR STR.W R12, [SP,#0x18+var_10] BL sub_65378 ADD SP, SP, #0x14 POP {PC} ; End of function sub_65768 int __fastcall sub_65768(int a1, int a2, int a3, int a4, int a5) { return sub_65378(a1, a2, a3, a4, a5, 16, (int)myarray); }
In some case the disassembly listing can be misleading, especially with PIC (position independent code). While the address of a constant string is loaded into R12, the code does not care about it. It is just how variable addresses are calculated in PIC-code (it is .got-someoffset). Such calculations are very frequent in shared objects and unfortunately IDA can not handle all of them.

But the decompiler did a great job of tracing R12.

NOTE: these are just some selected examples that can be illustrated as side-by-side differences. There are lots of features that are not mentioned on this page - simply because there was nothing to compare them with. The ARM decompiler can handle variadic functions (and we added a special command to specify the number of call arguments if it is miscalculated), indirect calls (the user can control the call arguments), structures passed by value, user defined calling conventions, etc. As this is the first version, floating point calculations and fancy SIMD instructions are not supported but we will eventually added them.

Hex-Rays ARM Decompiler v1.0 is capable of handling real world compiler generated code.

This is all for the moment. Please come back for more examples!