Comparisons of ARM disassembly and decompilation
Here are some side-by-side comparisons of disassembly and decompiler output for ARM. Please maximize the window to see both columns simultaneously.
The following examples are displayed on this page:
Let's start with a very simple function. It accepts a pointer to a structure and zeroes out its first three fields. While the function logic is obvious from the decompiler output alone, the assembly listing has too much noise and takes some study. The decompiler saves you time and lets you concentrate on more exciting aspects of reverse engineering.
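For readers without the listing at hand, here is a rough sketch of what such a function might look like in C. The structure layout, the field names and the function name are all assumptions made for illustration; they are not taken from the original binary.

```c
#include <stddef.h>

/* Hypothetical structure: field names and types are assumptions. */
struct item {
    int   first;
    int   second;
    void *third;
    int   untouched;   /* not modified by the function below */
};

/* Zero out the first three fields and leave the rest alone. */
void clear_head(struct item *p)
{
    p->first  = 0;
    p->second = 0;
    p->third  = NULL;
}
```

Three plain assignments like these are what the decompiler output boils down to, while the assembly spreads the same work over register loads and stores.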
Sorry for the long code snippet; ARM code tends to be longer than x86 code. This makes our comparison even more impressive: look at how concise the decompiler output is!
The ARM processor has conditional instructions that can shorten the code but
demand close attention from the reader. The case above is very simple; just note that
there is a pair of instructions: MOVNE and LDREQSH. Only one of them will
be executed at once. This is how a simple if-then-else looks in ARM.
A quiz question: did you notice that MOVNE loads zero to R0? (Because I didn't :)
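To make the pairing concrete, here is a minimal C sketch of the pattern that a MOVNE / LDREQSH pair can implement. The function name, its parameters and the exact condition are assumptions; only the shape (zero on the NE path, a signed 16-bit load on the EQ path) follows the text.

```c
#include <stdint.h>

/* Hypothetical function: the tested condition is an assumption. */
int32_t pick(int flag, const int16_t *p)
{
    if (flag != 0)
        return 0;    /* NE path: like MOVNE R0, #0 */
    else
        return *p;   /* EQ path: like LDREQSH, a sign-extended halfword load */
}
```

Both instructions sit one after another in the listing, but the NE and EQ conditions are mutually exclusive, so exactly one of the two branches takes effect on any given run.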
Also note that in the disassembly listing we see var_8, but the location really used is var_A, which corresponds to
Conditional instructions - 2
Look, the decompiler output is longer! This is a rare case when the pseudocode
is longer than the disassembly listing, but it is for a good cause: to keep
it readable. There are so many conditional instructions here, it is very easy
to misunderstand the dependencies. For example, did you notice that the first
may use the condition codes set by
CMP? The subtle detail is that
CMP may reach
Conditional instructions are just part of the story. ARM is also famous for having a plethora
of data movement instructions. They come with a set of possible suffixes that subtly change
the meaning of the instruction. Take STMCSIA, for example. It is an STM (store multiple)
instruction, but then you have to remember that CS means "carry set" and
IA means "increment after".
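The suffixes are easier to keep apart once they are spelled out as ordinary code. The following is a simplified C model of a conditional "store multiple, increment after"; the function name and parameters are hypothetical, and it assumes the form with base register writeback.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model, not an emulator:
   carry : state of the C flag
   base  : address held in the base register
   regs  : values of the registers in the register list
   n     : number of registers in the list
   Returns the updated base address (writeback form assumed). */
uint32_t *stm_cs_ia(int carry, uint32_t *base, const uint32_t *regs, size_t n)
{
    if (!carry)              /* CS: do nothing when carry is clear */
        return base;
    for (size_t i = 0; i < n; i++)
        *base++ = regs[i];   /* IA: store first, then advance the address */
    return base;
}
```

One mnemonic hides a condition check, a loop over registers and a pointer update, which is why such instructions take a while to read in a listing.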
In short, the disassembly listing reads like a foreign language. The pseudocode is longer but requires much less time to understand.
Compiler helper functions
Sorry for another long code snippet. Just wanted to show you that the decompiler can
handle compiler helper functions (like __divdi3) and 64-bit arithmetic.
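As a reminder of where such helpers come from, here is a small C sketch. A 32-bit ARM core has no instruction that divides 64-bit values, so a compiler such as GCC typically emits a call to a runtime routine instead; __divdi3 is the libgcc helper for signed 64-bit division. The function name below is an assumption, and the exact helper chosen depends on the toolchain and ABI.

```c
#include <stdint.h>

/* On a 32-bit target this single division is typically compiled into
   a call to a helper such as __divdi3 rather than inline code. */
int64_t scale(int64_t total, int64_t parts)
{
    return total / parts;
}
```

In the disassembly the division shows up as argument setup in register pairs followed by a call; good pseudocode turns it back into a plain `/`.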
Since ARM instructions cannot have big immediate constants, sometimes they
are loaded with two instructions. There are many 0xFA (250 decimal) constants
in the disassembly listing, but all of them are shifted to the left by 2 before
use. The decompiler saves you from these petty details.
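The arithmetic behind those constants is simple enough to write down. This sketch mimics the two-step load in C; the function name is hypothetical and the register is modelled as a local variable.

```c
#include <stdint.h>

/* Two-step constant load: a small immediate, then a shift. */
uint32_t load_const(void)
{
    uint32_t r = 0xFA;   /* first instruction: load 250 */
    r <<= 2;             /* second instruction: shift left by 2 */
    return r;            /* 250 * 4 = 1000 */
}
```

So every 0xFA in the listing really stands for 1000 (0x3E8).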
Also, an aside: the decompiler can handle ARM mode as well as Thumb mode instructions. It does not care about the instruction encoding because that is already handled by IDA.
Position independent code
In some cases the disassembly listing can be misleading, especially with PIC (position independent code).
While the address of a constant string is loaded into R12, the code does not
care about it. It is just how variable addresses are calculated in PIC code (it is .got-someoffset).
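The ".got-someoffset" idea can be modelled in a few lines of C. This is only a sketch under assumptions: the function and parameter names are hypothetical, and a plain byte buffer stands in for the loaded module.

```c
#include <stddef.h>

/* Hypothetical model of PIC addressing:
   image_base : where the module happens to be loaded
   got_off    : position of the GOT inside the module
   delta      : fixed distance from the GOT back to the variable */
const char *pic_address(const char *image_base, size_t got_off, size_t delta)
{
    const char *got = image_base + got_off;   /* known only at run time */
    return got - delta;                       /* .got - someoffset */
}
```

Because only the relative distance is fixed at link time, the intermediate value held in a register need not point at anything meaningful by itself, even when a disassembler labels it as a string.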
Such calculations are very frequent in shared objects and unfortunately IDA cannot
handle all of them. But the decompiler did a great job of tracing