Decompilation gets real
On the other hand, if you are new to binary analysis, assembly language looks very unusual at first. You need to learn not only the processor instructions but also standard code sequences, calling conventions, and lots of other things. It is understandable that you might nostalgically remember the high-level languages you know.
Here is a tool that can help you in both cases: a decompiler that can handle real-world code. Take a binary file, analyze it, and get nicely formatted C text as output. I’m tempted to say that you could even recompile it, but no – recompilation is not the goal; program analysis is.
Let’s see how it works. Take a file, say, a virus – these days many of them are written in a high-level language. One could say that virus writers have a more comfortable environment than virus analysts. Hopefully the decompiler will change the situation a bit.
Go to WinMain and check the code.
Here we call several functions; apparently the worm works with the Internet (see WSAStartup). We can switch to the graph view to display the logic more explicitly.
As we can see, the first block checks a condition: if it is not satisfied, we go to the end of the function; otherwise there are some further checks and some actions. We have to zoom in and examine each block to understand how the worm works.
Or we could use the decompiler to get this nice text:
Watch the demo to see the decompiler at work (5.5MB flash with audio):
The decompiler as it is today can handle compiler-generated code. While it produces nice results, it still lacks many features, such as floating-point support, exception handling, and proper type derivation (and I’m sure there are hundreds of bugs in there), but eventually these things will be implemented and fixed.
The beta testing will be open soon. If you want to participate, please apply at (professional email addresses only, please; keep in mind that the decompiler works only with IDA v5.1).
For more news, subscribe to the mailing list. It is a read-only list.
I’d like to say a couple of words about the decompiler internals. As with anything else in reverse engineering, the decompiler uses many probabilistic methods and heuristics. This means that its output cannot be made 100% reliable. This is just a caveat for anyone who wants a decompiler to recover lost source code. If you want fully automatic recovery, try something else; this decompiler won’t work for you.
It heavily uses data-flow analysis methods to analyze the program. In fact, the decompiler consists of two parts: the first part is an engine that works with a microcode. This engine can reason about the microcode and optimize it, with the goal of making it as concise as possible. The second part converts the microcode into a human-readable form: C text.
The second part is quite simple: it just displays nicely formatted text on the screen. The first part, the optimization engine (I haven’t come up with a nice name for it yet, nor for the decompiler), is much more interesting. It can be developed into something bigger – something capable of answering questions about variable ranges, code and data coverage (it will need inter-function analysis for that), and other things. It could check whether certain invariants hold at given program locations. A program verification tool could be built on top of such an engine quite easily. Imagine an automatically generated report about not only trivial buffer overflows but also other logic flaws in the program. This engine can evolve into such a platform. This is how I see its bright and promising future 🙂