Return to the sources?
A decompiler is commonly viewed as a tool to recover the source code of a program, the same way as a disassembler is a tool to convert a binary executable program to an assembler text. This is true in some cases but only in some. Most of the time the goal is not to get a compilable text. What would one do with such a text? Recompile and link it to get a copy of the input binary program? The only legitimate situation I can imagine is when the source text has been lost and the owner of the program tries to recover it. Even in this case I personally would rewrite everything from scratch because the disassembler/decompiler output is a nightmare to maintain: it lacks comments, inline functions, preprocessor symbols, abstractions, class inheritance, etc. In short, I wouldn’t consider obtaining a compilable text as a major goal of a binary reverse engineering tool (RET). One goal of a RET is to represent the program logic in the most precise and clear way. This representation should be easily understandable by a human being. This means that the fundamental blocks comprising the program logic should be easily visualized on request. Questions like “what is the function called from the current instruction”, “where is the current variable initialized or used”, “how many times is the current instruction executed” arise continuously during the analysis of almost any program. In simple cases a disassembler can answer these questions, a decompiler is more intelligent and can perform deeper analysis (e.g. pointer analysis and value propagation could help to find the answer even in the presence of indirect addressing modes and loops). In both cases the RET helps the analyst to grasp the meaning of the program and convert a sequence of instructions into very high level descriptions like “send a mail message to all contacts” or “get list of .doc files on the computer”. The clearest program representation is of a paramount importance to understand its purpose. Another important goal of a RET is to automate program analysis and instrumentation. In this case RET can be used as a basic building unit for another program. We could analyze the input program in order to:
- find logically equivalent blocks (copyright breaches; recognition of library functions)
- find logical flaws (automatic vulnerability search; code quality assessment)
- deobfuscate it (malware analysis)
- add profiling instrumentation (optimization; code coverage)
- prepare various inputs (testing)