hexrays.hpp
1/*!
2 * Hex-Rays Decompiler project
3 * Copyright (c) 1990-2022 Hex-Rays
4 * ALL RIGHTS RESERVED.
5 * \mainpage
6 * There are 2 representations of the binary code in the decompiler:
7 * - microcode: processor instructions are translated into it and then
8 * the decompiler optimizes and transforms it
9 * - ctree: ctree is built from the optimized microcode and represents
10 * AST-like tree with C statements and expressions. It can
11 * be printed as C code.
12 *
13 * Microcode is represented by the following classes:
14 * - mba_t keeps general info about the decompiled code and
15 * array of basic blocks. usually mba_t is named 'mba'
16 * - mblock_t a basic block. includes list of instructions
17 * - minsn_t an instruction. contains 3 operands: left, right, and
18 * destination
19 * - mop_t an operand. depending on its type may hold various info
20 * like a number, register, stack variable, etc.
21 * - mlist_t list of memory or register locations; can hold vast areas
22 * of memory and multiple registers. this class is used
23 * very extensively in the decompiler. it may represent
24 * list of locations accessed by an instruction or even
25 * an entire basic block. it is also used as argument of
26 * many functions. for example, there is a function
27 * that searches for an instruction that refers to a mlist_t.
28
29 * See https://www.hex-rays.com/blog/microcode-in-pictures for some pictures.
30 *
31 * Ctree is represented by:
32 * - cfunc_t keeps general info about the decompiled code, including a
33 * pointer to mba_t. deleting cfunc_t will delete
34 * mba_t too (however, decompiler returns cfuncptr_t,
35 * which is a reference counting object and deletes the
36 * underlying function as soon as all references to it go
37 * out of scope). cfunc_t has 'body', which represents the
38 * decompiled function body as cinsn_t.
39 * - cinsn_t a C statement. can be a compound statement or any other
40 * legal C statements (like if, for, while, return,
41 * expression-statement, etc). depending on the statement
42 * type has pointers to additional info. for example, the
43 * 'if' statement has a pointer to cif_t, which holds the
44 * 'if' condition, 'then' branch, and optionally 'else'
45 * branch. Please note that despite the name cinsn_t
46 * we say "statements", not "instructions". For us
47 * instructions are part of microcode, not ctree.
48 * - cexpr_t a C expression. is used as part of a C statement, when
49 * necessary. cexpr_t has 'type' field, which keeps the
50 * expression type.
51 * - citem_t a base class for cinsn_t and cexpr_t, holds common info
52 * like the address, label, and opcode.
53 * - cnumber_t a constant 64-bit number. in addition to its value also
54 * holds information on how to represent it: decimal, hex, or
55 * as a symbolic constant (enum member). please note that
56 * numbers are represented by another class (mnumber_t)
57 * in microcode.
58
59 * See https://www.hex-rays.com/blog/hex-rays-decompiler-primer
60 * for some pictures and more details.
61 *
62 * Both microcode and ctree use the following class:
63 * - lvar_t a local variable. may represent a stack or register
64 * variable. a variable has a name, type, location, etc.
65 * the list of variables is stored in mba->vars.
66 * - lvar_locator_t holds a variable location (vdloc_t) and its definition
67 * address.
68 * - vdloc_t describes a variable location, like a register number,
69 * a stack offset, or, in complex cases, can be a mix of
70 * register and stack locations. very similar to argloc_t,
71 * which is used in ida. the differences between argloc_t
72 * and vdloc_t are:
73 * - vdloc_t never uses ARGLOC_REG2
74 * - vdloc_t uses micro register numbers instead of
75 * processor register numbers
76 * - the stack offsets are never negative in vdloc_t, while
77 * in argloc_t there can be negative offsets
78 *
79 * The above are the most important classes in this header file. There are
80 * many auxiliary classes, please see their definitions in the header file.
81 *
82 * See also the description of \ref vmpage.
83 *
84 */
85
86#ifndef __HEXRAYS_HPP
87#define __HEXRAYS_HPP
88
89#include <pro.h>
90#include <fpro.h>
91#include <ida.hpp>
92#include <idp.hpp>
93#include <gdl.hpp>
94#include <ieee.h>
95#include <loader.hpp>
96#include <kernwin.hpp>
97#include <typeinf.hpp>
98#include <deque>
99#include <queue>
100
101/*!
102 * \page vmpage Virtual Machine used by Microcode
103 * We can imagine a virtual micro machine that executes microcode.
104 * This virtual micro machine has many registers.
105 * Each register is 8 bits wide. During translation of processor
106 * instructions into microcode, multibyte processor registers are mapped
107 * to adjacent microregisters. Processor condition codes are also
108 * represented by microregisters. The microregisters are grouped
109 * into the following groups:
110 * - 0..7: condition codes
111 * - 8..n: all processor registers (including fpu registers, if necessary)
112 * this range may also include temporary registers used during
113 * the initial microcode generation
114 * - n.. : so called kernel registers; they are used during optimization
115 * see is_kreg()
116 *
117 * Each micro-instruction (minsn_t) has zero to three operands.
118 * Some of the possible operands types are:
119 * - immediate value
120 * - register
121 * - memory reference
122 * - result of another micro-instruction
123 *
124 * The operands (mop_t) are l (left), r (right), d (destination).
125 * An example of a microinstruction:
126 *
127 * add r0.4, #8.4, r2.4
128 *
129 * which means 'add constant 8 to r0 and place the result into r2'.
130 * where
131 * - the left operand is 'r0', its size is 4 bytes (r0.4)
132 * - the right operand is a constant '8', its size is 4 bytes (#8.4)
133 * - the destination operand is 'r2', its size is 4 bytes (r2.4)
134 * Note that 'd' is almost always the destination but there are exceptions.
135 * See mcode_modifies_d(). For example, stx does not modify 'd'.
136 * See the opcode map below for the list of microinstructions and their
137 * operands. Most instructions are very simple and do not need
138 * detailed explanations. There are no side effects in microinstructions.
139 *
140 * Each operand has a size specifier. The following sizes can be used in
141 * practically all contexts: 1, 2, 4, 8, 16 bytes. Floating types may have
142 * other sizes. Functions may return objects of arbitrary size, and so may
143 * operations on UDTs (user-defined types, i.e. structs and unions).
144 *
145 * Memory is considered to consist of several segments.
146 * A memory reference is made using a (selector, offset) pair.
147 * A selector is always 2 bytes long. An offset can be 4 or 8 bytes long,
148 * depending on the bitness of the target processor.
149 * Currently the selectors are not used very much. The decompiler tries to
150 * resolve (selector, offset) pairs into direct memory references at each
151 * opportunity and then operates on mop_v operands. In other words,
152 * while the decompiler can handle segmented memory models, internally
153 * it still uses simple linear addresses.
154 *
155 * The following memory regions are recognized:
156 * - GLBLOW global memory: low part, everything below the stack
157 * - LVARS stack: local variables
158 * - RETADDR stack: return address
159 * - SHADOW stack: shadow arguments
160 * - ARGS stack: regular stack arguments
161 * - GLBHIGH global memory: high part, everything above the stack
162 * Any stack region may be empty. Objects residing in one memory region
163 * are considered to be completely distinct from objects in other regions.
164 * We allocate the stack frame in some memory region, which is not
165 * allocated for any purposes in IDA. This permits us to use linear addresses
166 * for all memory references, including the stack frame.
167 *
168 * If the operand size is bigger than 1 then the register
169 * operand references a block of registers. For example:
170 *
171 * ldc #1.4, r8.4
172 *
173 * loads the constant 1 to registers 8, 9, 10, 11:
174 *
175 * #1 -> r8
176 * #0 -> r9
177 * #0 -> r10
178 * #0 -> r11
179 *
180 * This example uses little-endian byte ordering.
181 * Big-endian byte ordering is supported too. Registers are always little-
182 * endian, regardless of the memory endianness.
183 *
184 * Each instruction has 'next' and 'prev' fields that are used to form
185 * a doubly linked list. Such lists are present for each basic block (mblock_t).
186 * Basic blocks have other attributes, including:
187 * - dead_at_start: list of dead locations at the block start
188 * - maybuse: list of locations the block may use
189 * - maybdef: list of locations the block may define (or spoil)
190 * - mustbuse: list of locations the block will certainly use
191 * - mustbdef: list of locations the block will certainly define
192 * - dnu: list of locations the block will certainly define
193 * but will not use (registers or non-aliasable stack vars)
194 *
195 * These lists are represented by the mlist_t class. It consists of 2 parts:
196 * - rlist_t: list of microregisters (possibly including virtual stack locations)
197 * - ivlset_t: list of memory locations represented as intervals
198 * we use linear addresses in this list.
199 * The mlist_t class is used quite often. For example, to find what an operand
200 * can spoil, we build its 'maybe-use' list. Then we can find out if this list
201 * is accessed using the is_accessed() or is_accessed_globally() functions.
202 *
203 * All basic blocks of the decompiled function constitute an array called
204 * mba_t (array of microblocks). This is a huge class that has too
205 * many fields to describe here (some of the fields are not visible in the sdk).
206 * The most important ones are:
207 * - stack frame: frregs, stacksize, etc
208 * - memory: aliased, restricted, and other ranges
209 * - type: type of the current function, its arguments (argidx) and
210 * local variables (vars)
211 * - natural: array of pointers to basic blocks. the basic blocks
212 * are also accessible as a doubly linked list starting from 'blocks'.
213 * - bg: control flow graph. the graph gives access to the use-def
214 * chains that describe data dependencies between basic blocks
215 *
216 * Facilities for debugging decompiler plugins:
217 * Many decompiler objects have a member function named dstr().
218 * These functions create a text representation of the object and return
219 * a pointer to it. They are very convenient to use in a debugger instead of
220 * inspecting class fields manually. The mba_t object does not have the
221 * dstr() function because its text representation is very long. Instead, we
222 * provide the mba_t::dump_mba() and mba_t::dump() functions.
223 *
224 * To ensure that your plugin manipulates the microcode in a correct way,
225 * please call mba_t::verify() before returning control to the decompiler.
226 *
227 */
228
229#ifdef __NT__
230#pragma warning(push)
231#pragma warning(disable:4062) // enumerator 'x' in switch of enum 'y' is not handled
232#pragma warning(disable:4265) // virtual functions without virtual destructor
233#endif
234
235#define hexapi ///< Public functions are marked with this keyword
236
237// Warning suppressions for PVS Studio:
238//-V:2:654 The condition '2' of loop is always true.
239//-V::719 The switch statement does not cover all values
240//-V:verify:678
241//-V:chain_keeper_t:690 copy ctr will be generated
242//-V:add_block:656 call to the same function
243//-V:add:792 The 'add' function located to the right of the operator '|' will be called regardless of the value of the left operand
244//-V:sub:792 The 'sub' function located to the right of the operator '|' will be called regardless of the value of the left operand
245//-V:intersect:792 The 'intersect' function located to the right of the operator '|' will be called regardless of the value of the left operand
246// Lint suppressions:
247//lint -sem(mop_t::_make_cases, custodial(1))
248//lint -sem(mop_t::_make_pair, custodial(1))
249//lint -sem(mop_t::_make_callinfo, custodial(1))
250//lint -sem(mop_t::_make_insn, custodial(1))
251//lint -sem(mop_t::make_insn, custodial(1))
252
253// Microcode level forward definitions:
254class mop_t; // microinstruction operand
255class mop_pair_t; // pair of operands. example, :(edx.4,eax.4).8
256class mop_addr_t; // address of an operand. example: &global_var
257class mcallinfo_t; // function call info. example: <cdecl:"int x" #10.4>.8
258class mcases_t; // jump table cases. example: {0 => 12, 1 => 13}
259class minsn_t; // microinstruction
260class mblock_t; // basic block
261class mba_t; // array of blocks, represents microcode for a function
262class codegen_t; // helper class to generate the initial microcode
263class mbl_graph_t; // control graph of microcode
264struct vdui_t; // widget representing the pseudocode window
265struct hexrays_failure_t; // decompilation failure object, is thrown by exceptions
266struct mba_stats_t; // statistics about decompilation of a function
267struct mlist_t; // list of memory and register locations
268struct voff_t; // value offset (microregister number or stack offset)
269typedef std::set<voff_t> voff_set_t;
270struct vivl_t; // value interval (register or stack range)
271typedef int mreg_t; ///< Micro register
272
273// Ctree level forward definitions:
274struct cfunc_t; // result of decompilation, the highest level object
275struct citem_t; // base class for cexpr_t and cinsn_t
276struct cexpr_t; // C expression
277struct cinsn_t; // C statement
278struct cblock_t; // C statement block (sequence of statements)
279struct cswitch_t; // C switch statement
280struct carg_t; // call argument
281struct carglist_t; // vector of call arguments
282
283typedef std::set<ea_t> easet_t;
284typedef std::set<minsn_t *> minsn_ptr_set_t;
285typedef std::set<qstring> strings_t;
286typedef qvector<minsn_t*> minsnptrs_t;
287typedef qvector<mop_t*> mopptrs_t;
288typedef qvector<mop_t> mopvec_t;
289typedef qvector<uint64> uint64vec_t;
290typedef qvector<mreg_t> mregvec_t;
291typedef qrefcnt_t<cfunc_t> cfuncptr_t;
292
293// Function frames must be smaller than this value, otherwise
294// the decompiler will bail out with MERR_HUGESTACK
295#define MAX_SUPPORTED_STACK_SIZE 0x100000 // 1MB
296
297//-------------------------------------------------------------------------
298// Original version of macro DEFINE_MEMORY_ALLOCATION_FUNCS
299// (uses decompiler-specific memory allocation functions)
300#define HEXRAYS_PLACEMENT_DELETE void operator delete(void *, void *) {}
301#define HEXRAYS_MEMORY_ALLOCATION_FUNCS() \
302 void *operator new (size_t _s) { return hexrays_alloc(_s); } \
303 void *operator new[](size_t _s) { return hexrays_alloc(_s); } \
304 void *operator new(size_t /*size*/, void *_v) { return _v; } \
305 void operator delete (void *_blk) { hexrays_free(_blk); } \
306 void operator delete[](void *_blk) { hexrays_free(_blk); } \
307 HEXRAYS_PLACEMENT_DELETE
308
309void *hexapi hexrays_alloc(size_t size);
310void hexapi hexrays_free(void *ptr);
311
312typedef uint64 uvlr_t;
313typedef int64 svlr_t;
314enum { MAX_VLR_SIZE = sizeof(uvlr_t) };
315const uvlr_t MAX_VALUE = uvlr_t(-1);
316const svlr_t MAX_SVALUE = svlr_t(uvlr_t(-1) >> 1);
317const svlr_t MIN_SVALUE = ~MAX_SVALUE;
318
319enum cmpop_t
320{ // the order of comparisons is the same as in microcode opcodes
321 CMP_NZ,
322 CMP_Z,
323 CMP_AE,
324 CMP_B,
325 CMP_A,
326 CMP_BE,
327 CMP_GT,
328 CMP_GE,
329 CMP_LT,
330 CMP_LE,
331};
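// Illustration (not part of the SDK): because cmpop_t follows the order of
// the comparison opcodes, a comparison can be turned into an opcode by adding
// it to the first opcode of the family. The sketch below uses local copies of
// the values listed in this header; the namespace and the helper names
// cmpop_to_set/cmpop_to_jump are hypothetical, and the offset mapping itself
// is an assumption derived from the ordering remark, not a documented API.

```cpp
#include <cassert>

namespace sketch {

// Local copy of cmpop_t, same order as in the header.
enum cmpop_t
{
  CMP_NZ, CMP_Z, CMP_AE, CMP_B, CMP_A, CMP_BE, CMP_GT, CMP_GE, CMP_LT, CMP_LE,
};

// Local copies of the first and last opcodes of the setX and jX families.
constexpr int M_SETNZ = 0x20;
constexpr int M_SETLE = 0x29;
constexpr int M_JNZ   = 0x2B;
constexpr int M_JLE   = 0x34;

// Map a comparison to the matching setX / jX opcode value by offset.
constexpr int cmpop_to_set(cmpop_t cmp)  { return M_SETNZ + int(cmp); }
constexpr int cmpop_to_jump(cmpop_t cmp) { return M_JNZ + int(cmp); }

// The last comparison must land on the last opcode of each family.
static_assert(cmpop_to_set(CMP_LE) == M_SETLE, "setX order mismatch");
static_assert(cmpop_to_jump(CMP_LE) == M_JLE, "jX order mismatch");

} // namespace sketch
```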
332
333//-------------------------------------------------------------------------
334// value-range class to keep possible operand value(s).
335class valrng_t
336{
337protected:
338 int flags;
339#define VLR_TYPE 0x0F // valrng_t type
340#define VLR_NONE 0x00 // no values
341#define VLR_ALL 0x01 // all values
342#define VLR_IVLS 0x02 // union of disjoint intervals
343#define VLR_RANGE 0x03 // strided range
344#define VLR_SRANGE 0x04 // strided range with signed bound
345#define VLR_BITS 0x05 // known bits
346#define VLR_SECT 0x06 // intersection of sub-ranges
347 // each sub-range should be simple or union
348#define VLR_UNION 0x07 // union of sub-ranges
349 // each sub-range should be simple or
350 // intersection
351#define VLR_UNK 0x08 // unknown value (like 'null' in SQL)
352 int size; // operand size: 1..8 bytes
353 // all values must fall within the size
354 union
355 {
356 struct // VLR_RANGE/VLR_SRANGE
357 { // values that are between VALUE and LIMIT
358 // and conform to: value+stride*N
359 uvlr_t value; // initial value
360 uvlr_t limit; // final value
361 // we adjust LIMIT to be on the STRIDE lattice
362 svlr_t stride; // stride between values
363 };
364 struct // VLR_BITS
365 {
366 uvlr_t zeroes; // bits known to be clear
367 uvlr_t ones; // bits known to be set
368 };
369 char reserved[sizeof(qvector<int>)];
370 // VLR_IVLS/VLR_SECT/VLR_UNION
371 };
372 void hexapi clear(void);
373 void hexapi copy(const valrng_t &r);
374 valrng_t &hexapi assign(const valrng_t &r);
375
376public:
377 explicit valrng_t(int size_ = MAX_VLR_SIZE)
378 : flags(VLR_NONE), size(size_), value(0), limit(0), stride(0) {}
379 valrng_t(const valrng_t &r) { copy(r); }
380 ~valrng_t(void) { clear(); }
381 valrng_t &operator=(const valrng_t &r) { return assign(r); }
382 void swap(valrng_t &r) { qswap(*this, r); }
383 DECLARE_COMPARISONS(valrng_t);
384 DEFINE_MEMORY_ALLOCATION_FUNCS()
385
386 void set_none(void) { clear(); }
387 void set_all(void) { clear(); flags = VLR_ALL; }
388 void set_unk(void) { clear(); flags = VLR_UNK; }
389 void hexapi set_eq(uvlr_t v);
390 void hexapi set_cmp(cmpop_t cmp, uvlr_t _value);
391
392 // reduce size
393 // it takes the low part of size NEW_SIZE
394 // it returns "true" if size is changed successfully.
395 // e.g.: valrng_t vr(2); vr.set_eq(0x1234);
396 // vr.reduce_size(1);
397 // uvlr_t v; vr.cvt_to_single_value(&v);
398 // assert(v == 0x34);
399 bool hexapi reduce_size(int new_size);
400
401 // Perform intersection or union or inversion.
402 // \return did we change something in THIS?
403 bool hexapi intersect_with(const valrng_t &r);
404 bool hexapi unite_with(const valrng_t &r);
405 void hexapi inverse(); // works for VLR_IVLS only
406
407 bool empty(void) const { return flags == VLR_NONE; }
408 bool all_values(void) const { return flags == VLR_ALL; }
409 bool is_unknown(void) const { return flags == VLR_UNK; }
410 bool hexapi has(uvlr_t v) const;
411
412 void hexapi print(qstring *vout) const;
413 const char *hexapi dstr(void) const;
414
415 bool hexapi cvt_to_single_value(uvlr_t *v) const;
416 bool hexapi cvt_to_cmp(cmpop_t *cmp, uvlr_t *val, bool strict) const;
417
418 int get_size() const { return size; }
419 static uvlr_t max_value(int size_)
420 {
421 return size_ == MAX_VLR_SIZE
422 ? MAX_VALUE
423 : (uvlr_t(1) << (size_ * 8)) - 1;
424 }
425 static uvlr_t min_svalue(int size_)
426 {
427 return size_ == MAX_VLR_SIZE
428 ? MIN_SVALUE
429 : (uvlr_t(1) << (size_ * 8 - 1));
430 }
431 static uvlr_t max_svalue(int size_)
432 {
433 return size_ == MAX_VLR_SIZE
434 ? MAX_SVALUE
435 : (uvlr_t(1) << (size_ * 8 - 1)) - 1;
436 }
437 uvlr_t max_value() const { return max_value(size); }
438 uvlr_t min_svalue() const { return min_svalue(size); }
439 uvlr_t max_svalue() const { return max_svalue(size); }
440};
441DECLARE_TYPE_AS_MOVABLE(valrng_t);
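// Illustration (not part of the SDK): the static helpers max_value(),
// min_svalue() and max_svalue() compute the bounds of an operand of SIZE
// bytes. The standalone sketch below repeats their formulas with local
// typedefs so that it compiles without the SDK; the namespace is hypothetical.
// The full-width case is handled separately because shifting a 64-bit value
// by 64 bits is undefined in C++.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

typedef uint64_t uvlr_t;
constexpr int MAX_VLR_SIZE = sizeof(uvlr_t);

// Largest unsigned value that fits into SIZE bytes (SIZE is 1..8).
constexpr uvlr_t max_value(int size)
{
  return size == MAX_VLR_SIZE ? uvlr_t(-1) : (uvlr_t(1) << (size * 8)) - 1;
}

// Bit pattern of the smallest signed value: only the sign bit is set.
constexpr uvlr_t min_svalue(int size)
{
  return uvlr_t(1) << (size * 8 - 1);
}

// Largest signed value: all bits below the sign bit are set.
constexpr uvlr_t max_svalue(int size)
{
  return (uvlr_t(1) << (size * 8 - 1)) - 1;
}

static_assert(max_value(1) == 0xFF, "1-byte unsigned bound");
static_assert(max_svalue(1) == 0x7F, "1-byte signed bound");

} // namespace sketch
```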
442
443//-------------------------------------------------------------------------
444// Are we looking for 'must access' or 'may access' information?
445// 'must access' means that the code will always access the specified location(s)
446// 'may access' means that the code may in some cases access the specified location(s)
447// Example: ldx cs.2, r0.4, r1.4
448// MUST_ACCESS: r0.4 and r1.4, usually displayed as r0.8 because r0 and r1 are adjacent
449// MAY_ACCESS: r0.4 and r1.4, and all aliasable memory, because
450// ldx may access any part of the aliasable memory
451typedef int maymust_t;
452const maymust_t
453 // One of the following two bits should be specified:
454 MUST_ACCESS = 0x00, // access information we can count on
455 MAY_ACCESS = 0x01, // access information we should take into account
456 // Optionally combined with the following bits:
457 MAYMUST_ACCESS_MASK = 0x01,
458
459 ONE_ACCESS_TYPE = 0x20, // for find_first_use():
460 // use only the specified maymust access type
461 // (by default it inverts the access type for def-lists)
462 INCLUDE_SPOILED_REGS = 0x40, // for build_def_list() with MUST_ACCESS:
463 // include spoiled registers in the list
464 EXCLUDE_PASS_REGS = 0x80, // for build_def_list() with MAY_ACCESS:
465 // exclude pass_regs from the list
466 FULL_XDSU = 0x100, // for build_def_list():
467 // if xds/xdu source and targets are the same
468 // treat it as if xdsu redefines the entire destination
469 WITH_ASSERTS = 0x200, // for find_first_use():
470 // do not ignore assertions
471 EXCLUDE_VOLATILE = 0x400, // for build_def_list():
472 // exclude volatile memory from the list
473 INCLUDE_UNUSED_SRC = 0x800, // for build_use_list():
474 // do not exclude unused source bytes for m_and/m_or insns
475 INCLUDE_DEAD_RETREGS = 0x1000, // for build_def_list():
476 // include dead returned registers in the list
477 INCLUDE_RESTRICTED = 0x2000,// for MAY_ACCESS: include restricted memory
478 CALL_SPOILS_ONLY_ARGS = 0x4000;// for build_def_list() & MAY_ACCESS:
479 // do not include global memory into the
480 // spoiled list of a call
481
482inline THREAD_SAFE bool is_may_access(maymust_t maymust)
483{
484 return (maymust & MAYMUST_ACCESS_MASK) != MUST_ACCESS;
485}
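// Illustration (not part of the SDK): a maymust_t value is one access type
// (bit 0) optionally combined with modifier bits, and is_may_access() looks
// only at bit 0. The sketch uses local copies of a few constants from the
// list above; the namespace is hypothetical.

```cpp
#include <cassert>

namespace sketch {

typedef int maymust_t;
const maymust_t
  MUST_ACCESS         = 0x00,   // access information we can count on
  MAY_ACCESS          = 0x01,   // access information we should take into account
  MAYMUST_ACCESS_MASK = 0x01,
  ONE_ACCESS_TYPE     = 0x20,   // modifier bit
  INCLUDE_RESTRICTED  = 0x2000; // modifier bit

// Same test as in the header: modifier bits are masked out first.
inline bool is_may_access(maymust_t maymust)
{
  return (maymust & MAYMUST_ACCESS_MASK) != MUST_ACCESS;
}

} // namespace sketch
```

// For example, MAY_ACCESS|INCLUDE_RESTRICTED is still a 'may access' request,
// while MUST_ACCESS|ONE_ACCESS_TYPE is a 'must access' request.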
486
487//-------------------------------------------------------------------------
488/// \defgroup MERR_ Microcode error codes
489//@{
490enum merror_t
491{
492 MERR_OK = 0, ///< ok
493 MERR_BLOCK = 1, ///< no error, switch to new block
494 MERR_INTERR = -1, ///< internal error
495 MERR_INSN = -2, ///< cannot convert to microcode
496 MERR_MEM = -3, ///< not enough memory
497 MERR_BADBLK = -4, ///< bad block found
498 MERR_BADSP = -5, ///< positive sp value has been found
499 MERR_PROLOG = -6, ///< prolog analysis failed
500 MERR_SWITCH = -7, ///< wrong switch idiom
501 MERR_EXCEPTION = -8, ///< exception analysis failed
502 MERR_HUGESTACK = -9, ///< stack frame is too big
503 MERR_LVARS = -10, ///< local variable allocation failed
504 MERR_BITNESS = -11, ///< 16-bit functions cannot be decompiled
505 MERR_BADCALL = -12, ///< could not determine call arguments
506 MERR_BADFRAME = -13, ///< function frame is wrong
507 MERR_UNKTYPE = -14, ///< undefined type %s (currently unused error code)
508 MERR_BADIDB = -15, ///< inconsistent database information
509 MERR_SIZEOF = -16, ///< wrong basic type sizes in compiler settings
510 MERR_REDO = -17, ///< redecompilation has been requested
511 MERR_CANCELED = -18, ///< decompilation has been cancelled
512 MERR_RECDEPTH = -19, ///< max recursion depth reached during lvar allocation
513 MERR_OVERLAP = -20, ///< variables would overlap: %s
514 MERR_PARTINIT = -21, ///< partially initialized variable %s
515 MERR_COMPLEX = -22, ///< too complex function
516 MERR_LICENSE = -23, ///< no license available
517 MERR_ONLY32 = -24, ///< only 32-bit functions can be decompiled for the current database
518 MERR_ONLY64 = -25, ///< only 64-bit functions can be decompiled for the current database
519 MERR_BUSY = -26, ///< already decompiling a function
520 MERR_FARPTR = -27, ///< far memory model is supported only for pc
521 MERR_EXTERN = -28, ///< special segments cannot be decompiled
522 MERR_FUNCSIZE = -29, ///< too big function
523 MERR_BADRANGES = -30, ///< bad input ranges
524 MERR_BADARCH = -31, ///< current architecture is not supported
525 MERR_DSLOT = -32, ///< bad instruction in the delay slot
526 MERR_STOP = -33, ///< no error, stop the analysis
527 MERR_CLOUD = -34, ///< cloud: %s
528 MERR_MAX_ERR = 34,
529 MERR_LOOP = -35, ///< internal code: redo last loop (never reported)
530};
531//@}
532
533/// Get textual description of an error code
534/// \param out the output buffer for the error description
535/// \param code \ref MERR_
536/// \param mba the microcode array
537/// \return the error address
538
539ea_t hexapi get_merror_desc(qstring *out, merror_t code, mba_t *mba);
540
541//-------------------------------------------------------------------------
542// List of microinstruction opcodes.
543// The order of setX and jX insns is important, it is used in the code.
544
545// Instructions marked with *F may have the FPINSN bit set and operate on fp values
546// Instructions marked with +F must have the FPINSN bit set. They always operate on fp values
547// Other instructions do not operate on fp values.
548
549enum mcode_t
550{
551 m_nop = 0x00, // nop // no operation
552 m_stx = 0x01, // stx l, {r=sel, d=off} // store register to memory *F
553 m_ldx = 0x02, // ldx {l=sel,r=off}, d // load register from memory *F
554 m_ldc = 0x03, // ldc l=const, d // load constant
555 m_mov = 0x04, // mov l, d // move *F
556 m_neg = 0x05, // neg l, d // negate
557 m_lnot = 0x06, // lnot l, d // logical not
558 m_bnot = 0x07, // bnot l, d // bitwise not
559 m_xds = 0x08, // xds l, d // extend (signed)
560 m_xdu = 0x09, // xdu l, d // extend (unsigned)
561 m_low = 0x0A, // low l, d // take low part
562 m_high = 0x0B, // high l, d // take high part
563 m_add = 0x0C, // add l, r, d // l + r -> dst
564 m_sub = 0x0D, // sub l, r, d // l - r -> dst
565 m_mul = 0x0E, // mul l, r, d // l * r -> dst
566 m_udiv = 0x0F, // udiv l, r, d // l / r -> dst
567 m_sdiv = 0x10, // sdiv l, r, d // l / r -> dst
568 m_umod = 0x11, // umod l, r, d // l % r -> dst
569 m_smod = 0x12, // smod l, r, d // l % r -> dst
570 m_or = 0x13, // or l, r, d // bitwise or
571 m_and = 0x14, // and l, r, d // bitwise and
572 m_xor = 0x15, // xor l, r, d // bitwise xor
573 m_shl = 0x16, // shl l, r, d // shift logical left
574 m_shr = 0x17, // shr l, r, d // shift logical right
575 m_sar = 0x18, // sar l, r, d // shift arithmetic right
576 m_cfadd = 0x19, // cfadd l, r, d=carry // calculate carry bit of (l+r)
577 m_ofadd = 0x1A, // ofadd l, r, d=overf // calculate overflow bit of (l+r)
578 m_cfshl = 0x1B, // cfshl l, r, d=carry // calculate carry bit of (l<<r)
579 m_cfshr = 0x1C, // cfshr l, r, d=carry // calculate carry bit of (l>>r)
580 m_sets = 0x1D, // sets l, d=byte SF=1 Sign
581 m_seto = 0x1E, // seto l, r, d=byte OF=1 Overflow of (l-r)
582 m_setp = 0x1F, // setp l, r, d=byte PF=1 Unordered/Parity *F
583 m_setnz = 0x20, // setnz l, r, d=byte ZF=0 Not Equal *F
584 m_setz = 0x21, // setz l, r, d=byte ZF=1 Equal *F
585 m_setae = 0x22, // setae l, r, d=byte CF=0 Unsigned Above or Equal *F
586 m_setb = 0x23, // setb l, r, d=byte CF=1 Unsigned Below *F
587 m_seta = 0x24, // seta l, r, d=byte CF=0 & ZF=0 Unsigned Above *F
588 m_setbe = 0x25, // setbe l, r, d=byte CF=1 | ZF=1 Unsigned Below or Equal *F
589 m_setg = 0x26, // setg l, r, d=byte SF=OF & ZF=0 Signed Greater
590 m_setge = 0x27, // setge l, r, d=byte SF=OF Signed Greater or Equal
591 m_setl = 0x28, // setl l, r, d=byte SF!=OF Signed Less
592 m_setle = 0x29, // setle l, r, d=byte SF!=OF | ZF=1 Signed Less or Equal
593 m_jcnd = 0x2A, // jcnd l, d // d is mop_v or mop_b
594 m_jnz = 0x2B, // jnz l, r, d // ZF=0 Not Equal *F
595 m_jz = 0x2C, // jz l, r, d // ZF=1 Equal *F
596 m_jae = 0x2D, // jae l, r, d // CF=0 Unsigned Above or Equal *F
597 m_jb = 0x2E, // jb l, r, d // CF=1 Unsigned Below *F
598 m_ja = 0x2F, // ja l, r, d // CF=0 & ZF=0 Unsigned Above *F
599 m_jbe = 0x30, // jbe l, r, d // CF=1 | ZF=1 Unsigned Below or Equal *F
600 m_jg = 0x31, // jg l, r, d // SF=OF & ZF=0 Signed Greater
601 m_jge = 0x32, // jge l, r, d // SF=OF Signed Greater or Equal
602 m_jl = 0x33, // jl l, r, d // SF!=OF Signed Less
603 m_jle = 0x34, // jle l, r, d // SF!=OF | ZF=1 Signed Less or Equal
604 m_jtbl = 0x35, // jtbl l, r=mcases // Table jump
605 m_ijmp = 0x36, // ijmp {r=sel, d=off} // indirect unconditional jump
606 m_goto = 0x37, // goto l // l is mop_v or mop_b
607 m_call = 0x38, // call l d // l is mop_v or mop_b or mop_h
608 m_icall = 0x39, // icall {l=sel, r=off} d // indirect call
609 m_ret = 0x3A, // ret
610 m_push = 0x3B, // push l
611 m_pop = 0x3C, // pop d
612 m_und = 0x3D, // und d // undefine
613 m_ext = 0x3E, // ext in1, in2, out1 // external insn, not microcode *F
614 m_f2i = 0x3F, // f2i l, d int(l) => d; convert fp -> integer +F
615 m_f2u = 0x40, // f2u l, d uint(l)=> d; convert fp -> uinteger +F
616 m_i2f = 0x41, // i2f l, d fp(l) => d; convert integer -> fp +F
617 m_u2f = 0x42, // u2f l, d fp(l) => d; convert uinteger -> fp +F
618 m_f2f = 0x43, // f2f l, d l => d; change fp precision +F
619 m_fneg = 0x44, // fneg l, d -l => d; change sign +F
620 m_fadd = 0x45, // fadd l, r, d l + r => d; add +F
621 m_fsub = 0x46, // fsub l, r, d l - r => d; subtract +F
622 m_fmul = 0x47, // fmul l, r, d l * r => d; multiply +F
623 m_fdiv = 0x48, // fdiv l, r, d l / r => d; divide +F
624#define m_max 0x49 // first unused opcode
625};
626
627/// Must an instruction with the given opcode be the last one in a block?
628/// Such opcodes are called closing opcodes.
629/// \param mcode instruction opcode
630/// \param including_calls should m_call/m_icall be considered as the closing opcodes?
631/// If this function returns true, the opcode cannot appear in the middle
632/// of a block. Calls are a special case: unknown calls (\ref is_unknown_call)
633/// are considered as closing opcodes.
634
635THREAD_SAFE bool hexapi must_mcode_close_block(mcode_t mcode, bool including_calls);
636
637
638/// May opcode be propagated?
639/// Such opcodes can be used in sub-instructions (nested instructions)
640/// There is a handful of non-propagatable opcodes, like jumps, ret, nop, etc
641/// All other regular opcodes are propagatable and may appear in a nested
642/// instruction.
643
THREAD_SAFE bool hexapi is_mcode_propagatable(mcode_t mcode);


// Is add or sub instruction?
inline THREAD_SAFE bool is_mcode_addsub(mcode_t mcode) { return mcode == m_add || mcode == m_sub; }
// Is xds or xdu instruction? We use 'xdsu' as a shortcut for 'xds or xdu'
inline THREAD_SAFE bool is_mcode_xdsu(mcode_t mcode) { return mcode == m_xds || mcode == m_xdu; }
// Is a 'set' instruction? (an instruction that sets a condition code)
inline THREAD_SAFE bool is_mcode_set(mcode_t mcode) { return mcode >= m_sets && mcode <= m_setle; }
// Is a 1-operand 'set' instruction? Only 'sets' is in this group
inline THREAD_SAFE bool is_mcode_set1(mcode_t mcode) { return mcode == m_sets; }
// Is a 1-operand conditional jump instruction? Only 'jcnd' is in this group
inline THREAD_SAFE bool is_mcode_j1(mcode_t mcode) { return mcode == m_jcnd; }
// Is a conditional jump?
inline THREAD_SAFE bool is_mcode_jcond(mcode_t mcode) { return mcode >= m_jcnd && mcode <= m_jle; }
// Is a 'set' instruction that can be converted into a conditional jump?
inline THREAD_SAFE bool is_mcode_convertible_to_jmp(mcode_t mcode) { return mcode >= m_setnz && mcode <= m_setle; }
// Is a conditional jump instruction that can be converted into a 'set'?
inline THREAD_SAFE bool is_mcode_convertible_to_set(mcode_t mcode) { return mcode >= m_jnz && mcode <= m_jle; }
// Is a call instruction? (direct or indirect)
inline THREAD_SAFE bool is_mcode_call(mcode_t mcode) { return mcode == m_call || mcode == m_icall; }
// Must be an FPU instruction?
inline THREAD_SAFE bool is_mcode_fpu(mcode_t mcode) { return mcode >= m_f2i; }
// Is a commutative instruction?
inline THREAD_SAFE bool is_mcode_commutative(mcode_t mcode)
{
  return mcode == m_add
      || mcode == m_mul
      || mcode == m_or
      || mcode == m_and
      || mcode == m_xor
      || mcode == m_setz
      || mcode == m_setnz
      || mcode == m_cfadd
      || mcode == m_ofadd;
}
// Is a shift instruction?
inline THREAD_SAFE bool is_mcode_shift(mcode_t mcode)
{
  return mcode == m_shl
      || mcode == m_shr
      || mcode == m_sar;
}
// Is a kind of div or mod instruction?
inline THREAD_SAFE bool is_mcode_divmod(mcode_t op)
{
  return op == m_udiv || op == m_sdiv || op == m_umod || op == m_smod;
}
// Is an instruction with the selector/offset pair?
inline THREAD_SAFE bool has_mcode_seloff(mcode_t op)
{
  return op == m_ldx || op == m_stx || op == m_icall || op == m_ijmp;
}

// Convert setX opcode into corresponding jX opcode
// This function relies on the order of setX and jX opcodes!
inline THREAD_SAFE mcode_t set2jcnd(mcode_t code)
{
  return mcode_t(code - m_setnz + m_jnz);
}

// Convert jX opcode into corresponding setX opcode
// This function relies on the order of setX and jX opcodes!
inline THREAD_SAFE mcode_t jcnd2set(mcode_t code)
{
  return mcode_t(code + m_setnz - m_jnz);
}
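
// Illustration only, not part of the header: a minimal sketch of why the
// two conversions above depend on opcode ordering. The enum demo_mcode_t
// and its numeric values are hypothetical; the only property it shares
// with mcode_t is that the setX group and the jX group list their
// conditions in the same relative order, so a constant offset maps one
// group onto the other.

```cpp
#include <cassert>

// Hypothetical miniature opcode enum. The values are made up for the demo.
enum demo_mcode_t
{
  d_setnz = 10, d_setz, d_setae, d_setb, d_seta, d_setbe,
  d_setg, d_setge, d_setl, d_setle,
  d_jnz = 40, d_jz, d_jae, d_jb, d_ja, d_jbe,
  d_jg, d_jge, d_jl, d_jle,
};

// setX -> jX: shift by the distance between the two groups.
inline demo_mcode_t demo_set2jcnd(demo_mcode_t code)
{
  return demo_mcode_t(code - d_setnz + d_jnz);
}

// jX -> setX: the inverse shift.
inline demo_mcode_t demo_jcnd2set(demo_mcode_t code)
{
  return demo_mcode_t(code + d_setnz - d_jnz);
}
```

// If an opcode were inserted into only one of the two groups, both
// helpers would silently return the wrong opcode, which is presumably why
// the comments above stress the ordering.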

// Negate a conditional opcode.
// Conditional jumps can be negated, example: jle -> jg
// 'Set' instructions can be negated, example: seta -> setbe
// If the opcode cannot be negated, return m_nop
THREAD_SAFE mcode_t hexapi negate_mcode_relation(mcode_t code);


// Swap a conditional opcode.
// Only conditional jumps and set instructions can be swapped.
// The returned opcode is the one required for swapped operands.
// Example: "x > y" is the same as "y < x", therefore swap(m_jg) is m_jl.
// If the opcode cannot be swapped, return m_nop

THREAD_SAFE mcode_t hexapi swap_mcode_relation(mcode_t code);

// Return the opcode that performs signed operation.
// Examples: jae -> jge; udiv -> sdiv
// If the opcode cannot be transformed into signed form, simply return it.

THREAD_SAFE mcode_t hexapi get_signed_mcode(mcode_t code);


// Return the opcode that performs unsigned operation.
// Examples: jl -> jb; xds -> xdu
// If the opcode cannot be transformed into unsigned form, simply return it.

THREAD_SAFE mcode_t hexapi get_unsigned_mcode(mcode_t code);

// Does the opcode perform a signed operation?
inline THREAD_SAFE bool is_signed_mcode(mcode_t code) { return get_unsigned_mcode(code) != code; }
// Does the opcode perform an unsigned operation?
inline THREAD_SAFE bool is_unsigned_mcode(mcode_t code) { return get_signed_mcode(code) != code; }


// Does the 'd' operand get modified by the instruction?
// Example: "add l,r,d" modifies d, while instructions
// like jcnd, ijmp, stx do not modify it.
// Note: this function returns 'true' for m_ext but it may be wrong.
// Use minsn_t::modifies_d() if you have minsn_t.

THREAD_SAFE bool hexapi mcode_modifies_d(mcode_t mcode);


// Processor condition codes are mapped to the first microregisters
// The order is important, see mop_t::is_cc()
const mreg_t mr_none  = mreg_t(-1);
const mreg_t mr_cf    = mreg_t(0);      // carry bit
const mreg_t mr_zf    = mreg_t(1);      // zero bit
const mreg_t mr_sf    = mreg_t(2);      // sign bit
const mreg_t mr_of    = mreg_t(3);      // overflow bit
const mreg_t mr_pf    = mreg_t(4);      // parity bit
const int    cc_count = mr_pf - mr_cf + 1; // number of condition code registers
const mreg_t mr_cc    = mreg_t(5);      // synthetic condition code, used internally
const mreg_t mr_first = mreg_t(8);      // the first processor specific register
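
// Illustration only, not part of the header: a sketch of the kind of range
// check that a fixed register layout like the one above makes possible.
// demo_mreg_t and demo_is_cc_reg() are hypothetical stand-ins; the real
// mop_t::is_cc() is not shown in this part of the file and may differ.

```cpp
#include <cassert>

typedef int demo_mreg_t;                 // stand-in for mreg_t
const demo_mreg_t demo_mr_cf    = 0;     // carry bit
const demo_mreg_t demo_mr_pf    = 4;     // parity bit
const demo_mreg_t demo_mr_first = 8;     // first processor specific register
const int demo_cc_count = demo_mr_pf - demo_mr_cf + 1;

// True for microregisters that lie below the first processor specific
// register, i.e. in the area reserved for condition codes.
inline bool demo_is_cc_reg(demo_mreg_t r)
{
  return r >= demo_mr_cf && r < demo_mr_first;
}
```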

//-------------------------------------------------------------------------
/// Operand locator.
/// It is used to denote a particular operand in the ctree, for example,
/// when the user right clicks on a constant and requests to represent it, say,
/// as a hexadecimal number.
struct operand_locator_t
{
private:
  // forbid the default constructor, force the user to initialize objects of this class.
  operand_locator_t(void) {}
public:
  ea_t ea;   ///< address of the original processor instruction
  int opnum; ///< operand number in the instruction
  operand_locator_t(ea_t _ea, int _opnum) : ea(_ea), opnum(_opnum) {}
  DECLARE_COMPARISONS(operand_locator_t);
  HEXRAYS_MEMORY_ALLOCATION_FUNCS()
};

//-------------------------------------------------------------------------
/// Number representation.
/// This structure holds information about a number format.
struct number_format_t
{
  flags_t flags;      ///< ida flags, which describe number radix, enum, etc
  char opnum;         ///< operand number: 0..UA_MAXOP
  char props;         ///< properties: combination of NF_ bits (\ref NF_)
/// \defgroup NF_ Number format property bits
/// Used in number_format_t::props
//@{
#define NF_FIXED    0x01 ///< number format has been defined by the user
#define NF_NEGDONE  0x02 ///< temporary internal bit: negation has been performed
#define NF_BINVDONE 0x04 ///< temporary internal bit: inverting bits is done
#define NF_NEGATE   0x08 ///< The user asked to negate the constant
#define NF_BITNOT   0x10 ///< The user asked to invert bits of the constant
#define NF_VALID    0x20 ///< internal bit: stroff or enum is valid
                         ///< for enums: this bit is set immediately
                         ///< for stroffs: this bit is set at the end of decompilation
//@}
  uchar serial;       ///< for enums: constant serial number
  char org_nbytes;    ///< original number size in bytes
  qstring type_name;  ///< for stroffs: structure for offsetof()\n
                      ///< for enums: enum name
  /// Constructor
  number_format_t(int _opnum=0)
    : flags(0), opnum(char(_opnum)), props(0), serial(0), org_nbytes(0) {}
  /// Get number radix
  /// \return 2, 8, 10, or 16
  int get_radix(void) const { return ::get_radix(flags, opnum); }
  /// Is number representation fixed?
  /// Fixed representation cannot be modified by the decompiler
  bool is_fixed(void) const { return props != 0; }
  /// Is a hexadecimal number?
  bool is_hex(void) const { return ::is_numop(flags, opnum) && get_radix() == 16; }
  /// Is a decimal number?
  bool is_dec(void) const { return ::is_numop(flags, opnum) && get_radix() == 10; }
  /// Is an octal number?
  bool is_oct(void) const { return ::is_numop(flags, opnum) && get_radix() == 8; }
  /// Is a symbolic constant?
  bool is_enum(void) const { return ::is_enum(flags, opnum); }
  /// Is a character constant?
  bool is_char(void) const { return ::is_char(flags, opnum); }
  /// Is a structure field offset?
  bool is_stroff(void) const { return ::is_stroff(flags, opnum); }
  /// Is a number?
  bool is_numop(void) const { return !is_enum() && !is_char() && !is_stroff(); }
  /// Does the number need to be negated or bitwise negated?
  /// Returns true if the user requested a negation but it is not done yet
  bool needs_to_be_inverted(void) const
  {
    return (props & (NF_NEGATE|NF_BITNOT)) != 0      // the user requested it
        && (props & (NF_NEGDONE|NF_BINVDONE)) == 0;  // not done yet
  }
  // symbolic constants and struct offsets cannot easily change
  // their sign or size without a cast. only simple numbers can do that.
  // for example, by modifying the expression type we can convert:
  //   10u -> 10
  // but replacing the type of a symbolic constant would lead to an inconsistency.
  bool has_unmutable_type() const
  {
    return (props & NF_VALID) != 0 && (is_stroff() || is_enum());
  }
  HEXRAYS_MEMORY_ALLOCATION_FUNCS()
};

// Number formats are attached to (ea,opnum) pairs
typedef std::map<operand_locator_t, number_format_t> user_numforms_t;
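
// Illustration only, not part of the header: a self-contained sketch of a
// map keyed by an (ea, opnum) pair, which is the shape of user_numforms_t.
// demo_locator_t, demo_numforms_t and demo_lookup() are hypothetical
// names; the mapped value is a plain string instead of number_format_t,
// and the ordering shown (address first, then operand number) is an
// assumption about what DECLARE_COMPARISONS provides.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Stand-in for operand_locator_t: no default constructor, ordered by
// address and then by operand number.
struct demo_locator_t
{
  std::uint64_t ea;
  int opnum;
  demo_locator_t(std::uint64_t _ea, int _opnum) : ea(_ea), opnum(_opnum) {}
  bool operator<(const demo_locator_t &r) const
  {
    if ( ea != r.ea )
      return ea < r.ea;
    return opnum < r.opnum;
  }
};

typedef std::map<demo_locator_t, std::string> demo_numforms_t;

// Return the stored format for (ea, opnum) or nullptr if there is none.
inline const char *demo_lookup(const demo_numforms_t &m, std::uint64_t ea, int opnum)
{
  demo_numforms_t::const_iterator p = m.find(demo_locator_t(ea, opnum));
  return p == m.end() ? nullptr : p->second.c_str();
}
```

// Looking up with find() rather than operator[] avoids creating an empty
// entry for an operand that has no user-defined format.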

//-------------------------------------------------------------------------
/// Base helper class to convert binary data structures into text.
/// Other classes are derived from this class.
struct vd_printer_t
{
  qstring tmpbuf;
  int hdrlines;         ///< number of header lines (prototype+typedef+lvars)
                        ///< valid at the end of print process
  /// Print.
  /// This function is called to generate a portion of the output text.
  /// The output text may contain color codes.
  /// \param indent number of spaces to generate as prefix
  /// \param format printf-style format specifier
  /// \return the number of printed characters
  AS_PRINTF(3, 4) virtual int hexapi print(int indent, const char *format,...);
  HEXRAYS_MEMORY_ALLOCATION_FUNCS()
};

/// Helper class to convert cfunc_t into text.
struct vc_printer_t : public vd_printer_t
{
  const cfunc_t *func;          ///< cfunc_t to generate text for
  char lastchar;                ///< internal: last printed character
  /// Constructor
  vc_printer_t(const cfunc_t *f) : func(f), lastchar(0) {}
  /// Are we generating one-line text representation?
  /// \return \c true if the output will occupy one line without line breaks
  virtual bool idaapi oneliner(void) const newapi { return false; }
};

/// Helper class to convert binary data structures into text and put into a file.
struct file_printer_t : public vd_printer_t
{
  FILE *fp;                     ///< Output file pointer
  /// Print.
  /// This function is called to generate a portion of the output text.
  /// The output text may contain color codes.
  /// \param indent number of spaces to generate as prefix
  /// \param format printf-style format specifier
  /// \return the number of printed characters
  AS_PRINTF(3, 4) int hexapi print(int indent, const char *format, ...) override;
  /// Constructor
  file_printer_t(FILE *_fp) : fp(_fp) {}
};

/// Helper class to convert cfunc_t into a text string
struct qstring_printer_t : public vc_printer_t
{
  bool with_tags;               ///< Generate output with color tags
  qstring &s;                   ///< Reference to the output string
  /// Constructor
  qstring_printer_t(const cfunc_t *f, qstring &_s, bool tags)
    : vc_printer_t(f), with_tags(tags), s(_s) {}
  /// Print.
  /// This function is called to generate a portion of the output text.
  /// The output text may contain color codes.
  /// \param indent number of spaces to generate as prefix
  /// \param format printf-style format specifier
  /// \return the number of printed characters
  AS_PRINTF(3, 4) int hexapi print(int indent, const char *format, ...) override;
};
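
// Illustration only, not part of the header: a minimal sketch of the
// pattern used by the printer classes above, namely a base class with a
// virtual printf-style print() and a derived class that collects the
// output in a string. demo_printer_t and demo_string_printer_t are
// hypothetical and use std::string and vsnprintf(); how the real classes
// format text and handle color tags is not shown in this file.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Base interface: one virtual, variadic print() with an indent prefix.
struct demo_printer_t
{
  virtual ~demo_printer_t() {}
  virtual int print(int indent, const char *format, ...) = 0;
};

// Appends each printed portion to a caller-owned string.
struct demo_string_printer_t : public demo_printer_t
{
  std::string &s;
  explicit demo_string_printer_t(std::string &_s) : s(_s) {}
  int print(int indent, const char *format, ...) override
  {
    char buf[1024];
    va_list va;
    va_start(va, format);
    int n = vsnprintf(buf, sizeof(buf), format, va);
    va_end(va);
    if ( n < 0 )
      return 0;                        // formatting error, nothing printed
    if ( n >= (int)sizeof(buf) )
      n = (int)sizeof(buf) - 1;        // output was truncated
    if ( indent < 0 )
      indent = 0;
    s.append((size_t)indent, ' ');     // indentation prefix
    s.append(buf, (size_t)n);
    return indent + n;                 // number of printed characters
  }
};
```

// The fixed 1024-byte buffer keeps the sketch short; longer portions are
// truncated rather than reallocated.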

//-------------------------------------------------------------------------
/// \defgroup type Type string related declarations
/// Type related functions and class.
//@{

/// Print the specified type info.
/// This function can be used from a debugger by typing "tif->dstr()"

const char *hexapi dstr(const tinfo_t *tif);


/// Verify a type string.
/// \return true if type string is correct

bool hexapi is_type_correct(const type_t *ptr);


/// Is a small structure or union?
/// \return true if the type is a small UDT (user defined type).
///         Small UDTs fit into a register (or pair of registers) as a rule.

bool hexapi is_small_udt(const tinfo_t &tif);


/// Is definitely a non-boolean type?
/// \return true if the type is a non-boolean type (non bool and well defined)

bool hexapi is_nonbool_type(const tinfo_t &type);


/// Is a boolean type?
/// \return true if the type is a boolean type

bool hexapi is_bool_type(const tinfo_t &type);


/// Is a pointer or array type?
inline THREAD_SAFE bool is_ptr_or_array(type_t t)
{
  return is_type_ptr(t) || is_type_array(t);
}

/// Is a pointer, array, or function type?
inline THREAD_SAFE bool is_paf(type_t t)
{
  return is_ptr_or_array(t) || is_type_func(t);
}

/// Is struct/union/enum definition (not declaration)?
inline THREAD_SAFE bool is_inplace_def(const tinfo_t &type)
{
  return type.is_decl_complex() && !type.is_typeref();
}

/// Calculate number of partial subtypes.
/// \return number of partial subtypes. The bigger this number, the uglier the type.

int hexapi partial_type_num(const tinfo_t &type);


/// Get a type of a floating point value with the specified width
/// \returns type info object
/// \param width width of the desired type

tinfo_t hexapi get_float_type(int width);


/// Create a type info by width and sign.
/// Returns a simple type (examples: int, short) with the given width and sign.
/// \param srcwidth size of the type in bytes
/// \param sign sign of the type

tinfo_t hexapi get_int_type_by_width_and_sign(int srcwidth, type_sign_t sign);


/// Create a partial type info by width.
/// Returns a partially defined type (examples: _DWORD, _BYTE) with the given width.
/// \param size size of the type in bytes

tinfo_t hexapi get_unk_type(int size);


/// Generate a dummy pointer type
/// \param ptrsize size of pointed object
/// \param isfp is floating point object?

tinfo_t hexapi dummy_ptrtype(int ptrsize, bool isfp);


/// Get type of a structure field.
/// This function performs validity checks of the field type. Wrong types are rejected.
/// \param mptr structure field
/// \param type pointer to the variable where the type is returned. This parameter can be nullptr.
/// \return false if failed

bool hexapi get_member_type(const member_t *mptr, tinfo_t *type);


/// Create a pointer type.
/// This function performs the following conversion: "type" -> "type*"
/// \param type object type.
/// \return "type*". for example, if 'char' is passed as the argument,
///         the function will return 'char *'

tinfo_t hexapi make_pointer(const tinfo_t &type);


/// Create a reference to a named type.
/// \param name type name
/// \return type which refers to the specified name. For example, if name is "DWORD",
///         the type info which refers to "DWORD" is created.

tinfo_t hexapi create_typedef(const char *name);


/// Create a reference to an ordinal type.
/// \param n ordinal number of the type
/// \return type which refers to the specified ordinal. For example, if n is 1,
///         the type info which refers to ordinal type 1 is created.

inline tinfo_t create_typedef(int n)
{
  tinfo_t tif;
  tif.create_typedef(nullptr, n);
  return tif;
}

/// Type source (where the type information comes from)
enum type_source_t
{
  GUESSED_NONE,  // not guessed, specified by the user
  GUESSED_WEAK,  // not guessed, comes from idb
  GUESSED_FUNC,  // guessed as a function
  GUESSED_DATA,  // guessed as a data item
  TS_NOELL   = 0x8000000, // can be used in set_type() to avoid merging into ellipsis
  TS_SHRINK  = 0x4000000, // can be used in set_type() to prefer smaller arguments
  TS_DONTREF = 0x2000000, // do not mark type as referenced (referenced_types)
  TS_MASK    = 0xE000000, // all high bits
};
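
// Illustration only, not part of the header: a sketch of how a single
// value can carry both a GUESSED_ level and TS_ modifier bits, and how
// TS_MASK separates the two. The demo_ names are hypothetical copies of
// the enumerators above; how set_type() itself decodes its 'source'
// argument is an assumption, not something this file shows.

```cpp
#include <cassert>

enum demo_type_source_t
{
  DEMO_GUESSED_NONE,
  DEMO_GUESSED_WEAK,
  DEMO_GUESSED_FUNC,
  DEMO_GUESSED_DATA,
  DEMO_TS_NOELL   = 0x8000000,
  DEMO_TS_SHRINK  = 0x4000000,
  DEMO_TS_DONTREF = 0x2000000,
  DEMO_TS_MASK    = 0xE000000, // the three modifier bits combined
};

// The low part: which GUESSED_ level was passed.
inline int demo_source_level(int source) { return source & ~DEMO_TS_MASK; }
// The high part: which TS_ modifier bits were passed.
inline int demo_source_flags(int source) { return source & DEMO_TS_MASK; }
```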


/// Get a global type.
/// Global types are types of addressable objects and struct/union/enum types
/// \param id address or id of the object
/// \param tif buffer for the answer
/// \param guess what kind of types to consider
/// \return success

bool hexapi get_type(uval_t id, tinfo_t *tif, type_source_t guess);


/// Set a global type.
/// \param id address or id of the object
/// \param tif new type info
/// \param source where the type comes from
/// \param force true means to set the type as is, false means to merge the
///        new type with the possibly existing old type info.
/// \return success

bool hexapi set_type(uval_t id, const tinfo_t &tif, type_source_t source, bool force=false);

//@}

//-------------------------------------------------------------------------
// We use our own class to store argument and variable locations.
// It is called vdloc_t that stands for 'vd location'.
// 'vd' is the internal name of the decompiler, it stands for 'visual decompiler'.
// The main differences between vdloc and argloc_t:
//   ALOC_REG1: the offset is always 0, so it is not used. the register number
//              uses the whole ~VLOC_MASK field.
//   ALOC_STACK: stack offsets are always positive because they are based on
//               the lowest value of sp in the function.
class vdloc_t : public argloc_t
{
  int regoff(void); // inaccessible & undefined: regoff() should not be used
public:
  // Get the register number.
  // This function works only for ALOC_REG1 and ALOC_REG2 location types.
  // It uses all available bits for register number for ALOC_REG1
  int reg1(void) const { return atype() == ALOC_REG2 ? argloc_t::reg1() : get_reginfo(); }

  // Set vdloc to point to the specified register without cleaning it up.
  // This is a dangerous function, use set_reg1() instead unless you understand
  // what it means to cleanup an argloc.
  void _set_reg1(int r1) { argloc_t::_set_reg1(r1, r1>>16); }

  // Set vdloc to point to the specified register.
  void set_reg1(int r1) { cleanup_argloc(this); _set_reg1(r1); }

  // Use member functions of argloc_t for other location types.

  // Return textual representation.
  // Note: this and all other dstr() functions can be used from a debugger.
  // It is much easier than to inspect the memory contents byte by byte.
  const char *hexapi dstr(int width=0) const;
  DECLARE_COMPARISONS(vdloc_t);
  bool hexapi is_aliasable(const mba_t *mb, int size) const;
};

/// Print vdloc.
/// Since vdloc does not always carry the size info, we pass it as NBYTES.
void hexapi print_vdloc(qstring *vout, const vdloc_t &loc, int nbytes);

//-------------------------------------------------------------------------
/// Do two arglocs overlap?
bool hexapi arglocs_overlap(const vdloc_t &loc1, size_t w1, const vdloc_t &loc2, size_t w2);

/// Local variable locator.
/// Local variables are located using definition ea and location.
/// Each variable must have a unique locator, this is how we tell them apart.
struct lvar_locator_t
{
  vdloc_t location;     ///< Variable location.
  ea_t defea;           ///< Definition address. Usually, this is the address
                        ///< of the instruction that initializes the variable.
                        ///< In some cases it can be a fictional address.

  lvar_locator_t(void) : defea(BADADDR) {}
  lvar_locator_t(const vdloc_t &loc, ea_t ea) : location(loc), defea(ea) {}
  /// Get offset of the variable in the stack frame.
  /// \return a non-negative value for stack variables. The value is
  ///         an offset from the bottom of the stack frame in terms of
  ///         vd-offsets.
  ///         negative values mean error (not a stack variable)
  sval_t get_stkoff(void) const
  {
    return location.is_stkoff() ? location.stkoff() : -1;
  }
  /// Is variable located on one register?
  bool is_reg1(void) const { return location.is_reg1(); }
  /// Is variable located on two registers?
  bool is_reg2(void) const { return location.is_reg2(); }
  /// Is variable located on register(s)?
  bool is_reg_var(void) const { return location.is_reg(); }
  /// Is variable located on the stack?
  bool is_stk_var(void) const { return location.is_stkoff(); }
  /// Is variable scattered?
  bool is_scattered(void) const { return location.is_scattered(); }
  /// Get the register number of the variable
  mreg_t get_reg1(void) const { return location.reg1(); }
  /// Get the number of the second register (works only for ALOC_REG2 lvars)
  mreg_t get_reg2(void) const { return location.reg2(); }
  /// Get information about scattered variable
  const scattered_aloc_t &get_scattered(void) const { return location.scattered(); }
        scattered_aloc_t &get_scattered(void)       { return location.scattered(); }
  DECLARE_COMPARISONS(lvar_locator_t);
  HEXRAYS_MEMORY_ALLOCATION_FUNCS()
  // Debugging: get textual representation of a lvar locator.
  const char *hexapi dstr() const;
};

/// Definition of a local variable (register or stack) #var #lvar
class lvar_t : public lvar_locator_t
{
  friend class mba_t;
  int flags;                    ///< \ref CVAR_
/// \defgroup CVAR_ Local variable property bits
/// Used in lvar_t::flags
//@{
#define CVAR_USED    0x00000001 ///< is used in the code?
#define CVAR_TYPE    0x00000002 ///< the type is defined?
#define CVAR_NAME    0x00000004 ///< has nice name?
#define CVAR_MREG    0x00000008 ///< corresponding mregs were replaced?
#define CVAR_NOWD    0x00000010 ///< width is unknown
#define CVAR_UNAME   0x00000020 ///< user-defined name
#define CVAR_UTYPE   0x00000040 ///< user-defined type
#define CVAR_RESULT  0x00000080 ///< function result variable
#define CVAR_ARG     0x00000100 ///< function argument
#define CVAR_FAKE    0x00000200 ///< fake variable (return var or va_list)
#define CVAR_OVER    0x00000400 ///< overlapping variable
#define CVAR_FLOAT   0x00000800 ///< used in a fpu insn
#define CVAR_SPOILED 0x00001000 ///< internal flag, do not use: spoiled var
#define CVAR_MAPDST  0x00002000 ///< other variables are mapped to this var
#define CVAR_PARTIAL 0x00004000 ///< variable type is partially defined
#define CVAR_THISARG 0x00008000 ///< 'this' argument of c++ member functions
#define CVAR_FORCED  0x00010000 ///< variable was created by an explicit request
                                ///< otherwise we could reuse an existing var
#define CVAR_REGNAME 0x00020000 ///< has a register name (like _RAX): if lvar
                                ///< is used by an m_ext instruction
#define CVAR_NOPTR   0x00040000 ///< variable cannot be a pointer (user choice)
#define CVAR_DUMMY   0x00080000 ///< dummy argument (added to fill a hole in
                                ///< the argument list)
#define CVAR_NOTARG  0x00100000 ///< variable cannot be an input argument
#define CVAR_AUTOMAP 0x00200000 ///< variable was automatically mapped
#define CVAR_BYREF   0x00400000 ///< the address of the variable was taken
#define CVAR_INASM   0x00800000 ///< variable is used in instructions translated
                                ///< into __asm {...}
#define CVAR_UNUSED  0x01000000 ///< user-defined __unused attribute
                                ///< meaningful only if: is_arg_var() && !mba->final_type
#define CVAR_SHARED  0x02000000 ///< variable is mapped to several chains
//@}

public:
  qstring name;          ///< variable name.
                         ///< use mba_t::set_nice_lvar_name() and
                         ///< mba_t::set_user_lvar_name() to modify it
  qstring cmt;           ///< variable comment string
  tinfo_t tif;           ///< variable type
  int width = 0;         ///< variable size in bytes
  int defblk = -1;       ///< first block defining the variable.
                         ///< 0 for args, -1 if unknown
  uint64 divisor = 0;    ///< max known divisor of the variable

  lvar_t(void) : flags(CVAR_USED) {}
  lvar_t(const qstring &n, const vdloc_t &l, ea_t e, const tinfo_t &t, int w, int db)
    : lvar_locator_t(l, e), flags(CVAR_USED), name(n), tif(t), width(w), defblk(db)
  {
  }
  // Debugging: get textual representation of a local variable.
  const char *hexapi dstr() const;

  /// Is the variable used in the code?
  bool used(void) const { return (flags & CVAR_USED) != 0; }
  /// Has the variable a type?
  bool typed(void) const { return (flags & CVAR_TYPE) != 0; }
  /// Have corresponding microregs been replaced by references to this variable?
  bool mreg_done(void) const { return (flags & CVAR_MREG) != 0; }
  /// Does the variable have a nice name?
  bool has_nice_name(void) const { return (flags & CVAR_NAME) != 0; }
  /// Do we know the width of the variable?
  bool is_unknown_width(void) const { return (flags & CVAR_NOWD) != 0; }
  /// Has any user-defined information?
  bool has_user_info(void) const
  {
    return (flags & (CVAR_UNAME|CVAR_UTYPE|CVAR_NOPTR|CVAR_UNUSED)) != 0
        || !cmt.empty();
  }
  /// Has user-defined name?
  bool has_user_name(void) const { return (flags & CVAR_UNAME) != 0; }
  /// Has user-defined type?
  bool has_user_type(void) const { return (flags & CVAR_UTYPE) != 0; }
  /// Is the function result?
  bool is_result_var(void) const { return (flags & CVAR_RESULT) != 0; }
  /// Is the function argument?
  bool is_arg_var(void) const { return (flags & CVAR_ARG) != 0; }
  /// Is the promoted function argument?
  bool hexapi is_promoted_arg(void) const;
  /// Is fake return variable?
  bool is_fake_var(void) const { return (flags & CVAR_FAKE) != 0; }
  /// Is overlapped variable?
  bool is_overlapped_var(void) const { return (flags & CVAR_OVER) != 0; }
  /// Used by a fpu insn?
  bool is_floating_var(void) const { return (flags & CVAR_FLOAT) != 0; }
  /// Is spoiled var? (meaningful only during lvar allocation)
  bool is_spoiled_var(void) const { return (flags & CVAR_SPOILED) != 0; }
  /// Variable type should be handled as a partial one
  bool is_partialy_typed(void) const { return (flags & CVAR_PARTIAL) != 0; }
  /// Variable type should not be a pointer
  bool is_noptr_var(void) const { return (flags & CVAR_NOPTR) != 0; }
  /// Other variable(s) map to this var?
  bool is_mapdst_var(void) const { return (flags & CVAR_MAPDST) != 0; }
  /// Is 'this' argument of a C++ member function?
  bool is_thisarg(void) const { return (flags & CVAR_THISARG) != 0; }
  /// Is a forced variable?
  bool is_forced_var(void) const { return (flags & CVAR_FORCED) != 0; }
  /// Has a register name? (like _RAX)
  bool has_regname(void) const { return (flags & CVAR_REGNAME) != 0; }
  /// Is variable used in an instruction translated into __asm?
  bool in_asm(void) const { return (flags & CVAR_INASM) != 0; }
  /// Is a dummy argument (added to fill a hole in the argument list)?
  bool is_dummy_arg(void) const { return (flags & CVAR_DUMMY) != 0; }
  /// Is a local variable? (local variable cannot be an input argument)
  bool is_notarg(void) const { return (flags & CVAR_NOTARG) != 0; }
  /// Was the variable automatically mapped to another variable?
  bool is_automapped(void) const { return (flags & CVAR_AUTOMAP) != 0; }
  /// Was the address of the variable taken?
  bool is_used_byref(void) const { return (flags & CVAR_BYREF) != 0; }
  /// Was declared as __unused by the user? See CVAR_UNUSED
  bool is_decl_unused(void) const { return (flags & CVAR_UNUSED) != 0; }
  /// Is lvar mapped to several chains?
  bool is_shared(void) const { return (flags & CVAR_SHARED) != 0; }
  void set_used(void) { flags |= CVAR_USED; }
  void clear_used(void) { flags &= ~CVAR_USED; }
  void set_typed(void) { flags |= CVAR_TYPE; clr_noptr_var(); }
  void set_non_typed(void) { flags &= ~CVAR_TYPE; }
  void clr_user_info(void) { flags &= ~(CVAR_UNAME|CVAR_UTYPE|CVAR_NOPTR); }
  void set_user_name(void) { flags |= CVAR_NAME|CVAR_UNAME; }
  void set_user_type(void) { flags |= CVAR_TYPE|CVAR_UTYPE; }
  void clr_user_type(void) { flags &= ~CVAR_UTYPE; }
  void clr_user_name(void) { flags &= ~CVAR_UNAME; }
  void set_mreg_done(void) { flags |= CVAR_MREG; }
  void clr_mreg_done(void) { flags &= ~CVAR_MREG; }
  void set_unknown_width(void) { flags |= CVAR_NOWD; }
  void clr_unknown_width(void) { flags &= ~CVAR_NOWD; }
  void set_arg_var(void) { flags |= CVAR_ARG; }
  void clr_arg_var(void) { flags &= ~(CVAR_ARG|CVAR_THISARG); }
  void set_fake_var(void) { flags |= CVAR_FAKE; }
  void clr_fake_var(void) { flags &= ~CVAR_FAKE; }
  void set_overlapped_var(void) { flags |= CVAR_OVER; }
  void clr_overlapped_var(void) { flags &= ~CVAR_OVER; }
  void set_floating_var(void) { flags |= CVAR_FLOAT; }
  void clr_floating_var(void) { flags &= ~CVAR_FLOAT; }
  void set_spoiled_var(void) { flags |= CVAR_SPOILED; }
  void clr_spoiled_var(void) { flags &= ~CVAR_SPOILED; }
  void set_mapdst_var(void) { flags |= CVAR_MAPDST; }
  void clr_mapdst_var(void) { flags &= ~CVAR_MAPDST; }
  void set_partialy_typed(void) { flags |= CVAR_PARTIAL; }
  void clr_partialy_typed(void) { flags &= ~CVAR_PARTIAL; }
  void set_noptr_var(void) { flags |= CVAR_NOPTR; }
  void clr_noptr_var(void) { flags &= ~CVAR_NOPTR; }
  void set_thisarg(void) { flags |= CVAR_THISARG; }
  void clr_thisarg(void) { flags &= ~CVAR_THISARG; }
  void set_forced_var(void) { flags |= CVAR_FORCED; }
  void clr_forced_var(void) { flags &= ~CVAR_FORCED; }
  void set_dummy_arg(void) { flags |= CVAR_DUMMY; }
  void clr_dummy_arg(void) { flags &= ~CVAR_DUMMY; }
  void set_notarg(void) { clr_arg_var(); flags |= CVAR_NOTARG; }
  void clr_notarg(void) { flags &= ~CVAR_NOTARG; }
  void set_automapped(void) { flags |= CVAR_AUTOMAP; }
  void clr_automapped(void) { flags &= ~CVAR_AUTOMAP; }
  void set_used_byref(void) { flags |= CVAR_BYREF; }
  void clr_used_byref(void) { flags &= ~CVAR_BYREF; }
  void set_decl_unused(void) { flags |= CVAR_UNUSED; }
  void clr_decl_unused(void) { flags &= ~CVAR_UNUSED; }
  void set_shared(void) { flags |= CVAR_SHARED; }
  void clr_shared(void) { flags &= ~CVAR_SHARED; }

  /// Do variables overlap?
  bool has_common(const lvar_t &v) const
  {
    return arglocs_overlap(location, width, v.location, v.width);
  }
  /// Does the variable overlap with the specified location?
  bool has_common_bit(const vdloc_t &loc, asize_t width2) const
  {
    return arglocs_overlap(location, width, loc, width2);
  }
  /// Get variable type
  const tinfo_t &type(void) const { return tif; }
  tinfo_t &type(void) { return tif; }

  /// Check if the variable accepts the specified type.
  /// Some types are forbidden (void, function types, wrong arrays, etc)
  bool hexapi accepts_type(const tinfo_t &t, bool may_change_thisarg=false);
  /// Set variable type
  /// Note: this function does not modify the idb, only the lvar instance
  /// in the memory. For permanent changes see modify_user_lvars()
  /// Also, the variable type is not considered as final by the decompiler
  /// and may be modified later by the type derivation.
  /// In some cases set_final_lvar_type() may work better, but it does not
  /// make persistent changes to the database either.
  /// \param t new type
  /// \param may_fail if false and type is bad, interr
  /// \return success
  bool hexapi set_lvar_type(const tinfo_t &t, bool may_fail=false);

  /// Set final variable type.
  void set_final_lvar_type(const tinfo_t &t)
  {
    set_lvar_type(t);
    set_typed();
  }

  /// Change the variable width.
  /// We call the variable size 'width', it represents the number of bytes.
  /// This function may change the variable type using set_lvar_type().
  /// \param w new width
  /// \param svw_flags combination of SVW_... bits
  /// \return success
  bool hexapi set_width(int w, int svw_flags=0);
#define SVW_INT   0x00 // integer value
#define SVW_FLOAT 0x01 // floating point value
#define SVW_SOFT  0x02 // may fail and return false;
                       // if this bit is not set and the type is bad, interr

  /// Append local variable to mlist.
  /// \param mba ptr to the current mba_t
  /// \param lst list to append to
  /// \param pad_if_scattered if true, append padding bytes in case of scattered lvar
  void hexapi append_list(const mba_t *mba, mlist_t *lst, bool pad_if_scattered=false) const;

  /// Is the variable aliasable?
  /// \param mba ptr to the current mba_t
  /// Aliasable variables may be modified indirectly (through a pointer)
  bool is_aliasable(const mba_t *mba) const
  {
    return location.is_aliasable(mba, width);
  }

};
DECLARE_TYPE_AS_MOVABLE(lvar_t);
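
// Illustration only, not part of the header: a cut-down sketch of the flag
// handling in lvar_t, showing that some setters touch more than one bit.
// demo_lvar_flags_t is hypothetical; it copies only five of the CVAR_ bit
// values and the bodies of the matching one-line accessors from the class
// above.

```cpp
#include <cassert>

struct demo_lvar_flags_t
{
  enum
  {
    F_TYPE    = 0x00000002, // the type is defined
    F_ARG     = 0x00000100, // function argument
    F_THISARG = 0x00008000, // 'this' argument
    F_NOPTR   = 0x00040000, // variable cannot be a pointer
    F_NOTARG  = 0x00100000, // variable cannot be an input argument
  };
  int flags = 0;

  bool typed() const        { return (flags & F_TYPE) != 0; }
  bool is_arg_var() const   { return (flags & F_ARG) != 0; }
  bool is_thisarg() const   { return (flags & F_THISARG) != 0; }
  bool is_noptr_var() const { return (flags & F_NOPTR) != 0; }
  bool is_notarg() const    { return (flags & F_NOTARG) != 0; }

  void set_arg_var()   { flags |= F_ARG; }
  void set_thisarg()   { flags |= F_THISARG; }
  void set_noptr_var() { flags |= F_NOPTR; }
  void clr_noptr_var() { flags &= ~F_NOPTR; }
  // Clearing the argument bit also clears the 'this' bit.
  void clr_arg_var()   { flags &= ~(F_ARG|F_THISARG); }
  // Marking the type as defined drops the "cannot be a pointer" request.
  void set_typed()     { flags |= F_TYPE; clr_noptr_var(); }
  // A variable that cannot be an argument stops being one.
  void set_notarg()    { clr_arg_var(); flags |= F_NOTARG; }
};
```

// Callers that only test single bits after calling such a setter may
// therefore see bits change that they did not set themselves.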

/// Vector of local variables
struct lvars_t : public qvector<lvar_t>
{
  /// Find input variable at the specified location.
  /// \param argloc variable location
  /// \param _size variable size
  /// \return -1 if failed, otherwise the index into the variables vector.
  int find_input_lvar(const vdloc_t &argloc, int _size) { return find_lvar(argloc, _size, 0); }


  /// Find stack variable at the specified location.
  /// \param spoff offset from the minimal sp
  /// \param width variable size
  /// \return -1 if failed, otherwise the index into the variables vector.
  int hexapi find_stkvar(sval_t spoff, int width);


  /// Find variable at the specified location.
  /// \param ll variable location
  /// \return pointer to variable or nullptr
  lvar_t *hexapi find(const lvar_locator_t &ll);


  /// Find variable at the specified location.
  /// \param location variable location
  /// \param width variable size
  /// \param defblk definition block of the lvar. -1 means any block
  /// \return -1 if failed, otherwise the index into the variables vector.
  int hexapi find_lvar(const vdloc_t &location, int width, int defblk=-1) const;
};

1432/// Saved user settings for local variables: name, type, comment.
1433struct lvar_saved_info_t
1434{
1435 lvar_locator_t ll; ///< Variable locator
1436 qstring name; ///< Name
1437 tinfo_t type; ///< Type
1438 qstring cmt; ///< Comment
1439 ssize_t size; ///< Type size (if not initialized then -1)
1440 int flags; ///< \ref LVINF_
1441/// \defgroup LVINF_ saved user lvar info property bits
1442/// Used in lvar_saved_info_t::flags
1443//@{
1444#define LVINF_KEEP 0x0001 ///< preserve saved user settings regardless of vars
1445 ///< for example, if a var loses all its
1446 ///< user-defined attributes or even gets
1447 ///< destroyed, keep its lvar_saved_info_t.
1448 ///< this is used for ephemeral variables that
1449 ///< get destroyed by macro recognition.
1450#define LVINF_FORCE 0x0002 ///< force allocation of a new variable.
1451 ///< forces the decompiler to create a new
1452 ///< variable at ll.defea
1453#define LVINF_NOPTR 0x0004 ///< variable type should not be a pointer
1454#define LVINF_NOMAP 0x0008 ///< forbid automatic mapping of the variable
1455#define LVINF_UNUSED 0x0010 ///< unused argument, corresponds to CVAR_UNUSED
1456//@}
1457 lvar_saved_info_t(void) : size(BADSIZE), flags(0) {}
1458 bool has_info(void) const
1459 {
1460 return !name.empty()
1461 || !type.empty()
1462 || !cmt.empty()
1463 || is_forced_lvar()
1464 || is_noptr_lvar()
1465 || is_nomap_lvar();
1466 }
1467 bool operator==(const lvar_saved_info_t &r) const
1468 {
1469 return name == r.name
1470 && cmt == r.cmt
1471 && ll == r.ll
1472 && type == r.type;
1473 }
1474 bool operator!=(const lvar_saved_info_t &r) const { return !(*this == r); }
1475 bool is_kept(void) const { return (flags & LVINF_KEEP) != 0; }
1476 void clear_keep(void) { flags &= ~LVINF_KEEP; }
1477 void set_keep(void) { flags |= LVINF_KEEP; }
1478 bool is_forced_lvar(void) const { return (flags & LVINF_FORCE) != 0; }
1479 void set_forced_lvar(void) { flags |= LVINF_FORCE; }
1480 void clr_forced_lvar(void) { flags &= ~LVINF_FORCE; }
1481 bool is_noptr_lvar(void) const { return (flags & LVINF_NOPTR) != 0; }
1482 void set_noptr_lvar(void) { flags |= LVINF_NOPTR; }
1483 void clr_noptr_lvar(void) { flags &= ~LVINF_NOPTR; }
1484 bool is_nomap_lvar(void) const { return (flags & LVINF_NOMAP) != 0; }
1485 void set_nomap_lvar(void) { flags |= LVINF_NOMAP; }
1486 void clr_nomap_lvar(void) { flags &= ~LVINF_NOMAP; }
1487 bool is_unused_lvar(void) const { return (flags & LVINF_UNUSED) != 0; }
1488 void set_unused_lvar(void) { flags |= LVINF_UNUSED; }
1489 void clr_unused_lvar(void) { flags &= ~LVINF_UNUSED; }
1490};
1491DECLARE_TYPE_AS_MOVABLE(lvar_saved_info_t);
1492typedef qvector<lvar_saved_info_t> lvar_saved_infos_t;
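// The LVINF_ bits and their accessors can be tried out in isolation.
// The sketch below is a hypothetical stand-in (all sk_/SK_ names are
// invented for illustration, not part of the SDK); it models only the
// 'flags' field. It shows one consequence of has_info() as written above:
// LVINF_KEEP on its own does not count as user info, because only the
// FORCE, NOPTR and NOMAP bits are consulted there.

```cpp
#include <cassert>

// Bit values copied from the LVINF_ group; the SK_ prefix marks them as
// part of this sketch rather than the SDK.
enum
{
  SK_LVINF_KEEP   = 0x0001,
  SK_LVINF_FORCE  = 0x0002,
  SK_LVINF_NOPTR  = 0x0004,
  SK_LVINF_NOMAP  = 0x0008,
  SK_LVINF_UNUSED = 0x0010,
};

// Hypothetical stand-in for lvar_saved_info_t that models only 'flags'.
struct sk_saved_flags_t
{
  int flags = 0;
  bool is_kept() const { return (flags & SK_LVINF_KEEP) != 0; }
  void set_keep() { flags |= SK_LVINF_KEEP; }
  void clear_keep() { flags &= ~SK_LVINF_KEEP; }
  bool is_nomap_lvar() const { return (flags & SK_LVINF_NOMAP) != 0; }
  void set_nomap_lvar() { flags |= SK_LVINF_NOMAP; }
  void clr_nomap_lvar() { flags &= ~SK_LVINF_NOMAP; }
  // Flag part of has_info(): only FORCE, NOPTR and NOMAP are consulted.
  bool has_flag_info() const
  {
    return (flags & (SK_LVINF_FORCE | SK_LVINF_NOPTR | SK_LVINF_NOMAP)) != 0;
  }
};

// Small usage example: a kept entry without other bits carries no flag info.
inline void sk_flags_demo()
{
  sk_saved_flags_t s;
  s.set_keep();
  assert(s.is_kept() && !s.has_flag_info());
  s.set_nomap_lvar();
  assert(s.has_flag_info());
}
```

// In the sketch the keep bit only marks an entry for preservation; whether
// the entry carries information is decided by the other bits (and, in the
// real structure, by name, type and comment).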
1493
1494/// Local variable mapping (is used to merge variables)
1495typedef std::map<lvar_locator_t, lvar_locator_t> lvar_mapping_t;
1496
1497/// All user-defined information about local variables
1498struct lvar_uservec_t
1499{
1500 /// User-specified names, types, comments for lvars. Variables without
1501 /// user-specified info are not present in this vector.
1502 lvar_saved_infos_t lvvec;
1503
1504 /// Local variable mapping (used for merging variables)
1505 lvar_mapping_t lmaps;
1506
1507 /// Delta to add to IDA stack offset to calculate Hex-Rays stack offsets.
1508 /// Should be set by the caller before calling save_user_lvar_settings();
1510
1511 /// Various flags. Possible values are from \ref ULV_
1513/// \defgroup ULV_ lvar_uservec_t property bits
1514/// Used in lvar_uservec_t::ulv_flags
1515//@{
1516#define ULV_PRECISE_DEFEA 0x0001 ///< Use precise defea's for lvar locations
1517//@}
1518
1519 lvar_uservec_t(void) : stkoff_delta(0), ulv_flags(ULV_PRECISE_DEFEA) {}
1520 void swap(lvar_uservec_t &r)
1521 {
1522 lvvec.swap(r.lvvec);
1523 lmaps.swap(r.lmaps);
1524 std::swap(stkoff_delta, r.stkoff_delta);
1525 std::swap(ulv_flags, r.ulv_flags);
1526 }
1527 void clear()
1528 {
1529 lvvec.clear();
1530 lmaps.clear();
1531 stkoff_delta = 0;
1532 ulv_flags = ULV_PRECISE_DEFEA;
1533 }
1534 bool empty() const
1535 {
1536 return lvvec.empty()
1537 && lmaps.empty()
1538 && stkoff_delta == 0
1539 && ulv_flags == ULV_PRECISE_DEFEA;
1540 }
1541
1542 /// find saved user settings for given var
1543 lvar_saved_info_t *find_info(const lvar_locator_t &vloc)
1544 {
1545 for ( lvar_saved_infos_t::iterator p=lvvec.begin(); p != lvvec.end(); ++p )
1546 {
1547 if ( p->ll == vloc )
1548 return p;
1549 }
1550 return nullptr;
1551 }
1552
1553 /// Preserve user settings for given var
1554 void keep_info(const lvar_t &v)
1555 {
1556 lvar_saved_info_t *p = find_info(v);
1557 if ( p != nullptr )
1558 p->set_keep();
1559 }
1560};
1561
1562/// Restore user-defined local variable settings from the database.
1563/// \param lvinf ptr to output buffer
1564/// \param func_ea entry address of the function
1565/// \return success
1566
1567bool hexapi restore_user_lvar_settings(lvar_uservec_t *lvinf, ea_t func_ea);
1568
1569
1570/// Save user defined local variable settings into the database.
1571/// \param func_ea entry address of the function
1572/// \param lvinf user-specified info about local variables
1573
1574void hexapi save_user_lvar_settings(ea_t func_ea, const lvar_uservec_t &lvinf);
1575
1576
1577/// Helper class to modify saved local variable settings.
1578struct user_lvar_modifier_t
1579{
1580 /// Modify lvar settings.
1581 /// Returns: true-modified
1582 virtual bool idaapi modify_lvars(lvar_uservec_t *lvinf) = 0;
1583};
1584
1585/// Modify saved local variable settings.
1586/// \param entry_ea function start address
1587/// \param mlv local variable modifier
1588/// \return true if modified variables
1589
1590bool hexapi modify_user_lvars(ea_t entry_ea, user_lvar_modifier_t &mlv);
1591
1592
1593/// Modify saved local variable settings of one variable.
1594/// \param func_ea function start address
1595/// \param mli_flags bits that specify which attrs defined by INFO are to be set
1596/// \param info local variable info attrs
1597/// \return true if modified, false if invalid MLI_FLAGS passed
1598
1599bool hexapi modify_user_lvar_info(
1600 ea_t func_ea,
1601 uint mli_flags,
1602 const lvar_saved_info_t &info);
1603
1604/// \defgroup MLI_ user info bits
1605//@{
1606#define MLI_NAME 0x01 ///< apply lvar name
1607#define MLI_TYPE 0x02 ///< apply lvar type
1608#define MLI_CMT 0x04 ///< apply lvar comment
1609#define MLI_SET_FLAGS 0x08 ///< set LVINF_... bits
1610#define MLI_CLR_FLAGS 0x10 ///< clear LVINF_... bits
1611//@}
1612
1613
1614/// Find a variable by name.
1615/// \param out output buffer for the variable locator
1616/// \param func_ea function start address
1617/// \param varname variable name
1618/// \return success
1619/// Since VARNAME is not always enough to find the variable, it may decompile
1620/// the function.
1621
1622bool hexapi locate_lvar(
1623 lvar_locator_t *out,
1624 ea_t func_ea,
1625 const char *varname);
1626
1627
1628/// Rename a local variable.
1629/// \param func_ea function start address
1630/// \param oldname old name of the variable
1631/// \param newname new name of the variable
1632/// \return success
1633/// This is a convenience function.
1634/// For bulk renaming consider using modify_user_lvars.
1635
1636inline bool rename_lvar(
1637 ea_t func_ea,
1638 const char *oldname,
1639 const char *newname)
1640{
1641 lvar_saved_info_t info;
1642 if ( !locate_lvar(&info.ll, func_ea, oldname) )
1643 return false;
1644 info.name = newname;
1645 return modify_user_lvar_info(func_ea, MLI_NAME, info);
1646}
1647
1648//-------------------------------------------------------------------------
1649/// User-defined function calls
1650struct udcall_t
1651{
1652 qstring name; // name of the function
1653 tinfo_t tif; // function prototype
1654 DECLARE_COMPARISONS(udcall_t)
1655 {
1656 int code = ::compare(name, r.name);
1657 if ( code == 0 )
1658 code = ::compare(tif, r.tif);
1659 return code;
1660 }
1661
1662 bool empty() const { return name.empty() && tif.empty(); }
1663};
1664
1665// All user-defined function calls (map address -> udcall)
1666typedef std::map<ea_t, udcall_t> udcall_map_t;
1667
1668/// Restore user defined function calls from the database.
1669/// \param udcalls ptr to output buffer
1670/// \param func_ea entry address of the function
1671/// \return success
1672
1673bool hexapi restore_user_defined_calls(udcall_map_t *udcalls, ea_t func_ea);
1674
1675
1676/// Save user defined local function calls into the database.
1677/// \param func_ea entry address of the function
1678/// \param udcalls user-specified info about user defined function calls
1679
1680void hexapi save_user_defined_calls(ea_t func_ea, const udcall_map_t &udcalls);
1681
1682
1683/// Convert function type declaration into internal structure
1684/// \param udc - pointer to output structure
1685/// \param decl - function type declaration
1686/// \param silent - if TRUE: do not show warning in case of incorrect type
1687/// \return success
1688
1689bool hexapi parse_user_call(udcall_t *udc, const char *decl, bool silent);
1690
1691
1692/// try to generate user-defined call for an instruction
1693/// \return \ref MERR_ code:
1694/// MERR_OK - user-defined call generated
1695/// else - error (MERR_INSN == unacceptable udc.tif)
1696
1698
1699
1700//-------------------------------------------------------------------------
1701/// Generic microcode generator class.
1702/// An instance of a derived class can be registered to be used for
1703/// non-standard microcode generation. Before microcode generation for an
1704/// instruction all registered objects are visited in the following way:
1705///   if ( filter->match(cdg) )
1706///     code = filter->apply(cdg);
1707///   if ( code == MERR_OK )
1708///     continue; // filter generated microcode, go to the next instruction
1709struct microcode_filter_t
1710{
1711 /// check if the filter object is to be applied
1712 /// \return success
1713 virtual bool match(codegen_t &cdg) = 0;
1714
1715 /// generate microcode for an instruction
1716 /// \return MERR_... code:
1717 /// MERR_OK - user-defined microcode generated, go to the next instruction
1718 /// MERR_INSN - not generated - the caller should try the standard way
1719 /// else - error
1720 virtual merror_t apply(codegen_t &cdg) = 0;
1721};
1722
1723/// register/unregister non-standard microcode generator
1724/// \param filter - microcode generator object
1725/// \param install - TRUE - register the object, FALSE - unregister
1726/// \return success
1727bool hexapi install_microcode_filter(microcode_filter_t *filter, bool install=true);
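// The match/apply protocol described for microcode_filter_t can be sketched
// without the SDK. Everything below is a hypothetical stand-in: the sk_
// types, the placeholder error codes (their values are not the SDK's) and
// the made-up 'opcode' field are assumptions for illustration only. How the
// real decompiler reacts to error codes other than MERR_OK and MERR_INSN is
// not stated above; the sketch simply assumes such a code stops the loop.

```cpp
#include <cassert>
#include <vector>

// Placeholder error codes; the numeric values are NOT the SDK's.
enum sk_merror_t { SK_MERR_OK, SK_MERR_INSN, SK_MERR_OTHER };

// Placeholder for codegen_t: just carries a made-up opcode.
struct sk_codegen_t { int opcode; };

struct sk_filter_t
{
  virtual ~sk_filter_t() {}
  virtual bool match(sk_codegen_t &cdg) = 0;
  virtual sk_merror_t apply(sk_codegen_t &cdg) = 0;
};

// Visit the filters in order. SK_MERR_OK means a filter generated the
// microcode; SK_MERR_INSN means the caller should try the standard way.
inline sk_merror_t sk_run_filters(
        const std::vector<sk_filter_t *> &filters,
        sk_codegen_t &cdg)
{
  for ( sk_filter_t *f : filters )
  {
    if ( !f->match(cdg) )
      continue;
    sk_merror_t code = f->apply(cdg);
    if ( code != SK_MERR_INSN )
      return code; // handled, or an error (assumed to stop the loop)
  }
  return SK_MERR_INSN;
}

// Example filter: claims a single opcode and counts how often it ran.
struct sk_opcode_filter_t : sk_filter_t
{
  int wanted;
  int applied = 0;
  explicit sk_opcode_filter_t(int op) : wanted(op) {}
  bool match(sk_codegen_t &cdg) override { return cdg.opcode == wanted; }
  sk_merror_t apply(sk_codegen_t &) override { ++applied; return SK_MERR_OK; }
};

// Small usage example.
inline void sk_filter_demo()
{
  sk_opcode_filter_t f(0x90);
  std::vector<sk_filter_t *> filters;
  filters.push_back(&f);
  sk_codegen_t hit = { 0x90 };
  assert(sk_run_filters(filters, hit) == SK_MERR_OK);
}
```

// Keeping match() cheap and side-effect free seems a reasonable design
// choice, since in this model it runs for every instruction and every filter.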
1728
1729//-------------------------------------------------------------------------
1730/// Abstract class: User-defined call generator
1731/// derived classes should implement method 'match'
1732class udc_filter_t : public microcode_filter_t
1733{
1734 udcall_t udc;
1735
1736public:
1737 ~udc_filter_t() { cleanup(); }
1738
1739 /// Cleanup the filter
1740 /// This function properly clears type information associated with this filter.
1741 void hexapi cleanup(void);
1742
1743 /// return true if the filter object should be applied to given instruction
1744 virtual bool match(codegen_t &cdg) override = 0;
1745
1746 bool hexapi init(const char *decl);
1747 virtual merror_t hexapi apply(codegen_t &cdg) override;
1748
1749 bool empty(void) const { return udc.empty(); }
1750};
1751
1752//-------------------------------------------------------------------------
1753typedef size_t mbitmap_t;
1754const size_t bitset_width = sizeof(mbitmap_t) * CHAR_BIT;
1755const size_t bitset_align = bitset_width - 1;
1756const size_t bitset_shift = 6;
1757
1758/// Bit set class. See https://en.wikipedia.org/wiki/Bit_array
1759class bitset_t
1760{
1761 mbitmap_t *bitmap; ///< pointer to bitmap
1762 size_t high; ///< highest bit+1 (multiple of bitset_width)
1763
1764public:
1765 bitset_t(void) : bitmap(nullptr), high(0) {}
1766 hexapi bitset_t(const bitset_t &m); // copy constructor
1767 ~bitset_t(void)
1768 {
1769 qfree(bitmap);
1770 bitmap = nullptr;
1771 }
1772 void swap(bitset_t &r)
1773 {
1774 std::swap(bitmap, r.bitmap);
1775 std::swap(high, r.high);
1776 }
1777 bitset_t &operator=(const bitset_t &m) { return copy(m); }
1778 bitset_t &hexapi copy(const bitset_t &m); // assignment operator
1779 bool hexapi add(int bit); // add a bit
1780 bool hexapi add(int bit, int width); // add bits
1781 bool hexapi add(const bitset_t &ml); // add another bitset
1782 bool hexapi sub(int bit); // delete a bit
1783 bool hexapi sub(int bit, int width); // delete bits
1784 bool hexapi sub(const bitset_t &ml); // delete another bitset
1785 bool hexapi cut_at(int maxbit); // delete bits >= maxbit
1786 void hexapi shift_down(int shift); // shift bits down
1787 bool hexapi has(int bit) const; // test presence of a bit
1788 bool hexapi has_all(int bit, int width) const; // test presence of bits
1789 bool hexapi has_any(int bit, int width) const; // test presence of bits
1790 void print(
1791 qstring *vout,
1792 int (*get_bit_name)(qstring *out, int bit, int width, void *ud)=nullptr,
1793 void *ud=nullptr) const;
1794 const char *hexapi dstr() const;
1795 bool hexapi empty(void) const; // is empty?
1796 int hexapi count(void) const; // number of set bits
1797 int hexapi count(int bit) const; // get number of set bits starting from 'bit'
1798 int hexapi last(void) const; // get the number of the last bit (-1 if no bits)
1799 void clear(void) { high = 0; } // make empty
1800 void hexapi fill_with_ones(int maxbit);
1801 bool hexapi fill_gaps(int total_nbits);
1802 bool hexapi has_common(const bitset_t &ml) const; // has common elements?
1803 bool hexapi intersect(const bitset_t &ml); // intersect sets. returns true if changed
1804 bool hexapi is_subset_of(const bitset_t &ml) const; // is subset of?
1805 bool includes(const bitset_t &ml) const { return ml.is_subset_of(*this); }
1806 void extract(intvec_t &out) const;
1807 DECLARE_COMPARISONS(bitset_t);
1808 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
1809 class iterator
1810 {
1811 friend class bitset_t;
1812 int i;
1813 public:
1814 iterator(int n=-1) : i(n) {}
1815 bool operator==(const iterator &n) const { return i == n.i; }
1816 bool operator!=(const iterator &n) const { return i != n.i; }
1817 int operator*(void) const { return i; }
1818 };
1819 typedef iterator const_iterator;
1820 iterator itat(int n) const { return iterator(goup(n)); }
1821 iterator begin(void) const { return itat(0); }
1822 iterator end(void) const { return iterator(high); }
1823 int front(void) const { return *begin(); }
1824 int back(void) const { return *end(); }
1825 void inc(iterator &p, int n=1) const { p.i = goup(p.i+n); }
1826private:
1827 int hexapi goup(int reg) const;
1828};
1829DECLARE_TYPE_AS_MOVABLE(bitset_t);
1830typedef qvector<bitset_t> array_of_bitsets;
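// The iterator of bitset_t stores only a bit number; begin() is itat(0),
// end() holds 'high', and inc() moves on via goup(). A minimal sketch of
// that scheme follows. It is a hypothetical model (sk_ names, one 64-bit
// word instead of a growable bitmap) and it assumes that goup(n) returns
// the first set bit >= n, or 'high' when there is none; the header above
// does not spell that contract out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class sk_bitset_t
{
  uint64_t bits = 0;             // single word for simplicity
  static const int high = 64;    // one past the highest representable bit

  // Assumed contract: first set bit >= n, or 'high' if there is none.
  int goup(int n) const
  {
    while ( n < high && ((bits >> n) & 1) == 0 )
      ++n;
    return n < high ? n : high;
  }

public:
  void add(int bit) { bits |= uint64_t(1) << bit; } // requires 0 <= bit < 64
  int begin() const { return goup(0); }
  int end() const { return high; }
  int next(int i) const { return goup(i + 1); }
};

// Collect all set bits in ascending order.
inline std::vector<int> sk_extract(const sk_bitset_t &bs)
{
  std::vector<int> out;
  for ( int i = bs.begin(); i != bs.end(); i = bs.next(i) )
    out.push_back(i);
  return out;
}

// Small usage example: an empty set yields begin() == end().
inline void sk_bitset_demo()
{
  sk_bitset_t bs;
  assert(bs.begin() == bs.end());
  bs.add(5);
  assert(sk_extract(bs).size() == 1);
}
```

// With the declarations above, the corresponding loop over a real bitset_t
// would presumably read:
//   for ( bitset_t::iterator p=bs.begin(); p != bs.end(); bs.inc(p) )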
1831
1832//-------------------------------------------------------------------------
1833template <class T>
1834struct ivl_tpl // an interval
1835{
1836 ivl_tpl() = delete;
1837public:
1838 T off;
1839 T size;
1840 ivl_tpl(T _off, T _size) : off(_off), size(_size) {}
1841 bool valid() const { return last() >= off; }
1842 T end() const { return off + size; }
1843 T last() const { return off + size - 1; }
1844
1845 DEFINE_MEMORY_ALLOCATION_FUNCS()
1846};
1847
1848//-------------------------------------------------------------------------
1849typedef ivl_tpl<uval_t> uval_ivl_t;
1850struct ivl_t : public uval_ivl_t
1851{
1852private:
1853 typedef ivl_tpl<uval_t> inherited;
1854
1855public:
1856 ivl_t(uval_t _off=0, uval_t _size=0) : inherited(_off,_size) {}
1857 bool empty(void) const { return size == 0; }
1858 void clear(void) { size = 0; }
1859 void print(qstring *vout) const;
1860 const char *hexapi dstr(void) const;
1861
1862 bool extend_to_cover(const ivl_t &r) // extend interval to cover 'r'
1863 {
1864 uval_t new_end = end();
1865 bool changed = false;
1866 if ( off > r.off )
1867 {
1868 off = r.off;
1869 changed = true;
1870 }
1871 if ( new_end < r.end() )
1872 {
1873 new_end = r.end();
1874 changed = true;
1875 }
1876 if ( changed )
1877 size = new_end - off;
1878 return changed;
1879 }
1880 void intersect(const ivl_t &r)
1881 {
1882 uval_t new_off = qmax(off, r.off);
1883 uval_t new_end = end();
1884 if ( new_end > r.end() )
1885 new_end = r.end();
1886 if ( new_off < new_end )
1887 {
1888 off = new_off;
1889 size = new_end - off;
1890 }
1891 else
1892 {
1893 size = 0;
1894 }
1895 }
1896
1897 // do *this and ivl overlap?
1898 bool overlap(const ivl_t &ivl) const
1899 {
1900 return interval::overlap(off, size, ivl.off, ivl.size);
1901 }
1902 // does *this include ivl?
1903 bool includes(const ivl_t &ivl) const
1904 {
1905 return interval::includes(off, size, ivl.off, ivl.size);
1906 }
1907 // does *this contain off2?
1908 bool contains(uval_t off2) const
1909 {
1910 return interval::contains(off, size, off2);
1911 }
1912
1913 DECLARE_COMPARISONS(ivl_t);
1914 static const ivl_t allmem;
1915#define ALLMEM ivl_t::allmem
1916};
1917DECLARE_TYPE_AS_MOVABLE(ivl_t);
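// A worked example of extend_to_cover() and intersect(). The struct below
// is a hypothetical stand-in (sk_ivl_t, with uint64_t in place of uval_t)
// that copies the logic of the two methods shown above so that it can be
// compiled without the SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct sk_ivl_t
{
  uint64_t off;
  uint64_t size;
  uint64_t end() const { return off + size; }
  bool empty() const { return size == 0; }

  // Grow the interval so that it also covers 'r'; true if it changed.
  bool extend_to_cover(const sk_ivl_t &r)
  {
    uint64_t new_end = end();
    bool changed = false;
    if ( off > r.off )
    {
      off = r.off;
      changed = true;
    }
    if ( new_end < r.end() )
    {
      new_end = r.end();
      changed = true;
    }
    if ( changed )
      size = new_end - off;
    return changed;
  }

  // Shrink the interval to the part shared with 'r' (empty if disjoint).
  void intersect(const sk_ivl_t &r)
  {
    uint64_t new_off = std::max(off, r.off);
    uint64_t new_end = std::min(end(), r.end());
    if ( new_off < new_end )
    {
      off = new_off;
      size = new_end - off;
    }
    else
    {
      size = 0;
    }
  }
};

// [0x10,0x20) extended by [0x18,0x28) becomes [0x10,0x28).
inline void sk_ivl_demo()
{
  sk_ivl_t a = { 0x10, 0x10 };
  sk_ivl_t r = { 0x18, 0x10 };
  assert(a.extend_to_cover(r));
  assert(a.off == 0x10 && a.size == 0x18);
}
```

// Note that intersect() leaves 'off' untouched when the intervals are
// disjoint and only sets the size to zero, which is what empty() tests.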
1918
1919//-------------------------------------------------------------------------
1920struct ivl_with_name_t
1921{
1922 ivl_t ivl;
1923 const char *whole; // name of the whole interval
1924 const char *part; // prefix to use for parts of the interval (e.g. sp+4)
1925 ivl_with_name_t(): ivl(0, BADADDR), whole("<unnamed inteval>"), part(nullptr) {}
1926 DEFINE_MEMORY_ALLOCATION_FUNCS()
1927};
1928
1929//-------------------------------------------------------------------------
1930template <class Ivl, class T>
1931class ivlset_tpl // set of intervals
1932{
1933public:
1934 typedef qvector<Ivl> bag_t;
1935
1936protected:
1937 bag_t bag;
1938 bool verify(void) const;
1939 // we do not store the empty intervals in bag so size == 0 denotes
1940 // MAX_VALUE<T>+1, e.g. 0x100000000 for uint32
1941 static bool ivl_all_values(const Ivl &ivl) { return ivl.off == 0 && ivl.size == 0; }
1942
1943public:
1944 ivlset_tpl(void) {}
1945 ivlset_tpl(const Ivl &ivl) { if ( ivl.valid() ) bag.push_back(ivl); }
1946 DEFINE_MEMORY_ALLOCATION_FUNCS()
1947
1948 void swap(ivlset_tpl &r) { bag.swap(r.bag); }
1949 const Ivl &getivl(int idx) const { return bag[idx]; }
1950 const Ivl &lastivl(void) const { return bag.back(); }
1951 size_t nivls(void) const { return bag.size(); }
1952 bool empty(void) const { return bag.empty(); }
1953 void clear(void) { bag.clear(); }
1954 void qclear(void) { bag.qclear(); }
1955 bool all_values() const { return nivls() == 1 && ivl_all_values(bag[0]); }
1956 void set_all_values() { clear(); bag.push_back(Ivl(0, 0)); }
1957 bool single_value() const { return nivls() == 1 && bag[0].size == 1; }
1958 bool single_value(T v) const { return single_value() && bag[0].off == v; }
1959
1960 bool operator==(const Ivl &v) const { return nivls() == 1 && bag[0] == v; }
1961 bool operator!=(const Ivl &v) const { return !(*this == v); }
1962
1963 typedef typename bag_t::iterator iterator;
1964 typedef typename bag_t::const_iterator const_iterator;
1965 const_iterator begin(void) const { return bag.begin(); }
1966 const_iterator end(void) const { return bag.end(); }
1967 iterator begin(void) { return bag.begin(); }
1968 iterator end(void) { return bag.end(); }
1969};
1970
1971//-------------------------------------------------------------------------
1972/// Set of address intervals.
1973/// Bit arrays are efficient only for small sets. Potentially huge
1974/// sets, like memory ranges, require another representation.
1975/// ivlset_t is used for a list of memory locations in our decompiler.
1978{
1980 ivlset_t() {}
1981 ivlset_t(const ivl_t &ivl) : inherited(ivl) {}
1982 bool hexapi add(const ivl_t &ivl);
1983 bool add(ea_t ea, asize_t size) { return add(ivl_t(ea, size)); }
1984 bool hexapi add(const ivlset_t &ivs);
1985 bool hexapi addmasked(const ivlset_t &ivs, const ivl_t &mask);
1986 bool hexapi sub(const ivl_t &ivl);
1987 bool sub(ea_t ea, asize_t size) { return sub(ivl_t(ea, size)); }
1988 bool hexapi sub(const ivlset_t &ivs);
1989 bool hexapi has_common(const ivl_t &ivl, bool strict=false) const;
1990 void hexapi print(qstring *vout) const;
1991 const char *hexapi dstr(void) const;
1992 asize_t hexapi count(void) const;
1993 bool hexapi has_common(const ivlset_t &ivs) const;
1994 bool hexapi contains(uval_t off) const;
1995 bool hexapi includes(const ivlset_t &ivs) const;
1996 bool hexapi intersect(const ivlset_t &ivs);
1997
1998 DECLARE_COMPARISONS(ivlset_t);
1999
2000};
2001DECLARE_TYPE_AS_MOVABLE(ivlset_t);
2002typedef qvector<ivlset_t> array_of_ivlsets;
2003//-------------------------------------------------------------------------
2004// We use bitset_t to keep list of registers.
2005// This is the most efficient storage for them.
2006class rlist_t : public bitset_t
2007{
2008public:
2009 rlist_t(void) {}
2010 rlist_t(const rlist_t &m) : bitset_t(m)
2011 {
2012 }
2013 rlist_t(mreg_t reg, int width) { add(reg, width); }
2014 ~rlist_t(void) {}
2015 rlist_t &operator=(const rlist_t &) = default;
2016 void hexapi print(qstring *vout) const;
2017 const char *hexapi dstr() const;
2018};
2019DECLARE_TYPE_AS_MOVABLE(rlist_t);
2020
2021//-------------------------------------------------------------------------
2022// Microlist: list of register and memory locations
2023struct mlist_t
2024{
2025 rlist_t reg; // registers
2026 ivlset_t mem; // memory locations
2027
2028 mlist_t(void) {}
2029 mlist_t(const ivl_t &ivl) : mem(ivl) {}
2030 mlist_t(mreg_t r, int size) : reg(r, size) {}
2031
2032 void swap(mlist_t &r) { reg.swap(r.reg); mem.swap(r.mem); }
2033 bool hexapi addmem(ea_t ea, asize_t size);
2034 bool add(mreg_t r, int size) { return add(mlist_t(r, size)); } // also see append_def_list()
2035 bool add(const rlist_t &r) { return reg.add(r); }
2036 bool add(const ivl_t &ivl) { return add(mlist_t(ivl)); }
2037 bool add(const mlist_t &lst) { return reg.add(lst.reg) | mem.add(lst.mem); }
2038 bool sub(mreg_t r, int size) { return sub(mlist_t(r, size)); }
2039 bool sub(const ivl_t &ivl) { return sub(mlist_t(ivl)); }
2040 bool sub(const mlist_t &lst) { return reg.sub(lst.reg) | mem.sub(lst.mem); }
2041 asize_t count(void) const { return reg.count() + mem.count(); }
2042 void hexapi print(qstring *vout) const;
2043 const char *hexapi dstr() const;
2044 bool empty(void) const { return reg.empty() && mem.empty(); }
2045 void clear(void) { reg.clear(); mem.clear(); }
2046 bool has(mreg_t r) const { return reg.has(r); }
2047 bool has_all(mreg_t r, int size) const { return reg.has_all(r, size); }
2048 bool has_any(mreg_t r, int size) const { return reg.has_any(r, size); }
2049 bool has_memory(void) const { return !mem.empty(); }
2050 bool has_allmem(void) const { return mem == ALLMEM; }
2051 bool has_common(const mlist_t &lst) const { return reg.has_common(lst.reg) || mem.has_common(lst.mem); }
2052 bool includes(const mlist_t &lst) const { return reg.includes(lst.reg) && mem.includes(lst.mem); }
2053 bool intersect(const mlist_t &lst) { return reg.intersect(lst.reg) | mem.intersect(lst.mem); }
2054 bool is_subset_of(const mlist_t &lst) const { return lst.includes(*this); }
2055
2056 DECLARE_COMPARISONS(mlist_t);
2057 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2058};
2059DECLARE_TYPE_AS_MOVABLE(mlist_t);
2060typedef qvector<mlist_t> mlistvec_t;
2061DECLARE_TYPE_AS_MOVABLE(mlistvec_t);
2062
2063//-------------------------------------------------------------------------
2064/// Get list of temporary registers.
2065/// Tempregs are temporary registers that are used during code generation.
2066/// They do not map to regular processor registers. They are used only to
2067/// store temporary values during execution of one instruction.
2068/// Tempregs may not be used to pass a value from one block to another.
2069/// In other words, at the end of a block all tempregs must be dead.
2070const mlist_t &hexapi get_temp_regs(void);
2071
2072/// Is a kernel register?
2073/// Kernel registers are temporary registers that can be used freely.
2074/// They may be used to store values that cross instruction or basic block
2075/// boundaries. Kernel registers do not map to regular processor registers.
2076/// See also \ref mba_t::alloc_kreg()
2077bool hexapi is_kreg(mreg_t r);
2078
2079/// Map a processor register to a microregister.
2080/// \param reg processor register number
2081/// \return microregister id or mr_none
2082mreg_t hexapi reg2mreg(int reg);
2083
2084/// Map a microregister to a processor register.
2085/// \param reg microregister number
2086/// \param width size of microregister in bytes
2087/// \return processor register id or -1
2088int hexapi mreg2reg(mreg_t reg, int width);
2089
2090/// Get the microregister name.
2091/// \param out output buffer, may be nullptr
2092/// \param reg microregister number
2093/// \param width size of microregister in bytes. may be bigger than the real
2094/// register size.
2095/// \param ud reserved, must be nullptr
2096/// \return width of the printed register. this value may be less than
2097/// the WIDTH argument.
2098
2099int hexapi get_mreg_name(qstring *out, mreg_t reg, int width, void *ud=nullptr);
2100
2101//-------------------------------------------------------------------------
2102/// User defined callback to optimize individual microcode instructions
2103struct optinsn_t
2104{
2105 /// Optimize an instruction.
2106 /// \param blk current basic block. may be nullptr, which means that
2107 /// the instruction must be optimized without context
2108 /// \param ins instruction to optimize; it is always a top-level instruction.
2109 /// the callback may not delete the instruction but may
2110 /// convert it into nop (see mblock_t::make_nop). to optimize
2111 /// sub-instructions, visit them using minsn_visitor_t.
2112 /// sub-instructions may not be converted into nop but
2113 /// can be converted to "mov x,x". for example:
2114 /// add x,0,x => mov x,x
2115 /// this callback may change other instructions in the block,
2116 /// but should do this with care, e.g. so as not to break the
2117 /// propagation algorithm if called with OPTI_NO_LDXOPT.
2118 /// \param optflags combination of \ref OPTI_ bits
2119 /// \return number of changes made to the instruction.
2120 /// if after this call the instruction's use/def lists have changed,
2121 /// you must mark the block level lists as dirty (see mark_lists_dirty)
2122 virtual int idaapi func(mblock_t *blk, minsn_t *ins, int optflags) = 0;
2123};
2124
2125/// Install an instruction level custom optimizer
2126/// \param opt an instance of optinsn_t. cannot be destroyed before calling
2127/// remove_optinsn_handler().
2129
2130/// Remove an instruction level custom optimizer
2132
2133/// User defined callback to optimize microcode blocks
2134struct optblock_t
2135{
2136 /// Optimize a block.
2137 /// This function usually performs the optimizations that require analyzing
2138 /// the entire block and/or its neighbors. For example it can recognize
2139 /// patterns and perform conversions like:
2140 /// b0: b0:
2141 /// ... ...
2142 /// jnz x, 0, @b2 => jnz x, 0, @b2
2143 /// b1: b1:
2144 /// add x, 0, y mov x, y
2145 /// ... ...
2146 /// \param blk Basic block to optimize as a whole.
2147 /// \return number of changes made to the block. See also mark_lists_dirty.
2148 virtual int idaapi func(mblock_t *blk) = 0;
2149};
2150
2151/// Install a block level custom optimizer.
2152/// \param opt an instance of optblock_t. cannot be destroyed before calling
2153/// remove_optblock_handler().
2155
2156/// Remove a block level custom optimizer
2158
2159
2160//-------------------------------------------------------------------------
2161// abstract graph interface
2162class simple_graph_t : public gdl_graph_t
2163{
2164public:
2165 qstring title;
2166 bool colored_gdl_edges;
2167private:
2168 friend class iterator;
2169 virtual int goup(int node) const newapi;
2170};
2171
2172//-------------------------------------------------------------------------
2173// Since our data structures are quite complex, we use the visitor pattern
2174// in many of our algorithms. This functionality is available for plugins too.
2175// https://en.wikipedia.org/wiki/Visitor_pattern
2176
2177// All our visitor callbacks return an integer value.
2178// Visiting is interrupted as soon as the return value is non-zero.
2179// This non-zero value is returned as the result of the for_all_... function.
2180// If for_all_... returns 0, it means that it successfully visited all items.
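// The return-value convention described above can be sketched on a plain
// list of integers. The names below (sk_visitor_t, sk_for_all_items,
// sk_find_negative_t) are hypothetical and only illustrate the contract;
// the real visitors walk microcode and ctree items instead.

```cpp
#include <cassert>
#include <vector>

struct sk_visitor_t
{
  virtual ~sk_visitor_t() {}
  virtual int visit(int item) = 0; // non-zero stops the traversal
};

// Returns the first non-zero visitor result, or 0 if everything was visited.
inline int sk_for_all_items(const std::vector<int> &items, sk_visitor_t &v)
{
  for ( int item : items )
  {
    int code = v.visit(item);
    if ( code != 0 )
      return code;
  }
  return 0;
}

// Example visitor: stops at the first negative item and reports it.
struct sk_find_negative_t : sk_visitor_t
{
  int visited = 0;
  int visit(int item) override
  {
    ++visited;
    return item < 0 ? item : 0;
  }
};

// Small usage example: the traversal stops at -2 after three visits.
inline void sk_visitor_demo()
{
  std::vector<int> items;
  items.push_back(3);
  items.push_back(7);
  items.push_back(-2);
  items.push_back(5);
  sk_find_negative_t v;
  assert(sk_for_all_items(items, v) == -2);
  assert(v.visited == 3);
}
```

// A visitor that needs to report more than one integer can keep the extra
// state in its own members, as 'visited' does here.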
2181
2182/// The context info used by visitors
2183struct op_parent_info_t
2184{
2185 mba_t *mba; // current microcode
2186 mblock_t *blk; // current block
2187 minsn_t *topins; // top level instruction (parent of curins or curins itself)
2188 minsn_t *curins; // currently visited instruction
2189 op_parent_info_t(
2190 mba_t *_mba=nullptr,
2191 mblock_t *_blk=nullptr,
2192 minsn_t *_topins=nullptr)
2193 : mba(_mba), blk(_blk), topins(_topins), curins(nullptr) {}
2194 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2195 bool really_alloc(void) const;
2196};
2197
2198/// Micro instruction visitor.
2199/// See mba_t::for_all_topinsns, minsn_t::for_all_insns,
2200/// mblock_t::for_all_insns, mba_t::for_all_insns
2201struct minsn_visitor_t : public op_parent_info_t
2202{
2203 minsn_visitor_t(
2204 mba_t *_mba=nullptr,
2205 mblock_t *_blk=nullptr,
2206 minsn_t *_topins=nullptr)
2207 : op_parent_info_t(_mba, _blk, _topins) {}
2208 virtual int idaapi visit_minsn(void) = 0;
2209};
2210
2211/// Micro operand visitor.
2212/// See mop_t::for_all_ops, minsn_t::for_all_ops, mblock_t::for_all_insns,
2213/// mba_t::for_all_insns
2215{
2217 mba_t *_mba=nullptr,
2218 mblock_t *_blk=nullptr,
2219 minsn_t *_topins=nullptr)
2220 : op_parent_info_t(_mba, _blk, _topins), prune(false) {}
2221 /// Should skip sub-operands of the current operand?
2222 /// visit_mop() may set 'prune=true' for that.
2223 bool prune;
2224 virtual int idaapi visit_mop(mop_t *op, const tinfo_t *type, bool is_target) = 0;
2225};
2226
2227/// Scattered mop: visit each of the scattered locations as a separate mop.
2228/// See mop_t::for_all_scattered_submops
2230{
2231 virtual int idaapi visit_scif_mop(const mop_t &r, int off) = 0;
2232};
2233
2234// Used operand visitor.
2235// See mblock_t::for_all_uses
2236struct mlist_mop_visitor_t
2237{
2238 minsn_t *topins;
2239 minsn_t *curins;
2240 bool changed;
2241 mlist_t *list;
2242 mlist_mop_visitor_t(void): topins(nullptr), curins(nullptr), changed(false), list(nullptr) {}
2243 virtual int idaapi visit_mop(mop_t *op) = 0;
2244};
2245
2246//-------------------------------------------------------------------------
2247/// Instruction operand types
2248
2249typedef uint8 mopt_t;
2250const mopt_t
2251 mop_z = 0, ///< none
2252 mop_r = 1, ///< register (they exist until MMAT_LVARS)
2253 mop_n = 2, ///< immediate number constant
2254 mop_str = 3, ///< immediate string constant (user representation)
2255 mop_d = 4, ///< result of another instruction
2256 mop_S = 5, ///< local stack variable (they exist until MMAT_LVARS)
2257 mop_v = 6, ///< global variable
2258 mop_b = 7, ///< micro basic block (mblock_t)
2259 mop_f = 8, ///< list of arguments
2260 mop_l = 9, ///< local variable
2261 mop_a = 10, ///< mop_addr_t: address of operand (mop_l, mop_v, mop_S, mop_r)
2262 mop_h = 11, ///< helper function
2263 mop_c = 12, ///< mcases
2264 mop_fn = 13, ///< floating point constant
2265 mop_p = 14, ///< operand pair
2266 mop_sc = 15; ///< scattered
2267
2268const int NOSIZE = -1; ///< wrong or nonexistent operand size
2269
2270//-------------------------------------------------------------------------
2271/// Reference to a local variable. Used by mop_l
2272struct lvar_ref_t
2273{
2274 /// Pointer to the parent mba_t object.
2275 /// Since we need to access the 'mba->vars' array in order to retrieve
2276 /// the referenced variable, we keep a pointer to mba_t here.
2277 /// Note: this means this class and consequently mop_t, minsn_t, mblock_t
2278 /// are specific to a mba_t object and cannot migrate between
2279 /// them. fortunately this is not something we need to do.
2280 /// second, lvar_ref_t's appear only after MMAT_LVARS.
2281 mba_t *const mba;
2282 sval_t off; ///< offset from the beginning of the variable
2283 int idx; ///< index into mba->vars
2284 lvar_ref_t(mba_t *m, int i, sval_t o=0) : mba(m), off(o), idx(i) {}
2285 lvar_ref_t(const lvar_ref_t &r) : mba(r.mba), off(r.off), idx(r.idx) {}
2286 lvar_ref_t &operator=(const lvar_ref_t &r)
2287 {
2288 off = r.off;
2289 idx = r.idx;
2290 return *this;
2291 }
2292 DECLARE_COMPARISONS(lvar_ref_t);
2293 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2294 void swap(lvar_ref_t &r)
2295 {
2296 std::swap(off, r.off);
2297 std::swap(idx, r.idx);
2298 }
2299 lvar_t &hexapi var(void) const; ///< Retrieve the referenced variable
2300};
2301
2302//-------------------------------------------------------------------------
2303/// Reference to a stack variable. Used for mop_S
2304struct stkvar_ref_t
2305{
2306 /// Pointer to the parent mba_t object.
2307 /// We need it in order to retrieve the referenced stack variable.
2308 /// See notes for lvar_ref_t::mba.
2309 mba_t *const mba;
2310
2311 /// Offset to the stack variable from the bottom of the stack frame.
2312 /// It is called 'decompiler stkoff' and it is different from IDA stkoff.
2313 /// See a note and a picture about 'decompiler stkoff' below.
2314 sval_t off;
2315
2316 stkvar_ref_t(mba_t *m, sval_t o) : mba(m), off(o) {}
2317 DECLARE_COMPARISONS(stkvar_ref_t);
2318 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2319 void swap(stkvar_ref_t &r)
2320 {
2321 std::swap(off, r.off);
2322 }
2323 /// Retrieve the referenced stack variable.
2324 /// \param p_off if specified, will hold IDA stkoff after the call.
2325 /// \return pointer to the stack variable
2326 member_t *hexapi get_stkvar(uval_t *p_off=nullptr) const;
2327};
2328
2329//-------------------------------------------------------------------------
2330/// Scattered operand info. Used for mop_sc
2331struct scif_t : public vdloc_t
2332{
2333 /// Pointer to the parent mba_t object.
2334 /// Some operations may convert a scattered operand into something simpler,
2335 /// (a stack operand, for example). We will need to create stkvar_ref_t at
2336 /// that moment, this is why we need this pointer.
2337 /// See notes for lvar_ref_t::mba.
2339
2340 /// Usually scattered operands are created from a function prototype,
2341 /// which has the name information. We preserve it and use it to name
2342 /// the corresponding local variable.
2343 qstring name;
2344
2345 /// Scattered operands always have type info assigned to them
2346 /// because without it we won't be able to manipulate them.
2347 tinfo_t type;
2348
2349 scif_t(mba_t *_mba, tinfo_t *tif, qstring *n=nullptr) : mba(_mba)
2350 {
2351 if ( n != nullptr )
2352 n->swap(name);
2353 tif->swap(type);
2354 }
2355 scif_t &operator =(const vdloc_t &loc)
2356 {
2357 *(vdloc_t *)this = loc;
2358 return *this;
2359 }
2360};
2361
2362//-------------------------------------------------------------------------
2363/// An integer constant. Used for mop_n
2364/// We support 64-bit values but 128-bit values can be represented with mop_p
2365struct mnumber_t : public operand_locator_t
2366{
2367 uint64 value;
2368 uint64 org_value; // original value before changing the operand size
2369 mnumber_t(uint64 v, ea_t _ea=BADADDR, int n=0)
2370 : operand_locator_t(_ea, n), value(v), org_value(v) {}
2371 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2372 DECLARE_COMPARISONS(mnumber_t)
2373 {
2374 if ( value < r.value )
2375 return -1;
2376 if ( value > r.value )
2377 return 1;
2378 return 0;
2379 }
2380 // always use this function instead of manually modifying the 'value' field
2381 void update_value(uint64 val64)
2382 {
2383 value = val64;
2384 org_value = val64;
2385 }
2386};
2387
2388//-------------------------------------------------------------------------
2389/// Floating point constant. Used for mop_fn
2390/// For more details, please see the ieee.h file from IDA SDK.
2391struct fnumber_t
2392{
2393 fpvalue_t fnum; ///< Internal representation of the number
2394 int nbytes; ///< Original size of the constant in bytes
2395 operator uint16 *(void) { return fnum.w; }
2396 operator const uint16 *(void) const { return fnum.w; }
2397 void hexapi print(qstring *vout) const;
2398 const char *hexapi dstr() const;
2399 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2400 DECLARE_COMPARISONS(fnumber_t)
2401 {
2402 return ecmp(fnum, r.fnum);
2403 }
2404};
2405
2406//-------------------------------------------------------------------------
2407/// \defgroup SHINS_ Bits to control how we print instructions
2408//@{
2409#define SHINS_NUMADDR 0x01 ///< display definition addresses for numbers
2410#define SHINS_VALNUM 0x02 ///< display value numbers
2411#define SHINS_SHORT 0x04 ///< do not display use-def chains and other attrs
2412#define SHINS_LDXEA 0x08 ///< display address of ldx expressions (not used)
2413//@}
2414
2415//-------------------------------------------------------------------------
2416/// How to handle side effect of change_size()
2417/// Sometimes we need to create a temporary operand and change its size in order
2418/// to check some hypothesis. If we revert our changes, we do not want the
2419/// database (global variables, stack frame, etc) to change in any way.
2420enum side_effect_t
2421{
2422 NO_SIDEFF, ///< change operand size but ignore side effects
2423 ///< if you decide to keep the changed operand,
2424 ///< handle_new_size() must be called
2425 WITH_SIDEFF, ///< change operand size and handle side effects
2426 ONLY_SIDEFF, ///< only handle side effects
2427 ANY_REGSIZE = 0x80, ///< any register size is permitted
2428 ANY_FPSIZE = 0x100, ///< any size of floating operand is permitted
2429};
2430
2431//-------------------------------------------------------------------------
2432/// A microinstruction operand.
2433/// This is the smallest building block of our microcode.
2434/// Operands will be part of instructions, which are then grouped into basic blocks.
2435/// The microcode consists of an array of such basic blocks + some additional info.
2436class mop_t
2437{
2438 void hexapi copy(const mop_t &rop);
2439public:
2440 /// Operand type.
2441 mopt_t t;
2442
2443 /// Operand properties.
2444 uint8 oprops;
2445#define OPROP_IMPDONE 0x01 ///< imported operand (a pointer) has been dereferenced
2446#define OPROP_UDT 0x02 ///< a struct or union
2447#define OPROP_FLOAT 0x04 ///< possibly floating value
2448#define OPROP_CCFLAGS 0x08 ///< mop_n: a pc-relative value
2449 ///< else: value of a condition code register (like mr_cc)
2450#define OPROP_UDEFVAL 0x10 ///< uses undefined value
2451#define OPROP_LOWADDR 0x20 ///< a low address offset
2452
2453 /// Value number.
2454 /// Zero means unknown.
2455 /// Operands with the same value number are equal.
2456 uint16 valnum;
2457
2458 /// Operand size.
2459 /// Usually it is 1,2,4,8 or NOSIZE but for UDTs other sizes are permitted
2460 int size;
2461
2462 /// The following union holds additional details about the operand.
2463 /// Depending on the operand type different kinds of info are stored.
2464 /// You should access these fields only after verifying the operand type.
2465 /// All pointers are owned by the operand and are freed by its destructor.
2466 union
2467 {
2468 mreg_t r; // mop_r register number
2469 mnumber_t *nnn; // mop_n immediate value
2470 minsn_t *d; // mop_d result (destination) of another instruction
2471 stkvar_ref_t *s; // mop_S stack variable
2472 ea_t g; // mop_v global variable (its linear address)
2473 int b; // mop_b block number (used in jmp,call instructions)
2474 mcallinfo_t *f; // mop_f function call information
2475 lvar_ref_t *l; // mop_l local variable
2476 mop_addr_t *a; // mop_a variable whose address is taken
2477 char *helper; // mop_h helper function name
2478 char *cstr; // mop_str utf8 string constant, user representation
2479 mcases_t *c; // mop_c cases
2480 fnumber_t *fpc; // mop_fn floating point constant
2481 mop_pair_t *pair; // mop_p operand pair
2482 scif_t *scif; // mop_sc scattered operand info
2483 };
2484 // -- End of data fields, member function declarations follow:
2485
2486 void set_impptr_done(void) { oprops |= OPROP_IMPDONE; }
2487 void set_udt(void) { oprops |= OPROP_UDT; }
2488 void set_undef_val(void) { oprops |= OPROP_UDEFVAL; }
2489 void set_lowaddr(void) { oprops |= OPROP_LOWADDR; }
2490 bool is_impptr_done(void) const { return (oprops & OPROP_IMPDONE) != 0; }
2491 bool is_udt(void) const { return (oprops & OPROP_UDT) != 0; }
2492 bool probably_floating(void) const { return (oprops & OPROP_FLOAT) != 0; }
2493 bool is_undef_val(void) const { return (oprops & OPROP_UDEFVAL) != 0; }
2494 bool is_lowaddr(void) const { return (oprops & OPROP_LOWADDR) != 0; }
2495 bool is_ccflags(void) const
2496 {
2497 return (oprops & OPROP_CCFLAGS) != 0
2498 && (t == mop_l || t == mop_v || t == mop_S || t == mop_r);
2499 }
2500 bool is_pcval(void) const
2501 {
2502 return t == mop_n && (oprops & OPROP_CCFLAGS) != 0;
2503 }
2504
2505 mop_t(void) { zero(); }
2506 mop_t(const mop_t &rop) { copy(rop); }
2507 mop_t(mreg_t _r, int _s) : t(mop_r), oprops(0), valnum(0), size(_s), r(_r) {}
2508 mop_t &operator=(const mop_t &rop) { return assign(rop); }
2509 mop_t &hexapi assign(const mop_t &rop);
2510 ~mop_t(void)
2511 {
2512 erase();
2513 }
2514 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2515 void zero() { t = mop_z; oprops = 0; valnum = 0; size = NOSIZE; nnn = nullptr; }
2516 void hexapi swap(mop_t &rop);
2517 void hexapi erase(void);
2518 void erase_but_keep_size(void) { int s2 = size; erase(); size = s2; }
2519
2520 void hexapi print(qstring *vout, int shins_flags=SHINS_SHORT|SHINS_VALNUM) const;
2521 const char *hexapi dstr() const; // use this function for debugging
2522
2523 //-----------------------------------------------------------------------
2524 // Operand creation
2525 //-----------------------------------------------------------------------
2526 /// Create operand from mlist_t.
2527 /// Example: if LST contains 4 bits for R0.4, our operand will be
2528 /// (t=mop_r, r=R0, size=4)
2529 /// \param mba pointer to microcode
2530 /// \param lst list of locations
2531 /// \param fullsize mba->fullsize
2532 /// \return success
2533 bool hexapi create_from_mlist(mba_t *mba, const mlist_t &lst, sval_t fullsize);
2534
2535 /// Create operand from ivlset_t.
2536 /// Example: if IVS contains [glbvar..glbvar+4), our operand will be
2537 /// (t=mop_v, g=&glbvar, size=4)
2538 /// \param mba pointer to microcode
2539 /// \param ivs set of memory intervals
2540 /// \param fullsize mba->fullsize
2541 /// \return success
2542 bool hexapi create_from_ivlset(mba_t *mba, const ivlset_t &ivs, sval_t fullsize);
2543
2544 /// Create operand from vdloc_t.
2545 /// Example: if LOC contains (type=ALOC_REG1, r=R0), our operand will be
2546 /// (t=mop_r, r=R0, size=_SIZE)
2547 /// \param mba pointer to microcode
2548 /// \param loc location
2549 /// \param _size operand size
2550 /// Note: this function cannot handle scattered locations.
2551 /// \return success
2552 void hexapi create_from_vdloc(mba_t *mba, const vdloc_t &loc, int _size);
2553
2554 /// Create operand from scattered vdloc_t.
2555 /// Example: if LOC is (ALOC_DIST, {EAX.4, EDX.4}) and TYPE is _LARGE_INTEGER,
2556 /// our operand will be
2557 /// (t=mop_sc, scif={EAX.4, EDX.4})
2558 /// \param mba pointer to microcode
2559 /// \param name name of the operand, if available
2560 /// \param type type of the operand, must be present
2561 /// \param loc a scattered location
2562 /// \return success
2563 void hexapi create_from_scattered_vdloc(
2564 mba_t *mba,
2565 const char *name,
2566 tinfo_t type,
2567 const vdloc_t &loc);
2568
2569 /// Create operand from an instruction.
2570 /// This function creates a nested instruction that can be used as an operand.
2571 /// Example: if m="add x,y,z", our operand will be (t=mop_d,d=m).
2572 /// The destination operand of 'add' (z) is lost.
2573 /// \param m instruction to embed into operand. may not be nullptr.
2574 void hexapi create_from_insn(const minsn_t *m);
2575
2576 /// Create an integer constant operand.
2577 /// \param _value value to store in the operand
2578 /// \param _size size of the value in bytes (1,2,4,8)
2579 /// \param _ea address of the processor instruction that made the value
2580 /// \param opnum operand number of the processor instruction
2581 void hexapi make_number(uint64 _value, int _size, ea_t _ea=BADADDR, int opnum=0);
2582
2583 /// Create a floating point constant operand.
2584 /// \param bytes pointer to the floating point value as used by the current
2585 /// processor (e.g. for x86 it must be in IEEE 754)
2586 /// \param _size number of bytes occupied by the constant.
2587 /// \return success
2588 bool hexapi make_fpnum(const void *bytes, size_t _size);
2589
2590 /// Create a register operand without erasing previous data.
2591 /// \param reg micro register number
2592 /// Note: this function does not erase the previous contents of the operand;
2593 /// call erase() if necessary
2594 void _make_reg(mreg_t reg)
2595 {
2596 t = mop_r;
2597 r = reg;
2598 }
2599 void _make_reg(mreg_t reg, int _size)
2600 {
2601 t = mop_r;
2602 r = reg;
2603 size = _size;
2604 }
2605 /// Create a register operand.
2606 void make_reg(mreg_t reg) { erase(); _make_reg(reg); }
2607 void make_reg(mreg_t reg, int _size) { erase(); _make_reg(reg, _size); }
2608
2609 /// Create a local variable operand.
2610 /// \param mba pointer to microcode
2611 /// \param idx index into mba->vars
2612 /// \param off offset from the beginning of the variable
2613 /// Note: this function does not erase the previous contents of the operand;
2614 /// call erase() if necessary
2615 void _make_lvar(mba_t *mba, int idx, sval_t off=0)
2616 {
2617 t = mop_l;
2618 l = new lvar_ref_t(mba, idx, off);
2619 }
2620
2621 /// Create a global variable operand without erasing previous data.
2622 /// \param ea address of the variable
2623 /// Note: this function does not erase the previous contents of the operand;
2624 /// call erase() if necessary
2625 void hexapi _make_gvar(ea_t ea);
2626 /// Create a global variable operand.
2627 void hexapi make_gvar(ea_t ea);
2628
2629 /// Create a stack variable operand.
2630 /// \param mba pointer to microcode
2631 /// \param off decompiler stkoff
2632 /// Note: this function does not erase the previous contents of the operand;
2633 /// call erase() if necessary
2634 void _make_stkvar(mba_t *mba, sval_t off)
2635 {
2636 t = mop_S;
2637 s = new stkvar_ref_t(mba, off);
2638 }
2639 void make_stkvar(mba_t *mba, sval_t off) { erase(); _make_stkvar(mba, off); }
2640
2641 /// Create pair of registers.
2642 /// \param loreg register holding the low part of the value
2643 /// \param hireg register holding the high part of the value
2644 /// \param halfsize the size of each of loreg/hireg
2645 void hexapi make_reg_pair(int loreg, int hireg, int halfsize);
2646
2647 /// Create a nested instruction without erasing previous data.
2648 /// \param ins pointer to the instruction to encapsulate into the operand
2649 /// Note: this function does not erase the previous contents of the operand;
2650 /// call erase() if necessary
2651 /// See also create_from_insn, which is higher level
2652 void _make_insn(minsn_t *ins);
2653 /// Create a nested instruction.
2654 void make_insn(minsn_t *ins) { erase(); _make_insn(ins); }
2655
2656 /// Create a block reference operand without erasing previous data.
2657 /// \param blknum block number
2658 /// Note: this function does not erase the previous contents of the operand;
2659 /// call erase() if necessary
2660 void _make_blkref(int blknum)
2661 {
2662 t = mop_b;
2663 b = blknum;
2664 }
2665 /// Create a block reference operand.
2666 void make_blkref(int blknum) { erase(); _make_blkref(blknum); }
2667
2668 /// Create a helper operand.
2669 /// A helper operand usually keeps a built-in function name like "va_start"
2670 /// It is essentially just an arbitrary identifier without any additional info.
2671 void hexapi make_helper(const char *name);
2672
2673 /// Create a constant string operand.
2674 void _make_strlit(const char *str)
2675 {
2676 t = mop_str;
2677 cstr = ::qstrdup(str);
2678 }
2679 void _make_strlit(qstring *str) // str is consumed
2680 {
2681 t = mop_str;
2682 cstr = str->extract();
2683 }
2684
2685 /// Create a call info operand without erasing previous data.
2686 /// \param fi callinfo
2687 /// Note: this function does not erase the previous contents of the operand;
2688 /// call erase() if necessary
2690 {
2691 t = mop_f;
2692 f = fi;
2693 }
2694
2695 /// Create a 'switch cases' operand without erasing previous data.
2696 /// Note: this function does not erase the previous contents of the operand;
2697 /// call erase() if necessary
2698 void _make_cases(mcases_t *_cases)
2699 {
2700 t = mop_c;
2701 c = _cases;
2702 }
2703
2704 /// Create a pair operand without erasing previous data.
2705 /// Note: this function does not erase the previous contents of the operand;
2706 /// call erase() if necessary
2708 {
2709 t = mop_p;
2710 pair = _pair;
2711 }
2712
2713 //-----------------------------------------------------------------------
2714 // Various operand tests
2715 //-----------------------------------------------------------------------
2716 bool empty(void) const { return t == mop_z; }
2717 /// Is a register operand?
2718 /// See also get_mreg_name()
2719 bool is_reg(void) const { return t == mop_r; }
2720 /// Is the specified register?
2721 bool is_reg(mreg_t _r) const { return t == mop_r && r == _r; }
2722 /// Is the specified register of the specified size?
2723 bool is_reg(mreg_t _r, int _size) const { return t == mop_r && r == _r && size == _size; }
2724 /// Is a list of arguments?
2725 bool is_arglist(void) const { return t == mop_f; }
2726 /// Is a condition code?
2727 bool is_cc(void) const { return is_reg() && r >= mr_cf && r < mr_first; }
2728 /// Is a bit register?
2729 /// This includes condition codes and possibly other bit registers
2730 static bool hexapi is_bit_reg(mreg_t reg);
2731 bool is_bit_reg(void) const { return is_reg() && is_bit_reg(r); }
2732 /// Is a kernel register?
2733 bool is_kreg(void) const;
2734 /// Is a block reference to the specified block?
2735 bool is_mob(int serial) const { return t == mop_b && b == serial; }
2736 /// Is a scattered operand?
2737 bool is_scattered(void) const { return t == mop_sc; }
2738 /// Is address of a global memory cell?
2739 bool is_glbaddr() const;
2740 /// Is address of the specified global memory cell?
2741 bool is_glbaddr(ea_t ea) const;
2742 /// Is address of a stack variable?
2743 bool is_stkaddr() const;
2744 /// Is a sub-instruction?
2745 bool is_insn(void) const { return t == mop_d; }
2746 /// Is a sub-instruction with the specified opcode?
2747 bool is_insn(mcode_t code) const;
2748 /// Has any side effects?
2749 /// \param include_ldx_and_divs consider ldx/div/mod as having side effects?
2750 bool has_side_effects(bool include_ldx_and_divs=false) const;
2751 /// Is it possible for the operand to use aliased memory?
2752 bool hexapi may_use_aliased_memory(void) const;
2753
2754 /// Are the possible values of the operand only 0 and 1?
2755 /// This function returns true for 0/1 constants, bit registers,
2756 /// the result of 'set' insns, etc.
2757 bool hexapi is01(void) const;
2758
2759 /// Does the high part of the operand consist of the sign bytes?
2760 /// \param nbytes number of bytes that were sign extended.
2761 /// the remaining size-nbytes high bytes must be sign bytes
2762 /// Example: is_sign_extended_from(xds.4(op.1), 1) -> true
2763 /// because the high 3 bytes are certainly sign bits
2764 bool hexapi is_sign_extended_from(int nbytes) const;
2765
2766 /// Does the high part of the operand consist of zero bytes?
2767 /// \param nbytes number of bytes that were zero extended.
2768 /// the remaining size-nbytes high bytes must be zero
2769 /// Example: is_zero_extended_from(xdu.8(op.1), 2) -> true
2770 /// because the high 6 bytes are certainly zero
2771 bool hexapi is_zero_extended_from(int nbytes) const;
2772
2773 /// Does the high part of the operand consist of zero or sign bytes?
2774 bool is_extended_from(int nbytes, bool is_signed) const
2775 {
2776 if ( is_signed )
2777 return is_sign_extended_from(nbytes);
2778 else
2779 return is_zero_extended_from(nbytes);
2780 }
2781
2782 //-----------------------------------------------------------------------
2783 // Comparisons
2784 //-----------------------------------------------------------------------
2785 /// Compare operands.
2786 /// This is the main comparison function for operands.
2787 /// \param rop operand to compare with
2788 /// \param eqflags combination of \ref EQ_ bits
2789 bool hexapi equal_mops(const mop_t &rop, int eqflags) const;
2790 bool operator==(const mop_t &rop) const { return equal_mops(rop, 0); }
2791 bool operator!=(const mop_t &rop) const { return !equal_mops(rop, 0); }
2792
2793 /// Lexicographical operand comparison.
2794 /// It can be used to store mop_t in various containers, like std::set
2795 bool operator <(const mop_t &rop) const { return lexcompare(rop) < 0; }
2796 friend int lexcompare(const mop_t &a, const mop_t &b) { return a.lexcompare(b); }
2797 int hexapi lexcompare(const mop_t &rop) const;
2798
2799 //-----------------------------------------------------------------------
2800 // Visiting operand parts
2801 //-----------------------------------------------------------------------
2802 /// Visit the operand and all its sub-operands.
2803 /// This function visits the current operand as well.
2804 /// \param mv visitor object
2805 /// \param type operand type
2806 /// \param is_target is a destination operand?
2807 int hexapi for_all_ops(
2808 mop_visitor_t &mv,
2809 const tinfo_t *type=nullptr,
2810 bool is_target=false);
2811
2812 /// Visit all sub-operands of a scattered operand.
2813 /// This function does not visit the current operand, only its sub-operands.
2814 /// All sub-operands are synthetic and are destroyed after the visitor.
2815 /// This function works only with scattered operands.
2816 /// \param sv visitor object
2817 int hexapi for_all_scattered_submops(scif_visitor_t &sv) const;
2818
2819 //-----------------------------------------------------------------------
2820 // Working with mop_n operands
2821 //-----------------------------------------------------------------------
2822 /// Retrieve value of a constant integer operand.
2823 /// These functions can be called only for mop_n operands.
2824 /// See is_constant() that can be called on any operand.
2825 uint64 value(bool is_signed) const { return extend_sign(nnn->value, size, is_signed); }
2826 int64 signed_value(void) const { return value(true); }
2827 uint64 unsigned_value(void) const { return value(false); }
2828 void update_numop_value(uint64 val)
2829 {
2830 nnn->update_value(extend_sign(val, size, false));
2831 }
2832
2833 /// Retrieve value of a constant integer operand.
2834 /// \param out pointer to the output buffer
2835 /// \param is_signed should treat the value as signed
2836 /// \return true if the operand is mop_n
2837 bool hexapi is_constant(uint64 *out=nullptr, bool is_signed=true) const;
2838
2839 bool is_equal_to(uint64 n, bool is_signed=true) const
2840 {
2841 uint64 v;
2842 return is_constant(&v, is_signed) && v == n;
2843 }
2844 bool is_zero(void) const { return is_equal_to(0, false); }
2845 bool is_one(void) const { return is_equal_to(1, false); }
2846 bool is_positive_constant(void) const
2847 {
2848 uint64 v;
2849 return is_constant(&v, true) && int64(v) > 0;
2850 }
2851 bool is_negative_constant(void) const
2852 {
2853 uint64 v;
2854 return is_constant(&v, true) && int64(v) < 0;
2855 }
2856
2857 //-----------------------------------------------------------------------
2858 // Working with mop_S operands
2859 //-----------------------------------------------------------------------
2860 /// Retrieve the referenced stack variable.
2861 /// \param p_off if specified, will hold IDA stkoff after the call.
2862 /// \return pointer to the stack variable
2863 member_t *get_stkvar(uval_t *p_off) const { return s->get_stkvar(p_off); }
2864
2865 /// Get the referenced stack offset.
2866 /// This function can also handle mop_sc if it is entirely mapped into
2867 /// a contiguous stack region.
2868 /// \param p_off the output buffer
2869 /// \return success
2870 bool hexapi get_stkoff(sval_t *p_off) const;
2871
2872 //-----------------------------------------------------------------------
2873 // Working with mop_d operands
2874 //-----------------------------------------------------------------------
2875 /// Get subinstruction of the operand.
2876 /// If the operand has a subinstruction with the specified opcode, return it.
2877 /// \param code desired opcode
2878 /// \return pointer to the instruction or nullptr
2879 const minsn_t *get_insn(mcode_t code) const;
2880 minsn_t *get_insn(mcode_t code);
2881
2882 //-----------------------------------------------------------------------
2883 // Transforming operands
2884 //-----------------------------------------------------------------------
2885 /// Make the low part of the operand.
2886 /// This function takes into account the memory endianness (byte sex)
2887 /// \param width the desired size of the operand part in bytes
2888 /// \return success
2889 bool hexapi make_low_half(int width);
2890
2891 /// Make the high part of the operand.
2892 /// This function takes into account the memory endianness (byte sex)
2893 /// \param width the desired size of the operand part in bytes
2894 /// \return success
2895 bool hexapi make_high_half(int width);
2896
2897 /// Make the first part of the operand.
2898 /// This function does not care about the memory endianness
2899 /// \param width the desired size of the operand part in bytes
2900 /// \return success
2901 bool hexapi make_first_half(int width);
2902
2903 /// Make the second part of the operand.
2904 /// This function does not care about the memory endianness
2905 /// \param width the desired size of the operand part in bytes
2906 /// \return success
2907 bool hexapi make_second_half(int width);
2908
2909 /// Shift the operand.
2910 /// This function shifts only the beginning of the operand.
2911 /// The operand size will be changed.
2912 /// Examples: shift_mop(AH.1, -1) -> AX.2
2913 /// shift_mop(qword_00000008.8, 4) -> dword_0000000C.4
2914 /// shift_mop(xdu.8(op.4), 4) -> #0.4
2915 /// shift_mop(#0x12345678.4, 3) -> #12.1
2916 /// \param offset shift count (the number of bytes to shift)
2917 /// \return success
2918 bool hexapi shift_mop(int offset);
2919
2920 /// Change the operand size.
2921 /// Examples: change_size(AL.1, 2) -> AX.2
2922 /// change_size(qword_00000008.8, 4) -> dword_00000008.4
2923 /// change_size(xdu.8(op.4), 4) -> op.4
2924 /// change_size(#0x12345678.4, 1) -> #0x78.1
2925 /// \param nsize new operand size
2926 /// \param sideff may modify the database because of the size change?
2927 /// \return success
2928 bool hexapi change_size(int nsize, side_effect_t sideff=WITH_SIDEFF);
2929 bool double_size(side_effect_t sideff=WITH_SIDEFF) { return change_size(size*2, sideff); }
2930
2931 /// Move subinstructions with side effects out of the operand.
2932 /// If we decide to delete an instruction operand, it is a good idea to
2933 /// call this function. Alternatively we should skip such operands
2934 /// by calling mop_t::has_side_effects()
2935 /// For example, if we transform: jnz x, x, @blk => goto @blk
2936 /// then we must call this function before deleting the X operands.
2937 /// \param blk current block
2938 /// \param top top level instruction that contains our operand
2939 /// \param moved_calls pointer to the boolean that will track if all side
2940 /// effects get handled correctly. must be false initially.
2941 /// \return false failed to preserve a side effect, it is not safe to
2942 /// delete the operand
2943 /// true no side effects or successfully preserved them
2944 bool hexapi preserve_side_effects(
2945 mblock_t *blk,
2946 minsn_t *top,
2947 bool *moved_calls=nullptr);
2948
2949 /// Apply a unary opcode to the operand.
2950 /// \param mcode opcode to apply. it must accept 'l' and 'd' operands
2951 /// but not 'r'. examples: m_low/m_high/m_xds/m_xdu
2952 /// \param ea value of minsn_t::ea for the newly created instruction
2953 /// \param newsize new operand size
2954 /// Example: apply_ld_mcode(m_low) will convert op => low(op)
2955 void hexapi apply_ld_mcode(mcode_t mcode, ea_t ea, int newsize);
2956 void apply_xdu(ea_t ea, int newsize) { apply_ld_mcode(m_xdu, ea, newsize); }
2957 void apply_xds(ea_t ea, int newsize) { apply_ld_mcode(m_xds, ea, newsize); }
2958};
2959DECLARE_TYPE_AS_MOVABLE(mop_t);
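// Illustration (not part of the SDK): a minimal sketch of how the constant
// (mop_n) examples in the comments above work out arithmetically, namely
// value(is_signed), change_size(#0x12345678.4, 1) and
// shift_mop(#0x12345678.4, 3). All toy_* helpers are hypothetical and model
// plain integer constants only; the real functions also handle registers,
// memory operands and nested instructions.

```cpp
#include <cassert>
#include <cstdint>

// Keep the low 'nbytes' bytes of V (nbytes must be in 1..8).
inline uint64_t toy_truncate(uint64_t v, int nbytes)
{
  assert(nbytes >= 1 && nbytes <= 8);
  return nbytes == 8 ? v : v & ((uint64_t(1) << (nbytes * 8)) - 1);
}

// Treat the low 'nbytes' bytes of V as a signed or unsigned number and
// widen it to 64 bits (the role extend_sign() plays in value()).
inline uint64_t toy_extend(uint64_t v, int nbytes, bool is_signed)
{
  uint64_t low = toy_truncate(v, nbytes);
  if ( is_signed && nbytes < 8 && (low >> (nbytes * 8 - 1)) != 0 )
    low |= ~uint64_t(0) << (nbytes * 8);   // replicate the sign bit upwards
  return low;
}

// Model of change_size() on a constant: only the low bytes survive.
inline uint64_t toy_change_size(uint64_t v, int new_size)
{
  return toy_truncate(v, new_size);
}

// Model of shift_mop() on a constant with a non-negative byte offset:
// the 'offset' low bytes are dropped and the size shrinks by 'offset'.
inline uint64_t toy_shift(uint64_t v, int size, int offset)
{
  assert(offset >= 0 && offset < size);
  return toy_truncate(v >> (offset * 8), size - offset);
}
```

// Under these assumptions toy_change_size(0x12345678, 1) yields 0x78 and
// toy_shift(0x12345678, 4, 3) yields 0x12, matching the examples above.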
2960
2961/// Pair of operands
2962class mop_pair_t
2963{
2964public:
2965 mop_t lop; ///< low operand
2966 mop_t hop; ///< high operand
2967 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
2968};
2969
2970/// Address of an operand (mop_l, mop_v, mop_S, mop_r)
2971class mop_addr_t : public mop_t
2972{
2973public:
2974 int insize; // how many bytes of the pointed operand can be read
2975 int outsize; // how many bytes of the pointed operand can be written
2976
2977 mop_addr_t(): insize(NOSIZE), outsize(NOSIZE) {}
2978 mop_addr_t(const mop_addr_t &ra)
2979 : mop_t(ra), insize(ra.insize), outsize(ra.outsize) {}
2980 mop_addr_t(const mop_t &ra, int isz, int osz)
2981 : mop_t(ra), insize(isz), outsize(osz) {}
2982
2983 mop_addr_t &operator=(const mop_addr_t &rop)
2984 {
2985 *(mop_t *)this = mop_t(rop);
2986 insize = rop.insize;
2987 outsize = rop.outsize;
2988 return *this;
2989 }
2990 int lexcompare(const mop_addr_t &ra) const
2991 {
2992 int code = mop_t::lexcompare(ra);
2993 return code != 0 ? code
2994 : insize != ra.insize ? (insize-ra.insize)
2995 : outsize != ra.outsize ? (outsize-ra.outsize)
2996 : 0;
2997 }
2998};
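// Illustration (not part of the SDK): a minimal sketch of the key-by-key
// comparison that mop_addr_t::lexcompare() chains above. toy_addr_t and
// toy_lexcompare are hypothetical; 'base' stands in for the result that
// mop_t::lexcompare() would contribute. The sketch returns exactly -1/0/1
// instead of a size difference, which is a deliberate simplification.

```cpp
#include <cassert>

// Hypothetical stand-in for an address operand; NOT the real mop_addr_t.
struct toy_addr_t
{
  int base;     // plays the role of the underlying operand's ordering key
  int insize;   // readable bytes
  int outsize;  // writable bytes
};

// Compare key by key; the first key that differs decides the result.
inline int toy_lexcompare(const toy_addr_t &a, const toy_addr_t &b)
{
  if ( a.base != b.base )
    return a.base < b.base ? -1 : 1;
  if ( a.insize != b.insize )
    return a.insize < b.insize ? -1 : 1;
  if ( a.outsize != b.outsize )
    return a.outsize < b.outsize ? -1 : 1;
  return 0;
}
```

// Later keys matter only when all earlier keys are equal, so two operands
// with the same base and insize are ordered by outsize alone.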
2999
3000/// A call argument
3001class mcallarg_t : public mop_t // #callarg
3002{
3003public:
3004 ea_t ea = BADADDR; ///< address where the argument was initialized.
3005 ///< BADADDR means unknown.
3006 tinfo_t type; ///< formal argument type
3007 qstring name; ///< formal argument name
3008 argloc_t argloc; ///< ida argloc
3009 uint32 flags = 0; ///< FAI_...
3010
3011 mcallarg_t() {}
3012 mcallarg_t(const mop_t &rarg) : mop_t(rarg) {}
3013 void copy_mop(const mop_t &op) { *(mop_t *)this = op; }
3014 void hexapi print(qstring *vout, int shins_flags=SHINS_SHORT|SHINS_VALNUM) const;
3015 const char *hexapi dstr() const;
3016 void hexapi set_regarg(mreg_t mr, int sz, const tinfo_t &tif);
3017 void set_regarg(mreg_t mr, const tinfo_t &tif)
3018 {
3019 set_regarg(mr, tif.get_size(), tif);
3020 }
3021 void set_regarg(mreg_t mr, char dt, type_sign_t sign = type_unsigned)
3022 {
3023 int sz = get_dtype_size(dt);
3024 set_regarg(mr, sz, get_int_type_by_width_and_sign(sz, sign));
3025 }
3026 void make_int(int val, ea_t val_ea, int opno = 0)
3027 {
3028 type = tinfo_t(BTF_INT);
3029 make_number(val, inf_get_cc_size_i(), val_ea, opno);
3030 }
3031 void make_uint(int val, ea_t val_ea, int opno = 0)
3032 {
3033 type = tinfo_t(BTF_UINT);
3034 make_number(val, inf_get_cc_size_i(), val_ea, opno);
3035 }
3036};
3037DECLARE_TYPE_AS_MOVABLE(mcallarg_t);
3038typedef qvector<mcallarg_t> mcallargs_t;
3039
3040/// Function roles.
3041/// They are used to calculate use/def lists and to recognize functions
3042/// without using string comparisons.
3044{
3045 ROLE_UNK, ///< unknown function role
3046 ROLE_EMPTY, ///< empty, does not do anything (maybe spoils regs)
3047 ROLE_MEMSET, ///< memset(void *dst, uchar value, size_t count);
3048 ROLE_MEMSET32, ///< memset32(void *dst, uint32 value, size_t count);
3049 ROLE_MEMSET64, ///< memset64(void *dst, uint64 value, size_t count);
3050 ROLE_MEMCPY, ///< memcpy(void *dst, const void *src, size_t count);
3051 ROLE_STRCPY, ///< strcpy(char *dst, const char *src);
3052 ROLE_STRLEN, ///< strlen(const char *src);
3053 ROLE_STRCAT, ///< strcat(char *dst, const char *src);
3054 ROLE_TAIL, ///< char *tail(const char *str);
3055 ROLE_BUG, ///< BUG() helper macro: never returns, causes exception
3056 ROLE_ALLOCA, ///< alloca() function
3057 ROLE_BSWAP, ///< bswap() function (any size)
3058 ROLE_PRESENT, ///< present() function (used in patterns)
3059 ROLE_CONTAINING_RECORD, ///< CONTAINING_RECORD() macro
3060 ROLE_FASTFAIL, ///< __fastfail()
3061 ROLE_READFLAGS, ///< __readeflags, __readcallersflags
3062 ROLE_IS_MUL_OK, ///< is_mul_ok
3063 ROLE_SATURATED_MUL, ///< saturated_mul
3064 ROLE_BITTEST, ///< [lock] bt
3065 ROLE_BITTESTANDSET, ///< [lock] bts
3066 ROLE_BITTESTANDRESET, ///< [lock] btr
3067 ROLE_BITTESTANDCOMPLEMENT, ///< [lock] btc
3068 ROLE_VA_ARG, ///< va_arg() macro
3069 ROLE_VA_COPY, ///< va_copy() function
3070 ROLE_VA_START, ///< va_start() function
3071 ROLE_VA_END, ///< va_end() function
3072 ROLE_ROL, ///< rotate left
3073 ROLE_ROR, ///< rotate right
3074 ROLE_CFSUB3, ///< carry flag after subtract with carry
3075 ROLE_OFSUB3, ///< overflow flag after subtract with carry
3076 ROLE_ABS, ///< integer absolute value
3077 ROLE_3WAYCMP0, ///< 3-way compare helper, returns -1/0/1
3078 ROLE_3WAYCMP1, ///< 3-way compare helper, returns 0/1/2
3079 ROLE_WMEMCPY, ///< wchar_t *wmemcpy(wchar_t *dst, const wchar_t *src, size_t n)
3080 ROLE_WMEMSET, ///< wchar_t *wmemset(wchar_t *dst, wchar_t wc, size_t n)
3081 ROLE_WCSCPY, ///< wchar_t *wcscpy(wchar_t *dst, const wchar_t *src);
3082 ROLE_WCSLEN, ///< size_t wcslen(const wchar_t *s)
3083 ROLE_WCSCAT, ///< wchar_t *wcscat(wchar_t *dst, const wchar_t *src)
3084 ROLE_SSE_CMP4, ///< e.g. _mm_cmpgt_ss
3085 ROLE_SSE_CMP8, ///< e.g. _mm_cmpgt_sd
3086};
3087
3088/// \defgroup FUNC_NAME_ Well known function names
3089//@{
3090#define FUNC_NAME_MEMCPY "memcpy"
3091#define FUNC_NAME_WMEMCPY "wmemcpy"
3092#define FUNC_NAME_MEMSET "memset"
3093#define FUNC_NAME_WMEMSET "wmemset"
3094#define FUNC_NAME_MEMSET32 "memset32"
3095#define FUNC_NAME_MEMSET64 "memset64"
3096#define FUNC_NAME_STRCPY "strcpy"
3097#define FUNC_NAME_WCSCPY "wcscpy"
3098#define FUNC_NAME_STRLEN "strlen"
3099#define FUNC_NAME_WCSLEN "wcslen"
3100#define FUNC_NAME_STRCAT "strcat"
3101#define FUNC_NAME_WCSCAT "wcscat"
3102#define FUNC_NAME_TAIL "tail"
3103#define FUNC_NAME_VA_ARG "va_arg"
3104#define FUNC_NAME_EMPTY "$empty"
3105#define FUNC_NAME_PRESENT "$present"
3106#define FUNC_NAME_CONTAINING_RECORD "CONTAINING_RECORD"
3107//@}
3108
3109
3110// the default 256 function arguments is too big, we use a lower value
3111#undef MAX_FUNC_ARGS
3112#define MAX_FUNC_ARGS 64
3113
3114/// Information about a call
3115class mcallinfo_t // #callinfo
3116{
3117public:
3118 ea_t callee; ///< address of the called function, if known
3119 int solid_args; ///< number of solid args.
3120 ///< there may be variadic args in addition
3121 int call_spd; ///< sp value at call insn
3122 int stkargs_top; ///< first offset past stack arguments
3123 cm_t cc; ///< calling convention
3124 mcallargs_t args; ///< call arguments
3125 mopvec_t retregs; ///< return register(s) (e.g., AX, AX:DX, etc.)
3126 ///< this vector is built from return_regs
3127 tinfo_t return_type; ///< type of the returned value
3128 argloc_t return_argloc; ///< location of the returned value
3129
3130 mlist_t return_regs; ///< list of values returned by the function
3131 mlist_t spoiled; ///< list of spoiled locations (includes return_regs)
3132 mlist_t pass_regs; ///< passthrough registers: registers that depend on input
3133 ///< values (subset of spoiled)
3134 ivlset_t visible_memory; ///< what memory is visible to the call?
3135 mlist_t dead_regs; ///< registers defined by the function but never used.
3136 ///< upon propagation we do the following:
3137 ///< - dead_regs += return_regs
3138 ///< - retregs.clear() since the call is propagated
3139 int flags; ///< combination of \ref FCI_... bits
3140/// \defgroup FCI_ Call properties
3141//@{
3142#define FCI_PROP 0x001 ///< call has been propagated
3143#define FCI_DEAD 0x002 ///< some return registers were determined dead
3144#define FCI_FINAL 0x004 ///< call type is final, should not be changed
3145#define FCI_NORET 0x008 ///< call does not return
3146#define FCI_PURE 0x010 ///< pure function
3147#define FCI_NOSIDE 0x020 ///< call does not have side effects
3148#define FCI_SPLOK 0x040 ///< spoiled/visible_memory lists have been
3149 ///< optimized. for some functions we can reduce them
3150 ///< as soon as information about the arguments becomes
3151 ///< available. we set this bit to avoid trying to
3152 ///< optimize them again.
3153#define FCI_HASCALL 0x080 ///< The function is a synthetic helper combined
3154 ///< from several instructions, at least one
3155 ///< of which was a call to a real function
3156#define FCI_HASFMT 0x100 ///< A variadic function with recognized
3157 ///< printf- or scanf-style format string
3158#define FCI_EXPLOCS 0x400 ///< all arglocs are specified explicitly
3159//@}
3160 funcrole_t role; ///< function role
3161 type_attrs_t fti_attrs; ///< extended function attributes
3162
3163 mcallinfo_t(ea_t _callee=BADADDR, int _sargs=0)
3164 : callee(_callee), solid_args(_sargs), call_spd(0), stkargs_top(0),
3165 cc(CM_CC_INVALID), flags(0), role(ROLE_UNK) {}
3166 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
3167 int hexapi lexcompare(const mcallinfo_t &f) const;
3168 bool hexapi set_type(const tinfo_t &type);
3169 tinfo_t hexapi get_type(void) const;
3170 bool is_vararg(void) const { return is_vararg_cc(cc); }
3171 void hexapi print(qstring *vout, int size=-1, int shins_flags=SHINS_SHORT|SHINS_VALNUM) const;
3172 const char *hexapi dstr() const;
3173};
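
// Illustration only, not part of hexrays.hpp: a minimal standalone sketch of
// how a bit mask such as mcallinfo_t::flags can be set, cleared and tested.
// The constant values are copied from the FCI_ group above; the namespace and
// the helper names (fci_sketch, has_all, has_any, with, without) are
// hypothetical and exist only for this example.

```cpp
#include <cassert>

namespace fci_sketch
{
  // Values copied from the FCI_ group; the names are local stand-ins.
  constexpr int PROP   = 0x001;  // call has been propagated
  constexpr int NORET  = 0x008;  // call does not return
  constexpr int PURE   = 0x010;  // pure function
  constexpr int NOSIDE = 0x020;  // call does not have side effects

  // True if every bit of 'mask' is set in 'flags'.
  inline bool has_all(int flags, int mask) { return (flags & mask) == mask; }

  // True if at least one bit of 'mask' is set in 'flags'.
  inline bool has_any(int flags, int mask) { return (flags & mask) != 0; }

  // Return 'flags' with the bits of 'mask' switched on.
  inline int with(int flags, int mask) { return flags | mask; }

  // Return 'flags' with the bits of 'mask' switched off.
  inline int without(int flags, int mask) { return flags & ~mask; }

  // Small self-check showing the helpers working together.
  inline void self_check()
  {
    int flags = with(0, PURE | NOSIDE);
    assert(has_all(flags, PURE | NOSIDE));
    assert(!has_any(flags, NORET));
    flags = without(flags, PURE);
    assert(!has_all(flags, PURE | NOSIDE) && has_any(flags, PURE | NOSIDE));
  }
}
```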
3174
3175/// List of switch cases and targets
3176class mcases_t // #cases
3177{
3178public:
3179 casevec_t values; ///< expression values for each target
3180 intvec_t targets; ///< target block numbers
3181
3182 void swap(mcases_t &r) { values.swap(r.values); targets.swap(r.targets); }
3183 DECLARE_COMPARISONS(mcases_t);
3184 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
3185 bool empty(void) const { return targets.empty(); }
3186 size_t size(void) const { return targets.size(); }
3187 void resize(int s) { values.resize(s); targets.resize(s); }
3188 void hexapi print(qstring *vout) const;
3189 const char *hexapi dstr() const;
3190};
3191
3192//-------------------------------------------------------------------------
3193/// Value offset (microregister number or stack offset)
3194struct voff_t
3195{
3196 sval_t off; ///< register number or stack offset
3197 mopt_t type; ///< mop_r - register, mop_S - stack, mop_z - undefined
3198
3199 voff_t() : off(-1), type(mop_z) {}
3200 voff_t(mopt_t _type, sval_t _off) : off(_off), type(_type) {}
3201 voff_t(const mop_t &op) : off(-1), type(mop_z)
3202 {
3203 if ( op.is_reg() || op.t == mop_S )
3204 set(op.t, op.is_reg() ? op.r : op.s->off);
3205 }
3206
3207 void set(mopt_t _type, sval_t _off) { type = _type; off = _off; }
3208 void set_stkoff(sval_t stkoff) { set(mop_S, stkoff); }
3209 void set_reg(mreg_t mreg) { set(mop_r, mreg); }
3210 void undef() { set(mop_z, -1); }
3211
3212 bool defined() const { return type != mop_z; }
3213 bool is_reg() const { return type == mop_r; }
3214 bool is_stkoff() const { return type == mop_S; }
3215 mreg_t get_reg() const { QASSERT(51892, is_reg()); return off; }
3216 sval_t get_stkoff() const { QASSERT(51893, is_stkoff()); return off; }
3217
3218 void inc(sval_t delta) { off += delta; }
3219 voff_t add(int width) const { return voff_t(type, off+width); }
3220 sval_t diff(const voff_t &r) const { QASSERT(51894, type == r.type); return off - r.off; }
3221
3222 DECLARE_COMPARISONS(voff_t)
3223 {
3224 int code = ::compare(type, r.type);
3225 return code != 0 ? code : ::compare(off, r.off);
3226 }
3227};
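
// Illustration only, not part of hexrays.hpp: a standalone sketch of the
// two-key ordering used by the comparison above (operand type first, then
// offset). The types kind_t, off_key_t and less_t are hypothetical stand-ins;
// the numeric values of the kinds are arbitrary and do not correspond to the
// real mop_r / mop_S constants.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

namespace voff_sketch
{
  enum kind_t { KIND_REG = 1, KIND_STK = 2 };  // stand-ins for mop_r / mop_S

  struct off_key_t
  {
    std::int64_t off;  // register number or stack offset
    kind_t kind;       // what 'off' refers to
  };

  // Three-way comparison of two integers: -1, 0 or 1.
  inline int compare3(std::int64_t a, std::int64_t b)
  {
    return a < b ? -1 : a > b ? 1 : 0;
  }

  // Compare by kind first; fall back to the offset only for equal kinds.
  inline int compare(const off_key_t &a, const off_key_t &b)
  {
    int code = compare3(a.kind, b.kind);
    return code != 0 ? code : compare3(a.off, b.off);
  }

  // Strict weak ordering built on top of compare(), usable with std::set.
  struct less_t
  {
    bool operator()(const off_key_t &a, const off_key_t &b) const
    {
      return compare(a, b) < 0;
    }
  };
}
```

// With such an ordering all register keys sort before all stack keys, and
// keys of the same kind sort by offset.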
3228
3229//-------------------------------------------------------------------------
3230/// Value interval (register or stack range)
3231struct vivl_t : voff_t
3232{
3233 int size; ///< Interval size in bytes
3234
3235 vivl_t(mopt_t _type = mop_z, sval_t _off = -1, int _size = 0)
3236 : voff_t(_type, _off), size(_size) {}
3237 vivl_t(const class chain_t &ch);
3238 vivl_t(const mop_t &op) : voff_t(op), size(op.size) {}
3239
3240 // Make a value interval
3241 void set(mopt_t _type, sval_t _off, int _size = 0)
3242 { voff_t::set(_type, _off); size = _size; }
3243 void set(const voff_t &voff, int _size)
3244 { set(voff.type, voff.off, _size); }
3245 void set_stkoff(sval_t stkoff, int sz = 0) { set(mop_S, stkoff, sz); }
3246 void set_reg (mreg_t mreg, int sz = 0) { set(mop_r, mreg, sz); }
3247
3248 /// Extend a value interval using another value interval of the same type
3249 /// \return success
3250 bool hexapi extend_to_cover(const vivl_t &r);
3251
3252 /// Intersect value intervals of the same type
3253 /// \return size of the resulting intersection
3254 uval_t hexapi intersect(const vivl_t &r);
3255
3256 /// Do two value intervals overlap?
3257 bool overlap(const vivl_t &r) const
3258 {
3259 return type == r.type
3260 && interval::overlap(off, size, r.off, r.size);
3261 }
3262 /// Does our value interval include another?
3263 bool includes(const vivl_t &r) const
3264 {
3265 return type == r.type
3266 && interval::includes(off, size, r.off, r.size);
3267 }
3268
3269 /// Does our value interval contain the specified value offset?
3270 bool contains(const voff_t &voff2) const
3271 {
3272 return type == voff2.type
3273 && interval::contains(off, size, voff2.off);
3274 }
3275
3276 // Comparisons
3277 DECLARE_COMPARISONS(vivl_t)
3278 {
3279 int code = voff_t::compare(r);
3280 return code; //return code != 0 ? code : ::compare(size, r.size);
3281 }
3282 bool operator==(const mop_t &mop) const
3283 {
3284 return type == mop.t && off == (mop.is_reg() ? mop.r : mop.s->off);
3285 }
3286 void hexapi print(qstring *vout) const;
3287 const char *hexapi dstr() const;
3288};
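
// Illustration only, not part of hexrays.hpp: a standalone sketch of the three
// interval predicates (overlap, includes, contains) on half-open byte ranges
// [off, off+size). The real interval:: helpers are defined elsewhere in the
// IDA SDK and may treat edge cases (for example zero-sized ranges)
// differently; range_t and the functions below are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>

namespace ivl_sketch
{
  // Half-open range [off, off+size).
  struct range_t
  {
    std::int64_t off;
    std::int64_t size;
  };

  // Do the two ranges share at least one byte?
  inline bool overlap(const range_t &a, const range_t &b)
  {
    return a.off < b.off + b.size && b.off < a.off + a.size;
  }

  // Does 'a' cover every byte of 'b'?
  inline bool includes(const range_t &a, const range_t &b)
  {
    return a.off <= b.off && b.off + b.size <= a.off + a.size;
  }

  // Does 'a' cover the single offset 'off'?
  inline bool contains(const range_t &a, std::int64_t off)
  {
    return a.off <= off && off < a.off + a.size;
  }

  // Small self-check: adjacent ranges touch but do not overlap.
  inline void self_check()
  {
    range_t lo{8, 4}, hi{12, 4};
    assert(!overlap(lo, hi));
    assert(contains(lo, 11) && !contains(lo, 12));
  }
}
```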
3289
3290//-------------------------------------------------------------------------
3291/// ud (use->def) and du (def->use) chain.
3292/// We store in chains only the block numbers, not individual instructions
3293/// See https://en.wikipedia.org/wiki/Use-define_chain
3294class chain_t : public intvec_t // sequence of block numbers
3295{
3296 voff_t k; ///< Value offset of the chain.
3297 ///< (what variable is this chain about)
3298
3299public:
3300 int width; ///< size of the value in bytes
3301 int varnum; ///< allocated variable index (-1 - not allocated yet)
3302 uchar flags; ///< combination of \ref CHF_ bits
3303/// \defgroup CHF_ Chain properties
3304//@{
3305#define CHF_INITED 0x01 ///< is chain initialized? (valid only after lvar allocation)
3306#define CHF_REPLACED 0x02 ///< chain operands have been replaced?
3307#define CHF_OVER 0x04 ///< overlapped chain
3308#define CHF_FAKE 0x08 ///< fake chain created by widen_chains()
3309#define CHF_PASSTHRU 0x10 ///< pass-thru chain, must use the input variable to the block
3310#define CHF_TERM 0x20 ///< terminating chain; the variable does not survive across the block
3311//@}
3312 chain_t() : width(0), varnum(-1), flags(CHF_INITED) {}
3313 chain_t(mopt_t t, sval_t off, int w=1, int v=-1)
3314 : k(t, off), width(w), varnum(v), flags(CHF_INITED) {}
3315 chain_t(const voff_t &_k, int w=1)
3316 : k(_k), width(w), varnum(-1), flags(CHF_INITED) {}
3317 void set_value(const chain_t &r)
3318 { width = r.width; varnum = r.varnum; flags = r.flags; *(intvec_t *)this = (intvec_t &)r; }
3319 const voff_t &key() const { return k; }
3320 bool is_inited(void) const { return (flags & CHF_INITED) != 0; }
3321 bool is_reg(void) const { return k.is_reg(); }
3322 bool is_stkoff(void) const { return k.is_stkoff(); }
3323 bool is_replaced(void) const { return (flags & CHF_REPLACED) != 0; }
3324 bool is_overlapped(void) const { return (flags & CHF_OVER) != 0; }
3325 bool is_fake(void) const { return (flags & CHF_FAKE) != 0; }
3326 bool is_passreg(void) const { return (flags & CHF_PASSTHRU) != 0; }
3327 bool is_term(void) const { return (flags & CHF_TERM) != 0; }
3328 void set_inited(bool b) { setflag(flags, CHF_INITED, b); }
3329 void set_replaced(bool b) { setflag(flags, CHF_REPLACED, b); }
3330 void set_overlapped(bool b) { setflag(flags, CHF_OVER, b); }
3331 void set_term(bool b) { setflag(flags, CHF_TERM, b); }
3332 mreg_t get_reg() const { return k.get_reg(); }
3333 sval_t get_stkoff() const { return k.get_stkoff(); }
3334 bool overlap(const chain_t &r) const
3335 { return k.type == r.k.type && interval::overlap(k.off, width, r.k.off, r.width); }
3336 bool includes(const chain_t &r) const
3337 { return k.type == r.k.type && interval::includes(k.off, width, r.k.off, r.width); }
3338 const voff_t endoff() const { return k.add(width); }
3339
3340 bool operator<(const chain_t &r) const { return key() < r.key(); }
3341
3342 void hexapi print(qstring *vout) const;
3343 const char *hexapi dstr() const;
3344 /// Append the contents of the chain to the specified list of locations.
3345 void hexapi append_list(const mba_t *mba, mlist_t *list) const;
3346 void clear_varnum(void) { varnum = -1; set_replaced(false); }
3347};
3348
3349//-------------------------------------------------------------------------
3350#if defined(__NT__)
3351#define SIZEOF_BLOCK_CHAINS 24
3352#elif defined(__MAC__)
3353#define SIZEOF_BLOCK_CHAINS 32
3354#else
3355#define SIZEOF_BLOCK_CHAINS 56
3356#endif
3357/// Chains of one block.
3358/// Please note that this class is based on std::map and it must be accessed
3359/// using the block_chains_begin(), block_chains_find() and similar functions.
3360/// This is required because different compilers use different implementations
3361/// of std::map. Also, since the size of std::map depends on the compilation
3362/// options, we replace it with a byte array.
3363class block_chains_t
3364{
3365 size_t body[SIZEOF_BLOCK_CHAINS/sizeof(size_t)]; // opaque std::set, uncopyable
3366public:
3367
3368 /// Get chain for the specified register
3369 /// \param reg register number
3370 /// \param width size of register in bytes
3371 const chain_t *get_reg_chain(mreg_t reg, int width=1) const
3372 { return get_chain((chain_t(mop_r, reg, width))); }
3373 chain_t *get_reg_chain(mreg_t reg, int width=1)
3374 { return get_chain((chain_t(mop_r, reg, width))); }
3375
3376 /// Get chain for the specified stack offset
3377 /// \param off stack offset
3378 /// \param width size of stack value in bytes
3379 const chain_t *get_stk_chain(sval_t off, int width=1) const
3380 { return get_chain(chain_t(mop_S, off, width)); }
3381 chain_t *get_stk_chain(sval_t off, int width=1)
3382 { return get_chain(chain_t(mop_S, off, width)); }
3383
3384 /// Get chain for the specified value offset.
3385 /// \param k value offset (register number or stack offset)
3386 /// \param width size of value in bytes
3387 const chain_t *get_chain(const voff_t &k, int width=1) const
3388 { return get_chain(chain_t(k, width)); }
3389 chain_t *get_chain(const voff_t &k, int width=1)
3390 { return (chain_t*)((const block_chains_t *)this)->get_chain(k, width); }
3391
3392 /// Get chain similar to the specified chain
3393 /// \param ch chain to search for. only its 'k' and 'width' are used.
3394 const chain_t *hexapi get_chain(const chain_t &ch) const;
3395 chain_t *get_chain(const chain_t &ch)
3396 { return (chain_t*)((const block_chains_t *)this)->get_chain(ch); }
3397
3398 void hexapi print(qstring *vout) const;
3399 const char *hexapi dstr() const;
3400 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
3401};
3402//-------------------------------------------------------------------------
3403/// Chain visitor class
3404struct chain_visitor_t
3405{
3406 block_chains_t *parent; ///< parent of the current chain
3407 chain_visitor_t(void) : parent(nullptr) {}
3408 virtual int idaapi visit_chain(int nblock, chain_t &ch) = 0;
3409};
3410
3411//-------------------------------------------------------------------------
3412/// Graph chains.
3413/// This class represents all ud and du chains of the decompiled function
3414typedef qvector<block_chains_t> block_chains_vec_t;
3415class graph_chains_t : public block_chains_vec_t
3416{
3417 int lock; ///< are chains locked? (in-use)
3418public:
3419 graph_chains_t(void) : lock(0) {}
3420 ~graph_chains_t(void) { QASSERT(50444, !lock); }
3421 /// Visit all chains
3422 /// \param cv chain visitor
3423 /// \param gca_flags combination of GCA_ bits
3424 int hexapi for_all_chains(chain_visitor_t &cv, int gca_flags);
3425 /// \defgroup GCA_ chain visitor flags
3426 //@{
3427#define GCA_EMPTY 0x01 ///< include empty chains
3428#define GCA_SPEC 0x02 ///< include chains for special registers
3429#define GCA_ALLOC 0x04 ///< enumerate only allocated chains
3430#define GCA_NALLOC 0x08 ///< enumerate only non-allocated chains
3431#define GCA_OFIRST 0x10 ///< consider only chains of the first block
3432#define GCA_OLAST 0x20 ///< consider only chains of the last block
3433 //@}
3434 /// Are the chains locked?
3435 /// It is a good idea to lock the chains before using them. This ensures
3436 /// that they won't be recalculated and reallocated during the use.
3437 /// See the \ref chain_keeper_t class for that.
3438 bool is_locked(void) const { return lock != 0; }
3439 /// Lock the chains
3440 void acquire(void) { lock++; }
3441 /// Unlock the chains
3442 void hexapi release(void);
3443 void swap(graph_chains_t &r)
3444 {
3445 qvector<block_chains_t>::swap(r);
3446 std::swap(lock, r.lock);
3447 }
3448};
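
// Illustration only, not part of hexrays.hpp: a standalone sketch of the
// counted lock described above, wrapped in a scope guard so that every
// acquire() is paired with a release(). chains_stub_t and keeper_t are
// hypothetical stand-ins; the real release() is implemented inside the
// decompiler and may do more than decrement a counter, and the real
// chain_keeper_t may differ from this guard.

```cpp
#include <cassert>

namespace lock_sketch
{
  // Stand-in for the lock counter kept by the chain container.
  class chains_stub_t
  {
    int lock = 0;
  public:
    bool is_locked() const { return lock != 0; }
    void acquire() { ++lock; }
    void release() { assert(lock > 0); --lock; }  // assumption: plain decrement
  };

  // Scope guard: locks in the constructor, unlocks in the destructor.
  class keeper_t
  {
    chains_stub_t &gc;
  public:
    explicit keeper_t(chains_stub_t &g) : gc(g) { gc.acquire(); }
    ~keeper_t() { gc.release(); }
    keeper_t(const keeper_t &) = delete;             // copying would unbalance the count
    keeper_t &operator=(const keeper_t &) = delete;
  };
}
```

// Because the lock is a counter, nested guards are fine: the object stays
// locked until the outermost guard goes out of scope.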
3449//-------------------------------------------------------------------------
3450/// Microinstruction class #insn
3451class minsn_t
3452{
3453 void hexapi init(ea_t _ea);
3454 void hexapi copy(const minsn_t &m);
3455public:
3456 mcode_t opcode; ///< instruction opcode
3457 int iprops; ///< combination of \ref IPROP_ bits
3458 minsn_t *next; ///< next insn in doubly linked list. check also nexti()
3459 minsn_t *prev; ///< prev insn in doubly linked list. check also previ()
3460 ea_t ea; ///< instruction address
3461 mop_t l; ///< left operand
3462 mop_t r; ///< right operand
3463 mop_t d; ///< destination operand
3464
3465 /// \defgroup IPROP_ instruction property bits
3466 //@{
3467 // bits to be used in patterns:
3468#define IPROP_OPTIONAL 0x0001 ///< optional instruction
3469#define IPROP_PERSIST 0x0002 ///< persistent insn; they are not destroyed
3470#define IPROP_WILDMATCH 0x0004 ///< match multiple insns
3471
3472 // instruction attributes:
3473#define IPROP_CLNPOP 0x0008 ///< the purpose of the instruction is to clean stack
3474 ///< (e.g. "pop ecx" is often used for that)
3475#define IPROP_FPINSN 0x0010 ///< floating point insn
3476#define IPROP_FARCALL 0x0020 ///< call of a far function using push cs/call sequence
3477#define IPROP_TAILCALL 0x0040 ///< tail call
3478#define IPROP_ASSERT 0x0080 ///< assertion: usually mov #val, op.
3479 ///< assertions are used to help the optimizer.
3480 ///< assertions are ignored when generating ctree
3481
3482 // instruction history:
3483#define IPROP_SPLIT 0x0700 ///< the instruction has been split:
3484#define IPROP_SPLIT1 0x0100 ///< into 1 byte
3485#define IPROP_SPLIT2 0x0200 ///< into 2 bytes
3486#define IPROP_SPLIT4 0x0300 ///< into 4 bytes
3487#define IPROP_SPLIT8 0x0400 ///< into 8 bytes
3488#define IPROP_COMBINED 0x0800 ///< insn has been modified because of a partial reference
3489#define IPROP_EXTSTX 0x1000 ///< this is m_ext propagated into m_stx
3490#define IPROP_IGNLOWSRC 0x2000 ///< low part of the instruction source operand
3491 ///< has been created artificially
3492 ///< (this bit is used only for 'and x, 80...')
3493#define IPROP_INV_JX 0x4000 ///< inverted conditional jump
3494#define IPROP_WAS_NORET 0x8000 ///< was noret icall
3495#define IPROP_MULTI_MOV 0x10000 ///< the minsn was generated as part of insn that moves multiple registers
3496 ///< (example: STM on ARM may transfer multiple registers)
3497
3498 ///< bits that can be set by plugins:
3499#define IPROP_DONT_PROP 0x20000 ///< may not propagate
3500#define IPROP_DONT_COMB 0x40000 ///< may not combine this instruction with others
3501#define IPROP_MBARRIER 0x80000 ///< this instruction acts as a memory barrier
3502 ///< (instructions accessing memory may not be reordered past it)
3503#define IPROP_UNMERGED 0x100000 ///< 'goto' instruction was transformed into 'call'
3504 //@}
3505
3506 bool is_optional(void) const { return (iprops & IPROP_OPTIONAL) != 0; }
3507 bool is_combined(void) const { return (iprops & IPROP_COMBINED) != 0; }
3508 bool is_farcall(void) const { return (iprops & IPROP_FARCALL) != 0; }
3509 bool is_cleaning_pop(void) const { return (iprops & IPROP_CLNPOP) != 0; }
3510 bool is_extstx(void) const { return (iprops & IPROP_EXTSTX) != 0; }
3511 bool is_tailcall(void) const { return (iprops & IPROP_TAILCALL) != 0; }
3512 bool is_fpinsn(void) const { return (iprops & IPROP_FPINSN) != 0; }
3513 bool is_assert(void) const { return (iprops & IPROP_ASSERT) != 0; }
3514 bool is_persistent(void) const { return (iprops & IPROP_PERSIST) != 0; }
3515 bool is_wild_match(void) const { return (iprops & IPROP_WILDMATCH) != 0; }
3516 bool is_propagatable(void) const { return (iprops & IPROP_DONT_PROP) == 0; }
3517 bool is_ignlowsrc(void) const { return (iprops & IPROP_IGNLOWSRC) != 0; }
3518 bool is_inverted_jx(void) const { return (iprops & IPROP_INV_JX) != 0; }
3519 bool was_noret_icall(void) const { return (iprops & IPROP_WAS_NORET) != 0; }
3520 bool is_multimov(void) const { return (iprops & IPROP_MULTI_MOV) != 0; }
3521 bool is_combinable(void) const { return (iprops & IPROP_DONT_COMB) == 0; }
3522 bool was_split(void) const { return (iprops & IPROP_SPLIT) != 0; }
3523 bool is_mbarrier(void) const { return (iprops & IPROP_MBARRIER) != 0; }
3524 bool was_unmerged(void) const { return (iprops & IPROP_UNMERGED) != 0; }
3525
3526 void set_optional(void) { iprops |= IPROP_OPTIONAL; }
3527 void hexapi set_combined(void);
3528 void clr_combined(void) { iprops &= ~IPROP_COMBINED; }
3529 void set_farcall(void) { iprops |= IPROP_FARCALL; }
3530 void set_cleaning_pop(void) { iprops |= IPROP_CLNPOP; }
3531 void set_extstx(void) { iprops |= IPROP_EXTSTX; }
3532 void set_tailcall(void) { iprops |= IPROP_TAILCALL; }
3533 void clr_tailcall(void) { iprops &= ~IPROP_TAILCALL; }
3534 void set_fpinsn(void) { iprops |= IPROP_FPINSN; }
3535 void clr_fpinsn(void) { iprops &= ~IPROP_FPINSN; }
3536 void set_assert(void) { iprops |= IPROP_ASSERT; }
3537 void clr_assert(void) { iprops &= ~IPROP_ASSERT; }
3538 void set_persistent(void) { iprops |= IPROP_PERSIST; }
3539 void set_wild_match(void) { iprops |= IPROP_WILDMATCH; }
3540 void clr_propagatable(void) { iprops |= IPROP_DONT_PROP; }
3541 void set_ignlowsrc(void) { iprops |= IPROP_IGNLOWSRC; }
3542 void clr_ignlowsrc(void) { iprops &= ~IPROP_IGNLOWSRC; }
3543 void set_inverted_jx(void) { iprops |= IPROP_INV_JX; }
3544 void set_noret_icall(void) { iprops |= IPROP_WAS_NORET; }
3545 void clr_noret_icall(void) { iprops &= ~IPROP_WAS_NORET; }
3546 void set_multimov(void) { iprops |= IPROP_MULTI_MOV; }
3547 void clr_multimov(void) { iprops &= ~IPROP_MULTI_MOV; }
3548 void set_combinable(void) { iprops &= ~IPROP_DONT_COMB; }
3549 void clr_combinable(void) { iprops |= IPROP_DONT_COMB; }
3550 void set_mbarrier(void) { iprops |= IPROP_MBARRIER; }
3551 void set_unmerged(void) { iprops |= IPROP_UNMERGED; }
3552 void set_split_size(int s)
3553 { // s may be only 1,2,4,8. other values are ignored
3554 iprops &= ~IPROP_SPLIT;
3555 iprops |= (s == 1 ? IPROP_SPLIT1
3556 : s == 2 ? IPROP_SPLIT2
3557 : s == 4 ? IPROP_SPLIT4
3558 : s == 8 ? IPROP_SPLIT8 : 0);
3559 }
3560 int get_split_size(void) const
3561 {
3562 int cnt = (iprops & IPROP_SPLIT) >> 8;
3563 return cnt == 0 ? 0 : 1 << (cnt-1);
3564 }
3565
3566 /// Constructor
3567 minsn_t(ea_t _ea) { init(_ea); }
3568 minsn_t(const minsn_t &m) { next = prev = nullptr; copy(m); } //-V1077 uninitialized: opcode, iprops, ea
3569 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
3570
3571 /// Assignment operator. It does not copy prev/next fields.
3572 minsn_t &operator=(const minsn_t &m) { copy(m); return *this; }
3573
3574 /// Swap two instructions.
3575 /// The prev/next fields are not modified by this function
3576 /// because it would corrupt the doubly linked list.
3577 void hexapi swap(minsn_t &m);
3578
3579 /// Generate insn text into the buffer
3580 void hexapi print(qstring *vout, int shins_flags=SHINS_SHORT|SHINS_VALNUM) const;
3581
3582 /// Get displayable text without tags in a static buffer
3583 const char *hexapi dstr() const;
3584
3585 /// Change the instruction address.
3586 /// This function modifies subinstructions as well.
3587 void hexapi setaddr(ea_t new_ea);
3588
3589 /// Optimize one instruction without context.
3590 /// This function does not have access to the instruction context (the
3591 /// previous and next instructions in the list, the block number, etc).
3592 /// It performs only basic optimizations that are available without this info.
3593 /// \param optflags combination of \ref OPTI_ bits
3594 /// \return number of changes, 0-unchanged
3595 /// See also mblock_t::optimize_insn()
3596 int optimize_solo(int optflags=0) { return optimize_subtree(nullptr, nullptr, nullptr, nullptr, optflags); }
3597 /// \defgroup OPTI_ optimization flags
3598 //@{
3599#define OPTI_ADDREXPRS 0x0001 ///< optimize all address expressions (&x+N; &x-&y)
3600#define OPTI_MINSTKREF 0x0002 ///< may update minstkref
3601#define OPTI_COMBINSNS 0x0004 ///< may combine insns (only for optimize_insn)
3602#define OPTI_NO_LDXOPT 0x0008 ///< the function is called after the
3603 ///< propagation attempt, we do not optimize
3604 ///< low/high(ldx) in this case
3605 //@}
3606
3607 /// Optimize instruction in its context.
3608 /// Do not use this function, use mblock_t::optimize()
3609 int hexapi optimize_subtree(
3610 mblock_t *blk,
3611 minsn_t *top,
3612 minsn_t *parent,
3613 ea_t *converted_call,
3614 int optflags=OPTI_MINSTKREF);
3615
3616 /// Visit all instruction operands.
3617 /// This function visits subinstruction operands as well.
3618 /// \param mv operand visitor
3619 /// \return non-zero value returned by mv.visit_mop() or zero
3620 int hexapi for_all_ops(mop_visitor_t &mv);
3621
3622 /// Visit all instructions.
3623 /// This function visits the instruction itself and all its subinstructions.
3624 /// \param mv instruction visitor
3625 /// \return non-zero value returned by mv.visit_minsn() or zero
3626 int hexapi for_all_insns(minsn_visitor_t &mv);
3627
3628 /// Convert instruction to nop.
3629 /// This function erases all info but the prev/next fields.
3630 /// In most cases it is better to use mblock_t::make_nop(), which also
3631 /// marks the block lists as dirty.
3632 void hexapi _make_nop(void);
3633
3634 /// Compare instructions.
3635 /// This is the main comparison function for instructions.
3636 /// \param m instruction to compare with
3637 /// \param eqflags combination of \ref EQ_ bits
3638 bool hexapi equal_insns(const minsn_t &m, int eqflags) const; // intelligent comparison
3639 /// \defgroup EQ_ comparison bits
3640 //@{
3641#define EQ_IGNSIZE 0x0001 ///< ignore source operand sizes
3642#define EQ_IGNCODE 0x0002 ///< ignore instruction opcodes
3643#define EQ_CMPDEST 0x0004 ///< compare instruction destinations
3644#define EQ_OPTINSN 0x0008 ///< optimize mop_d operands
3645 //@}
3646
3647 /// Lexicographical comparison
3648 /// It can be used to store minsn_t in various containers, like std::set
3649 bool operator <(const minsn_t &ri) const { return lexcompare(ri) < 0; }
3650 int hexapi lexcompare(const minsn_t &ri) const;
3651
3652 //-----------------------------------------------------------------------
3653 // Call instructions
3654 //-----------------------------------------------------------------------
3655 /// Is a non-returning call?
3656 /// \param flags combination of NORET_... bits
3657 bool hexapi is_noret_call(int flags=0);
3658#define NORET_IGNORE_WAS_NORET_ICALL 0x01 // ignore was_noret_icall() bit
3659#define NORET_FORBID_ANALYSIS 0x02 // forbid additional analysis
3660
3661 /// Is an unknown call?
3662 /// Unknown calls are calls without the argument list (mcallinfo_t).
3663 /// Usually the argument lists are determined by mba_t::analyze_calls().
3664 /// Unknown calls exist until the MMAT_CALLS maturity level.
3665 /// See also \ref mblock_t::is_call_block
3666 bool is_unknown_call(void) const { return is_mcode_call(opcode) && d.empty(); }
3667
3668 /// Is a helper call with the specified name?
3669 /// Helper calls usually have well-known function names (see \ref FUNC_NAME_)
3670 /// but they may have any other name. The decompiler does not assume any
3671 /// special meaning for non-well-known names.
3672 bool hexapi is_helper(const char *name) const;
3673
3674 /// Find a call instruction.
3675 /// Check for the current instruction and its subinstructions.
3676 /// \param with_helpers consider helper calls as well?
3677 minsn_t *hexapi find_call(bool with_helpers=false) const;
3678
3679 /// Does the instruction contain a call?
3680 bool contains_call(bool with_helpers=false) const { return find_call(with_helpers) != nullptr; }
3681
3682 /// Does the instruction have a side effect?
3683 /// \param include_ldx_and_divs consider ldx/div/mod as having side effects?
3684 /// stx is always considered as having side effects.
3685 /// Apart from ldx/stx only call may have side effects.
3686 bool hexapi has_side_effects(bool include_ldx_and_divs=false) const;
3687
3688 /// Get the function role of a call
3689 funcrole_t get_role(void) const { return d.is_arglist() ? d.f->role : ROLE_UNK; }
3690 bool is_memcpy(void) const { return get_role() == ROLE_MEMCPY; }
3691 bool is_memset(void) const { return get_role() == ROLE_MEMSET; }
3692 bool is_alloca(void) const { return get_role() == ROLE_ALLOCA; }
3693 bool is_bswap (void) const { return get_role() == ROLE_BSWAP; }
3694 bool is_readflags (void) const { return get_role() == ROLE_READFLAGS; }
3695
3696 //-----------------------------------------------------------------------
3697 // Misc
3698 //-----------------------------------------------------------------------
3699 /// Does the instruction have the specified opcode?
3700 /// This function searches subinstructions as well.
3701 /// \param mcode opcode to search for.
3702 bool contains_opcode(mcode_t mcode) const { return find_opcode(mcode) != nullptr; }
3703
3704 /// Find a (sub)instruction with the specified opcode.
3705 /// \param mcode opcode to search for.
3706 const minsn_t *find_opcode(mcode_t mcode) const { return (CONST_CAST(minsn_t*)(this))->find_opcode(mcode); }
3707 minsn_t *hexapi find_opcode(mcode_t mcode);
3708
3709 /// Find an operand that is a subinstruction with the specified opcode.
3710 /// This function checks only the 'l' and 'r' operands of the current insn.
3711 /// \param[out] other pointer to the other operand
3712 /// (&r if we return &l and vice versa)
3713 /// \param op opcode to search for
3714 /// \return &l or &r or nullptr
3715 const minsn_t *hexapi find_ins_op(const mop_t **other, mcode_t op=m_nop) const;
3716 minsn_t *find_ins_op(mop_t **other, mcode_t op=m_nop) { return CONST_CAST(minsn_t*)((CONST_CAST(const minsn_t*)(this))->find_ins_op((const mop_t**)other, op)); }
3717
3718 /// Find a numeric operand of the current instruction.
3719 /// This function checks only the 'l' and 'r' operands of the current insn.
3720 /// \param[out] other pointer to the other operand
3721 /// (&r if we return &l and vice versa)
3722 /// \return &l or &r or nullptr
3723 const mop_t *hexapi find_num_op(const mop_t **other) const;
3724 mop_t *find_num_op(mop_t **other) { return CONST_CAST(mop_t*)((CONST_CAST(const minsn_t*)(this))->find_num_op((const mop_t**)other)); }
3725
3726 bool is_mov(void) const { return opcode == m_mov || (opcode == m_f2f && l.size == d.size); }
3727 bool is_like_move(void) const { return is_mov() || is_mcode_xdsu(opcode) || opcode == m_low; }
3728
3729 /// Does the instruction modify its 'd' operand?
3730 /// Some instructions (e.g. m_stx) do not modify the 'd' operand.
3731 bool hexapi modifies_d(void) const;
3732 bool modifies_pair_mop(void) const { return d.t == mop_p && modifies_d(); }
3733
3734 /// Is the instruction in the specified range of instructions?
3735 /// \param m1 beginning of the range in the doubly linked list
3736 /// \param m2 end of the range in the doubly linked list (excluded, may be nullptr)
3737 /// This function assumes that m1 and m2 belong to the same basic block
3738 /// and they are top level instructions.
3739 bool hexapi is_between(const minsn_t *m1, const minsn_t *m2) const;
3740
3741 /// Is the instruction after the specified one?
3742 /// \param m the instruction to compare against in the list
3743 bool is_after(const minsn_t *m) const { return m != nullptr && is_between(m->next, nullptr); }
3744
3745 /// Is it possible for the instruction to use aliased memory?
3746 bool hexapi may_use_aliased_memory(void) const;
3747
3748 /// Serialize an instruction
3749 /// \param b the output buffer
3750 /// \return the serialization format that was used to store info
3751 int hexapi serialize(bytevec_t *b) const;
3752
3753 /// Deserialize an instruction
3754 /// \param bytes pointer to serialized data
3755 /// \param nbytes number of bytes to deserialize
3756 /// \param format_version serialization format version. this value is returned by minsn_t::serialize()
3757 /// \return success
3758 bool hexapi deserialize(const uchar *bytes, size_t nbytes, int format_version);
3759
3760};
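// The range helpers above (is_between, is_after) treat [m1, m2) as a half-open
// range in the doubly linked list, with m2 == nullptr meaning "up to the end of
// the list". The sketch below illustrates that convention only. insn_t, link()
// and the loop body are hypothetical stand-ins; the real is_between() is
// exported by the decompiler and its implementation is not shown in this file.

```cpp
#include <cassert>

namespace sketch
{
  // Hypothetical stand-in for minsn_t: only the list links are modeled.
  struct insn_t
  {
    insn_t *next = nullptr;
    insn_t *prev = nullptr;
  };

  // Helper for building a small list: make 'b' the successor of 'a'.
  inline void link(insn_t &a, insn_t &b) { a.next = &b; b.prev = &a; }

  // True if 'ins' lies in the half-open range [m1, m2).
  // m2 == nullptr means "until the end of the list".
  inline bool is_between(const insn_t *ins, const insn_t *m1, const insn_t *m2)
  {
    for ( const insn_t *p = m1; p != nullptr && p != m2; p = p->next )
      if ( p == ins )
        return true;
    return false;
  }

  // Same shape as minsn_t::is_after(): strictly after 'm' in the list.
  inline bool is_after(const insn_t *ins, const insn_t *m)
  {
    return m != nullptr && is_between(ins, m->next, nullptr);
  }
}
```

// Note that an instruction is never "after" itself, and that passing nullptr
// as the reference instruction yields false rather than a crash.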
3761
3762/// Skip assertions forward
3763const minsn_t *hexapi getf_reginsn(const minsn_t *ins);
3764/// Skip assertions backward
3765const minsn_t *hexapi getb_reginsn(const minsn_t *ins);
3766inline minsn_t *getf_reginsn(minsn_t *ins) { return CONST_CAST(minsn_t*)(getf_reginsn(CONST_CAST(const minsn_t *)(ins))); }
3767inline minsn_t *getb_reginsn(minsn_t *ins) { return CONST_CAST(minsn_t*)(getb_reginsn(CONST_CAST(const minsn_t *)(ins))); }
3768
3769//-------------------------------------------------------------------------
3770/// Basic block types
3771enum mblock_type_t
3772{
3773 BLT_NONE = 0, ///< unknown block type
3774 BLT_STOP = 1, ///< stops execution regularly (must be the last block)
3775 BLT_0WAY = 2, ///< does not have successors (tail is a noret function)
3776 BLT_1WAY = 3, ///< passes execution to one block (regular or goto block)
3777 BLT_2WAY = 4, ///< passes execution to two blocks (conditional jump)
3778 BLT_NWAY = 5, ///< passes execution to many blocks (switch idiom)
3779 BLT_XTRN = 6, ///< external block (out of function address)
3780};
3781
3782// Maximal bit range
3783#define MAXRANGE bitrange_t(0, USHRT_MAX)
3784
3785//-------------------------------------------------------------------------
3786/// Microcode of one basic block.
3787/// All blocks are part of a doubly linked list. They can also be addressed
3788/// by indexing the mba->natural array. A block contains a doubly linked list
3789/// of instructions, various location lists that are used for data flow
3790/// analysis, and other attributes.
3791class mblock_t
3792{
3793 friend class codegen_t;
3794 DECLARE_UNCOPYABLE(mblock_t)
3795 void hexapi init(void);
3796public:
3797 mblock_t *nextb; ///< next block in the doubly linked list
3798 mblock_t *prevb; ///< previous block in the doubly linked list
3799 uint32 flags; ///< combination of \ref MBL_ bits
3800 /// \defgroup MBL_ Basic block properties
3801 //@{
3802#define MBL_PRIV 0x0001 ///< private block - no instructions except
3803 ///< the specified are accepted (used in patterns)
3804#define MBL_NONFAKE 0x0000 ///< regular block
3805#define MBL_FAKE 0x0002 ///< fake block
3806#define MBL_GOTO 0x0004 ///< this block is a goto target
3807#define MBL_TCAL 0x0008 ///< artificial call block for tail calls
3808#define MBL_PUSH 0x0010 ///< needs "convert push/pop instructions"
3809#define MBL_DMT64 0x0020 ///< needs "demote 64bits"
3810#define MBL_COMB 0x0040 ///< needs "combine" pass
3811#define MBL_PROP 0x0080 ///< needs 'propagation' pass
3812#define MBL_DEAD 0x0100 ///< needs "eliminate deads" pass
3813#define MBL_LIST 0x0200 ///< use/def lists are ready (not dirty)
3814#define MBL_INCONST 0x0400 ///< inconsistent lists: we are building them
3815#define MBL_CALL 0x0800 ///< call information has been built
3816#define MBL_BACKPROP 0x1000 ///< performed backprop_cc
3817#define MBL_NORET 0x2000 ///< dead end block: doesn't return execution control
3818#define MBL_DSLOT 0x4000 ///< block for delay slot
3819#define MBL_VALRANGES 0x8000 ///< should optimize using value ranges
3820#define MBL_KEEP 0x10000 ///< do not remove even if unreachable
3821 //@}
3822 ea_t start; ///< start address
3823 ea_t end; ///< end address
3824 ///< note: we cannot rely on start/end addresses
3825 ///< very much because instructions are
3826 ///< propagated between blocks
3827 minsn_t *head; ///< pointer to the first instruction of the block
3828 minsn_t *tail; ///< pointer to the last instruction of the block
3829 mba_t *mba; ///< the parent micro block array
3830 int serial; ///< block number
3831 mblock_type_t type; ///< block type (BLT_NONE - not computed yet)
3832
3833 mlist_t dead_at_start; ///< data that is dead at the block entry
3834 mlist_t mustbuse; ///< data that must be used by the block
3835 mlist_t maybuse; ///< data that may be used by the block
3836 mlist_t mustbdef; ///< data that must be defined by the block
3837 mlist_t maybdef; ///< data that may be defined by the block
3838 mlist_t dnu; ///< data that is defined but not used in the block
3839
3840 sval_t maxbsp; ///< maximal sp value in the block (0...stacksize)
3841 sval_t minbstkref; ///< lowest stack location accessible with indirect
3842 ///< addressing (offset from the stack bottom)
3843 ///< initially it is 0 (not computed)
3844 sval_t minbargref; ///< the same for arguments
3845
3846 intvec_t predset; ///< control flow graph: list of our predecessors
3847 ///< use npred() and pred() to access it
3848 intvec_t succset; ///< control flow graph: list of our successors
3849 ///< use nsucc() and succ() to access it
3850
3851 // the exact size of this class is not documented, there may be more fields
3852 char reserved[];
3853
3854 void mark_lists_dirty(void) { flags &= ~MBL_LIST; request_propagation(); }
3855 void request_propagation(void) { flags |= MBL_PROP; }
3856 bool needs_propagation(void) const { return (flags & MBL_PROP) != 0; }
3857 void request_demote64(void) { flags |= MBL_DMT64; }
3858 bool lists_dirty(void) const { return (flags & MBL_LIST) == 0; }
3859 bool lists_ready(void) const { return (flags & (MBL_LIST|MBL_INCONST)) == MBL_LIST; }
3860 int make_lists_ready(void) // returns number of changes
3861 {
3862 if ( lists_ready() )
3863 return 0;
3864 return build_lists(false);
3865 }
3866
3867 /// Get number of block predecessors
3868 int npred(void) const { return predset.size(); } // number of xrefs to the block
3869 /// Get number of block successors
3870 int nsucc(void) const { return succset.size(); } // number of xrefs from the block
3871 // Get predecessor number N
3872 int pred(int n) const { return predset[n]; }
3873 // Get successor number N
3874 int succ(int n) const { return succset[n]; }
3875
3876 mblock_t(void) = delete;
3877 virtual ~mblock_t(void);
3878 HEXRAYS_MEMORY_ALLOCATION_FUNCS()
3879 bool empty(void) const { return head == nullptr; }
3880
3881 /// Print block contents.
3882 /// \param vp print helpers class. it can be used to direct the printed
3883 /// info to any destination
3884 void hexapi print(vd_printer_t &vp) const;
3885
3886 /// Dump block info.
3887 /// This function is useful for debugging, see mba_t::dump for info
3888 void hexapi dump(void) const;
3889 AS_PRINTF(2, 0) void hexapi vdump_block(const char *title, va_list va) const;
3890 AS_PRINTF(2, 3) void dump_block(const char *title, ...) const
3891 {
3892 va_list va;
3893 va_start(va, title);
3894 vdump_block(title, va);
3895 va_end(va);
3896 }
3897
3898 //-----------------------------------------------------------------------
3899 // Functions to insert/remove insns during the microcode optimization phase.
3900 // See codegen_t, microcode_filter_t, udcall_t classes for the initial
3901 // microcode generation.
3902 //-----------------------------------------------------------------------
3903 /// Insert instruction into the doubly linked list
3904 /// \param nm new instruction
3905 /// \param om existing instruction, part of the doubly linked list
3906 /// if nullptr, then the instruction will be inserted at the beginning
3907 /// of the list
3908 /// NM will be inserted immediately after OM
3909 /// \return pointer to NM
3910 minsn_t *hexapi insert_into_block(minsn_t *nm, minsn_t *om);
3911
3912 /// Remove instruction from the doubly linked list
3913 /// \param m instruction to remove
3914 /// The removed instruction is not deleted, the caller gets its ownership
3915 /// \return pointer to the next instruction
3916 minsn_t *hexapi remove_from_block(minsn_t *m);
3917
3918 //-----------------------------------------------------------------------
3919 // Iterator over instructions and operands
3920 //-----------------------------------------------------------------------
3921 /// Visit all instructions.
3922 /// This function visits subinstructions too.
3923 /// \param mv instruction visitor
3924 /// \return zero or the value returned by mv.visit_insn()
3925 /// See also mba_t::for_all_topinsns()
3926 int hexapi for_all_insns(minsn_visitor_t &mv);
3927
3928 /// Visit all operands.
3929 /// This function visits subinstruction operands too.
3930 /// \param mv operand visitor
3931 /// \return zero or the value returned by mv.visit_mop()
3932 int hexapi for_all_ops(mop_visitor_t &mv);
3933
3934 /// Visit all operands that use LIST.
3935 /// \param list ptr to the list of locations. it may be modified:
3936 /// parts that get redefined by the instructions in [i1,i2)
3937 /// will be deleted.
3938 /// \param i1 starting instruction. must be a top level insn.
3939 /// \param i2 ending instruction (excluded). must be a top level insn.
3940 /// \param mmv operand visitor
3941 /// \return zero or the value returned by mmv.visit_mop()
3942 int hexapi for_all_uses(
3943 mlist_t *list,
3944 minsn_t *i1,
3945 minsn_t *i2,
3946 mlist_mop_visitor_t &mmv);
3947
3948 //-----------------------------------------------------------------------
3949 // Optimization functions
3950 //-----------------------------------------------------------------------
3951 /// Optimize one instruction in the context of the block.
3952 /// \param m pointer to a top level instruction
3953 /// \param optflags combination of \ref OPTI_ bits
3954 /// \return number of changes made to the block
3955 /// This function may change other instructions in the block too.
3956 /// However, it will not destroy top level instructions (it may convert them
3957 /// to nop's). This function performs only intrablock modifications.
3958 /// See also minsn_t::optimize_solo()
3959 int hexapi optimize_insn(minsn_t *m, int optflags=OPTI_MINSTKREF|OPTI_COMBINSNS);
3960
3961 /// Optimize a basic block.
3962 /// Usually there is no need to call this function explicitly because the
3963 /// decompiler will call it itself if optinsn_t::func or optblock_t::func
3964 /// return non-zero.
3965 /// \return number of changes made to the block
3966 int hexapi optimize_block(void);
3967
3968 /// Build def-use lists and eliminate deads.
3969 /// \param kill_deads do delete dead instructions?
3970 /// \return the number of eliminated instructions
3971 /// It is better to call mblock_t::make_lists_ready() rather than this function.
3972 int hexapi build_lists(bool kill_deads);
3973
3974 /// Remove a jump at the end of the block if it is useless.
3975 /// This function preserves any side effects when removing a useless jump.
3976 /// Both conditional and unconditional jumps are handled (and jtbl too).
3977 /// This function deletes useless jumps, not only replaces them with a nop.
3978 /// (please note that optimize_insn() does not handle useless jumps).
3979 /// \return number of changes made to the block
3980 int hexapi optimize_useless_jump(void);
3981
3982 //-----------------------------------------------------------------------
3983 // Functions that work with use/def lists. These lists are used to
3984 // represent lists of registers and stack locations that are either modified
3985 // or accessed by microinstructions.
3986 //-----------------------------------------------------------------------
3987 /// Append use-list of an operand.
3988 /// This function calculates list of locations that may or must be used
3989 /// by the operand and appends it to LIST.
3990 /// \param list ptr to the output buffer. we will append to it.
3991 /// \param op operand to calculate the use list of
3992 /// \param maymust should we calculate 'may-use' or 'must-use' list?
3993 /// see \ref maymust_t for more details.
3994 /// \param mask if only part of the operand should be considered,
3995 /// a bitmask can be used to specify which part.
3996 /// example: op=AX,mask=0xFF means that we will consider only AL.
3997 void hexapi append_use_list(
3998 mlist_t *list,
3999 const mop_t &op,
4000 maymust_t maymust,
4001 bitrange_t mask=MAXRANGE) const;
4002
4003 /// Append def-list of an operand.
4004 /// This function calculates list of locations that may or must be modified
4005 /// by the operand and appends it to LIST.
4006 /// \param list ptr to the output buffer. we will append to it.
4007 /// \param op operand to calculate the def list of
4008 /// \param maymust should we calculate 'may-def' or 'must-def' list?
4009 /// see \ref maymust_t for more details.
4010 void hexapi append_def_list(
4011 mlist_t *list,
4012 const mop_t &op,
4013 maymust_t maymust) const;
4014
4015 /// Build use-list of an instruction.
4016 /// This function calculates list of locations that may or must be used
4017 /// by the instruction. Examples:
4018 /// "ldx ds.2, eax.4, ebx.4", may-list: all aliasable memory
4019 /// "ldx ds.2, eax.4, ebx.4", must-list: empty
4020 /// Since LDX uses EAX for indirect access, it may access any aliasable
4021 /// memory. On the other hand, we cannot tell for sure which memory cells
4022 /// will be accessed, this is why the must-list is empty.
4023 /// \param ins instruction to calculate the use list of
4024 /// \param maymust should we calculate 'may-use' or 'must-use' list?
4025 /// see \ref maymust_t for more details.
4026 /// \return the calculated use-list
4027 mlist_t hexapi build_use_list(const minsn_t &ins, maymust_t maymust) const;
4028
4029 /// Build def-list of an instruction.
4030 /// This function calculates list of locations that may or must be modified
4031 /// by the instruction. Examples:
4032 /// "stx ebx.4, ds.2, eax.4", may-list: all aliasable memory
4033 /// "stx ebx.4, ds.2, eax.4", must-list: empty
4034 /// Since STX uses EAX for indirect access, it may modify any aliasable
4035 /// memory. On the other hand, we cannot tell for sure which memory cells
4036 /// will be modified, this is why the must-list is empty.
4037 /// \param ins instruction to calculate the def list of
4038 /// \param maymust should we calculate 'may-def' or 'must-def' list?
4039 /// see \ref maymust_t for more details.
4040 /// \return the calculated def-list
4041 mlist_t hexapi build_def_list(const minsn_t &ins, maymust_t maymust) const;
4042
4043 //-----------------------------------------------------------------------
4044 // The use/def lists can be used to search for interesting instructions
4045 //-----------------------------------------------------------------------
4046 /// Is the list used by the specified instruction range?
4047 /// \param list list of locations. LIST may be modified by the function:
4048 /// redefined locations will be removed from it.
4049 /// \param i1 starting instruction of the range (must be a top level insn)
4050 /// \param i2 end instruction of the range (must be a top level insn)
4051 /// i2 is excluded from the range. it can be specified as nullptr.
4052 /// i1 and i2 must belong to the same block.
4053 /// \param maymust should we search in 'may-access' or 'must-access' mode?
4054 bool is_used(mlist_t *list, const minsn_t *i1, const minsn_t *i2, maymust_t maymust=MAY_ACCESS) const
4055 { return find_first_use(list, i1, i2, maymust) != nullptr; }
4056
4057 /// Find the first insn that uses the specified list in the insn range.
4058 /// \param list list of locations. LIST may be modified by the function:
4059 /// redefined locations will be removed from it.
4060 /// \param i1 starting instruction of the range (must be a top level insn)
4061 /// \param i2 end instruction of the range (must be a top level insn)
4062 /// i2 is excluded from the range. it can be specified as nullptr.
4063 /// i1 and i2 must belong to the same block.
4064 /// \param maymust should we search in 'may-access' or 'must-access' mode?
4065 /// \return pointer to such instruction or nullptr.
4066 /// Upon return LIST will contain only locations not redefined
4067 /// by insns [i1..result]
4068 const minsn_t *hexapi find_first_use(mlist_t *list, const minsn_t *i1, const minsn_t *i2, maymust_t maymust=MAY_ACCESS) const;
4069 minsn_t *find_first_use(mlist_t *list, minsn_t *i1, const minsn_t *i2, maymust_t maymust=MAY_ACCESS) const
4070 {
4071 return CONST_CAST(minsn_t*)(find_first_use(list,
4072 CONST_CAST(const minsn_t*)(i1),
4073 i2,
4074 maymust));
4075 }
4076
4077 /// Is the list redefined by the specified instructions?
4078 /// \param list list of locations to check.
4079 /// \param i1 starting instruction of the range (must be a top level insn)
4080 /// \param i2 end instruction of the range (must be a top level insn)
4081 /// i2 is excluded from the range. it can be specified as nullptr.
4082 /// i1 and i2 must belong to the same block.
4083 /// \param maymust should we search in 'may-access' or 'must-access' mode?
4084 bool is_redefined(
4085 const mlist_t &list,
4086 const minsn_t *i1,
4087 const minsn_t *i2,
4088 maymust_t maymust=MAY_ACCESS) const
4089 {
4090 return find_redefinition(list, i1, i2, maymust) != nullptr;
4091 }
4092
4093 /// Find the first insn that redefines any part of the list in the insn range.
4094 /// \param list list of locations to check.
4095 /// \param i1 starting instruction of the range (must be a top level insn)
4096 /// \param i2 end instruction of the range (must be a top level insn)
4097 /// i2 is excluded from the range. it can be specified as nullptr.
4098 /// i1 and i2 must belong to the same block.
4099 /// \param maymust should we search in 'may-access' or 'must-access' mode?
4100 /// \return pointer to such instruction or nullptr.
4101 const minsn_t *hexapi find_redefinition(
4102 const mlist_t &list,
4103 const minsn_t *i1,
4104 const minsn_t *i2,
4105 maymust_t maymust=MAY_ACCESS) const;
4106 minsn_t *find_redefinition(
4107 const mlist_t &list,
4108 minsn_t *i1,
4109 const minsn_t *i2,
4110 maymust_t maymust=MAY_ACCESS) const
4111 {
4112 return CONST_CAST(minsn_t*)(find_redefinition(list,
4113 CONST_CAST(const minsn_t*)(i1),
4114 i2,
4115 maymust));
4116 }
4117
4118 /// Is the right hand side of the instruction redefined in the insn range?
4119 /// "right hand side" corresponds to the source operands of the instruction.
4120 /// \param ins instruction to consider
4121 /// \param i1 starting instruction of the range (must be a top level insn)
4122 /// \param i2 end instruction of the range (must be a top level insn)
4123 /// i2 is excluded from the range. it can be specified as nullptr.
4124 /// i1 and i2 must belong to the same block.
4125 bool hexapi is_rhs_redefined(const minsn_t *ins, const minsn_t *i1, const minsn_t *i2) const;
4126
4127 /// Find the instruction that accesses the specified operand.
4128 /// This function searches inside one block.
4129 /// \param op operand to search for
4130 /// \param parent ptr to ptr to a top level instruction.
4131 /// denotes the beginning of the search range.
4132 /// \param mend end instruction of the range (must be a top level insn)
4133 /// mend is excluded from the range. it can be specified as nullptr.
4134 /// parent and mend must belong to the same block.
4135 /// \param fdflags combination of \ref FD_ bits
4136 /// \return the instruction that accesses the operand. this instruction
4137 /// may be a sub-instruction. to find out the top level
4138 /// instruction, check out *parent.
4139 /// nullptr means 'not found'.
4140 minsn_t *hexapi find_access(
4141 const mop_t &op,
4142 minsn_t **parent,
4143 const minsn_t *mend,
4144 int fdflags) const;
4145 /// \defgroup FD_ bits for mblock_t::find_access
4146 //@{
4147#define FD_BACKWARD 0x0000 ///< search direction
4148#define FD_FORWARD 0x0001 ///< search direction
4149#define FD_USE 0x0000 ///< look for use
4150#define FD_DEF 0x0002 ///< look for definition
4151#define FD_DIRTY 0x0004 ///< ignore possible implicit definitions
4152 ///< by function calls and indirect memory access
4153 //@}
4154
4155 // Convenience functions:
4156 minsn_t *find_def(
4157 const mop_t &op,
4158 minsn_t **p_i1,
4159 const minsn_t *i2,
4160 int fdflags)
4161 {
4162 return find_access(op, p_i1, i2, fdflags|FD_DEF);
4163 }
4164 minsn_t *find_use(
4165 const mop_t &op,
4166 minsn_t **p_i1,
4167 const minsn_t *i2,
4168 int fdflags)
4169 {
4170 return find_access(op, p_i1, i2, fdflags|FD_USE);
4171 }
4172
4173 /// Find possible values for a block.
4174 /// \param res set of value ranges
4175 /// \param vivl what to search for
4176 /// \param vrflags combination of \ref VR_ bits
4177 bool hexapi get_valranges(
4178 valrng_t *res,
4179 const vivl_t &vivl,
4180 int vrflags) const;
4181
4182 /// Find possible values for an instruction.
4183 /// \param res set of value ranges
4184 /// \param vivl what to search for
4185 /// \param m insn to search value ranges at. \sa VR_ bits
4186 /// \param vrflags combination of \ref VR_ bits
4187 bool hexapi get_valranges(
4188 valrng_t *res,
4189 const vivl_t &vivl,
4190 const minsn_t *m,
4191 int vrflags) const;
4192
4193 /// \defgroup VR_ bits for get_valranges
4194 //@{
4195#define VR_AT_START 0x0000 ///< get value ranges before the instruction or
4196 ///< at the block start (if M is nullptr)
4197#define VR_AT_END 0x0001 ///< get value ranges after the instruction or
4198 ///< at the block end, just after the last
4199 ///< instruction (if M is nullptr)
4200#define VR_EXACT 0x0002 ///< find exact match. if not set, the returned
4201 ///< valrng size will be >= vivl.size
4202 //@}
4203
4204 /// Erase the instruction (convert it to nop) and mark the lists dirty.
4205 /// This is the recommended function to use because it also marks the block
4206 /// use-def lists dirty.
4207 void make_nop(minsn_t *m) { m->_make_nop(); mark_lists_dirty(); }
4208
4209 /// Calculate number of regular instructions in the block.
4210 /// Assertions are skipped by this function.
4211 /// \return Number of non-assertion instructions in the block.
4212 size_t hexapi get_reginsn_qty(void) const;
4213
4214 bool is_call_block(void) const { return tail != nullptr && is_mcode_call(tail->opcode); }
4215 bool is_unknown_call(void) const { return tail != nullptr && tail->is_unknown_call(); }
4216 bool is_nway(void) const { return type == BLT_NWAY; }
4217 bool is_branch(void) const { return type == BLT_2WAY && tail->d.t == mop_b; }
4218 bool is_simple_goto_block(void) const
4219 {
4220 return get_reginsn_qty() == 1
4221 && tail->opcode == m_goto
4222 && tail->l.t == mop_b;
4223 }
4224 bool is_simple_jcnd_block() const
4225 {
4226 return is_branch()
4227 && npred() == 1
4228 && get_reginsn_qty() == 1
4229 && is_mcode_convertible_to_set(tail->opcode);
4230 }
4231};
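// The list-state helpers of mblock_t (mark_lists_dirty, lists_dirty,
// lists_ready) are pure bit tests on 'flags'. The sketch below replays them on
// a hypothetical stand-in struct so the states can be checked in isolation.
// The three bit values are copied from the MBL_ group above; block_flags_t and
// the 'sketch' namespace are illustration-only names, not part of the SDK.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch
{
  // Bit values copied from the MBL_ definitions.
  constexpr uint32_t MBL_PROP    = 0x0080; // needs propagation pass
  constexpr uint32_t MBL_LIST    = 0x0200; // use/def lists are ready
  constexpr uint32_t MBL_INCONST = 0x0400; // lists are being built

  // Hypothetical stand-in holding only the 'flags' member of a block.
  struct block_flags_t
  {
    uint32_t flags = 0;

    // Clearing MBL_LIST also requests propagation, as in mblock_t.
    void mark_lists_dirty() { flags &= ~MBL_LIST; flags |= MBL_PROP; }
    bool needs_propagation() const { return (flags & MBL_PROP) != 0; }
    bool lists_dirty() const { return (flags & MBL_LIST) == 0; }

    // Ready means MBL_LIST is set AND MBL_INCONST is clear.
    bool lists_ready() const
    {
      return (flags & (MBL_LIST|MBL_INCONST)) == MBL_LIST;
    }
  };
}
```

// One consequence worth noting: while MBL_INCONST is set together with
// MBL_LIST, the lists are neither "dirty" nor "ready", so lists_ready() is not
// simply the negation of lists_dirty().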
4232//-------------------------------------------------------------------------
4233/// Warning ids
4234enum warnid_t
4235{
4236 WARN_VARARG_REGS, ///< 0 cannot handle register arguments in vararg function, discarded them
4237 WARN_ILL_PURGED, ///< 1 odd caller purged bytes %d, correcting
4238 WARN_ILL_FUNCTYPE, ///< 2 invalid function type has been ignored
4239 WARN_VARARG_TCAL, ///< 3 cannot handle tail call to vararg
4240 WARN_VARARG_NOSTK, ///< 4 call vararg without local stack
4241 WARN_VARARG_MANY, ///< 5 too many varargs, some ignored
4242 WARN_ADDR_OUTARGS, ///< 6 cannot handle address arithmetics in outgoing argument area of stack frame -- unused
4243 WARN_DEP_UNK_CALLS, ///< 7 found interdependent unknown calls
4244 WARN_ILL_ELLIPSIS, ///< 8 erroneously detected ellipsis type has been ignored
4245 WARN_GUESSED_TYPE, ///< 9 using guessed type %s;
4246 WARN_EXP_LINVAR, ///< 10 failed to expand a linear variable
4247 WARN_WIDEN_CHAINS, ///< 11 failed to widen chains
4248 WARN_BAD_PURGED, ///< 12 inconsistent function type and number of purged bytes
4249 WARN_CBUILD_LOOPS, ///< 13 too many cbuild loops
4250 WARN_NO_SAVE_REST, ///< 14 could not find valid save-restore pair for %s
4251 WARN_ODD_INPUT_REG, ///< 15 odd input register %s
4252 WARN_ODD_ADDR_USE, ///< 16 odd use of a variable address
4253 WARN_MUST_RET_FP, ///< 17 function return type is incorrect (must be floating point)
4254 WARN_ILL_FPU_STACK, ///< 18 inconsistent fpu stack
4255 WARN_SELFREF_PROP, ///< 19 self-referencing variable has been detected
4256 WARN_WOULD_OVERLAP, ///< 20 variables would overlap: %s
4257 WARN_ARRAY_INARG, ///< 21 array has been used for an input argument
4258 WARN_MAX_ARGS, ///< 22 too many input arguments, some ignored
4259 WARN_BAD_FIELD_TYPE,///< 23 incorrect structure member type for %s::%s, ignored
4260 WARN_WRITE_CONST, ///< 24 write access to const memory at %a has been detected
4261 WARN_BAD_RETVAR, ///< 25 wrong return variable
4262 WARN_FRAG_LVAR, ///< 26 fragmented variable at %s may be wrong
4263 WARN_HUGE_STKOFF, ///< 27 exceedingly huge offset into the stack frame
4264 WARN_UNINITED_REG, ///< 28 reference to an uninitialized register has been removed: %s
4265 WARN_FIXED_MACRO, ///< 29 fixed broken macro-insn
4266 WARN_WRONG_VA_OFF, ///< 30 wrong offset of va_list variable
4267 WARN_CR_NOFIELD, ///< 31 CONTAINING_RECORD: no field '%s' in struct '%s' at %d
4268 WARN_CR_BADOFF, ///< 32 CONTAINING_RECORD: too small offset %d for struct '%s'
4269 WARN_BAD_STROFF, ///< 33 user specified stroff has not been processed: %s
4270 WARN_BAD_VARSIZE, ///< 34 inconsistent variable size for '%s'
4271 WARN_UNSUPP_REG, ///< 35 unsupported processor register '%s'
4272 WARN_UNALIGNED_ARG, ///< 36 unaligned function argument '%s'
4273 WARN_BAD_STD_TYPE, ///< 37 corrupted or unexisting local type '%s'
4274 WARN_BAD_CALL_SP, ///< 38 bad sp value at call
4275 WARN_MISSED_SWITCH, ///< 39 wrong markup of switch jump, skipped it
4276 WARN_BAD_SP, ///< 40 positive sp value %a has been found
4277 WARN_BAD_STKPNT, ///< 41 wrong sp change point
4278 WARN_UNDEF_LVAR, ///< 42 variable '%s' is possibly undefined
4279 WARN_JUMPOUT, ///< 43 control flows out of bounds
4280 WARN_BAD_VALRNG, ///< 44 values range analysis failed
4281 WARN_BAD_SHADOW, ///< 45 ignored the value written to the shadow area of the succeeding call
4282 WARN_OPT_VALRNG, ///< 46 conditional instruction was optimized away because %s
4283 WARN_RET_LOCREF, ///< 47 returning address of temporary local variable '%s'
4284 WARN_BAD_MAPDST, ///< 48 too short map destination '%s' for variable '%s'
4285 WARN_BAD_INSN, ///< 49 bad instruction
4286 WARN_ODD_ABI, ///< 50 encountered odd instruction for the current ABI
4287 WARN_UNBALANCED_STACK, ///< 51 unbalanced stack, ignored a potential tail call
4288
4289 WARN_OPT_VALRNG2, ///< 52 mask 0x%X is shortened because %s <= 0x%X
4290
4291 WARN_OPT_VALRNG3, ///< 53 masking with 0X%X was optimized away because %s <= 0x%X
4292 WARN_OPT_USELESS_JCND, ///< 54 simplified comparisons for '%s': %s became %s
4293 WARN_MAX, ///< may be used in notes as a placeholder when the
4294 ///< warning id is not available
4295};
4296
4297/// Warning instances
4298struct hexwarn_t
4299{
4300 ea_t ea; ///< Address where the warning occurred
4301 warnid_t id; ///< Warning id
4302 qstring text; ///< Fully formatted text of the warning
4303 DECLARE_COMPARISONS(hexwarn_t)
4304 {
4305 if ( ea < r.ea )
4306 return -1;
4307 if ( ea > r.ea )
4308 return 1;
4309 if ( id < r.id )
4310 return -1;
4311 if ( id > r.id )
4312 return 1;
4313 return strcmp(text.c_str(), r.text.c_str());
4314 }
4315};
4316DECLARE_TYPE_AS_MOVABLE(hexwarn_t);
4317typedef qvector<hexwarn_t> hexwarns_t;
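// The comparison declared for hexwarn_t orders warnings by address first, then
// by warning id, then by message text. The sketch below reproduces that
// three-key ordering with standard library types so it can be tried outside
// IDA. warn_t and sort_warnings() are hypothetical names; std::string stands
// in for qstring, a 64-bit integer for ea_t, and an int for warnid_t.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch
{
  // Hypothetical stand-in for hexwarn_t.
  struct warn_t
  {
    uint64_t ea;       // address where the warning occurred
    int id;            // warning id
    std::string text;  // formatted text

    // Three-way comparison: ea, then id, then text.
    // Only the sign of the result is meaningful.
    int compare(const warn_t &r) const
    {
      if ( ea < r.ea )
        return -1;
      if ( ea > r.ea )
        return 1;
      if ( id < r.id )
        return -1;
      if ( id > r.id )
        return 1;
      return text.compare(r.text);
    }
  };

  // Sort warnings into the order implied by compare().
  inline void sort_warnings(std::vector<warn_t> &v)
  {
    std::sort(v.begin(), v.end(),
              [](const warn_t &a, const warn_t &b) { return a.compare(b) < 0; });
  }
}
```

// With this ordering, all warnings for one address end up adjacent, which is
// convenient when presenting them next to the corresponding code.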
4318
4319//-------------------------------------------------------------------------
4320/// Microcode maturity levels
4321enum mba_maturity_t
4322{
4323 MMAT_ZERO, ///< microcode does not exist
4324 MMAT_GENERATED, ///< generated microcode
4325 MMAT_PREOPTIMIZED, ///< preoptimized pass is complete
4326 MMAT_LOCOPT, ///< local optimization of each basic block is complete.
4327 ///< control flow graph is ready too.
4328 MMAT_CALLS, ///< detected call arguments
4329 MMAT_GLBOPT1, ///< performed the first pass of global optimization
4330 MMAT_GLBOPT2, ///< most global optimization passes are done
4331 MMAT_GLBOPT3, ///< completed all global optimization. microcode is fixed now.
4332 MMAT_LVARS, ///< allocated local variables
4333};
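// The MMAT_ levels are declared in ascending order, so a maturity check can be
// written as an ordinary comparison. The sketch below copies the enumerators
// into a stand-in enum and adds two hypothetical predicates. It assumes that a
// property documented for one level (CFG ready at MMAT_LOCOPT, microcode fixed
// at MMAT_GLBOPT3) continues to hold at every later level; the header does not
// state this explicitly.

```cpp
#include <cassert>

namespace sketch
{
  // Stand-in copy of the maturity levels, in declaration order.
  enum maturity_t
  {
    MMAT_ZERO,
    MMAT_GENERATED,
    MMAT_PREOPTIMIZED,
    MMAT_LOCOPT,
    MMAT_CALLS,
    MMAT_GLBOPT1,
    MMAT_GLBOPT2,
    MMAT_GLBOPT3,
    MMAT_LVARS,
  };

  // Hypothetical helper: has the control flow graph been built?
  inline bool cfg_is_ready(maturity_t m) { return m >= MMAT_LOCOPT; }

  // Hypothetical helper: have all global optimizations completed?
  inline bool microcode_is_final(maturity_t m) { return m >= MMAT_GLBOPT3; }
}
```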
4334
4335//-------------------------------------------------------------------------
4336enum memreg_index_t ///< memory region types
4337{
4338 MMIDX_GLBLOW, ///< global memory: low part
4339 MMIDX_LVARS, ///< stack: local variables
4340 MMIDX_RETADDR, ///< stack: return address
4341 MMIDX_SHADOW, ///< stack: shadow arguments
4342 MMIDX_ARGS, ///< stack: regular stack arguments
4343 MMIDX_GLBHIGH, ///< global memory: high part
4344};
4345
4346//-------------------------------------------------------------------------
4347/// Ranges to decompile. Either a function or an explicit vector of ranges.
4348struct mba_ranges_t
4349{
4350 func_t *pfn = nullptr; ///< function to decompile. if not null, then function mode.
4351 rangevec_t ranges; ///< snippet mode: ranges to decompile.
4352 ///< function mode: list of outlined ranges
4353 mba_ranges_t(func_t *_pfn=nullptr) : pfn(_pfn) {}
4354 mba_ranges_t(const rangevec_t &r) : ranges(r) {}
4355 ea_t start(void) const { return (pfn != nullptr ? *pfn : ranges[0]).start_ea; }
4356 bool empty(void) const { return pfn == nullptr && ranges.empty(); }
4357 void clear(void) { pfn = nullptr; ranges.clear(); }
4358 bool is_snippet(void) const { return pfn == nullptr; }
4359 bool hexapi range_contains(ea_t ea) const;
4360 bool is_fragmented(void) const
4361 {
4362 int n_frags = ranges.size();
4363 if ( pfn != nullptr )
4364 n_frags += pfn->tailqty + 1;
4365 return n_frags > 1;
4366 }
4367};
4368
4369/// Item iterator of arbitrary rangevec items
4370struct range_item_iterator_t
4371{
4372 const rangevec_t *ranges = nullptr;
4373 const range_t *rptr = nullptr; // pointer into ranges
4374 ea_t cur = BADADDR; // current address
4375 bool set(const rangevec_t &r);
4376 bool next_code(void);
4377 ea_t current(void) const { return cur; }
4378};
4379
4380/// Item iterator for mba_ranges_t
4381struct mba_item_iterator_t
4382{
4383 range_item_iterator_t rii;
4384 func_item_iterator_t fii;
4385 bool func_items_done = true;
4386 bool set(const mba_ranges_t &mbr)
4387 {
4388 bool ok = false;
4389 if ( mbr.pfn != nullptr )
4390 {
4391 ok = fii.set(mbr.pfn);
4392 if ( ok )
4393 func_items_done = false;
4394 }
4395 if ( rii.set(mbr.ranges) )
4396 ok = true;
4397 return ok;
4398 }
4399 bool next_code(void)
4400 {
4401 bool ok = false;
4402 if ( !func_items_done )
4403 {
4404 ok = fii.next_code();
4405 if ( !ok )
4406 func_items_done = true;
4407 }
4408 if ( !ok )
4409 ok = rii.next_code();
4410 return ok;
4411 }
4412 ea_t current(void) const
4413 {
4414 return func_items_done ? rii.current() : fii.current();
4415 }
4416};
4417
4418/// Chunk iterator of arbitrary rangevec items
4419struct range_chunk_iterator_t
4420{
4421 const range_t *rptr = nullptr; // pointer into ranges
4422 const range_t *rend = nullptr;
4423 bool set(const rangevec_t &r) { rptr = r.begin(); rend = r.end(); return rptr != rend; }
4424 bool next(void) { return ++rptr != rend; }
4425 const range_t &chunk(void) const { return *rptr; }
4426};
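// The rangevec chunk iterator above follows a set()/next()/chunk() protocol:
// set() positions the iterator on the first chunk and returns false for an
// empty vector, and next() advances before testing for the end. The sketch
// below shows one way to drive that protocol in a loop. range_t here is a
// minimal hypothetical stand-in, std::vector replaces rangevec_t, and
// total_size() is an illustration-only helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch
{
  // Hypothetical stand-in for range_t: a half-open address range.
  struct range_t
  {
    uint64_t start_ea;
    uint64_t end_ea;
  };
  typedef std::vector<range_t> rangevec_t;

  // Same protocol as the chunk iterator in the header.
  struct chunk_iterator_t
  {
    const range_t *rptr = nullptr; // current chunk
    const range_t *rend = nullptr; // one past the last chunk
    bool set(const rangevec_t &r)
    {
      rptr = r.data();
      rend = r.data() + r.size();
      return rptr != rend;         // false if there is nothing to visit
    }
    bool next() { return ++rptr != rend; }
    const range_t &chunk() const { return *rptr; }
  };

  // Sum the sizes of all chunks, visiting each exactly once.
  inline uint64_t total_size(const rangevec_t &r)
  {
    uint64_t total = 0;
    chunk_iterator_t it;
    for ( bool ok = it.set(r); ok; ok = it.next() )
      total += it.chunk().end_ea - it.chunk().start_ea;
    return total;
  }
}
```

// The loop tests the result of set() before the first chunk() call, because
// chunk() dereferences the current pointer and must not be called on an empty
// vector.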
4427
4428/// Chunk iterator for mba_ranges_t
4429struct mba_range_iterator_t
4430{
4431 range_chunk_iterator_t rii;
4432 func_tail_iterator_t fii; // this is used if rii.rptr==nullptr
4433 bool is_snippet(void) const { return rii.rptr != nullptr; }
4434 bool set(const mba_ranges_t &mbr)
4435 {
4436 if ( mbr.is_snippet() )
4437 return rii.set(mbr.ranges);
4438 else
4439 return fii.set(mbr.pfn);
4440 }
4441 bool next(void)
4442 {
4443 if ( is_snippet() )
4444 return rii.next();
4445 else
4446 return fii.next();
4447 }
4448 const range_t &chunk(void) const
4449 {
4450 return is_snippet() ? rii.chunk() : fii.chunk();
4451 }
4452};
4453
4454//-------------------------------------------------------------------------
4455/// Array of micro blocks representing microcode for a decompiled function.
4456/// The first micro block is the entry point, the last one is the exit point.
4457/// The entry and exit blocks are always empty. The exit block is generated
4458/// at MMAT_LOCOPT maturity level.
4459class mba_t
4460{
4461 DECLARE_UNCOPYABLE(mba_t)
4462 uint32 flags;
4463 uint32 flags2;
4464
4465public:
4466 // bits to describe the microcode, set by the decompiler
4467#define MBA_PRCDEFS 0x00000001 ///< use precise defeas for chain-allocated lvars
4468#define MBA_NOFUNC 0x00000002 ///< function is not present, addresses might be wrong
4469#define MBA_PATTERN 0x00000004 ///< microcode pattern, callinfo is present
4470#define MBA_LOADED 0x00000008 ///< loaded gdl, no instructions (debugging)
4471#define MBA_RETFP 0x00000010 ///< function returns floating point value
4472#define MBA_SPLINFO 0x00000020 ///< (final_type ? idb_spoiled : spoiled_regs) is valid
4473#define MBA_PASSREGS 0x00000040 ///< has mcallinfo_t::pass_regs
4474#define MBA_THUNK 0x00000080 ///< thunk function
4475#define MBA_CMNSTK 0x00000100 ///< stkvars+stkargs should be considered as one area
4476
4477 // bits to describe analysis stages and requests
4478#define MBA_PREOPT 0x00000200 ///< preoptimization stage complete
4479#define MBA_CMBBLK 0x00000400 ///< request to combine blocks
4480#define MBA_ASRTOK 0x00000800 ///< assertions have been generated
4481#define MBA_CALLS 0x00001000 ///< callinfo has been built
4482#define MBA_ASRPROP 0x00002000 ///< assertions have been propagated
4483#define MBA_SAVRST 0x00004000 ///< save-restore analysis has been performed
4484#define MBA_RETREF 0x00008000 ///< return type has been refined
4485#define MBA_GLBOPT 0x00010000 ///< microcode has been optimized globally
4486#define MBA_LVARS0 0x00040000 ///< lvar pre-allocation has been performed
4487#define MBA_LVARS1 0x00080000 ///< lvar real allocation has been performed
4488#define MBA_DELPAIRS 0x00100000 ///< pairs have been deleted once
4489#define MBA_CHVARS 0x00200000 ///< can verify chain varnums
4490
4491 // bits that can be set by the caller:
4492#define MBA_SHORT 0x00400000 ///< use short display
4493#define MBA_COLGDL 0x00800000 ///< display graph after each reduction
4494#define MBA_INSGDL 0x01000000 ///< display instruction in graphs
4495#define MBA_NICE 0x02000000 ///< apply transformations to c code
4496#define MBA_REFINE 0x04000000 ///< may refine return value size
4497#define MBA_WINGR32 0x10000000 ///< use wingraph32
4498#define MBA_NUMADDR 0x20000000 ///< display definition addresses for numbers
4499#define MBA_VALNUM 0x40000000 ///< display value numbers
4500
4501#define MBA_INITIAL_FLAGS (MBA_INSGDL|MBA_NICE|MBA_CMBBLK|MBA_REFINE\
4502 |MBA_PRCDEFS|MBA_WINGR32|MBA_VALNUM)
4503
4504#define MBA2_LVARNAMES_OK 0x00000001 ///< may verify lvar_names?
4505#define MBA2_LVARS_RENAMED 0x00000002 ///< accept empty names now?
4506#define MBA2_OVER_CHAINS 0x00000004 ///< has overlapped chains?
4507#define MBA2_VALRNG_DONE 0x00000008 ///< calculated valranges?
4508#define MBA2_IS_CTR 0x00000010 ///< is constructor?
4509#define MBA2_IS_DTR 0x00000020 ///< is destructor?
4510#define MBA2_ARGIDX_OK 0x00000040 ///< may verify input argument list?
4511#define MBA2_NO_DUP_CALLS 0x00000080 ///< forbid multiple calls with the same ea
4512#define MBA2_NO_DUP_LVARS 0x00000100 ///< forbid multiple lvars with the same ea
4513#define MBA2_UNDEF_RETVAR 0x00000200 ///< return value is undefined
4514#define MBA2_ARGIDX_SORTED 0x00000400 ///< args finally sorted according to ABI
4515 ///< (e.g. reverse stkarg order in Borland)
4516#define MBA2_CODE16_BIT 0x00000800 ///< the code16 bit removed
4517#define MBA2_STACK_RETVAL 0x00001000 ///< the return value is on the stack
4518#define MBA2_HAS_OUTLINES 0x00002000 ///< calls to outlined code have been inlined
4519#define MBA2_NO_FRAME 0x00004000 ///< do not use function frame info (only snippet mode)
4520#define MBA2_PROP_COMPLEX 0x00008000 ///< allow propagation of more complex variable definitions
4521
4522#define MBA2_DONT_VERIFY 0x80000000 ///< Do not verify microcode. This flag
4523 ///< is recommended to be set only when
4524 ///< debugging decompiler plugins
4525
4526#define MBA2_INITIAL_FLAGS (MBA2_LVARNAMES_OK|MBA2_LVARS_RENAMED)
4527
4528#define MBA2_ALL_FLAGS 0x0000FFFF
4529
4530 bool precise_defeas(void) const { return (flags & MBA_PRCDEFS) != 0; }
4531 bool optimized(void) const { return (flags & MBA_GLBOPT) != 0; }
4532 bool short_display(void) const { return (flags & MBA_SHORT ) != 0; }
4533 bool show_reduction(void) const { return (flags & MBA_COLGDL) != 0; }
4534 bool graph_insns(void) const { return (flags & MBA_INSGDL) != 0; }
4535 bool loaded_gdl(void) const { return (flags & MBA_LOADED) != 0; }
4536 bool should_beautify(void)const { return (flags & MBA_NICE ) != 0; }
4537 bool rtype_refined(void) const { return (flags & MBA_RETREF) != 0; }
4538 bool may_refine_rettype(void) const { return (flags & MBA_REFINE) != 0; }
4539 bool use_wingraph32(void) const { return (flags & MBA_WINGR32) != 0; }
4540 bool display_numaddrs(void) const { return (flags & MBA_NUMADDR) != 0; }
4541 bool display_valnums(void) const { return (flags & MBA_VALNUM) != 0; }
4542 bool is_pattern(void) const { return (flags & MBA_PATTERN) != 0; }
4543 bool is_thunk(void) const { return (flags & MBA_THUNK) != 0; }
4544 bool saverest_done(void) const { return (flags & MBA_SAVRST) != 0; }
4545 bool callinfo_built(void) const { return (flags & MBA_CALLS) != 0; }
4546 bool really_alloc(void) const { return (flags & MBA_LVARS0) != 0; }
4547 bool lvars_allocated(void)const { return (flags & MBA_LVARS1) != 0; }
4548 bool chain_varnums_ok(void)const { return (flags & MBA_CHVARS) != 0; }
4549 bool returns_fpval(void) const { return (flags & MBA_RETFP) != 0; }
4550 bool has_passregs(void) const { return (flags & MBA_PASSREGS) != 0; }
4551 bool generated_asserts(void) const { return (flags & MBA_ASRTOK) != 0; }
4552 bool propagated_asserts(void) const { return (flags & MBA_ASRPROP) != 0; }
4553 bool deleted_pairs(void) const { return (flags & MBA_DELPAIRS) != 0; }
4554 bool common_stkvars_stkargs(void) const { return (flags & MBA_CMNSTK) != 0; }
4555 bool lvar_names_ok(void) const { return (flags2 & MBA2_LVARNAMES_OK) != 0; }
4556 bool lvars_renamed(void) const { return (flags2 & MBA2_LVARS_RENAMED) != 0; }
4557 bool has_over_chains(void) const { return (flags2 & MBA2_OVER_CHAINS) != 0; }
4558 bool valranges_done(void) const { return (flags2 & MBA2_VALRNG_DONE) != 0; }
4559 bool argidx_ok(void) const { return (flags2 & MBA2_ARGIDX_OK) != 0; }
4560 bool argidx_sorted(void) const { return (flags2 & MBA2_ARGIDX_SORTED) != 0; }
4561 bool code16_bit_removed(void) const { return (flags2 & MBA2_CODE16_BIT) != 0; }
4562 bool has_stack_retval(void) const { return (flags2 & MBA2_STACK_RETVAL) != 0; }
4563 bool has_outlines(void) const { return (flags2 & MBA2_HAS_OUTLINES) != 0; }
4564 bool is_ctr(void) const { return (flags2 & MBA2_IS_CTR) != 0; }
4565 bool is_dtr(void) const { return (flags2 & MBA2_IS_DTR) != 0; }
4566 bool is_cdtr(void) const { return (flags2 & (MBA2_IS_CTR|MBA2_IS_DTR)) != 0; }
4567 bool prop_complex(void) const { return (flags2 & MBA2_PROP_COMPLEX) != 0; }
4568 int get_mba_flags(void) const { return flags; }
4569 int get_mba_flags2(void) const { return flags2; }
4570 void set_mba_flags(int f) { flags |= f; }
4571 void clr_mba_flags(int f) { flags &= ~f; }
4572 void set_mba_flags2(int f) { flags2 |= f; }
4573 void clr_mba_flags2(int f) { flags2 &= ~f; }
4574 void clr_cdtr(void) { flags2 &= ~(MBA2_IS_CTR|MBA2_IS_DTR); }
4575 int calc_shins_flags(void) const
4576 {
4577 int shins_flags = 0;
4578 if ( short_display() )
4579 shins_flags |= SHINS_SHORT;
4580 if ( display_valnums() )
4581 shins_flags |= SHINS_VALNUM;
4582 if ( display_numaddrs() )
4583 shins_flags |= SHINS_NUMADDR;
4584 return shins_flags;
4585 }
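/*
 The flag helpers above are plain bit operations on a 32-bit word: the
 setters OR bits in, the clearers AND them out, and each predicate tests one
 mask. A minimal sketch of that behavior, assuming nothing beyond what the
 listing shows. flags_model_t is a hypothetical stand-in that holds only the
 'flags' word; the two constants are copied from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Flag values copied from the listing above.
const int MBA_SHORT  = 0x00400000; // use short display
const int MBA_VALNUM = 0x40000000; // display value numbers

// Hypothetical stand-in: only the 'flags' word and four of the helpers.
struct flags_model_t
{
  std::uint32_t flags = 0;
  void set_mba_flags(int f) { flags |= f; }   // adds bits, keeps the rest
  void clr_mba_flags(int f) { flags &= ~f; }  // removes bits, keeps the rest
  bool short_display() const { return (flags & MBA_SHORT) != 0; }
  bool display_valnums() const { return (flags & MBA_VALNUM) != 0; }
};
```

 Because the setter ORs, several bits can be passed in one call, for example
 set_mba_flags(MBA_SHORT|MBA_VALNUM), and clearing one of them leaves the
 other untouched.
*/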
4586
4587/*
4588                      +-----------+ <- inargtop
4589                      |   prmN    |
4590                      |   ...     | <- minargref
4591                      |   prm0    |
4592                      +-----------+ <- inargoff
4593                      |shadow_args|
4594                      +-----------+
4595                      |  retaddr  |
4596      frsize+frregs   +-----------+ <- initial esp  |
4597                      |  frregs   |                 |
4598            +frsize   +-----------+ <- typical ebp  |
4599                      |           |  |              |
4600                      |           |  | fpd          |
4601                      |           |  |              |
4602                      |  frsize   | <- current ebp  |
4603                      |           |                 |
4604                      |           |                 |
4605                      |           |                 | stacksize
4606                      |           |                 |
4607                      |           |                 |
4608                      |           | <- minstkref    |
4609  stkvar base off 0   +---..      |                 |    | current
4610                      |           |                 |    | stack
4611                      |           |                 |    | pointer
4612                      |           |                 |    | range
4613                      |tmpstk_size|                 |    | (what getspd() returns)
4614                      |           |                 |    |
4615                      |           |                 |    |
4616                      +-----------+ <- minimal sp   |    | offset 0 for the decompiler (vd)
4617
4618 There is a detail that may add confusion when working with stack variables.
4619 The decompiler does not use the same stack offsets as IDA.
4620 The picture above should explain the difference:
4621 - IDA stkoffs are displayed on the left, decompiler stkoffs - on the right
4622 - Decompiler stkoffs are always >= 0
4623 - IDA stkoff==0 corresponds to stkoff==tmpstk_size in the decompiler
4624 - See stkoff_vd2ida and stkoff_ida2vd below to convert between IDA stkoffs and vd stkoffs
4625
4626*/
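/*
 The comment above gives one fixed point between the two offset systems:
 IDA stkoff 0 corresponds to decompiler stkoff tmpstk_size. The sketch below
 works that out under the assumption that the two systems differ only by
 this constant shift, which is what the picture suggests. The real
 stkoff_vd2ida and stkoff_ida2vd are exported by the decompiler and may
 handle more cases. stk_model_t, vd2ida() and ida2vd() are hypothetical
 names, and sval_t is a simplified stand-in.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int64_t sval_t; // stand-in for the SDK type

// Hypothetical model holding only the field the comment mentions.
struct stk_model_t
{
  sval_t tmpstk_size = 0;
  // Assumption: a constant shift by tmpstk_size links the two systems.
  sval_t vd2ida(sval_t vd_off) const { return vd_off - tmpstk_size; }
  sval_t ida2vd(sval_t ida_off) const { return ida_off + tmpstk_size; }
};
```

 With tmpstk_size == 0x20, IDA offset 0 maps to decompiler offset 0x20, and
 decompiler offset 0 (the minimal sp) maps to IDA offset -0x20. Negative
 values appear only on the IDA side, in line with the note that decompiler
 stkoffs are always >= 0.
*/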
4627
4628 // convert a stack offset used in vd to a stack offset used in ida stack frame
4629 sval_t hexapi stkoff_vd2ida(sval_t off) const;
4630 // convert an ida stack frame offset to a stack offset used in vd
4631 sval_t hexapi stkoff_ida2vd(sval_t off) const;
4632 sval_t argbase() const
4633 {
4634 return retsize + stacksize;
4635 }
4636 static vdloc_t hexapi idaloc2vd(const argloc_t &loc, int width, sval_t spd);
4637 vdloc_t hexapi idaloc2vd(const argloc_t &loc, int width) const;
4638
4639 static argloc_t hexapi vd2idaloc(const vdloc_t &loc, int width, sval_t spd);
4640 argloc_t hexapi vd2idaloc(const vdloc_t &loc, int width) const;
4641
4642 bool is_stkarg(const lvar_t &v) const
4643 {
4644 return v.is_stk_var() && v.get_stkoff() >= inargoff;
4645 }
4646 member_t *get_stkvar(sval_t vd_stkoff, uval_t *poff) const;
4647 // get lvar location
4648 argloc_t get_ida_argloc(const lvar_t &v) const
4649 {
4650 return vd2idaloc(v.location, v.width);
4651 }
4652 mba_ranges_t mbr;
4653 ea_t entry_ea = BADADDR;
4654 ea_t last_prolog_ea = BADADDR;
4655 ea_t first_epilog_ea = BADADDR;
4656 int qty = 0; ///< number of basic blocks
4657 int npurged = -1; ///< -1 - unknown
4658 cm_t cc = CM_CC_UNKNOWN; ///< calling convention
4659 sval_t tmpstk_size = 0; ///< size of the temporary stack part
4660 ///< (which dynamically changes with push/pops)
4661 sval_t frsize = 0; ///< size of local stkvars range in the stack frame
4662 sval_t frregs = 0; ///< size of saved registers range in the stack frame
4663 sval_t fpd = 0; ///< frame pointer delta
4664 int pfn_flags = 0; ///< copy of func_t::flags
4665 int retsize = 0; ///< size of return address in the stack frame
4666 int shadow_args = 0; ///< size of shadow argument area
4667 sval_t fullsize = 0; ///< Full stack size including incoming args
4668 sval_t stacksize = 0; ///< The maximal size of the function stack including
4669 ///< bytes allocated for outgoing call arguments
4670 ///< (up to retaddr)
4671 sval_t inargoff = 0; ///< offset of the first stack argument;
4672 ///< after fix_scattered_movs() INARGOFF may
4673 ///< be less than STACKSIZE
4674 sval_t minstkref = 0; ///< The lowest stack location whose address was taken
4675 ea_t minstkref_ea = BADADDR; ///< address with lowest minstkref (for debugging)
4676 sval_t minargref = 0; ///< The lowest stack argument location whose address was taken
4677 ///< This location and locations above it can be aliased
4678 ///< It controls locations >= inargoff-shadow_args
4679 sval_t spd_adjust = 0; ///< If sp>0, the max positive sp value