Basics of Assembler

By Cynthia Wallace,2014-07-16 14:03

    Assembler : The Basics In Reversing

    Indeed: the basics!! This is far from complete, but it covers about everything you need to know about assembler to start your reversing journey! Assembler is the start and the end of all programming languages. After all, all (computer, LOL) languages eventually end up as machine instructions, and assembler is the human-readable form of those instructions. In most languages we deal with relatively clear syntax. Assembler is a completely different story: we use abbreviations and numbers, and it all seems so weird …

I. Pieces, bits and bytes:

    ; BIT - The smallest possible piece of data. It can be either a 0 or a 1. If you put a bunch of bits together, you end up in the 'binary number system',

    e.g. 00000001 = 1, 00000010 = 2, 00000011 = 3, etc.

    ; BYTE - A byte consists of 8 bits. It can have a maximum value of 255 (range 0-255). To make binary numbers easier to read, we use the 'hexadecimal number system'. It's a 'base-16 system', while binary is a 'base-2 system'. A byte fits in two hex digits (00h-0FFh).

    ; WORD - A word is just 2 bytes put together, or 16 bits. A word can have a maximum value of 0FFFFh (or 65535d).

    ; DOUBLE WORD - A double word is 2 words together, or 32 bits. Max value = 0FFFFFFFFh (or 4294967295d).

    ; KILOBYTE - 1000 bytes? No, a kilobyte does NOT equal 1000 bytes! Actually, there are 1024 (32*32, or 2^10) bytes.

    ; MEGABYTE - Again, not just 1 million bytes, but 1024*1024 or 1,048,576 bytes.
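
    The numbers in the list above can be checked with a few lines of Python. This is only a sketch for playing with the values: the helper name max_unsigned and the constant names are made up for this example and are not part of any assembler.

```python
# Sizes from the list above, expressed in bits (names are illustrative only).
BYTE_BITS, WORD_BITS, DWORD_BITS = 8, 16, 32


def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


# The same numbers written in decimal, binary (base 2) and hexadecimal (base 16).
for n in (1, 2, 3, 255):
    print(f"{n:3d}d = {n:08b}b = {n:02X}h")

# Maximum values of a byte, a word and a double word.
for name, bits in (("byte", BYTE_BITS), ("word", WORD_BITS), ("dword", DWORD_BITS)):
    print(f"{name:5s} max = {max_unsigned(bits):X}h = {max_unsigned(bits)}d")

KILOBYTE = 1024             # 2**10 bytes, not 1000
MEGABYTE = 1024 * KILOBYTE  # 2**20 bytes, not 1000000
print(KILOBYTE, MEGABYTE)
```

    Changing the bit counts shows how quickly the maximum value grows: every extra bit doubles the range.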

    ---------------------------------------------------------------------------------------------

II. Registers:

    Registers are "special places" inside your computer's CPU (not in main memory) where we can store data. You can see a register as a little box wherein we can store something: a name, a number, a sentence. You can see a register as a placeholder.

    On today's average WinTel CPU you have 9 32bit registers (not counting the flag register). Their names are:

    EAX: Extended Accumulator Register

    EBX: Extended Base Register

    ECX: Extended Counter Register

    EDX: Extended Data Register

    ESI: Extended Source Index

    EDI: Extended Destination Index

    EBP: Extended Base Pointer

    ESP: Extended Stack Pointer

    EIP: Extended Instruction Pointer

    Generally the size of the registers is 32bit (=4 bytes). They can hold values from 0 to 0FFFFFFFFh (unsigned). In the beginning most registers had a main function which the name implies, like ECX = Counter, but these days you can use nearly whichever register you like as a counter or for other purposes in your own code (some instructions, however, are tied to a specific register, e.g. the counting instructions that need ECX). The functions of EAX, EBX, ECX, EDX, ESI and EDI will be explained together with the instructions that use those registers. That leaves EBP, ESP and EIP:

    EBP: EBP has mostly to do with the stack and stack frames. Nothing you really need to worry about when you start. ;)

    ESP: ESP points to the top of the stack of the current process. The stack is the place where data can be stored for later use (for more information, see the explanation of the push/pop instructions).

    EIP: EIP always points to the next instruction that is to be executed.

    There's one more thing you have to know about registers: although they are all 32 bits large, some parts of them (16bit or even 8bit) can be addressed directly under their own name, while other parts (such as the upper 16 bits) can not.

The possibilities are:

    32bit Register    16bit Register    8bit Register
    EAX               AX                AH/AL
    EBX               BX                BH/BL
    ECX               CX                CH/CL
    EDX               DX                DH/DL
    ESI               SI                -----
    EDI               DI                -----
    EBP               BP                -----
    ESP               SP                -----
    EIP               IP                -----

A register looks generally this way:

    |-------------------------- EAX: 32bit (=1 DWORD =4 BYTES) -------------------------|
                                              |------- AX: 16bit (=1 WORD =2 BYTES) ----|
                                              |- AH:8bit (=1 BYTE)-|- AL:8bit (=1 BYTE)-|
    |-----------------------------------------|--------------------|--------------------|
    |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|XXXXXXXXXXXXXXXXXXXX|XXXXXXXXXXXXXXXXXXXX|
    |-----------------------------------------|--------------------|--------------------|

    So, EAX is the name of the 32bit register, AX is the name of the "Low Word" (16bit) of EAX, and AL/AH (8bit) are the names of the "Low Part" and "High Part" of AX. BTW, 4 bytes is 1 DWORD, 2 bytes is 1 WORD.
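
    To see how AX, AH and AL sit inside EAX, here is a small Python sketch that cuts a 32bit value into those parts with masks and shifts. The function name split_register and the sample value 12345678h are made up for this illustration.

```python
def split_register(eax: int) -> dict:
    """Return the sub-register views of a 32-bit value, named after EAX."""
    eax &= 0xFFFFFFFF        # keep only 32 bits
    ax = eax & 0xFFFF        # low word of EAX
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


parts = split_register(0x12345678)
for name, value in parts.items():
    print(f"{name:3s} = {value:X}h")
# prints:
# EAX = 12345678h
# AX  = 5678h
# AH  = 56h
# AL  = 78h
```

    Note that the upper 16 bits (1234h here) have no name of their own, which matches the table above.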

REMARK: make sure you at least read the following about registers. It's quite practical to know, although not that important.

    All this makes it possible for us to make a distinction regarding size:

    ; i. byte-size registers: As the name says, these registers are all exactly 1 byte in size. This does not mean that the whole (32bit) register is fully loaded with data! When a small value sits in a 32bit register, the unused upper bits are just zeroes. These are the byte-sized registers, all 1 byte or 8 bits in size:

    o AL and AH

    o BL and BH

    o CL and CH

    o DL and DH

    ; ii. word-size registers: These are 1 word (= 2 bytes = 16 bits) in size. AX, BX, CX and DX are each made up of 2 byte-sized registers. We can divide the word-size registers by purpose:

    o 1. general purpose registers:

    AX -> 'accumulator': used for mathematical operations, storing strings, ...

    AX (word-sized) = AH + AL. The '+' does *not* mean 'add them up'. AH and AL exist independently, but together they form AX. This means that if you change AH or AL (or both), AX will change too!

    BX -> 'base': used as a base address when accessing memory (see later)

    CX -> 'counter'

    DX -> 'data': among other things, the remainder of a division is stored here

    DI -> 'destination index': e.g. a string will be copied to the address in DI

    SI -> 'source index': e.g. a string will be copied from the address in SI

    o 2. index registers:

    BP -> 'base pointer': points to a specified position on the stack (see later)

    SP -> 'stack pointer': points to the top of the stack (see later)

    o 3. segment registers:

    CS -> 'code segment': instructions an application has to execute (see later)

    DS -> 'data segment': the data your application needs (see later)

    ES -> 'extra segment': duh! (see later)

    SS -> 'stack segment': here we'll find the stack (see later)

    o 4. special:

    IP -> 'instruction pointer': points to the next instruction. Just leave it alone ;)

    ; iii. Doubleword-size registers:

    2 words = 4 bytes = 32 bits. EAX, EBX, ECX, EDX, EDI…

    If you find an 'E' in front of the name of a 16-bit register, it means that you are dealing with a 32-bit register. So, AX = 16 bits; EAX = the 32-bit version of AX.
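
    The remark that changing AH or AL also changes AX (and therefore EAX) can be tried out with a toy model in Python. This is a sketch only: the class Register32 and its attribute names (full, word, high, low) are invented for this example and simply mimic how EAX, AX, AH and AL overlap.

```python
class Register32:
    """Toy model of a 32-bit register with a 16-bit view and two 8-bit views."""

    def __init__(self, value: int = 0):
        self.full = value & 0xFFFFFFFF       # plays the role of EAX

    @property
    def word(self) -> int:                   # plays the role of AX
        return self.full & 0xFFFF

    @property
    def high(self) -> int:                   # plays the role of AH
        return (self.full >> 8) & 0xFF

    @high.setter
    def high(self, value: int) -> None:
        # Replace bits 8-15 only, leave all other bits untouched.
        self.full = (self.full & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def low(self) -> int:                    # plays the role of AL
        return self.full & 0xFF

    @low.setter
    def low(self, value: int) -> None:
        # Replace bits 0-7 only, leave all other bits untouched.
        self.full = (self.full & 0xFFFFFF00) | (value & 0xFF)


eax = Register32(0x11112222)
eax.high = 0xAB                      # like writing to AH
eax.low = 0xCD                       # like writing to AL
print(f"AX  = {eax.word:04X}h")      # prints AX  = ABCDh
print(f"EAX = {eax.full:08X}h")      # prints EAX = 1111ABCDh
```

    The upper half of the register (1111h) stays as it was, because the writes only touch the low byte and the high byte of the low word.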

    ---------------------------------------------------------------------------------------------

III. The flags:

    Flags are single bits which indicate the status of something. A flag is a sign, just like a green light means 'ok' and a red one 'not ok'. A flag can only be '0' or '1', meaning 'not set' or 'set'. The flag register on modern 32bit CPUs is 32 bits large and is in fact a collection of different 1-bit flags. Not all of the 32 bits are actually used as flags, but don't worry: you will mostly need only 3 of them in reversing: the Z-Flag, the O-Flag and the C-Flag. You need to know these flags to understand whether a jump is executed or not.
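
    Since the flag register is just a collection of bits, reading a flag comes down to picking one bit out of a number. Here is a small Python sketch of that idea. The bit positions used (C-Flag = bit 0, Z-Flag = bit 6, O-Flag = bit 11) are the ones of the x86 flag register; the helper name flag and the sample value 246h are made up for this example.

```python
# Bit positions of the three flags inside the x86 flag register.
CF_BIT = 0    # C-Flag (carry)
ZF_BIT = 6    # Z-Flag (zero)
OF_BIT = 11   # O-Flag (overflow)


def flag(eflags: int, bit: int) -> int:
    """Return 1 if the given bit is set in the flag register value, else 0."""
    return (eflags >> bit) & 1


eflags = 0x00000246   # made-up sample value, as a debugger might display it
print("C =", flag(eflags, CF_BIT))   # prints C = 0
print("Z =", flag(eflags, ZF_BIT))   # prints Z = 1
print("O =", flag(eflags, OF_BIT))   # prints O = 0
```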

    ; The Z-Flag:

    o The Z-Flag (zero flag) is the most useful flag for cracking. It is used in about 90% of all cases. Several opcodes set it (status: 1) when the result of the last instruction that was performed is 0, and clear it (status: 0) when the result is anything else. You might wonder why "CMP" (more on this later) could set the zero flag, because it compares something - how can the result of the comparison be 0? The answer to this comes later ;)
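
    The rule "result is 0, so the Z-Flag is set" can be sketched in Python. The example models a 32bit subtraction in the style of the x86 SUB instruction; the function name sub32 is invented here, and only the zero flag is modelled (a real CPU updates other flags as well).

```python
MASK32 = 0xFFFFFFFF   # keep results within 32 bits, like a 32-bit register


def sub32(a: int, b: int) -> tuple:
    """Subtract b from a in 32 bits and return (result, zero_flag)."""
    result = (a - b) & MASK32
    zero_flag = 1 if result == 0 else 0   # set only when the result is 0
    return result, zero_flag


print(sub32(5, 5))   # prints (0, 1): the result is 0, Z-Flag set
print(sub32(5, 3))   # prints (2, 0): the result is not 0, Z-Flag cleared
```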