Assembly Language Programming and Shellcoding – Hello World

Hello all,

With this post, we start the actual programming. We will use various things we have seen in previous posts to write a simple hello world program. Additionally, we will load the executable in GDB and see how GDB can be used to analyse such code. Let’s begin:

Important parts/keywords of assembly code:

Comments: To add a comment in assembly language, we use a semicolon (;) followed by the comment text. The first two lines of the code are examples of comments.

global: GLOBAL is a NASM directive. It exports the given symbol in the object code. The linker reads that symbol and its value from the object file and then decides where to put it in the final executable. What does this mean? Very simply:

  1. _start is marked as a GLOBAL symbol.
  2. The assembler (NASM) puts _start and its value in the object code.
  3. The linker ld knows that _start is the symbol from which execution of the program must start (it is ld’s default entry point). Hence it links the files accordingly.

_start: As we have seen above, _start decides where the execution flow starts. It plays the same role as the main() function in C/C++.

section: The section keyword is used for defining the memory segments we have seen in previous posts. In this program we use two segments: the TEXT segment (which holds the code) and the DATA segment (which holds the variables).

General Code Execution

Normally, code execution includes the following steps:

  1. Initialization of the program
  2. Jumps within the program
  3. Graceful exit once everything is done.

Our hello world program follows the same pattern:

  1. Initialization starts from _start,
  2. The defined variables are fetched from the .data section,
  3. Hello world is printed on the screen,
  4. The program exits gracefully.

Now to write this code we have to understand two syscalls: 1. Write 2. Exit

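As discussed in the previous blog post, we can find these syscall numbers in the following file:

/usr/src/linux-headers-4.13.0-36/arch/sh/include/uapi/asm/unistd_32.h

Note that arch/sh is the SuperH port of the kernel. The two numbers we need here (write and exit) are the same in the 32-bit x86 table, which on Debian/Ubuntu is usually installed as /usr/include/x86_64-linux-gnu/asm/unistd_32.h.

Now let’s look into these syscalls in more detail: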

WRITE Syscall: As we have seen in an earlier post, the Linux Programmer’s Manual (the man pages) is the best source of information. Running man 2 write shows how the syscall is used; its prototype is ssize_t write(int fd, const void *buf, size_t count).

So we need the following values:

  • Syscall number for WRITE: From unistd_32.h, we know that it is “4”.
  • File descriptor fd: This can be 0, 1 or 2 for standard input, standard output and standard error, respectively. Since we are interested in output, it will be “1”.
  • Buffer buf: This is the address of the string we define in the program. In our case it is the label “msg”.
  • Size of buffer count: Now this is tricky. We could count the length of the buffer by hand, but it is easier to let the assembler calculate it with $-msg. Here $ is the address of the current position in the assembly. Used on the line right after msg, $-msg subtracts the start address of msg from the address just past its last byte, which is exactly the length of the buffer. Note that this is computed at assembly time, not at run time.
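To see why $-msg gives the length, here is a small Python sketch (not part of the hello world program) that imitates the assembler’s location counter. The assemble_data helper and the starting offset of 0 are assumptions made for illustration; real addresses are assigned later by the linker.

```python
# Toy model of NASM's location counter ($) inside one section.
# Assumption: offsets start at 0; the linker assigns the real addresses.

def assemble_data(items):
    """Lay out (label, data) pairs back to back.

    Returns a dict mapping each label to its offset, plus the final
    value of the location counter after everything has been emitted.
    """
    offsets = {}
    here = 0  # plays the role of $
    for label, data in items:
        offsets[label] = here
        here += len(data)
    return offsets, here

# msg: db "Hello World!",0x0A
msg_bytes = b"Hello World!" + bytes([0x0A])

# len: equ $-msg, evaluated immediately after msg has been emitted
offsets, here = assemble_data([("msg", msg_bytes)])
length = here - offsets["msg"]
print(length)  # 13

# If something else is emitted between msg and len, $ has moved on
# and the same expression no longer gives the length of msg.
_, here_late = assemble_data([("msg", msg_bytes), ("other", b"abc")])
wrong_length = here_late - offsets["msg"]
print(wrong_length)  # 16
```

This is why len has to be defined directly after msg: any data placed in between would be counted as part of the buffer.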

Exit Syscall: Running man 2 exit shows the prototype void _exit(int status).

For a successful exit, we need the following details:

  • Syscall number for the Exit syscall: From unistd_32.h, we know that it is “1”.
  • Status: Depending on the requirement, you can pass 0 (EXIT_SUCCESS) or 1 (EXIT_FAILURE). I’ll pass “0”.
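Before writing the assembly, the same two system calls can be tried from Python, whose os.write and os._exit are thin wrappers over write(2) and _exit(2). This is only a sketch for experimenting with the arguments. The child process and the run_child helper are assumptions added here so that the exit status can be observed from outside.

```python
import subprocess
import sys

# Source of a tiny child program that performs the same two steps as our
# assembly code will: write(1, msg, len) followed by exit(0).
CHILD = r'''
import os
msg = b"Hello World!\n"
os.write(1, msg)   # fd 1 = standard output, count = len(msg)
os._exit(0)        # status 0 = success
'''

def run_child():
    """Run the child and return (exit status, bytes written to stdout)."""
    result = subprocess.run([sys.executable, "-c", CHILD], capture_output=True)
    return result.returncode, result.stdout

status, output = run_child()
print(status, output)
```

Changing the argument of os._exit to 1 makes run_child report a status of 1, which is the same value a shell would show in $? after running the assembled program.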

As we have all required details, we are ready to write code.

Write the &%$#ing code already!!!!

Sure, let’s begin!!!!!

  1. First things first… Add a description of the code and your name as comments.
; This is simple hello world code
; Author: SLAER (Shashank Gosavi)
  2. Declare _start as a global symbol.
global _start
  3. Define the .text section, which holds the code.
section .text
  4. Then define the _start label:
_start:
  5. Below _start, begin by sanitizing the registers, i.e. setting them to 0. XORing a register with itself resets it to zero. MUL ECX multiplies EAX by ECX and stores the 64-bit result in EDX:EAX. Since ECX is already 0, this sets both EAX and EDX to 0.
xor ecx, ecx    ; Clearing ECX
xor ebx, ebx    ; Clearing EBX
mul ecx         ; Clearing EAX, EDX
  6. Now that the registers are clear, we move the values collected above into the respective registers for the Write syscall. In NASM, a leading $ on an identifier only marks it as a symbol name, so $msg and $len mean the same as msg and len.
mov eax, 0x4    ; Moving Write syscall number into EAX
mov ebx, 0x1    ; Moving file descriptor into EBX
mov ecx, $msg   ; Moving actual buffer into ECX
mov edx, $len   ; Moving the count into EDX
int 0x80        ; Interrupt 80
  7. We repeat the same process for the graceful exit.
mov eax, 0x1    ; Moving Exit syscall number into EAX
mov ebx, 0x0    ; Moving status number = 0 in EBX
int 0x80        ; Interrupt 80
  8. Finally, we define the .data section with the buffer (msg) and its length (len).
section .data
msg: db "Hello World!",0x0A
len: equ $-msg

The final code will look like below:

; This is simple hello world code
; Author: SLAER (Shashank Gosavi)

global _start

section .text
_start:

xor ecx, ecx    ; Clearing ECX
xor ebx, ebx    ; Clearing EBX
mul ecx         ; Clearing EAX, EDX

; Write subroutine

mov eax, 0x4    ; Moving Write syscall number into EAX
mov ebx, 0x1    ; Moving file descriptor into EBX
mov ecx, $msg   ; Moving actual buffer into ECX
mov edx, $len   ; Moving the count into EDX
int 0x80        ; Interrupt 80

; Graceful Exit
mov eax, 0x1    ; Moving Exit syscall number into EAX
mov ebx, 0x0    ; Moving status number = 0 in EBX
int 0x80        ; Interrupt 80

section .data
msg: db "Hello World!",0x0A
len: equ $-msg
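
The register-clearing trick at the top of _start can be checked with a few lines of Python. This is a rough model, assuming 32-bit registers and made-up starting values. xor_self and mul32 are hypothetical helper names used only for this illustration.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit unsigned values

def xor_self(value):
    """Model of `xor reg, reg`: any value XORed with itself is 0."""
    return (value ^ value) & MASK32

def mul32(eax, operand):
    """Model of `mul r32`: unsigned EDX:EAX = EAX * operand.

    Returns the new (eax, edx) pair: low and high 32 bits of the product.
    """
    product = (eax & MASK32) * (operand & MASK32)
    return product & MASK32, (product >> 32) & MASK32

# Arbitrary leftover register contents (assumed, for illustration only).
eax, ebx, ecx, edx = 0xDEADBEEF, 0x1234, 0xCAFEBABE, 0xFFFFFFFF

ecx = xor_self(ecx)          # xor ecx, ecx
ebx = xor_self(ebx)          # xor ebx, ebx
eax, edx = mul32(eax, ecx)   # mul ecx, with ECX already 0

print(eax, ebx, ecx, edx)    # 0 0 0 0
```

The order matters: mul ecx only clears EAX and EDX because ECX was zeroed first. With a non-zero operand, for example mul32(0xFFFFFFFF, 2), the high half of the product lands in EDX instead.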

It’s time to assemble and link it

To assemble the code, we use the following command.

$ nasm -o <output object_code_file> -ggdb -felf32 <source_code_file>
  • -f selects the output format. “elf32” is the Executable and Linkable Format for 32-bit systems. This matters if you are on an x64 machine (which I’m 99.99% certain you are), because our code is 32-bit.
  • -g generates debugging information, and the text after it names the debug format. NASM’s documentation lists dwarf and stabs as the debug formats for ELF output, so if your NASM version rejects -ggdb, try -g -F dwarf instead.

In our case it will be

$ nasm -o helloworld.o -felf32 helloworld.nasm
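
One way to confirm that the object file really is 32-bit is to look at the fifth byte of its ELF header (EI_CLASS), which is 1 for 32-bit and 2 for 64-bit files. The sketch below is a small helper written for this post, not part of NASM or ld. elf_class is a hypothetical name, and the sample headers are synthetic bytes rather than real files.

```python
def elf_class(header: bytes) -> str:
    """Return 'ELF32' or 'ELF64' based on the EI_CLASS byte of an ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {1: "ELF32", 2: "ELF64"}.get(header[4], "unknown")

# Synthetic 16-byte e_ident fields, used here instead of real files:
# magic, EI_CLASS, EI_DATA (1 = little endian), EI_VERSION (1), padding.
sample_32 = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
sample_64 = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)

print(elf_class(sample_32), elf_class(sample_64))
```

On a real build you could call elf_class(open("helloworld.o", "rb").read(16)), which should report ELF32 for an object assembled with -felf32.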

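To link the object file into an executable, we use the following command. On a 64-bit host, add -m elf_i386 so that ld produces a 32-bit executable from the 32-bit object file; without it, ld reports an architecture mismatch.

$ ld -m elf_i386 -o <output_executable_file> <object_code_file>

In our case that is ld -m elf_i386 -o helloworld helloworld.o. The file is now ready to execute: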

$ ./helloworld

It’s time for GDB!!!!

The GNU Debugger (GDB) is one of the most important tools when writing any low-level program. We will look at the basic usage of GDB. Using GDB is simple. Here, see for yourself:

$ gdb <options> <executable>

Well, not really!!! Let me help you understand some important options of GDB.

First type “$ gdb -h” to list all the available options. Of all of these, the only option we will use is “-q”, which is quiet start. It just suppresses the licensing info. Another option that can be helpful is “-p”, which attaches GDB to an already running process.

Once the program is loaded, we can look at some of the commands available inside GDB.

Inspecting loaded executable inside GDB:

  • “info” (i) command is used for extracting information of various kinds. Just type “(gdb) help info” to see every available option. Some useful options are:
  1. info registers — List of integer registers and their contents
  2. info symbol — Describe what symbol is at location ADDR
  3. info breakpoints — Status of specified breakpoints (all user-settable breakpoints if no argument)
  4. info files — Names of targets and files being debugged

You can try all of them

  • “break” (b) command is used for setting a breakpoint. A breakpoint can be set on an address, a function, etc. GDB also supports conditional breakpoints, which we will be using multiple times.
  • “run” (r) command runs the loaded executable inside GDB.
  • “disassemble” (disas) command disassembles the code at a given address, function or register value.
  • “stepi” (si) command executes exactly one machine instruction and then stops, which lets us walk through the program step by step.
  • “x” command is used for examining memory in various formats. We will look at it in more detail in future posts.
  • “print” (p) command evaluates an expression and prints its value, for example a register such as $eax.

So let’s use these commands step by step:

1. Load the program in GDB as shown above, for example with “$ gdb -q ./helloworld”.

2. List the symbols in the executable, for example with “(gdb) info functions”, which shows function symbols such as _start.

3. Now set a breakpoint on _start with “(gdb) break _start”.

4. Now run the program with “(gdb) run”.

You can observe that the breakpoint is hit as soon as the program runs. This is because execution starts at _start.

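5. Now we can check the registers with “(gdb) info registers”. Let’s do that.

The program has only just started and is stopped before its first instruction, so most register values are still zero. Mind you that the exact values (for example in ESP and EIP) may differ on your system.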

6. Let’s disassemble the code. Note that the disas command accepts an address, a function name or a register value. In our case we have disassembled the value of the EIP register (“disassemble $eip”), which is the address of the next instruction.

The arrow (=>) points to the next instruction to be executed.

7. Now let’s see one interesting feature of GDB called a hook. A hook is a user-defined command that GDB runs automatically alongside another command. hook-stop in particular runs every time execution stops, for example at a breakpoint or after each stepi. So let’s “define hook-stop”.

Here I have defined a very simple hook. It disassembles $eip and the next 10 instructions, then displays the values of EAX, EBX, ECX and EDX. On running the program, you’ll get the following output:

We can observe the step-by-step changes in the register values. After a couple of “stepi”s it will look something like this:

Update: I forgot to tell you one very important thing. By default, GDB disassembles code in AT&T syntax, and the disassembly above is in AT&T syntax (full of $ and %). To switch to Intel syntax, use the following command:

(gdb) set disassembly-flavor intel

Now if you run the “disas” command again, you will see the same code in Intel syntax:

You can see the difference, right?

8. Finally, just type “c” (continue) to resume execution. The program runs to the end if no other breakpoint is hit.

Stop this very long post already!!!!!

Yeah, yeah!! I guess I have covered the simple hello world program in great detail. So it’s a good idea to stop now :D. In the next blog, I’ll cover some more basic programs in assembly. Long way to go!!!! Till then, Auf Wiedersehen!!!!!

The post Assembly Language Programming and Shellcoding – Hello World appeared first on ScriptDotSh.

Article Link: https://scriptdotsh.com/index.php/2018/07/29/hello-world/