Printing command line arguments with assembly

I took some time improving my technical surroundings (the blog now has an RSS feed and I have a new notebook, a refurbished Thinkpad X240), but now I am back with more assembler programming. For this article, I tried to access and print all command line arguments (the contents of what would be argv if I were programming in C).

I currently have only the amd64 version, but might return with an i386 version at a later time.

Makefile

I improved my assembly makefile a little bit. I now use variables for the output file name and my objects, and I also switched to using a suffix rule for the actual assembler operations. This makes it very easy to take the Makefile for one project and put it to use for another one.

PRGNAME=printargs
OBJECTS=args.o openbsd-note.o

$(PRGNAME): $(OBJECTS)
	ld --dynamic-linker=/usr/libexec/ld.so -L/usr/lib -lc -o $(PRGNAME) $(OBJECTS)

.s.o:
	as -g -o $@ $<

clean:
	-rm *.o
	-rm $(PRGNAME)

openbsd-note.s

The openbsd-note.s is the same as before. As it just works, I had no reason to improve anything at the moment. I still want to find out what the format is and maybe create a self-documenting version where I use symbolic names instead of plain numbers, but this has no priority right now.

.section ".note.openbsd.ident", "a"
        .p2align   2
        .long      8,4,1
        .ascii      "OpenBSD\0"
        .long      0
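As a rough guide to the numbers above: the listing appears to follow the generic ELF note layout, which is three 32-bit words (name size, descriptor size, type) followed by the name and the descriptor. The C struct below is only a sketch of that layout. The struct and field names are made up for illustration and are not taken from any OpenBSD header.

```c
#include <stdint.h>

/* Hypothetical names; only the values come from openbsd-note.s. */
struct openbsd_note {
    uint32_t namesz;   /* .long 8: bytes in the name, including the NUL */
    uint32_t descsz;   /* .long 4: bytes in the descriptor */
    uint32_t type;     /* .long 1: note type */
    char     name[8];  /* .ascii "OpenBSD\0" */
    uint32_t desc;     /* .long 0: the descriptor itself */
};

/* Same bytes as the assembler listing emits (on a little-endian machine). */
const struct openbsd_note note = { 8, 4, 1, "OpenBSD", 0 };
```

Read this way, the 8 and the 4 in the first line are simply the sizes of the two fields that follow.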

... and the real deal

The following assembler does the equivalent of this C code:

#include <stdio.h>

int main(int argc, char** argv)
{
    while(argc--)
        puts(*argv++);
}

... which is more succinct than I would normally write it, but it sets the focus on the right points: We get the number of arguments (including the program name itself) from main's parameters and check whether it is zero, which it won't ever be on the first iteration. We then decrement it. Every time we enter the loop, we print one entry from the argv array and advance the pointer to where the next entry is (or would be, when it is the last iteration).
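Spelled out one step per line, the same loop might look like the sketch below. The function name print_args and its return value (the number of strings printed) are additions for illustration only; the original loop lives in main and returns nothing of interest.

```c
#include <stdio.h>

/* Longer spelling of while(argc--) puts(*argv++);
 * Returns how many strings were printed (illustrative extra). */
int print_args(int argc, char **argv)
{
    int printed = 0;

    while (argc != 0) {        /* the test hidden in while(argc--) */
        argc = argc - 1;       /* the post-decrement of argc */
        puts(*argv);           /* print the entry argv currently points at */
        argv = argv + 1;       /* the post-increment of argv */
        printed = printed + 1;
    }
    return printed;
}
```

Calling print_args(argc, argv) from main should print the same lines as the short version.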

There are some subtle differences in how this C code is executed compared to my assembler version, but I don't think they are important here. As assembler source, it looks like this:

.intel_syntax noprefix

.global _start

.global puts
.global exit

.text
_start:
   mov r12, [rsp]
   mov r13, rsp
loop:
   cmp r12, 0
   jz end
   dec r12
   add r13, 8

   mov rdi, [r13]
   call puts@plt

   jmp loop

end:
   xor rdi, rdi
   call exit@plt

We start again by setting my preferred syntax options. Then we declare our public symbols and start with the definition of _start inside the text segment (which is the only segment we are using in this source file).

At the moment we start executing _start, our stack pointer (rsp) points to the number of arguments. Directly after that, at the next higher addresses (rsp+8, rsp+16, ...), we have a pointer for each argument string. While I could work with rsp directly, I prefer to keep both the count and a pointer into this list in registers. I chose r12 and r13 for this, but could have used other registers. According to the System V AMD64 calling convention, the registers rbx, rbp as well as r12 to r15 are callee-saved. This means that any function that wants to use them must restore their previous values before returning. All other registers could be overwritten by any function call I make - I would have to save them myself before calling the function, and restore them myself after the call. Meh ...
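To make the layout concrete, here is a small C sketch that treats the memory at rsp as an array of pointer-sized slots: slot 0 holds the count and slots 1, 2, ... hold the argument pointers. The helper names entry_argc and entry_arg are hypothetical, and the parameter sp merely stands in for the value rsp has on entry to _start.

```c
#include <stdint.h>

/* sp stands in for rsp at process entry; one slot is 8 bytes on amd64. */

/* Slot 0: the number of arguments, i.e. what [rsp] reads. */
uintptr_t entry_argc(const uintptr_t *sp)
{
    return sp[0];
}

/* Slots 1..argc: one string pointer per argument, i.e. [rsp + 8 + 8*i]. */
const char *entry_arg(const uintptr_t *sp, uintptr_t i)
{
    return (const char *)sp[1 + i];
}
```

With this picture, entry_arg(sp, 0) is the program name and entry_arg(sp, 1) the first real argument.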

Side note: I hope that the operating system does not expect me to restore the callee-saved registers before exiting the process. OpenBSD-current does not seem to mind, but that never proves that I did everything correctly.

Anyway, I copy the number of arguments into r12 by dereferencing rsp. Into r13, I copy the plain value of rsp. I do not change rsp myself, but if I were to put anything on the stack, rsp would change and I could still find my arguments via r13. After these two MOV instructions, r13 contains the address of our argument count (&argc in C). We will have to fix this before the first puts call.

I now begin looping. There are many ways to write a loop in assembler - here, I decided to put the condition and the modifications of the loop variables at the top. I first compare r12 with zero. This sets some flags that tell if r12 was less than, equal to or greater than zero. These flags can be used via conditional jump instructions like JZ (Jump if Zero). This jump is only taken when r12 is zero.

If we did not jump to the end label, we will now decrement r12 and advance r13 by the size of one pointer. For the first iteration, r13 is now pointing at the pointer to the program name (&argv[0]). If we want to print the string behind that pointer, we will have to dereference it one more time. This is done with the MOV into rdi. If you remember my last article, rdi is the register where the ABI expects the first function parameter. We are set to call puts! Which works exactly like the last time ...

We now jump back to the loop label - this time with JMP, unconditionally. Back to comparing and decrementing r12 and to advancing r13, so that we can print the next argument.

This goes on until r12 reaches zero. At that point, we jump to the end label, and perform one last call to exit with exit code zero.
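For readers who find the control flow easier to follow in C, the sketch below mirrors the assembler listing instruction by instruction, using goto and the same label names. It is a model, not the real thing: the function name print_from_entry_stack is made up, the parameter rsp is an ordinary array laid out like the entry stack, the local variables only borrow the register names, and the function returns a call count where the assembler version calls exit.

```c
#include <stdint.h>
#include <stdio.h>

/* rsp points at memory laid out like the entry stack:
 * slot 0 = argument count, slots 1.. = pointers to the strings. */
int print_from_entry_stack(const uintptr_t *rsp)
{
    uintptr_t r12 = rsp[0];        /* mov r12, [rsp] */
    const uintptr_t *r13 = rsp;    /* mov r13, rsp   */
    int calls = 0;                 /* bookkeeping only, not in the assembler */

loop:
    if (r12 == 0)                  /* cmp r12, 0 */
        goto end;                  /* jz end     */
    r12--;                         /* dec r12    */
    r13++;                         /* add r13, 8 (one slot on amd64) */

    puts((const char *)*r13);      /* mov rdi, [r13] and call puts */
    calls++;

    goto loop;                     /* jmp loop */

end:
    return calls;                  /* the assembler version calls exit(0) here */
}
```

Note how r13 is advanced before it is dereferenced, which is why it may start out pointing at the count rather than at the first string.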