Function Return Values and more from Assembler

The assembler programs I wrote before successfully called into C library functions, but I chose to only call functions that did not return any values I needed. This has to stop now.

For this article, I wrote something reminiscent of "Hello world", but with more functionality. My program should prompt the user to enter his name, and then read a string via scanf. If this succeeds, we will use printf to not only greet him with his name, but also tell him how many characters his name has. If it does not succeed, we just print an error message and exit with an exit code of one instead of the usual zero. Not too complex, but a good step up from the last programs.

The usual stuff

Neither the Makefile nor the openbsd-note.s should surprise you anymore. I include them here so you do not have to get them from past articles.

Makefile

PRGNAME=greetings
OBJECTS=greetings.o openbsd-note.o

$(PRGNAME): $(OBJECTS)
	ld --dynamic-linker=/usr/libexec/ld.so -L/usr/lib -lc -o $(PRGNAME) $(OBJECTS)

.s.o:
	as -g -o $@ $<

clean:
	-rm *.o
	-rm $(PRGNAME)

openbsd-note.s

.section ".note.openbsd.ident", "a"
        .p2align   2
        .long      8,4,1
        .ascii      "OpenBSD\0"
        .long      0

greetings.s

.intel_syntax noprefix

.global _start

.text
_start:
   lea rbx, [rip + .L.buffer]

   lea rdi, [rip + .L.prompt]
   call puts@plt

   lea rdi, [rip + .L.scanf_format]
   mov rsi, rbx
   call scanf@plt

   cmp rax, 1
   jne .L.aborted

   call count_chars_in_rbx

   lea rdi, [rip + .L.print_format]
   mov rsi, rbx
   mov rdx, rax
   call printf@plt

   xor rdi, rdi
   call exit@plt

.L.aborted:
   lea rdi, [rip + .L.aborted_msg]
   call puts@plt

   mov rdi, 1
   call exit@plt

count_chars_in_rbx:
   xor rax, rax
   mov r8, rbx

.L.loop:
   mov r9b, [r8]
   cmp r9b, 0
   jz .L.endloop

   inc rax
   inc r8
   jmp .L.loop
   
.L.endloop:
   ret

.section .rodata
.L.prompt:
   .asciz "What's your name?"
.L.print_format:
   .asciz "Greetings, %s. Your name has %d characters.\n"
.L.scanf_format:
   .asciz "%s"
.L.aborted_msg:
   .asciz "No input given"

.bss
.L.buffer:
   .skip 1024

This will take some more explaining than the last program. We start by stating our syntax preferences and declaring _start as a global symbol. As you can see, I decided against listing all used C functions as globals - it does not do anything for us. The behaviour of the GNU assembler as(1) is to treat all unresolved symbols as external. By the way, if you still want to declare them, there is also an ".extern" directive which would be more correct, but it is documented to do nothing for as(1). You might want to use it for compatibility with other assemblers, but at the moment I am perfectly fine if my stuff works with that one assembler I have.

Let's skip the text section for a moment. Below it, you find two new sections. The first one is marked as ".section .rodata". You already know ".section" from the openbsd-note.s, so it is easy to see that this directive puts the stuff following it into a section called ".rodata" - read-only data. If you have constants, this is the best place for them - I should have used it for the "Hello world" constant in my first article about assembler programming.

The second new section is called .bss. It is similar to the data section: you can store values there that can be changed (unlike .rodata). But .bss has a special ability: the contents of the section are automatically initialized to zero. This is where static variables in C end up when they are not initialized to an explicit value. Because everything starts out as zero, the executable does not need to store the contents of the section - only the size is needed. This is useful for buffers like the one we have here. I use the ".skip" directive to give my symbol the space I want it to have without defining its value.
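To see the zero-initialization guarantee from the C side, here is a minimal sketch. The names buffer and buffer_is_zeroed are made up for illustration, and I am assuming a typical ELF toolchain that places uninitialized statics in .bss.

```c
#include <stddef.h>

/* No initializer: C guarantees this starts out as all zero bytes,
 * and a typical ELF toolchain places it in .bss (assumption). */
static char buffer[1024];

/* Returns 1 if every byte of buffer is zero, 0 otherwise. */
int buffer_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof buffer; i++) {
        if (buffer[i] != 0)
            return 0;
    }
    return 1;
}
```

Called before anything has written to the buffer, buffer_is_zeroed should report 1, which is the same state .L.buffer is in when _start begins.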

_start

We will use our buffer, .L.buffer, multiple times in our program. I don't want to repeat the LEA instruction more often than necessary, so I use it only once to store the address into rbx. If you remember, rbx is one of the callee-saved registers that will not be overwritten when we call C functions.

The next two lines are a simple call to puts. You know the drill.

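The block following that contains a call to scanf. I was a bit disappointed by how little magic there is to calling it, scanf being a variadic function. At least with just two parameters it is straightforward: put the format string into rdi and the pointer to the target into rsi. The one extra rule the System V AMD64 ABI has for variadic calls is that al must hold an upper bound on the number of vector registers used for arguments; we pass none, so a strictly conforming call would zero eax right before the call instruction. After the call, rax contains the return value of scanf.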

Next, we compare this result value with one, the number of values we expect scanf to read. If our expectation does not hold (because of an EOF), we jump to an error handler below.
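The return-value check can be tried out in C without typing anything, by using sscanf on fixed strings. This is only a sketch: scan_one_word is a made-up helper, and the %1023s width is my addition to keep the write inside a 1024-byte buffer; the assembler program uses a plain %s.

```c
#include <stdio.h>

/* Hypothetical helper: runs a single %s conversion on `input` and
 * returns what the scanf family reports. `out` must point to at least
 * 1024 bytes; the width 1023 leaves room for the terminating zero. */
int scan_one_word(const char *input, char *out)
{
    return sscanf(input, "%1023s", out);
}
```

With "Alice" as input the helper returns 1, the number of converted values. With an empty string it returns EOF, which is the case the jne in _start catches.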

If we did not jump, we can continue and count the characters we've read into the buffer. I could have used strlen for this, of course, but at this point that would not have helped me learn anything new. You would have to copy rbx into rdi, call strlen@plt and get the return value from rax. I wrote count_chars_in_rbx so that it also leaves its result in rax, which means you can easily swap it for strlen if you want to - I even used strlen while prototyping, so I know it works.

The next block calls printf to present our results. We call printf with three parameters, so we load the address of the format string into rdi, copy rbx into rsi for the second parameter, and finally copy the character count from rax into rdx. We could have saved this one copy by changing count_chars_in_rbx to return the count in rdx directly. Just a reminder that I write these programs to learn; they are not perfectly optimized yet!

After printf, we exit the program with exit code zero. This means that we will not reach the code below - here begins the error handler mentioned before ... which is not very magical itself, as we can see: Just a call to puts and another to exit, this time with an exit code of one.
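For comparison, this is roughly what _start does when written in C. Treat it as a sketch under assumptions: greet is a hypothetical function that takes the input and output streams as parameters so it can be tried without a terminal, it uses strlen instead of count_chars_in_rbx, and it limits the input to 1023 characters, which the assembler version does not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical C counterpart of _start. Reads one word from `in`,
 * writes prompt and greeting to `out`, and returns the exit code. */
int greet(FILE *in, FILE *out)
{
    static char buffer[1024];            /* plays the role of .L.buffer */

    /* puts appends the newline itself; fputs needs it spelled out. */
    fputs("What's your name?\n", out);

    /* Width 1023 is an addition; the assembler version uses "%s". */
    if (fscanf(in, "%1023s", buffer) != 1) {
        fputs("No input given\n", out);  /* the .L.aborted branch */
        return 1;
    }

    fprintf(out, "Greetings, %s. Your name has %d characters.\n",
            buffer, (int)strlen(buffer));
    return 0;
}
```

A main function would only need to do "return greet(stdin, stdout);" to behave like the assembler program, since returning from main ends up in exit with that value.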

count_chars_in_rbx

count_chars_in_rbx begins by zeroing rax, which will be used as a counter. It also copies rbx into r8. Here we will store our iterator - we don't want to change rbx directly, but we want to advance the pointer to take a look at the characters.

The main part of the function is written as a loop. Inside the loop, we first check whether the character pointed at by r8 is zero. We only want to load and compare single bytes, so we use r9b instead of r9 for this. For the "new" registers r8-r15, you can access them as r8b, r8w, r8d to limit them to 8, 16 or 32 bits respectively. The old registers like rax can be accessed as al, ax or eax for this (also, you can get the upper half of ax by calling it ah - this does not work with the new registers; there is no r9h).
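The sub-register names can be thought of as narrower views of the same 64 bits. The sketch below mimics that with casts in C; the function names are invented for illustration, and it only models reading a sub-register, not what happens to the rest of the register on a write.

```c
#include <stdint.h>

/* Each function returns the part of a 64-bit value that the named
 * sub-register would show when read. */
uint8_t low8(uint64_t r)            /* al, or r8b */
{
    return (uint8_t)r;
}

uint16_t low16(uint64_t r)          /* ax, or r8w */
{
    return (uint16_t)r;
}

uint32_t low32(uint64_t r)          /* eax, or r8d */
{
    return (uint32_t)r;
}

uint8_t high8_of_low16(uint64_t r)  /* ah; r8-r15 have no such view */
{
    return (uint8_t)(r >> 8);
}
```

For the value 0x1122334455667788 these return 0x88, 0x7788, 0x55667788 and 0x77 respectively.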

If we found our zero character, we leave the loop via a jump. Otherwise, we increment our character counter, move our iterator to the next character and jump back to the top of the loop.

After the loop has finished, we just return to where we were called from.
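Written in C, the loop looks roughly like this. count_chars is a name I made up for the sketch; the comments map each line back to the instruction it stands for.

```c
#include <stddef.h>

/* C rendering of count_chars_in_rbx: counts bytes up to, but not
 * including, the terminating zero. */
size_t count_chars(const char *s)
{
    size_t count = 0;       /* xor rax, rax */
    const char *it = s;     /* mov r8, rbx  */

    while (*it != 0) {      /* mov r9b, [r8] / cmp r9b, 0 / jz .L.endloop */
        count++;            /* inc rax */
        it++;               /* inc r8  */
    }

    return count;           /* ret, with the result in rax */
}
```

For any zero-terminated string this gives the same number as strlen, which is why the two are interchangeable in the program.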

And that's it for this program!

Why the funny label names?

Of the symbols I declared myself, only two seem "normal": _start and count_chars_in_rbx. All other symbols start with ".L." - what's up with that?

According to the GNU Assembler's documentation, '.L' is the prefix for local labels - that is, for labels that are only used while assembling and that do not occur in the symbol table. You can check with nm(1) that our greetings binary does not contain any of them. If you go back to the "Hello world" example from the beginning, you will see that nm(1) does list its "msg" symbol for the string constant. Here is a side-by-side comparison:

Hello world                              greetings
00003000 c _DYNAMIC                      00002000 c _DYNAMIC
000030d0 d _GLOBAL_OFFSET_TABLE_         000020d0 d _GLOBAL_OFFSET_TABLE_
00004000 B _end                          00003400 B _end
00001000 T _start                        00001000 T _start
                                         0000105f t count_chars_in_rbx
         U exit                                   U exit
00002000 d msg                                    U printf
         U puts                                   U puts
                                                  U scanf

I want to keep my binary small and clean, so I chose to adopt this prefix. The second dot is just my personal taste - I find ".L.endloop" easier to read than ".Lendloop" - what is an lendloop?