Porting Assembler code to Linux
Sadly, OpenBSD is not the most used and most supported operating system in the
world. Therefore, we might have to port our programs to other operating
systems, like Linux (which is not the worst "other operating system" out there).
One of the great advances made through high-level programming languages like C
was that programs became portable: you did not have to rewrite your program for
every new platform. Assembler code does not have this advantage - if we want to
run it on an ARM platform, we will certainly have to rewrite it! But if we stay on
our comfortable amd64, how much will we have to change? Let's find out!
Suddenly ... bash
Let's take our code from the last article and move it to a Linux system. I have
a shell account on ctrl-c.club, which offers a not-so-recent Ubuntu 14.04
installation that will be good enough for our experiment.
The first pleasant surprise is that our Makefile just works. However, running
the binary brings us the following result:
-bash: ./asmthread: No such file or directory
I'm 100% positive that the file exists. What gives?
Who runs an executable?
You might know about the shebang, that is, a notation used in the first line of
a script that tells our Unix system of choice which interpreter to use. A shell
script might have a first line like
#!/bin/sh
and a perl script might start with
#!/usr/bin/perl
This notation is simple and flexible and in my eyes clearly superior to the way
Windows does the same thing via the file extension. The shebang allows us to
write code in any language we want (well, as long as the needed runtime
environment is available) and the caller does not need to know. A program like
/usr/bin/false might be a compiled program, but it could be implemented as a
shell script just as easily.
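To make this concrete, here is a minimal sketch of the mechanism. The command name "hello" and the temporary directory are made up for illustration, and the sketch assumes that the temporary directory allows executing files:

```shell
#!/bin/sh
# Create a throwaway directory for the demo (removed again at the end).
demo_dir=$(mktemp -d)

# A hypothetical command "hello", implemented as a shell script.
cat > "$demo_dir/hello" <<'EOF'
#!/bin/sh
echo "hello from sh"
EOF
chmod +x "$demo_dir/hello"

# The caller simply executes the file. The kernel reads the "#!" line
# and starts /bin/sh with the path of the script as its argument.
hello_output=$("$demo_dir/hello")
echo "$hello_output"

# The first line of the file is the only place that names the interpreter.
first_line=$(head -n 1 "$demo_dir/hello")
echo "$first_line"

rm -rf "$demo_dir"
```

The caller never mentions /bin/sh; replacing the script with a compiled program of the same name would not change how it is invoked.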
Current Unix systems use the ELF format for their compiled programs (and
libraries). ELF is a binary format, so the file cannot begin with a textual
shebang. As we have already seen, a program compiled as an ELF file contains
many sections, like .text, .bss or .rodata - could there be a
shebang section?
Let's use objdump(1) to find out!
$ objdump -h asmthread

asmthread:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       00000013  0000000000000270  0000000000000270  00000270  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.openbsd.ident 00000018  0000000000000284  0000000000000284  00000284  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       00000090  00000000000002a0  00000000000002a0  000002a0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     00000020  0000000000000330  0000000000000330  00000330  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .hash         00000038  0000000000000350  0000000000000350  00000350  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       0000004e  0000000000000388  0000000000000388  00000388  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .rela.plt     00000060  00000000000003d8  00000000000003d8  000003d8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rodata       00000024  0000000000000438  0000000000000438  00000438  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .text         000000ac  0000000000001460  0000000000001460  00000460  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .plt          000000b0  0000000000001510  0000000000001510  00000510  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .dynamic      000000e0  00000000000025c0  00000000000025c0  000005c0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 11 .got.plt      00000038  00000000000026a0  00000000000026a0  000006a0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 12 .data         00000000  00000000000036d8  00000000000036d8  000006d8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .bss          00000008  00000000000036d8  00000000000036d8  000006d8  2**2
                  ALLOC
 14 .debug_line   00000063  0000000000000000  0000000000000000  000006d8  2**0
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_info   0000005f  0000000000000000  0000000000000000  0000073b  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_abbrev 00000014  0000000000000000  0000000000000000  0000079a  2**0
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_aranges 00000030  0000000000000000  0000000000000000  000007b0  2**4
                  CONTENTS, READONLY, DEBUGGING
 18 .comment      00000013  0000000000000000  0000000000000000  000007e0  2**0
                  CONTENTS, READONLY
Well, that is a long list for such a simple program! We immediately recognize
.text and friends, just as expected. We can also see that the PLT we have used
for calling into libc or other C libraries seems to use some ELF sections. And
even though there is no .shebang section, the first one, named .interp, sounds interesting -
could it be the interpreter? Let's find out!
$ objdump -j .interp -s asmthread

asmthread:     file format elf64-x86-64

Contents of section .interp:
 0270 2f757372 2f6c6962 65786563 2f6c642e  /usr/libexec/ld.
 0280 736f00                               so.
It contains a single string, "/usr/libexec/ld.so" - that is, the dynamic linker
we have been setting for every binary we have written so far. Let's now compare
that to a working binary on our Linux system!
$ objdump -j .interp -s `which sh`

/bin/sh:     file format elf64-x86-64

Contents of section .interp:
 0238 2f6c6962 36342f6c 642d6c69 6e75782d  /lib64/ld-linux-
 0248 7838362d 36342e73 6f2e3200           x86-64.so.2.
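Okay, that one is different, and /usr/libexec/ld.so does not exist on this Linux
system. So the "No such file or directory" refers to the interpreter, not to our
binary. Maybe it works better if we pass the Linux path to the linker?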
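A minimal sketch for checking whether an interpreter path is usable on the current machine. The helper name check_interp is made up for illustration; the two paths are the ones from the objdump output above:

```shell
#!/bin/sh
# check_interp PATH: report whether the given interpreter path exists.
# (Hypothetical helper, not part of any standard tool.)
check_interp() {
    if [ -e "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# The interpreter our OpenBSD-built binary asks for ...
check_interp /usr/libexec/ld.so
# ... and the one /bin/sh on the Linux system asks for.
check_interp /lib64/ld-linux-x86-64.so.2
```

Which of the two lines says "found" depends on the system the script runs on.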
Fixing the Makefile for Linux
PRGNAME=asmthread
OBJECTS=asmthread.o #openbsd-note.o

$(PRGNAME): $(OBJECTS)
	ld --dynamic-linker=/lib64/ld-linux-x86-64.so.2 -L/usr/lib -lc -lpthread -o $(PRGNAME) \
		$(OBJECTS)

.s.o:
	as -g -o $@ $<

clean:
	-rm *.o
	-rm $(PRGNAME)
With just that small change, our binary works as well as it does on OpenBSD!
You can also remove the openbsd-note.s from the project (I left it as a comment
in the Makefile for you to see) - Linux does not need it and does not seem to
have an equivalent.
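If the same project should build on both systems, the dynamic linker path could be selected from the output of uname(1) instead of being hardcoded. The following is only a sketch: the function name dynamic_linker_for is made up, and the two paths are the amd64 ones seen in this article, so other systems and architectures are deliberately left out.

```shell
#!/bin/sh
# dynamic_linker_for OSNAME: print the dynamic linker path for the given
# operating system name (as printed by "uname -s"). Hypothetical helper
# that only knows the two amd64 paths from this article.
dynamic_linker_for() {
    case "$1" in
        OpenBSD) echo /usr/libexec/ld.so ;;
        Linux)   echo /lib64/ld-linux-x86-64.so.2 ;;
        *)       return 1 ;;
    esac
}

# Look up the path for the system we are currently running on.
dynamic_linker_for "$(uname -s)" || echo "unknown operating system" >&2
```

The printed path is what we would hand to ld via --dynamic-linker.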
Is it always that simple?
No, of course not. Even for the same CPU architecture and with closely related
operating systems like OpenBSD and Linux, there are many subtle differences that
might trip you up.
One thing that helps us tremendously is that we decided to forgo making syscalls
and instead leave that to libc. Syscall numbers, parameters and sometimes even
the calling conventions differ greatly between operating systems, which makes
code that uses them directly non-portable. The interfaces of standard C
libraries, however, are as portable as the word "standard" implies.
That alone, however, does not guarantee that our Assembler code will just work
on both OpenBSD and Linux. There are low-level features, like TLS (thread-local
storage), that work differently on each system. Linux uses a special .tbss
section for thread-local data, while OpenBSD uses special functions like
__emutls_get_address, which are defined in libcompiler_rt.a (which might also be
supported on Linux? That is a rabbit hole for another day).
So, don't get your hopes up that changing the .interp section for a
closed-source Linux game might be enough to run it on OpenBSD. But also keep in
mind that Assembler code can, in fact, be ported without rewriting it completely
- in limited cases at least.