Porting Assembler code to Linux
Sadly, OpenBSD is not the most used and most supported operating system in the
world. Therefore, we might have to port our programs to other operating
systems, like Linux (which is not the worst "other operating system" out there).
One of the great advances made through high-level programming languages like C
was that they were portable: you did not have to rewrite your program for every
new platform. Assembler code does not have this advantage - when we want to run
it on an ARM platform, we will have to rewrite it for certain! But if we stay on
our comfortable amd64, how much will we have to change? Let's find out!
Suddenly ... bash
Let's take our code from the last article and move it to a Linux system. I have
a shell account on ctrl-c.club, which offers a not-so-recent Ubuntu 14.04
installation that will be good enough for our experiment.
The first pleasant surprise is that our Makefile just works. However, running
the binary brings us the following result:
-bash: ./asmthread: No such file or directory
I'm 100% positive that the file exists. What gives?
Who runs an executable?
You might know about the shebang, that is, a notation used in the first line of
a script that tells our Unix system of choice which interpreter to use. A shell
script might have a first line like
#!/bin/sh
and a perl script might start with
#!/usr/bin/perl
This notation is simple and flexible and in my eyes clearly superior to the way
Windows does the same thing via the file extension. The shebang allows us to
write code in any language we want (well, as long as the needed runtime
environment is available) and the caller does not need to know. A program like
/usr/bin/false might be a compiled program, but it could be implemented as a
shell script just as easily.
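As a small illustration of that mechanism, the following sketch prints the interpreter a script asks for. The helper name shebang_of is made up for this example, and the temporary script is only a stand-in for a shell-based false(1).

```shell
# Print the interpreter named in a file's shebang line.
# Prints nothing if the file does not start with "#!".
# shebang_of is a hypothetical helper name, not a standard tool.
shebang_of() {
    head -n 1 "$1" | tr -d '\0' | sed -n 's/^#![[:space:]]*//p'
}

# Demo: a tiny look-alike of false(1), written as a shell script.
script=$(mktemp)
printf '#!/bin/sh\nexit 1\n' > "$script"
shebang_of "$script"    # prints: /bin/sh
rm -f "$script"
```

For a compiled program the function prints nothing, because the first bytes of the file are not "#!".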
Current Unix systems use the ELF format for their compiled programs (and
libraries). ELF is a binary format, so such a file cannot begin with a textual
shebang. As we have already seen, a program compiled as an ELF file contains
many sections, like .text, .bss or .rodata - could there be a shebang section?
Let's use objdump(1) to find out!
$ objdump -h asmthread
asmthread:     file format elf64-x86-64
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       00000013  0000000000000270  0000000000000270  00000270  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.openbsd.ident 00000018  0000000000000284  0000000000000284  00000284  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       00000090  00000000000002a0  00000000000002a0  000002a0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     00000020  0000000000000330  0000000000000330  00000330  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .hash         00000038  0000000000000350  0000000000000350  00000350  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       0000004e  0000000000000388  0000000000000388  00000388  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .rela.plt     00000060  00000000000003d8  00000000000003d8  000003d8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rodata       00000024  0000000000000438  0000000000000438  00000438  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .text         000000ac  0000000000001460  0000000000001460  00000460  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .plt          000000b0  0000000000001510  0000000000001510  00000510  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .dynamic      000000e0  00000000000025c0  00000000000025c0  000005c0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 11 .got.plt      00000038  00000000000026a0  00000000000026a0  000006a0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 12 .data         00000000  00000000000036d8  00000000000036d8  000006d8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .bss          00000008  00000000000036d8  00000000000036d8  000006d8  2**2
                  ALLOC
 14 .debug_line   00000063  0000000000000000  0000000000000000  000006d8  2**0
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_info   0000005f  0000000000000000  0000000000000000  0000073b  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_abbrev 00000014  0000000000000000  0000000000000000  0000079a  2**0
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_aranges 00000030  0000000000000000  0000000000000000  000007b0  2**4
                  CONTENTS, READONLY, DEBUGGING
 18 .comment      00000013  0000000000000000  0000000000000000  000007e0  2**0
                  CONTENTS, READONLY
Well, that is a long list for such a simple program! We immediately recognize
.text and friends, just as expected. We can also see that the PLT we have used
for calling into libc or other C libraries seems to use some ELF sections. And
even though there is no .shebang section, the first section, named .interp,
sounds interesting - could it be the interpreter? Let's find out!
$ objdump -j .interp -s asmthread

asmthread:     file format elf64-x86-64

Contents of section .interp:
 0270 2f757372 2f6c6962 65786563 2f6c642e  /usr/libexec/ld.
 0280 736f00                               so.
It contains a single string, "/usr/libexec/ld.so" - that is, the dynamic linker
we have been setting for every binary we have written so far. Let's now compare
that to a working binary on our Linux system!
$ objdump -j .interp -s `which sh`

/bin/sh:     file format elf64-x86-64

Contents of section .interp:
 0238 2f6c6962 36342f6c 642d6c69 6e75782d  /lib64/ld-linux-
 0248 7838362d 36342e73 6f2e3200           x86-64.so.2.
Okay, that one is different. Maybe it works better if we pass that value to the
linker?
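If objdump is not at hand, the same string can be read straight out of the file, since the section table already told us where .interp lives ("File off") and how long it is ("Size"). The following is only a sketch; section_string is a hypothetical helper name, and it assumes the offset and size are taken from the objdump -h listing.

```shell
# Print `size` bytes starting at byte `offset` of a file, dropping NUL bytes.
# Offset and size may be decimal or hexadecimal (0x...), as shown by objdump -h.
# section_string is a hypothetical helper name, not a standard tool.
section_string() {
    dd if="$1" bs=1 skip="$(($2))" count="$(($3))" 2>/dev/null | tr -d '\0'
    echo
}
```

For the OpenBSD build of our program, the call would be `section_string asmthread 0x270 0x13`, using the values from the .interp row of the section table.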
Fixing the Makefile for Linux
PRGNAME=asmthread
OBJECTS=asmthread.o #openbsd-note.o

$(PRGNAME): $(OBJECTS)
	ld --dynamic-linker=/lib64/ld-linux-x86-64.so.2 -L/usr/lib -lc -lpthread -o $(PRGNAME) \
	$(OBJECTS)

.s.o:
	as -g -o $@ $<

clean:
	-rm *.o
	-rm $(PRGNAME)
And with just that small change, our binary works just as well as on OpenBSD!
You can also remove the openbsd-note.s from the project (I left it as a comment
in the Makefile for you to see) - Linux does not need it and does not seem to
have an equivalent.
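If one Makefile should serve both systems, the linker path could be picked from the output of uname -s instead of being hard-coded. The sketch below covers only the two systems from this article, and only amd64, because the Linux path is the one we saw in /bin/sh above. The name dynamic_linker_for is made up for this example.

```shell
# Map a kernel name (as printed by `uname -s`) to the dynamic linker path
# found in the .interp sections of the binaries we inspected.
# dynamic_linker_for is a hypothetical helper name; other systems and other
# CPU architectures are deliberately not handled.
dynamic_linker_for() {
    case "$1" in
        OpenBSD) echo /usr/libexec/ld.so ;;
        Linux)   echo /lib64/ld-linux-x86-64.so.2 ;;
        *)       echo "unknown system: $1" >&2; return 1 ;;
    esac
}

# On a supported system, print the flag to pass to ld.
if interp=$(dynamic_linker_for "$(uname -s)"); then
    echo "--dynamic-linker=$interp"
fi
```

The printed flag could then be handed to ld through a make variable, so the rule for the program itself stays the same on both systems.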
Is it always that simple?
No, of course not. Even for the same CPU architecture and with closely related
operating systems like OpenBSD and Linux, there are many subtle differences that
might trip you up.
One thing that tremendously helps us is that we decided to forgo making syscalls
and instead leave that to libc. Syscall numbers, parameters and sometimes even
the calling conventions differ greatly between operating systems, which makes
code that uses them directly non-portable. The interfaces of standard C libraries,
however, are as portable as the word "standard" implies.
That alone, however, does not guarantee that our Assembler code will just work
on both OpenBSD and Linux. There are low-level features, like the way TLS
(thread-local storage) works, that differ between the two systems. Linux uses a
special .tbss section for thread-local data, while OpenBSD uses some special
functions like __emutls_get_address which are defined in libcompiler_rt.a (which
might also be supported for Linux? That is a rabbit hole for another day).
So, don't get your hopes up that changing the .interp section for a
closed-source Linux game might be enough to run it on OpenBSD. But also keep in
mind that Assembler code can, in fact, be ported without rewriting it completely
- in limited cases at least.