Porting Assembler code to Linux

Sadly, OpenBSD is neither the most used nor the best supported operating system in the world. Therefore, we might have to port our programs to other operating systems, like Linux (which is not the worst "other operating system" out there).

One of the great advances made through high-level programming languages like C was portability: you did not have to rewrite your program for every new platform. Assembler code does not have this advantage - if we want to run it on an ARM platform, we will certainly have to rewrite it! But if we stay on our comfortable amd64, how much will we have to change? Let's find out!

Suddenly ... bash

Let's take our code from the last article and move it to a Linux system. I have a shell account on ctrl-c.club, which offers a not-so-recent Ubuntu 14.04 installation that will be good enough for our experiment.

ctrl-c.club

The first pleasant surprise is that our Makefile just works. However, running the binary brings us the following result:

-bash: ./asmthread: No such file or directory

I'm 100 percent positive that the file exists. What gives?

Who runs an executable?

You might know about the shebang, that is, a notation used in the first line of a script that tells our Unix system of choice which interpreter to use. A shell script might have a first line like

#!/bin/sh

and a perl script might start with

#!/usr/bin/perl

This notation is simple and flexible and in my eyes clearly superior to the way Windows does the same thing via the file extension. The shebang allows us to write code in any language we want (well, as long as the needed runtime environment is available) and the caller does not need to know. A program like /usr/bin/false might be a compiled program, but it could be implemented as a shell script just as easily.
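To make the mechanism a bit more concrete, here is a minimal sketch in Python of how a shebang line could be taken apart. The function name parse_shebang is made up for this example, and real kernels differ in details (line length limits, how the rest of the line is split), so treat it as an illustration rather than as what any particular kernel does. It treats everything after the interpreter path as one single argument.

```python
def parse_shebang(first_line: bytes):
    """Return (interpreter, argument) for a '#!' line, or None if there is none.

    Hypothetical helper for illustration only.
    """
    if not first_line.startswith(b"#!"):
        return None
    # Everything after '#!' up to the end of the line names the interpreter,
    # optionally followed by one argument.
    rest = first_line[2:].split(b"\n", 1)[0].strip()
    if not rest:
        return None
    parts = rest.split(None, 1)
    interpreter = parts[0].decode("utf-8", "replace")
    argument = parts[1].decode("utf-8", "replace") if len(parts) > 1 else None
    return interpreter, argument


print(parse_shebang(b"#!/bin/sh\n"))
print(parse_shebang(b"#!/usr/bin/perl -w\n"))
print(parse_shebang(b"\x7fELF\x02\x01\x01"))
```

The last call passes the first bytes of a compiled program instead of a script, and the helper returns None for it because there is no shebang to find.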

Current Unix systems use the ELF format for their compiled programs (and libraries). ELF is a binary format, so such a file cannot begin with a textual shebang. As we have already seen, a program compiled as an ELF file contains many sections, like .text, .bss or .rodata - could there be a shebang section?

Let's use objdump(1) to find out!

$ objdump -h asmthread

asmthread:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       00000013  0000000000000270  0000000000000270  00000270  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.openbsd.ident 00000018  0000000000000284  0000000000000284  00000284  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .dynsym       00000090  00000000000002a0  00000000000002a0  000002a0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     00000020  0000000000000330  0000000000000330  00000330  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .hash         00000038  0000000000000350  0000000000000350  00000350  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       0000004e  0000000000000388  0000000000000388  00000388  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .rela.plt     00000060  00000000000003d8  00000000000003d8  000003d8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .rodata       00000024  0000000000000438  0000000000000438  00000438  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .text         000000ac  0000000000001460  0000000000001460  00000460  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  9 .plt          000000b0  0000000000001510  0000000000001510  00000510  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .dynamic      000000e0  00000000000025c0  00000000000025c0  000005c0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 11 .got.plt      00000038  00000000000026a0  00000000000026a0  000006a0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 12 .data         00000000  00000000000036d8  00000000000036d8  000006d8  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 13 .bss          00000008  00000000000036d8  00000000000036d8  000006d8  2**2
                  ALLOC
 14 .debug_line   00000063  0000000000000000  0000000000000000  000006d8  2**0
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_info   0000005f  0000000000000000  0000000000000000  0000073b  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_abbrev 00000014  0000000000000000  0000000000000000  0000079a  2**0
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_aranges 00000030  0000000000000000  0000000000000000  000007b0  2**4
                  CONTENTS, READONLY, DEBUGGING
 18 .comment      00000013  0000000000000000  0000000000000000  000007e0  2**0
                  CONTENTS, READONLY

Well, that is a long list for such a simple program! We immediately recognize .text and friends, just as expected. We can also see that the PLT we have used for calling into libc or other C libraries seems to use some ELF sections. And even if there is no .shebang section, the first section, named .interp, sounds interesting - could it be the interpreter? Let's find out!

$ objdump -j .interp -s asmthread

asmthread:     file format elf64-x86-64

Contents of section .interp:
 0270 2f757372 2f6c6962 65786563 2f6c642e  /usr/libexec/ld.
 0280 736f00                               so.

It contains a single string, "/usr/libexec/ld.so" - that is, the dynamic linker we have been setting for every binary we have written so far. Let's now compare that to a working binary on our Linux system!

$ objdump -j .interp -s `which sh`

/bin/sh:     file format elf64-x86-64

Contents of section .interp:
 0238 2f6c6962 36342f6c 642d6c69 6e75782d  /lib64/ld-linux-
 0248 7838362d 36342e73 6f2e3200           x86-64.so.2.

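Okay, that one is different. Our Linux system has no /usr/libexec/ld.so, so the "No such file or directory" from bash most likely refers to the missing dynamic linker rather than to our binary. Maybe it works better if we pass the Linux path to the linker?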
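The lookup that objdump performed for us can also be sketched in a few lines of Python. This is a minimal sketch, assuming a little-endian ELF64 file such as our amd64 binaries; the function name elf_interpreter is made up for this example. Instead of searching the section table, it reads the PT_INTERP program header, which points at the same string that the .interp section holds, and then checks whether that path exists on the current system.

```python
import os
import struct
import sys

PT_INTERP = 3  # program header type that names the dynamic linker


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None.

    Hypothetical helper for illustration only.
    """
    # e_ident: magic bytes, then class (2 = 64-bit) and encoding (1 = little-endian).
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode("utf-8", "replace")
    return None  # no PT_INTERP entry, for example a statically linked binary


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            with open(path, "rb") as f:
                interp = elf_interpreter(f.read())
        except (OSError, ValueError) as err:
            print(f"{path}: {err}")
            continue
        if interp is None:
            print(f"{path}: no interpreter requested")
        else:
            state = "present" if os.path.exists(interp) else "missing"
            print(f"{path}: {interp} ({state})")
```

Saved under a name of your choice and called with asmthread and /bin/sh as arguments, it should report the same two paths that objdump showed, together with a hint whether each dynamic linker is actually installed.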

Fixing the Makefile for Linux

PRGNAME=asmthread
OBJECTS=asmthread.o #openbsd-note.o

$(PRGNAME): $(OBJECTS)
	ld --dynamic-linker=/lib64/ld-linux-x86-64.so.2 -L/usr/lib -lc -lpthread -o $(PRGNAME) \
		$(OBJECTS)

.s.o:
	as -g -o $@ $<

clean:
	-rm *.o
	-rm $(PRGNAME)

With just that small change, our binary works as well as it does on OpenBSD! You can also remove openbsd-note.s from the project (I left it as a comment in the Makefile for you to see) - Linux does not need it and does not seem to have an equivalent.

Is it always that simple?

No, of course not. Even for the same CPU architecture and with closely related operating systems like OpenBSD and Linux, there are many subtle differences that might trip you up.

One thing that helps us tremendously is that we decided to forgo making syscalls ourselves and instead leave that to libc. Syscall numbers, parameters and sometimes even the calling conventions differ greatly between operating systems, which makes code that uses them directly non-portable. The interfaces of standard C libraries, however, are as portable as the word "standard" implies.
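A small sketch in Python may illustrate the difference. The table lists the amd64 syscall numbers of three common calls on both systems; the names SYSCALL_NUMBERS and libc_getpid are made up for this example. The second half calls getpid through libc by name, which is the portable route our Assembler code takes via the PLT. It assumes a Unix-like system where ctypes.CDLL(None) gives access to the libc that is already loaded into the process.

```python
import ctypes
import os

# Kernel syscall numbers on amd64 for the same three operations.
# They are part of each kernel's ABI and do not match.
SYSCALL_NUMBERS = {
    "linux": {"exit": 60, "write": 1, "getpid": 39},
    "openbsd": {"exit": 1, "write": 4, "getpid": 20},
}

# Assumption: a Unix-like system. CDLL(None) looks up symbols in the
# libraries already loaded into this process, which includes libc.
libc = ctypes.CDLL(None)
libc.getpid.restype = ctypes.c_int


def libc_getpid() -> int:
    """Call getpid by name through libc instead of by syscall number."""
    return libc.getpid()


for name in ("exit", "write", "getpid"):
    print(name, SYSCALL_NUMBERS["linux"][name], SYSCALL_NUMBERS["openbsd"][name])

# The name-based call needs no per-system table at all.
print(libc_getpid() == os.getpid())
```

Code that hard-wires one column of the table only runs on that system, while the call by name leaves the number to whichever libc is installed.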

That alone, however, does not guarantee that our Assembler code will just work on both OpenBSD and Linux. There are low-level features, like TLS (thread-local storage), that are implemented differently on each system. Linux uses a special .tbss section for thread-local data, while OpenBSD uses special functions like __emutls_get_address, which are defined in libcompiler_rt.a (which might also be supported on Linux? That is a rabbit hole for another day).

So, don't get your hopes up that changing the .interp section for a closed-source Linux game might be enough to run it on OpenBSD. But also keep in mind that Assembler code can, in fact, be ported without rewriting it completely - in limited cases at least.