
Mirror your favourite git repositories!

I will start this article with a small rant. If you just want to read how I mirror git repositories on my server, feel free to skip to the corresponding subheading. I won't be mad, I promise!

Our Internet today is a miracle, at least for everyone old enough to remember a time before computers were always online, and before pocket devices kept us connected all the time.

Back in the old days, when you wanted to install software on your notebook, you had to fetch the CD and install it from there. My first experience with Linux was a five-disk set of SuSE containing not only the operating system (which was what I expected, knowing nothing but Windows) but also a huge amount of other software. Everything you could possibly want was always there, just a disk away!

When I switched to Linux many years later, the Internet was already ubiquitous. It was just glorious: no need for CDs anymore (except the one I installed Debian from), just enter

apt-get install widelands

and enjoy a nice, calm strategy game. This was awesome - and still is!

Thanks to public repositories, we can easily access sources for vast amounts of software. But sometimes we forget what we lost along the way: we now rely on the network always being available and on someone continuing to offer the software we need. We depend on being online.

If I wanted to install my old Turbo Basic compiler, all I would have to do is grab that CD and install it from there. Well, I would also need a computer with a CD drive (and maybe with Windows installed on it... if I remember correctly, my compiler no longer compiled native DOS programs, and I would not expect Wine to support it, but who knows?).

If I were to lose my Internet access, or if someone took the software I want offline, I would be screwed.

So, what's up with this rant? Just yesterday, the Github repository for youtube-dl was taken down following a DMCA claim from the RIAA. This is not (yet) as horrible as being unable to ever install it again, since there are many mirrors and you can still find packages for OpenBSD 6.8 and many other operating systems, but it is a stern reminder of two things.

Number 1: Even software that is essential for getting by in our modern IT world can disappear. I use youtube-dl daily to watch videos on the Internet without having to endure the user-unfriendly websites that host them. While no one can take my copy away from me (even the Wayback Machine has a very recent one), I wonder how long it will keep working without developers improving it.

Number 2: Just clicking the fork button on Github is not enough to create a personal backup of software you want to keep! I did so a few times as a safety measure in case a developer deleted his own repository. But if Github decided to take a repository down for good, nothing would stop them from removing all forks on their site as well. If you don't pay them for it, they are free to cease their service as they see fit!

What to do about it


Long story short, I will create mirrors of important software on my own server. I don't plan to make them public, but I recommend that you also back up whatever software you depend on from Github, Bitbucket and other SaaS repository hosting providers, so that we won't regret not having done so when something goes missing.

This is even more important for software you have written yourself. Always keep backups in several different places; after all, the chances that someone else will back it up for you are much smaller!

My solution is simple, but not perfect. I created a new user with /sbin/nologin as the login shell and cloned the repositories I care about into his home directory. If you are not sure how to do that, the command I used is

git clone --mirror $REPOSITORY_URL

which creates a bare repository with all data from the origin.
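A minimal sketch of how the initial cloning could be scripted for several repositories at once, assuming a plain-text list with one URL per line. The function names mirror_dir and clone_all, the user name mirror and the file names are all made up for this example; the directory naming only aims at a "name.git" result so that a '*.git' glob finds the mirrors later.

```shell
#!/bin/sh
# Sketch only; every name here is hypothetical. The mirror user could be
# created by root with e.g.:  useradd -m -s /sbin/nologin mirror
# (the nologin path varies; Debian, for instance, uses /usr/sbin/nologin).

# Turn a repository URL into a "name.git" directory name.
mirror_dir() {
    name=${1%/}                # drop a trailing slash
    name=${name##*/}           # keep the last path component
    name=${name##*:}           # scp-like "host:repo.git" without any slash
    name=${name%.git}          # drop an existing ".git" suffix...
    printf '%s.git\n' "$name"  # ...and add exactly one back
}

# Clone every URL listed in file $1 into directory $2, skipping blank
# lines, comments and repositories that are already mirrored.
clone_all() {
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in ''|'#'*) continue ;; esac
        dir=$2/$(mirror_dir "$url")
        if [ -d "$dir" ]; then
            echo "already mirrored: $url"
        else
            # </dev/null keeps git from reading the rest of the URL list.
            git clone --mirror "$url" "$dir" </dev/null
        fi
    done < "$1"
}

# Only run when called as a script with a list file and a target directory.
if [ $# -eq 2 ]; then
    clone_all "$1" "$2"
fi
```

It would be run as the mirror user, for example with sudo -u mirror sh clone-mirrors.sh repos.txt /home/mirror; sudo starts sh directly, so the nologin login shell does not get in the way. Rerunning it is harmless because existing mirrors are skipped.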

Further, I created a small Perl script (it could have been shell, it really is very simple) that goes into each cloned repository and calls

git fetch

in it. I put this script in the user's crontab so that I can get weekly updates to my mirrored repositories. It is not a perfect solution, but it restores some peace of mind.

Here is the script:

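#!/usr/bin/perl
use v5.30;    # also enables strict and say
use warnings;

# Update every bare mirror (*.git) in the current directory.
foreach my $dir (glob '*.git')
{
    next unless -d $dir;
    say $dir;
    # git -C runs git inside $dir, so there is no chdir that could fail
    # silently. git fetch reports on stderr, so run it with system()
    # instead of capturing its (usually empty) stdout with backticks.
    system('git', '-C', $dir, 'fetch') == 0
        or warn "git fetch failed in $dir\n";
}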
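For comparison, the same loop can be sketched in POSIX sh. This is an illustrative variant, not the script above: the function name update_mirrors is made up, it takes the mirror directory as an optional argument (defaulting to the current one), and it returns a non-zero status when any fetch fails, which cron can report.

```shell
#!/bin/sh
# Fetch into every bare mirror (*.git) in directory $1, or in "." if omitted.
update_mirrors() {
    status=0
    for dir in "${1:-.}"/*.git; do
        [ -d "$dir" ] || continue   # the glob stays literal if nothing matches
        echo "$dir"
        # No --prune: unless fetch.prune is configured, refs deleted
        # upstream stay in the backup.
        if ! git -C "$dir" fetch --quiet; then
            echo "fetch failed: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Called as update_mirrors /home/mirror (or with no argument from inside that directory). Leaving out --prune is a deliberate choice for a backup: a branch that vanishes upstream is exactly the kind of thing you might want to keep.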

I don't care if you use it or create your own solution, but please don't just rely on Github!