Mirror your favourite git repositories!
I will start this article with a small rant. If you just want to read how I
mirror git repositories on my server, feel free to skip to the corresponding
subheading. I won't be mad, I promise!
Our Internet today is a miracle, at least for everyone who remembers a time
before computers were always online and, furthermore, before pocket devices
kept us connected all the time.
Back in the old days, when you wanted to install software on your notebook, you
had to fetch the CD and install it from there. My first experience with Linux
was a five disk set of SuSE containing not only the operating system (what I
expected, knowing nothing but Windows), but also a huge amount of other
software. Everything you could possibly want was always there, just a disk away!
When I switched to Linux many years later, the Internet was already
ubiquitous. It was just glorious: No need for CDs anymore (except the CD I
installed Debian from), just enter
apt-get install widelands
and enjoy a nice, calm strategy game. This was awesome - and still is!
Thanks to public repositories, we can easily access sources for vast amounts of
software. But sometimes, we tend to forget what we lost on the way. We now rely
on the network to always be available and to keep offering the software we
need. We depend on being online.
If I wanted to install my old Turbo Basic compiler, all I would have to do is go
grab that CD and install it from there. Well, I would need to find a computer
with a CD drive (and maybe with Windows installed on it ... if I remember
correctly, my compiler did not compile native DOS programs anymore, and I would
not expect Wine to support it, but who knows?).
If I were to lose my Internet access or if someone took the software I
want offline, I would be screwed.
So, what's up with this rant? Just yesterday, the Github repository for
youtube-dl was taken down via a DMCA claim from the RIAA. This is not (yet) as
horrible as being unable to ever install it again, as there are many mirrors and
you can still find a package for OpenBSD 6.8 and many other operating systems,
but it reminds me sternly of two things.
Number 1: Even software that is essential for you to survive in our modern IT
world can disappear. I use youtube-dl daily to watch videos on the Internet
without having to endure the user-unfriendly web sites they created to host
these videos. While no one can take the version I have from me (even the Wayback
Machine has a very recent copy), I wonder how long it will keep on working
without developers improving it.
Number 2: Just clicking the fork button on Github is not enough to create a
personal backup of software you want to keep! I did so a few times as a safety
measure against a developer deciding to delete his repository himself. But if
Github decided to take a repository down for good, it would be no problem to
also remove all forked copies on their site. If you don't pay them for it, they
are free to cease their service as they see fit!
What to do about it
Long story short, I will create mirrors for important software on my own
server. I don't plan to make them public, but I recommend that you also back up
whatever software you have on Github, Bitbucket and other SaaS repository
hosting providers, so that we won't have to regret that we didn't when something
goes missing.
This is even more important for software you have written yourself. Always keep
a backup on different sites, because it is less likely that someone else will
back it up for you!
My solution is simple, but not perfect. I created a new user with /sbin/nologin
as the login shell and cloned the repositories I cared for into his home
directory. If you are not sure how to do that, the command I used is
git clone --mirror $REPOSITORY_URL
which creates a bare repository with all data from the origin.
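If you want to mirror more than a handful of repositories, the clone step can be wrapped in a small loop. The following is only a sketch under a few assumptions: the function name mirror_all, the list file with one URL per line, and the target directory layout are all made up for this example.

```shell
#!/bin/sh
# Sketch: clone every URL listed in a text file as a bare mirror.
# mirror_all and the list file format are assumptions for illustration.

mirror_all() {
    list_file=$1     # one repository URL per line; lines starting with '#' are ignored
    target_dir=$2    # directory that will hold the bare mirrors

    mkdir -p "$target_dir" || return 1

    while IFS= read -r url; do
        # Skip blank lines and comments.
        case $url in
            ''|'#'*) continue ;;
        esac

        # Derive "name.git" from the last path component of the URL.
        name=$(basename "$url" .git).git

        if [ -d "$target_dir/$name" ]; then
            echo "skipping $name (already mirrored)"
            continue
        fi

        # Redirect stdin so git (or ssh underneath it) cannot swallow
        # the remaining lines of the list.
        git clone --quiet --mirror "$url" "$target_dir/$name" </dev/null ||
            echo "failed to clone $url" >&2
    done < "$list_file"
}
```

Called as mirror_all repos.txt "$HOME", it leaves repositories that are already present untouched, so it should be safe to run again after adding new URLs to the list.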
Further, I created a small Perl script (could have been shell, it really is very
simple) that goes into each cloned repository and calls
git fetch
in it. I put this script in the user's crontab so that I can get weekly updates
to my mirrored repositories. It is not a perfect solution, but it restores some
peace of mind.
Here is the script:
#!/usr/bin/perl
use v5.30;
use warnings;

# Fetch updates for every mirror (*.git) in the current directory.
foreach my $dir (glob '*.git') {
    next unless -d $dir;
    chdir $dir or next;
    print `pwd`;
    print `git fetch`;
    chdir '..';
}
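Since the script could just as well have been shell, here is a minimal sketch of what such a version might look like. The function name update_mirrors and its optional directory argument are made up for this example. It is not a line-by-line translation: it uses git -C instead of changing directories and reports failed fetches.

```shell
#!/bin/sh
# Sketch: fetch updates for every bare mirror (*.git) directly inside a directory.
# update_mirrors is a hypothetical name, not part of the Perl script.

update_mirrors() {
    base_dir=${1:-.}   # defaults to the current directory
    status=0

    for repo in "$base_dir"/*.git; do
        # If nothing matches, the pattern stays unexpanded; -d filters that out.
        [ -d "$repo" ] || continue

        echo "$repo"
        # -C makes git run inside $repo, so no cd is needed.
        if ! git -C "$repo" fetch --quiet; then
            echo "fetch failed for $repo" >&2
            status=1
        fi
    done

    return "$status"
}
```

To use it from cron, the file would end with a call such as update_mirrors "$HOME". A weekly crontab entry could then look like 0 4 * * 0 /path/to/update-mirrors.sh, where both the path and the time (Sundays at 04:00) are placeholders.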
I don't care if you use it or create your own solution, but please don't just
rely on Github!