GitHub drops SVN support, and breaks legacy software
This short article is about an error that we are starting to see more often when building unpackaged software. The error reads as follows:
svn: E170013: Unable to connect to a repository at URL
'https://github.com/.../.../trunk/...'
As it turns out, this is because GitHub dropped SVN support at the beginning of this year, a change that was announced a year earlier here: https://github.blog/2023-01-20-sunsetting-subversion-support/
Bluntly said, this was an unfortunate move by GitHub that broke much of our legacy software. Today (6 months after the deprecation), there is still no simple way to download an (archived) copy of a single GitHub directory from the command line. The GitHub CLI does not offer one, and neither does the API (at least not one that accounts for subdirectories). Git’s sparse indices and partial clones are great new features that have been added to “git”, but as far as I can tell they are intended for other (less mundane) use cases, and suggesting them in the sunsetting article as a replacement is, to say the least, problematic.
In many legacy Makefile workflows we do not want to clone anything at all; all we want is to download the code of our dependencies.
For example, blockly.games is a nice website that offers educational games for young programmers, which is based on the Google Blockly library and is open-source (https://github.com/google/blockly-games).
In the Makefile of this software project, we see the following line:
svn export --force \
https://github.com/ajaxorg/ace-builds/trunk/src-min-noconflict/ \
appengine/third-party/ace
This line of bash code tries to save the contents of the “src-min-noconflict” folder of the ajaxorg/ace-builds repository on GitHub into “appengine/third-party/ace” within the local working directory.
This is different from cloning the repository or a part of it, as the intention is simply to save the latest version of the files, which are being used as dependencies by the blockly-games website.
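For comparison, here is a rough sketch of what the Git-based route (a shallow, partial clone combined with a sparse checkout) might look like when forced into the role of "svn export". The function name sparse_export and its argument order are my own invention for illustration, and the sketch assumes git 2.25 or newer. Note how much ceremony is needed just to end up with a plain directory of files.

```shell
#!/bin/sh
# Hypothetical helper (not from the blockly-games Makefile): export one
# sub-directory of a Git repository without keeping repository metadata.
# Assumes git >= 2.25 ("clone --sparse" and "sparse-checkout").
sparse_export() {
    repo_url=$1   # e.g. https://github.com/ajaxorg/ace-builds.git
    subdir=$2     # e.g. src-min-noconflict
    dest=$3       # e.g. appengine/third-party/ace

    tmp=$(mktemp -d) || return 1

    # shallow (--depth 1), blob-less (--filter) and sparse (--sparse):
    # only the files under <subdir> should end up being checked out
    git clone --quiet --depth 1 --filter=blob:none --sparse \
        "$repo_url" "$tmp/repo" || { rm -rf "$tmp"; return 1; }
    git -C "$tmp/repo" sparse-checkout set "$subdir" \
        || { rm -rf "$tmp"; return 1; }

    if [ ! -d "$tmp/repo/$subdir" ]; then
        rm -rf "$tmp"
        return 2   # the requested directory does not exist in the repository
    fi

    # copy the directory contents (dotfiles included), then drop the clone
    mkdir -p "$dest" && cp -R "$tmp/repo/$subdir/." "$dest/"
    status=$?
    rm -rf "$tmp"
    return $status
}
```

With this in place, the Makefile line would become something like: sparse_export https://github.com/ajaxorg/ace-builds.git src-min-noconflict appengine/third-party/ace. It works, but it still performs a clone under the hood, which is exactly what the original one-liner avoided.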
In the absence of SVN support, the closest thing I could find to a simple fix for this legacy Makefile was the source-code archive URL feature of GitHub.
Using this feature, the line above has to be replaced with the following, much longer and harder-to-maintain, equivalent:
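DEST_DIR=appengine/third-party/ace
SOURCE_URL=https://github.com/ajaxorg/ace-builds/archive/refs/heads/master.tar.gz
# same folder as in the original svn command (src-min-noconflict)
SOURCE_DIR=ace-builds-master/src-min-noconflict
mkdir -p "$DEST_DIR/.temp"
# -z is given explicitly: tar cannot always detect gzip compression on a pipe
curl -sL "$SOURCE_URL" | tar -xzf - --directory="$DEST_DIR/.temp" "$SOURCE_DIR"
mv "$DEST_DIR/.temp/$SOURCE_DIR"/* "$DEST_DIR"
rm -r "$DEST_DIR/.temp"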
To explain, the code above downloads an archive (*.tar.gz) that contains the entirety of the repository code as a snapshot (i.e. without history information). Then, it unpacks the archive on the fly, picks up the desired directory, and stores it in the desired location locally.
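As a side note, the temporary directory and the "mv" step can probably be avoided with tar's --strip-components option, which both GNU tar and bsdtar support. The following is a minimal sketch under that assumption; the function name extract_subdir is hypothetical and not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read a .tar.gz stream on stdin and unpack only
# <member_dir> from it, placing its contents directly into <dest_dir>.
extract_subdir() {
    dest_dir=$1
    member_dir=${2%/}   # e.g. ace-builds-master/src-min-noconflict

    # number of leading path components to drop, so that the *contents*
    # of <member_dir> land in <dest_dir> without the enclosing folders
    depth=$(printf '%s\n' "$member_dir" | awk -F/ '{ print NF }')

    mkdir -p "$dest_dir" &&
    tar -xzf - -C "$dest_dir" --strip-components="$depth" "$member_dir"
}
```

For the ace-builds case this would be used as: curl -sL "$SOURCE_URL" | extract_subdir appengine/third-party/ace ace-builds-master/src-min-noconflict. It is shorter, but it still downloads the whole archive and still hard-codes the name of the archive's root folder.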
The main disadvantages of this approach, besides the length of the replacement code, are that it:
- … downloads the entire archived repository, even when the directory of interest represents a small part of that archive (*.tar.gz) file
- … depends on the name of the main branch (e.g. master vs. main), unlike the SVN equivalent (using the /trunk/ Subversion URL)
- … depends on the structure inside of the archive (in this case, all of the repository files are stored inside of a root folder that is called ace-builds-master which, if changed, would break the code above)
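The branch-name dependency can be softened by asking the remote which branch HEAD points to, instead of hard-coding it. The sketch below relies on "git ls-remote --symref", which is a standard git command; the two function names are hypothetical, and this does of course bring git back in as a build dependency.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original Makefile.

# Extracts the branch name from the output of
#   git ls-remote --symref <url> HEAD
# whose first line looks like: "ref: refs/heads/master<TAB>HEAD"
parse_default_branch() {
    sed -n 's|^ref: refs/heads/\([^[:space:]]*\)[[:space:]]*HEAD$|\1|p' |
        head -n 1
}

# Prints the default branch of a remote repository (needs network access).
default_branch() {
    git ls-remote --symref "$1" HEAD | parse_default_branch
}
```

A Makefile recipe could then build the archive URL from the output of default_branch https://github.com/ajaxorg/ace-builds.git rather than from a fixed "master". The dependency on the archive's root folder name remains, since that name is derived from the repository and branch names.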
In conclusion, my plea to GitHub would be simply to bring back SVN support, either fully or partially, just to keep legacy code working. Alternatively, they could improve the API and update the CLI tool to allow us to download a part of a GitHub repository in a straightforward manner.
Until then, I have created a simple shell script to fix legacy software when needed:
#!/bin/sh
if [ $# -ne 5 ] && [ $# -ne 4 ]; then
    echo "Usage: $(basename "$0") <github_user> <github_repo> <branch> <dest_dir> [<repo_dir>]" >&2
    echo >&2
    echo "if <repo_dir> is omitted then the entire repo archive is unpacked into <dest_dir>" >&2
    exit 1
fi
DEST_DIR="./$4"
SOURCE_URL="https://github.com/$1/$2/archive/refs/heads/$3.tar.gz"
# inside the archive, all files live under a root folder named <repo>-<branch>
if [ $# -ne 5 ]; then
    SOURCE_DIR="$2-$3"
else
    SOURCE_DIR="$2-$3/$5"
fi
# fetches the headers (-I) while following redirects (-L) in fail mode (-f)
# in this case the cURL command fails if redirects do not end in a successful response
# while grep checks (ignoring case, as header names may be lowercase) that a gzip archive is served
# together they ensure that there is a .tar.gz archive at the source URL
if ! curl -sILf "$SOURCE_URL" | grep -qi "^content-type: *application/x-gzip"; then
    echo "GitHub repo '$1/$2' does not exist or does not have the '$3' branch" >&2
    exit 1
fi
# careful: any existing <dest_dir> is deleted before unpacking
if [ -d "$DEST_DIR" ]; then rm -rf "$DEST_DIR"; fi
mkdir -p "$DEST_DIR/.temp"
# -z is given explicitly: tar cannot always detect gzip compression on a pipe
curl -sL "$SOURCE_URL" | tar -xzf - --directory="$DEST_DIR/.temp" "$SOURCE_DIR"
if [ ! -d "$DEST_DIR/.temp/$SOURCE_DIR" ]; then
    echo "Unfamiliar archive content structure encountered on GitHub repo '$1/$2' (branch $3)" >&2
    rm -rf "$DEST_DIR/.temp" 2>/dev/null
    exit 2
fi
mv "$DEST_DIR/.temp/$SOURCE_DIR"/* "$DEST_DIR"
rm -rf "$DEST_DIR/.temp"
As an application of the above, here is a pull request in which I tried to fix the build of the blockly-games repository: https://github.com/google/blockly-games/pull/250