SVN to Git+LFS — It’s About Time

Mike Baker
Published in Karman Interactive
11 min read · Jan 11, 2019

In this post we walk through the whole thought process and the steps for getting from SVN to Git to Git+LFS. If you’re only looking for one portion of the process, skip ahead to one of the following sections:

Rationale — Why we made the change.
SVN to Git — Maintaining history.
Git to Git+LFS — Getting LFS to handle those large files in all revisions
Map Commit Authors — Mapping your SVN users over to your Git accounts

Rationale

As a studio that creates games and interactive experiences, we spent years envying development teams that were able to use Git. Our projects came with unpredictable client changes, constantly changing build processes and large, tightly coupled assets, so Git really wasn’t a valid solution for us.

Since a Git repo contains all revisions of all files, highly revised binary assets aren’t handled well. For us, it wouldn’t be unusual for a project to contain many large videos, with specific metadata that helped track video elements whose features were tightly coupled to the codebase. Those videos would get updated by the client endlessly, mid-development, leaving no time to write proper tooling to separate the large assets from the repository.

With projects like that in mind, we had to stick with a centralized solution like SVN or Perforce. Since we started out as a poor bootstrapped company, we opted for the free, open source solution: SVN :)

While on the topic of thrift, we hosted all of our ~500 GB of repositories on a Dreamhost shared hosting plan.

At the time, Git did have plugins to handle large binaries, like git-annex and git-media, but they were immature and cumbersome for a small team to set up and use.

By the start of 2018 the time was right to make the switch. Our now-larger dev teams were experiencing all the common frustrations with SVN. Not to mention, the younger cohort of devs had never used SVN; it was just a footnote in their education!

Git’s large binary support had also reached reasonable maturity and so the great SVN to Git+LFS migration began.

Git+LFS Saves the Day

LFS (aka Large File Storage) is the latest, most active and mature solution to large binaries in a Git repository.

At a high level, it’s an additional process run on your files before they are committed. Rather than storing the large binary itself in the repository and bloating its size, LFS generates a hash of the file and commits a small pointer containing that hash to the main Git repository. That version of the large binary file is then stored in a folder managed by LFS and pushed to a separate storage location when you push your commits to a remote.

When another developer goes to pull down a revision, the changes are pulled as they normally would and all hashes that represent files handled by LFS are then fetched from a central storage location.

The advantage to this process is that a large file that is changed frequently will only require one version to be pulled down. That giant 3D character model that went through 40 revisions doesn’t bloat the repository for everyone who is cloning it!
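To make the "hash instead of the file" idea a bit more concrete, here is a rough sketch that builds, by hand, the same kind of three-line pointer that LFS commits in place of a binary. The function name lfs_pointer_for is made up for this illustration, and it assumes sha256sum or shasum is available. In real use git-lfs generates pointers for you (git lfs pointer --file=<path> prints one), so treat this purely as a way to see what is going on.

```shell
#!/bin/sh
# Hypothetical helper: print an LFS-style pointer for a file.
# Illustration only; git-lfs creates the real pointers itself.
lfs_pointer_for() {
    file="$1"
    # SHA-256 of the file contents (sha256sum on Linux, shasum on OS X).
    if command -v sha256sum >/dev/null 2>&1; then
        oid=$(sha256sum "$file" | cut -d ' ' -f 1)
    else
        oid=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
    fi
    # Size in bytes; tr strips the padding some wc implementations add.
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}
```

However large the original file is, the text that lands in the Git history stays this small, which is why 40 revisions of a giant model no longer bloat every clone.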

Downsides

While LFS is a great solution, there are a few concerns to be aware of.

  • Most if not all Git hosts charge additional costs for LFS storage. Ex: GitHub charges a storage fee + a bandwidth fee
  • Not all users have a 100% complete version of the repository. If your Git remote goes down some LFS files may not be retrieved. To mitigate this you’ll want to maintain a full archive of the repo + LFS files outside your day-to-day host.
  • LFS is not packaged with Git by default, and unless you give people a heads up they may be confused when all of their large files (images, video, etc.) appear to be corrupt. This is a good opportunity to make your Readme.md super clear!

The Process

At a high level, here are the steps we’ll go through:

  1. Pull down full SVN Repo
  2. Convert to Git
  3. Migrate to LFS (rewriting history as if we were always using LFS)
  4. Push

All of this has been tested on OS X. Installation of the dependencies will be different and there may be some slight variations in commands when performing the conversion on a different platform but the overall process should be very similar.

SVN To Git

Dependencies

  • Homebrew (Suggested): This will make installing all other dependencies a breeze
  • Git or brew install git
  • SVN or brew install subversion
  • Ruby or brew install ruby
  • svn2git or sudo gem install svn2git

Convert from SVN to Git

  1. Run an svn command that requires you to log into the target repo. Password entry in the svn2git script is buggy on OS X, so running this first simply caches your authenticated session.
    svn log -r HEAD {URL-TO-YOUR-REPO}

    NOTE: All blocks of {SOME-TEXT} should be completely replaced by values for your own use. For example if your repo was at https://mydomain.com/my-repo you’d run the command:
    svn log -r HEAD https://mydomain.com/my-repo
  2. Create a directory with the same name as your repo and move into it, since svn2git builds the Git repo in the current directory.
    mkdir {YOUR-REPO-NAME}
    cd {YOUR-REPO-NAME}
  3. Create a Git repo out of the SVN repo using the svn2git command. This is going to take a while…
    svn2git {URL-TO-YOUR-REPO}

    NOTE: This assumes your repo follows the conventional structure of 3 folders (branches, tags, trunk) at the root level of the URL. If it doesn’t, use the svn2git documentation to modify the command above.
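If your repo doesn’t follow the conventional layout, the variations look roughly like the sketch below. The helper name svn2git_command and the layout labels (standard, rootistrunk, trunkonly) are made up for illustration. The flags themselves are the ones we understand the svn2git README to document, so double check them against the version you have installed. The function only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the svn2git command for a given repo layout.
# Layout names are our own labels, not svn2git terminology.
svn2git_command() {
    url="$1"
    layout="${2:-standard}"
    case "$layout" in
        # trunk, branches and tags folders at the root of the URL
        standard)    echo "svn2git $url" ;;
        # no trunk/branches/tags at all; everything lives at the root
        rootistrunk) echo "svn2git $url --rootistrunk" ;;
        # only a trunk folder, no branches or tags
        trunkonly)   echo "svn2git $url --trunk trunk --nobranches --notags" ;;
        *)           echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}
```

For example, svn2git_command https://mydomain.com/my-repo rootistrunk prints the command to run from inside the directory you created in step 2.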

Setup Your Ignored Files Again

The SVN ignores aren’t carried over to Git. Take the following steps to catch 90% of the issues.

  1. At the root of your repo create a .gitignore file
  2. If you need help, copy the contents of one of GitHub’s starter ignore templates to get started https://github.com/github/gitignore
  3. Open your project with your IDE(s) (Unity, Rider, etc.) so that all the temp and ignored files get created, then check that Git finds no auto generated changes in your working copy.
  4. Ignore any other files/directories that you missed. A tool like Sourcetree can be helpful for defining the ignore patterns if you prefer a GUI.

Additional Resources

These are the resources that were helpful in establishing these steps. If you run into trouble they might help you too.

Introducing git-lfs

Dependencies

  • Homebrew (Suggested): This will make installing all other dependencies a breeze
  • Git or brew install git
  • git-lfs v2.2.0 or higher or brew install git-lfs
  • A Git repo host that supports git-lfs (GitHub, Bitbucket, etc.)

Setup LFS Files and Rewrite History

WARNING: Before you start, know that these steps will end up rewriting the Git history. If this is an active repo that’s been pushed to a remote, you’ll want to make sure everyone has their changes pushed before you start and that nobody makes changes while you go through this process. You should also make a local copy of the repo before you start making changes. Backups, backups, backups.

  1. Navigate to the root directory of your Git repo.
    cd {GIT-REPO-NAME}
  2. Compile a list of files and extensions you’d like tracked by LFS. Below is a good start but is geared towards Unity projects. Adapt as necessary for your project. This may be overkill but so far we’ve been happy with this set.
    *.unity3D,*.exr,*.unitypackage,*.pdf,*.psd,*.ai,*.fla,*.gif,*.jpg,*.jpeg,*.tga,*.tif,*.tiff,*.bmp,*.png,*.ttf,*.TTF,*.otf,*.aif,*.ogg,*.wav,*.rns,*.mp3,*.flv,*.mov,*.wmv,*.mpg,*.mpeg,*.avi,*.mp4,*.FBX,*.fbx,*.blend,*.lxo,*.so,*.bundle,*.a,*.dll,*.aar,*.srcaar,*.bin,*.mdb,*.ipa,*.swf,*.jar,*.apk,*.exe,*.rar,*.zip,*.gz,*.7z

    NOTE: In some instances there are directories that appear to be files. *.framework files are a good example of this. Since Git has no concept of directories (just files at paths) you can’t track them directly with LFS. You have to get creative with your rules. Below is an example for AdColony that tracks all instances of their largest file no matter which directory it’s in or how the structure of the subdirectories changes in the future.
    **/AdColony.framework/AdColony
    **/AdColony.framework/**/AdColony
  3. Have git-lfs suggest file types that might be worth adding to LFS. Add any types that make sense to your list. Typically it’s worth adding any non-text files. The only exception is text files that represent large blocks of data (>3MB) and change significantly between revisions (Ex: GTFS transit data); those are worth adding too.
    git lfs migrate info --everything
    Or to restrict suggestions to files over a certain size
    git lfs migrate info --everything --above="5MB"

    The --everything parameter makes sure all branches are included in the analysis.
  4. Migrate all files/types and their history to LFS with
    git lfs migrate import --everything --include="{LFS-FILE-LIST}"
    For example, to migrate all .jpg, .bmp and .gif files
    git lfs migrate import --everything --include="*.jpg,*.bmp,*.gif"

    NOTE: {LFS-FILE-LIST} isn’t a file. It has to be a comma separated string entered directly in the command like the second example above. It’s best to write your list in a text editor then copy/paste it into your command when executing. If someone with better bash-fu than I has a better solution I’m all ears!
  5. Add the remote to the repo host
    git remote add origin git@{PATH-TO-REPO}
    Example:
    git remote add origin git@bitbucket.org:mycompany/my-repo.git
  6. Push all branches along with all of the git-lfs tracked files to origin and set it as your default remote
    git push --all origin -u
  7. Make sure all of the LFS files were pushed. During this migration sometimes some of the historical files are missed so this command should double check things. During normal workflow this isn’t required.
    git lfs push origin --all
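One way to avoid the copy/paste in step 4 is to keep the patterns in a text file, one per line, and let the shell join them with commas. This is only a sketch: the helper name lfs_include_list and the file name lfs-patterns.txt are our own inventions, and the "lines starting with # are comments" convention is an assumption of this script, not something git-lfs knows about.

```shell
#!/bin/sh
# Hypothetical helper: turn a patterns file (one pattern per line) into
# the comma separated string that --include expects.
lfs_include_list() {
    # Drop blank lines and lines starting with '#', then join with commas.
    grep -v -e '^[[:space:]]*$' -e '^#' "$1" | paste -s -d ',' -
}
```

With a lfs-patterns.txt sitting next to your repo, the migrate command from step 4 could then be written as:

    git lfs migrate import --everything --include="$(lfs_include_list ../lfs-patterns.txt)"

Keeping the list in a file also makes it easy to reuse the same set of patterns across several repos.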

Verify Your LFS Results (OPTIONAL…but you should)

Now let’s verify that everything is working correctly for a fresh clone.

  1. Set git-lfs to skip resolving tracked files
    git lfs install --skip-smudge
  2. In a new directory clone the repo
    git clone git@{PATH-TO-REPO} {REPO-NAME}
  3. In Finder, go to a file that you know should be tracked by LFS. On the surface it will look identical to the original file but you should notice that it is much smaller and can’t actually be read. If you open the file in a text editor you’ll see that the file is just a pointer to the actual file. The file should look something like this:
    version https://git-lfs.github.com/spec/v1
    oid sha256:eafe20a57a5c00c8f6b81af59f58265b2355a517931cd608cea35e7ee065bb2a
    size 25891


    If the file is still the original file, something went wrong along the way.
  4. Navigate into the root of the repo’s directory and run:
    git lfs pull

    The LFS files should now start getting resolved.
  5. Once complete, check that the LFS tracked file is, in fact, the correct file. It should be the correct size and readable by its default application.
  6. Make sure to revert git-lfs back to its default behaviour. Re-enable the Git hook to automatically resolve git-lfs tracked files with:
    git lfs install
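If you’d rather not eyeball files in Finder, the pointer check from steps 3 and 5 can be roughly scripted. The sketch below assumes pointer files always start with the version line shown in step 3. The helper name is_lfs_pointer is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file is still an LFS pointer
# rather than the real content.
is_lfs_pointer() {
    # Pointers are tiny text files whose first line is the spec version.
    # Only the first 200 bytes are read so large binaries stay cheap to check.
    head -c 200 "$1" | head -n 1 |
        grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

After the git lfs pull in step 4, a loop like this should print nothing if every tracked file was resolved:

    git lfs ls-files -n | while IFS= read -r f; do
        is_lfs_pointer "$f" && echo "still a pointer: $f"
    done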

Additional Resources

These are the resources that were helpful in establishing these steps. If you run into issues or want to do something more advanced these are a great start.

Map SVN User/Author names to GitHub Names

During the transition from SVN to Git you likely lost the author account associations from your SVN repo. With Git, you can rewrite history and fix the author associations. The following steps are GitHub specific but the process shouldn’t be much different for your host. You’re basically going from commits attributed to raw SVN usernames to commits linked to your team’s GitHub accounts.

GitHub provides a script and instructions to make this change but their script is tedious to use if you’re moving over multiple repositories with a lot of contributors. I’ve taken their script a bit further and modified it to fix multiple authors at a time.

NOTE: This definitely isn’t the most efficient way of doing this, but unless you’re converting repos all day, every day it’s fine. This approach strikes a good balance between dev effort and time savings.

  1. Download the following two scripts and put them somewhere convenient. We’ll assume you put them in your home directory (aka ~).
    https://github.com/KarmanInteractive/gists/blob/master/svn-to-git/fix-author.sh

    https://github.com/KarmanInteractive/gists/blob/master/svn-to-git/fix-authors-batch.sh
  2. Make sure both files are executable
    chmod +x fix-author.sh fix-authors-batch.sh
  3. Take note of the repo suffix that was appended to all of the usernames when converting from SVN to Git. Run the following from within the root directory of your Git repo.
    git log

    NOTE: A username will look something like this mbaker@c6d0b1ba-0d0c-439f-86f5-bb66f3f61e27. The Repo suffix is c6d0b1ba-0d0c-439f-86f5-bb66f3f61e27
  4. In the root of your repo run the following command to see a list of users that have ever contributed. Take note of this list, you’ll need it shortly.
    git shortlog --summary --numbered -e --all
  5. Open fix-authors-batch.sh with a text editor and change the REPO_SUFFIX value and save.
    Example: REPO_SUFFIX="c6d0b1ba-0d0c-439f-86f5-bb66f3f61e27"
  6. While fix-authors-batch.sh is open, copy and paste the $DIR/fix-author.sh line once for each contributing author from step 4. You’ll need their SVN username, name, and email address. Example line:
    $DIR/fix-author.sh 'mbaker@'$REPO_SUFFIX "Mike Baker" mike@domain.com
  7. From the root of your Git repo execute the following. Depending on the number of commits in the repo this may take a while to complete.
    ~/fix-authors-batch.sh
  8. Once complete, take a look at the log to make sure some branches were changed. You should see at least a few lines that look something like this:
    Ref 'refs/heads/master' was rewritten

    If you don’t see any lines like that it means no users were actually mapped. The likely problem is that you copied the REPO_SUFFIX incorrectly or none of the authors defined in the fix-authors-batch.sh actually contributed to this repo.
  9. Once you’re happy with the results push them to the server.
    git push --force --tags origin 'refs/heads/*'
  10. Verify that all of your users mapped. If you go to your repo on GitHub and look at the commits you should see that all user names are clickable and have a profile pic.

If some users are missing you can repeat the above steps until you’ve mapped all the users.
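With a lot of contributors, typing out the lines from step 6 gets tedious. As a rough sketch, you could keep the mapping in a small text file and generate the lines instead. The helper name author_lines and the authors.txt format (svn-username|Full Name|email, one author per row) are hypothetical. The output simply mirrors the example line from step 6, ready to paste into fix-authors-batch.sh.

```shell
#!/bin/sh
# Hypothetical helper: print one fix-author.sh line per row of a
# "svn-username|Full Name|email" mapping file.
author_lines() {
    # The extra test after read picks up a last row with no trailing newline.
    while IFS='|' read -r svn_user full_name email || [ -n "$svn_user" ]; do
        # Skip blank rows.
        [ -n "$svn_user" ] || continue
        printf "\$DIR/fix-author.sh '%s@'\$REPO_SUFFIX \"%s\" %s\n" \
            "$svn_user" "$full_name" "$email"
    done < "$1"
}
```

Running author_lines authors.txt prints the lines without touching the repo, so you can review them before pasting.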

Migrating Multiple Repos

If you’re migrating multiple repos, repeat the above steps for each repo. If the authors in your other repos are the same, you should only need to change the REPO_SUFFIX in fix-authors-batch.sh. Don’t worry if an author defined in your script didn’t contribute to the repository. The only harm is that it’ll slow down your author mapping a bit.

Additional Resources

These are the resources that were helpful in establishing these steps.

That’s it. You’re good to go now! Share the repo with your team and…uh…Git to it!

Mike Baker
Karman Interactive

Creator, Connoisseur, and Hoarder of 1's and 0’s. Founder @DeclineCookies, @PetLoopCo, and (Previous) @KarmanLtd