Converting a directory-per-version Subversion repository into a Git one in Git style
Somewhat by accident, I had to convert a Subversion repository into a Git one. It sounded easy at first: run git-svn and be done. But when I examined the repository, it was not as simple as expected.
Subversion, but not Subversion
In the repository, each version is placed in its own directory:
project/
1.0/
1.1/
1.2/
...
Scanning through the history, most commits added a directory for a specific version:
Adds version 1.2
That is OK; we have git filter-branch to handle it. What made things worse was that some commits added multiple versions:
Adds version 1.9, 2.0, 2.2
Even worse, some versions were “downgraded”:
Downgrade version 2.2 as 2.1
And then a new version with the old number followed:
Adds version 2.2
Of course, directories were also added and renamed between versions.
Tools
Move a directory to repository root
Use mv so that the files end up at the repository root.
git filter-branch -f --tree-filter 'mv -f 1.0/* ./' HEAD
Move files in a directory to repository root
Use git ls-files to move the tracked files recursively. A plain mv does not merge into directories that already exist at the root, so you would have to run mv once per directory.
setenv SHELL /bin/bash # Let git use bash if the shell is tcsh
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.1`; do mv -f $f ${f##1.1/}; done' \
HEAD
Add a sub-directory in a commit
This handles a directory that is added in a new version.
git filter-branch -f --tree-filter \
'mv 1.2/new_directory new_directory' one_previous_commit..HEAD
Remove a directory in a commit
This handles a directory that is removed in a new version.
git filter-branch -f --tree-filter \
'rm -rf directory_to_remove' one_previous_commit..HEAD
Split a multi-version commit
This splits a multi-version commit into several commits, one per version.
git rebase -i one_previous_commit
# in vim
edit ...
pick ...
pick ...
# stopped at the commit to split
git reset HEAD~1
git add first_version
git commit -s -m 'Add first version'
git add second_version
git commit -s -m 'Add second version'
# ... and so on
git rebase --continue
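When a commit bundles many versions, typing the add/commit pairs by hand gets tedious. The following is a minimal sketch, not part of the original workflow, that only prints the commands for review. The helper names versions_in_subject and print_split_commands are hypothetical, and the sketch assumes version numbers of the form N.N in the commit subject.

```shell
# List the version numbers (form N.N) mentioned in a commit subject,
# one per line. Assumption: versions are separated by non-digit text.
versions_in_subject() {
    printf '%s\n' "$1" | tr -cs '0-9.' '\n' | grep -E '^[0-9]+\.[0-9]+$'
}

# Print (without running) one add/commit pair per version found.
print_split_commands() {
    versions_in_subject "$1" | while IFS= read -r version; do
        printf "git add %s\n" "$version"
        printf "git commit -s -m 'Add version %s'\n" "$version"
    done
}

print_split_commands 'Adds version 1.9, 2.0, 2.2'
```

The printed commands can be pasted into the shell after git reset HEAD~1, once they have been checked against the directories that actually exist.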
Update commit date
Splitting commits with rebase creates new commits with new commit dates. To restore the original dates, use --env-filter on those commits.
git filter-branch -f --env-filter \
'if [ "$GIT_COMMIT" = "<TARGET_COMMIT_HASH>" ]; then export GIT_AUTHOR_DATE="<TARGET_AUTHOR_DATE>"; export GIT_COMMITTER_DATE="<TARGET_COMMITTER_DATE>"; fi'
Recipe
Assume the commits in the repository are as follows:
1 Version 1.0
2 Version 1.1 (adds tests)
3 Version 1.2
4 Version 1.3
5 Version 1.4 (removes tests)
6 Version 1.5, 1.6, 2.0
7 Version 2.1
Assume we have already converted the Subversion repository into a Git one via git svn. For simplicity, the commands below refer to commits by the numbers above, as if they were stable hashes. In reality, the hashes change every time filter-branch or rebase rewrites history, so look up the actual hash again whenever a command refers to one.
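One way to look up a hash again after each rewrite is to search the log by commit subject. A minimal sketch, assuming a hypothetical helper hash_for_subject that reads lines of the form "hash subject" on stdin, such as the output of git log --format='%H %s':

```shell
# Print the hash of the first commit whose subject equals $1.
# Expects "<hash> <subject>" lines on stdin, for example from:
#   git log --format='%H %s'
hash_for_subject() {
    wanted=$1
    while read -r commit_hash subject; do
        if [ "$subject" = "$wanted" ]; then
            printf '%s\n' "$commit_hash"
            return 0
        fi
    done
    return 1   # no commit with that subject
}
```

In a real repository it might be used as git log --format='%H %s' | hash_for_subject 'Version 1.1 (adds tests)'. It assumes subjects are unique, which holds for the example history here but not necessarily elsewhere.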
First we move files in 1.0 to the root directory:
git filter-branch -f --tree-filter 'mv -f 1.0/* ./' HEAD
Move files in 1.1:
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.1`; do mv -f $f ${f##1.1/}; done' \
HEAD
The above cannot move files from 1.1/tests to tests, because the directory tests does not exist at the root yet. Move it manually:
git filter-branch -f --tree-filter \
'mv 1.1/tests tests' 1..HEAD
The range 1..HEAD covers every commit after 1 (that is, from 2) up to HEAD.
The range is required because 1.1/tests does not exist before commit 2, so mv would fail there with an error.
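As an alternative to moving such directories by hand, the move loop could create missing parent directories itself. The following is a minimal sketch under that assumption, not the method used in this recipe. The helper name move_version_to_root is hypothetical, and it reads the paths to move from stdin (in a tree filter they would come from git ls-files).

```shell
# Move every path listed on stdin out of the "$1/" prefix, creating
# missing parent directories first.
move_version_to_root() {
    version=$1
    while IFS= read -r path; do
        target=${path#"$version"/}             # 1.1/tests/a.c -> tests/a.c
        [ "$target" != "$path" ] || continue   # skip paths outside "$version/"
        case $target in
            */*) mkdir -p "${target%/*}" ;;    # create tests/ if it is missing
        esac
        mv -f "$path" "$target"
    done
}
```

To use it with --tree-filter, the function definition would have to be part of the filter string itself, since the filter is evaluated by a separate shell process. Taking the version as a parameter also avoids editing the version number in two places for every command.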
Continue with 1.2, 1.3, and 1.4:
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.2`; do mv -f $f ${f##1.2/}; done' \
HEAD
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.3`; do mv -f $f ${f##1.3/}; done' \
HEAD
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.4`; do mv -f $f ${f##1.4/}; done' \
HEAD
Remove tests since 1.4:
git filter-branch -f --tree-filter \
'rm -rf tests' 4..HEAD
Commit 6 contains three versions, so we split it into three:
git rebase -i 5
# in vim
edit 6
pick 7
# save and exit vim
# stopped at 6
git reset HEAD~1
git add 1.5
git commit -s -m 'Version 1.5'
git add 1.6
git commit -s -m 'Version 1.6'
git add 2.0
git commit -s -m 'Version 2.0'
git rebase --continue
The resulting commit history:
1 Version 1.0
2 Version 1.1 (adds tests)
3 Version 1.2
4 Version 1.3
5 Version 1.4 (removes tests)
6 Version 1.5
7 Version 1.6
8 Version 2.0
9 Version 2.1
Move files in the remaining commits:
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.5`; do mv -f $f ${f##1.5/}; done' \
HEAD
git filter-branch -f --tree-filter \
'for f in `git ls-files 1.6`; do mv -f $f ${f##1.6/}; done' \
HEAD
git filter-branch -f --tree-filter \
'for f in `git ls-files 2.0`; do mv -f $f ${f##2.0/}; done' \
HEAD
git filter-branch -f --tree-filter \
'for f in `git ls-files 2.1`; do mv -f $f ${f##2.1/}; done' \
HEAD
The split replaced the original author date and commit date of commit 6. Restore them with --env-filter:
git filter-branch -f --env-filter \
'if [ "$GIT_COMMIT" = "6" ]; then export GIT_AUTHOR_DATE="<OLD_AUTHOR_DATE_OF_6>"; export GIT_COMMITTER_DATE="<OLD_COMMITTER_DATE_OF_6>"; fi'
Of course, we could also assign artificial dates to the newly split commits 7 and 8, but that would not mean much. So we leave them untouched to record when the commits were split.
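Because filter-branch rewrites history, it can be reassuring to check the filter text before running it for real. A minimal sketch, assuming a made-up hash and date (the date uses Git's internal format, seconds since the epoch plus a timezone offset), that evaluates the same filter in a subshell with GIT_COMMIT set by hand. The helper name dry_run_env_filter is hypothetical.

```shell
# Hypothetical placeholders; substitute the real hash and the date
# recorded before the rebase.
target_commit='0123456789abcdef0123456789abcdef01234567'
original_date='1433160000 +0000'

# Build the filter text that would be passed to --env-filter.
env_filter=$(cat <<EOF
if [ "\$GIT_COMMIT" = "$target_commit" ]; then
    export GIT_AUTHOR_DATE="$original_date"
    export GIT_COMMITTER_DATE="$original_date"
fi
EOF
)

# Dry run: evaluate the filter in a subshell for one commit hash and
# print the author date it would leave behind.
dry_run_env_filter() {
    (
        GIT_COMMIT=$1
        GIT_AUTHOR_DATE=unchanged
        eval "$env_filter"
        printf '%s\n' "$GIT_AUTHOR_DATE"
    )
}

dry_run_env_filter "$target_commit"
dry_run_env_filter 'some-other-commit'

# Real use (rewrites history):
#   git filter-branch -f --env-filter "$env_filter" HEAD
```

The dry run only approximates what filter-branch does, but it catches quoting mistakes in the filter string before any commit is touched.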
Furthermore, you can update the author and committer if you are not the one who made the original Subversion commits. If needed, set the following environment variables:
GIT_AUTHOR_NAME
GIT_AUTHOR_EMAIL
GIT_AUTHOR_DATE
GIT_COMMITTER_NAME
GIT_COMMITTER_EMAIL
GIT_COMMITTER_DATE
Finally, we have a Git repository in the Git style.
Conclusion
We often need to convert legacy Subversion repositories into Git ones. Normally the conversion is easy with a git svn clone command. Sometimes that conversion alone is not enough, because the Subversion users did not use Subversion in the Subversion way. Thanks to git filter-branch, we can still reshape such a repository into the Git style.
References
- git-filter-branch: https://git-scm.com/docs/git-filter-branch
- git-commit-tree, for environment variables available in git filter-branch --env-filter: https://git-scm.com/docs/git-commit-tree
- How can one change the timestamp of an old commit in Git?
- Git merge directories that have become separated
