Enterprise Subversion — what to expect in release 1.10 — this Fall

Jacek Materna
Jul 24, 2017


While Next Generation Subversion is well under way, the community development team has been busy working on the next point release, “1.10”, aimed at a Fall release. While the GA release is still many weeks away, the software has entered a critical late-alpha stage, which means it’s ready for external testing with a larger user base! Please get involved and help the team test via the project’s mailing lists.

Two major areas have been the focus, both of which give strong benefits to enterprise software teams:

Large file performance

Merging User Experience

Large file performance — Subversion has long been a go-to version control technology for versioning large binary files. As I wrote in a previous article, teams working with digital content, such as game studios, typically see a dizzying number of large files in their projects. A key objective for managing large files must always be performance, so any improvement here has immediate impact for users.

So what happens when you commit “big” files to the server? Four things, really:

  1. Your client calculates and sends deltas over the network.
  2. The network transports the data over the protocol of choice. Different transport protocols have varying levels of performance: HTTP and SSH differ widely, since SSH by default compresses data before transport, reducing the amount of data that needs to be sent over the wire. HTTP does not do this by default (though see the recent 1.10 work around RA-serf below), so in general it is inherently slower for large files than svn+ssh.
  3. The server receives the delta and reconstructs the full file contents.
  4. The server constructs a second delta to store in the repository.
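The client side of the steps above can be sketched with a throwaway local repository (paths here are illustrative; the file:// layer stands in for HTTP or svn+ssh):

```shell
# Create a scratch repository and a working copy of it.
svnadmin create /tmp/bigrepo
svn checkout file:///tmp/bigrepo /tmp/wc

# Add a large binary file and commit it. Before anything crosses the
# "network", the client computes a delta (against the pristine base
# for modified files, or against the empty file for new ones).
head -c 1000000 /dev/urandom > /tmp/wc/asset.bin
svn add /tmp/wc/asset.bin
svn commit -m "Add large binary asset" /tmp/wc
```

From here, steps 2–4 happen behind the scenes: the repository access layer ships the delta, and the server reconstructs and re-deltifies it for storage.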

That’s a lot of work on both sides. A couple of well-known tips let users “squeeze” more performance out of commits today, but these settings strip key features from Subversion in doing so. That said, it is worth going over these performance boosters for completeness’ sake. Since the SVN developers have been going back and forth on this topic recently (with the 1.10 release upcoming), it prompted me to add some color to the conversation from an enterprise software team perspective.

svn — The stock SVN terminal client. Used by 75% of teams. Commit performance is great across a wide range of use cases.

svnmucc — You can make commits a bit faster by using the svnmucc client without a working copy. The svnmucc client always sends a delta against an empty file, which is faster to calculate than the standard client’s delta against the previous file contents. While svnmucc allows atomic commits across a multi-file transaction and updates to single files, you lose the local working copy, which does a great job of sending only the delta of your local changes to the server. For large repos or commits, not having that delta would be disastrous from a performance perspective, because you would constantly over-send data to the server.
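As a sketch (repository path and file names are illustrative), a working-copy-less svnmucc commit looks like this:

```shell
# Create a scratch repository (illustrative path), then commit a file
# to it directly with svnmucc: no checkout, no working copy. The
# client sends a delta against the empty file, which is cheap to
# compute but forfeits the working copy's change tracking.
svnadmin create /tmp/muccrepo
echo "new contents" > /tmp/atlas.txt
svnmucc -m "Update asset without a working copy" \
  put /tmp/atlas.txt file:///tmp/muccrepo/atlas.txt
```

Against a remote server you would use an http:// or svn+ssh:// URL in place of the file:// one.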

curl — You can make commits even faster by enabling SVNAutoversioning on the server (it is in the Apache config) and using curl as your client, since then the client doesn’t calculate a delta at all. Note that a curl commit still involves calculating a delta on the server. curl is only good for tactical updates to single files, as it cannot atomically guarantee a commit across multiple assets in one shot! Hence it’s fast but super limited in its use case.
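As a sketch (server name, repository paths, and credentials below are all hypothetical), the server-side setting and the corresponding curl commit look like this:

```shell
# In the Apache location block for the repository (httpd.conf),
# autoversioning turns a plain WebDAV PUT into a commit:
#
#   <Location /repo>
#     DAV svn
#     SVNPath /var/svn/repo
#     SVNAutoversioning on
#   </Location>
#
# With that in place, a bare HTTP PUT commits a single file; no delta
# is computed on the client, and no multi-file atomicity is possible.
curl -u user:password -T bigfile.bin \
  http://svn.example.com/repo/assets/bigfile.bin
```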

I ran some tests on my MacBook Pro and saw the following results:

raw write to SSD disk: 3.5 sec (Baseline)
standard SVN commit: 39 sec
svnmucc commit: 28 sec
curl commit: 10.7 sec

What does this say exactly? Why doesn’t everyone just switch to curl for a 300% speed increase? For one thing, these are synthetic benchmarks against single large files. Second, curl and svnmucc remove the working copy concept from the workflow: nothing is done on the client to guarantee proper atomic durability of commits or to reduce the amount of data being sent to and from the server.

The point of a version control system is to support team-based software development workflows, and removing the concept of a working copy is tantamount to “doing all operations remotely”: there is no local file on your computer, and all operations happen on the server. This model does not make sense in a collaborative environment, because enterprise repos are big, commits usually touch more than one file, and commits in certain cases can reach GB+ in size.

What is interesting is how fast curl is! Why? Because it streams data and avoids a lot of overhead, at the expense of functionality and overall support for various use cases. The developers have looked at streaming in the past, but it did not pay off for the main workflow cases. Here are some thoughts from one of the community’s developers, Julian Foad:

“When committing multiple files, deltification or network bandwidth or server CPU tends to dominate and the temp file effect becomes insignificant.

One of the Subversion developers, Evgeny Kotkov, did indeed write a streamy implementation to try out this idea: http://svn.apache.org/repos/asf/subversion/branches/ra_serf-stream-commit/BRANCH-README

Eliminating the temp file turned out not to make any practical difference in his tests at that time, and it made the code significantly more complicated, so was dropped. However, the idea is still open to being revisited as we know it makes a difference in particular cases.

What about the deltification overhead? Deltification against a similar version of a file obviously can achieve a huge data size reduction. Deltification even against an empty file achieves compression on many file types, although not as much as a dedicated compression algorithm. On ‘incompressible’ file types it doesn’t have any useful effect and is pure overhead.

Deltification is very fast, at least it is much faster than the currently used Zlib compression, and so in general it is more useful to have it enabled because in cases where it helps, it helps a lot, while in cases where it is pure overhead, the overhead is modest. Synthetic benchmarks are really good at showing up this overhead!

However the best thing would be if deltification overhead could be eliminated when it is not useful — incompressible files on a fast network. In other words, there is room for better tuning of the behaviour depending on the environment (CPU and network speeds). Ideally this should be designed to happen automatically. Short of that ideal, manual interventions such as these are occasionally useful in specific scenarios.” Julian Foad
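Foad’s point about incompressible files is easy to demonstrate. In this sketch, gzip (DEFLATE, the same family as Subversion’s zlib) stands in for the server-side compression step; file names are arbitrary:

```shell
# Highly repetitive data compresses dramatically; random ("already
# compressed") data does not, and compressing it is pure overhead:
# the output even ends up slightly larger than the input.
yes "the quick brown fox jumps over the lazy dog" | head -c 1000000 > /tmp/text.dat
head -c 1000000 /dev/urandom > /tmp/random.dat
gzip -kf /tmp/text.dat /tmp/random.dat
wc -c /tmp/text.dat.gz /tmp/random.dat.gz
```

On a typical run the repetitive file shrinks by orders of magnitude while the random file grows by a few dozen bytes, which is exactly the case where compression and deltification effort buys nothing.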

But… what if there was a way to improve performance while retaining this secure and reliable workflow? That’s exactly what one developer did with his patch to introduce LZ4 compression to the backend. Zlib (DEFLATE) is an old algorithm; LZ4 trades a little compression ratio for dramatically faster compression and decompression on modern CPUs.
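For illustration, the repository-side knob lands in fsfs.conf; the section and option names below reflect my reading of the 1.10 work and should be verified against the comments in your own repository’s db/fsfs.conf:

```ini
# db/fsfs.conf of a repository created with (or upgraded to) the
# 1.10 format. Names hedged: check the file's own comments.
[deltification]
# "lz4" favors speed, "zlib" is the older default, and "none"
# disables compression entirely.
compression = lz4
```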

With LZ4, users can expect up to 300% increases in commit performance when 1.10 rolls out! That’s a big deal for enterprise software development teams.

I [and the community development team] am very happy to see that 1.10 will bring faster performance for large binary files. This is another great addition for the Subversion user base in the enterprise — building complex projects that span source code and large binary assets alike.

Faster HTTP — Next on the list for performance is a little upgrade to the popular HTTP(S) transport. The svn client will be able to compress data over HTTP(S) endpoints when possible, when interacting with a recent mod_dav_svn module version. A developer submitted a patch to include this negotiation capability — a win.
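On the client side, compression over HTTP(S) is governed by a long-standing option in the per-user runtime config; 1.10’s change is in what the two ends can negotiate, not in this knob:

```ini
# ~/.subversion/servers (per-user runtime configuration)
[global]
# Allow the client to compress and decompress data on http(s)
# connections when the server supports it.
http-compression = yes
```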

Overall this is a great addition for the Subversion user base, because using it over a WAN (i.e., the cloud) is a growing use case for the enterprise, and HTTP(S) now gets closer to svn+ssh performance while retaining the infrastructure simplicity inherent in HTTP(S) (notably on the security side).

Will your favorite Subversion desktop app take advantage? Yes, the desktop apps that use the Subversion native API will immediately be able to negotiate “faster” HTTP performance with the server side with no changes.

Merging User Experience — Merging in Subversion has been a long-standing pain. Ask any enterprise software team what is most painful about Subversion and you’ll get the answer “merging” nine times out of ten. Why is it so painful? For one, complex, modern merging was never a first-class citizen in Subversion. Further, it has not been significantly touched since version 1.5 (other than the reintegrate changes in 1.8). At last, Subversion users can rejoice: 1.10 takes a big chunk of this problem away with a revised conflict resolution module that better automates the process of merging. While a single step, it is an important one — expect more improvements to merging in the future.

I’ve pulled out an excerpt from the official Subversion release notes:

The 1.10 release provides much better interactive resolution of tree conflicts than previous releases. Interactive conflict resolution has been completely redesigned and rewritten. The new conflict resolver searches repository history for structural changes (additions, deletions, copies, and moves) which conflict with local changes in the working copy and cause tree conflicts. Tree conflicts are now described in detail, including revision numbers and names of authors of conflicting changes. In previous versions of Subversion, the task of finding such information was left to the user. Automating this task is a huge usability improvement.

The new conflict resolver is able to detect moves and renames in repository history, and take them into account while modifying the local working copy. This makes seamless merging between branches possible even if one or both branches rename files or directories.

The new conflict resolver will avoid asking users about resolution options whenever there is only one reasonable course of action. For example, if a file was moved to another location in the repository, the conflict resolver will attempt to resolve the tree conflict on behalf of the user by performing the same move locally if necessary. This allows users to focus their attention on conflicts which cannot be resolved automatically.

This feature takes a very large step towards moving the pain of merging away from the user. Only in dire straits should Subversion make the user take action — what is a computer’s value if it cannot automate things for you?!
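As an illustrative sketch only (the prompt wording below is invented for illustration and varies between builds), resolving a tree conflict caused by an upstream rename might look like this:

```shell
# Illustrative session, not verbatim 1.10 output: merge a branch in
# which a file we changed locally was renamed upstream.
svn merge ^/branches/feature
svn resolve
# Tree conflict on 'lib/parser.c':
#   the file was moved to 'lib/parse/parser.c' in the repository.
# The resolver can offer to apply the same move in the working copy,
# carrying our local edits along, instead of leaving it to us.
```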

“It is encouraging to see members of the community devoted to improving the user experience. Stefan Sperling has brought a user-focused approach to conflict resolution, teaching Subversion to dig deep in the repository history to present a really informative view of what changes led to a conflict, and providing the easiest and most likely options to resolve it.” Julian Foad

Stay tuned for more updates by following me here or on Twitter to keep up to date on 1.10 and 1.11 developments.

Happy SVN-ing!
