Use Git for your PDF and Word .doc files!

Afik Cohen
Aug 5, 2014

Sure, you’d love to do this. Who wouldn’t? Word and PDF are awful file formats, but they’re still widespread, and if you keep your important documents in them and you’re like me, you want them version controlled. Git can handle any binary file, of course, but where’s the fun in seeing this?

$ git diff 
diff --git a/chapter1.doc b/chapter1.doc
index 88839c4..4afcb7c 100644
Binary files a/chapter1.doc and b/chapter1.doc differ

Enter .gitattributes.
http://git-scm.com/book/en/Customizing-Git-Git-Attributes#Binary-Files

Step-by-step setup

Stick the following line in a .gitattributes file in your repo:

*.doc diff=word

I’ll let the Git book explain:

This tells Git that any file that matches this pattern (.doc) should use the “word” filter when you try to view a diff that contains changes. What is the “word” filter? You have to set it up. Here you’ll configure Git to use the catdoc program, which was written specifically for extracting text from a binary MS Word documents … to convert Word documents into readable text files, which it will then diff properly:

$ git config diff.word.textconv textract

I deleted the stuff about catdoc because that project’s dead, which is why the command above points textconv at textract rather than catdoc. Let’s use the lovely textract library I recently discovered instead.

$ pip install textract

For it to work with .doc files, we’ll need antiword installed too:

$ brew install antiword
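
Before diffing a real document, it can help to confirm that Git actually picked up the attribute and the driver. Here is a minimal sketch that does this in a throwaway repo, so it touches nothing of yours. The temp directory and the file name chapter1.doc are only placeholders, and the PATH check at the end assumes the tools are installed under the names textract and antiword.

```shell
# Sanity check in a throwaway repo; nothing here touches your real repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The same two pieces of setup as above.
echo '*.doc diff=word' > .gitattributes
git config diff.word.textconv textract

# Ask Git which diff driver it would pick for a .doc path.
# The file does not need to exist for check-attr to answer.
attr=$(git check-attr diff -- chapter1.doc)
conv=$(git config --get diff.word.textconv)
echo "$attr"            # chapter1.doc: diff: word
echo "textconv: $conv"  # textconv: textract

# Report whether the converter and its .doc helper are on PATH.
for tool in textract antiword; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

One thing worth knowing: .gitattributes can be committed and shared, but the git config line lives in your local .git/config, so each clone needs it run again (or add --global to set it once per machine).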

Then let’s run a git diff. Voila! Instead of “Binary files differ”, you get a regular diff of the text extracted from the document.
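
The same trick should carry over to PDFs. This is a sketch under assumptions, not something I have shown above: the driver name pdf is my own pick, thesis.pdf is a placeholder, and it assumes textract can read PDFs on your machine (as far as I know it leans on pdftotext from poppler for that by default, so you may need to install poppler first). It runs in a throwaway repo again.

```shell
# Hypothetical PDF setup in a throwaway repo; the driver name "pdf" is arbitrary.
repo=$(mktemp -d)
cd "$repo"
git init -q

# One attribute line per file type, each naming its own diff driver.
printf '%s\n' '*.doc diff=word' '*.pdf diff=pdf' > .gitattributes

# Point both drivers at textract (assumption: it handles PDFs on this machine).
git config diff.word.textconv textract
git config diff.pdf.textconv textract

# Confirm that a .pdf path resolves to the new driver.
pdf_attr=$(git check-attr diff -- thesis.pdf)
echo "$pdf_attr"   # thesis.pdf: diff: pdf
```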

Originally published at www.aphex.cx on August 5, 2014.
