Python’ed my way out of copy-pastes

As a programmer, you must have found yourself at least once in the middle of a really boring data-handling task (like renaming a property in several XML files) which may not involve any programming at all. Such a task came my way a few weeks ago.
It was a copy-paste work. A biiiig pile of them!
And I automated the whole thing with Python!
Now you may be scratching your head over why I wrote a whole blog post on automating copy-paste. Yes, there are many tutorials and articles on the topic, and I’m sure you have read about this before. There’s even a whole website (automatetheboringstuff.com) on the subject, for goodness’ sake. But the reason is that I had the most enjoyable time doing it. The time spent was totally worth it, and I thought the experience was worth sharing with you all.
The assignment was an i18n task. We had a large collection of e-mail templates (confirmation mails, cancellation mails, supplier mails, quotation mails…), a mountain of them, all in XSLT. It doesn’t matter if you are not familiar with the term. What matters is that when the system needs to send an e-mail, the relevant details are sent as XML to the appropriate XSL template, which generates the HTML output that finally becomes the e-mail. I’m not going to stress you with that process. The problem I faced was that I had to call another XSL template I had written in each and every place that needed i18n (almost every label, and there were thousands of them!!!), replacing the code currently used to display labels with a few different lines.
The lines to be replaced were like:
All those lines should be replaced with something like this:
You can notice that the string “CustomerNameLabel” has to be passed to the new text. If I had done this by hand, it would have taken weeks to replace all of them. There were thousands of lines to be replaced across a number of files. No way was I doing this by hand. I’ve read how many developers handle these types of things the smart way. My brain was screaming “Let’s write a script to automate this!”.
Python was the first tool that came to my mind. I had used Python a few times before and had a glimpse of its power. Since I use a Linux machine at work, Python comes out of the box. So where to start?
First, I wanted to find all the strings in a file with the same format I showed above. Regex is the way to go, right? But I’m not a programmer who’s ready to write the regex string by hand. There must be an easy way to generate the pattern. So after googling a bit, I came across a site that did what I wanted: txt2re.com. It is a great tool and very easy to use. I entered my XSL tag and, after a few clicks, it gave me a whole Python script to try out.

I didn’t like the output much since I was expecting only the pattern.

To get just the pattern string, I pressed Ctrl + Alt + T, fired up the Python REPL, and a few copy-pastes from the site did the trick.

So now I had the pattern, but I still hadn’t started on the script, and I didn’t know how to process regex in Python. However, that site gave me all I wanted, and with a few minutes of googling I created a simple script that successfully replaced the strings matching the pattern.
You might be confused by all the brackets in the regex string. I was too. Some of them could be removed to make it more familiar looking. But why waste time if it works?
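To make the idea concrete, here is a minimal sketch of a regex-based replacement in Python. The markup on both sides is hypothetical: the `xsl:value-of` shape of the old label, the template name `translate` and the parameter name `key` are assumptions made for illustration, not the tags from the real project. The one pair of brackets that matters is the capture group, which pulls out the label name (such as “CustomerNameLabel”) so it can be passed to the new text.

```python
import re

# Hypothetical "before" markup: a label printed from a variable.
# The brackets around \w+ form a capture group holding the label name.
OLD_LABEL = re.compile(r'<xsl:value-of select="\$(\w+)"\s*/>')


def to_template_call(match):
    """Build the hypothetical "after" markup for one matched label."""
    label = match.group(1)  # e.g. "CustomerNameLabel"
    return (
        '<xsl:call-template name="translate">'
        f'<xsl:with-param name="key" select="\'{label}\'"/>'
        '</xsl:call-template>'
    )


def find_labels(text):
    """Return the label names found in text, in order of appearance."""
    return OLD_LABEL.findall(text)


def replace_labels(text):
    """Replace every old-style label in text with a template call."""
    return OLD_LABEL.sub(to_template_call, text)
```

Passing a function to `sub()` instead of a replacement string avoids having to escape quotes and backslashes inside the new markup.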
I commented the code thoroughly, so I don’t think there is any need to bore you with more explanations. But I never commented the code while I was actually writing it. In fact, the code was much uglier at that time. I added the comments and cleaned the code up a bit just because I was going to post it here. The important thing is that it did what I wanted. But don’t get the wrong idea that I’m encouraging you to write messy code.
Now the script was doing what I needed. But it wasn’t over. There were hundreds of files that needed to be processed, and I wasn’t ready to feed them in one by one. So what did I do? I could have made the script crawl through the projects, find the files and do the thing then and there. But I was afraid that it might mess up other files by accident, and that it would be hard to find and correct them. So I decided to pick the files myself, place them in a directory, and make the script apply the change to all files in that directory.
To make it work, I needed to know how to access files inside a directory in Python. Google again! In a few seconds, I came up with this piece of code:
Here, “In” is the name of the directory I was going to place the files in. I moved the processed files to a directory named “Out” and you’ll see that later.
BTW, you can see a function named listdir() used here. To use it, the relevant library (os) has to be imported. What that function does is simply give a list of the entries inside a given directory, without the special entries “.” and “..”, which was very convenient for my purpose. The list can also be iterated with a simple for loop.
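A minimal sketch of that directory loop could look like the following. The function name `list_input_files` is made up for illustration, and the check that skips sub-directories is an extra precaution that the text above does not mention.

```python
import os


def list_input_files(directory="In"):
    """Return the names of the regular files directly inside directory."""
    names = []
    # os.listdir() returns bare names and never includes "." or "..".
    for name in sorted(os.listdir(directory)):
        # Join with the directory to get a usable path for the check.
        if os.path.isfile(os.path.join(directory, name)):
            names.append(name)
    return names
```

Note that `os.listdir()` returns names rather than full paths, so each name has to be joined with the directory before the file can be opened.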
After each file was processed, it needed to be saved in a file with the same name as the input file. I got a hint on how to do this very simply, without messing with file names, in the previous Google search: use the same “file” object created in the loop to create the output file.
Yes, I’ve used the same file objects again (in case you have any doubt).
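As a rough illustration of the same-name trick, assuming the loop variable holds the file name that `os.listdir()` produced, the output path can be built from that very name. The helper name `save_processed` and the UTF-8 encoding are assumptions.

```python
import os


def save_processed(name, new_text, out_dir="Out"):
    """Write new_text into out_dir under the same file name as the input."""
    # Reusing the input name means no new file names have to be invented.
    out_path = os.path.join(out_dir, name)
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(new_text)
    return out_path
```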
Now the final remaining task was to put the text manipulation code I wrote before inside the file handling loop. My script was complete after that.
Let’s take a look at the complete code.
Of course, my original code wasn’t exactly like this. It was much messier, and I had added a few more replacements as well. However, the big picture is the same: it did what I wanted.
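For readers who want that big picture in one place, here is a minimal sketch of how the pieces could fit together. It is not the original script: the old and new markup, the template name `translate`, the parameter name `key`, the function names and the UTF-8 encoding are all assumptions. Only the “In” and “Out” directory names come from the story above.

```python
import os
import re

IN_DIR = "In"    # hand-picked templates go here
OUT_DIR = "Out"  # rewritten copies are saved here

# Hypothetical old label markup; the group captures the label name.
OLD_LABEL = re.compile(r'<xsl:value-of select="\$(\w+)"\s*/>')


def to_template_call(match):
    """Build the hypothetical replacement markup for one matched label."""
    label = match.group(1)
    return (
        '<xsl:call-template name="translate">'
        f'<xsl:with-param name="key" select="\'{label}\'"/>'
        '</xsl:call-template>'
    )


def fix_directory(in_dir=IN_DIR, out_dir=OUT_DIR):
    """Rewrite every file in in_dir and save it under the same name in out_dir."""
    processed = []
    for name in sorted(os.listdir(in_dir)):
        in_path = os.path.join(in_dir, name)
        if not os.path.isfile(in_path):
            continue  # skip sub-directories
        with open(in_path, encoding="utf-8") as in_file:
            text = in_file.read()
        # Same file name, different directory.
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as out_file:
            out_file.write(OLD_LABEL.sub(to_template_call, text))
        processed.append(name)
    return processed


if __name__ == "__main__":
    # Both directories are expected to exist next to the script.
    if os.path.isdir(IN_DIR) and os.path.isdir(OUT_DIR):
        for fixed_name in fix_directory():
            print("fixed", fixed_name)
```

Because the input files are only read and the results are written elsewhere, a bad pattern cannot damage the originals, which fits the cautious approach of picking the files by hand.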
It was time for testing. First, I created two directories named “In” and “Out” in the same place as the script. Then I copied all the files I wanted to modify into the “In” directory, ran a simple python fixer.py and… Voila!
In an instant, all the files appeared in the output folder, changed the way I wanted!

It was new, exciting, different and satisfying, and above all, I saved myself from days of copy-pasting and delivered my work way ahead of schedule. Awesome… right?
