MuSOC’17 — Document Fingerprinting (Week-2)


After coding the first three algorithms, I thought the path ahead would be a little less challenging, or that even if it were challenging, I could sail through, because by now I had found my flow in this environment.


After finishing all three algorithms, I started looking for more ways to implement hashing and tried to figure out which ones would be more efficient. While doing that, I realised that the Knuth-Morris-Pratt string matching algorithm, which I had coded the previous week, is usually slower in practice than the Boyer-Moore-Horspool method. I had missed this because of my poor research. Since we were supposed to use only the most efficient string matching algorithms, I had to stop everything else and code Boyer-Moore-Horspool, which cost me another day. After that, I thought developing the front end for my web application would be the right next step, and I started working on it. It was taking a lot of time, though, and I was reminded of the deadline for the entire project. I decided that the front end and back end were not my top priorities: my priority was to finish the project work itself first and only then develop the front end and back end.
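For readers who have not met the Boyer-Moore-Horspool method mentioned above, here is a minimal sketch in Python. It is not the code I wrote for the project; the function name `horspool_search` is made up for this illustration, and the window comparison uses a plain slice instead of the usual right-to-left character loop to keep it short.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Shift table: for each character of the pattern except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        # Compare the current window with the pattern.
        if text[pos:pos + m] == pattern:
            return pos
        # Slide by the shift of the text character under the pattern's last
        # position; characters not in the table allow a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_search("document fingerprinting", "finger"))
```

The speed-up comes from the shift table: when the last character of the window does not occur in the pattern at all, the search jumps ahead by the whole pattern length instead of one position.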


I started with the All-to-All matching algorithm to compare two files for similarity. It looked very easy, as I had already coded Karp-Rabin and it was pretty similar to All-to-All. But I read the paper once, then twice, and still did not understand what was actually going on. I knew how to create hash values of a string, I knew how to divide my whole string into n-grams, and I had a fair idea of how to create fingerprints from those hash values. Still, I wasn’t able to code it for the first couple of days, because I wasn’t very fluent with data structures like lists in Python. I tried to avoid using them in my algorithm, but things only got more complex. Then I took help from a friend, who taught me a lot of data structures and their implementations in Python. I sat down and coded again, and this time it took surprisingly little time, because I was determined and kept calm. Before long, I had the basic code ready to create fingerprints of strings and compare them for similarity.
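The steps described above (split into n-grams, hash each one, collect the hashes as a fingerprint, compare) can be sketched roughly as follows. This is only an illustration under my own assumptions, not the exact algorithm from the paper: the names `kgrams`, `kgram_hash`, `fingerprint` and `similarity`, the n-gram length of 5, the hash constants, and the choice of shared hashes divided by total distinct hashes as the score are all picked for the example.

```python
def kgrams(text, k):
    """Return every substring of length k, in order."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def kgram_hash(gram, base=257, mod=1_000_000_007):
    """Polynomial hash of one n-gram (base and mod are arbitrary choices)."""
    h = 0
    for ch in gram:
        h = (h * base + ord(ch)) % mod
    return h


def fingerprint(text, k=5):
    """Lowercase the text, drop whitespace, and hash every n-gram."""
    normalized = "".join(text.lower().split())
    return {kgram_hash(g) for g in kgrams(normalized, k)}


def similarity(text_a, text_b, k=5):
    """Share of hashes the two fingerprints have in common (0.0 to 1.0)."""
    fa, fb = fingerprint(text_a, k), fingerprint(text_b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


print(similarity("the quick brown fox jumps", "the quick brown dog jumps"))
```

Storing the hashes in a set means every n-gram of one text is effectively checked against every n-gram of the other without writing a nested loop, which is where knowing Python's built-in data structures pays off.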


Next, I had to make this work on documents instead of strings. Tutorials Point came to my rescue here and gave me the file handling syntax, which worked quite well, and I was able to do what the algorithm asked for. During this period, my mentor was very helpful and patient, and explained the same things to me again and again.
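Moving from strings to documents mostly means reading each file into a string first. A minimal sketch, assuming plain UTF-8 text files; the helper names `read_document`, `kgram_set` and `compare_files` are hypothetical, and the comparison here works on the raw n-grams rather than on hashes to keep the example short.

```python
def read_document(path):
    """Read a whole text file and return its contents as one string."""
    # The with-statement closes the file even if reading fails.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def kgram_set(text, k=5):
    """Lowercase, drop whitespace, and collect the distinct n-grams."""
    cleaned = "".join(text.lower().split())
    return {cleaned[i:i + k] for i in range(len(cleaned) - k + 1)}


def compare_files(path_a, path_b, k=5):
    """Return a 0.0 to 1.0 overlap score for two text files."""
    grams_a = kgram_set(read_document(path_a), k)
    grams_b = kgram_set(read_document(path_b), k)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

It would be called as `compare_files("first.txt", "second.txt")`, where both file names are placeholders.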

I have learnt so much over the last few days, and not only about code. I have understood the importance of keeping calm in really tough situations, of becoming almost hopeless and then finding hope again every day. You really need to be determined enough to do what you need to do. If things were that easy, it wouldn’t be a challenge. I never knew that failing to get the code working on the very first day could teach me so many things. While I was avoiding data structures, I read a lot of material, and I think I will use that knowledge in the future. If you stay focused and try to make the right decisions, things do turn out better than you were expecting.

  • Credits: Kshitiz Agrawal (B.Tech, Chemical Engineering, IIT Delhi), the friend who helped me with data structures