Hours of data piling up!

Drifting along, calm and composed! *wink*

Howdy everyone, well, it's exactly midway through Google Summer of Code 2016, and everything has been going as per the schedule and plan, as I type this looking at the matte screen of the Asus Zenbook that just arrived. No more criticizing the electricity and rain, which I have been doing in my previous posts ( **giggle** ), but the internet connectivity still haunts me.

The week started off with a day spent setting up the new Zenbook with dual boot, installing dependencies on Ubuntu (sudo apt-get install blah-blah ), setting up git and the repo, and, on the other hand, hoping that Windows would finish updating… … …one day… …! Ultimately, I decided to turn everything automatic off ( **duh** ) so that I can squeeze some speed out of my broadband connection ( -___- ).

Anyways, with the dictionary now fully transcribed to its phonetic representation, I can concentrate on collecting the training voices from all the contributors. Almost 12 of the speakers have completed their quota of sentences and around 8 speakers are remaining. Once this is completed, I can actually begin reorganizing the database and then start the training using that database.

In the meantime, there are other files to set up. Like the file containing the ‘phones’ alone ( ml.PHONE ), the file that contains the relative paths to the audio files in the wav directory ( ml.FILEIDS ), with entries like “wav/speaker1/file_1.wav”, and the filler file that contains the phonetic representation of sounds and disturbances for a more accurate recognition ( ml.FILLER ).
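Since the phone file just lists the phones on their own, one way to build it is to pull every distinct phone out of the phonetic dictionary. Here is a minimal Python sketch of that idea. It assumes each dictionary line is a word followed by its space-separated phones; the function name `collect_phones` and the sample entries are made up for illustration and are not taken from my actual dictionary.

```python
def collect_phones(dictionary_lines):
    """Return the sorted list of distinct phones used in a dictionary.

    Assumes each line looks like: <word> <phone> <phone> ...
    """
    phones = set()
    for line in dictionary_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines or words with no transcription
        phones.update(parts[1:])  # everything after the word is a phone
    return sorted(phones)


# Hypothetical sample entries, just to show the shape of the input.
sample = ["word1 a m m a", "word2 m a r a m"]
print("\n".join(collect_phones(sample)))
```

The printed lines, one phone per line, are the kind of content that would go into ml.PHONE.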

Talking about making the ml.FILEIDS file, mapping 4993 sentences from 15+ folders, each one having exactly 250 wav files, is not going to be easy. But here's the catch: Notepad++ is there to the rescue. Column edit mode ( Alt + Shift + up/down ) and the column replace with decimal increment option are available, which will save me the time of writing down each file name by hand.

Note: the column edit will only work as long as the character we want to replace is in the same column on every line. Now, since the file ID is of the form speaker/file_# , I can easily select the # column and replace it with the decimal increment option: 1,2,3,4…
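For anyone who would rather script it than column-edit it, the same list can be generated in a few lines of Python. This is only a sketch under assumptions: the folder names `speaker1`, `speaker2`, … and the helper `build_fileids` are hypothetical, and the optional `prefix` is there in case the entries need something like "wav/" in front.

```python
def build_fileids(speakers, files_per_speaker, prefix=""):
    """Build file IDs of the form <prefix><speaker>/file_<n>, n starting at 1."""
    lines = []
    for speaker in speakers:
        for n in range(1, files_per_speaker + 1):
            lines.append(f"{prefix}{speaker}/file_{n}")
    return lines


# Hypothetical folder names; the real speaker folders may be named differently.
speakers = [f"speaker{i}" for i in range(1, 4)]
ids = build_fileids(speakers, 250)
print(ids[0], ids[-1], len(ids))
```

Joining `ids` with newlines gives the text for ml.FILEIDS, and unlike the column edit it does not care whether the numbers line up in the same column.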

So, that’s how the week has panned out, and I'm hoping to continue this good run of form ( * That’s the football side of me typing. Euro 2016 commentary style * ).

puts "until then ciao!"