HSBC HACKATHON 2019

Aadi Swadipto Mondal
9 min read · Apr 11, 2019


THE HACKATHON WIN

The Registration: All this started when Aniruddha suddenly registered us for the HSBC Hackathon on Wednesday. None of us even knew that our names had been entered. The next day Kanishka and Parth came to know that they were registered for the Hackathon; I was still out of the loop till then.

The Kick-start: I was still working on the Open Soft problem in my hall. At nearly 6, I got a call from Aniruddha saying that my name had been entered in the Hackathon under the team name data_X. I was quite interested in this new thing, but I was still working for the hall Open Soft team. At 9, I got a call that delicious food was being served by HSBC, and how could a "shastrian" ever think of leaving good food? So that was the kick-start.

The Problem Selection: Among the three problems, the third one was the most feasible. Let me describe them. In the first, some bank data was given and you were asked to give suggestions or AI-based predictions for a forthcoming season; as I had not attended the problem description session, I cannot say exactly what they wanted. In the second, you had to use publicly available data to predict the development of a business by understanding the business landscape, key factors, and other things. The third problem needed a user interface, artificial intelligence, and data. So first we started thinking about doing something with Indian election data. We tossed around quite a lot of ideas but ultimately didn't come up with a proper one. Next, we started thinking about disease-based data and discussed it among ourselves. Before that, we had also thought about the first two problems. The first was too specific to attack: everything, even the data, was given, and making something new out of it was quite tough. The second, on the other hand, was too open, and predicting how to develop a business from past data was not a 36-hour job, in my opinion. So we resorted to the third problem.

Coding starts: It was time to code. But what exactly to code? First things first: data. A quick Google search led us to Kaggle, and we downloaded a symptom-disease data set from there. With the data in hand, the next question was what to apply. We decided to make it voice interactive, and Aniruddha and Parth suggested supporting both Hindi and English. So we started testing some Python modules. We began with the SpeechRecognition module and got it recognising speech in both Hindi and English. Next we tried the googletrans module: when speech is given in Hindi, it translates it into English. We also tested the playsound module to play a sound file and the gTTS module to convert text to speech. In parallel, work started on the graph database neo4j, and other minor searches continued.
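
To give an idea of what we were testing, here is a minimal sketch of that voice loop. It assumes a working microphone (PyAudio installed), an internet connection for the Google services, and the module versions of that time; googletrans in particular has changed its API since, so treat this as an illustration rather than our exact code.

```python
# Hedged sketch of the Hindi/English voice pipeline we were experimenting with.
# Assumes: SpeechRecognition, googletrans, gTTS and playsound are installed,
# a microphone (via PyAudio) is available, and Google's services are reachable.
import speech_recognition as sr
from googletrans import Translator
from gtts import gTTS
from playsound import playsound

recognizer = sr.Recognizer()

# 1. Capture speech from the microphone.
with sr.Microphone() as source:
    print("Speak now...")
    audio = recognizer.listen(source)

# 2. Recognise it as Hindi text (use language="en-IN" for English input).
hindi_text = recognizer.recognize_google(audio, language="hi-IN")
print("Heard:", hindi_text)

# 3. Translate Hindi to English so the rest of the system works in English.
english_text = Translator().translate(hindi_text, src="hi", dest="en").text
print("Translated:", english_text)

# 4. Speak a reply back to the user.
reply = "I understood: " + english_text
gTTS(text=reply, lang="en").save("reply.mp3")
playsound("reply.mp3")
```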

The dead end: The speech API was set up, but what to do next? Nothing. The data was too huge, and Saturday morning had already come. We kept thinking about how to progress but found no way. At that point, we messaged one of the HSBC staff and met him in the afternoon. Saturday was also an election day, and the entire KGP campus was sealed; getting out of the halls and interacting with others was a little tough. You had to use your brains and your speaking skills for that. So let's move on…

Hopes again: Meeting the HSBC staff gave us some hope. He told us to build the system for just 5 diseases, with an intelligence layer to handle the sentence-to-symptom part. The discussion with him was very important and showed us the path for our next job. We came back to VS again and started working. The Kaggle data was somehow unfit for this, so we decided to make our own database: a disease-to-symptom data set and a sentence-to-symptom data set. Parth built the sentence-to-symptom data set, putting 10 sentences for each symptom; I think it was quite exhaustive. Aniruddha and Kanishka were working on the graph database (neo4j and py2neo) and I was working on the user interface (the tkinter module). The UI was nearly half ready and the graph database was created; now it had to interact with Python. It was past 2 that night. By this time I had also done some work on NLP using nltk: I was able to remove pronouns and stop words from the sentence. We also tried removing some verbs and tried some stemming algorithms, but they didn't suit our purpose.
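
For the curious, the nltk filtering I was experimenting with looked roughly like the sketch below. The POS tag set and the use of the standard English stop-word list are assumptions for illustration, not the exact filter we shipped.

```python
# Rough sketch of the nltk-based filter: drop stop words and pronouns,
# keep the words that are likely to carry symptom information.
# Assumes the punkt tokenizer, stopwords corpus and POS tagger are downloaded.
import nltk
from nltk.corpus import stopwords
from nltk import word_tokenize, pos_tag

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
PRONOUN_TAGS = {"PRP", "PRP$", "WP", "WP$"}  # personal and possessive pronouns

def filter_query(sentence):
    """Return the content words of a user query, lower-cased."""
    tokens = word_tokenize(sentence.lower())
    tagged = pos_tag(tokens)
    return [word for word, tag in tagged
            if word.isalpha()
            and word not in STOP_WORDS
            and tag not in PRONOUN_TAGS]

print(filter_query("I am having a high temperature and my head aches"))
# e.g. ['high', 'temperature', 'head', 'aches']
```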

The most demotivating time: Now it was time for the neural net. I had previously seen some LSTM architectures on the net, and Kanishka started implementing one. Nothing came out of it; accuracy was less than 1%. We lost hope. I took charge. Aniruddha was still working on the py2neo graph-database API. I started trying other hyper-parameters: I changed the loss function, the optimizers, the LSTM architecture, the dense-layer architecture, but nothing worked. All I could reach was about 4% accuracy, which was hopeless. Aniruddha left his own work and started trying some nltk code using Google databases, but one sentence took more than a minute to process, which is not feasible for a chatbot. Other jobs also had to be done, like querying the graph. He also tried an online piece of code in Ruby, but that didn't succeed either. So we were left with nothing. I kept trying but got no results. Aniruddha gave up hope and was about to leave; Kanishka was asleep by this time. I took one last chance. I deleted all the LSTM things and made a simple MLP (Multi-Layer Perceptron) architecture. I was getting some errors and asked Aniruddha to debug the code; I just wanted him to be there a little longer. The bug got fixed and voila!! 99.75% accuracy on the training set. I still remember the exact numbers of the Keras output. The thing I made was very simple. I had a dictionary of words collected from the training set; if the query contains a word from the dictionary, the corresponding index in a vector is set to 1, otherwise it stays zero. For example, I had 242 words, so I had a 1D vector of 242 zeros. Say "temperature" is present in both the query and the dictionary: the position of "temperature" in the vector is set to 1. This is done for all other words after a basic nltk filter, creating a binary vector that marks which dictionary words appear in the query. Believe me, this was too simple to expect much from, but it became the brain of our UI. The intelligence part!! We often neglect the smallest things, but they can come out much better than highly complicated things: 4% against 95%.
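
To make that concrete, here is a small sketch of the idea: a binary bag-of-words encoding feeding a simple Keras MLP. The vocabulary, layer sizes and toy training pairs below are illustrative assumptions, not our actual hackathon data or architecture.

```python
# Sketch of the binary bag-of-words encoding plus a small MLP classifier.
# Assumes TensorFlow/Keras is installed; the vocabulary, layer sizes and
# training data below are toy stand-ins for the real sentence-to-symptom set.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Vocabulary collected from the training sentences (242 words in our case).
vocabulary = ["temperature", "fever", "headache", "cough", "pain", "rash"]
word_index = {w: i for i, w in enumerate(vocabulary)}
symptoms = ["fever", "headache", "cough"]          # output classes

def encode(sentence):
    """Binary vector: 1 where a vocabulary word occurs in the sentence."""
    vec = np.zeros(len(vocabulary), dtype="float32")
    for word in sentence.lower().split():
        if word in word_index:
            vec[word_index[word]] = 1.0
    return vec

# Toy training data: (sentence, symptom index) pairs.
train_sentences = [
    ("I have a high temperature and fever", 0),
    ("my headache will not go away", 1),
    ("a dry cough with chest pain", 2),
]
X = np.stack([encode(s) for s, _ in train_sentences])
y = np.array([label for _, label in train_sentences])

# A simple MLP: one hidden layer, softmax over symptom classes.
model = Sequential([
    Dense(32, activation="relu", input_shape=(len(vocabulary),)),
    Dense(len(symptoms), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)

query = encode("I think I am having temperature and fever")
print(symptoms[int(np.argmax(model.predict(query[None, :])))])
```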

The final integration: I woke Kanishka from his deep sleep. He took some time to understand what was going on, but finally figured it out. Aniruddha now had the zeal to make his py2neo code work, and Kanishka started working on the presentation. But you know, the hardest work is integration. Each of your components will work independently, but when you bring them together, they fail or severe problems arise. The same happened with us: all the components were ready, but we needed to integrate them. We all went for it. It was nearly 10 by that time and our presentation slot was at 11:30. Parth left to request the authorities to extend our time a little, and it was pushed to 1:30, so we still had about two hours. Rigorous hard work finally produced a working system, and more importantly, one without errors.

The Presentation: Aniruddha started the presentation with a small introduction. Parth gave the reason we built this: there was a maid in his house who never went to the doctor because of the fees, so she took medicines without any proper consultation. With our app we could overcome such problems; at least people would know what disease they might have. Then Kanishka took over. This was a wrong move: Kanishka had made the graph database while I had made the nltk and neural-net parts, so the order was wrong. It was an eight-minute presentation, and we had 3 minutes left. Aniruddha and I took over; I gave a quick presentation of the neural net and nltk, and Aniruddha covered the graph database neo4j. We were then asked about some problems we might face with the current intelligence level. The main one was that it cannot correct spelling mistakes; for that we would need another neural network that matches a wrongly spelled word to the nearest dictionary word. Some other minor scopes for improvement were also discussed. In the end the presentation went quite well; the only thing we missed was that the graph database in the neo4j software was not shown.

The shocking award ceremony: We were all tired, as we had not slept properly for two nights. I went back to my hall to sleep, but I met one of my seniors. I was just having a chat with him when Aniruddha called to say the HSBC people were calling us, probably for some goodies or stuff. So I went out again. We reached Nalanda NR-121 and found a team giving a presentation in front of everyone. We thought this was the winning team, presenting to everyone just for show. I started comparing how they were better than us, as is my habit. Their presentation went on quite long, and we were all feeling very sleepy. Then we came to know that they were simply the last team to present, as their presentation time had been extended. The judges then went to another room to discuss the winners for about half an hour or more. This was the most restless time for us, as all we wanted was to go to our hostel rooms and sleep. After that the ceremony started. The head of CDC and another professor gave talks. Then they announced the prizes, but not the team names. They also said that the first team would get an opportunity to present their demo at one of their conferences. Then came the final moment when the winners were announced. Counting from the back, third was a PhD team and second was probably a team of fourth-years. Then came the name data_X. As I had not been part of the registration process, at first I wondered what they were saying. Then I remembered that yes, our team name was data_X, and no one except my teammates was getting up. Later I came to know that Kanishka first thought it was a prank. I kept asking Aniruddha whether it was really our team. I began to tremble, as I could not control my emotions anymore. We then went to the stage to receive the prizes. This was the first time in IIT Kharagpur that I received an award, and I could see the entire audience looking at me. Truly speaking, I was shocked by the result, and there were others in the room who were not happy, although I will not name them. We had a good photo session. The award included ₹30,000 in the form of Amazon gift cards of ₹2,000 each, along with T-shirts for all the team members.

What I have learned: These, I feel, are the most important things I am taking back, other than the awards and the prize money:

  • Never feel that you are incompetent: Many teams participated, but probably very few were 2nd-year B.Tech students, three to four at most. A lot of people who are quite well known in the coding community also participated. Even the third prize went to a PhD team and the 2nd prize to an M.Tech team or a group of final-year students. But we stood first. So even if you are not very famous or "machao", you may still stand out from the crowd.
  • Never give up while you still have an opportunity: When our LSTM networks kept failing, we almost thought of giving up. But my last try, with a simple MLP, made it work. If we had given up that night, I might not have got the opportunity to write this blog. Fight till the last moment, until you are compelled to stop.
[Photos: the award collection, the gift card stack, an Amazon gift card, and Team data_X]
