My journey with HumanAI in the Google Summer of Code’24 Program (Part 2)

Shashank Shekhar Singh
Sep 20, 2024


Introduction

Hi! I am Shashank Shekhar Singh, currently in my junior year of Mechanical Engineering at the Indian Institute of Technology (BHU), Varanasi, India. This is my second and final blog in my series of Google Summer of Code blogs for the HumanAI organization for the year 2024. If you haven’t read my first blog yet, here’s the link where you can find it.

I am writing this blog to describe my journey with the HumanAI organization in the GSoC’24 program. I will start by explaining how I began connecting with and contributing to the organization. Following the same steps can give you a head start over your fellow contributors.

How to join HumanAI as a contributor?

1. The HumanAI organisation publishes a list of problem statements submitted to GSoC. These problems concern the integration of machine learning into the arts and humanities.

2. Any potential GSoC contributor needs to select at least 1 and at most 3 problem statements and submit them on the GSoC website to participate as a contributor in the HumanAI organization.

3. The problem statements are miniature versions of the main project, so they demonstrate the amount and quality of work the contributor will need to do during the GSoC period.

4. The GSoC proposal should answer the following: Who are you? What is your previous experience in the related field? What is your game plan for the project duration? Which technologies would you like to work with?

5. Try to get your proposal reviewed by people you know, and submit it before the GSoC contributor registration deadline.

6. Once the results are declared on the official GSoC website, the mentors will send a welcome email and add you to the organisation's communication channels, which in our case was Mattermost.

The project goals and my accomplishments include:

1. Development of Hybrid End-to-End Models: The primary goal of this project was to design, implement, and fine-tune hybrid end-to-end models based on CRNN architectures for text recognition. By combining the strengths of recurrent neural networks (RNN) and convolutional neural networks (CNN), the model aims to capture both local and global features in historical Spanish printed text.

2. Achieving High Accuracy: The ultimate objective was to train machine learning models capable of extracting text from seventeenth-century Spanish printed sources with at least 80% accuracy. This entailed extensive experimentation, hyperparameter tuning, and dataset curation and augmentation to ensure the model generalizes well across the various styles, fonts, and degradation levels present in historical documents. The character-level accuracy of my model is 95.79%, which is an excellent result considering that the model was trained from scratch and uses a vanilla architecture like CRNN.
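For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute character-level accuracy: as 1 minus the character error rate (CER), where errors are counted with the Levenshtein edit distance. This is an assumption for illustration only; the exact formula behind the 95.79% figure is not spelled out in this post, and the function names `edit_distance` and `character_accuracy` are hypothetical.

```python
from typing import Sequence


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    # previous[j] holds the distance between reference[:i-1] and hypothesis[:j].
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_accuracy(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level character accuracy in percent, defined here as 100 * (1 - CER)."""
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return 100.0 * (1 - total_errors / total_chars)
```

For example, `character_accuracy(["señor", "casa"], ["senor", "casa"])` counts one wrong character out of nine in the ground truth.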

About my mentor interactions:

I would like to express my sincere gratitude to my mentors, Sergei Gleyzer and Emanuel Usai. Their support and guidance throughout the project have been incredible. With their deep knowledge of Machine Learning and AI, they helped me understand the root of problems and find the right solutions.

I also extend my heartfelt thanks to my mentors, Xabier Granja and Harrison Meadows, for their dedicated mentorship. Their consistent guidance, weekly progress meetings, and deep knowledge of the Spanish language were crucial in assisting with literature and curating historical Spanish datasets.

Finally, I am deeply thankful to my fellow contributors for helping me throughout the project, and to my mentors for selecting me for the GSoC program and for believing in my ability to contribute meaningfully to the project.

The Developed CRNN Model

A Convolutional Recurrent Neural Network (CRNN) combines two of the most prominent neural network architectures: a convolutional neural network (CNN) followed by a recurrent neural network (RNN).
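To make the "CNN followed by RNN" idea more concrete, here is a small sketch of the step that usually connects the two halves in a CRNN: the CNN's feature map is sliced column by column, so that each image column becomes one time step for the RNN. This is a simplified, pure-Python illustration of the general technique, not the code of my model; the name `map_to_sequence` and the nested-list feature map are assumptions made for readability.

```python
from typing import List

# A feature map indexed as [channel][row][column], as plain nested lists.
FeatureMap = List[List[List[float]]]


def map_to_sequence(feature_map: FeatureMap) -> List[List[float]]:
    """Turn a CNN feature map into one feature vector per image column."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    sequence = []
    for col in range(width):
        # Stack every channel and every row of this column into a single vector.
        vector = [
            feature_map[c][r][col]
            for c in range(channels)
            for r in range(height)
        ]
        sequence.append(vector)
    return sequence
```

A feature map with 2 channels, 2 rows, and 3 columns therefore becomes a sequence of 3 vectors of length 4, which the recurrent layers can read from left to right, much like a reader scanning a word.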

Work accomplished:

I developed and implemented an algorithm that prints each word exactly as it appears on the page. This is a significant improvement to the project’s core task of transcribing Spanish PDFs into written documents.

Model Performance

After several weeks of tuning various parameters, the training process has become significantly more efficient. It now takes only 10–15 epochs to reach a CTC loss of 0.1, compared with about 60 epochs previously.

CTC Loss vs Epochs Depiction
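As background on CTC: a model trained with CTC loss emits one label per time step, including a special "blank" label, and the final text is obtained by collapsing that output. Below is a minimal sketch of the standard greedy (best-path) decoding rule: take the best label at each step, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the choice of index 0 for the blank, and the toy alphabet are assumptions for illustration; this is not taken from my training code.

```python
from typing import Sequence

BLANK = 0  # Assumption: label 0 is the CTC blank; label i maps to alphabet[i - 1].


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: argmax per frame, merge repeats, remove blanks."""
    # Pick the highest-scoring label for every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    decoded = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when the label is not blank and differs from the previous frame.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)
```

The blank is what lets the model output doubled letters: the label path `l, blank, l` decodes to "ll", while `l, l` decodes to a single "l".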

I also developed a comprehensive pipeline for preprocessing the images passed to the CRAFT model, which has helped optimize the model’s performance.

Lastly, I fixed the CRAFT model's functionality and improved data extraction by tuning hyperparameters such as grouping height (which allows height differences between words on a single line to be ignored) and padding (essential for extracting word images from PDFs effectively).

These improvements collectively highlight the advancements made throughout the project.
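To give a feel for what the grouping height and padding hyperparameters control, here is a hedged sketch of the general idea. It assumes each detected word is an axis-aligned box `(left, top, right, bottom)` in pixels. The functions `group_into_lines` and `pad_box` are hypothetical helpers written for this post; they are not part of the CRAFT model's API and are not the exact code from my pipeline.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def vertical_center(box: Box) -> float:
    return (box[1] + box[3]) / 2


def group_into_lines(boxes: List[Box], grouping_height: int) -> List[List[Box]]:
    """Group word boxes into text lines, then order each line left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=vertical_center):
        if lines:
            last = lines[-1]
            line_center = sum(vertical_center(b) for b in last) / len(last)
            # Words whose centers differ by at most grouping_height share a line.
            if abs(vertical_center(box) - line_center) <= grouping_height:
                last.append(box)
                continue
        lines.append([box])
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def pad_box(box: Box, padding: int, image_width: int, image_height: int) -> Box:
    """Grow a box by `padding` pixels on every side, clamped to the image bounds."""
    left, top, right, bottom = box
    return (
        max(0, left - padding),
        max(0, top - padding),
        min(image_width, right + padding),
        min(image_height, bottom + padding),
    )
```

A larger grouping height tolerates more vertical jitter between words on the same line, at the risk of merging two closely spaced lines. Padding keeps ascenders, descenders, and accents from being clipped when the word image is cropped.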

Extracurriculars

As part of my extracurricular involvement in the HumanAI community, I contributed to the following:

  1. Development of a Problem Statement for the Deep Learn Hackathon:
    I played a key role in developing unique starter code for the Deep Learn hackathon problem, which centers on applying Optical Character Recognition (OCR) techniques to old Spanish documents similar to those provided in GSoC. This challenge emphasizes the complexities of processing historical manuscripts, which often feature faded text, inconsistent fonts, and language-specific nuances.
  2. Writing a Workshop Paper for the NeurIPS SoLaR Workshop:
    I worked on a workshop paper for the prestigious NeurIPS SoLaR (Socially Responsible Language Modelling Research) workshop, which focuses on the intersection of AI and social sciences. The paper investigates how AI-driven solutions can be applied to optimize OCR, addressing key challenges such as efficiency and resource allocation.
  3. Writing an Article for a Humanities and Arts Journal:
    In addition to the technical work, we also contributed an article to a humanities and arts journal, in which we explored the intersection of technology and culture. The piece reflects on how innovations like AI and OCR can be used to preserve and analyze cultural artifacts, shedding light on the broader implications of digital transformation in the arts and humanities.

Closing statements and further updates

The model architecture performs well on the test data, given that it was trained on one specific training dataset. My future plans include training the same model on many different types of data to improve its generalizability and help it recognize different fonts and writing styles rather than memorizing unwanted data patterns. I expect the model's accuracy could also increase by 2–3% when trained on more diverse data. Lastly, I will keep updating the model architecture whenever the existing configuration becomes obsolete, so that it stays useful.

GITHUB Link

LINKEDIN Link
