Experiences: Automating a Data Entry Task with Selenium

DailyPriyab
DAM DATA-AI-ML Learning
7 min readNov 7, 2018

Prelude

This blog is not about the technical details as to how achieved the automation of the data entry task using selenium. But about the key learnings that I got after doing this automation in a very tight schedule. The automation was not a complete success and of the Data Entry task only 25% of the values were completed using my automated script.

What?

Day before yesterday while I was getting ready to enjoy my Diwali holiday and making shopping and sightseeing plans, my friend came to my bay and put out a problem which at the outset looked to be a great problem to automate help him out.

His problem was that in a web page where there is no bulk upload, data had to be manually fed for 5 weeks for 170 items. Which resulted in almost data entry for 850 items. And this had to be done in few hours.

When?

Now the constraints were that,

  • We had to add multiple sections for each item in a single page and verify it as a request for manager approval.
  • We had to complete the whole task before people could leave for Diwali.
  • But there were some approvals we were waiting on. Hence, we cannot start the data entry task right away but late in the evening.
  • I was also not very clear on the exact requirement at hand, I had not even done a feasibility analysis before promising to automate the task

Why?

Now his initial plan was to get 6–7 people and get the work done, but the reason I was so much interested to help automate the task was because

  • From the outset it sounded like a task if not done would have a huge business impact for my friend and our department
  • I was one of the people designated to complete the task and being a programmer, I am always excited to find new problems that can be automated saving time and drudgery
  • I dreaded the idea of manually editing at least 100+ items in a short span of time

How?

At this point I had very vague idea about the work at hand. Anyhow I had requested the web link and the required data from my friend which I got by afternoon. In parallel I started studying the data and the process to follow for doing the automation. I also checked the tags and the key input fields which I needed to automate. While studying the whole thing I was also ready with a quick test script to check the whole process. When I got the final green signal, I started processing the data in full swing, but by this time it was almost late evening and quickly I realized some big problems with my testing script.

Problems:

  • During testing I found that in the data entry process has some time waits and page transitions and the whole workflow that I had developed was not optimized and was very slow. For just one data entry the whole process took 6–7 minutes which was not at all optimal for the volume we had and the time constraint at hand.
  • The initial assumptions that I had about the data and the process were only for simple cases, soon I came across complex cases and exceptions for which I had not built the code to handle. Because of which my code died midway without even option for manual intervention. This was the single most big issue which wasted too much of my time. I almost had to redo the data entry for all the people multiple times and had to fix the code based on the exceptions I encountered because of which I was awake late in the night and we were way past the deadline mark.
  • I did not have a plan B, I just chose the selenium web driver as the option for data entry I did not examine the rest API calls, and possible alternatives which could have same me time.
  • As I was the only developer in the group and the script was only with me, I had not thought a way to complete the task in parallel and it was not possible form my friends to run the script on their system as they did not have the setup.

I was dead tired from over thinking and did not even have the courage to admit that my script was not working. Now till late night I was still in false impression that I can complete the whole task and while my friend was probing me if we needed help , me in my enthusiasm of fixing the code and automating the whole process ignored the fact that we had a huge task at hand and even if I fixed the code the minimum time taken to automate the data entry and execute will far more time than if the work was done manually.

Since it was late night some of the volunteers for the manual data entry had left, hence it was left to me, my friend and his wife and another person to complete the task. My friend tasked me to complete a small portion of the task (approximately 20–25%) of the data entry while he took the rest. Added to it my automated data entry was impacted as there was a system maintenance between 3–6 pm. I was only able to do the data entry by morning. But the sleep and coupled with fresh ideas, I finished the rest of my task with the help of the task using Spyder IDE with IPython to run the automation script and ran the data entry tasks in mini batches. Which resulted in faster successful completions.

Hence the whole process was semi-automated, supervised and running in Spyder IDE and IPython did the trick where the browser did not die off and even if any place my automated script failed I was able to make the necessary manual edit and submit the data.

Takeaways

In retrospect I should have done a proper assessment of the requirement vs the time limit and should have gone with the original plan of manual update at the first place. More than the technical part I would have avoided the confusion that my friend had to go through and the false hopes that the automation would solve all his task. While I learnt lot of things from time management, requirement, execution and technical perspective but specifically for automation of data entry tasks I would list the following takeaways as I conclude my Blog.

Is the data entry a one time or a repetitive activity?

I did not follow this golden rule and ended up writing a script which in its original form will not be used again. It is better off doing such task manually than using a script as the time take to develop the script will be equivalent or even more than the original requirement. Even in cases of repetitive tasks it is better idea to know the frequency. Only if the task has some significant human overhead and has significant cost and time saving, then only automation should be chosen.

How critical time constraint is the task?

Let’s face it while automation has improved not all automation are blazing fast. In some cases, depending on the time criticality specifically selenium or web automation can be slow. And depending on the complexity of the workflow there is high risk that the automation will be slower than doing manually.

What kind of automation you need?

Now there are two types of automation that once can do, supervised or unsupervised. While developed the script, it was mostly for an unsupervised automation to convert a task to bulk upload so that the whole process can be fast involving only a single verification process before submitting. There were multiple cases where even during the automation of the process for an exception or bad data my automation was failing or needed supervision. Hence it is better to check and decide what kind of automation best fits your requirement and get it right at the initial stages itself instead of fixing later. For me fixing it late.

How much to automate?

Try automating simpler cases and use mini batches where ever possible. When I started building the script it was meant to automate the whole task and click the submit button after the automated verification was done. But there were many circumstances where manual intervention was required and even before I can do that either the script died, or the web action was unresponsive. This not only waster all my previous efforts but also resulted in added work. Later I figured by automating only a simpler part of the process and doing manual verification and submission I improved the overall productivity and accuracy more than the previous process. While this cannot be true for all data entry operations but for me it was true.

What is the cost of the automation?

Now the amount time taken to design, build and test the script and finally running it was far more expensive than if the work would have done manually. This was the biggest learning for me. While we many feel excited to do some good work, it is of no point if a high priority task is not delivered (botched in my case) and time spent on it is of no real value (the pits far outnumber the benefits). A Cost benefit analysis is always required for each task we do and based on that taking the right approach will help lot of people and save lot of money and time.

Finally

The only positive I can take from this experience was that I learnt lot of things technically and execution wise which I had forgotten since last few years. I had excelled working in tight timelines and in very stressful environments. But last few years have seen me mellowed down and opportunities like these come few and far between. While I will explore more such action and experiences, I would surely take my lessons from this experience of automating the data entry task.

--

--

DailyPriyab
DAM DATA-AI-ML Learning

Data Engineering | Data Governance | Azure | Spark | Python | Manager