Too much data is bad

Agastya Zayant

Disclaimer: This is not your typical data science post.

Barry Schwartz’s book “The Paradox of Choice: Why More Is Less” argues that too many choices can lead to bad decisions. The same holds in data science. We need to stop using too much training data, stop writing so many articles on the same topics, and focus on practice rather than on publishing yet more theory in mundane articles.

Too much training data:

More training data is not automatically better: a model that fits its training data too closely loses the capacity to make good predictions outside that context (on test data). When a model is not performing well, the usual reflex is to collect more data, but that does not always help. We need to understand our model’s hyperparameters and tune them, and to use the classic techniques of data cleaning and basic exploratory data analysis. Don’t always collect more data; instead, try to build a good model with the data in hand.
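As a rough illustration of tuning before collecting, here is a minimal sketch in plain Python. Everything in it is an assumption made up for this example: the toy sine-wave dataset, the k-nearest-neighbours regressor, and the candidate values of k. The training set stays fixed, and the only thing that changes is one hyperparameter, k, scored on a held-out validation set.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Draw n noisy samples of y = sin(x) on [0, 2*pi] (a made-up toy problem)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, noise)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, holdout, k):
    """Mean squared error of k-NN predictions on a held-out set."""
    total = sum((knn_predict(train, x, k) - y) ** 2 for x, y in holdout)
    return total / len(holdout)


def tune_k(train, validation, candidates):
    """Score each candidate k on the validation set and return the best one."""
    errors = {k: mean_squared_error(train, validation, k) for k in candidates}
    best_k = min(errors, key=errors.get)
    return best_k, errors


rng = random.Random(0)
train = make_data(200, rng)       # the data "in hand": fixed, never enlarged
validation = make_data(100, rng)  # held out only for choosing k

best_k, errors = tune_k(train, validation, candidates=[1, 3, 5, 9, 15, 25])
for k, err in errors.items():
    print(f"k={k:2d}  validation MSE={err:.3f}")
print("best k:", best_k)
```

On noisy data like this, k=1 tends to memorise the noise, so a larger k will often score better on the validation set without a single new data point being collected. The exact numbers depend on the random seed and on the assumed noise level.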

Too many articles:

The idea of a neural network was first introduced in the 1940s and simulated in the 1950s. I first came across the term “neural networks” in 2012, on the office bookshelf of my IIT professor, whose doctoral research was in this field. So it has been more than 70 years, and we are still writing articles about ‘how neural networks work from scratch’ day after day. There is nothing new, just the same material repeated. Most people simply copy and paste from others. Are we losing our focus? Aren’t data science and machine learning about practicing with data? Or is data science about publishing the same tutorial articles that are already available elsewhere on Medium? If everyone is going to be a novice, who is going to be a researcher and advance the field?

Don’t get me wrong: I use Medium to learn about other people’s interview and preparation processes, but the glut of mundane articles is too much. One author will say, ‘These are the only steps to land a data science job’; another will say, ‘What you have learned till now is wrong, follow these steps’; and yet another comes along and declares, ‘The only 4 things you will ever need to succeed as a data scientist’.

Let me tell you one thing: there is no single correct way, so stop looking for one. Always make sure your fundamentals are strong. Create a checklist of the 3 most important areas to be good at in order to land your dream data scientist job. Did you find something interesting while studying? Good. First search whether an article has already been written on the topic, and write only if you think you can contribute something significantly new. I don’t think Medium should be a place for one’s daily rough work.

Conclusion

Too much data can break a model; similarly, too many conflicting and mundane articles make it harder to make useful decisions. Build your dataset mindfully, and write your articles in the same mindful and useful manner.

Thank you.
