Towards Data Industry Best Practice

Huy
Published in TheDataist, Feb 10, 2019

A couple of days ago, I was talking to Adhityaa Chandrasekar, the founder of Commento.io. He is pursuing a Master's degree in Distributed Systems, works on multiple open-source projects, and has interned at Goldman Sachs; he's one of the brightest young minds you could find on this planet.

When I suggested adding some sort of data processing to Commento as a value-added feature, he told me that he is aware of the benefits that many people, himself included, can gain from studying data, but that he "just disliked" all kinds of user data processing. So he decided not to go for it at all. It struck me as odd, because I'm a data science advocate who will go on for hours, whenever I have the chance, about the importance of data to our brain's evolution. But then I stopped and thought to myself:

Why would someone NOT go for it even when they already know the benefits?

The answer is obvious.

A screenshot (taken) from the trailer for ‘Taken 3.’ (20th Century Fox)

See, we didn't just lose those users' faith in the goodness of technology. Because of data breaches, we might also have kept some of the best minds from ever entering the field of data science.

And what happens happens...

Another screenshot from MarketWatch

Companies that rely on harnessing the power of big data did not suffer only economically. That is just the surface. Behind the scenes, a breach saps their current employees' motivation to create new value and can block the entry of new talent. The aftermath may last for a decade or more.

What do we do about it?

Well, every newspaper says that "Data Scientist is the sexiest job of the 21st century". But is it sexy because of the money, or because of the meaning and value the profession could create for our society?

If you agree it's more the latter, then we need an ethical approach to pave the way for successful data science practice.

Fortunately, we're not the first to think about this problem. Someone has even written a framework for it. Here's an excerpt from the Data Ethics Framework, released in June 2018 by the UK Department for Digital, Culture, Media & Sport.

Data Ethics Framework Principles

Your project, policy, service or procured software should be assessed against the 7 data ethics principles.

1. Start with clear user need and public benefit

Using data in more innovative ways has the potential to transform how public services are delivered. We must always be clear about what we are trying to achieve for users — both citizens and public servants.

2. Be aware of relevant legislation and codes of practice

You must have an understanding of the relevant laws and codes of practice that relate to the use of data. When in doubt, you must consult relevant experts.

3. Use data that is proportionate to the user need

The use of data must be proportionate to the user need. You must use the minimum data necessary to achieve the desired outcome.

4. Understand the limitations of the data

Data used to inform policy and service design in government must be well understood. It is essential to consider the limitations of data when assessing if it is appropriate to use it for a user need.

5. Use robust practices and work within your skillset

Insights from new technology are only as good as the data and practices used to create them. You must work within your skillset recognising where you do not have the skills or experience to use a particular approach or tool to a high standard.

6. Make your work transparent and be accountable

You should be transparent about the tools, data and algorithms you used to conduct your work, working in the open where possible. This allows other researchers to scrutinise your findings and citizens to understand the new types of work we are doing.

7. Embed data use responsibly

It is essential that there is a plan to make sure insights from data are used responsibly. This means that both development and implementation teams understand how findings and data models should be used and monitored with a robust evaluation plan.

Visit this link to grab yourself a full version of the document.

Sure, that's still very conceptual. But it's a good start. Personally, I think the 3rd principle could solve many of our existing problems:

The use of data must be proportionate to the user need. You must use the minimum data necessary to achieve the desired outcome.
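To make that principle a little more concrete, here is a minimal sketch of data minimisation in Python. It is only an illustration, not something taken from the framework: the purpose (counting comments per page), the field names, and the `minimize` helper are all hypothetical.

```python
from typing import Any, Dict, Iterable

# Hypothetical purpose: counting comments per page.
# Assumption: only these two fields are needed for that purpose.
REQUIRED_FIELDS = ("page_url", "created_at")


def minimize(record: Dict[str, Any],
             required: Iterable[str] = REQUIRED_FIELDS) -> Dict[str, Any]:
    """Return a copy of the record with only the fields the purpose needs."""
    return {key: record[key] for key in required if key in record}


# A made-up raw record; every value here is illustrative.
raw_comment = {
    "page_url": "/blog/hello-world",
    "created_at": "2019-02-10T09:00:00Z",
    "email": "reader@example.com",
    "ip_address": "203.0.113.7",
    "body": "Nice post!",
}

# Only the minimized record would be stored or passed on for analysis.
minimal_comment = minimize(raw_comment)
print(minimal_comment)
```

The point of the sketch is the order of operations: decide what the user need is first, list the fields that need requires, and drop everything else before the data is stored or analysed.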

Nothing can put an end to the human greed for power as long as we live. But we are creatures capable of thought and self-awareness.

We can decide and build the future we want.

If you agree with this, please clap for this post and share it. Help us build this publication to inspire thinking about Data Science Best Practice.

Thank you!
