Data science and AI: 10 more terms you need to know
A couple of weeks ago we introduced 10 terms everyone should be familiar with in the areas of data science and artificial intelligence. As a follow-up, we thought we should go a little deeper and examine a few more:
- Data scientists and data engineers
- Python
- Open data
- Data ethics
- Data regulation
- GDPR
- AI ethics
- Algorithm
- Turing test
- AI black box
Data scientists and data engineers
These two professions (yet to be formally recognised or supported by a standards body) have emerged from the data revolution and are arguably now among the most sought-after job roles. Companies are expected to need and hire more people in these categories; the growth rate over the last five years alone has been 256%. Both data scientists and data engineers use data analytics and programming skills in their work. The difference lies in their focus. Data engineers design, build and maintain data infrastructure. Data scientists are both customers of this infrastructure and curators of the data sources feeding it, and when working at scale they depend on data engineers.
Python
Fast becoming the lingua franca of data science and widely used by data engineers, Python is an intuitive, open source programming language that is relatively easy to learn, even for absolute beginners. It is also general purpose, so it adapts to both engineering and analysis requirements. As a result, there is a rapidly growing wealth of online resources for self-learning, along with large online communities that can help at every step. Given its ease of use, versatility and extensibility, it is no wonder that data scientists, and people working with emerging technologies such as artificial intelligence, are increasingly embracing Python. If you don’t have Python skills in your team, you might want to ask yourself why not, and explore where openly available Python resources could help your business become less dependent on proprietary tools, methods and models.
Open data
Another expression that has quickly reached buzzword status in the past 10 years, open data refers to data that is freely available to access, view and share. Many private companies, organisations and governments are now promoting open data to benefit the health, safety, wellbeing and convenience of people. Data on the weather, the environment, transportation, home ownership, healthcare, government spending and many more categories is increasingly being reused or embedded in services intended to improve citizens’ lives or to fuel new business models.
Open data of course does not mean that everyone’s private information is shared freely.
Data ethics
Data ethics is an emerging area of applied ethics focusing on the societal implications of collecting, generating, analysing and using data, most often customer or sensitive data.
Data ethics must build on existing data regulations. Where appropriate regulations are not yet in place, it has to take into account the unwritten rules of ethical business practice and make value judgements accordingly. A few guiding principles:
- Private customer data and identity should remain private
- Shared private information should be treated confidentially
- Customers should have a transparent view of their data
- Big data should never interfere with human will or institutionalise unfair biases
Data regulation
With data science and artificial intelligence feeding off large volumes of data, the collection of data from individuals and businesses has become a necessity.
Slowly but surely, data regulations have been created across many countries and economic areas, but in many cases these regulations are playing catch-up with technological advances, unethical data collection and data breaches. Surprisingly, the United States has no single, centralised federal law on data regulation and protection. Instead, privacy and data protection are covered by a patchwork that includes the United States Privacy Act, the Safe Harbor framework and the Health Insurance Portability and Accountability Act, among other regulations. Because of this lack of centralised legislation, US-based companies such as Facebook have been able to slip through the cracks when it comes to safeguarding large volumes of personal user data.
GDPR
The General Data Protection Regulation (GDPR) is a regulation in European Union law that was adopted in 2016 and gave everyone operating in EU member states until 2018 to comply. Its purpose? To protect all EU citizens against the unlawful collection and handling of their personal data within the EU, and to safeguard their data both inside and outside the European Economic Area. GDPR details guiding principles, rights and obligations for companies as well as individuals. All companies, regardless of their size, have had to comply with GDPR since 25th May 2018. Seven key guiding principles are set out in the GDPR for businesses to take into account in their approach to “controlling” or “processing” personal data:
- lawfulness, fairness and transparency
- purpose limitation
- data minimisation
- accuracy
- storage limitation
- integrity and confidentiality (security)
- accountability
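As a rough illustration of one of these principles, data minimisation, the Python sketch below keeps only the fields a given purpose actually needs before a record is stored or passed on. The field names, the sample record and the `minimise` helper are all hypothetical, made up for this example; it is not a compliance tool.

```python
# Hypothetical set of fields that this particular purpose actually needs.
REQUIRED_FIELDS = {"order_id", "postcode"}


def minimise(record, required=REQUIRED_FIELDS):
    """Return a copy of the record containing only the required fields."""
    return {key: value for key, value in record.items() if key in required}


# Made-up customer record, for illustration only.
customer = {
    "order_id": 1042,
    "name": "A. Example",
    "postcode": "SO17 1BJ",
    "date_of_birth": "1990-01-01",
}

# Only the order id and postcode survive; name and date of birth are dropped.
print(minimise(customer))
```

In practice, deciding which fields are genuinely required is a policy question that the code cannot answer for you.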
AI ethics
Another area of applied ethics, this one has artificial intelligence as its focus. With the explosion of machine learning and algorithm-led AI solutions, AI ethics can most easily be split into robot ethics, which examines the ethics of the people developing, designing and treating artificially intelligent machines and robots, and machine ethics, which is concerned with the machine’s own moral conduct.
AI ethics is a growing area of focus and concern for industry, and on national and international government agendas. As yet, there is no official international consensus to inform and guide us on which ethical considerations should be embedded into the solutions being built. A great challenge is not to carry our human ethical mistakes and biases into the algorithms we develop. Despite the hyperbole about the rise of “killer robots”, most of the negative publicity AI solutions receive is still caused by human bias and errors in design, not by the machines themselves!
Algorithm
In plain English, we think an algorithm is best described as a logical sequence of predefined steps to accomplish a task, goal or job. In its original sense the term can be applied to any everyday task, such as following a cooking recipe or driving directions. In the field of artificial intelligence the term is used to describe computer programmes that instruct machines to carry out automated reasoning and decision making.
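To make the idea of "a sequence of predefined steps" concrete, here is a minimal Python sketch of a very simple algorithm: finding the largest number in a list. The function name `largest` and the sample numbers are ours, chosen purely for illustration.

```python
def largest(numbers):
    """Return the largest value in a non-empty list by following fixed steps."""
    # Step 1: start with the first number as the best seen so far.
    best = numbers[0]
    # Step 2: look at every remaining number in turn.
    for number in numbers[1:]:
        # Step 3: keep whichever of the two is larger.
        if number > best:
            best = number
    # Step 4: report the result.
    return best


print(largest([3, 41, 7, 19]))  # prints 41
```

Like a recipe, the same steps give the same result every time, whoever (or whatever) follows them.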
Turing test
The Turing test, named after the pioneering computer scientist and mathematician Alan Turing, is a test to determine whether interaction with a computer can be so sophisticated that the computer is mistaken for a person. In 2014, 60 years after Turing’s death, a chatbot named Eugene Goostman was widely reported to have passed the test by convincing a third of the judges that it was a real teenager, although that claim has been disputed. In an era where chatbots are becoming more and more prevalent and voice assistants such as Alexa are widespread, whether the Turing test is still relevant is an increasingly frequent subject of debate.
AI black box
This one has nothing to do with the flight data recorders on aircraft. In the context of artificial intelligence, a black box refers to an algorithm that produces an output or solution from an input without its internal workings being known or clear. With deep learning solutions already applied in health care, judicial systems and many other areas where lives are at stake, the implications of not knowing how algorithms reach conclusions are very serious. As we explained in our previous blog piece, deep learning is a crude imitation of how the human brain works. Even when an AI solution is doing its job perfectly well, it may reach answers that are difficult to explain, in exactly the same way humans sometimes do. In cases of mistakes, errors and even tragedies, the legal and ethical implications are far-reaching.
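The contrast can be sketched in a few lines of Python. Below, `transparent_decision` follows a rule anyone can read, while `opaque_decision` pushes the same inputs through a tiny neural-network-style calculation. Both functions, the loan-style scenario and every number in the weights are made up for illustration; real deep learning models have millions of such weights, which is what makes them so hard to inspect.

```python
import math


def transparent_decision(income, debt):
    """Readable rule: approve if income is more than twice the debt."""
    return income > 2 * debt


# Hypothetical weights, invented for this example. Nothing about the
# numbers themselves tells a reader *why* a case is approved or rejected.
HIDDEN_WEIGHTS = [[0.8, -1.9], [-0.4, 1.1]]
OUTPUT_WEIGHTS = [1.5, -2.2]


def opaque_decision(income, debt):
    """Tiny network-style calculation: same inputs, unexplained output."""
    hidden = [math.tanh(w[0] * income + w[1] * debt) for w in HIDDEN_WEIGHTS]
    score = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden))
    return score > 0


# Both functions agree on these two made-up cases,
# but only the first one can explain itself.
print(transparent_decision(5, 1), opaque_decision(5, 1))
print(transparent_decision(1, 3), opaque_decision(1, 3))
```

When the two agree, the opaque version looks fine; the difficulty arises when it gives an unexpected answer and there is no readable rule to point to.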
If you want to learn more about the tools, techniques, methods and opportunities for data science and AI to impact your life and your organisation, why not go a little deeper and book one of the online courses we offer in collaboration with the Southampton Data Science Academy? Reference “infoNation10datascienceandAIterms” to receive a 20% discount off the course booking fee (a £300 saving). Get in touch to find out more.
Bogi Szalacsi is a Senior Associate with infoNation, based in London. You can contact her at firstname.lastname@example.org and follow her on Twitter: @infoNation5.