Artificial Intelligence can now predict demographic characteristics knowing only your name
We decided to start our blog with an article about the technology behind Demografy. Although the article contains some technical language, it should give our clients a good overview of the current state of the technology and of Demografy’s place in it, with its pros and cons. We hope this gives potential clients more confidence and more technical context for their markets and business goals. For everyone else, we hope it makes for interesting reading.
Demografy is a market segmentation platform that uses AI to turn customer and prospect names into demographic data. Its non-invasive, machine-learning-based technology derives the demographic characteristics of an audience from names alone. It can be used to gain demographic insights or to append missing demographic data to lists. For additional privacy protection, the names you provide can even be partially masked, the way credit card numbers are. For example, John Jameson becomes John J*son.
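To make the masking idea concrete, here is a minimal sketch of how a list owner might mask surnames before sharing them. The exact masking rule is our assumption for illustration (keep the first letter and the last three letters of the surname, replace the rest with a single asterisk); it is chosen only because it reproduces the J*son example, and the function name `mask_surname` is hypothetical.

```python
def mask_surname(full_name: str, keep_tail: int = 3) -> str:
    """Mask the middle of the last word of a name.

    Assumed rule (illustrative only): keep the first letter and the last
    `keep_tail` letters of the surname, replace everything between with '*'.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return full_name  # no separate surname to mask
    surname = parts[-1]
    # Surnames too short to hide anything are left unchanged in this sketch.
    if len(surname) <= keep_tail + 1:
        return full_name
    masked = surname[0] + "*" + surname[-keep_tail:]
    return " ".join(parts[:-1] + [masked])


print(mask_surname("John Jameson"))  # John J*son
```

The masking happens on the data owner’s side, so the full surname never has to leave their systems.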
Because Demografy doesn’t require sensitive information such as addresses or emails, it is privacy-safe, and because names are all it needs, it can cover an entire audience.
How does it stack up against existing approaches? What is unique about Demografy is that it takes very sparse, non-sensitive information as input, whereas existing technologies rely on either large amounts of data or sensitive personal information to detect demographics.
We developed Demografy with the interests and concerns of all parties in mind: both businesses and the people whose demographics they analyze. This gives Demografy two seemingly conflicting features:
- The ability to extract useful demographic characteristics about almost anyone, provided that the person’s name is known
- Keeping the person’s identity anonymous, which prevents potential privacy violations and unsolicited use of personal data by third parties
Let’s compare existing technologies to understand how the problem of predicting demographics is approached today.
Existing technologies and problems
In the era of data-driven marketing and decision making, the problem of predicting a person’s demographic characteristics, such as gender or age, is not new. There are a number of ways to detect demographic data, with varying accuracy. The list below is not exhaustive but covers the most widely used and available technologies:
Consumer databases. This is the simplest, most straightforward and also most widespread approach. Plenty of data append services try to match a provided list of people against their own records and append additional information to each matched record. To match records accurately, they require sensitive information such as addresses, emails, phone numbers and other personally identifying information. Although this method is meant to match records directly, its accuracy is questionable because of outdated data on both ends, among other reasons; it also violates privacy and has very low coverage. The current trend in the regulation of third-party personal data (e.g. the EU GDPR) calls this approach into question as well.
Cookie-based user tracking. The most popular examples are demographic targeting in Google AdWords and the Demographics reports in Google Analytics. This approach places third-party cookies on site visitors’ devices to track them across the Google Display Network and infer their demographic characteristics from their Internet usage patterns. It uses machine learning to predict demographic indicators like age and gender based on the sites a user has visited. The method needs a large amount of data about each person to be accurate, so it is available almost exclusively to tech giants like Google or Facebook, which have access to large, diversified networks of websites and large-scale infrastructure. It is the backbone of today’s targeting in online advertising. However, it relies on controversial third-party cookies, whose browser support is declining because of the rise of ad blockers, privacy concerns and browsers like Safari that block third-party cookies by default. The technology is also tied to particular advertising platforms or site analytics services, meaning its results cannot be exported or used across different marketing channels.
Machine learning solutions that use videos or photos of faces. This is one of the most sophisticated technologies. It trains machine learning models on a large set of photos tagged with known demographic data in order to predict the demographic characteristics of a new photo. The latest algorithms reach fairly good accuracy of around 80–90%. Commercial applications include video analysis solutions in the security and retail sectors. On the downside, it places high demands on the input data: it needs videos or photos of each person, which in practice significantly narrows the possible applications. For predicting demographic data, such solutions are used mostly offline, for example in proof-of-concept systems that detect the demographic characteristics of a person in front of a camera in order to show them more relevant ads. This is also a popular theme in sci-fi, where people walk past CCTV cameras and see augmented-reality ads based on their detected profile.
Demografy’s approach of using only names as input stands out from these technologies, with the following advantages:
- High coverage due to low input requirements. Demografy requires only names, so it can unlock much more data than consumer databases or ML that uses photos of faces. For data owners, this means they can mine and get insights from a much larger part of their data, or even all of it.
- Privacy by design. Privacy is an important aspect of any data-driven marketing. Personal information is never disclosed because Demografy requires only names. It can even process masked surnames like J*son, so the identity of each person remains protected. As a result, businesses and organizations can mine additional data about people while keeping their personal information in-house.
- Compliance with data regulation and support for third-party data. Growing data regulation imposes severe restrictions on the use of third-party personal data; the EU GDPR is one example. It practically forbids any use of personally identifiable information unless that use is “opt-in”. Demografy doesn’t require personally identifiable information at all. Names without additional data such as date of birth, email or address are not personally identifiable information in most jurisdictions. And where they are, Demografy can work with partially masked names, much like masked credit card numbers. This is especially vital in a world where businesses can be forbidden to share personal data with third parties such as data append services. In that case they can still benefit from their data by sharing anonymous, non-personal information. Even third-party agencies can benefit from the data they hold without sharing personal information.
Like any solution, Demografy is not a silver bullet and has its own scope of application. Below is an overview of the pros and cons of each technology:
1 — Although we are currently working on introducing support for additional non-basic demographic characteristics
2 — Acxiom, an industry leader in data append, according to CNNMoney
3 — Online ads targeting average, according to Nielsen Digital Ad Ratings
4 — Google AdWords targeting, as claimed by Google
5 — Best algorithms
6 — Individual record match accuracy (depends on the selected demographic characteristic), i.e. how accurately demographic indicators are detected for each record. Based on our own benchmarks on hundreds of thousands of records of self-reported data.
7 — Aggregated statistics accuracy, i.e. how accurate the overall distribution of demographic groups is. Based on our own benchmarks on hundreds of thousands of records of self-reported data.
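The difference between notes 6 and 7 is easier to see with a toy calculation. The sketch below is not our benchmark code: the labels are made up, and the aggregate metric (one minus the total variation distance between the predicted and actual group shares) is just one reasonable choice we assume for illustration.

```python
from collections import Counter


def record_accuracy(predicted, actual):
    """Share of records whose predicted label equals the self-reported label."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def distribution(labels):
    """Map each label to its share of the list."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def aggregate_accuracy(predicted, actual):
    """1 minus the total variation distance between the two distributions.

    The choice of metric is an assumption made for this illustration.
    """
    pred_dist, true_dist = distribution(predicted), distribution(actual)
    labels = set(pred_dist) | set(true_dist)
    distance = 0.5 * sum(
        abs(pred_dist.get(label, 0.0) - true_dist.get(label, 0.0))
        for label in labels
    )
    return 1.0 - distance


# Made-up labels for illustration, not Demografy benchmark data.
actual = ["F", "F", "M", "M", "F", "M", "F", "M"]
predicted = ["F", "M", "F", "M", "F", "M", "M", "F"]

print(record_accuracy(predicted, actual))     # 0.5
print(aggregate_accuracy(predicted, actual))  # 1.0
```

In this toy list only half of the individual records are right, yet the overall split between groups is reproduced exactly because the errors cancel out. This is why aggregated statistics can be more accurate than individual record matches.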
Why do we need to predict demographic characteristics?
Predicting people’s demographic data has many business applications. The era of traditional marketing and decision making is gone. Today, marketing and decision making in organizations have become increasingly data-driven, meaning that key decisions and activities are based on and backed by collected data and its interpretation. All major corporations and many smaller businesses now deploy data-driven analytics, marketing and other solutions in their processes.
Demographic characteristics such as gender, age, ethnicity or income level are among the most crucial pieces of big data in the modern world. Knowing this data empowers businesses and organizations in many ways: it helps you know your customers better, it helps you understand your market, and it enables more in-depth research and analysis for educational or scientific purposes. Some use cases include:
- Demographic targeting in advertising platforms like Google AdWords or Facebook Ads
- Segmenting customer and prospect lists to deliver the right offer to the right person. For example, if you’re a clothing brand, you may want to offer dresses to women and ties to men
- Demographic analytics of your customers or prospects to understand your target audience and back your decisions with data
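As a rough illustration of the segmentation use case, the sketch below splits an appended list into offer groups. Everything in it is hypothetical: the record layout, the field names and especially the `confidence` score are assumptions made for the example and do not describe Demografy’s actual output format.

```python
from collections import defaultdict

# Hypothetical appended records; field names and values are illustrative only.
customers = [
    {"name": "Anna K*ska", "gender": "female", "confidence": 0.97},
    {"name": "John J*son", "gender": "male", "confidence": 0.97},
    {"name": "Alex M*ano", "gender": "male", "confidence": 0.55},
]

# The clothing-brand example from the list above.
OFFERS = {"female": "dresses", "male": "ties"}


def segment(records, min_confidence=0.8):
    """Group names by offer; low-confidence predictions get a generic offer."""
    segments = defaultdict(list)
    for record in records:
        if record["confidence"] >= min_confidence:
            offer = OFFERS.get(record["gender"], "generic")
        else:
            offer = "generic"
        segments[offer].append(record["name"])
    return dict(segments)


print(segment(customers))
```

Falling back to a generic offer when a prediction is uncertain is a design choice of this sketch: it keeps a wrong guess from sending the wrong offer to the wrong person.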