Attribute Stats on Enriched Data

Jan 30 · 3 min read

Data technologies, analytics and business intelligence have been buzzwords for some time now. They have also become key ingredients in improving decision-making and business value. DemystData provides you with external data, the technology to access it, and the analytical tools to extract value from it.

Demyst’s Python API gives you access to external data from around 160 data products. Once your input dataset has been enriched with external data, you will want to see statistics on it. A single method in our Python API, report(), does this. It looks through your input and the enriched response from Demyst and returns stats on the products and attributes. Here are some of the stats you get back:

  • match_rate: The share of records in the given input dataset for which the product found a match. This helps you determine the best products for your use case.
  • fill_rate: The share of records for which the attribute contains a value. While match_rate is measured at the data product level, fill_rate is measured at the attribute level. It tells you how often the attribute is populated and whether that is enough for processing. fill_rate will always be less than or equal to the match_rate of its product.
  • type: The type of the attribute response. It will be one of boolean, object, int or float.
  • nunique: The number of unique values the attribute contains. Imagine seeing 52 unique values for the state attribute! Unique counts give you insight into the attribute's values and can flag data that needs scrubbing.
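To make these definitions concrete, here is a minimal pandas sketch that computes the same four stats by hand on a toy table. It is not Demyst's report() implementation. The column names, the "product.attribute" naming convention, the rule that a record counts as matched when the product returns any value for it, and the helper name attribute_stats are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical enriched output: one row per input record.
# Column names follow an assumed "<product>.<attribute>" convention.
enriched = pd.DataFrame({
    "acme.state": ["NY", "CA", None, "NY", None],
    "acme.employees": [10.0, None, None, 250.0, None],
    "zeta.is_active": [True, False, True, None, None],
})


def attribute_stats(df):
    """Return one row of stats per attribute column (illustrative helper)."""
    rows = []
    for column in df.columns:
        product, attribute = column.split(".", 1)
        product_columns = [c for c in df.columns if c.startswith(product + ".")]
        # Assumption: a record is "matched" if the product filled any attribute.
        matched = df[product_columns].notna().any(axis=1)
        rows.append({
            "product": product,
            "attribute": attribute,
            "match_rate": round(100 * matched.mean(), 1),              # product level
            "fill_rate": round(100 * df[column].notna().mean(), 1),    # attribute level
            "type": pd.api.types.infer_dtype(df[column], skipna=True),
            "nunique": df[column].nunique(),                           # ignores missing values
        })
    return pd.DataFrame(rows)


stats = attribute_stats(enriched)
print(stats)
```

Under the matching rule assumed here, an attribute can only be filled on a matched record, which is why fill_rate never exceeds match_rate in the output.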
Image for post

Now that you have the statistics, you will want to retain a subset of the attributes to feed to your decision engine. You can filter the attributes on these stats with another method, query(). In the image below, the attributes are filtered on three stats: a match_rate of at least 80%, a fill_rate of at least 50%, and at least 2 unique values.

Image for post
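The same three-condition filter can be sketched with pandas' own DataFrame.query. This post does not spell out the signature of Demyst's query() or the exact shape of the report, so the stats table below, its column names and its values are made up for illustration, and the snippet should be read as an analogy rather than the Demyst API itself.

```python
import pandas as pd

# Hypothetical stats table, one row per attribute; all values are invented.
stats = pd.DataFrame({
    "product": ["acme", "acme", "zeta", "zeta"],
    "attribute": ["state", "employees", "is_active", "country"],
    "match_rate": [92.0, 92.0, 85.0, 85.0],   # percent, product level
    "fill_rate": [88.0, 35.0, 61.0, 85.0],    # percent, attribute level
    "nunique": [48, 310, 2, 1],
})

# Keep attributes with match_rate >= 80%, fill_rate >= 50% and >= 2 unique values.
selected = stats.query("match_rate >= 80 and fill_rate >= 50 and nunique >= 2")
print(selected[["product", "attribute"]])
```

Here employees is dropped for its low fill_rate and country for having a single unique value, which leaves state and is_active.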

You now have a select few quality attributes. You can evaluate their relative importance for your use case with any modeling technique, or customize the data however you want, using the tools Demyst gives you to access and manipulate it!

Want to learn more about Demyst? Schedule a consultation with Demyst to speak with one of our industry-leading data experts.
