How Your Company Can Take Advantage of (Big) Data
We’ve summarised a few critical steps you should take to make the Big Data trend worth your company’s while, such as determining what questions you want answered, and implementing data cleansing processes.
Data analytics is the next big thing. It enables companies like Netflix to target marketing content at individual customers based on their preferences rather than their demographics, recommending shows they are likely to enjoy. Netflix made this move after examining its data and discovering that sorting viewers by interests was far more useful for predicting taste in shows than sorting by demographics. While this level of targeting can unsettle some customers, it can also increase loyalty.
Sociologists have long established that, on average, there is more variation within a demographic group than between groups. With big data, companies are finally positioned to take advantage of this previously untapped knowledge: there is now enough data to infer individual preferences directly, rather than approximating them from demographic averages.
Now that you’re aware of one of the biggest impacts big data can have on a company’s marketing strategy, let’s delve into exactly how to implement it effectively.
Smart CTOs and analytics managers will resist the urge to try to use every single piece of data they have and publicly available data that is vaguely related to their market. Instead, they will work to determine what data is and isn’t relevant to their organisation. It is very easy to get too caught up in the fine details. A good first step is to take stock of what data is created internally and determine what external data sources, if any, would fill in knowledge gaps and bring added insight.
If you want to do real-time analytics and save yourself and your developers time, use data middleware.
Organise your systems so that ETL (Extract, Transform, Load) pipelines can process them easily. Before data can be analysed, it must be extracted from where it is originally stored. Next it is cleansed and transformed into a usable format, then loaded into a new destination, structured in a way that allows analytics software to process it efficiently.
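To make the three stages concrete, here is a minimal sketch of an ETL pass in Python. The field names (`email`, `country`) and CSV files are hypothetical stand-ins for whatever your systems actually store:

```python
import csv

def extract(path):
    """Extract: read raw rows from the original store (here, a CSV file)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: cleanse and normalise rows into a usable format."""
    cleaned = []
    for row in rows:
        email = row.get("email", "").strip().lower()
        if not email:  # drop rows missing a required value
            continue
        cleaned.append({
            "email": email,
            "country": row.get("country", "").strip().upper(),
        })
    return cleaned

def load(rows, path):
    """Load: write the structured result to its new destination."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["email", "country"])
        writer.writeheader()
        writer.writerows(rows)
```

Real pipelines add scheduling, error handling and incremental loads on top, but the shape stays the same: three small, testable stages.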
Your data preparation process is the E and T of ETL. Whether you choose to buy software to do it for you or ask your developers to design a custom solution is up to you and depends on your stack. The process differs slightly for structured and unstructured data.
But what’s happening behind the scenes? For structured data, a parser is built to find and fix missing values and inconsistent formats, remove or resolve duplicate records, and enhance the data by deriving new values from combinations of existing ones. Raw data errors like inconsistent values and formats can be fixed by string normalisation or approximate string matching. For an example of how this works in practice, check out OpenRefine (formerly Google Refine).
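As a small, hypothetical illustration of string normalisation and approximate string matching, the sketch below uses Python's standard `difflib` to map messy values onto a canonical list; the city names are invented for the example and a tool like OpenRefine does this at much larger scale:

```python
import difflib

# Hypothetical reference list of known-good values.
CANONICAL = ["Sydney", "Melbourne", "Brisbane"]

def normalise(value):
    """String normalisation: collapse whitespace and unify casing."""
    return " ".join(value.split()).title()

def resolve(value, canonical=CANONICAL, cutoff=0.8):
    """Approximate string matching: map a messy value to its closest
    canonical form, or leave it unchanged if nothing is close enough."""
    candidate = normalise(value)
    matches = difflib.get_close_matches(candidate, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else candidate
```

For example, `resolve("  sydny ")` snaps the typo to `"Sydney"`, while a value with no close canonical match passes through normalised but otherwise untouched.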
For unstructured data, like tweets, PDFs and Word documents, parsers are built to understand and process the specific formats you’re working with. You don’t have to build your own: parsing tools exist that can be taught to recognise patterns in data. Informatica, Qlik, Paxata, Trifacta, Tamr, SnapLogic, and Logi Analytics all offer self-service data preparation software that works out how to parse the data for you, designed so that even non-technical people can use it.
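As a toy illustration of what such parsers do, here is a sketch that pulls structured fields out of a tweet with regular expressions. The patterns and field names are our own simplification, not any vendor's implementation:

```python
import re

def parse_tweet(text):
    """Extract structured fields (hashtags, mentions, links) from free text."""
    return {
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
        "urls": re.findall(r"https?://\S+", text),
    }

# Example: parse_tweet("Loving #BigData tips from @CoderFactory")
# yields {"hashtags": ["BigData"], "mentions": ["CoderFactory"], "urls": []}
```

Commercial tools generalise this idea, learning the patterns from examples instead of requiring you to hand-write each rule.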
Don’t just take it from us!
Federico Castanedo, Chief Data Scientist at wiseathena.com, says:
Preparing and cleaning data for any kind of analysis is notoriously costly, time consuming, and prone to error, with conventional estimates holding that 80% of the total time spent on analysis is spent on data preparation … Substantial ROI gains can be realized by modernizing the techniques and tools enterprises employ in cleaning, combining, and transforming data.
All this self-service data preparation frees IT staff for other work, and away from manual labour that can be ‘garbage in, garbage out’ if done hastily. Data scientists estimate they spend up to 90% of their time just cleaning data before analysis even begins, leaving little room to produce interesting, actionable insights; much of the time they are merely scratching the surface.
Determine what questions you want to ask.
David Loshin, president of Knowledge Integrity, Inc. recommends this process:
- Clarify the question you want to answer.
- Identify the information necessary to answer the question.
- Determine what information is available and what is not available.
- Acquire the information that is not available.
- Solve the problem.
And remember, don’t get sidetracked!
Dr Olly Downs, CEO of Amplero: “There’s a push to get to the nano level of targeting, because the more specific you can get targeting audiences and the way your interactions address those audiences, the more effective they are. It’s a little bit like building microprocessors, though. You reach a physical limit where you can’t be sure the signal is distinct. You can chase that rabbit hole a long way before you realize you’re doing things so granularly that you can’t measure whether it’s having an effect or not.”
Now you’re all set to get maximum value out of your data!
This is Part 1 of a series covering Big Data.
Coder Factory now offers digital leadership and technology workshops for employees of companies seeking tech, innovation, and digital literacy. The training sessions function as mini-hackathons, where employees work to use technology in solving real problems within their company.
About Coder Factory Academy: Meet Australia’s first and only accredited fast-track coding bootcamp! Our immersive course helps students acquire in-demand skills through hands-on, project-based training by industry experts over 24 weeks. Become an employable full-stack developer in only six months.
Now enrolling domestic & international students in Sydney & Melbourne! Study now, pay later! VET FEE-HELP available to eligible students.
Are you a woman interested in coding? Check out our Women in Tech Scholarship!