What context really means right now with intelligent systems
It’s interesting how far the data disciplines have come in the last three years. Lots of folks are talking about ‘big’ and ‘small’ types of data, and their respective practices.
I’d like to take just a few moments to talk about smart data.
I work with a lot of smart people on a daily basis. Some of them are data scientists, some network analysts, some advanced mathematicians, some social and interest graph experts, some gurus in database optimization. All of them are doing amazing things with data. All of them want to contribute meaningfully to a better understanding not only of our purchase behaviors, but of how our sociocultural behaviors map to the development of better information products and data services (which are inexorably linked, of course).
A few of them have asked me lately: “Where do you see data practices going?”
My usual response: “Depends on the context, and what the horizons look like.”
With that, I get rolling eyes and some grimaces.
See, it’s easy to get caught up in mechanical rabbit holes. Whether you write unstructured data algorithms and develop information products like I do, or you do heavier statistical work in ‘deep AI’, we all get caught up and fail to see data as an ecosystem of interdependent information. And it really is one.
Without going into the finer details of various data types, specific processes and all that, I thought it would be best to describe the smart data opportunity in terms of functional descriptors — in other words, how various systems assigned to ‘big’ and ‘small’ would operate, and do operate, interdependently, such that smarter practices can converge around them.
I’m taking some liberties and making some larger assumptions about system functionality (G-d knows how nuanced this gets), but this is how one might look at the data ecosystem.
First, it is really important to understand that the perceived ‘problem’ of ‘big data’ isn’t so much its size (although that is a substantial part of it) as its composition. Most databases, for example, sit on heaps of redundancies and ‘dirty data’ (data that isn’t cleansed, properly tagged, or easily indexed), and they suffer from slower processing speeds as a result.
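To make that composition problem concrete, here is a minimal sketch of the kind of cleansing ‘dirty data’ calls for: normalizing fields, dropping unidentifiable records, and removing redundancies. The record shape and the rules are illustrative assumptions on my part, not any particular product’s schema.

```python
def cleanse(records):
    """Normalize, tag, and deduplicate a list of raw record dicts."""
    seen = set()
    clean = []
    for rec in records:
        # Normalize the key field so duplicates actually match.
        email = rec.get("email", "").strip().lower()
        if not email:
            continue  # unidentifiable record: drop it
        if email in seen:
            continue  # redundancy: keep only the first occurrence
        seen.add(email)
        clean.append({
            "email": email,
            "name": rec.get("name", "").strip().title(),
        })
    return clean

raw = [
    {"email": " Amy@Example.com ", "name": "amy jones"},
    {"email": "amy@example.com", "name": "Amy Jones"},  # duplicate
    {"email": "", "name": "unknown"},                   # dirty record
]
print(cleanse(raw))  # one clean, deduplicated record survives
```

The point isn’t the ten lines of code; it’s that every step here is a policy decision (what counts as a duplicate? what is the canonical form of a field?), and those policies are exactly what most heaps of ‘big’ data never had applied to them.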
Second, the perceived ‘problem’ of small, and in particular, unstructured data (conversational, semantic, graph-based, etc.), isn’t so much that it is ‘uncleansed’, but that it doesn’t have much in the way of open standardization. The influencer domain is a great example: companies (startups) tend to make products that measure influence according to their own standards, rather than open standards based on the contextual needs of clients and partners.
Compounding this is the fact that the big Internet companies (Google, Yahoo!, Facebook, Bing, et al.) have done very little to advance their data standards beyond their own publishing and advertising agendas. That makes it very difficult for smaller companies to do things like rewrite the web, and for larger companies to get a far better handle on what their customers are doing without having to game their information, or basically steal it and resell it to partners. This is also partly why ‘privacy’ and data management remain elusive to the average Internet or social network user.
Personally, it’s one of many reasons why I bailed on Facebook after being an early adopter, and will be bailing on Google+ as another early adopter sooner rather than later. Twitter has been pushing back against the government on these issues, although in reality, all of these digital environments are hackable.
Advertising cookies, as another example, have barely advanced into the first-party realm (whereby they don’t need to ‘follow you around’ while you search and click on pages), while third-party cookies still flood web browsers and enable agencies like the NSA to surveil people through backdoors. Regardless of how you feel about it (you should definitely question it), the bigger issue is that we are contributing to the degradation of our information systems, as well as unknowingly helping to put up barriers to the content we see and the data that is managed.
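The first-party/third-party distinction comes down to one question: does the cookie’s domain match the site the user is actually visiting? Real browsers answer this carefully (using the Public Suffix List, among other things); the naive suffix check below is just a sketch to illustrate the distinction, with made-up hostnames.

```python
def is_third_party(cookie_domain, page_host):
    """Return True if a cookie's domain doesn't cover the visited host.

    Naive illustration only: a production check would consult the
    Public Suffix List rather than trust a bare suffix match.
    """
    cookie_domain = cookie_domain.lstrip(".").lower()
    page_host = page_host.lower()
    first_party = (page_host == cookie_domain
                   or page_host.endswith("." + cookie_domain))
    return not first_party

# A cookie set by the site you're on: first party.
print(is_third_party("example.com", "shop.example.com"))      # False
# A cookie set by an ad network embedded in the page: third party.
print(is_third_party("ads.tracker.net", "shop.example.com"))  # True
```

That second case is the ‘follow you around’ mechanism: the same tracker domain is embedded on thousands of pages, so the one cookie it sets is readable from all of them.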
On the client side, I can’t tell you how many phone calls I get from decision-makers who are frustrated by how their data partners, software vendors and analyst teams struggle to sort through the data sets they manage, how difficult it is for them to segment old and new customers or prospects, and how hard it is for them to scale their own practices. In short, they’re still not able to empower their decisions with the kind of rigor they’d like. And of course, that sort of ‘rigor’ is relative to the demands and operations of the business, as well as supply or value chain partners, and most importantly, customers.
That said, with these great challenges come even greater opportunities.
A big one is to use these superstructures to our advantage by ‘sorting and seeing’ information more seamlessly on the ‘big’ side, and ‘making and matching’ information as users (participants) on the ‘small’ side. Sorting and seeing basically involves using data orientations more and more in parallel, while making and matching is really about how active each of us is in curating and sharing our information with other people.
What results — at least what I’ve witnessed in my own work — is a whole new set of emerging processes that are both analytical (observation and pattern-oriented) and insightful (intuitively empowered).
Put another way, in a world in which questions are arguably more important than ‘answers’, we have an opportunity to understand, shape and compute information in incredibly efficient ways.
What remains is a desire to approach our respective disciplines differently and more openly, and to collaborate a lot more with people ‘on other sides of the data spectrum’.
As always, there is much to look forward to in our infofuture.