Data Relationships 2.0

Published in

Insurance 2.0

4 min read · Jun 1, 2011

One of the most under-appreciated tools of the last decade is Yahoo Pipes. If you’ve never seen it, it lets you weld together different data feeds, such as RSS, run some cleansing operations on them, and spit them out as a single unified feed on the back end. It also makes this simple enough that a non-programmer can do it. I will be really sad if they ever shut it down, but I kind of anticipate it given how little support it receives.

Anyway, the reason I’m thinking about Yahoo Pipes is that it offers a unique and easy way to think about data compared to the traditional batch approach. Instead of treating data as something you go out and get, it treats data as something that trickles in and is categorized, put into the right place, and so on as it arrives. This always struck me as a much more elegant model than chunks or flat files that are batched, filtered, run through rules, and then handed to the next batch process.
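To make the "trickles in and gets categorized" idea a little more concrete, here is a minimal sketch in Python. It is not how Yahoo Pipes works internally; the function names, the feed items, and the category rules are all made up for illustration. The only point is that each item is cleaned and routed as it arrives rather than being collected into a batch first.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Item = Dict[str, str]


def clean(items: Iterable[Item]) -> Iterator[Item]:
    """Tidy each item as it arrives; drop items that have no title."""
    for item in items:
        title = item.get("title", "").strip()
        if title:
            yield {**item, "title": title}


def categorize(
    items: Iterable[Item], rules: Dict[str, Callable[[Item], bool]]
) -> Iterator[Tuple[str, Item]]:
    """Route each item to the first category whose rule matches it."""
    for item in items:
        for category, matches in rules.items():
            if matches(item):
                yield category, item
                break
        else:
            yield "uncategorized", item


def run_pipe(
    feed: Iterable[Item], rules: Dict[str, Callable[[Item], bool]]
) -> Dict[str, List[str]]:
    """Chain the stages; items flow through one at a time."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for category, item in categorize(clean(feed), rules):
        bins[category].append(item["title"])
    return dict(bins)


# Hypothetical feed items and rules, purely for illustration.
feed = [
    {"title": "  Bank lowers asking price on foreclosure "},
    {"title": ""},
    {"title": "New flood insurance rules"},
]
rules = {
    "property": lambda item: "foreclosure" in item["title"].lower(),
    "insurance": lambda item: "insurance" in item["title"].lower(),
}
result = run_pipe(feed, rules)
print(result)
```

Because `clean` and `categorize` are generators, `feed` could just as well be an endless stream; nothing waits for a batch to finish.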

But most of all I like the pipe concept — that data can flow between two points. It’s a very user-friendly idea and it works well in practice (aside from the tragic bugginess of the Pipes product itself). It’s an idea I’ve been exploring a lot lately, and I think two-way pipes are interesting as a concept for “Relationships 2.0” (I’m sorry for the 2.0, please forgive me).

A diagram of a wormhole that I'm calling a relationship

What I like even more is the idea of a pipe as a wormhole, a way for data to be in two places at the same time. It is just a location in data-space that, when you walk through it, puts you in a completely different place but looking at the same stuff. An odd concept, but think of all of the old-school relationships that one idea covers. One thing (this pipe, wormhole, whatever) gets you from the stuff you’re looking at here to all of the related stuff over there. In fact, there’s no difference: you’re seeing it all at once.

All of this boils down to thinking about relationships between locations in data-space rather than between entities. If an entity is in one place then it is also in another place. If an entity moves into one point in data-space, it pops into existence at the other point as well. (By data-space I mean treating each piece of data, a field in a traditional database, as its own dimension, very much like OLAP does.) Whatever you’re looking at becomes dynamic because wormholes can pop open at any time and a bunch of new information can tumble through.

So say I’m looking at something of interest: a property the bank mispriced, to give you a real-world example. If I want to see more “like that”, I’m going to end up looking at other properties that are similar in some way. Say, for example, it is owned by a particular bank, Countrywide Financial for grins. If that bank is taken over by the FDIC and is acquired by Bank of America, I can do one of two things to make my data right: I can create relationships from all of the properties that were owned by Countrywide to Bank of America, or I can create a single pipe (a wormhole) from where Countrywide used to be to where Bank of America is. And then if the merger is squashed for some reason I can undo the whole thing by removing that one pipe.
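Here is a rough sketch of that merge-and-undo idea in Python. Everything in it is an assumption made for illustration: the `DataSpace` class, its method names, and the property IDs are hypothetical, and a location is modeled simply as a (field, value) pair. The point is only that one pipe stands in for many per-property relationships, and removing it undoes the merge.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Location = Tuple[str, str]  # (field, value), e.g. ("owner", "Bank of America")


class DataSpace:
    """Toy model: entities sit at locations; pipes join two locations."""

    def __init__(self) -> None:
        self._entities: Dict[Location, Set[str]] = defaultdict(set)
        self._pipes: Set[frozenset] = set()

    def place(self, location: Location, entity: str) -> None:
        self._entities[location].add(entity)

    def open_pipe(self, a: Location, b: Location) -> None:
        if a != b:
            self._pipes.add(frozenset((a, b)))  # two-way by construction

    def close_pipe(self, a: Location, b: Location) -> None:
        self._pipes.discard(frozenset((a, b)))

    def view(self, location: Location) -> List[str]:
        """Everything visible from a location, following pipes both ways."""
        seen = {location}
        frontier = [location]
        while frontier:
            current = frontier.pop()
            for pipe in self._pipes:
                if current in pipe:
                    for other in pipe - {current}:
                        if other not in seen:
                            seen.add(other)
                            frontier.append(other)
        found: Set[str] = set()
        for loc in seen:
            found |= self._entities.get(loc, set())
        return sorted(found)


# Hypothetical properties, purely for illustration.
countrywide = ("owner", "Countrywide Financial")
bofa = ("owner", "Bank of America")

space = DataSpace()
space.place(countrywide, "property-1")
space.place(countrywide, "property-2")
space.place(bofa, "property-3")

space.open_pipe(countrywide, bofa)   # the acquisition: one pipe, no row updates
merged_view = space.view(bofa)

space.close_pipe(countrywide, bofa)  # the merger is squashed: remove the pipe
unmerged_view = space.view(bofa)

print(merged_view, unmerged_view)
```

Note that no property record is touched in either direction; the underlying data stays where it was, and only what is visible from each location changes.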

If I’m looking at a mispriced asset and I’d like to find more assets like it, this is huge. By creating this one thing, a two-way pipe, wormhole, whatever, a whole new world of data is exposed to me. The amount of data visible from the same place (whether I’m looking at a report, a dashboard, an app, whatever) grows instantly. (What’s cool, too, is that you can expand or contract the wormhole, requiring more or less similarity for something to flow through it. That’s a geeky way of saying that you can easily adjust how similar the stuff you’re looking at is.)
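One way to picture expanding or contracting the wormhole is as a similarity threshold. The sketch below is only an assumption about how that might look: the similarity measure (the fraction of shared fields with equal values), the function names, and the sample properties are all invented for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]


def similarity(a: Record, b: Record) -> float:
    """Fraction of shared fields on which the two records agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(1 for field in shared if a[field] == b[field]) / len(shared)


def through_wormhole(
    focus: Record, candidates: Dict[str, Record], min_similarity: float
) -> List[str]:
    """Names of the candidates similar enough to the focus to flow through."""
    return [
        name
        for name, record in candidates.items()
        if similarity(focus, record) >= min_similarity
    ]


# Hypothetical properties, purely for illustration.
focus = {"owner": "Countrywide Financial", "zip": "60614", "type": "condo"}
candidates = {
    "A": {"owner": "Countrywide Financial", "zip": "60614", "type": "condo"},
    "B": {"owner": "Countrywide Financial", "zip": "60614", "type": "house"},
    "C": {"owner": "Countrywide Financial", "zip": "60201", "type": "house"},
    "D": {"owner": "Other Bank", "zip": "60201", "type": "house"},
}

narrow = through_wormhole(focus, candidates, 1.0)  # contracted wormhole
medium = through_wormhole(focus, candidates, 0.6)
wide = through_wormhole(focus, candidates, 0.3)    # expanded wormhole
print(narrow, medium, wide)
```

Lowering `min_similarity` widens the wormhole and lets more loosely related properties through; raising it narrows the view back down to near-exact matches.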

This sounds like a pretty abstract thing, but it has me kind of excited because this type of merging and un-merging is what data integration is all about. It’s a huge, hairy problem, as you know if you’ve ever been involved in an ETL project, and this seems to solve a lot of that. Now, trying to implement this has not been a walk in the park either, but it’s definitely a nice programming challenge.

Some earlier thoughts about data wormholes are located here.


Written by Jason Kolb