Google Dataflow diagram showing common automated data organization strategies joining real-time streaming data (e.g. digital data) with batch processes (e.g. legacy datasets)

Reinventing Public Sector Decision-Making in a Cloud-Based Ecosystem

Quinn Chasan
Digital Diplomacy

--

When the public sector encounters a problem, the operational levers typically go: identify problem > measure > analyze + report > craft program > implement program > end

As far back as the first official Government surveys, surveyors were given the task of identifying territorial borders and mapping vital points. (Often untrained, relatively random) citizens were handed tools of approximate measurement, e.g. the circumferentor, the wye level, and Gunter's chain, which they used to draw (poor) maps. Over time, these instruments improved: the circumferentor became GPS, Gunter's chain became laser-based measurement tools, and so on.

The accuracy of those tools therefore allowed policymakers to ask deeper questions: not only where their borders were, but how the landscape (e.g. river flows) was changing over time and what impact those changes might have on their population. When government came online in the late 90s, the digitization process began as a way to manage that ever-expanding set of information. The original informational tools were much like the original surveying tools, namely clunky to use and difficult to extract truth from.

As Google grew, we ran headfirst into this problem and had to invent our way out of it. Along the way we discovered key insights about how to create effective technological tools as well as how to extract accurate meta-analysis from them. We began producing whitepapers detailing breakthroughs we reached in data storage, processing, analysis, and more. Those whitepapers turned into open source projects, which we then modified internally to suit our needs, and it is those same whitepaper-led products (like Dremel) that we still use today and offer externally (Dremel is known as BigQuery in Google Cloud).

As we tried to improve our products to better serve our users' needs, one key insight Google discovered was the vital importance of each individual's decision framework in relation to what they were seeking. Two people can both be looking to buy milk, but one may simply want milk for cereal while another wants it to make French bread at an industrial scale. If we at Google want to serve each user's simple “milk” query with the best results for that person, we have to understand that background. This has only grown in importance as Google's algorithms have improved, because the queries themselves often get simpler. Why search ‘best milk for enterprise operations’ when a simple “milk” query will suffice?

One benefit we've seen in our approach over time is that consumer habits have naturally made that information-deriving scheme more accurate by shifting core decision-making habits online. Currently the average US adult checks their phone around 160 times a day, over 80% rely on online systems for most daily activity, and across the board usage figures are astronomically higher than even the very recent past of 2010. This increased consumer reliance on the digital ecosystem has created a paradigm-shifting event in the management of government programs: the availability of real-time information.

In this new paradigm, Google's decision framework is often roughly the opposite of government's. Speaking personally about my time at Google, my team's workflows have followed a different dynamic: start with an insight > analyze the data we have > plan to get the data we don't > identify gaps and problem areas (both technical and non-technical) > identify key ongoing stakeholders + collaborate toward a resolution > iterate the solution as needed

The key concept here, unique to Government programs, is the iterative element. It works in two ways: first is the ability to quickly and consistently improve programs over time, and second is the ability to improve the quality of the lowest-level data over time. This second feature is somewhat counterintuitive to the status quo, since Government often has the most sensitive data on the population, but I see most Government-owned datasets as the final step in a long citizen journey that is largely ignored by the public sector. Car companies use digital information to understand the why of buying a car. Because it is a big purchase, some have seen user journeys with as many as 900+ touchpoints before a vehicle is picked, and they have dived headfirst into trying to outline the value of every single one to understand what they as a company could be doing better. Imagine if they continued to rely only on the sales data they get from their dealerships and none of the other 895+ touchpoints. That archaic process is where Government systems are today: public health data around the opioid crisis, for example, is largely tied up in fatal overdose data, Medicaid/Medicare request data, hospital visitation data, vital records data, etc., but without understanding how someone got to that point we are left with fairly rudimentary options for understanding and, more importantly, intervention.

The issue with digital information is the opposite of the issue with traditional information, namely its sheer abundance. The average US county (~100k population) produces roughly 1.1 terabytes of digital information per day. It is neither prudent nor realistically possible to work with every single digital provider to obtain 100% of that information.

The key question then becomes: what data do we need to answer the question at hand, and what systems do we rely on to get that information?

As it relates to your average constituent, the most relevant digital data is tied up in website analytics and advertising services. As any private-sector digital group knows, a website carries information you can think of as a digital storefront's, with functionality roughly analogous to standing at the entrance asking questions of people as they come in: What are you here to do? When did you get here? Where did you come from? What sorts of things do you usually buy in stores like these? What are your hobbies? And so on.
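To make that concrete, here is a minimal sketch of pulling that "storefront" data programmatically, assuming the agency's site uses Google Analytics 4 and the google-analytics-data Python client. The property ID, date range, and the particular dimensions and metrics are placeholders to adapt, not a prescription.

```python
# pip install google-analytics-data
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange,
    Dimension,
    Metric,
    RunReportRequest,
)

PROPERTY_ID = "123456789"  # placeholder: your GA4 property ID

client = BetaAnalyticsDataClient()  # uses application-default credentials

# "Where did you come from, and what did you come here to do?" expressed as a
# report: users broken out by acquisition channel and landing page.
request = RunReportRequest(
    property=f"properties/{PROPERTY_ID}",
    dimensions=[
        Dimension(name="sessionDefaultChannelGroup"),  # e.g. Organic Search, Direct
        Dimension(name="landingPage"),
    ],
    metrics=[Metric(name="activeUsers"), Metric(name="sessions")],
    date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
)

response = client.run_report(request)
for row in response.rows:
    channel, landing_page = (v.value for v in row.dimension_values)
    users, sessions = (v.value for v in row.metric_values)
    print(f"{channel:<20} {landing_page:<40} users={users} sessions={sessions}")
```

The specific dimension and metric names come from the GA4 Data API schema; swap in whichever ones map to the questions your program actually needs answered.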

On the flipside, advertising data contains information on the journey a user took before finally engaging with Government services and/or landing on a public sector website, in a health clinic, in a hospital, etc. When you place an advertisement online, you get data back from that content. For example, if you place an advertisement against all YouTube videos about opioids, you get back engagement rates by video, by content type, by geography, by creator, and so on. Typically, those engagement rates correlate tightly with downstream activity: high watch rates usually indicate there will be high Search interest on that topic after viewing, which we can measure. While invented to help sell goods and services, this same process can be critically important to our understanding of public policy programs. With these three data sources (State-owned data, website data, and advertising data), a bureaucratic body can develop a framework for how its key audience is deciding to request (or not request) Government services.
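As a rough illustration of what joining those three sources might look like in practice, here is a sketch using the BigQuery Python client. The project, dataset, and table names (ad_engagement, site_traffic, state_records) and their columns are hypothetical stand-ins for whatever an agency actually lands in its warehouse.

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials and a default project

# Hypothetical tables: weekly ad engagement, website traffic, and state-owned
# program records, all keyed on county and week.
query = """
SELECT
  ads.county_fips,
  ads.week,
  AVG(ads.video_watch_rate)   AS avg_watch_rate,
  SUM(web.sessions)           AS site_sessions,
  SUM(state.service_requests) AS service_requests
FROM `my_project.outreach.ad_engagement` AS ads
JOIN `my_project.outreach.site_traffic`  AS web
  ON ads.county_fips = web.county_fips AND ads.week = web.week
JOIN `my_project.programs.state_records` AS state
  ON ads.county_fips = state.county_fips AND ads.week = state.week
GROUP BY ads.county_fips, ads.week
ORDER BY ads.week, ads.county_fips
"""

for row in client.query(query).result():
    print(row.county_fips, row.week, row.avg_watch_rate,
          row.site_sessions, row.service_requests)
```

The point is less the specific SQL than the shape of it: advertising engagement upstream, website behavior midstream, and program records downstream, lined up on a shared geography and time grain.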

For example, the CDC used opioid-related search data from advertising to evaluate program goals around shifting local conversations from the highly stigmatized ‘painkiller abuse’ to the less stigmatized ‘opioid disorders’, a national conversation I think they were fairly successful in shifting. Project Jigsaw was able to identify a decision-journey framework for radicalization online, which helped defang those radicalization techniques. More specifically, they identified the moments where key individuals were showing interest in potentially joining terrorist organizations, and used simple, strategically timed anti-radicalization PSAs (via YouTube Ads) to “redirect” a user toward a different interpretation of the information they were viewing.
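A lightweight way to approximate that kind of language-shift measurement is sketched below using pytrends, an unofficial Python wrapper around Google Trends; the terms, timeframe, and geography are illustrative, and this is not the CDC's actual methodology.

```python
# pip install pytrends  (unofficial Google Trends wrapper)
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=360)

# Compare relative US search interest in the stigmatized vs. clinical framing.
terms = ["painkiller abuse", "opioid disorder"]
pytrends.build_payload(terms, timeframe="2015-01-01 2019-01-01", geo="US")

interest = pytrends.interest_over_time()  # pandas DataFrame indexed by week
yearly = interest[terms].resample("YS").mean().round(1)
print(yearly)  # a rising 'opioid disorder' share suggests the framing is shifting
```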

Today, Governments already have most of the properties necessary to begin more deeply informing their understanding of their audience(s). Other APIs with public sentiment data, like Twitter/Facebook/YouTube, also contain a myriad of analytics insights if organized in the right way, and BigQuery is growing its public datasets program with elements that can be very helpful in public policy analysis, e.g. hosting NOAA weather pattern data, among others.
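For a sense of how low the barrier is, here is a minimal sketch querying the public NOAA GSOD dataset from the BigQuery public datasets program with the Python client; the table name and columns reflect my reading of that dataset's schema and should be checked against the current listing.

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()  # queries bill to your project; the data itself is public

# Average daily temperature by month across all reporting stations in 2018,
# from the public NOAA Global Surface Summary of the Day dataset.
query = """
SELECT
  mo AS month,
  ROUND(AVG(temp), 1) AS avg_temp_f,
  COUNT(DISTINCT stn) AS stations
FROM `bigquery-public-data.noaa_gsod.gsod2018`
GROUP BY mo
ORDER BY mo
"""

for row in client.query(query).result():
    print(row.month, row.avg_temp_f, row.stations)
```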

I predict that 2019 will be the year the first attempts at coordinating this information to better serve the public will get underway. Watch this space!

--


Quinn Chasan
Digital Diplomacy

Ex-Google Cloud strategy exec in the public sector; over 10 years in public sector ad tech and cloud