More Thoughts on Advancing Data Equality

As head of the Economics and Statistics Administration at the U.S. Department of Commerce, I think a lot about the Open Data revolution, and who benefits from it.

I’m actually a lawyer by training, and I started my career working with and fighting for small-to-medium, regional American companies. They made or distributed things from plumbing and auto parts, to construction and medical equipment.

Counselor Justin Antonipillai speaks at the Big Data Innovation Summit

To me, these companies are among the best of American industry. The leaders of these companies were focused on their people and communities. They brought good-paying jobs and opportunities to people and places across the nation that struggle to win in the new global economy. They hoped to grow through innovation, efficiency, sales and exports.

The primary focus of these execs was their employees, their operations and how best to make, sell, and distribute their products in ways that would grow local jobs. They often faced razor-thin margins, missed covenants, and many other challenges.

When they did focus on data, it was their own numbers — sales, revenues, expenses and earnings. One of the last things on their lists of priorities was finding, wrangling, analyzing, or searching Big Data, including open public data produced or released by the government.

I also saw that once companies grew, and had more resources, there was greater use of data to drive growth. That meant downloading, accessing, and harnessing commercial and public data sets to drive decisions. I saw the growth, jobs, and advantages that come from making decisions based on data, instead of gut instinct.

This is exactly the kind of innovation and growth we are driving here at the Department of Commerce through our open data efforts.

I saw the same data edge when big-time commercial cases went to trial. Well-funded trial teams come, once again, armed with data. They have the resources to tap an entire industry of data-fueled expertise in jury picking, and even to conduct jury studies that bring incredible demographic and other stats to bear, along with mock jury exercises, before a trial even starts.

The data divide was also evident to me both while I was still in college working as an investigator for the Public Defender Service in Washington, DC, and later, after law school when I started a clinic to represent folks who could not afford the legal fees in some very difficult criminal cases.

Public defenders and pro bono criminal counsel have huge caseloads and tough cases. I tried these cases and represented those without deep pockets. You have to be an ad hoc social counselor in their community, and worry about where your clients sleep, eat, and work. And, you have to focus your time investigating cases, finding witnesses, and researching the law. You can’t afford the time or money to find, search and crunch data — even free public data — to select a jury or build your case.

The famed law professor Alan Dershowitz quipped that to win in court, you first pound the facts, then pound the law, then pound the table. Today, in the Digital Age with our virtually unlimited Big Data, it’s faster to hack the Dershowitz process and pound the facts with powerful data. Juries love facts.

We all know that data is one form of income of the Digital Age. Advancing real equal access to data — especially open Government-produced public data — will serve to advance our justice system, economy, job creation and quality of life.

But government can’t advance data equality alone. We need the best digital minds in the private and nonprofit sectors to help us get our data out and make it more easily useable by more entities large and small.

Going back to the basics of equal opportunity, Justice Louis Brandeis said that, “Democracy rests upon two pillars: One, the principle that all men [and women] are equally entitled to life, liberty and the pursuit of happiness. And the other, the conviction that such equal opportunity will most advance civilization.”

These principles continue to drive our law and policy to advance equal access, rights and opportunity, from jobs to housing and mortgage credit, education and college admissions, marriage and other basics in life.

Alan Krueger, as he departed as Chair of the White House Council of Economic Advisers three years ago, worried that “inequality in incomes is causing an unhealthy division in opportunities.” I’d make the same case about data equality and the need to address it, just as we care about income equality, gender pay disparity, and equal opportunity in all areas of life and work, current and yet uncharted.

In the digital economy, data is a critical asset, like capital, credit and talent. Having the best data and the tools to use it gives you a distinct competitive advantage. It allows you to find and pursue opportunities and manage risks better than those without the same digital-data resources.

Those with the data edge certainly have greater economic opportunity. Look at a hedge fund with a back office of data scientists, developers and engineers to collect, compile and crunch public data to analyze an industry or public security. Then look at a mid-sized asset manager without those digital and data assets. Who has the better chance to produce a higher yield?

It was 1954 when Thurgood Marshall served as chief counsel in the landmark 14th Amendment equal rights decision in Brown v. Board of Education that outlawed racial segregation in public schools. IBM had just dazzled the Popular Science crowd with the first “calculating machine” that used solid-state transistors instead of vacuum tubes. The Digital Age pioneer Tim O’Reilly, who popularized the terms “open source” and “web 2.0,” was just born.

Back then the notion of data equality might only cross the mind of a pulp science fiction writer. Today it raises some interesting legal questions.

But I won’t try to answer those here. Why not?

To be clear, I’m not opening a Pandora’s Box of new laws to level the data playing field.

It would be more immediate and effective, I think, to launch a new public-private partnership for data equality.

To paraphrase President Obama when he spoke at South by Southwest Interactive in Austin last March, “The reason I’m here is to recruit all of you.” The President recognized that combining the audience’s digital talent and public spirit could bring powerful solutions to national problems they care about.

The secret sauce is public data. Unknown to many, the federal government is a global leader in Big Data. Every day, intentionally or as a byproduct of their work, the U.S. agencies produce countless terabytes of data on the nation’s health, schools, transportation, roads and bridges, industries, the economy, jobs, weather — you name it.

This is public information. It’s priceless. And it’s open for the asking. But we need to make it easier to grab, analyze and use this data.

The Obama administration has been working on it, with the President leading the charge to open public data and use it to drive innovation inside and outside government.

On his first day in office in January 2009, President Obama launched the Open Government Initiative and ordered all federal agency heads to make our data as open and available as possible. He established the first-ever White House Chief Technology Officer to promote a more open, tech-savvy government. He appointed Aneesh Chopra, the digital transformation thought leader, to the post, followed by Todd Park and, currently, Megan Smith.

The President went on to appoint the first White House Chief Data Scientist, DJ Patil, whom many credit with coining the term “data scientist,” and Jen Pahlka to head up government innovation. Jen launched Code for America, which the Washington Post called “the technology world’s equivalent of the Peace Corps or Teach for America,” with tech innovators answering the call of public service by coding for a better government.

The White House also launched the Presidential Innovation Fellows, the U.S. Digital Service, and the 18F program to help federal agencies build, buy and share digital services.

Commerce Secretary Penny Pritzker has also been a visionary on open data and improved statistics. By some measures, the Commerce Department where I work produces 30–40 percent of all federal data. So we’re playing a leading role in the open data challenge, with Secretary Pritzker driving us to live up to our nickname as “America’s Data Agency.”

Commerce includes the Patent and Trademark Office, the International Trade Administration, the National Oceanic and Atmospheric Administration that has the National Weather Service, the Census Bureau, and the Bureau of Economic Analysis, which puts out the official GDP numbers that hit the headlines.

These functions produce a huge wealth of data on income, production and consumer spending, imports and exports, inventions and ideas, the oceans and climate change, science and technology, and demographics. NOAA itself produces enough terabytes of data to fill the Library of Congress twice.

Commerce Secretary Pritzker has made getting all of this data out to the market part of her “Open for Business” strategy, in four ways:

First, we’re making public data more consumable. That means curating and cleaning our data, and building APIs and open-source seed libraries in common languages like Python and the statistical language R, to enable better and faster access to our data.

Second, we’re joining with organizations across the country to co-sponsor “hackathons,” which challenge the best of the digital world to mine and harness our data to come up with new innovations and solutions. The participants are finding amazing ways to use public data to address public needs. For example, the “Hack the Pay Gap” project arms women with localized job and income data to negotiate for pay equality.

Third, we’re bringing Silicon Valley to the Potomac. Our Chief Data Officer and our Chief Data Scientist, Jeffrey Chen, who is here with me today, are leading digital wizards who supercharge our data projects and our connection with the digital economy. Our Commerce Data Advisory Council of 20 digital thought leaders from all sectors helps us think and learn from the market. Our Commerce Data Academy trains our employees in data science. The Commerce Data Service brings a team of data scientists and developers to lead our digital-data efforts.

And fourth, we’re harnessing data to improve government. We’re flipping the script on how government works by using data to improve operations and doing things the Silicon Valley way: adopting cloud computing and agile development technologies, using open-source tools, and encouraging developers to contribute to those communities.

One project uses predictive modeling to help U.S. foreign trade officers identify and reach out to small businesses that exhibit the traits of successful exporters, and help them sell overseas. This is data-driven client identification, something the private sector does every day, and something we absolutely needed to start doing.

This is all good stuff. But I want to be honest and realistic here. When I mention to digital pioneers that we’re starting to use R, they’re more sympathetic than wowed. They know government doesn’t have the back-end infrastructure and front-end tools that would allow everyone to find, download, crunch and use our mountains of public data easily.

We do appreciate how many sophisticated data consumers are working to improve their access to and use of our data. We want to support and encourage that. But we also want to expand the universe of public data users, including small- and medium-sized companies, non-profits, charities, public defenders and perhaps even everyday citizens.

To give us some data on data equality — even just a rough sample — we looked at who’s tapping the Census Bureau’s API, which includes all the data from their annual American Community Survey.

If you’ve ever gotten this survey in the mail, it’s pretty comprehensive. And it’s worth the time to fill it out, because it provides the richest dataset available about our nation, including population, demographics and geographics. There’s data ranging from household wellbeing, to job attainment, to how our veterans are doing, and much more. Journalists, researchers, academics, and companies treat this data like it’s gospel.

So we looked at the top 100 users this year of the Census API that includes this data. At first, we found a pretty nice, even distribution: about 60 percent from private industry, 20 percent from higher education, 9 percent from government, and the remainder from non-profits. But when we looked at frequency of use and scale of access, the top 100 users look very different. And when we drilled down into the private sector users, we found that about 85 percent were in real estate, mostly consulting and development firms. The rest were marketing consultants.

What’s the takeaway? Some of the most valuable Big Data on the planet — open for the taking — is being directly accessed mostly by a small group of niche industries.

Now, we know that others may be getting our data in other ways, but here is what we are pushing for:

· When a mid-market building materials distributor in Indiana is buying door frames, we want that company checking our import/export data to make sure they are aware of all the places they could be buying those frames.

· When a public defender in California chooses a jury, we want them to have access to the same demographic data that jury consultants use every day in commercial cases.

· When a charity is fundraising for good causes, we want them to be using our income, demographic, and other data to raise funds more efficiently.

But we cannot address this issue of data equality alone. In fact, while we can do a lot to wrangle, clean, and release data sets in bulk or on an API, we need help figuring out the uses that will solve the biggest public problems, and how to make the data really available to solve those problems.

So here’s my ask today: We want to take giant steps forward in democratizing data and advancing data equality, continuing what President Obama started. We’re asking the best and most restless minds in the digital sphere to help us provide easier access to public data.

To be specific, we need the most help with the “last mile” problem. We’re building the capability and resources to clean our data, post it in bulk, and then build APIs to help users access it. But we’re not best suited to support every data user’s unique needs and systems. What we need is for more data scientists, developers, engineers and others to build the applications and business intelligence tools on top of our APIs to promote the best and broadest use.

Just like our hackathon participants are doing.

Now, full-on data equality and democratization, where every citizen can pull and wrangle the mountains of public data at home, may be a bridge too far.

For now, I would love to see the next generation of law school interns in the public defender’s office using the same data as jury consultants.

Or those modest Midwest companies I represented leveraging not just their own sales and other data, but also the incredible public data on demographics, income, imports and exports, patents and weather.

Justice Marshall said, “I challenge anyone to say that [equality] is not a goal worth working for.” Today in the Digital Age, when data equality means equality in justice and opportunity, we need the help of those who know best. Data is a new frontier in our constant quest to advance equal rights, and a goal worth working for.

Thank you for this chance to reach out to you, and I look forward to hearing from you.

Remarks prepared for delivery by Justin Antonipillai, U.S. Department of Commerce, Big Data Innovation Summit, Boston, Mass., September 9, 2016