Chief Data Officer

Just over three years ago, I became the City of Syracuse’s first chief data officer. A couple dozen other cities, as well as a number of states and federal agencies, have chief data officers too. The position is new enough that many of us with the same title have vastly different backgrounds and responsibilities. Others in city government have similar responsibilities but different titles and backgrounds (chief information officers, chief innovation officers, chief performance officers, etc.).

Organizations have recently been created to define and share best practices for chief data officers and those with similar responsibilities. What Works Cities and the Civic Analytics Network are two of them, and I’m fortunate to be part of both.

Despite these organizations and the goals many chief data officers across the country share, day to day we each have our own responsibilities and circumstances at home.

Three years in, I thought I’d share some of the ways I’ve approached my work, including the challenges and successes, and maybe some tips for others who want to join the field.

When I was in college, Bill Coplin, who chaired the Policy Studies program (one of my majors), insisted that we get hands-on experience to build our skills. For one class, all of the students split up the work on a final project that included recording the GPS location of every bus shelter in Syracuse and surveying bus riders. Little did I know how well this project would prepare me for my job as chief data officer. When I started working for the city, my mom reminded me that I’d always liked this kind of work, going all the way back to that project.

Locating and mapping bus shelters is one of those things a casual observer might assume someone has already done. That same person might also assume that someone was thinking about whether the bus shelters were placed equitably or in the best places, whether they faced in a direction that would block bad weather, and whether the adjoining sidewalk was in good shape and compliant with the Americans with Disabilities Act. Those assumptions would all have been wrong, at least in Syracuse in the mid-2000s.

In fact, none of the bus shelters had been mapped, so no one had thought holistically about where they should be placed.

When I took the chief data officer job, I thought I’d get to grab all the data and build predictive models that would save millions of dollars, make people’s lives easier, and prevent problems like infrastructure failures. I’ve gotten to do that a couple of times, but day to day the main goal is usually to figure out how to count stuff correctly.

In this really long post, I’ll go into detail about how we count stuff (and why counting isn’t always easy). Once we’ve got counting down, I’ll talk about how we use those counts to identify trends (and why people can disagree about trends). With the trends down, we can start making predictions (and even with the smartest data scientists in the country, you may still have to do data entry before predicting anything). Finally, I’ll talk about being transparent and building partnerships to help get the job done.

Throughout, I’ll also touch on how I’ve thought about smart cities in this CDO role, where I get involved in discussions about new technology, and where blockchain could (or definitely couldn’t) fit in local government.

In the three years I’ve been in this position, I have not done everything right (I’ve probably gotten most of it wrong). The role started with just me, but over time I’ve been able to build a small team. Because technology and data have been underfunded and overlooked for a long time in most local governments, scope creep comes easily: anyone with some ideas about technology or data may be asked to fill in on many different types of projects. This can be good and bad, and it definitely leaves me feeling like I’m winging it, with a fair amount of imposter syndrome every day. I am also often one of the first to recommend against relying on a purely technology-driven solution, and almost always the first to caution against some kinds of data collection because of privacy concerns.

I’ll touch on these issues as well. I hope these pieces are informative, and I am interested in your feedback and thoughts along the way.

Counting Things

Sometimes the job of Chief Data Officer is as simple as figuring out how to count things. The catch is that counting things is not always as easy as it seems. With processes documented on paper, information that isn’t standardized, or nothing documented at all, figuring out how much work happens day to day in city government can be hard.

Potholes

The City of Syracuse has a lot of potholes. With infrastructure that has mostly passed its useful life and snow and plows that give the roads a beating each year, roads show their damage by the time spring comes around.

Our task was to figure out where, and on which roads, potholes were being filled most. We also wanted to report how many potholes were filled overall, so people understood how much work our public works staff did each year to keep the roads smooth.

Unfortunately, the pothole crews did not record the potholes they filled in any easy-to-use format. We had all of the pothole complaints people reported to the Department of Public Works, but we were told, and observed, that crews would often fill about 10 other potholes on the way to the one that was reported. Crews recorded some pothole locations on paper, but that was not easy to analyze. Still, the task remained.

Instead of buying an expensive piece of software to solve the problem, we worked with the crews to test a few different options:

  • Excel spreadsheet with columns for the date a pothole was filled and the address nearest to where it was filled. The upside was that the data would be digitized. The downsides were that there was still a lot of data entry, there was no guarantee the crews would actually fill out the spreadsheet, the addresses and dates could be inaccurate, and time spent on data entry meant less time spent filling potholes.
  • Do Button with If This Then That (IFTTT). This let crews take a picture that automatically logged GPS coordinates, the date, and the image to a Google spreadsheet. The data would be digitized and less data entry was required, but crews were not guaranteed to have a smartphone in the field, they might not remember to take a picture after each fix, and the phones were at greater risk of being broken in the field.

After doing tests with the options above, we realized that we needed a fully automated option as it would guarantee data entry and not distract crews from their core job: filling potholes.

We realized that we could leverage the GPS units on the trucks to record the data. Potholes are primarily filled using a Durapatcher truck, which has an attached hose that shoots asphalt. By connecting a sensor to the hose, each time a crew member pulled the trigger on the hose, a row of data could be logged with the GPS coordinates and the date and time the sensor was triggered. This was the fully automated option we sought. From there we could pull the data in near real time, map it, and report on it.

The outcome was that we could now count how much work was occurring each day and in which parts of the city. Also, instead of paying for an app or a different software solution, we were able to test, understand the possibilities and limitations for the crews, and leverage existing technology to count better.
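Turning raw trigger rows into a count of filled potholes takes one judgment call: a crew may pull the trigger several times on the same pothole. A minimal sketch of how that could be handled, assuming each logged row carries a truck ID (hypothetical), a timestamp, and GPS coordinates, and that pulls from the same truck within 120 seconds and 15 meters of each other belong to one pothole (both thresholds are illustrative, not the City’s):

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass
class TriggerEvent:
    truck_id: str          # hypothetical: which truck logged the row
    timestamp: datetime    # when the hose trigger fired
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in meters
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def collapse_triggers(events, max_gap_s=120, max_dist_m=15.0):
    """Treat consecutive trigger pulls from the same truck that are close in
    time and space as one filled pothole. Thresholds are hypothetical."""
    fills = []
    last_seen = {}  # truck_id -> most recent event from that truck
    for ev in sorted(events, key=lambda e: (e.truck_id, e.timestamp)):
        prev = last_seen.get(ev.truck_id)
        same_pothole = (
            prev is not None
            and (ev.timestamp - prev.timestamp).total_seconds() <= max_gap_s
            and haversine_m(prev.lat, prev.lon, ev.lat, ev.lon) <= max_dist_m
        )
        if not same_pothole:
            fills.append(ev)  # first pull at a new location counts as a fill
        last_seen[ev.truck_id] = ev
    return fills


def fills_per_day(fills) -> Counter:
    """Number of filled potholes per calendar day."""
    return Counter(f.timestamp.date() for f in fills)
```

The same list of fills can be grouped by neighborhood or street instead of by day; only the key passed to `Counter` changes.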

Sidewalk Curb Corners

Another challenge we constantly face is tracking how many assets we have and are responsible for. Sidewalk curb corners are one example. The Americans with Disabilities Act sets specific requirements for how curb corners should be constructed, and the City was responsible for presenting a plan for when all existing curb corners would be brought in line with current requirements.

With thousands of curb corners in the city, none of them mapped and few of their attributes known, this would be a big project. Sending staff out to collect the data would take too much time.

Our solution was to work with student interns to log the information. First, we worked with the Engineering Department to understand which attributes were critical to record: a color change in the curb, raised bumps on the slope of the curb so people who are visually impaired can sense where the sidewalk ends, and more. Then, using a data collection product from ArcGIS (a mapping platform most municipal governments use), we set up the tool with drop-down menus that would be easy for the students to complete.

We split the city into square zones and assigned each zone to a student. The students then used Google Maps and Street View to look at each curb corner in their zone. They would click on the ArcGIS map, a popup would appear with all of the drop-down fields to complete, and once they filled in the information, the data point would appear on the map.
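Splitting a bounding box into a grid and dealing the cells out to students is simple enough to script. A minimal sketch, assuming a lat/lon bounding box for the city and a cell size in degrees (both hypothetical inputs). Note that cells measured in degrees are not exactly square on the ground: at Syracuse’s latitude (about 43°N), a degree of longitude covers roughly 73% of the distance of a degree of latitude (cos 43° ≈ 0.73), so truly square zones would need a wider longitude step.

```python
import math


def grid_cells(south, west, north, east, cell_deg):
    """List (row, col) cells covering a lat/lon bounding box.
    Rounding guards against float noise: (43.1 - 43.0) / 0.05 comes out
    slightly above 2, which would otherwise add a spurious extra row."""
    rows = math.ceil(round((north - south) / cell_deg, 9))
    cols = math.ceil(round((east - west) / cell_deg, 9))
    return [(r, c) for r in range(rows) for c in range(cols)]


def zone_of(lat, lon, south, west, cell_deg):
    """Which cell a logged curb corner falls in (useful for checking coverage)."""
    return (math.floor((lat - south) / cell_deg), math.floor((lon - west) / cell_deg))


def assign_round_robin(cells, students):
    """Deal cells out to students in turn so workloads stay roughly even."""
    return {cell: students[i % len(students)] for i, cell in enumerate(sorted(cells))}
```

Round-robin ignores how many corners each cell actually contains; dense downtown cells could be split further or weighted if workloads turn out uneven.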

In all, the students logged more than 7,000 curb corners in a semester. Some of the data was certainly missing or inaccurate, but this approach cost less and took less of our staff’s time. We also got the information we needed to create the plan, while building a partnership with Syracuse University.

City staff now have access to the map and can make updates and additions. Any of the data that was inaccurate can be fixed as city staff go to different curb corners to assess and monitor, but at least the baseline is already there.

Trends

Once data is counted correctly, we can start to look at trends. If we know how many potholes are filled each day, then we can start to ask if we filled more potholes today than yesterday, or more this month than the same month last year. If we aren’t counting correctly, looking at trends will not matter because the data isn’t accurate to begin with.
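Once fills are counted per day, comparing a month with the same month last year is a small step, and it sidesteps the seasonal swing (potholes show up most by spring). A minimal sketch, assuming a list of fill dates such as those produced by the truck sensors:

```python
from collections import Counter
from datetime import date


def monthly_counts(fill_dates):
    """Count filled potholes per (year, month)."""
    return Counter((d.year, d.month) for d in fill_dates)


def change_vs_same_month_last_year(counts, year, month):
    """Fractional change vs. the same month a year earlier; None if no baseline."""
    current = counts.get((year, month), 0)
    prior = counts.get((year - 1, month), 0)
    if prior == 0:
        return None  # growth over a zero baseline can't be expressed as a percentage
    return (current - prior) / prior
```

For example, 3 fills in April 2018 against 2 in April 2017 gives `0.5`, a 50% increase.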

Road Ratings

An early project I did with the City was to look at road ratings over time. The City rates its roads on a scale of 1–10 over a two-year cycle. This data went back about 30 years, but no one had ever looked at it in aggregate to see whether the roads were getting worse overall according to the ratings.

To do this analysis, the first requirement was having the data at all. That staff had maintained a spreadsheet/database for decades was unexpected and impressive.

We needed to understand what each of the ratings meant, and how that translated to the decisions getting made in the Department of Public Works. We learned that a road rated 5 or below would be considered “poor” and a candidate for milling and paving — a major road reconstruction project. Roads rated 6–7 were fair, and 8 and above were in good shape.

Compiling the data and visualizing it showed a trend toward more poor and fair streets, and fewer good streets over the past 15 years. This meant generally the roads were in worse shape.
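The trend above comes from bucketing each rating into the Public Works categories (5 or below poor, 6–7 fair, 8 and above good) and computing each category’s share per rating cycle. A minimal sketch, assuming rows of (cycle year, street ID, rating); counting every street segment equally, rather than weighting by length, is a simplifying assumption:

```python
from collections import Counter, defaultdict


def category(rating: int) -> str:
    """Map a 1-10 rating to the Public Works categories."""
    if rating <= 5:
        return "poor"   # candidate for milling and paving
    if rating <= 7:
        return "fair"
    return "good"


def share_by_cycle(ratings):
    """ratings: iterable of (cycle_year, street_id, rating).
    Returns {cycle_year: {"poor": share, "fair": share, "good": share}}."""
    counts = defaultdict(Counter)
    for cycle, _street, rating in ratings:
        counts[cycle][category(rating)] += 1
    return {
        cycle: {cat: c[cat] / sum(c.values()) for cat in ("poor", "fair", "good")}
        for cycle, c in sorted(counts.items())
    }
```

Plotting the three shares per cycle as lines gives the kind of chart that shows fewer good streets and more fair and poor ones over time.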

Based on what it costs to repave roads, we were also able to calculate that the City was at a crucial point: it needed to do proactive work to keep fair and good roads from deteriorating into the “poor” category, where they would need more expensive treatment from a budget that was not available.

We also noticed that the ratings did not decline the way we expected. Many roads’ ratings fell relatively quickly, reaching a 6 over about a decade. The rating would then get stuck at 6 for a number of years before finally dropping to a 5.

Based on research into how roads typically deteriorate, our guess was that observational bias was at work. The staff rating the roads knew that a road rated 5 was considered “poor”, so when a rater saw that a road had problems but didn’t seem “poor”, the rating stayed at 6: just good enough to avoid the expensive maintenance treatment.
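One way to see the “stuck at 6” pattern is to measure how many consecutive rating cycles streets spend at each rating. A minimal sketch, assuming each street’s ratings are available as a chronological list with one entry per cycle. A caveat: the first and last runs in each history are cut off by the start and end of the record, so averages for the highest and lowest ratings are understated.

```python
from collections import defaultdict
from itertools import groupby


def runs(history):
    """Collapse a street's chronological ratings into (rating, cycles_held) runs."""
    return [(rating, len(list(group))) for rating, group in groupby(history)]


def mean_cycles_at_rating(histories):
    """Average number of consecutive rating cycles spent at each rating."""
    total = defaultdict(int)
    n_runs = defaultdict(int)
    for history in histories:
        for rating, length in runs(history):
            total[rating] += length
            n_runs[rating] += 1
    return {rating: total[rating] / n_runs[rating] for rating in sorted(total)}
```

If raters hold roads at 6, the average run at 6 stands out against its neighbors; with the two toy histories `[9, 8, 7, 6, 6, 6, 6, 5]` and `[8, 7, 6, 6, 6, 5]`, rating 6 averages 3.5 cycles while every other rating averages 1.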

As with much municipal data, you work with what you have and stay open about the assumptions and challenges in the data — it’ll never be perfect.

Performance Dashboard

In fall 2018, we launched a Performance Management Program that uses the Objectives and Key Results framework both to set priorities for City government and to track the goals and measures associated with those priorities. Seeing the trend lines for the key results has been important for understanding how and where progress is being made.

Of course, the process starts with counting things correctly. One of the key results is to increase code violation compliance from 20% to 35%. Counting how many code violations were closed on time is relatively simple. The complication is that the Department of Code Enforcement has done a lot of work to be more proactive, helping homeowners fix problems before a violation is issued. That means the owners who still end up with a violation are likely the ones who would not have fixed the problem anyway, which pushes the overall compliance percentage down.
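A small worked example shows how proactive outreach can lower the compliance rate even when outcomes don’t get worse. The numbers below are hypothetical, chosen only to illustrate the effect; pairing the compliance rate with an overall resolution rate is one kind of supporting information that makes the trend readable.

```python
def compliance_and_resolution(problems_found, fixed_before_violation, complied_after_violation):
    """Return (compliance among issued violations, share of all problems resolved)."""
    violations_issued = problems_found - fixed_before_violation
    compliance = complied_after_violation / violations_issued if violations_issued else 0.0
    resolution = (fixed_before_violation + complied_after_violation) / problems_found
    return compliance, resolution


# Hypothetical: 100 problems; 40 owners would fix the problem once cited, 60 would not.
before = compliance_and_resolution(100, 0, 40)   # no outreach: every problem gets a violation
after = compliance_and_resolution(100, 30, 10)   # outreach gets 30 of those 40 to fix it early
```

Here compliance drops from 40% (`40 / 100`) to about 14% (`10 / 70`), while the share of problems resolved stays at 40% in both cases.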

Figuring out how we count correctly, show trends, and then also provide other supporting information to understand progress all become part of the process, even on what seems like a straightforward metric.

Prediction

First we count things correctly, then we use those counts to see whether we did more work today than yesterday and to understand trends. The next layer of complexity is using those trends to predict what might happen tomorrow. If we can predict, we can do work proactively to avoid problems, rather than looking backward to see whether the work we did was effective. When predicting, it is particularly important to understand the historical data informing your model. Ideally you are counting everything and your trends accurately reflect the work that has been done.

But sometimes work gets done in a biased manner. We fill potholes based on complaints. If a particular neighborhood doesn’t complain much about potholes even though they exist there, a predictive model of where potholes will turn up won’t have data on that neighborhood’s potholes, and so it may not direct crews to fill them. The model uses historical information, and the bias built into that information, to predict the future: the neighborhood has not had potholes filled before, and it will continue not to. The municipality may not know there is a problem, but residents will receive inequitable service. Identifying those biases and correcting for them must happen before the project begins.
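A toy example makes the complaint bias concrete. The pothole counts and reporting rates below are hypothetical: neighborhood B has as many potholes as A but rarely calls them in, so complaint data (what a model would learn from) ranks it last. Correcting for this requires an estimate of each neighborhood’s reporting rate from a less biased source, such as a systematic street survey; the sketch plugs in the true rates only to show the effect of reweighting.

```python
def rank(counts):
    """Neighborhoods ordered from most to fewest potholes."""
    return sorted(counts, key=counts.get, reverse=True)


# Hypothetical numbers: B has as many potholes as A but rarely reports them.
true_potholes = {"A": 100, "B": 100, "C": 80}
report_rate = {"A": 0.6, "B": 0.1, "C": 0.5}

# What complaint-driven data shows.
observed = {n: round(true_potholes[n] * report_rate[n]) for n in true_potholes}


def reweight(observed, estimated_rate):
    """Scale complaint counts up by an estimated reporting rate per neighborhood
    (in practice the rate would have to come from an independent survey)."""
    return {n: observed[n] / estimated_rate[n] for n in observed}
```

`observed` comes out as `{"A": 60, "B": 10, "C": 40}`, so B falls behind C; after reweighting, B is back level with A and ahead of C.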

Water Main Risk

Syracuse is an older northeastern city whose infrastructure, like its water mains, is mostly past its useful life; some mains are more than 100 years old. Due to a combination of factors, the water mains break regularly, hundreds of times per year. This leaves people on the affected block without water for hours. It also forces Water Department crews to be reactive in their work rather than proactive.

The Water Department workforce is also older, and many are expected to retire in the next five years. They hold a large amount of institutional knowledge, but it is mostly passed down through stories rather than written down.

Water mains are not the only infrastructure that fails regularly in Syracuse. I’ve already written about roads in poor condition, and the sewers also collapse and back up too often. When the opportunity comes up to replace a sewer or repave a road, it is important to know how risky the water main underneath is, so the main does not break soon after the road is repaved and force crews to dig into a brand-new road. This happens all too often due to a lack of information and coordination.

As a potential solution to these issues, we embarked on a project to predict the risk of water main breaks throughout the city, working with a team from the Data Science for Social Good program at the University of Chicago. Staff might be able to identify the riskiest mains, but documenting their knowledge was important since many of them would retire. Additionally, while staff might have a sense of risk, it is difficult to reason about risk across 400+ miles of water main, and different staff made different assumptions about why and where mains break. An analysis that brought some objectivity to the process, and that could visualize the risk on a map, could be useful.

To do the project, we first needed to count correctly. Just defining how many water main breaks had occurred was not as straightforward as we had hoped. Fortunately, we had data about breaks to start with. The GIS Analyst in the Water Department had digitized most of the water system, including all of the work done in the last decade. This meant we knew the locations of water mains, the ages and materials of some of them, and which had broken since about 2008. The number of confirmed breaks, though, was different from the number of breaks that were called in. Many times, dispatchers generated a work order for a water main break based on the report they received, but the problem turned out to be something else.
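Separating called-in breaks from confirmed breaks is a matter of comparing what a work order was opened as with what the crew found. A minimal sketch, assuming each work order records both; the field names `reported_as` and `confirmed_cause` and the cause labels in the example are hypothetical:

```python
from collections import Counter


def break_counts(work_orders):
    """Compare breaks as reported by dispatch with breaks confirmed by crews.
    Field names 'reported_as' and 'confirmed_cause' are hypothetical."""
    reported = sum(1 for wo in work_orders if wo["reported_as"] == "main break")
    confirmed = sum(1 for wo in work_orders if wo["confirmed_cause"] == "main break")
    # Orders opened as main breaks that turned out to be something else
    misfiled = Counter(
        wo["confirmed_cause"]
        for wo in work_orders
        if wo["reported_as"] == "main break" and wo["confirmed_cause"] != "main break"
    )
    return reported, confirmed, misfiled
```

Only the confirmed count belongs in a model’s training labels; the `misfiled` tally is useful for showing dispatch how often initial reports differ from what crews find.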

As we got into building the dataset to be used for modeling, we realized we needed more information about the material of the water mains. Since this was not digitized, the team looked at the original engineering books from a century ago and did some data entry.

Yes, sometimes building predictive models involves data entry from 100-year-old books.

Once the data was in a good enough place, we could look at trends like the time of year breaks happened most, which mains tended to break most, and which parts of the city breaks happened most commonly.
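Two of those trends, the months when breaks happen most and the mains that break most, reduce to simple counts over confirmed breaks. A minimal sketch, assuming a list of break dates and a list of segment IDs (one entry per confirmed break; both inputs hypothetical in shape):

```python
from collections import Counter
from datetime import date


def breaks_by_month(break_dates):
    """Confirmed breaks per calendar month (1-12), pooled across years."""
    counts = Counter(d.month for d in break_dates)
    return {month: counts.get(month, 0) for month in range(1, 13)}


def most_broken_segments(segment_ids, n=10):
    """Segments with the most confirmed breaks, as (segment_id, count)."""
    return Counter(segment_ids).most_common(n)
```

Pooling across years assumes the record is equally complete in every year; if early years are patchier, counting per year first avoids skewing the monthly pattern.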

From there, we were also able to join other data about soil, road ratings, the last year a road was paved (some thought that paving a road could damage the water main; this turned out not to be true), and more. The team used a gradient boosting model to assign risk scores to each section of water main in the city. The model was chosen at the end, after testing a number of other models and after much data entry and cleaning.

The model predicted that 32 of the 50 riskiest mains would break over the next three years. During that time, we used the model to help decide where construction should and should not happen. We were also able to work with developers doing major projects in the city that involved digging up the road. The City took advantage of the road being opened to install new water mains, saving more than $1 million in the process.

Unfortunately, there was not enough money to replace all of the riskiest mains, but in the three years since the model was developed, 32 of those top 50 mains did break, many of them more than once. That validated the model and met many of the objectives we set at the beginning of the project.
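The check described above is precision at k: of the k mains ranked riskiest, what share actually broke during the follow-up window? With the post’s numbers, that is 32 / 50 = 0.64. A minimal sketch, assuming risk scores keyed by a hypothetical segment ID and a set of segments that broke after the model was built. Comparing against the base rate (the share of all segments that broke) shows how much better the ranking does than picking mains at random.

```python
def precision_at_k(risk_scores, broke, k=50):
    """Share of the k highest-scored segments that actually broke during the
    follow-up window. risk_scores: {segment_id: score}; broke: set of ids."""
    top_k = sorted(risk_scores, key=risk_scores.get, reverse=True)[:k]
    hits = sum(1 for seg in top_k if seg in broke)
    return hits / k


def base_rate(risk_scores, broke):
    """Share of all scored segments that broke; what random picks would hit on average."""
    return sum(1 for seg in risk_scores if seg in broke) / len(risk_scores)
```

The follow-up window has to start after the model’s training data ends; otherwise the score measures how well the model memorized past breaks rather than how well it predicts new ones.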

Partner

When I first became Chief Data Officer, I was a team of one. To do much of the work I wanted to do, I needed to find partners, both locally and nationally. The water main risk project was a good example of partnership. We defined a key problem, got buy-in and staff help from the department, and ended up working with a team of very talented PhD students who were interested in getting experience working on data science projects with municipal governments. We have also worked with classes and hosted hackathons, all with the aim of raising awareness of the work, getting help to expand the types of projects we can take on, and ultimately helping the organization.

Below I’ll detail some of the specific ways we partner, but I’ll also mention here that if you’re lucky, you’ll find a person or two locally who has held a similar type of position and can offer guidance and support. For me, that person has been Mark Headd, the former Chief Data Officer in Philadelphia. His thoughts, guidance, and support have been important to me as I’ve tried to grow into this role, and I’m lucky that he happens to live in Central New York.

Hackathons

I wrote about our approach to hackathons here: https://elgl.org/i-have-to-ask-you-civic-hackathons/. Hackathons are a great way to partner with other organizations. I don’t believe one hackathon will create a new product or app for a city (you should probably pay for that), but hackathons do offer the opportunity to show that the city is interested in working with the community and asking for help. The two major hackathons we’ve co-sponsored focused on road quality and snow plowing. The events gave us a sense of what some people would want to see when reporting potholes or tracking snow plows on a map, and we’ve used that information to inform some of our decisions. They also let us show off our open data portal (where the data for each event is hosted) and talk about the challenges we face in an open way.

While we have planned hackathons on our own, it also made sense to partner with existing civic technology events. The key one in Central New York is Hack Upstate, which hosts a local hackathon every six months. Each event draws about 200 attendees, so partnering with them lets us reach a wider audience, and it gives the participants a real-world problem to work on.

Student consultants

We are lucky to have a top-tier university — Syracuse University — within our borders. I am a bit biased since my grandma and dad are both alumni, I graduated from there twice, worked there, and taught there.

During my time teaching as an adjunct, a lot of students asked if they could intern in my office. Having the extra help is exciting because we have plenty to work on, but managing a bunch of students who all have different schedules and levels of commitment to the tasks was going to be overwhelming.

Luckily, the School of Information Studies has a program called iConsult, led by the Associate Dean Art Thomas. This program serves as a student consulting firm that works with non-profits and small businesses in the area. I asked iConsult if they could manage some of the projects we wanted to tackle using some of the students who approached me about an internship.

So far, it has worked well. The iConsult teams have helped us maintain the code that predicts water main break risk, built prototype databases and dashboards for departments, and more. This relationship takes management, as any would, but specific goals, dedicated students, and flexible timelines all make it successful.

We’ve done similar projects with Jonnell Robinson, the Community Geographer at Syracuse University. With her students, we have been able to collect and map data on sidewalk curb corner quality, map upcoming infrastructure projects, and more.

Class Projects

We regularly work with classes that want to do projects, as well. Again, with a long list of work, not enough staff to do it all, and a top-tier institution a couple of miles away, partnering makes a ton of sense.

Class projects have their challenges. They require a fair amount of dedicated staff time; the class is generally learning the material as it goes, and the project runs for a semester, so there are no quick turnarounds; and often no one is left to maintain the analysis or code once the semester ends.

That said, we have had success in a couple of ways. We worked with graduate students in a class at the Maxwell School to develop a report on best practices related to our open data policy. We worked with another group of graduate students from Harvard’s Kennedy School on a report about data privacy as it relates to smart cities. These were discrete projects with talented students that didn’t have a tight turnaround and had a logical hand-off point.

We also have worked with Marcene Sonneborn’s class at the School of Information Studies to develop a website that shows information about properties and code violations. Knowing that this would take longer than a semester, part of the challenge to the students who have worked on the project is to open-source the code and then build on what was produced in the previous semester. That way, students can learn about maintaining and updating code, rather than creating everything from scratch. This is more how things work in the real world once the students get a job, it allows us to better understand and maintain the code, and the project continues from semester to semester.

Local government organizations

My position wouldn’t even exist if it weren’t for an initial grant from Bloomberg Philanthropies to fund the City’s Innovation Team. That is what brought me to work for the city, and ultimately get placed into the Chief Data Officer role.

Bloomberg Philanthropies and the Innovation Teams program, as well as the What Works Cities program have allowed us to learn from experts who work in other local governments, or who work for places like the Sunlight Foundation, the Center for Government Excellence, Results for America, and more.

Other organizations like ELGL offer more connections with people trying to build similar programs in their own municipalities. They give people space to commiserate when times are challenging and, more often, generate new ideas through their podcasts, blogs, and more.

The Civic Analytics Network, hosted at Harvard and run by Stephen Goldsmith, offers Chief Data Officers the chance to work together, think about higher-level policy goals to push for together, and get really technical about approaches to data governance, ETL pipelines, coding tools, and more.

These organizations offer so many resources that we have relied upon to launch our open data portal, get our performance office off the ground, and tell our story when appropriate. The people in all the organizations are passionate public servants, and many of them are now friends of mine, too.

Internal partnerships

The last type of partnership is potentially the most important: partnerships with internal stakeholders. In my role, I could supervise a bunch of projects that I think are important but that have no buy-in elsewhere, and they would either fail from the start or, when complete, just sit on a shelf.

My role reports to the Mayor’s office, not to IT, so to get access to databases or other data, I often have to request it from the IT department. Building a good relationship with those who hold the keys to the back-end technology has been critical. Explaining the types of work we are doing, ensuring we keep the data secure, and advocating on behalf of the IT department to make sure the software we procure lets us access the data it collects is the only way we can be successful.

We also partner with other departments. Whether we are building a risk model for water main breaks, a snowplow map, an analysis of census data, or dashboards for the performance office, it is important to understand what the data means to the people who collect it each day and know it best. Without knowing what specific columns in a dataset mean, when a department typically closes out service requests, or what limits to put on a recommendation because the department only has so many resources, we can’t deliver a useful product. To learn these details, we often conduct interviews, show data visualizations that reflect what we are finding in the data, and go on ride-alongs to watch staff record data and learn more about their processes.

We built a model that predicts fire hydrant water flow rates. Based on interviews and observations, we learned that though the information is important, it is only really useful if it is mapped within the existing software the fire department uses when it is responding to a fire. It is not practical for them to bring up a separate screen that only shows water flow rates for fire hydrants while en route to an incident. That meant working with additional entities to ensure the data is presented well, and can be updated.
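Getting predictions into someone else’s map usually means handing over a standard format rather than a new screen. A minimal sketch, assuming the fire department’s mapping software can import a GeoJSON layer (an assumption; the post doesn’t name the software or format) and that each prediction is a dict with hypothetical keys `id`, `lat`, `lon`, and `predicted_flow_gpm`:

```python
import json


def hydrants_to_geojson(hydrants):
    """Build a GeoJSON FeatureCollection of hydrants with predicted flow.
    Each hydrant is a dict with hypothetical keys id, lat, lon, predicted_flow_gpm."""
    features = [
        {
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [h["lon"], h["lat"]]},
            "properties": {
                "hydrant_id": h["id"],
                "predicted_flow_gpm": h["predicted_flow_gpm"],
            },
        }
        for h in hydrants
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Regenerating the file on a schedule is one way to keep the layer updated as the model is rerun; swapping latitude and longitude is the most common mistake when building these files by hand.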

Being Open and Transparent

One of the great things about working in local government is that we are all doing our best to make the lives of the residents in our communities better. That means that when there is a good idea elsewhere, not only should we try to replicate it locally, but the person who developed the idea is likely willing to share it because their interest is public service, too.

Being open about our work, both the good and bad parts of it, is important so that people understand what we are working on and how we are spending tax dollars. But it also means that if one of our programs or analyses has done something good, another municipality could use the idea, too.

We share our code when we develop an analysis or build a tool. We also borrow from a lot of other people’s work (and offer credit!) so we can develop solutions.

We also believe that sharing data publicly is important. It is the people’s data, and it is their right to access it, assuming there is nothing private or sensitive in nature within the dataset. Sharing data also means you don’t have to negotiate MOUs or data sharing agreements, because everything is already public.

As we continue to build our program, we find that the more information we share, and the more open we are about our processes and those of individual departments, the more people understand the real challenges that exist here, and the more likely they are to offer help.

By sharing data, people can count, look at trends, and maybe predict some things on their own. Sharing data may bring more partnerships, as well. In fact, our open data portal got its start after we shared some data for our first hackathon. There was such a desire for that data that we realized there would be a demand for a larger open data portal as well.

As smart cities and the internet of things become a bigger part of city government, some are pushing to monetize data collected from sensors. As Chief Data Officer, I believe it is my role to argue that we should not sell data and should continue to make it open to all. That may mean thinking more critically about how data is collected, since data from IoT devices is vast and can be robust, and that comes with a cost. We also may not want to collect certain types of data through sensors because that information may violate someone’s privacy. Think of water meters that report usage every minute: if we know how much water someone uses each minute, we know when they shower, use the bathroom, or are likely asleep or on vacation. That is very different from collecting overall water usage monthly or quarterly just for billing. There may be a good reason to collect water usage every minute, but that reason should be considered carefully. That information could also be very valuable to a company willing to pay for it, but collecting it at that level still may not meet the privacy standards you set.
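The water meter example can be made concrete. A minimal sketch, assuming readings arrive as (timestamp, gallons) pairs, one per minute with no gaps (a hypothetical format): rolling them up to monthly totals keeps what billing needs, while the raw series is enough to spot long idle stretches that hint at sleep or travel.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def billing_totals(readings):
    """Roll minute-level (timestamp, gallons) readings up to monthly totals,
    the only granularity billing needs."""
    totals = defaultdict(float)
    for ts, gallons in readings:
        totals[(ts.year, ts.month)] += gallons
    return dict(totals)


def longest_idle_minutes(readings):
    """Longest run of consecutive zero-usage minutes; with minute data this can
    hint at when a household is asleep or away. Assumes one reading per minute."""
    longest = current = 0
    for _ts, gallons in sorted(readings):
        current = current + 1 if gallons == 0 else 0
        longest = max(longest, current)
    return longest
```

If only `billing_totals` output is stored and the minute-level readings are discarded, the second kind of inference is no longer possible, which is the practical meaning of collecting only what the purpose requires.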

I believe that because taxpayers are paying for the data to be collected, it is just like the other data we already make public, only collected in a slightly different way. Offering it for free to anyone also means everyone has equal access to the information. Maybe someone will start a business or learn something new about their community. That shouldn’t be limited to only those who can pay.

Being the City of Syracuse’s Chief Data Officer has been a privilege, and I’m excited to see how the role continues to grow in the coming years. Though the scope will likely continue to evolve, and there will always be potential for scope creep (how much time to spend on smart cities, or software procurement, or fiber networks, etc.?), I believe the job will always come down to counting things, looking at trends, predicting based on those trends, finding partners to help do some of the work, and always being transparent about how the work gets done.