Replacing Backend Data Sources at Congressional Quarterly

Evan Benoit · Published in FiscalNoteworthy · Nov 30, 2021

Replacing a backend data vendor can be one of the most intimidating projects for an established product. On the Congressional Quarterly (CQ) team here at FiscalNote, our clients rely on us to combine automated data feeds with human journalism and analysis to give America’s policy-makers and decision-makers insight into everything happening on Capitol Hill. We recently completed an eight-month effort to replace the vendor providing us with structured data such as introduced bills and committee reports. Here are some of the things we learned:

Know What’s Out There

For structured government data, CQ had relied on a proprietary data feed from a third-party vendor for many years. Years ago, the federal government did a poor job of presenting its own data in a machine-readable format, so we paid this vendor more than $200,000 a year to turn unstructured data into structured data. In addition to the cost, the vendor was at risk of being acquired by a competitor, who might choke off our access to this critical data in the future.

It’s important to evaluate your vendors regularly and see whether the situation has changed. We discovered that the Government Publishing Office’s own GovInfo (http://govinfo.gov) website had improved greatly in recent years, providing a modern, JSON-based API that was easy to parse. When was the last time you evaluated your data sources? Perhaps there are better or cheaper options out there. Don’t get locked into decisions that were made years ago.

Example of GovInfo’s interface
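
To give a feel for how approachable the new source was, here’s a minimal sketch of pulling recently modified bill packages from GovInfo’s collections API. The endpoint shape follows GovInfo’s public documentation, but the collection, date, and API_KEY placeholder are illustrative rather than taken from our actual pipeline:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

/**
 * Sketch: list bill packages modified since a given date via GovInfo's
 * collections API. API_KEY is a placeholder for a real api.data.gov key.
 */
public class GovInfoFetch {
    private static final String API_KEY = "YOUR_API_KEY"; // hypothetical placeholder

    public static void main(String[] args) throws Exception {
        // e.g. bills modified since the start of the day, first page of 100
        String url = "https://api.govinfo.gov/collections/BILLS/2021-11-30T00:00:00Z"
                + "?offset=0&pageSize=100&api_key=" + API_KEY;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // The response is plain JSON: a count plus a list of packages
        JsonNode root = new ObjectMapper().readTree(response.body());
        for (JsonNode pkg : root.path("packages")) {
            System.out.printf("%s  %s%n",
                    pkg.path("packageId").asText(),
                    pkg.path("title").asText());
        }
    }
}
```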

Confirm Data Coverage and Business Requirements

Before taking the plunge, we carefully vetted the new data set from GovInfo to ensure it would meet our needs. We performed an exhaustive review to confirm that all of the data objects and fields would match up between the old and new data sources. Some fields matched exactly; others were less exact and required us to negotiate our business requirements with product management.

We had to consider more than just data coverage. Other departments’ business processes relied on this data to perform their regular duties. We worked carefully with them to understand these requirements and make sure the new data would meet them. One of the trickier considerations was timing: it wasn’t just important that the data was correct, but also that it arrived at a particular time of day to meet their business needs. How well do you understand how your data is used throughout your organization?
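
Timing requirements like that are easy to encode as an automated check. Here’s a sketch; the 6:00 AM Eastern cutoff and the latestRecordTime() lookup are placeholders for illustration, not our real values:

```java
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

/**
 * Sketch of a freshness check: flag the feed as late if today's data
 * hasn't landed by the time downstream teams need it.
 */
public class FeedFreshnessCheck {
    private static final ZoneId EASTERN = ZoneId.of("America/New_York");
    private static final LocalTime CUTOFF = LocalTime.of(6, 0); // illustrative cutoff

    public static void main(String[] args) {
        ZonedDateTime now = ZonedDateTime.now(EASTERN);
        ZonedDateTime latest = latestRecordTime(); // e.g. max(last_modified) from the DB

        boolean pastCutoff = now.toLocalTime().isAfter(CUTOFF);
        boolean staleToday = latest.toLocalDate().isBefore(now.toLocalDate());
        if (pastCutoff && staleToday) {
            // In production this would page someone rather than print
            System.err.println("Feed is late: newest record is from " + latest);
        }
    }

    private static ZonedDateTime latestRecordTime() {
        return ZonedDateTime.now(EASTERN).minusDays(1); // stubbed for the sketch
    }
}
```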

Modernize and Destroy Knowledge Silos

Technical debt and knowledge silos creep into every long-term product. Replacing data vendors is an excellent time to fix this. We used the opportunity to rethink our pipelines, our choice of language, and our technology stack, and we built a modern Java pipeline using REST/JSON APIs. We improved our documentation and standardized across the board, building things right for the future. This is also a great opportunity to get other engineers involved and spread that knowledge. Perhaps knowledge of your backend system is siloed in one or two engineers. Maybe you have junior engineers or front-end engineers who’ve always wanted to learn more about your backend.

Our data flow
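
As a rough illustration of the shape the new pipelines took, here’s a fetch/parse/load sketch in Java. The stage interfaces and the Bill record are simplified stand-ins, not our actual classes:

```java
import java.util.List;
import java.util.function.Function;

/**
 * Sketch of the fetch/parse/load shape a REST/JSON pipeline can follow.
 * Keeping each stage small and separately testable is part of what
 * breaks up a knowledge silo.
 */
public class PipelineSketch {
    record Bill(String packageId, String title) {}

    interface Fetcher { List<String> fetchRawJson(); }   // pull pages from the REST API
    interface Loader  { void upsert(List<Bill> bills); } // write to the data store

    static void run(Fetcher fetch, Function<String, Bill> parse, Loader load) {
        load.upsert(fetch.fetchRawJson().stream().map(parse).toList());
    }
}
```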

Track Your Burn-up

Hitting deadlines is crucial in any effort, but especially when a contract renewal is on the line. We didn’t want to slip past a deadline and have to renew an expensive contract. FiscalNote uses the Agile Scrum framework, and we carefully estimated the overall number of story points for this effort. We tracked the pace at which that story point total grew, and the pace at which we consumed it. This “burn-up” chart allowed us to estimate when the project would be complete.

One advantage was that we were replacing five similar-but-distinct data flows. By tracking story point totals for the first few data flows, we were able to understand our “actual” story point costs and recognize some of the hidden stories that weren’t immediately obvious.

It’s impossible to estimate all stories at once, so we brainstormed all of the stories we could think of and multiplied the ungroomed ones by the average story point value of the stories estimated so far. This told us what the likely story point total would be for the entire effort. Once that total stabilized, it was easy to figure out the pace we would need to maintain. In our case, we determined well ahead of time that we were going to miss our original deadline, and were able to negotiate a partial contract renewal at a fraction of the cost. We were also able to justify additional resources from our senior management, who were happy not to be blindsided by the news. What are your strategies for hitting expensive deadlines that are nearly a year out?
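
The projection arithmetic itself is simple. Here’s a sketch with made-up numbers:

```java
/**
 * Sketch of the projection behind the burn-up chart.
 * All numbers here are invented for illustration.
 */
public class BurnUpProjection {
    public static void main(String[] args) {
        double groomedPoints  = 180;  // points on groomed, estimated stories
        int    ungroomedCount = 25;   // brainstormed stories not yet estimated
        double avgPointsSoFar = 4.2;  // average size of stories groomed so far

        // Likely total scope: groomed points plus a projection for the rest
        double projectedTotal = groomedPoints + ungroomedCount * avgPointsSoFar;

        double completedPoints = 120;
        double velocity        = 18;  // points completed per sprint
        double sprintsLeft     = (projectedTotal - completedPoints) / velocity;

        System.out.printf("Projected scope: %.0f points, ~%.1f sprints remaining%n",
                projectedTotal, sprintsLeft);
    }
}
```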

Our story point burn-up chart

Quality: Both Qualitative and Quantitative

From the clients’ perspective, replacing a data source is a high-risk, low-reward initiative. Ideally, our clients wouldn’t notice any change at all, but they’d certainly complain if something went missing! We spent nearly as much time writing test harnesses as we did on pipeline development. These test harnesses compared data between different sources: between our QA and production environments, and between QA and the source of truth. They compared database records and made API calls to make sure no records were missing, and compared each field to make sure no data was missing within each record.
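
At its core, each harness is a diff between two keyed data sets. A simplified sketch, with illustrative record and field shapes:

```java
import java.util.Map;
import java.util.Objects;

/**
 * Sketch of a comparison harness: given records keyed the same way in
 * two sources, report missing records and mismatched field values.
 */
public class RecordDiff {
    static void compare(Map<String, Map<String, String>> oldSource,
                        Map<String, Map<String, String>> newSource) {
        for (var entry : oldSource.entrySet()) {
            Map<String, String> candidate = newSource.get(entry.getKey());
            if (candidate == null) {
                System.out.println("MISSING record: " + entry.getKey());
                continue;
            }
            entry.getValue().forEach((field, value) -> {
                if (!Objects.equals(value, candidate.get(field))) {
                    System.out.printf("DIFF %s.%s: %s != %s%n",
                            entry.getKey(), field, value, candidate.get(field));
                }
            });
        }
    }
}
```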

One of our greatest challenges was matching records between different vendors. It was easy to find a “natural key” for some records, but for others, it wasn’t so obvious and required fuzzy matching. We ran our new pipelines in the QA environment for as long as we could, running these comparisons daily for multiple weeks to make sure we saw as much variety and volume of data as possible.
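
For the fuzzy cases, even a simple normalization pass goes a long way: strip case, punctuation, and whitespace so near-identical titles collide on the same candidate key. A sketch (real matching may also need a similarity score such as edit distance):

```java
import java.util.Locale;

/**
 * Sketch of building a fuzzy "natural key" when no exact identifier
 * lines up across vendors. Illustrative only.
 */
public class NaturalKey {
    static String normalize(String title) {
        return title.toLowerCase(Locale.ROOT)
                    .replaceAll("[^a-z0-9]", ""); // drop punctuation and spaces
    }

    public static void main(String[] args) {
        // Two vendors' renderings of the same committee report match
        System.out.println(normalize("H. Rept. 117-1: Report on Appropriations")
                .equals(normalize("H REPT 117-1 Report on Appropriations"))); // true
    }
}
```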

As a bonus, these test harnesses can now serve as the foundation of a monitoring system, comparing our production data against source-of-truth data for years to come.

Our quality assurance was more than just quantitative. We also had business experts perform qualitative reviews of the data, to make sure everything would be acceptable to users. There were some inevitable changes in the client experience, and we needed to make sure they were acceptable from a business perspective and communicated to both clients and stakeholders. What qualitative and quantitative processes do you have in place to monitor the quality of your data?

Release Carefully

Releasing new backend data like this is a scary proposition! Each of our releases was an all-hands-on-deck situation, and we established communication plans to make sure any problems were reported promptly. We scheduled the deployments so that the lower-risk data feeds were rolled out months before the more complex ones, letting us learn and build confidence. We had rollback plans to switch back to the old data feed if problems came up.
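
The rollback plans boiled down to making the data source a configuration choice rather than a code change. A sketch, with an illustrative property name and interfaces:

```java
/**
 * Sketch of the escape hatch behind a rollback plan: select the feed
 * implementation from configuration so reverting is a config change,
 * not a redeploy.
 */
public class FeedSelector {
    interface BillFeed { void sync(); }

    static BillFeed active() {
        // e.g. -Dbill.feed=legacy to roll back to the old vendor
        String source = System.getProperty("bill.feed", "govinfo");
        return "legacy".equals(source) ? new LegacyVendorFeed() : new GovInfoFeed();
    }

    static class GovInfoFeed implements BillFeed {
        public void sync() { /* new pipeline */ }
    }
    static class LegacyVendorFeed implements BillFeed {
        public void sync() { /* old vendor feed */ }
    }
}
```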

Success! The finished product

Conclusion

We successfully migrated five of our key data sources from one vendor to another, saving more than $200,000 a year while delighting our customers, modernizing our technology, and strengthening our engineering team.

FiscalNote’s mission is to empower people and organizations with the right policy information and insights so that they can better navigate market risk and uncertainty and maximize new opportunities. If you’re interested in learning more about our mission and the opportunities here, please reach out!
