DC Business Data Demonstration Project

A proposed collaborative project to align District Government data sets about businesses operating in DC

A portion of the DC.gov idea book, a list of proposals for innovative municipal programs.

The District government houses several disparate datasets related to businesses operating within the District. This data is collected as a consequence of regulatory oversight, but it is not aligned or managed for the purpose of economic analysis.

The Department of Consumer and Regulatory Affairs (DCRA), which houses many central business datasets, at one point proposed working with core partners internal and external to government to create a methodology and align core business data sets.

The aligned datasets could be used to conduct gap analysis between District-owned data and private data sets produced for this purpose. This effort aligned with the Deputy Mayor for Economic Development’s Economic Indicators Council. If successful, the alignment could be expanded to include additional data sets over time.

The development of the methodology would also drive the development of several shared processes and definitions for working with business data, and would drive the process for redesigning business regulation within DC.

Core Datasets

Business Licenses

The Business License records are housed within DCRA’s Accela system. Most business activity within the District of Columbia requires a business license, though exceptions apply for some specialized activities, like child development centers, and for some professional licensing, such as legal and medical providers. The Business License records are address-based, and a Certificate of Occupancy or other zoning approval must be in place before a license is issued. The Business Licensing law includes numerous “categories” of business activities that must be individually licensed.

This data set brings the largest catchment of “businesses,” provides location details, provides some information about the main business activity type, and may be able to provide some information regarding industry activity and size within the District. This dataset will require significant cleaning and categorization, and an understanding of its data quality from year to year can only be developed as analysis occurs.

Corporate Registration Records

Corporate Registration Records are housed under the control of DCRA with a third-party provider. Most business transactions within the District of Columbia require corporate registration, though exceptions for sole proprietors and general partnerships apply. The Corporate Registration records are entity-based and require a DC-based registered agent. The Corporate law allows several different entity types and permits nested corporations and silent ownership.

This dataset provides a different shape of what a business is, may provide some ownership information, and can provide a baseline for understanding ownership connections. An assessment of the total percentage of businesses covered by the dataset would occur during analysis.

Certificates of Occupancy

The Certificate of Occupancy records are housed within DCRA’s Accela system, and are in the process of being opened to the public through the Office of the Chief Technology Officer’s (OCTO) city data warehouse. Businesses operating in the District of Columbia with any sort of work activity occurring are required to have an up-to-date Certificate of Occupancy or, for non-public-facing locations within a home, a Home Occupancy Permit. The Certificate of Occupancy records are address-based and require both ownership and use descriptions. Analysis would underscore the level of compliance in the Business Licensing data.

Unemployment Insurance Records

The unemployment insurance records are housed within the Department of Employment Services and offer a rich data set about numbers of employees and stated operation. Assessment of the level of overlap has yet to occur. This dataset is likely entity-based.

Ancillary Datasets A

Master Address Repository

The DC GIS team maintains a dataset of every address in DC, the Master Address Repository (MAR). The MAR is the baseline dataset that every data set here should reference to ensure consistency. This dataset is, definitionally, address-based.
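As an illustration of what referencing a baseline address list might involve, the following is a minimal sketch in Python. It assumes each record carries a free-text address field and that the reference list is available as plain address strings. The abbreviation table, function names, and field name are hypothetical and do not reflect the MAR's actual schema or matching rules.

```python
import re

# Hypothetical abbreviation table; a real effort would follow the
# conventions of the reference address list itself.
SUFFIXES = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
    "PLACE": "PL",
    "BOULEVARD": "BLVD",
}


def normalize_address(raw):
    """Return an uppercase, punctuation-free address key with consistent suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", raw.upper())
    tokens = [SUFFIXES.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def split_by_reference(records, reference_addresses, field="address"):
    """Split records into (matched, unmatched) against the reference addresses."""
    reference_keys = {normalize_address(a) for a in reference_addresses}
    matched, unmatched = [], []
    for record in records:
        key = normalize_address(record[field])
        (matched if key in reference_keys else unmatched).append(record)
    return matched, unmatched
```

Records that fall into the unmatched list would be candidates for manual review or data cleanup rather than proof of an invalid address, since exact matching on a normalized key misses misspellings and unit numbers.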

Tax Records

The Office of Revenue Analysis has internal access to the tax records. Due to IRS disclosure rules these records cannot be released outside of ORA, but the Chief Economist’s office can run analysis internally and provide generalized information back to the group for both economic indicators and for limited enforcement purposes.

Dun & Bradstreet

The Urban Institute uses Dun & Bradstreet data for economic estimates. This is a privately provided data set that purports to describe DC economic indicators. The Urban Institute can begin to run gap analysis between this data set and any combined data set to suggest an agenda for future data clean up, analysis, and collection.
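A gap analysis of this kind can be sketched, under simplifying assumptions, as a comparison of two sets of business identifiers. The sketch below assumes both data sets have already been reduced to a shared, comparable key (for example a normalized address or name); how that key is built is the hard part and is not addressed here. The function and field names are hypothetical.

```python
def gap_analysis(district_keys, private_keys):
    """Compare two collections of business keys and report overlap and gaps."""
    district = set(district_keys)
    private = set(private_keys)
    in_both = district & private
    return {
        "in_both": in_both,
        # Businesses the District knows about but the private data set lacks.
        "district_only": district - private,
        # Businesses in the private data set with no District record.
        "private_only": private - district,
        # Share of the private data set that also appears in District data.
        "private_coverage": len(in_both) / len(private) if private else 0.0,
    }
```

The two "only" groups are the ones that would feed an agenda for clean up and collection: each entry is either a real gap in one source or a matching failure that points to a data quality problem.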

Urban Institute Housing Data

The Urban Institute also maintains a combined housing data set that it compiles from DC data. Aligning the Certificate of Occupancy and Business Licensing data and analyzing it against this data set will allow a fuller picture of the rental market and rental housing regulatory compliance in DC.

Info USA Data

The State Data Center in the Office of Planning works with privately provided business data to understand DC economic indicators. Through this process, the State Data Center can run gap analysis between this data set and any combined data set to suggest agenda items for future data clean up, analysis, and collection.

Census Data

The State Data Center in the Office of Planning is the District’s connection to the Census and to understanding census data. The State Data Center can compare any combined data set against what is understood from federally collected census data.

Ancillary Datasets B

The District maintains several other data sets regarding businesses and business activity. Although not part of the original demonstration project, if the partnership and methodology work out, then over time this project could expand to encompass these other data sets. Therefore, a basic understanding of these data sets will be valuable.

Professional Licensing
Occupational and Health Licensing
Alcoholic Beverage Regulation Administration
DC Taxi Records
Weights & Measures Data
Office of Contracting and Procurement Data
Certified Business Entity Data
Small Business Development Grants

Core Partners

Urban Institute’s Neighborhood Indicators Project could help create a data set that aligns with property data it already works with and may help set national standards. Urban can help conduct gap analysis against private data sets.

Office of Revenue Analysis could help create a data set that aligns with non-public tax data and create industry size and growth estimates for DC. Other benefits may include tools for enforcement of either tax or regulatory rules.

Department of Employment Services could align information from the Unemployment Insurance data with other data sets to create a comprehensive list of businesses and explore enforcement against non-location-based businesses.

Deputy Mayor for Economic Development could consider ways to collect and measure data on new business starts and business departures, and add analysis of why businesses start or depart.

Department of Small and Local Business Development could consider the alignment of programs that work with small businesses, helping to provide risk assessment of methodological and definition implications of the data alignment, looking carefully at NAICS, and helping to engage the local business community.

DC Geospatial Information Systems Team could consider how the data products can be translated into readable formats for the public and/or tools for the government to align data from disparate sources.

Office of Planning’s State Data Center could think about how to align any data work underneath Census Bureau products and how to conduct gap analysis with private and/or other nonaligned data sets they access.

DC Chamber of Commerce could provide feedback and ideas on methodology and privacy, and on ensuring that the overall project supports streamlining the business regulatory process. As appropriate, the Chamber will engage its membership to ensure that business perspectives are included.

Washington, DC Economic Partnership could bring its expertise in data scraping, its knowledge of the flaws in existing data sets, and an understanding of investors’ needs for more information about opportunities within DC.

Department of Consumer and Regulatory Affairs could create this project with an eye towards redesigning data intake either at DCRA or elsewhere, and to determine whether licensing and permitting are appropriate spaces for capturing economic indicator data.

Additional Stakeholders

Many additional stakeholders indicated an interest in being involved in conversations related to the analysis that comes from combining these data sets. DCRA could use existing forums like the Economic Indicators Council run by DMPED, the Business Advisory Group and Business Think Tank at DCRA, and the DCGIS Data Subcommittee to help structure inclusion of these many stakeholders that are internal and external to government.

Groups that have expressed interest:

The Business Improvement District Council
US Census Bureau
The Aspen Institute
Coalition for Nonprofit Housing & Economic Development

Central Goals & Secondary Goals

The central goal of the project is to use this data alignment to improve both the quality of District-owned data and agencies’ ability to regulate effectively, fairly, and appropriately as facilitated by data-sharing.

Many secondary goals of the alignment exist, including:

To provide accurate datasets for research and public use.
To understand the economic health of the District.
To provide a platform to consider redesigning data intake either at licensing or in other areas where the District collects business data.
To assess whether DC can accurately assign NAICS codes to businesses.
To determine the quality of private datasets.
To create appropriate partnerships through data that will expand to be programmatic partnerships.

Programmatic & Enforcement Risk Assessment

An early portion of the demonstration project would be for each of the core partners to conduct a risk assessment related to this data collaboration project. While risks are manageable, it is recognized that misalignments and flaws in the datasets could have programmatic impacts that could affect policy priorities, resource allocation, and budgeting. This may necessitate planning to conduct internal and external outreach and plan programmatic reactions when issues are raised.

Data Sharing & Data Warehousing

Before DCRA can share data with core partners, a data-sharing Memorandum of Understanding must be negotiated and executed. This can be built on a standard Office of the Chief Technology Officer template, but it will need to include discussion of official numbers from DCRA versus raw data.

Methodology Creation & Approval

The core partners will need to determine whether we are planning to create a central methodology for combining the datasets at the beginning, or whether we will have each partner experiment and work towards a common understanding. At the end of the demonstration project, the core partners must have a shared understanding of the methodology, whether there is one or many, and each core partner will need to state whether it officially accepts and adopts any or all of the created methodology.

Legal Definition Alignment

Alignment of the data will prompt a discussion of disparate legal definitions of business, business activities, new businesses, closed businesses, small businesses, etc. Each methodology will need to adopt an understanding of what definitions apply, and publish those definitions with the methodology. As a spin-off project, DCRA will be working with many partners to explore appropriate definitions, understanding that any definitions set in the data work may have both legal and programmatic impacts.

Privacy Project

These data efforts implicate consumer and business privacy and government liability and transparency questions. DCRA is beginning to engage partners from the Federal Trade Commission, Georgetown Law, and elsewhere to explore these questions in combination with this project. More work to develop questions and partners is needed.

Research & Analysis Process

Each of the core partners has specific research questions that they are interested in teasing out of their work with the core data sets. Conducting this analysis separately but in collaboration with the other partners will allow each partner to work on its own timetable, while allowing the group to move more quickly through some questions without reworking the same topics. Regular meetings or email check-ins will allow each of the partners to know what questions other partners are answering and will allow organic collaboration.

NAICS Code Project

Many stakeholders have an interest in determining whether we can apply NAICS codes to business entities or activities in the District. Similar projects have faced challenges in the past. One aspect of this project will be a subgroup dedicated to considering how, whether, and where NAICS can and should be applied to the combined data set.
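One small piece of that work could be a structural check on any codes already present in the data. NAICS codes are hierarchical numeric codes of two to six digits, where the first two digits identify the sector. The Python sketch below only tests that structure and measures how many records carry a well-formed code. It does not verify that a code exists in the official NAICS list, and the function and field names are hypothetical.

```python
import re

# Structural check only: two to six ASCII digits.
NAICS_PATTERN = re.compile(r"[0-9]{2,6}")


def naics_sector(code):
    """Return the two-digit sector prefix, or None if the code is malformed."""
    code = (code or "").strip()
    if NAICS_PATTERN.fullmatch(code) is None:
        return None
    return code[:2]


def naics_coverage(records, field="naics"):
    """Return the share of records that carry a structurally valid NAICS code."""
    if not records:
        return 0.0
    valid = sum(1 for r in records if naics_sector(r.get(field)) is not None)
    return valid / len(records)
```

A low coverage figure would suggest that codes need to be assigned from scratch rather than cleaned, which is one of the questions the subgroup would need to settle.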

Business Corridor Project

The Office of the Chief Technology Officer’s new tech and innovation team is also working to create a connected demonstration project that will be directly tied to this effort. Akaii Lineberger is working to develop a data intake and assessment program for business corridors that align with the Great Streets. The connection to this project will allow for coordination of definitions, methodology, and thinking about how intake can support better analysis later.