Researching organisation identifier lists: methodology and early findings

Published in org-id.guide, Jul 2, 2017

Originally posted November 28th 2016 at identify-org.net (the initial working title of the org-id.guide project)

Over the last few weeks we’ve been testing out our methodology for researching organisation identifier lists.

Identify-org.net will prospectively include any list of organisations that assigns them a consistent identifying number or string, or provides enough information to disambiguate one organisation from another. However, we need to be able to identify which lists should be preferred over others as offering the best chance of delivering interoperable identifiers.

We also need to make sure that users searching for a potential id for an organisation have enough information to choose the best source, and then to locate and make use of the right identifier.

In our draft researchers’ handbook we set out a number of key definitions, and the steps to go through in researching any identifier list.

For each identifier list we aim to:

Assign a meaningful prefix;
Identify whether the list is a primary register, or a secondary list of organisations;
Describe how identifiers from the list are assigned;
Describe the jurisdictions, legal types and sectors that the identifiers in the list cover;
Find example identifiers;
Document any bulk access available to the list, and the license of any data.
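
As a rough illustration, the items above could be captured in a small record type. This is only a sketch: the class name, field names and placeholder values are assumptions made for this example, not the schema used by the project.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class IdentifierList:
    """One researched identifier list (all names here are illustrative)."""

    prefix: str                       # meaningful prefix, e.g. "AU-ABN"
    list_type: str                    # "primary" or "secondary"
    assignment: str                   # how identifiers from the list are assigned
    jurisdictions: List[str] = field(default_factory=list)
    legal_types: List[str] = field(default_factory=list)
    sectors: List[str] = field(default_factory=list)
    example_identifiers: List[str] = field(default_factory=list)
    bulk_access: Optional[str] = None  # where bulk data can be obtained, if anywhere
    licence: Optional[str] = None      # licence of any data

    def missing_fields(self) -> List[str]:
        """Return the names of fields that still need research (empty or None)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values only; a real entry would be filled in by a researcher.
draft = IdentifierList(
    prefix="AU-ABN",
    list_type="secondary",
    assignment="(to be researched)",
    jurisdictions=["AU"],
)
print(draft.missing_fields())
```

A helper like `missing_fields` gives a quick view of which steps of the methodology are still outstanding for an entry.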

We also aim to document any mappings that might exist between lists: for example, when a charity register also records the company numbers of companies with charitable status. Finally, we capture key ‘Need to Know’ facts that a user might want to be aware of when researching identifiers in a particular country or sector.

Early findings

So far, we’ve reviewed around 30 organisation identifier list entries imported from the IATI Organisation Registration Agency codelist, and worked through the research methodology to update the meta-data about them.

Some key observations so far:

How identifiers are written down matters

For example, the Australian Business Number is a nine-digit identifier, but it is generally written down as an 11-digit number, with the first two digits acting as a checksum for the identifier itself. On screen, systems often show the 9-digit version as three triplets (e.g. ‘123 456 789’), but if you download a dataset of the numbers you will find each one as a single string (123456789).
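
To make the checksum point concrete, here is a minimal sketch of how the 11-digit form can be validated. It assumes the modulus 89 weighting scheme published by the Australian Business Register, as we understand it; the function name and the decision to ignore whitespace are choices made for this example.

```python
# Weights applied to the 11 digits, after subtracting 1 from the first digit.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(abn: str) -> bool:
    """Check an 11-digit ABN, ignoring any whitespace used for display."""
    compact = "".join(abn.split())
    if len(compact) != 11 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(c) for c in compact]
    digits[0] -= 1  # the scheme subtracts 1 from the leading digit
    total = sum(w * d for w, d in zip(ABN_WEIGHTS, digits))
    return total % 89 == 0


# The spaced and unspaced forms of the same number are treated alike.
print(is_valid_abn("51 824 753 556"), is_valid_abn("51824753556"))
```

Note that a bare 9-digit value such as ‘123 456 789’ cannot be checked this way, because the two check digits are missing. That is one reason the written form of an identifier matters.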

When constructing a unique organisation identifier using the prefix for the Australian Business Number list (AU-ABN), how should this be written?

We need to develop general principles (e.g. remove spaces) and specific guidance for each identifier list where there is a risk of ambiguity.
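
A general principle such as ‘remove spaces’ could be sketched as follows. The function names are hypothetical, and joining the prefix to the identifier with a hyphen is an assumption made for this example rather than a settled rule.

```python
import re


def normalise_identifier(raw: str) -> str:
    """Apply the general principle: strip all whitespace from the identifier."""
    return re.sub(r"\s+", "", raw)


def build_org_id(prefix: str, raw: str, separator: str = "-") -> str:
    """Combine a list prefix with a normalised identifier.

    The separator is an assumption for illustration only.
    """
    local = normalise_identifier(raw)
    if not local:
        raise ValueError("identifier is empty after normalisation")
    return f"{prefix}{separator}{local}"


# The on-screen and dataset forms now produce the same string.
print(build_org_id("AU-ABN", "123 456 789") == build_org_id("AU-ABN", "123456789"))
```

List-specific guidance (for example, whether to keep check digits) would need to sit on top of a general rule like this one.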

NGO registration is a complex business

A number of the entries on the IATI Registration Agency Codelist are for Government Ministries responsible for overseeing NGOs operating in their country. In some cases, we managed to locate a register that the agency holds, although in other cases, we couldn’t find mention of a register at all*.

These registers often cover ‘NGOs operating in the country’ and so they might act as the primary identifier for local NGOs, but only as a secondary identifier for international NGOs operating in the country.

We need to review our ‘primary’ and ‘secondary’ identifier distinction, to identify whether some further gradation is needed (e.g. lists that are ‘primary for some organisations’).

*In these cases we’ve marked the identifier list as ‘deprecated’ ready to potentially be removed.

Single government identifiers may provide a pragmatic option

In many countries, business registration takes place at a local level, through Chambers of Commerce or other entities. The same entity might have to be registered in a number of different states. Other entities, like charities, may not have to register at all.

However, often there exists a register of all the organisations interacting with government in some form. For example, the Australian Business Number mentioned above is described like this:

“The Australian Business Number (ABN) enables businesses in Australia to deal with a range of government departments and agencies using a single identification number.”

When organisations transact with government, they generally end up with an ABN — and there is a national dataset of these identifiers.

In our model, this is a secondary identifier (there is no solid guarantee that it uniquely and persistently picks out a single legal entity), but pragmatically, it may be much easier to find and use than a local company registration. And having a single dataset that covers companies, charities and other entities is much easier to work with than lots of disparate datasets in a country.

We need to consider how this will affect the way we prioritise identifier lists — and to make sure we clearly document the nature of each identifier list.

Governments are moving towards unified identifier databases

We’ve found a number of cases where governments have either recently built, or are working on, national datasets that aggregate together state-level identifiers, such as from individual Chambers of Commerce.

In the best cases, these registers might also include identifiers for government agencies.

There may be opportunities to advocate for these centralised directories to follow best practices for open data, and to promote common standards for register publication.

Where next?

We’ll continue on our first ‘research sprint’ for the next three weeks, aiming to confirm around 100 organisation identifier lists. As part of this we’re also seeking to work out how long researching each list takes on average, in order to think about a sustainable approach to keeping Identify-Org.net updated.

We’ll then be taking a short pause over the Christmas break, and returning to research in the new year with a refined methodology based on all we have learnt so far.