Every legal problem that exists: the legal help taxonomy for machine learning

This short report picks up where the earlier post on a Human-Centered Taxonomy of Legal Help left off. I have been reading thousands of Reddit and legal forum posts every week, to see what people are asking for legal help with. At the same time, I have been reading thousands of entries in lawyers’ lists of the help that they can give: the issues that they structure their practices, guides, and websites around.

Why am I doing this? The big vision is to draft a better way to label all of those lawyers’ resources, especially those they share online, so that the people on Reddit and legal forums can find the ones that apply to them. With more standardized, structured issue codes, we may also be able to direct automated tools to refer people to the right legal diagnosis, options, and knowledge. This will be particularly useful for search engines, but also for social media bots, legal help portals, and other public interest technologies.

Last week, David Colarusso of the Suffolk Lit Lab and I presented on our work around machine learning and access to justice for the legal services technology group, LSNTAP. Much of the presentation was about the new updated version of the National Subject Matter Index that I am working on as part of the larger machine learning project. The NSMI is a taxonomy of issues that legal aid groups in the US encounter, both with their clients and with running their organizations. It is over a decade old and still used, though with issues.

Our goal with updating the NSMI is to make the issue codes better organized and clearer, so that they can be used consistently as labels for machine learning projects. The first use is our project on labeling Reddit posts, ABA Free Legal Answers data, and online court and self-help content. The Pew Charitable Trusts are supporting Stanford Legal Design Lab and Suffolk Lit Lab to develop the taxonomy, label this data, and train machine learning models on it.

As I have been going through the taxonomy, merging it with other legal aid groups’ taxonomies, and then also trying to label many people’s posts about their problems, I have been working to define guiding principles for what a new taxonomy should be.

Here, I summarize the points that we made during the presentation, and also invite any interested lawyers to join me in this effort by signing up to be an expert reviewer of the draft of version 2 of the NSMI.

What Makes for a Good Legal Help Taxonomy?

Here are the driving principles for what a good taxonomy would be. I’ve been developing these based on best practices from other taxonomy work, as well as the specific purpose of machine learning labeling.

  1. Clarity of a given term: When we present a law student or a lawyer with a taxonomy term, are they able to know exactly what’s implied by it? Can they, across the group, consistently label with it?
  2. Inclusion of all resources/requests into one of the term families: Does every legal help issue which is being covered in guides from legal services and courts have a ‘place’ in the taxonomy? Does every legal help issue which appears in people’s posts have a ‘place’?
  3. Streamlined and distinct term parents: Do the parent categories avoid both over-combining and over-dividing terms? There should be an approachable number of parent categories, and each parent should hold a reasonable family of terms, not an unclear mashup.
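One rough way to check the first principle is to have several reviewers label the same posts and see how often they agree on a given term. The sketch below is only an illustration: the posts, the reviewers' label sets, and the `term_agreement` helper are all hypothetical, and it measures raw agreement rather than a chance-corrected statistic such as Cohen's or Fleiss' kappa.

```python
# Hypothetical annotations: for each post, the set of terms each of three
# reviewers applied. None of this is real project data.
ANNOTATIONS = {
    "post-1": [{"Housing"}, {"Housing"}, {"Housing", "Family"}],
    "post-2": [{"Work"}, {"Work"}, {"Work"}],
    "post-3": [{"Family"}, set(), {"Family"}],
}


def term_agreement(annotations, term):
    """Share of posts where every reviewer agrees on whether `term` applies."""
    agreed = 0
    for label_sets in annotations.values():
        # One True/False vote per reviewer; a single distinct value means consensus.
        votes = {term in labels for labels in label_sets}
        if len(votes) == 1:
            agreed += 1
    return agreed / len(annotations)


for term in ("Housing", "Family"):
    print(term, round(term_agreement(ANNOTATIONS, term), 2))
```

A term with low agreement is a candidate for rewording or for more context in its phrasing.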

The current NSMI taxonomy has problems with all three of these, which is why we are refining it.

For clarity, the issue is that many NSMI ‘child’ terms are devoid of context: they are oversimplified and hard to apply unless you see all of their parents. For example, if we asked you whether the issue of ‘Marital Status’ was present in a given Reddit post, it would be tough to know exactly what this label connotes. That is not clear unless you see its lineage: Work >> Discrimination at Work >> Marital Status. We need labels that carry all that lineage in their phrasing.
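As a minimal sketch of what carrying the lineage could look like, the snippet below walks a term up to its top-level parent and joins the path into one label. The `PARENT` table holds only the three terms from the example above, and the function names are hypothetical; the actual NSMI data is not stored this way as far as this post says.

```python
# Hypothetical miniature taxonomy: each term maps to its single parent
# (None marks a top-level category).
PARENT = {
    "Work": None,
    "Discrimination at Work": "Work",
    "Marital Status": "Discrimination at Work",
}


def lineage(term, parent=PARENT):
    """Return the path from the top-level category down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = parent[term]
    return list(reversed(path))


def full_label(term, sep=" >> "):
    """Build a label that spells out the whole lineage of `term`."""
    return sep.join(lineage(term))


print(full_label("Marital Status"))
# prints: Work >> Discrimination at Work >> Marital Status
```

A reviewer shown the full label no longer has to guess which kind of ‘Marital Status’ issue is meant.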

For inclusion, the NSMI right now is not broad enough to cover issues unrelated to poverty. That is because the NSMI was built for civil legal aid providers, who cover some service areas and not others. Torts, accidents, injuries, and harassment often aren’t covered by these providers, so they have no home in the existing NSMI. Many issues that show up on Reddit also lack a clear home, for example online bullying, concern over sexting, and other new 21st-century issues. We need homes for issues outside the usual scope of legal aid, such as problems with neighbors, torts, accidents, and disputes.

For parent balance, the NSMI currently has a sprawling number of parents, and no co-parents of child terms. This means that the taxonomy is larger and more diffuse than it needs to be. For example, the categories of Bankruptcy, Taxation, and Consumer issues are all parents. So are TANF, Social Security, and Social Security Disability Benefits. The taxonomy would be more approachable if these nearby categories were subsumed into single parents. Allowing child terms to have multiple parents would also prevent many duplicate entries of the same term.
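To make the co-parent idea concrete, here is a small sketch that stores the taxonomy as child-to-parent edges instead of a strict tree, so one term can sit under several parents without being entered twice. The edges are my own illustrative assumptions (for instance, placing ‘Student Debt’ under two parents), not decisions from the draft taxonomy.

```python
from collections import defaultdict

# Hypothetical (child, parent) edges. "Student Debt" is listed once as a term
# but linked to two parents instead of being duplicated under each.
EDGES = [
    ("Student Debt", "Education"),
    ("Student Debt", "Money, Debt, and Consumer Issues"),
    ("School Discipline", "Education"),
]


def build_index(edges):
    """Index the edges both ways: term -> parents and parent -> children."""
    parents = defaultdict(set)
    children = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)
    return parents, children


parents, children = build_index(EDGES)
print(sorted(parents["Student Debt"]))
print(sorted(children["Education"]))
```

With this shape, a labeler applies one ‘Student Debt’ code, and a website can still surface it under either parent.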

How to Make a Better Taxonomy?

We are in the middle of several rounds of merging, prioritizing, pruning, and reviewing a new version of NSMI that will be better suited to machine learning for access to justice.

This has involved a First Pass of revisions:

  1. Combining with other web-navigation taxonomies from legal aid providers and website owners
  2. Removing ‘non-help’ oriented issues that were focused on the administration of organizations rather than on people’s issues
  3. Streamlining related topics together into single parents
  4. Prioritizing tiers of parents, to determine where to begin our work

Now we are in our Second Pass of more fine-grained revision:

  1. Cleaning up duplicate terms, to merge them into single, co-parented issues
  2. Bulking up issue terms, beyond single phrases to have greater clarity/context
  3. Streamlining categories within parents, to have coherent sub-categories
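The first step of this pass, finding duplicate terms, can be partly automated. The sketch below groups terms by a normalized spelling and lists those that appear under more than one parent. The rows and helper names are hypothetical, and a match on the name alone is only a candidate: a person still has to decide whether two entries are really the same issue before merging them.

```python
from collections import defaultdict

# Hypothetical (parent, term) rows, as they might come from a flat export.
ROWS = [
    ("Housing", "Discrimination"),
    ("Work", "discrimination "),
    ("Education", "Discrimination"),
    ("Work", "Getting Paid"),
]


def normalize(term):
    """Lowercase the term and collapse stray whitespace before comparing."""
    return " ".join(term.lower().split())


def duplicate_candidates(rows):
    """Map each normalized term seen under 2+ parents to its sorted parents."""
    by_term = defaultdict(set)
    for parent, term in rows:
        by_term[normalize(term)].add(parent)
    return {t: sorted(p) for t, p in by_term.items() if len(p) > 1}


print(duplicate_candidates(ROWS))
# prints: {'discrimination': ['Education', 'Housing', 'Work']}
```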

Our current parent (top-level) categories are as follows. This list of parents is meant to serve as the high-level homes for most every legal issue that a person might raise, or that an organization might offer help for.

Accidents, Injuries, and Problems with Others: This category covers problems that one person has with another person (or animal), like when there is a car accident, a dog bite, bullying or possible harassment, or neighbors treating each other badly.

Crime & Prisons: This category covers issues in the criminal system including when people are charged with crimes, go to a criminal trial, go to prison, or are a victim of a crime.

Family: This category covers issues that arise within a family, like divorce, adoption, name change, guardianship, domestic violence, child custody, and other issues.

Health: This category covers problems that arise when getting medical treatment, paying medical bills, being in a hospital or nursing home, or other issues.

Housing: This category covers issues with paying your rent or mortgage, landlord-tenant issues, housing subsidies and public housing, eviction, and other problems with your apartment, mobile home, or house.

Work: This category covers issues related to working at a job, including discrimination and harassment, workers’ compensation, workers’ rights, unions, getting paid, pensions, being fired, and more.

Traffic: This category covers problems with traffic and parking tickets, fees, and other issues experienced with the traffic system.

Benefits: This category covers benefits that people can get from the government, like for food, disability, old age, medical help, unemployment, child care, or other social needs.

Estates & Wills: This category covers planning for end-of-life and special circumstances, including the wills, powers of attorney, advance directives, trusts, and other estate issues that people and families deal with.

Immigration: This category covers visas, asylum, green cards, citizenship, migrant work and benefits, and other issues faced by people who are not full citizens in a country.

Money, Debt, and Consumer Issues: This category covers issues people face regarding money, insurance, consumer goods and contracts, taxes, and small claims about quality of service.

Civil and Human Rights: This category covers people’s fundamental rights that the government should protect and others should respect. It applies to situations of discrimination, abuse, due process, the First Amendment, indigenous rights, and other key rights.

Court and Lawyers: This category covers the logistics of how a person can interact with a lawyer or the court system. It applies to discussions of procedure, rules, and other practical matters about dealing with these systems.

Disaster Relief: This category covers issues related to natural disasters, including people’s rights, getting benefits and assistance, clearing title to property, and dealing with insurance.

Education: This category covers issues around school, including accommodations for special needs, discrimination, student debt, discipline, and other issues in education.

Environmental Justice: This category covers issues around pollution, hazardous waste, poisons, and other issues with the environment.

Government Services: This category covers services that people request from the government, including licenses for firearms, businesses, and hunting, as well as requests for information, and other privileges from the government.

Native American Issues: This category covers issues and laws specific to Native Americans and indigenous populations.

Small Business: This category covers issues faced by people who run small businesses or nonprofits, including around incorporation, licenses, taxes, regulations, and other concerns.

Veterans & Military: This category covers issues, laws, and services specific to people who have served in the military.

Now, within each of these parent categories, we are defining the Child Level-1 issue categories, which get to more specific areas of issues and help. Most of the very specific issues are at Child Level-3 or -4.

Where we are at now + Your Help

Our team at Stanford is making slow, steady progress in drafting a v2 with more streamlined categorization and bulking up of terms.

Our big need is for legal experts to review drafts of family law issues, work law issues, housing law issues (and beyond). As we get further into child categories and specific terms, we need experts’ input on and edits to the taxonomy before we start labeling and training models based on the terms and categories. Are you interested? Please talk to us! mdhagan [at] stanford [dot] edu

Stay tuned!