Creating a dataset of cases to analyze the implementation of child protection laws in India

Apoorv Anand
Published in CivicDataLab
8 min read · Jul 12, 2023

A joint collaboration between Enfold Trust and CivicDataLab funded by the Patrick J McGovern Foundation under the Data and Society Accelerator Program.

The Problem

In India, we’re seeing a rise in the number of reported incidents related to violence, exploitation and abuse of children¹. On one hand, this rise in cases can be attributed to a much more well-informed citizenry which is aware of its rights and to other important initiatives undertaken to increase awareness and strengthen response mechanisms by the government, judiciary, and civil society organizations. But on the other hand, the number of convictions in these cases is quite low², and the pendency rates in court are extremely high³.

The four main laws that deal with crimes against children are: The Juvenile Justice (Care and Protection of Children) Act, 2015 (JJ Act); The Prohibition of Child Marriage Act, 2006 (PCMA); The Protection of Children from Sexual Offences Act, 2012 (POCSO Act); and The Child and Adolescent Labour (Prohibition and Regulation) Act, 1986 (CALPRA). While these laws aim to provide adequate protection to children against various crimes, the challenge has been to ensure that they are implemented effectively and that child-friendly procedures are followed in the process.

In recent years, many civil society organizations, academic institutions, and governmental stakeholders have allocated considerable resources to understanding the implementation of the POCSO Act⁴. However, the other three laws have received far less attention, leaving a significant gap in understanding how the PCMA, CALPRA, and JJ Act are being used to address crimes against children. Little is known about the specific provisions of law being used, the outcomes under these provisions, the sentences and compensation orders being passed, and the time taken to process these cases. Adding to this concern, national crime figures do not provide disaggregated data on all offences against children, and there are discrepancies and wide variations in the numbers reported by several official data sources. Data on the implementation of these laws is critical to identify areas of intervention and gaps in implementation, and to make strategic, evidence-backed decisions about the direction of policy reform. Here is a more detailed write-up about the current state of the data ecosystem for analyzing these laws and its limitations.

Proposed Solution

Since 2001, Enfold has been working on the prevention of and response to child sexual abuse in India, and on child protection issues. At CivicDataLab, we have been working to enhance access to public information related to child-protection laws. With the assistance provided by the Patrick J McGovern Foundation (PJMF) through their Human Rights Accelerator grant, Enfold and CivicDataLab are collaborating to build datasets which will be used to study the implementation of the other three laws under the child protection umbrella: The Juvenile Justice (Care and Protection of Children) Act, 2015; The Prohibition of Child Marriage Act, 2006; and The Child and Adolescent Labour (Prohibition and Regulation) Act, 1986. Our objective is to generate disaggregated indicators on several offences against children and on how these laws have been interpreted by the district courts over the last 7 years (2016–2022).

[Figure: Project Overview. Using data from eCourts to analyse child protection laws. The image shows the project components, from downloading data from eCourts to processing it and publishing datasets.]

The Roadmap

We began thinking about the scope of this project, the team, and the other resources needed to create these datasets while writing the grant application. One of the first grant requirements was to create a detailed project roadmap. The conversations with the PJMF team helped us prioritize the tasks identified earlier, detail the technical architecture, and finalize the set of tools and libraries we will use in this project. We prioritized the tasks by ranking each project output on its ease of implementation and its importance to achieving the objective. Another important component of this exercise was identifying probable risks and risk mitigation strategies at the start of the project. Here is the link to our detailed project roadmap.

Progress

We started the project by identifying the total number of cases available for each of the selected statutes in the eCourts districts portal. This step was important for a couple of reasons:

  1. To gauge the scale of data we will be dealing with and
  2. To prioritize the states for our analysis as per the prevalence of cases reported on eCourts
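
As a rough illustration of this kind of prioritization (not the project's actual code), the sketch below counts cases per state for a chosen set of statutes and ranks the states by that count. The record fields `state` and `statute`, the function name `rank_states`, and the sample records are all assumptions made up for the example.

```python
from collections import Counter


def rank_states(cases, statutes):
    """Count cases per state for the chosen statutes, most cases first."""
    counts = Counter(
        case["state"] for case in cases if case["statute"] in statutes
    )
    # most_common() sorts by count, highest first.
    return counts.most_common()


# Made-up records, for illustration only; they are not project data.
cases = [
    {"state": "State A", "statute": "PCMA"},
    {"state": "State B", "statute": "JJ Act"},
    {"state": "State A", "statute": "CALPRA"},
    {"state": "State C", "statute": "POCSO Act"},
    {"state": "State A", "statute": "JJ Act"},
    {"state": "State B", "statute": "PCMA"},
]

ranking = rank_states(cases, {"PCMA", "JJ Act", "CALPRA"})
print(ranking)
```

States at the top of such a ranking would be natural candidates for deeper analysis, although data quality and judgment language also matter when making the final selection.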

At this stage, you may ask why we are not analyzing cases for the entire country?

Our reasons to restrict the analysis to a few states are:

  1. In our prior projects, we observed a lot of inconsistencies in terms of the quality of case data available on the eCourts platform. Because of data-related issues, we had to allocate most of our time to clean up the data and make it ready for analysis. As part of this project, we would like to allocate considerable time for an exploratory data analysis of cases and publish the findings.
  2. Other than the basic case information (metadata) available for every case on eCourts, we’re also planning to collect important indicators for all three laws by analyzing the judgments available for disposed cases. Many courts upload case documents in regional languages, so for now we’re focusing only on judgments written in English, which limits the states we can cover.

Collecting metadata for each case

To date, we have been able to fetch close to 1,05,000 cases registered under all three laws for the entire country between 2016 and 2022. We’re now working to clean up the case variables which are required for analysis. The current dataset includes all types of cases (bail, annulment of child marriage petitions, etc).

Collecting data from judgments

For the cases which got disposed of, we have also fetched the judgments if they were available on the eCourts platform. We did this exercise only for our selected states. As per our interim analysis, we observed that the overall number of judgments uploaded in these states is quite low. We are now working to extract insights from these judgments by following the process as described below:

  1. We prepare a list of variables (words, phrases, tables, etc.) to be extracted from judgments, for example the designation of the judge hearing a case or the total number of respondents. In the same list, we note the locations in a judgment where these data points usually appear.
  2. The team then annotates these variables within a small set of judgments. All annotations are done using a slightly customized version of Doccano.
  3. Using these annotations, we develop an algorithm to identify these variables in a larger sample of judgments from multiple states. You can think of this as building a regex (regular expression) for each variable.
  4. For a few variables, we are using the OpenNyAI library to extract information. To familiarize ourselves with the OpenNyAI library, we conducted an experiment on POCSO judgments. We have written about this in detail here.
  5. Before we can run the models on judgments, we convert all judgments available as PDFs or in other non-machine-readable formats to plain text (TXT). A Python script for this task is available here for reference.
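
To make step 3 concrete, here is a minimal sketch of what one regex per variable could look like in Python. It is an illustration under assumptions, not the project's actual algorithm: the variable names, the patterns, the function `extract_variables`, and the sample text are all hypothetical, and real judgments would need patterns tuned against the annotated set.

```python
import re

# Hypothetical patterns, one per variable. Wording in real judgments varies
# widely, so these would need tuning against annotated examples.
PATTERNS = {
    "judge_designation": re.compile(
        r"\b((?:Additional\s+|Principal\s+)?(?:District\s+and\s+)?"
        r"Sessions\s+Judge|"
        r"(?:Chief\s+)?Judicial\s+Magistrate(?:\s+First\s+Class)?)\b",
        re.IGNORECASE,
    ),
    "pcma_section": re.compile(
        r"\b(?:Section|Sec\.|u/s)\s*(\d+[A-Z]?)\s+of\s+(?:the\s+)?"
        r"Prohibition\s+of\s+Child\s+Marriage\s+Act",
        re.IGNORECASE,
    ),
}


def extract_variables(text):
    """Return the first match for each variable, or None when absent."""
    # Collapse whitespace so patterns survive line breaks left over
    # from PDF-to-text conversion.
    flat = " ".join(text.split())
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(flat)
        result[name] = match.group(1) if match else None
    return result


# Invented sample text, for illustration only.
sample = """IN THE COURT OF THE ADDITIONAL SESSIONS
JUDGE. The accused was charged under Section 9 of the
Prohibition of Child Marriage Act, 2006."""

print(extract_variables(sample))
```

Keeping the patterns in a single dictionary means a new variable can be added without touching the extraction loop, and a value of `None` flags judgments that need manual review or a better pattern.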

The PJMF accelerator grant also enables us to experiment with multiple ideas. Going forward, we are planning to conduct experiments using the recently launched GPT-4 models and to use more context-based search libraries like semantra, which might give us better results than regex. Our cohort members also suggested a few other libraries which we can use for our use case, like LOME, Open-SESAME and SLING. We will keep documenting insights from these experiments and share them in our upcoming blogs about the project.

Our Advisors

We have onboarded Prof Mahesh Menon (Assistant Prof of Law, Sai University) and Ms. Bharti Ali (Co-Founder and Executive Director, HAQ: Centre for Child Rights) as our project advisors. In our first discussion, they shared a few recommendations on how we can further scope out the project objectives and think about presenting this dataset, or insights from it, to a wide range of actors working in the child-protection ecosystem.

[Figure: Primary users of published datasets and analysis. The image lists the primary users of the research, datasets and analysis.]

Our advisors also stressed the need for and importance of this dataset, since there are no other primary or secondary datasets that policymakers can use to understand the issues with the current legislation. They also pointed out that agencies often differ in the terminology and data curation methodologies they adopt, which can lead to differences in the overall number of cases they report. Other possible reasons include the lack of eCourts penetration across districts, the inclusion of nullity petitions and bail petitions on eCourts, and the “principal offence” rule followed by NCRB.

A common suggestion was that, instead of placing more emphasis on data validation, the research can point out the variance among different sources, which is itself an important output. We’re also aware that the dataset itself might matter less to several stakeholders than the insights we derive from it. This pushes us to think about information dissemination strategies for a few stakeholders, and not just restrict ourselves to publishing the dataset under an open license.

Success Outcomes

We had to define a few success outcomes while we were working on the grant application. The ideal success outcomes of any child-protection related project should be centered around the prevention of crime in the first place and ensuring a child-centered response system, making it easier for children to access justice. We’re well aware of the limitations of our project and the importance of other stakeholders to achieve these desired outcomes. But we do hope that our work on creating this dataset shines a light on the extent of utilization of these laws to protect children from abuse and exploitation.

[1]: According to Crime in India, the number of reported crimes against children under the Indian Penal Code (IPC) and Special and Local Laws (SLL) at the national level increased by 39.7% in a span of five years, from 106958 in 2016 to 149404 in 2021.

Source: National Crime Records Bureau, Crime in India 2016 Vol. 1, Table 4A.1, p. 186, https://ncrb.gov.in/sites/default/files/Crime%20in%20India%20-%202016%20Complete%20PDF%20291117.pdf; National Crime Records Bureau, Crime in India 2021 Vol. 1, Table 4A.1, p. 317, https://ncrb.gov.in/sites/default/files/CII-2021/CII_2021Volume%201.pdf

[2]: According to Crime in India 2021, the national conviction rate in 2021 for crimes against children was 33.4% for offences under the Indian Penal Code, 1860 and 33.5% for offences under Special and Local Laws.

Source: National Crime Records Bureau, Crime in India 2021 Vol. 1, Table 4A.5, p. 357, https://ncrb.gov.in/sites/default/files/CII-2021/CII_2021Volume%201.pdf

[3]: According to Crime in India 2021, the national pendency rate in 2021 for crimes against children was 95.4% for offences under the Indian Penal Code, 1860 and 94.2% for offences under Special and Local Laws.

Source: National Crime Records Bureau, Crime in India 2021 Vol. 1, Table 4A.5, p. 357, https://ncrb.gov.in/sites/default/files/CII-2021/CII_2021Volume%201.pdf

[4]: “Implementation of the POCSO Act, 2012, by Special Courts: Challenges and Issues”, Centre for Child and the Law (CCL), National Law School of India University, February 2018, https://ccl.nls.ac.in/wp-content/uploads/2021/10/8.-Implementation-of-the-POCSO-Act-2012-by-speical-courts-challenges-and-issues.pdf

“#Data4Justice: Unpacking Judicial Data to Track Implementation of the POCSO Act in Assam, Delhi & Haryana”, HAQ: Centre for Child Rights & CivicDataLab, November 2021, https://www.haqcrc.org/wp-content/uploads/2021/11/unpacking-judicial-data-to-track-implementation-of-the-pocso-act-in-assam-delhi-and-haryana-full-report.pdf

“‘Romantic’ Cases under the POCSO Act: An Analysis of Judgments of Special Courts in Assam, Maharashtra & West Bengal”, Enfold Proactive Health Trust, https://enfoldindia.org/wp-content/uploads/2022/12/Romantic-cases-under-the-POCSO-Act.pdf

“A Decade of POCSO: Developments, Challenges and Insights from Judicial Data”, Vidhi Centre for Legal Policy, November 2022, https://vidhilegalpolicy.in/research/a-decade-of-pocso-developments-challenges-and-insights-from-judicial-data/

Apoorv Anand
CivicDataLab

Works on finding ways for people and communities to engage with public datasets. Also writes at https://behindbars.netlify.app/