Web Mining And Data Analysis on Nonprofits Across India — Part 3 [Clustering Key Issues to Align With NITI Aayog’s Sectors for ADP]

Kamna Sinha
Published in Sensewithai, Feb 1, 2023

Part 1 and Part 2 of this series covered collecting data on 1.5 lakh NGOs with our intelligent web crawler, cleaning and processing that data, and analysing it to understand trends in the growth of NGOs across the country, starting from the earliest NGO registration found in the data.

We also found that certain fields (mainly ‘key issues’/‘sectors’ and ‘contacts’) were empty for many NGOs. We plan to fill these through data enrichment, using similar extraction from the web.

In this post, we try something interesting based on a well-formulated use case:

NITI Aayog’s Aspirational Districts Programme (ADP), which completed four years in January 2022 according to its annual report, is a platform for

“collaboration with the community, including civil society organizations, NGOs, and development partners, to resolve District-specific issues. Whether it is the improvement of foundational literacy and numeracy or health and nutrition outcomes, the ADP has enabled focused efforts towards long-standing development goals.”

“The initiative involves association with local institutional players (like local NGOs, colleges, media, women self-help groups, panchayat samitis and faith leaders) to leverage deep community knowledge and facilitate behaviour change at the community level.”

The programme monitors 112 Aspirational Districts on key performance indicators under five thematic categories:

‘Health and Nutrition’,

‘Education’,

‘Agriculture and Water Resources’,

‘Skill Development and Financial Inclusion’ and

‘Basic Infrastructure’

Pic credit : https://www.niti.gov.in/sites/default/files/2022-09/Institute-of-Competitiveness-Assessment-of-ADP-August-2020.pdf

An article from about two and a half years ago describes how better-quality data, and better use of data from NGOs, can lead to closer collaboration between government initiatives and grassroots NGOs and give NGOs their due as development partners:

“If the NGO data available on Darpan portal is mapped SDG-wise & District-wise followed by a campaign sensitizing them towards their district-level SDG goals & indicators, then only we will be able to give NGOs their due as development partners. It will go a long way in aligning NGO efforts towards the national priorities.”

……

“There is an urgent need to sensitize the NGOs about the importance of documentation, record-keeping and filing of accounts. It would not be possible without a concerted effort towards their capacity building in this space.”

In our last post, a detailed data analysis revealed a few correlations between two or more of the 45 ‘key issues’ listed on the Ngodarpan website for every NGO.

We also observed that a large share of the NGOs that listed a certain ‘key issue’ were highly likely to list a correlated one as well, e.g. Agriculture and Rural Development.

We then tried to find more such correlations, or co-occurrences, algorithmically, using a co-occurrence matrix and heatmaps.

Clustering the key issues can reduce their number from 45 to a smaller set, which can then be mapped to:

17 SDGs as stated by the U.N. or

5 sectors for ADP as mentioned above.

5 themes for ADP as shown on the Champions of Change Dashboard

Using Data Science:

A heat map (or heatmap) is a two-dimensional graphical representation of data that uses color to represent values. It helps reveal relationships between data values that would be much harder to see if presented numerically in a table or matrix.

We analyzed how often an attribute (a key issue) occurred together with other attributes (other key issues). To analyze this relationship, we computed the co-occurrence matrix.

co-occurrence matrix of key issues showing each of their occurrence along with others in a 45x45 matrix
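As a rough illustration of the counting step, here is a minimal sketch of building a co-occurrence matrix from per-NGO lists of key issues. The function name `cooccurrence_matrix` and the toy data are hypothetical and are not taken from our pipeline or from the Ngodarpan dataset; the real matrix would be 45x45.

```python
from itertools import combinations

import numpy as np


def cooccurrence_matrix(ngo_issues, issues):
    """Count how often each pair of key issues is listed by the same NGO."""
    index = {name: i for i, name in enumerate(issues)}
    counts = np.zeros((len(issues), len(issues)), dtype=int)
    for listed in ngo_issues:
        # De-duplicate and ignore labels that are not in the issue list.
        unique = sorted({index[name] for name in listed if name in index})
        for i in unique:
            counts[i, i] += 1  # diagonal: number of NGOs listing the issue
        for i, j in combinations(unique, 2):
            counts[i, j] += 1  # keep the matrix symmetric
            counts[j, i] += 1
    return counts


# Toy data, invented for illustration only.
issues = ["Agriculture", "Rural Development", "Education"]
ngo_issues = [
    ["Agriculture", "Rural Development"],
    ["Agriculture", "Rural Development", "Education"],
    ["Education"],
    ["Agriculture"],
]
matrix = cooccurrence_matrix(ngo_issues, issues)
print(matrix)
```

In this sketch the diagonal holds the number of NGOs that list each issue, which is convenient if the raw counts are later scaled into shares.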

Each value in the co-occurrence matrix represents how often one attribute occurs together with another. Although the matrix contains all the information, it is hard to read and draw inferences from visually. To counter this problem, we used heat maps, which show the co-occurrences graphically.

Heatmap of the co-occurrence matrix indicating the frequency of occurrence of one key issue with all others
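A heatmap like this can be sketched in a few lines. The example below makes two assumptions that this post does not spell out: that raw counts are scaled to the 0 to 1 range by dividing each row by the number of NGOs listing that row's issue, and that matplotlib is the plotting library. The helper names and the small count matrix are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def conditional_scores(counts):
    """Scale each row by its diagonal: share of row-issue NGOs listing the column issue."""
    counts = np.asarray(counts, dtype=float)
    totals = np.maximum(np.diag(counts), 1.0)  # guard against division by zero
    return counts / totals[:, None]


def plot_heatmap(scores, labels):
    """Draw the score matrix as a colour-coded grid and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(scores, cmap="viridis", vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticklabels(labels)
    fig.colorbar(image, ax=ax, label="co-occurrence score")
    fig.tight_layout()
    return fig


# Toy counts, invented for illustration only (diagonal = NGOs listing the issue).
labels = ["Agriculture", "Rural Development", "Education"]
counts = [[3, 2, 1], [2, 2, 1], [1, 1, 2]]
scores = conditional_scores(counts)
fig = plot_heatmap(scores, labels)
```

Calling `fig.savefig(...)` with a file name of your choice writes the image to disk. Note that row scaling makes the score matrix asymmetric, so the direction of reading (row issue to column issue) matters.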

Since the frequency of co-occurrence is represented by a color palette, we can now easily see which attributes appear together most often. We can infer that NGOs commonly list these attributes together.

A closer look at the heatmap: a threshold of 0.45 indicates a high likelihood of key issues occurring together for NGOs

We then set a threshold value of 0.45 based on domain observation and created a filter to obtain clustered key issues, which brought us one step closer to grouping them into a smaller number.

Clustering of ‘key issues’ using a threshold score of 0.45
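The thresholding filter can be sketched as follows. This is only one plausible reading of the step: for every key issue, keep the other issues whose score meets the threshold, which naturally allows an issue to appear in more than one group. The function name `groups_above_threshold` and the score matrix are hypothetical, not our actual values.

```python
import numpy as np


def groups_above_threshold(scores, labels, threshold=0.45):
    """For each issue, collect the other issues whose score meets the threshold."""
    groups = {}
    for i, anchor in enumerate(labels):
        partners = [
            labels[j]
            for j in range(len(labels))
            if j != i and scores[i, j] >= threshold
        ]
        if partners:  # skip issues with no strong partner
            groups[anchor] = [anchor] + partners
    return groups


# Toy scores in the 0 to 1 range, invented for illustration only.
labels = ["Agriculture", "Rural Development", "Education", "Micro Finance"]
scores = np.array([
    [1.00, 0.70, 0.20, 0.10],
    [0.60, 1.00, 0.30, 0.20],
    [0.25, 0.30, 1.00, 0.05],
    [0.15, 0.50, 0.10, 1.00],
])
groups = groups_above_threshold(scores, labels)
for anchor, members in groups.items():
    print(anchor, "->", members)
```

Raising the threshold gives fewer, tighter groups, while lowering it merges more issues together, which is why the value is best chosen with domain knowledge.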

Some of the clusters that came out were:
1. [‘Health & Family Welfare’, ‘Children’, ‘Women’s Development & Empowerment’]
2. [‘Environment & Forests’, ‘Agriculture’, ‘Drinking Water’, ‘Disaster Management’]
3. [‘Vocational Training’, ‘Women’s Development & Empowerment’, ‘Youth Affairs’]
4. [‘Human Rights’, ‘Labor and Employment’, ‘HIV/AIDS’, ‘Legal Awareness and AID’, ‘Right to Information & Advocacy’]
5. [‘Dalit Upliftment’, ‘Differently Abled’, ‘Human Rights’, ‘Disaster Management’, ‘Civic Issues’]
6. [‘Micro Small & Medium Enterprises’, ‘Micro Finance’, ‘Labor and Employment’, ‘Land Resources’, ‘Housing’]
7. [‘Biotechnology’, ‘Animal Husbandry’, ‘Dairy & Fisheries’]
8. [‘Biotechnology’, ‘Scientific & Industrial Research’, ‘New & Renewable Energy’]

After mapping to NITI Aayog’s 5 sectors for ADP, we got the following result:

Mapping of clustered key issues on NGODARPAN to sectors for ADP program
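One simple way to mechanise such a mapping is a lookup table from key issue to sector plus a majority vote per cluster. The sketch below is illustrative only: the `ISSUE_TO_SECTOR` assignments are hypothetical examples and are not the mapping shown in the figure above, and the function name is made up for this example.

```python
from collections import Counter

# Hypothetical issue-to-sector assignments, for illustration only.
ISSUE_TO_SECTOR = {
    "Health & Family Welfare": "Health and Nutrition",
    "Agriculture": "Agriculture and Water Resources",
    "Vocational Training": "Skill Development and Financial Inclusion",
    "Micro Finance": "Skill Development and Financial Inclusion",
    "Housing": "Basic Infrastructure",
}


def sector_for_cluster(cluster, issue_to_sector=ISSUE_TO_SECTOR):
    """Pick the sector that most issues in the cluster map to, or None if none map."""
    votes = Counter(
        issue_to_sector[issue] for issue in cluster if issue in issue_to_sector
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# A made-up cluster: two issues vote for one sector, one for another.
example_cluster = ["Vocational Training", "Micro Finance", "Housing"]
print(sector_for_cluster(example_cluster))
```

A vote like this is only a starting point; clusters that straddle two sectors are exactly where a human in the loop is most useful.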

The goal of this experiment was to increase the usability and feasibility of publicly available data for policy decisions under social initiatives.

The results can certainly be improved through iteration, a human in the loop, more domain knowledge, and further text mining of additional resources such as documents on ADP that are available online. Access to more relevant data sources, and combining the mined results with a clustering algorithm, could also give more accurate results.

What can be done further?

Another way data analysis can support closer collaboration between NGOs registered on Ngodarpan and NITI Aayog is to map the [Address, city, state] data in each NGO’s profile to its district, so that this direct mapping can be applied to the district-wise campaigns mentioned above.

References:

https://www.niti.gov.in/sites/default/files/2022-09/Institute-of-Competitiveness-Assessment-of-ADP-August-2020.pdf

https://www.niti.gov.in/data-aspirational-districts-model

https://www.niti.gov.in/sites/default/files/2018-12/Transformation-of-AspirationalDistricts-Primer-ANew-India2022.pdf

https://sdgs.un.org/goals

https://limararuv5gr0sh-nitiprdadw.adb.ap-mumbai-1.oraclecloudapps.com/ords/f?p=100:3::::::
