Web Mining And Data Analysis on Nonprofits Across India — Part 1 [Karnataka]

Kamna Sinha
Jan 18, 2023

Data analysis of nonprofit data for the state of Karnataka

Idea:

We collected openly available data about Indian nonprofits and their details from the internet through web crawling, scraping and web mining, and then analyzed it.

After cleaning and preprocessing the web data on some 1.5 lakh NGOs across India, we ran it through our Data Intelligence Platform to understand how NGOs have grown in the state of Karnataka since the first registered nonprofit came into existence there.

Some very interesting facts and observations emerged.

The idea was to do trend analysis on the growth of nonprofits, both overall and at a granular level by their respective “key issues”, and to use time series analysis to understand their growth over the years.

Use Cases Studied:

  1. Understanding the growth of nonprofits in the state since the very first nonprofit was registered.
  2. Time series analysis of the overall growth of nonprofits.
  3. Growth over time in important sectors, with nonprofits grouped by key issue.
  4. Correlation of growth trends across more than one cause, as seen when the data is plotted on a graph.

I. Growth Analysis

The following shows the growth of newly registered NGOs in the state since the earliest available data.

Year on year registration number of NGOs across Karnataka

Of the little over 9000 nonprofits that currently exist in Karnataka according to this data, the very first was found to be the BANGALORE BAPTIST CHURCH TRUST in Bangalore.

As observed in the graph, the next registrations came only in 1901, when 4 nonprofits were registered. Growth was almost negligible until the 1950s, after which more and more NGOs came into existence in the state.

Another interesting fact that came out of the plot is that the number of newly registered NGOs fluctuated greatly in the last 5 years.
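A year-on-year count like the one plotted above could be computed along the following lines. This is only a minimal sketch, not the code used on the platform: the record layout, the `registration_year` field name and the sample rows are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset and its field names are not shown here.
records = [
    {"name": "NGO A", "registration_year": 1901},
    {"name": "NGO B", "registration_year": 1901},
    {"name": "NGO C", "registration_year": 1903},
    {"name": "NGO D", "registration_year": None},  # missing date, skipped
]


def registrations_per_year(rows):
    """Count new registrations per year, filling gap years with zero."""
    counts = Counter(
        row["registration_year"] for row in rows if row.get("registration_year")
    )
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


yearly = registrations_per_year(records)
for year, count in yearly.items():
    print(year, count)
```

Filling the gap years with zero keeps long quiet stretches, such as the early decades described above, visible on the time axis instead of being silently skipped.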

Trend Analysis in the observed growth

Over the last 20 years, we can also see the numbers rise and fall in a pattern that seems to repeat roughly every 5 years.

More domain information is required to explain this seasonality, but we can in fact use it to make predictions for growth in the years to come.
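One simple way to check whether a rise-and-fall pattern really repeats is to compare the autocorrelation of the yearly counts at different lags. The sketch below does this on a synthetic series with a built-in 5-year cycle. The data and the function names are illustrative assumptions, not the platform's method or the Karnataka figures.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_lag(series, min_lag=2, max_lag=10):
    """Return the lag (in years) with the highest autocorrelation."""
    upper = min(max_lag, len(series) - 1)
    return max(
        range(min_lag, upper + 1), key=lambda lag: autocorrelation(series, lag)
    )


# Synthetic yearly counts with a built-in 5-year cycle (not real data).
cycle = [0, 1, 2, 1, 0]
synthetic_counts = [100 + 50 * cycle[i % 5] for i in range(20)]

print(dominant_lag(synthetic_counts))
```

Real registration counts also trend upward, so the trend would have to be removed first (for example by differencing consecutive years) before a check like this says anything useful about the cycle length.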

II. Growth trends of overall NGOs over time

  1. Number of NGOs registered per year from the beginning until 2022.

The number of NGOs registered each year also rose steeply in the last 20 years compared with the 20-year periods before that.

1960–1980: the maximum number of NGOs registered in a year was 48, in 1979.

1980–2000: the yearly count ranged from 43 in 1980 to a peak of 173 in 1999.

2000–2022: the yearly count ranged from a minimum of 158 in 2000 to as many as 408 in 2018.

That is roughly an 8.5-fold increase (408 against 48) over the maximum number of NGOs registered in 1979.
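The comparison between periods can be reproduced from the figures quoted above. The sketch below uses only those quoted yearly counts; the half-open period boundaries and the helper name `period_peak` are assumptions made so that boundary years such as 1980 and 2000 are counted once.

```python
# Yearly counts quoted in the text above; all other years are omitted.
quoted_counts = {1979: 48, 1980: 43, 1999: 173, 2000: 158, 2018: 408}

# Half-open periods [start, end) so that boundary years are counted once.
periods = [(1960, 1980), (1980, 2000), (2000, 2023)]


def period_peak(counts, start, end):
    """Return (year, count) of the highest count within [start, end)."""
    in_period = {y: c for y, c in counts.items() if start <= y < end}
    year = max(in_period, key=in_period.get)
    return year, in_period[year]


peaks = [period_peak(quoted_counts, start, end) for start, end in periods]

# Ratio of the most recent period's peak to the earliest period's peak.
fold_increase = peaks[-1][1] / peaks[0][1]
print(peaks)
print(round(fold_increase, 1))
```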

Number of NGOs registered every year in Karnataka

2. The cumulative growth of nonprofits also shows the steep rise in the last 20 years.

Cumulative growth of nonprofits registered in Karnataka
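A cumulative curve is just a running total of the yearly counts. As a minimal sketch with made-up numbers (not the Karnataka data):

```python
from itertools import accumulate

# Made-up yearly registration counts, for illustration only.
yearly_counts = {2000: 3, 2001: 5, 2002: 2}

# Running total of registrations in chronological order.
years = sorted(yearly_counts)
running_totals = list(accumulate(yearly_counts[year] for year in years))
cumulative = dict(zip(years, running_totals))

print(cumulative)
```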

III. Growth of NGOs grouped by their key issues

Next, we processed the data to group nonprofits by their respective ‘key issues’ or causes, as listed on each nonprofit's details page.

The video shows the growth of nonprofits in various causes through the years. It starts from the 1950s, where the data becomes clearly visible for analysis.


Many nonprofits list more than one key issue. The grouping was done to understand which key issues saw rapid growth in nonprofits and which saw slow growth.
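Because one nonprofit can list several key issues, each record has to be split before counting. A possible sketch is given here; the comma-separated layout, the field names and the placeholder strings are assumptions, since the raw format is not shown in this post.

```python
from collections import defaultdict

# Placeholders treated as "no key issue"; the exact strings are an assumption.
MISSING = {"", "not available"}

# Hypothetical rows; the comma-separated layout of key issues is assumed.
records = [
    {"registration_year": 1990, "key_issues": "Education & Literacy, Agriculture"},
    {"registration_year": 1990, "key_issues": "Education & Literacy"},
    {"registration_year": 1995, "key_issues": "Skill Development"},
    {"registration_year": 1995, "key_issues": "Not Available"},
]


def split_issues(raw):
    """Split a raw key-issues string into a list of cleaned issue names."""
    parts = [part.strip() for part in (raw or "").split(",")]
    return [part for part in parts if part.lower() not in MISSING]


def registrations_by_issue(rows):
    """Return {issue: {year: count}}; an NGO counts once under each of its issues."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        for issue in split_issues(row["key_issues"]):
            table[issue][row["registration_year"]] += 1
    return {issue: dict(years) for issue, years in table.items()}


by_issue = registrations_by_issue(records)
for issue, years in by_issue.items():
    print(issue, years)
```

With this kind of grouping, the per-issue totals add up to more than the number of nonprofits, which is worth keeping in mind when reading the per-issue graphs.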

Growth of NGOs across various Key Issues in Karnataka

A few observations from the above graph:

a. Which sector had the steepest growth of NGOs?

Education and Literacy

Why: Many government-backed initiatives have been taken in this field, and international funds have also poured in recently, encouraging more hands to join the cause.

The number rose from about 3 in 1901 to 4163 in 2022.

Growth of NGOs in ‘Education and Literacy’ as the key issue

b. Skill development: the number of nonprofits is still very low compared to education and literacy, and growth in the sector has been very slow.


As important as this cause may seem, NGOs under the ‘skill development’ key issue have grown much more slowly than those in education.

An interesting fact that came out is that the total number of NGOs working for the cause of “skill development” is 632 as of 2022, while the same number of NGOs in the education sector had already been registered by the end of 1992!
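A comparison of this kind amounts to finding the first year in which one sector's running total reaches another sector's final total. A small sketch with made-up counts (the function name and the numbers are illustrative, not the Karnataka figures):

```python
from itertools import accumulate


def year_reaching(yearly_counts, target):
    """Return the first year in which the running total reaches `target`, or None."""
    years = sorted(yearly_counts)
    totals = accumulate(yearly_counts[year] for year in years)
    for year, total in zip(years, totals):
        if total >= target:
            return year
    return None


# Made-up yearly counts for two key issues.
education = {1990: 4, 1991: 3, 1992: 5, 1993: 6}
skill_development = {1990: 1, 1991: 2, 1992: 3, 1993: 4}

# Year by which education had as many NGOs as skill development has in total.
skill_total = sum(skill_development.values())
print(year_reaching(education, skill_total))
```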

c. Microfinance [SHGs]: slow growth

It's interesting to note that while the microfinance industry is estimated to have grown over 16-fold in the last decade, nonprofits working on this key issue seem to have grown very slowly in the same period.


IV. Correlation found in key issues

The graph also shows 3 of the 45 key issues growing along the same path.

[vocational training + rural development and poverty alleviation + agriculture]

Correlation of key issues

From a domain perspective, rural development, vocational training and agriculture are key issues that many NGOs combine in the work they undertake.

We also observed in the raw data that these 3 appeared together in the ‘key issues’ of most nonprofits.

In data science terms this resembles market basket analysis, and it explains the correlation in the growth trends of the 3 key issues.
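The co-occurrence idea behind market basket analysis can be sketched by treating each nonprofit's key issues as a "basket" and measuring how often two issues appear together (their support). The baskets below are hypothetical and the function name is an assumption; this is a simplified illustration, not a full association-rule implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical key-issue sets, one per nonprofit.
ngo_issues = [
    {"Agriculture", "Rural Development", "Vocational Training"},
    {"Agriculture", "Rural Development", "Vocational Training"},
    {"Agriculture", "Rural Development"},
    {"Education & Literacy"},
]


def pair_support(baskets):
    """Return {(issue_a, issue_b): share of baskets containing both issues}."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting makes each pair appear in one consistent order.
        pair_counts.update(combinations(sorted(basket), 2))
    return {pair: count / len(baskets) for pair, count in pair_counts.items()}


support = pair_support(ngo_issues)
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, share)
```

Pairs with high support are the ones whose growth curves can be expected to move together, since many of the same nonprofits are counted under both issues.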

Challenges Faced during the process of Web Mining and Data Analysis:

  1. The main information for every nonprofit was shown in popups. This made it challenging to extract data from the website, since most popular web mining techniques fail to handle popups at a scale such as this.

2. City-wise granular analysis was a challenge, since manually entered names had various spellings for the same city. To overcome this in further use cases, techniques like ‘clustering’ and ‘deduplication’ would be applied to allow a more granular level of analysis.
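As a rough idea of what such clustering of spellings could look like, the sketch below groups similar spellings using string similarity from Python's standard `difflib` module. The sample names, the 0.85 cutoff and the "most frequent spelling wins" rule are all assumptions for illustration, not the approach used on the platform.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical manually entered city names.
raw_cities = [
    "Bangalore", "BANGALORE", "Banglore", "Bangalore ",
    "Bangalor", "Mysore", "Mysor", "Mysore",
]


def normalise(name):
    """Lowercase the name and drop everything except letters."""
    return re.sub(r"[^a-z]", "", name.lower())


def cluster_spellings(names, cutoff=0.85):
    """Map each normalised spelling to the most frequent similar spelling."""
    counts = Counter(normalise(name) for name in names)
    canonical = []  # cluster representatives, most frequent first
    mapping = {}
    for spelling, _ in counts.most_common():
        match = next(
            (c for c in canonical
             if SequenceMatcher(None, spelling, c).ratio() >= cutoff),
            None,
        )
        if match is None:
            # No close representative yet, so this spelling starts a new cluster.
            canonical.append(spelling)
            match = spelling
        mapping[spelling] = match
    return mapping


city_map = cluster_spellings(raw_cities)
for spelling, city in city_map.items():
    print(spelling, "->", city)
```

String similarity catches typos and case differences, but officially renamed cities (Bangalore and Bengaluru, for example) are likely too different as strings for a strict cutoff and would need an explicit alias table.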

sample of raw data

3. Some data collection challenges were unique to the nonprofit sector, especially around their presence on the internet and finding authentic information from websites. Most government and nonprofit websites run on outdated technology that is not very friendly to web scraping techniques.

4. A lot of unstructured raw data was collected, which is not easy to process through traditional methods. Much nonprofit information is in the form of PDFs; we gathered that information and stitched it together with other sources to bring out meaningful insights.

5. Since we were extremely focused on creating a time series analysis as part of our experiment, we put significant effort into formatting the date information correctly, so that nonprofits could be plotted on a graph to show their growth.
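Date clean-up of this kind often comes down to trying a list of known formats in turn. The sketch below shows the pattern; the candidate formats and sample strings are assumptions, since the formats found in the raw data are not listed in this post.

```python
from datetime import datetime

# Candidate formats are assumptions; the formats in the raw data are not listed here.
DATE_FORMATS = ("%d-%m-%Y", "%Y-%m-%d", "%d/%m/%Y")


def parse_registration_year(raw):
    """Return the year from a raw date string, or None if no format matches."""
    text = (raw or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).year
        except ValueError:
            continue  # try the next candidate format
    return None


samples = ["18-01-2023", "2005-03-14", "14/03/1999", "not available"]
print([parse_registration_year(sample) for sample in samples])
```

The order of the formats matters for ambiguous strings such as 01-02-2003; this sketch assumes day-first dates. Values that match no format come back as None, so they can be counted and reviewed rather than silently plotted in the wrong year.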

What can be done further?

  1. Data Enrichment

a. The “key issues” for many NGOs were found missing or marked as “not available”. This data can be enriched with the help of further web crawling, scraping, and data processing.

b. The contact information, websites, etc. for many NGOs were either not available or out of date. This information can be obtained from other sites and then updated in the master data of NGOs.

2. Further analysis of growth and observed trends for NGOs across all the states of India.

3. Prediction on growth of NGOs in the years to come based on past trends.

We will continue to collect and analyze more data on trends in the nonprofit sector in India using our Data Intelligence Platform, applying the platform's capabilities to the social sector domain.

Watch this space for more!

