A community project demonstrating how open source technologies and hosted search can be leveraged to minimize the cost of hosting and sharing philanthropic data
Wait, what?!? You’re telling me you have to go to your local library to access an ONLINE database?!?
As a former COO of tiny technology startups, I’m empathetic to nonprofit Executive Directors and the struggle to balance limited resources, particularly staff time. Thus, I was floored when a good friend of mine, the Executive Director of a relatively small nonprofit, told me she “simply walked to the local library” to pull basic data on prospective donors. What is sometimes just a 30-second search likely took nearly an hour out of her day.
Of course, any practitioner in the nonprofit space knows she was referring to Foundation Directory Online (FDO), the go-to search tool for philanthropic data built and managed by Foundation Center. The reason for the library trek was simple: subscriptions to FDO were too expensive for her to justify.
Sure, the majority of her trips involved more detailed searches (FDO’s taxonomy is extensive and highly valuable to professional fundraisers), but I couldn’t stop thinking about how crazy it seemed to have to walk to a library to access an online database for quick, basic searches.
At the time of our conversation, I had been exploring the IRS Form 990 dataset released last summer and had a subset of the data downloaded and sitting in a database on my laptop. This dataset, for those outside the philanthropy world, contains machine-readable tax filings from nonprofits, from small charities to the largest grantmaking foundations.
My explorations focused exclusively on Form 990PF, the form filed by private foundations. Thus, in theory, I already had a good chunk of the information my Executive Director friend sought out during some of her trips to the library. Information like the average grant size made by the JFR Foundation or who she knew on the board of the Fairbridge Foundation (both are family foundations without websites).
So I asked myself — Could I save my friend a trip to the library with the data I already had on my laptop?
A quick query told me my local database contained information on just over 3 million grants from 68,696 individual foundations. Turns out this is roughly half the data advertised in Foundation Center’s Premium FDO subscription.
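For the curious, a count like that can come from a single SQL statement. The snippet below is only a minimal sketch: the `grants` table, its columns, and the sample rows are made up for illustration and are not the actual schema or data on my laptop.

```python
import sqlite3

# Hypothetical schema: one row per grant, keyed by the foundation's EIN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grants (ein TEXT, recipient TEXT, amount INTEGER)")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO grants VALUES (?, ?, ?)",
    [
        ("000000001", "Example Food Bank", 5000),
        ("000000001", "Example Library", 2500),
        ("000000002", "Example Shelter", 10000),
    ],
)

# One pass over the table: total grants and number of distinct foundations.
grant_count, foundation_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT ein) FROM grants"
).fetchone()

print(f"{grant_count} grants from {foundation_count} foundations")
```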
With that amount of data already indexed and inserted into a local database, how quickly could I get a searchable version online?
The answer was two hours.
Hosted Search = 🤘
Most real software engineers look forward to the challenge of building a search tool. I’m not a real software engineer, and I certainly wasn’t looking forward to wrestling with ranking algorithms or spinning up enough servers to make the whole shebang hum.
Enter Algolia. Algolia, if you’re not familiar, is a startup providing a unique service: hosted search. That means all they do is search. Really freaking fast search. Their team is made up of highly skilled engineers constantly optimizing algorithms to improve search results and squeezing every millisecond out of search time.
I was already familiar with Algolia and its service thanks to an earlier project I built. That project, a quick weekend hack, was built to help my fellow #nptech friends search through the Form 990PF indexes to more easily find specific data files.
At this point I had the data, 80% of the front-end code I needed, and a rough idea of what data might be useful to my Executive Director friend.
All I needed was a cortado and a few hours to pull it all together.
Total cost to build: $60*
*Does not include cost of cortados 😉
Using Jekyll to publish 68,696 foundation profiles
After my cortado-fueled coding binge, Grantmakers.io was a simple search tool providing basic summary data on foundations: legal name, location, and basic grantmaking stats (number of grants, low/high/median amount).
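The grantmaking stats are simple to derive once the grant amounts for a foundation are in hand. Here is a minimal sketch of that calculation; the function name, the output field names, and the sample amounts are assumptions for illustration, not the code behind the site.

```python
from statistics import median


def summarize_grants(amounts):
    """Return count, low, high, and median for one foundation's grant amounts."""
    if not amounts:
        # A foundation with no reported grants gets empty stats.
        return {"grant_count": 0, "grant_min": None,
                "grant_max": None, "grant_median": None}
    return {
        "grant_count": len(amounts),
        "grant_min": min(amounts),
        "grant_max": max(amounts),
        "grant_median": median(amounts),
    }


# Made-up amounts, for illustration only.
example = summarize_grants([5000, 1000, 25000, 10000])
print(example)
```

The median is shown alongside the low and high because a single very large grant can skew an average badly.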
I still had a ton of valuable data sitting in my database (grantee names, grant purposes, trustee lists, etc.), but it was simply too much to squeeze into the search results.
Enter Jekyll + GitHub Pages.
Jekyll is a static site generator perfect for publishing webpages that don’t change. Foundation tax filings were a perfect fit.
Jekyll also has a great collections feature that allows a developer to publish individual pages from large amounts of structured data. I wrote a quick script to “Jekyllize” my data, pieced together a basic template from other open source projects, and after a 15-minute build, I had individual webpages for all 68,696 foundations in the dataset. Thanks to GitHub Pages, the profiles were live a few seconds later.
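To give a sense of what “Jekyllizing” can look like, here is a minimal sketch rather than the actual script. It writes one file per foundation into a collection folder, with the record stored as YAML front matter between `---` lines. The `_foundations` folder name, the field names, and the sample record are all assumptions; values are serialized with `json.dumps` on the assumption that JSON scalars are valid YAML.

```python
import json
import tempfile
from pathlib import Path


def jekyllize(foundations, out_dir):
    """Write one front-matter-only Markdown file per foundation record."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for record in foundations:
        lines = ["---"]
        for key, value in record.items():
            # JSON-encode each value so quotes and colons are escaped safely.
            lines.append(f"{key}: {json.dumps(value)}")
        lines.append("---")
        # Hypothetical naming scheme: one file per EIN.
        path = out / f"{record['ein']}.md"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written


# Made-up record, written to a temporary folder for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    paths = jekyllize(
        [{"ein": "000000000", "name": "Example Foundation", "grant_count": 12}],
        Path(tmp) / "_foundations",
    )
    preview = paths[0].read_text(encoding="utf-8")

print(preview)
```

For Jekyll to render these files as pages, the collection also has to be declared in `_config.yml` with `output: true`, and a layout supplies the shared template.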
Ongoing maintenance cost: none
Getting the word out
It was about this time I realized the true value in what I had just created: structured data that is 100% open, human-readable, and available to anyone. No login or library trip required.
You know who is particularly fond of structured data? Google. You know who does a great job of helping people find information? Google.
I submitted a sitemap containing all 68,696 profile URLs to Google and sat back. The only thing left to do was let the Googlebot do its thing.
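Generating a sitemap for that many pages takes only a few lines. The sketch below is an illustration, not the script used for the site: the URL pattern is a placeholder, and it splits the list into chunks because the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a list of 68,696 would need at least two files (normally tied together with a sitemap index).

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(urls, chunk_size=MAX_URLS_PER_SITEMAP):
    """Return a list of sitemap XML strings, each holding up to chunk_size URLs."""
    sitemaps = []
    for start in range(0, len(urls), chunk_size):
        chunk = urls[start:start + chunk_size]
        entries = "\n".join(
            f"  <url><loc>{escape(url)}</loc></url>" for url in chunk
        )
        sitemaps.append(
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n"
            "</urlset>\n"
        )
    return sitemaps


# Placeholder URLs, for illustration only.
urls = [f"https://example.org/profiles/{i:09d}/" for i in range(68_696)]
files = build_sitemaps(urls)
print(len(files), "sitemap files")
```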
Sure enough, within a few days Google indexed 16,075 filings. In the following 30 days, 1,230 visitors found data on Grantmakers.io via general Google searches.
Certainly not Foundation Center-level traffic, but then again, I’ll spend nothing (literally, zero dollars) hosting and maintaining Grantmakers.io whereas Foundation Center spent $9 million last year on its data collection division.
Built to inspire, not compete
If it sounds like I’m picking on Foundation Center, that’s not my intention. If it sounds like I’m trying to build a site to compete with them, I’m not. Foundation Center provides a valuable service to countless professional fundraisers (and the nonprofit sector at large) and has been doing so since 1959, well before the Internet existed. Want to know the story behind the free library access my Executive Director friend uses from time to time? Do yourself a favor and read up on their history if you’re unfamiliar.
My intention with Grantmakers.io is simply to demonstrate a new way of thinking about philanthropic data accessibility. The Grantmakers.io dataset, along with every line of code, is open source and freely available on GitHub for a reason. I hope the code starts a dialogue between philanthropy insiders and geeks like me, and I look forward to seeing where the community takes it.
Built for philanthropy insiders and hackers alike
If you’re a hacker looking for something interesting to work on, I hope this article piques your interest. If you’re looking for a place to dive in, check out the current list of open issues, or may I suggest the following search on GitHub.
If you’re involved in the charitable support sector, I hope my little project helps push your #OpenData thinking forward. If you work at a foundation, I hope you compare your profile to those of your peers and ask yourself if your organization is putting its best foot forward. If you sit on the board of a foundation and don’t see your organization in the results, I hope you ask why your organization doesn’t yet file electronically.
While my early site traffic pales in comparison to the traffic generated by the Foundation Centers of the world, if just one of those 1,230 visitors from Google was a nonprofit Executive Director, it was well worth it.
- The site’s 69k foundation records exceed the 10k record limit of Algolia’s free community tier. After I explained the non-commercial and open-source nature of the project, Algolia provided an upgrade to their standard tier gratis.