The Education Data Portal is an application programming interface (API) that makes education data from several sources easily accessible to researchers. Earlier this month, we wrote about why we built this tool for researchers, scholars, and policymakers and why we chose an API to do it. In this technical dive, we explore the API’s building blocks, explain why we chose them, and provide a high-level overview of how each component works.
Our first decision in building the API involved how to store the data and which framework users would access them through. In both cases, we went with well-supported, popular, open-source tools. Using open-source tools and technologies reflects our goal of making these data more open. To serve the data, we used a stack that includes Python and, more specifically, the Django framework. To store the data, we chose the workhorse MySQL hosted on Amazon Web Services (AWS).
Our goal was to focus on building a feature-rich API and not on writing and debugging potentially complex implementation code. Time is valuable, and we felt that our developers should spend time adding value through features and functionality and not on reinventing the wheel. To accomplish this, we chose to base our application on the Django framework.
There were several reasons we chose Django as our development framework:
1. Django was developed to quickly implement web applications using the Python programming language. The framework has built-in templates, libraries, and an API that were designed to work well together.
2. Django uses Python, an easy-to-use and popular programming language among our research programmers and data scientists. Using Python helps us collaborate more closely and build great products across teams.
3. Django scales to projects of any size. Because of this scalability, it’s perfect for the projects we see at Urban.
4. Django is well documented and widely used. Developers can get the information they need when they need it.
5. Django can be extended using application plug-ins and Python packages. Between general Python packages and Django-specific implementations, this framework provides a great foundation and plenty of flexibility.
One of the key add-ons we used was the Django REST framework (DRF), a well-tested API framework that forms the data portal’s foundation. DRF is customizable, has extensive documentation and a community of supporters, and provides a straightforward implementation that is configurable and consistent.
After choosing the framework, we decided on a database platform to store the data. The database behind the Education Data Portal is large (it holds more than 150 data tables) and is expected to continue growing. It contains relational data on various levels that the API needs to access. We chose to host the database on Amazon Relational Database Service (RDS) using MySQL as the database software.
Using AWS and MySQL has four primary benefits:
1. We do not have to host the infrastructure on-site, and we pay only for what we need, which lowers costs and ensures reliable service uptime.
2. RDS is highly available and scalable as the amount of data grows.
3. The AWS instance is secure within our virtual private cloud, comes with monitoring tools, and has powerful backup and restore capabilities.
4. Django’s framework has the libraries necessary to integrate easily with MySQL.
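As a rough illustration of the last point, a Django settings fragment for a MySQL database might look like the sketch below. The engine path `django.db.backends.mysql` is Django's built-in MySQL backend (it needs a MySQL driver such as mysqlclient installed); the environment variable names and default values are hypothetical placeholders, not the portal's actual configuration.

```python
import os

# Hypothetical settings.py fragment. All environment variable names and
# defaults below are assumptions made for illustration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",  # Django's built-in MySQL backend
        "NAME": os.environ.get("EDP_DB_NAME", "education"),
        "USER": os.environ.get("EDP_DB_USER", ""),
        "PASSWORD": os.environ.get("EDP_DB_PASSWORD", ""),
        # In production this would be the RDS endpoint inside the private cloud.
        "HOST": os.environ.get("EDP_DB_HOST", "localhost"),
        "PORT": os.environ.get("EDP_DB_PORT", "3306"),
    }
}
```

Reading credentials from the environment keeps secrets out of source control, which matters when the codebase is shared across teams.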
The Four Primary Parts of the API
Most Django-based API products have four primary pieces, and the Education Data Portal is no different. They are as follows:
1. Database models that leverage Django’s object-relational mapping (ORM) to do the heavy lifting of interacting with the database.
2. Viewsets that define the data and apply filters.
3. Serializers that convert the data stored in the database into a format the API can use.
4. Routers that define our endpoints (datasets, in this case).
Once this four-part structure is built, defining and building new endpoints becomes quick and simple. Because the code is broken up into these discrete components, it is easier to maintain and debug.
Database models define the database’s structure for Django and provide an ORM to interact with the database. The ORM maps database tables to Python classes and provides tools to interact with the resulting objects. To define your database, create a models.py file and create a class for each table you will use in your Django app. Each model class needs a name, a set of fields (class variables), and a Meta class that holds metadata. The field names and types should match your database, and a primary key must be defined. For example, the code below defines the model for Integrated Postsecondary Education Data System (IPEDS) admissions data.
Once a model is defined and registered, you interact with the data through Django in Python rather than in SQL. Data are returned as querysets, which are lazily evaluated, list-like collections of objects (rows) based on the model. For example, the following code returns a queryset of all the IPEDS admissions enrollment records in the database:
The viewset is a basic building block provided by the Django REST framework and is generally used with routers to provide an easy way to implement standard behavior and consistent URLs (more on routers below). Django and the DRF provide several tools to create URL patterns and views of your data, but because the Education Data Portal contains a large amount of data, we opted for the router-viewset structure to ensure a standardized implementation for the API.
In the code below, we use the ReadOnlyModelViewSet class because our API is read-only (people can access the data but cannot add to or edit them). For the viewset, we only need to include a queryset of data (in this case, data from the IpedsAdmissionEnrollments model) and a serializer (in this case, a specially designed serializer for the IPEDS admissions data). To return all records from the IPEDS admissions data table, we need an implementation in views.py like the following code:
Serializers convert the data stored in the database into a format the API can use. There are several serializers available through the DRF, but for this API, we used the ModelSerializer, which automatically creates a serializer whose fields map directly to a model’s fields. To create the ModelSerializer, we only have to define the model and specify which fields should be included. Using the '__all__' keyword (fields = '__all__') includes all the model variables, or a user can specify which model fields should be included. In our example below, we list a specific set of variables to include in our data.
Finally, we need to define the URL that users will see when accessing the data. Typically, we would define our APIs so they adhere to a URL pattern. For example, we would traditionally define a list, detail, and edit route for each endpoint. Accessing these endpoints would list all the data available, allow a user to drill down into a single record detail, or allow a user to edit the single record detail.
But the Django REST framework provides a standardized way to add URL routes to Django without requiring the developer to define list, detail, and edit routes for each pattern. Because we are using the ReadOnlyModelViewSet class, the DRF router creates two URLs for each route we define: a list route and a detail route. To create and register the URL in the router, we provide the URL pattern (‘college-university/ipeds/admissions-enrollment’) and the viewset that is called when this URL is accessed (IpedsAdmissionsEnrollmentsViewSet). Once defined, this router creates two URLs for our API:
1) college-university/ipeds/admissions-enrollment/, which lists all the records in the admissions-enrollment data table
2) college-university/ipeds/admissions-enrollment/[ID], which provides a detail route based on the primary key defined in the model. In other words, it lists information pertaining to a single record; in this case, a single enrollment figure for a school.
This is the basic framework we used to build the endpoints for the Education Data Portal. We implemented many more building blocks to provide the functionality we needed to make the API useful. Over the coming months, we’ll publish a follow-up to this post with more detail on the process for building the API for the Education Data Portal.