Untapped Data Wealth of wichealth.org

John Brusk
Published in Oh, Behave!
Jan 20, 2018

Every minute, over 300,000 Facebook users update their statuses and over 50,000 share a link with their social network. The data left by this trail of electronic human interaction, and by the techno-mediated sociological phenomenon it has produced, known as “social networking”, is now used by corporations around the world to pursue business objectives through data-driven, user-tailored action.

At the same time, the diffusion of electronic health applications is accelerating: more than 300,000 are currently available. Hundreds of research studies have demonstrated that the US healthcare system could save billions of dollars per year in avoidable healthcare costs through the dissemination of digital health management applications. These applications are generating large volumes of data at high velocity. More importantly, this data can be fed back into the applications to improve their quality.

As a digital health education innovation, wichealth.org has demonstrated that it can manage content efficiently and deliver the most current information at scale. Since 2010, over three million wichealth.org lessons have been completed by over 800,000 unique clients of the Women, Infants and Children (WIC) program. During their lessons, these clients have favorited, liked, shared, and/or commented on a variety of online healthy eating and physical activity learning resources over 750,000 times. In the last year, nearly 60% of all wichealth.org users completed their lessons on a mobile device, which opens an opportunity to extend user interaction throughout the day via text messages, email, and applications that curate a continuous flow of information.

Wichealth.org is certainly not Facebook, yet WIC clients currently complete two lessons every minute. The data collected during these interactions has been integral to developing new and improved content, such as prioritizing resources based on a user’s personal profile. This information has also kept wichealth.org performance consistent over time by enabling the monitoring of key performance statistics, such as progression through the health behavior stages of change and clients’ belief in their ability to engage in recommended behaviors (see Figure 1).

Figure 1. wichealth.org Key Performance Statistics for Top 5 Lessons, October 2016 — September 2017

Link engagement, a metric recorded by wichealth.org, is defined as the proportion of completed lessons in which the client liked, favorited, or shared an online resource the system directed them to. Nearly 10% of wichealth.org users engage with at least one resource they access while completing their lesson.
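The link engagement definition above translates directly into a simple calculation. The sketch below is a minimal illustration, not wichealth.org’s actual implementation: the `CompletedLesson` record and the reaction labels are hypothetical stand-ins for however the system really stores this data. Comments are deliberately excluded, because the definition counts only likes, favorites, and shares.

```python
from dataclasses import dataclass, field

# Reactions that count toward link engagement under the definition above.
# The string labels are hypothetical.
ENGAGING_REACTIONS = {"like", "favorite", "share"}


@dataclass
class CompletedLesson:
    """Hypothetical record of one completed lesson."""

    lesson_id: str
    # (resource_id, reaction) pairs for resources the system directed the client to.
    reactions: list[tuple[str, str]] = field(default_factory=list)


def link_engagement(lessons: list[CompletedLesson]) -> float:
    """Proportion of completed lessons with at least one like, favorite, or share."""
    if not lessons:
        return 0.0
    engaged = sum(
        1
        for lesson in lessons
        if any(reaction in ENGAGING_REACTIONS for _, reaction in lesson.reactions)
    )
    return engaged / len(lessons)


if __name__ == "__main__":
    sample = [
        CompletedLesson("lesson-1", [("recipe-video", "like")]),
        CompletedLesson("lesson-2", []),
        CompletedLesson("lesson-3", [("snack-guide", "comment")]),  # comments do not count
        CompletedLesson("lesson-4", [("meal-planner", "share"), ("meal-planner", "like")]),
    ]
    print(link_engagement(sample))  # 0.5: two of four lessons had an engaging reaction
```

Because the metric is a per-lesson proportion, a lesson with several likes counts the same as a lesson with one; counting total reactions instead would answer a different question.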

A decade after wichealth.org came into existence, the first International Conference on Learning Analytics and Knowledge was held in response to the rapid movement of educational offerings into the online space, with the “purpose of understanding and optimizing learning and the environments in which it occurs.” Online educational environments, with their ability to generate large amounts of data, have led to the emergence of a new discipline known as “Educational Data Mining” (EDM). The Educational Data Mining community website defines EDM as an “emerging discipline, concerned with developing methods for exploring the unique and increasingly large-scale data that come from educational settings and using those methods to better understand students, and the settings which they learn in.”

Although the success of wichealth.org as a health education program is well documented, the opportunity to apply EDM and learning analytics to wichealth.org data has only begun to be tapped. Fifteen years of ongoing quality improvement and innovative content development testify to the wichealth.org team’s commitment to using user interaction and feedback to deliver impactful online health education. However, a vast opportunity remains for more advanced analytics that leverage all of the data available from wichealth.org users. Potentially valuable learning analytics research topics include:

  • geospatial differences, such as urban versus rural use
  • natural language processing of the nearly one million comments users have left to date
  • the impact of tuning visual aids, such as the virtual host’s photo or video gestures
  • longer-term cohort studies of repeat users
  • network analysis of user interaction and targeted outcomes

The last of these may hold the greatest potential for predicting, and potentially increasing, the probability that a user engages in a particular behavior. Most of the analytics conducted on wichealth.org data to date have focused on lesson completion statistics. However, the patterns in which resources are accessed, and the sequence of user reactions to them (likes, favorites, shares, and comments), have not been well described with respect to whether they influence users and, if so, which types of users. The network diagram in Figure 2 was constructed from the last year of completed lessons for the wichealth.org “Make Meals and Snacks Simple” lesson. Each dot is a “node”, in this case a resource available to lesson users. Each line is an “edge”, a directed path from one resource to another.
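A directed, weighted network like the one in Figure 2 can be built by counting transitions between consecutively opened resources. This is a minimal sketch under stated assumptions: each completed lesson is assumed to yield the ordered list of resource IDs the user opened, and the resource names in the example are hypothetical, not the lesson’s actual resources.

```python
from collections import Counter


def build_path_network(sessions: list[list[str]]) -> Counter:
    """Count directed transitions between consecutively opened resources.

    Each session is the ordered list of resource IDs one user opened while
    completing the lesson (an assumed input format). The result maps each
    directed edge (from_resource, to_resource) to the number of times users
    took that path, which can serve as the edge weight.
    """
    edges: Counter = Counter()
    for session in sessions:
        # Pair each resource with the one opened immediately after it.
        for src, dst in zip(session, session[1:]):
            if src != dst:  # skip re-opening the same resource
                edges[(src, dst)] += 1
    return edges


if __name__ == "__main__":
    sessions = [
        ["recipes", "snack_ideas", "meal_planner"],
        ["recipes", "snack_ideas"],
        ["meal_planner", "recipes"],
        ["snack_ideas", "snack_ideas", "meal_planner"],
    ]
    network = build_path_network(sessions)
    # Nodes are the resources that appear in at least one transition.
    nodes = sorted({resource for edge in network for resource in edge})
    print(nodes)
    for (src, dst), weight in network.most_common():
        print(f"{src} -> {dst}: {weight}")
```

Raw counts keep edge weight proportional to traffic; dividing each node’s outgoing counts by their total would instead give transition probabilities, a natural input for the stratified comparisons between user groups.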

Figure 2. wichealth.org Network Diagram of Resource Link Paths, October 2016 — September 2017

The weight of each edge is proportional to the number of users who took that particular path from one resource to another. Even in this very simple network representation, a dominant triangle among three heavily used resources stands out as a common pathway through the lesson. Stratifying the analysis in more sophisticated ways, for example by type of user, may reveal pathways more likely to achieve a desired educational goal. Measures and methods for describing user path networks between resources include:

  • PageRank: an algorithm developed by Google to quantify the importance of an internet resource from the links pointing to it
  • Clustering coefficient: the fraction of a node’s neighbors that are also connected to one another
  • Density: the proportion of all possible connections between nodes that are actually present in the network
  • Centrality: a family of measures of the relative importance of nodes (or edges) within a network
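The four measures above can all be computed from the same kind of weighted edge list. The sketch below implements textbook versions in plain Python so the definitions are visible; the edge data is made up, the clustering coefficient is computed on the undirected, unweighted version of the graph, and in-degree centrality stands in for the broader family of centrality measures. In practice a library such as NetworkX (`pagerank`, `density`, `clustering`, `in_degree_centrality`) would be the more likely choice, and its conventions for directed graphs can give somewhat different numbers.

```python
from collections import defaultdict

# Weighted, directed resource-path network:
# (from_resource, to_resource) -> number of times users took that path.
Edges = dict[tuple[str, str], int]


def nodes_of(edges: Edges) -> list[str]:
    return sorted({node for edge in edges for node in edge})


def density(edges: Edges) -> float:
    """Share of the n * (n - 1) possible directed edges that are present."""
    n = len(nodes_of(edges))
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(edges: Edges, node: str) -> float:
    """Local clustering on the undirected, unweighted graph: the fraction of
    pairs of the node's neighbors that are themselves connected."""
    neighbors_of = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors_of[u].add(v)
            neighbors_of[v].add(u)
    neighbors = sorted(neighbors_of[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1
        for i, a in enumerate(neighbors)
        for b in neighbors[i + 1:]
        if b in neighbors_of[a]
    )
    return linked_pairs / (k * (k - 1) / 2)


def in_degree_centrality(edges: Edges) -> dict[str, float]:
    """Number of distinct resources leading into each node, divided by n - 1."""
    nodes = nodes_of(edges)
    n = len(nodes)
    counts = dict.fromkeys(nodes, 0)
    for _, v in edges:
        counts[v] += 1
    return {node: (c / (n - 1) if n > 1 else 0.0) for node, c in counts.items()}


def pagerank(edges: Edges, damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Weighted PageRank by power iteration. A node with no outgoing paths
    spreads its score evenly over all nodes."""
    nodes = nodes_of(edges)
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = dict.fromkeys(nodes, 0.0)
    for (u, _), w in edges.items():
        out_weight[u] += w
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for (u, v), w in edges.items():
            if out_weight[u] > 0:
                # Each node passes its score along its paths in proportion to traffic.
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Made-up weights: a heavily used triangle A -> B -> C -> A plus a side path.
    edges = {("A", "B"): 50, ("B", "C"): 40, ("C", "A"): 30, ("A", "D"): 5}
    print("density:", round(density(edges), 3))
    print("clustering(A):", round(clustering_coefficient(edges, "A"), 3))
    print("in-degree centrality:", in_degree_centrality(edges))
    print("pagerank:", {k: round(v, 3) for k, v in pagerank(edges).items()})
```

In this toy network the triangle resources A, B, and C score well above the side resource D on PageRank, which is the kind of signal that would flag a dominant pathway like the one visible in Figure 2.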

Many opportunities exist to mine wichealth.org data for untapped value that can drive high-quality, personally tailored health education. Expanding the use of big data tools capable of running computationally demanding methods such as natural language processing is one of the first steps. Cloud services such as Amazon Web Services (AWS) let anyone use high-performance statistical and machine learning libraries at an affordable cost. The value buried in the interactions recorded across millions of completed wichealth.org lessons may lead to further innovations in electronically mediated health education that help support better health outcomes and eliminate avoidable medical expenses.

Epidemiologist and data scientist with over 20 years of experience in innovative, technology-driven health services delivery measurement and evaluation.