Of Hypergraphs, Programming Pairs, and Personal Learning Networks, Oh My! (Part 2 of 2)
Exploring the Research and Data Visualization Designs of the GitHub Copilot for Disabled Developers Program
In the first part of this article, I laid out the basis for a Faceted extension to the PAOH (Parallel Aggregated Ordered Hypergraph) notation of hypergraphs. Using the analogy of atoms and molecules, part one focused on the Person-atoms of the design for the proposed GitHub Copilot for Disabled Developers research study and support program. In this part, I present a means to shift the focus of visualization and analytics in my research design to the Programming Pair-molecules of this proposed study.
Note: GitHub Copilot is a machine-learning-based, code-writing assistive technology for programmers with the potential to be a life-changing productivity tool for disabled developers. For more on GitHub Copilot, go here. To learn more about the proposed GitHub Copilot for Disabled Developers research and support program, go here.
Atoms and Molecules as Part-Subpart Compositions
In the first part of this article I recalled my prior exposure decades ago to the concept of hypergraphs related to documentation writing for two Macintosh software programs; one a decision support system based on PetriNets, and the other a visual programming language. In both cases, these programs used a composition technique to move from one level of part-subpart understanding to another. I find this way of thinking about hypergraphs useful for moving between the Person and Programming Pair subjects of my proposed research design.
The easiest way to understand this Pair aggregation perspective is to think of hypergraph hyperedges as overlapping subgraphs of the hypergraph. Taking each hyperedge in isolation from the other hyperedges, it is clear that the vertices of a hyperedge form a subset of the full hypergraph's vertices, and in that sense each hyperedge can be treated as a subgraph. For example, in Figure 1 the Person Subjects labeled P1, P2, P6, P7, P11, P12, P16, P17 in the Homogeneous (Matched) Pairs hyperedge of my Programming Pair composition of the proposed GitHub Copilot for Disabled Developers program form a subgraph of the full set of 20 Person subjects of this design.
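As a minimal sketch of this subset view, a hyperedge can be modeled as a plain Python set of Person labels. The membership of the Homogeneous (Matched) Pairs hyperedge is taken from the text above. The function names `is_valid_hyperedge` and `incidence_row` are hypothetical helpers introduced only for illustration, not part of any existing PAOH library.

```python
# All 20 Person subjects of the design, labeled P1..P20 as in the article.
persons = {f"P{i}" for i in range(1, 21)}

# Hyperedge name -> set of member Persons. Only the one hyperedge whose
# membership is spelled out in the text is included here.
hyperedges = {
    "Homogeneous (Matched) Pairs": {
        "P1", "P2", "P6", "P7", "P11", "P12", "P16", "P17",
    },
}


def is_valid_hyperedge(members, vertices):
    """A hyperedge is valid when its members are a subset of the vertices."""
    return members <= vertices


def incidence_row(person, edges):
    """Return, for one Person row, which hyperedge columns contain it."""
    return {name: person in members for name, members in edges.items()}


matched = hyperedges["Homogeneous (Matched) Pairs"]
print(is_valid_hyperedge(matched, persons))
print(incidence_row("P1", hyperedges))
print(incidence_row("P3", hyperedges))
```

Each `incidence_row` result corresponds to one Person row of a PAOH-style diagram, with one true/false entry per hyperedge column.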
When we take this grouping perspective down to the level of the composition of the ten unique Programming Pairs of this research design, we can represent each pair using a Venn diagram visualization as shown in Figure 2.
To see how this Pair-wise “molecular” composition can be used to transform the visualization of our “atomic” member-wise PAOH expression of the GitHub Copilot for Disabled Developers research design, we start by rearranging the Person rows of our diagram. In Figure 3 we have reordered the Person rows to bring the Programming Pair members next to each other.
Note how the Person row reordering — callout 1 in Figure 3 — affects the previously neatly ordered partitioning of the Physical and Coding Abilities hyperedge columns — callout 2 in Figure 3. With this reordering of the Person-atoms of our research design, we have all the information needed to encode a set of ten icons that visually represent the atomic composition of our design’s Programming Pair molecules as shown in Figure 4.
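The row reordering described above can be sketched in a few lines of Python. This is an illustration only: the pair assignments in the example are hypothetical placeholders rather than the actual pairings of the research design, and `reorder_by_pair` is a name made up for this sketch.

```python
def reorder_by_pair(person_order, pairs):
    """Reorder Person rows so the two members of each pair are adjacent.

    The first member encountered in the original order keeps its relative
    position, and its partner is pulled up to sit directly after it.
    """
    partner = {}
    for a, b in pairs:
        partner[a] = b
        partner[b] = a

    ordered = []
    seen = set()
    for person in person_order:
        if person in seen:
            continue
        ordered.append(person)
        seen.add(person)
        mate = partner.get(person)
        if mate is not None and mate not in seen:
            ordered.append(mate)
            seen.add(mate)
    return ordered


# Hypothetical example with four Persons and two pairs.
original_rows = ["P1", "P2", "P3", "P4"]
example_pairs = [("P1", "P3"), ("P2", "P4")]
print(reorder_by_pair(original_rows, example_pairs))
```

Persons that belong to no pair simply keep their place in the sequence, so the same helper works for partially paired designs.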
From Atomic to Molecular Data Analysis
Now that we have a visual encoding for a set of ten icons that depict the hypergraph composition of the Programming Pairs of the GitHub Copilot for Disabled Developers research design, we can take a look at how these icon-encodings can be used in molecular-level analysis of the proposed research. To do this — as shown here in Figure 5 — let’s revisit the visualization of the synthetic data example of a Degree of Satisfaction measure by Persons participating in the program.
As you can see, this data is difficult to interpret when viewed at the atomic-level measure of individual Person subjects of the proposed research/support program. A simple computation of the Pair-level Mean and Difference measures of this Person-level data will move our analysis from the atomic level of individual subjects to the molecular level of our ten Programming Pairs of this research design, as shown in Figure 6.
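The Pair-level computation is simple enough to sketch directly. The example below assumes each Person reports a single numeric Degree of Satisfaction score and that the Difference measure is the absolute difference between the two members' scores. The pair labels, the pairings, and the scores are hypothetical values chosen for illustration, not the synthetic data behind Figures 5 and 6.

```python
from statistics import mean


def pair_measures(scores, pairs):
    """Compute the Mean and Difference of member scores for each pair.

    scores: dict mapping Person label -> satisfaction score
    pairs:  dict mapping pair label -> (member_a, member_b)
    """
    result = {}
    for pair_id, (a, b) in pairs.items():
        result[pair_id] = {
            "mean": mean([scores[a], scores[b]]),
            "difference": abs(scores[a] - scores[b]),
        }
    return result


# Hypothetical pairings and scores, for illustration only.
pairs = {"Pair-A": ("P1", "P2"), "Pair-B": ("P3", "P4")}
scores = {"P1": 8, "P2": 6, "P3": 10, "P4": 1}

measures = pair_measures(scores, pairs)
for pair_id, m in measures.items():
    print(pair_id, m["mean"], m["difference"])
```

The two resulting numbers per pair are exactly what is needed to place a Pair icon on a 2-axis Mean-by-Difference facet.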
With the Mean and Difference measures for the Programming Pairs of our proposed research study, we can easily project this molecular-level data onto a 2-axis facet using our ten Pair-composition icons as shown in Figure 7.
From Data Analysis to Personal Learning Network Mentoring Insights
The bulk of the content in this two-part article has centered on the atomic and molecular levels of the data model for the proposed GitHub Copilot for Disabled Developers research and support program. I have only briefly mentioned the organism level of this program. That is, both the Person-atoms and the Programming Pair-molecules are subparts of a whole which we can characterize as a Personal Learning Network, or PLN.
The members of the GitHub Copilot for Disabled Developers program will periodically meet as a group via Microsoft Teams (or similar online meeting technology) to provide quantitative and qualitative feedback on how effectively, and by what techniques, participants use GitHub Copilot to facilitate their personal learning and mentoring activities. In addition, these network-level group meetings will support members’ abilities to increase the effectiveness of the program’s contributions to participating Digital Humanities, Citizen Science and Open Source Software projects.
In this regard we can demonstrate how the real-time, ongoing collection and analysis of participant feedback can be used to manage the agendas for the PLN online meetings and facilitate program managers’ ability to enhance individual members’ and Programming Pairs’ satisfaction and learning through participation in the proposed research and support program. To see how our data model can be used in this program management manner, let’s refer to the Degree of Satisfaction data visualization of Figure 7.
By collecting the Degree of Satisfaction feedback data, timely analysis can be done to identify participants and Pairs for focused support and additional program-improving feedback. If we were to see the synthetic data of Figure 7 on a weekly report of the program’s feedback system, we would naturally want to look for Pairs that fall in the “danger zone” quadrant, where both Pair members share a sense of disappointment with their participation in the program. That is, we would be looking for Pairs whose Mean and Difference measures place their Pair icon in the top-left quadrant as highlighted in Figure 8.
In addition to supporting Pairs with shared levels of strong dissatisfaction with participation in the program, we would be on the lookout for Pairs that show a large difference of opinion in the members’ satisfaction with the program. We would, for example, want to have a chat with the Pair found in the row for a Difference measure of 9 in Figure 8. One member of this Pair is extremely satisfied with the program while the other is categorically disappointed with their experience.
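Both kinds of follow-up described above can be expressed as simple rules over the Pair-level Mean and Difference measures. The sketch below is one possible way to do this, not the program's actual reporting logic. The threshold values (`low_mean`, `low_diff`, `high_diff`), the pair labels, and the sample measures are all assumptions chosen for illustration.

```python
def flag_pairs(measures, low_mean=4.0, low_diff=2, high_diff=6):
    """Flag pairs that may need focused support.

    measures: dict mapping pair label -> {"mean": ..., "difference": ...}
    The thresholds are hypothetical cutoffs, not values from the study.
    """
    flags = {}
    for pair_id, m in measures.items():
        if m["mean"] <= low_mean and m["difference"] <= low_diff:
            # Both members report low satisfaction: the "danger zone".
            flags[pair_id] = "shared dissatisfaction"
        elif m["difference"] >= high_diff:
            # Members strongly disagree about their experience.
            flags[pair_id] = "large difference of opinion"
    return flags


# Hypothetical weekly report data, for illustration only.
weekly_measures = {
    "Pair-A": {"mean": 2.0, "difference": 1},
    "Pair-B": {"mean": 5.5, "difference": 9},
    "Pair-C": {"mean": 8.0, "difference": 1},
}

weekly_flags = flag_pairs(weekly_measures)
for pair_id, reason in weekly_flags.items():
    print(pair_id, "->", reason)
```

In practice the cutoffs would need to be tuned against the real response scale and reviewed by the program managers. The point here is only that the quadrant reading of the facet reduces to a pair of comparisons.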
In Conclusion and Next Steps
I started this article as an experience report that I could share primarily with prospective research collaborators and possible funders. While the proposed GitHub Copilot for Disabled Developers program would seem to be a worthwhile endeavor as a potential life-changing support program in its own right, I wanted to explore how a pilot of this program could also be designed as a valuable research study to contribute to the domain of occupational and life-affirming therapy for disabled people, especially those recovering from catastrophic physical injuries such as my July 2020 spinal cord injury.
To achieve this exploratory goal, I planned to look at the various ways that we could use this program to capture and analyze participant insights and feedback to provide justification for funding the ongoing operation of this support program. If my tragic personal experience could be the catalyst for the creation and maintenance of the GitHub Copilot for Disabled Developers program, that would be a tangible way to find personal peace and fulfillment with having suffered this horrible injury.
Before digging into this exploration of the data model for the pilot of this proposed research study, I had thought that I would only need to dust off my knowledge of standard research design and statistics. I did not think that this exploration would lead me to think deeply about hypergraph data models and the challenges of their visualization. I certainly did not expect that I would be envisioning and documenting a faceted extension to the previously unknown-to-me PAOH (Parallel Aggregated Ordered Hypergraph) model. So even if my efforts go nowhere to find funding and collaboration to pilot the GitHub Copilot for Disabled Developers program, I feel like this two-part article is a useful contribution to the data science domain that is finding increased relevance and application in today’s research and technology development communities.
As to next steps, I have two threads to pursue. First, I will write a letter of admiration and self-introduction to the five Aviz researchers (Paola Valdivia, Paolo Buono, Catherine Plaisant, Nicole Dufournaud, and Jean-Daniel Fekete) who created the amazing PAOH model. I will thank them for freely sharing their inspiring ideas, point them to this two-part article, and seek their feedback to correct any misstatements or muddled thinking I may have expressed in describing the Faceted PAOH model extension of their work. And I will ask them whether there is an Aviz researcher or student who would be interested in collaborating to coauthor a proper scientific paper based on the ideas in this experience report.
My second thread of follow-up activity will be to do some prototype development of a Python library to programmatically generate Faceted PAOH data models and provide convenient ways to capture and analyze the data of such models. While my initial exploratory investigation was well served by the features of a standard PC spreadsheet, I know that the powerful set and matrix functions and related features of Python libraries such as NumPy, Pandas, and SciPy will greatly ease the use of the Faceted PAOH model in diverse research designs. And this activity, by the way, will get me away from writing about this proposed project to actually using the amazing GitHub Copilot technology. Copilot is truly a joy to experience as it supercharges my ability to develop software despite my unfortunate injury.
Until next time… thank you for reading this extended two-part article. And, as always, I welcome feedback on the article itself as well as communication with Kindred Spirits who may be interested in collaborating to fund or implement the GitHub Copilot for Disabled Developers research and support program.
Jim Salmons is a seventy-one-year-old post-cancer Digital Humanities Citizen Scientist. His primary research is focused on the development of a Ground Truth Storage format providing an integrated complex document structure and content depiction model for the study of digitized collections of print-era magazines and newspapers. A July 2020 fall at home resulted in a severe spinal cord injury that has dramatically compromised his manual dexterity and mobility.
Jim was fortunate to be provided access to the GitHub Copilot Technology Early Access Community during his initial efforts to get back to work on the Python-based tool development activities of his primary research interest. Upon experiencing the dramatic positive impact of GitHub Copilot on his own development productivity, he became passionately interested in designing a research and support program to investigate and document the use of this innovative programming assistive technology for use by disabled developers.