Of Hypergraphs, Programming Pairs, and Personal Learning Networks, Oh My! (Part 2 of 2)

Exploring the Research and Data Visualization Designs of the GitHub Copilot for Disabled Developers Program

Jim Salmons
GitHub Copilot for Disabled Developers
9 min read · Mar 10, 2022

In the first part of this article, I laid out the basis for a Faceted extension to the PAOH (Parallel Aggregated Ordered Hypergraph) notation of hypergraphs. Using the analogy of atoms and molecules, part one focused on the Person-atoms of the design for the proposed GitHub Copilot for Disabled Developers research study and support program. In this part, I present a means to shift the focus of the visualization and analytics of my research design to the Programming Pair-molecules of this proposed study.

Note: GitHub Copilot is a machine-learning-based assistive code-writing technology for programmers with the potential to be a life-changing productivity tool for disabled developers. For more on GitHub Copilot, go here. To learn more about the proposed GitHub Copilot for Disabled Developers research and support program, go here.

Atoms and Molecules as Part-Subpart Compositions

In the first part of this article I recalled my exposure, decades ago, to the concept of hypergraphs while writing documentation for two Macintosh software programs: one a decision support system based on Petri nets, and the other a visual programming language. Both programs used a composition technique to move from one level of part-subpart understanding to another. I find this way of thinking about hypergraphs useful for moving between the Person and Programming Pair subjects of my proposed research design.

The easiest way to understand this Pair aggregation perspective is to think of the hyperedges of a hypergraph as overlapping subgraphs. Taking each hyperedge in isolation from the others, it is clear that the vertices of a hyperedge form a subset of the full graph’s vertices. For example, in Figure 1 the Person subjects labeled P1, P2, P6, P7, P11, P12, P16, and P17 in the Homogeneous (Matched) Pairs hyperedge of my Programming Pair composition of the proposed GitHub Copilot for Disabled Developers program form a subgraph of the full set of 20 Person subjects of this design.

Figure 1. Taking each colored hyperedge of this Faceted PAOH perspective on the proposed GitHub Copilot for Disabled Developers research design, the Person subjects of that edge form a subgraph of the full set of Person subjects participating in the study.
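The idea that each hyperedge picks out some of the Person vertices can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the dictionary-of-sets encoding and the helper names `is_hyperedge_of` and `incidence_rows` are hypothetical, and only the membership of the Homogeneous (Matched) Pairs hyperedge comes from the example above.

```python
# Hypothetical encoding of a hypergraph: each hyperedge is a named set of vertices.
persons = {f"P{i}" for i in range(1, 21)}  # the 20 Person subjects of the design

hyperedges = {
    # Membership copied from the Figure 1 example in the text.
    "Homogeneous (Matched) Pairs": {
        "P1", "P2", "P6", "P7", "P11", "P12", "P16", "P17",
    },
}


def is_hyperedge_of(members, vertices):
    """A hyperedge is valid when all of its members are vertices of the graph."""
    return members <= vertices


def incidence_rows(vertices, edges):
    """Build a PAOH-like table: one row per vertex, one boolean per hyperedge."""
    ordered = sorted(vertices, key=lambda p: int(p[1:]))  # P1, P2, ..., P20
    return {p: [p in members for members in edges.values()] for p in ordered}


rows = incidence_rows(persons, hyperedges)
print(is_hyperedge_of(hyperedges["Homogeneous (Matched) Pairs"], persons))
print([p for p, flags in rows.items() if flags[0]])
```

Each row of `rows` corresponds to a Person row of the PAOH diagram, and each position in its list corresponds to one hyperedge column.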

When we take this grouping perspective down to the level of the composition of the ten unique Programming Pairs of this research design, we can represent each pair using a Venn diagram visualization as shown in Figure 2.

Figure 2. This animated GIF shows the ten unique compositions of the Programming Pairs in the proposed GitHub Copilot for Disabled Developers research design. On the left is a Venn diagram of the hyperedge composition of a Pair. On the right is a compact icon expression of this composition.

To see how this Pair-wise “molecular” composition can be used to transform the visualization of our “atomic” member-wise PAOH expression of the GitHub Copilot for Disabled Developers research design, we start by rearranging the Person rows of our diagram. In Figure 3 we have reordered the Person rows to bring the Programming Pair members next to each other.

Figure 3. Reordering the Person rows to bring the members of the ten Programming Pairs next to each other.

Note how the Person row reordering — callout 1 in Figure 3 — affects the previously neatly ordered partitioning of the Physical and Coding Abilities hyperedge columns — callout 2 in Figure 3. With this reordering of the Person-atoms of our research design, we have all the information needed to encode a set of ten icons that visually represent the atomic composition of our design’s Programming Pair molecules as shown in Figure 4.

Figure 4. By rearranging the Person subjects of Figure 1 so the Programming Pair members are next to each other (1), the hyperedges of the Abilities edges are scrambled (2) but this gives us an arrangement that prescribes the encoding of the ten Pair icons shown along the diagonal of the main Pair Combinations grid.
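The trade-off described here, where bringing Pair members together scrambles a previously tidy hyperedge column, can be shown with a toy example. This is a sketch under stated assumptions: the six Persons, the three pairings, and the "Novice" hyperedge are invented for illustration and do not reproduce the actual design in the figures.

```python
def reorder_rows(pairs):
    """Flatten the pairs into a row order in which partners are adjacent."""
    order = []
    for first, second in pairs:
        order.extend([first, second])
    return order


def is_contiguous(order, members):
    """True when the members of a hyperedge occupy one unbroken run of rows."""
    positions = sorted(order.index(m) for m in members)
    return positions == list(range(positions[0], positions[0] + len(positions)))


# Hypothetical toy data (not taken from the figures).
original_order = ["P1", "P2", "P3", "P4", "P5", "P6"]
novice_edge = {"P1", "P2", "P3"}                      # an invented Abilities hyperedge
pairs = [("P1", "P4"), ("P2", "P5"), ("P3", "P6")]    # invented pairings

paired_order = reorder_rows(pairs)
print(paired_order)
print(is_contiguous(original_order, novice_edge))  # neatly partitioned before
print(is_contiguous(paired_order, novice_edge))    # scrambled after reordering
```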

From Atomic to Molecular Data Analysis

Now that we have a visual encoding for a set of ten icons that depict the hypergraph composition of the Programming Pairs of the GitHub Copilot for Disabled Developers research design, we can take a look at how these icon-encodings can be used in molecular-level analysis of the proposed research. To do this — as shown here in Figure 5 — let’s revisit the visualization of the synthetic data example of a Degree of Satisfaction measure by Persons participating in the program.

Figure 5. Synthetic data of a Degree of Satisfaction measure polled from the Person subjects of the proposed GitHub Copilot for Disabled Developers research project.

As you can see, this data is difficult to interpret when viewed at the atomic-level measure of individual Person subjects of the proposed research/support program. A simple computation of the Pair-level Mean and Difference measures of this Person-level data will move our analysis from the atomic level of individual subjects to the molecular level of our ten Programming Pairs of this research design, as shown in Figure 6.

Figure 6. Computing the Mean and Difference for the Degree of Satisfaction metric for the ten Programming Pairs based on the synthetic data of the Person-subject level measures.
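The Pair-level computation can be sketched as follows. The scores and pairings are invented stand-ins for the synthetic data in the figures. I also assume here that the Difference is the absolute gap between the two members’ scores and that satisfaction is reported on a 1 to 10 scale; `pair_measures` is a hypothetical helper name.

```python
def pair_measures(scores, pairs):
    """Roll Person-level scores up to Pair-level Mean and Difference measures."""
    measures = {}
    for pair_id, (first, second) in pairs.items():
        a, b = scores[first], scores[second]
        measures[pair_id] = {
            "mean": (a + b) / 2,
            "difference": abs(a - b),  # assumption: unsigned gap between members
        }
    return measures


# Hypothetical Person-level Degree of Satisfaction scores (assumed 1-10 scale).
scores = {"P1": 2, "P2": 3, "P3": 10, "P4": 1, "P5": 8, "P6": 9}
pairs = {"Pair A": ("P1", "P2"), "Pair B": ("P3", "P4"), "Pair C": ("P5", "P6")}

for pair_id, m in pair_measures(scores, pairs).items():
    print(pair_id, m["mean"], m["difference"])
```

Keeping the Mean and Difference together per Pair gives exactly the two coordinates needed to place a Pair icon on a 2-axis facet.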

With the Mean and Difference measures for the Programming Pairs of our proposed research study, we can easily project this molecular-level data onto a 2-axis facet using our ten Pair-composition icons as shown in Figure 7.

Figure 7. Visualizing the Mean and Difference of Degree of Satisfaction (synthetic data) among the ten Programming Pair compositions of the proposed GitHub Copilot for Disabled Developers research/support program.

From Data Analysis to Personal Learning Network Mentoring Insights

The bulk of content in this two-part article has centered on the atomic and molecular level of the data model for the proposed GitHub Copilot for Disabled Developers research and support program. I have only briefly mentioned the organism level of this program. That is, both the Person-atom and Programming Pair-molecules are subparts of a whole which we can characterize as a Personal Learning Network, or PLN.

The members of the GitHub Copilot for Disabled Developers program will periodically meet as a group via Microsoft Teams (or similar online meeting technology) to provide quantitative and qualitative feedback on how effectively, and by what techniques, participants use GitHub Copilot to facilitate their personal learning and mentoring activities. In addition, these network-level group meetings will help members increase the effectiveness of the program’s contributions to participating Digital Humanities, Citizen Science, and Open Source Software projects.

In this regard we can demonstrate how the real-time, ongoing collection and analysis of participant feedback can be used to manage the agendas for the PLN online meetings and facilitate program managers’ ability to enhance individual members’ and Programming Pairs’ satisfaction and learning through participation in the proposed research and support program. To see how our data model can be used in this program management manner, let’s refer to the Degree of Satisfaction data visualization of Figure 7.

By collecting the Degree of Satisfaction feedback data, timely analysis can be done to identify participants and Pairs for focused support and additional program-improving feedback. If we were to see the synthetic data of Figure 7 on a weekly report of the program’s feedback system, we would naturally want to look for Pairs that fall in the “danger zone” quadrant, where both Pair members share a sense of disappointment with their participation in the program. That is, we would be looking for Pairs whose Mean and Difference measures place their Pair icon in the top-left quadrant as highlighted in Figure 8.

Figure 8. The Degree of Satisfaction feedback is valuable data to collect for the eventual overall assessment of the program’s impact and effectiveness. However, real-time collection and analysis of this data can be used for program management and support service improvement. Hotspots to investigate are highlighted.

In addition to looking to support Pairs with shared levels of strong dissatisfaction with participation in the program, we would be on the lookout for Pairs that show a large difference of opinion about the members’ satisfaction with the program. We would, for example, want to have a chat with the Pair found in the row for a Difference measure of 9 in Figure 8. One member of this Pair is extremely satisfied with the program while the other is categorically disappointed with their experience.
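Both kinds of hotspot, shared dissatisfaction and a large split in opinion, could be flagged automatically in a weekly report. The sketch below is hedged accordingly: the threshold values, the label strings, and the `triage` function are my own hypothetical choices, and the Pair measures are invented rather than read from Figure 8.

```python
def triage(mean, difference, low_mean=4.0, small_diff=2, large_diff=6):
    """Label a Pair for follow-up; all thresholds are illustrative assumptions."""
    if difference >= large_diff:
        return "split opinion"           # one member far happier than the other
    if mean <= low_mean and difference <= small_diff:
        return "shared dissatisfaction"  # both members agree and are unhappy
    return "ok"


# Hypothetical Pair-level (mean, difference) measures on an assumed 1-10 scale.
weekly_measures = {
    "Pair A": (2.5, 1),
    "Pair B": (5.5, 9),
    "Pair C": (8.5, 1),
}

follow_ups = {
    pair_id: triage(mean, difference)
    for pair_id, (mean, difference) in weekly_measures.items()
}
for pair_id, label in follow_ups.items():
    print(pair_id, label)
```

Checking the Difference first means a Pair with a middling Mean but sharply divided members is still surfaced for a conversation, which a Mean-only rule would miss.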

In Conclusion and Next Steps

I started this article as an experience report that I could share primarily with prospective research collaborators and possible funders. While the proposed GitHub Copilot for Disabled Developers program would seem to be a worthwhile endeavor as a potential life-changing support program in its own right, I wanted to explore how a pilot of this program could also be designed as a valuable research study to contribute to the domain of occupational and life-affirming therapy for disabled people, especially those recovering from catastrophic physical injuries such as my July 2020 spinal cord injury.

To achieve this exploratory goal, I planned to look at the various ways that we could use this program to capture and analyze participant insights and feedback to provide justification for funding the ongoing operation of this support program. If my tragic personal experience could be the catalyst for the creation and maintenance of the GitHub Copilot for Disabled Developers program, that would be a tangible way to find personal peace and fulfillment with having suffered this horrible injury.

Before digging into this exploration of the data model for the pilot of this proposed research study, I had thought that I would only need to dust off my knowledge of standard research design and statistics. I did not think that this exploration would lead me to think deeply about hypergraph data models and the challenges of their visualization. I certainly did not expect that I would be envisioning and documenting a faceted extension to the previously unknown-to-me PAOH (Parallel Aggregated Ordered Hypergraph) model. So even if my efforts go nowhere to find funding and collaboration to pilot the GitHub Copilot for Disabled Developers program, I feel like this two-part article is a useful contribution to the data science domain that is finding increased relevance and application in today’s research and technology development communities.

As to next steps, I have two threads to pursue. First, I will write a letter of admiration and self-introduction to the five Aviz researchers (Paola Valdivia, Paolo Buono, Catherine Plaisant, Nicole Dufournaud, and Jean-Daniel Fekete) who created the amazing PAOH model. I will thank them for freely sharing their inspiring ideas, point them to this two-part article, and seek their feedback to correct any misstatements or muddled thinking I may have expressed in describing the Faceted PAOH model extension of their work. And I will ask them if there is an Aviz researcher or student who would be interested in collaborating to coauthor a proper scientific paper based on the ideas in this experience report.

My second thread of follow-up activity will be to do some prototype development of a Python library to programmatically generate Faceted PAOH data models and provide convenient ways to capture and analyze the data of such models. While my initial exploratory investigation was well-served by the features of a standard PC spreadsheet, I know that the powerful set and matrix functions and related features of Python libraries such as NumPy, pandas, and SciPy will greatly ease the use of the Faceted PAOH model in diverse research designs. And this activity, by the way, will get me away from writing about this proposed project to actually using the amazing GitHub Copilot technology. Copilot is truly a joy to experience as it supercharges my ability to develop software despite my unfortunate injury.

Until next time… thank you for reading this extended two-part article. And, as always, I welcome feedback on the article itself as well as communication with Kindred Spirits who may be interested in collaborating to fund or implement the GitHub Copilot for Disabled Developers research and support program.

Jim Salmons is a seventy-one-year-old post-cancer Digital Humanities Citizen Scientist. His primary research is focused on the development of a Ground Truth Storage format providing an integrated complex document structure and content depiction model for the study of digitized collections of print-era magazines and newspapers. A July 2020 fall at home resulted in a severe spinal cord injury that has dramatically compromised his manual dexterity and mobility.

Jim was fortunate to be provided access to the GitHub Copilot Technology Early Access Community during his initial efforts to get back to work on the Python-based tool development activities of his primary research interest. Upon experiencing the dramatic positive impact of GitHub Copilot on his own development productivity, he became passionately interested in designing a research and support program to investigate and document the use of this innovative programming assistive technology for use by disabled developers.

I am a #CitizenScientist doing #DigitalHumanities & #MachineLearning research via FactMiners & The Softalk Apple Project. Medium is my #OpenAccess channel.