Navigating Open-Source Software Discussions with SUMMIT

Avinash Bhat
Published in ACM CSCW
Sep 26, 2023

— Saskia Gilmer, Avinash Bhat, Shuvam Shah, Kevin Cherry, Jinghui Cheng, Jin L.C. Guo

A signboard labelled ‘Summary’, indicating the importance of summaries for lengthy discussions.
Summary by Nick Youngson CC BY-SA 3.0 Alpha Stock Images

Open-source software (OSS) development relies heavily on collaboration among software stakeholders, facilitated by Issue Tracking Systems (ITS), where discussions about bugs, new features, and other topics take place. However, navigating these discussions can be overwhelming due to the volume and complexity of the comments¹, especially in popular repositories like TensorFlow.

Summarization is an effective way to accelerate the collaborative information acquisition process of OSS communities. To realize this, we developed SUMMIT, a tool that combines automatic text summarization techniques and community efforts to help users collectively develop and use summaries in OSS discussions.

Why Summarization?

A summary is a brief statement that expresses, in a concise form, information that appears elsewhere in the issue thread. Through a content analysis of a sample of long discussion threads from three large open-source software projects, we observed that manual summarization is a common strategy ITS users adopt to participate in long-lived issue threads.

We found that summaries target different audiences and serve four distinct objectives:

  1. Adding context to facilitate comprehension within an entangled discussion thread;
  2. Providing an access point to enable easy capture of important information or action points;
  3. Providing supporting evidence to back up actions, statements, or opinions; and
  4. Clarifying to confirm understanding or highlight intention.

Despite the effort that goes into writing them, summaries are often buried among the many comments in the current issue-thread design and fail to reach their intended readers. Moreover, because summarization serves diverse objectives, no single off-the-shelf summarization technique can satisfy all common use cases at once. This motivated us to design a new tool that supports authoring and retrieving summaries on issue threads, serves these various objectives, and respects how issue tracking systems are currently used in OSS development and maintenance.

The Design of SUMMIT

Through a user-centred iterative design process, we extended the existing ITS interface with two additional UI components: a Conversation Summary panel and an Information Type Summary panel (see the image below). To reduce manual effort, we used the machine learning model developed in prior work¹ to classify information types and BERTSumEx² to provide users with draft summaries.

On installing SUMMIT, two panels, indicated by the red boxes, are added to the issue interface when users access an issue on GitHub: Information Type Summaries (on top) and Conversation Summaries (on the left).
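The post does not detail how SUMMIT invokes BERTSumEx, but the general shape of extractive drafting is simple: an extractive summarizer scores the sentences of its input and keeps the highest-scoring ones rather than writing new text. The Python sketch below illustrates only that shape. It swaps BERT's learned sentence scores for a plain word-frequency score, and every name in it (`draft_summary`, the stopword list, the sample comments) is hypothetical, not part of SUMMIT.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would differ).
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "is",
             "it", "this", "that", "for", "with", "i", "we", "you", "be"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower())
            if w not in STOPWORDS]


def draft_summary(comments: list[str], k: int = 2) -> str:
    """Extractive draft: the k highest-scoring sentences, in thread order."""
    sentences = [s for c in comments for s in split_sentences(c)]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        # Mean word frequency, so longer sentences are not favoured by default.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable even with reverse=True, so ties keep thread order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore thread order for readability
    return " ".join(sentences[i] for i in chosen)


comments = [
    "The build fails on Windows with a linker error. "
    "It started after upgrading to version 2.4.",
    "Same linker error here on Windows. "
    "Downgrading to 2.3 works as a workaround.",
    "Thanks for the report!",
]
print(draft_summary(comments, k=2))
# The build fails on Windows with a linker error. Same linker error here on Windows.
```

On this toy thread the draft keeps the two sentences about the linker error and drops the workaround, which shows why a machine-generated draft is a starting point for users to refine rather than a finished summary.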

The information type summary panel appears in an expanded view at the top of the thread and summarizes specific aspects of the discussion, such as bug reproduction, solution discussion, and workarounds. As shown in the image below, users can browse the summaries by information type, edit them, navigate to the comments that contributed to each summary, and correct the auto-detected information types.

Possible user interactions with the information type summary panel: Users can (1) edit the summary, (2) browse through the summaries organized by different information types, and (3) navigate through the comments that contributed to each summary.
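One way to picture the information type summary panel is as a grouping of comments by information type, where a user's correction always overrides the classifier's label. The sketch below assumes that model; the classes, field names, and label strings are hypothetical illustrations, and the post does not describe SUMMIT's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Comment:
    comment_id: int
    text: str
    predicted_type: str                   # label from an assumed classifier
    corrected_type: Optional[str] = None  # set when a user corrects the label

    @property
    def info_type(self) -> str:
        # A user correction, when present, overrides the automatic label.
        return self.corrected_type or self.predicted_type


@dataclass
class InfoTypeSummary:
    info_type: str
    text: str = ""  # editable summary text
    comment_ids: list[int] = field(default_factory=list)  # contributing comments


def build_panel(comments: list[Comment]) -> dict[str, InfoTypeSummary]:
    """Group comments into one summary entry per information type."""
    panel: dict[str, InfoTypeSummary] = {}
    for c in comments:
        entry = panel.setdefault(c.info_type, InfoTypeSummary(c.info_type))
        entry.comment_ids.append(c.comment_id)
    return panel


comments = [
    Comment(1, "Steps: install 2.4, then build on Windows.", "bug reproduction"),
    Comment(2, "We could pin the linker version.", "solution discussion"),
    Comment(3, "Downgrading to 2.3 avoids the error.", "bug reproduction"),
]
panel = build_panel(comments)
print(panel["bug reproduction"].comment_ids)  # [1, 3]

# A user fixes the misclassified comment; regrouping moves it to "workaround".
comments[2].corrected_type = "workaround"
panel = build_panel(comments)
print(panel["bug reproduction"].comment_ids)  # [1]
print(panel["workaround"].comment_ids)        # [3]
```

Keeping the comment IDs on each entry is what makes navigation back to the contributing comments possible; persisting user-edited summary text across regroupings is left out of this sketch.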

The conversation summary panel sits on the left side of the page and stays in view while scrolling, letting users summarize any set of comments they choose. Users can create a new conversation summary from selected comments, edit summaries, add or remove the comments that contribute to an existing summary, and navigate to those contributing comments.

Possible user interactions with the conversation summary panel: Users can (1) create a new conversation summary, (2) choose a list of comments from the issue thread to add to the summary, (3) generate a draft summary using the BERTSumEx model, (4) refine and save a summary, (5) add or remove comments and edit an existing summary, and (6) navigate through the comments that contribute to each summary.
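Unlike an information type summary, a conversation summary is anchored to whichever comments the user selects. A minimal sketch of that data model, assuming each comment has a numeric ID (the class and method names are hypothetical, not SUMMIT's code):

```python
from dataclasses import dataclass, field


@dataclass
class ConversationSummary:
    """A user-written summary over a user-chosen set of comments."""
    text: str
    comment_ids: set[int] = field(default_factory=set)

    def add_comments(self, *ids: int) -> None:
        self.comment_ids.update(ids)

    def remove_comments(self, *ids: int) -> None:
        for cid in ids:
            self.comment_ids.discard(cid)  # ignore IDs not in the summary

    def edit(self, new_text: str) -> None:
        self.text = new_text

    def contributing_comments(self, thread_order: list[int]) -> list[int]:
        """Contributing comment IDs, in the order they appear in the thread."""
        return [cid for cid in thread_order if cid in self.comment_ids]


thread = [101, 102, 103, 104, 105]  # comment IDs in thread order
summary = ConversationSummary("Linker error on Windows since 2.4.", {104, 101})
summary.add_comments(103)
summary.remove_comments(104, 999)   # 999 is not part of the summary; ignored
summary.edit("Linker error on Windows since 2.4; downgrading to 2.3 works.")
print(summary.contributing_comments(thread))  # [101, 103]
```

Storing comment IDs rather than copies of the comment text keeps each summary linked to its original context in the thread, which is what the navigation step relies on.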

Evaluation and Lessons Learned

A user study with 16 participants revealed that SUMMIT

  • drastically reshaped participants' information acquisition strategies in long issue threads;
  • helped surface useful content from the thread and decreased the perceived difficulty of finding relevant information; and
  • reduced the mental effort of comprehending long issue threads while improving information organization.

Through the iterative design and evaluation process, we generated and reflected on five design guidelines for tools that support summarization in issue tracking systems of OSS.

Guideline 1: The tool should be flexible enough to accommodate various objectives and use cases of summarization. Our participants responded well to both types of summaries and described different scenarios in which each type would be useful.

Guideline 2: The discussion context needs to be preserved in the summaries. Participants frequently used the features in SUMMIT to find the corresponding content within context. Those features supported users in gaining valuable contextual information and judging the credibility of the summaries on collaboration platforms.

Guideline 3: Important information within the thread should be made readily available. Making the summaries visible and easily discernible enabled participants to quickly find the information they needed. However, this raises a subtle question about what counts as important: summaries risk leading contributors to overlook the complexity of an issue, a risk the other design guidelines help address.

Guideline 4: The system should support iterative management of summaries. Participants highlighted their need for content moderation and expressed their hesitation to trust the summary quality and machine learning model performance. It is, therefore, crucial to support users to correct and update the content, in particular when automated methods are used in the tool.

Guideline 5: Users should be encouraged to contribute to collective sense-making efforts. SUMMIT offers OSS community members a new way to make more noticeable contributions. However, a delicate balance is needed between motivating and enabling contributions and controlling their quality.

To learn more, check out our paper, "SUMMIT: Scaffolding OSS Issue Discussion Through Summarization," at CSCW 2023.

The in-person presentation at CSCW 2023 will be on October 17th from 11 a.m. to 12:30 p.m. in the Collaboration I session.

References:

¹ Arya, Deeksha, Wenting Wang, Jin LC Guo, and Jinghui Cheng. “Analysis and detection of information types of open source software issue discussions.” In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 454–464. IEEE, 2019.

² Liu, Yang, and Mirella Lapata. "Text Summarization with Pretrained Encoders." In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 3730–3740, 2019.
