Artifact package for our paper “How do Developers Talk about GHA?”. This repository includes our data and scripts.
- Data collection (2018.10.1-2022.10.31)
- SO data, i.e., posts, from the official SO data dump
- GitHub data, i.e., issues, using the GitHub Search API
- Data for manual classification: SO post and GitHub issue
- This data includes:
- 6,590 SO questions (Q_S) with 2,471 accepted SO answers (A_S)
- 315 GitHub issues (Q_G) with 217 closed GitHub issues (A_G)
- The results of manual classification can be found in
- Data structure: (id, type, phase, category)
- id: the number used in this paper. “P1” and “I1” represent the first SO post and the first GitHub issue in our dataset, respectively.
- type: “github issue” or “so post”
- phase: phase of a post or an issue
- category: category of a post or an issue
- Data for characteristics analysis
- The data for characteristics analysis can be found in
and so_post_difficulty.csv
- Popularity metrics include:
- avgView, the average number of views for all the questions of a category;
- avgFav, the average number of favorites for all the questions of a category;
- avgScore, the average score for all the questions of a category;
- avgAns, the average number of answers for all the questions of a category.
- Difficulty metrics include:
- ansRate, the percentage of questions of a category with at least one answer;
- acceptRate, the percentage of questions of a category that have accepted answers;
- timeFA, the median time needed for questions of a category to receive the first answers, in hours;
- timeAA, the median time needed for questions of a category to receive the accepted answers, in hours;
- textSize, the average number of description characters for questions of a category.
- The accepted answer examples and detailed discussion of each solution strategy can be found in
We seek to analyze the characteristics of the identified problem categories in terms of popularity and difficulty.
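As an illustration of the metric definitions above, the following minimal sketch aggregates the popularity and difficulty metrics per category. It is not the script shipped in this repository: the function name `category_metrics` and the input keys (`views`, `favorites`, `score`, `answers`, `first_answer_hours`, `accepted_answer_hours`, `body_chars`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from statistics import mean, median


def category_metrics(questions):
    """Aggregate per-category popularity and difficulty metrics.

    Each question is a dict with hypothetical keys (assumptions, not the
    repository's schema): category, views, favorites, score, answers,
    first_answer_hours, accepted_answer_hours (None when there is no such
    answer), and body_chars.
    """
    by_category = defaultdict(list)
    for q in questions:
        by_category[q["category"]].append(q)

    result = {}
    for category, qs in by_category.items():
        # Only questions that actually received an answer contribute to the medians.
        first = [q["first_answer_hours"] for q in qs
                 if q["first_answer_hours"] is not None]
        accepted = [q["accepted_answer_hours"] for q in qs
                    if q["accepted_answer_hours"] is not None]
        result[category] = {
            # Popularity metrics
            "avgView": mean(q["views"] for q in qs),
            "avgFav": mean(q["favorites"] for q in qs),
            "avgScore": mean(q["score"] for q in qs),
            "avgAns": mean(q["answers"] for q in qs),
            # Difficulty metrics (rates as percentages, times in hours)
            "ansRate": 100 * sum(1 for q in qs if q["answers"] > 0) / len(qs),
            "acceptRate": 100 * len(accepted) / len(qs),
            "timeFA": median(first) if first else None,
            "timeAA": median(accepted) if accepted else None,
            "textSize": mean(q["body_chars"] for q in qs),
        }
    return result
```

In this sketch, a category with no answered questions gets `None` for timeFA and timeAA rather than a numeric value.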
- Spearman’s rank correlation coefficient
- Figure 1: The trend of GHA discussed on Stack Overflow
- Figure 2: The taxonomy of GHA problems
- Table 1: Popularity of GHA problem categories
- Table 2: Difficulty of GHA problem categories
- Table 3: Correlation between Popularity and Difficulty of GHA problem categories
- Table 4: Difficulty of GHA problem categories (GitHub issues)
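For readers who want to reproduce a correlation of the kind reported in Table 3, here is a minimal, standard-library sketch of Spearman's rank correlation coefficient: rank both series (ties receive the average of their positions), then compute Pearson's correlation on the ranks. The function names `average_ranks` and `spearman_rho` are hypothetical, and this is not necessarily how the repository's scripts compute it.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)
```

To use it, pass one popularity metric and one difficulty metric as two lists with one value per category, in the same category order, for example `spearman_rho(avg_view_per_category, time_fa_per_category)`, where both list names are placeholders.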