@inproceedings{yu2018dataset,
title={A dataset of duplicate pull-requests in {GitHub}},
author={Yu, Yue and Li, Zhixing and Yin, Gang and Wang, Tao and Wang, Huaimin},
booktitle={Proceedings of the 15th International Conference on Mining Software Repositories},
pages={22--25},
year={2018}
}
Studies that have used this dataset
Li, Z., Yu, Y., Zhou, M., Wang, T., Yin, G., Lan, L., & Wang, H. (2020). Redundancy, Context, and Preference: An Empirical Study of Duplicate Pull Requests in OSS Projects. IEEE Transactions on Software Engineering (TSE).
Wang, Q., Xu, B., Xia, X., Wang, T., & Li, S. (2019, October). Duplicate Pull Request Detection: When Time Matters. In Proceedings of the 11th Asia-Pacific Symposium on Internetware (pp. 1-10).
Zhou, S., Vasilescu, B., & Kästner, C. (2019, August). What the fork: a study of inefficient and efficient forking practices in social coding. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) (pp. 350-361).
Ren, L., Zhou, S., Kästner, C., & Wąsowski, A. (2019, February). Identifying redundancies in fork-based development. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER) (pp. 230-241). IEEE.
Li, Z., Yu, Y., Wang, T., Yin, G., Mao, X., & Wang, H. (2019). Detecting Duplicate Contributions in Pull-based Model Combining Textual and Change Similarities. Journal of Computer Science and Technology.
Li, Z., Yin, G., Yu, Y., Wang, T., & Wang, H. (2017, September). Detecting duplicate pull-requests in github. In Proceedings of the 9th Asia-Pacific Symposium on Internetware (pp. 1-6).
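DupPR Dataset
About the dataset
This dataset contains duplicate pull requests that developers submitted unintentionally on GitHub; see dup_prs.md for the full data. Some ready-to-use pull-request-related data is available in pullreq_info.
Help us
Help improving this dataset is very welcome. You can submit an issue or a pull request to:
Note: please do not submit duplicate issues or pull requests :)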
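Citing this dataset
If you use this dataset, please cite the paper given in the BibTeX entry at the top of this document.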