Pull Request Process Model

Here is a sample of a basic process model for pull requests.

Data Source: MSR14 Mining Challenge

Project: Rail

Query String:

SELECT pull_request_history.created_at,
       pull_request_history.pull_request_id,
       pull_request_history.action,
       pull_request_history.actor_id,
       pull_requests.pullreq_id
FROM pull_request_history
INNER JOIN pull_requests ON pull_request_history.pull_request_id = pull_requests.id
INNER JOIN projects ON pull_requests.base_repo_id = projects.id
WHERE projects.id = 78852

Process Model

This process model doesn’t contain loops because the pull_request_history table doesn’t capture review comments. However, the model does show the lead time of pull requests. In the model, the light grey text represents the median duration. The median duration for accepted pull requests is much shorter than for those that are not accepted.

Between Open -> Closed or Open -> Merged there are multiple comments and commits. To capture them, we need data from the pull_request_comments table.
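The lead-time comparison described above can be sketched in Python. This is only a minimal sketch, not the tool used to draw the process model: the sample rows are made up, the action names "opened", "merged" and "closed" are assumptions about the values stored in pull_request_history.action, and the function name median_lead_times is hypothetical.

```python
from datetime import datetime
from statistics import median

# Made-up rows shaped like (created_at, pull_request_id, action).
# Assumption: the action column uses the values "opened", "merged", "closed".
rows = [
    ("2013-01-01 10:00:00", 1, "opened"),
    ("2013-01-01 12:00:00", 1, "merged"),
    ("2013-01-01 12:00:05", 1, "closed"),
    ("2013-01-02 09:00:00", 2, "opened"),
    ("2013-01-04 09:00:00", 2, "closed"),
    ("2013-01-03 08:00:00", 3, "opened"),
    ("2013-01-03 12:00:00", 3, "merged"),
    ("2013-01-03 12:00:00", 3, "closed"),
    ("2013-01-05 00:00:00", 4, "opened"),
    ("2013-01-06 00:00:00", 4, "closed"),
]


def median_lead_times(rows):
    """Return {outcome: median hours from 'opened' to the outcome event}."""
    opened = {}    # pull_request_id -> time of the first 'opened' event
    finished = {}  # pull_request_id -> (outcome, time of that event)
    for created_at, pr_id, action in sorted(rows):
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        if action == "opened":
            opened.setdefault(pr_id, ts)
        elif action in ("merged", "closed"):
            # Once a pull request is merged, a later 'closed' event is ignored.
            if pr_id in finished and finished[pr_id][0] == "merged":
                continue
            finished[pr_id] = (action, ts)

    durations = {}
    for pr_id, (outcome, ts) in finished.items():
        if pr_id in opened:
            hours = (ts - opened[pr_id]).total_seconds() / 3600
            durations.setdefault(outcome, []).append(hours)
    return {outcome: median(values) for outcome, values in durations.items()}


lead_times = median_lead_times(rows)
```

With the made-up rows, the merged pull requests have a shorter median lead time than the ones closed without a merge, which is the pattern the process model shows for the real data.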

MSR14 Github Projects

Projects in MSR14 based on language and number of forks

 

Create EER Schema for MSR14 Dataset

Use MySQL Workbench

  1. From the MySQL Workbench home screen, go to the Model section
    • Click Create EER Model from Database
  2. Follow the Reverse Engineer Database flow
    • Select Stored Connection
    • Hostname: localhost
    • Username: msr14, Password: msr14
    • Continue
    • Select msr14 schema
    • Continue
    • Continue
    • Execute
    • Continue
    • Close

Rahman2014-MSR: An Insight into the Pull Requests of GitHub

Authors: Mohammad Masudur Rahman, Chanchal K. Roy (University of Saskatchewan, Canada)

ABSTRACT

Given the increasing number of unsuccessful pull requests in GitHub projects, insights into the success and failure of these requests are essential for the developers. In this paper, we provide a comparative study between successful and unsuccessful pull requests made to 78 GitHub base projects by 20,142 developers from 103,192 forked projects. In the study, we analyze pull request discussion texts, project specific information (e.g., domain, maturity), and developer specific information (e.g., experience) in order to report useful insights, and use them to contrast between successful and unsuccessful pull requests. We believe our study will help developers overcome the issues with pull requests in GitHub, and project administrators with informed decision making.

Notes

Data Set: MSR 2014 Challenge – Github

Techniques: Latent Dirichlet Allocation (LDA) for Topic Modeling
Tools: JGibbLDA, an LDA implementation that uses Gibbs sampling, http://jgibblda.sourceforge.net/

Methodology: Extract 100 topics and select the top 5.

Factors:

  • Label
  • Programming Languages
  • Project Age & Maturity
  • Project Developers & Experience

Finding:

Each topic is more prevalent in the discussion of the unsuccessful pull requests than that of the successful pull requests, except a dominant topic: Actor Model.

In case of 24 GitHub base projects using three programming languages (Ruby, Java and JavaScript), average number of unsuccessful pull requests per month is exceptionally higher than that of successful pull requests.

Exploring MSR 2014 Challenge data with Tableau

Set up Tableau on Mac

  1. Get Tableau Desktop for Students here: http://www.tableausoftware.com/academic/students. Tableau provides a free license for full-time students. It can connect to various databases, including MySQL, which stores the GitHub data. The list of available database drivers is here: http://www.tableausoftware.com/support/drivers.
  2. Download and install the MySQL ODBC driver.
  3. Open Tableau Desktop and connect to MySQL.
  4. Use the following input:
    • Server: localhost, Port: 3306
    • Username: msr14
    • Password: msr14

 

The next step is to create a worksheet from an existing table or by joining tables.

Import MSR2014 Challenge to Mac

The MSR2014 Challenge page provides good instructions on how to import the data into MySQL on Mac and Linux.

$ wget http://ghtorrent.org/downloads/msr14-mysql.gz
$ mysql -u root -p
mysql> create user 'msr14'@'localhost' identified by 'msr14';
mysql> create database msr14;
mysql> GRANT ALL PRIVILEGES ON msr14.* to msr14@'localhost';
mysql> flush privileges;
# Exit MySQL prompt
$ zcat msr14-mysql.gz |mysql -u msr14 -p msr14

There is one issue with this. The instructions worked on Linux but not on Mac. They require a slight modification because of how zcat behaves on Mac: it automatically appends .Z to the file name. This causes

$ zcat msr14-mysql.gz |mysql -u msr14 -p msr14

to give an error. There are a few workarounds; two of them are:
  1. Rename msr14-mysql.gz to msr14-mysql.Z
  2. Use gunzip -c instead
$ wget http://ghtorrent.org/downloads/msr14-mysql.gz
$ mysql -u root -p
mysql> create user 'msr14'@'localhost' identified by 'msr14';
mysql> create database msr14;
mysql> GRANT ALL PRIVILEGES ON msr14.* to msr14@'localhost';
mysql> flush privileges;
# Exit MySQL prompt
$ gunzip -c msr14-mysql.gz | mysql -u msr14 -p msr14
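If neither zcat nor gunzip behaves as expected, the same streaming step can be sketched with Python's standard gzip module. This is only an illustrative alternative, not part of the MSR instructions, and the function name stream_gzip is hypothetical.

```python
import gzip
import shutil
import sys


def stream_gzip(path, out):
    """Decompress the gzip file at `path` and copy the bytes to `out`.

    `out` is any binary file-like object, for example sys.stdout.buffer,
    so the result can be piped to another program the way `gunzip -c` is.
    """
    with gzip.open(path, "rb") as src:
        # copyfileobj reads in chunks, so the dump is never held in memory whole.
        shutil.copyfileobj(src, out)
```

Calling stream_gzip("msr14-mysql.gz", sys.stdout.buffer) from a small script writes the decompressed dump to standard output, which can then be piped into mysql -u msr14 -p msr14 as in the commands above.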

To validate that the data was imported correctly, use
mysql> select language,count(*) from projects where forked_from is null group by language;

Alternatively, in MySQL Workbench there should be a new schema called msr14 with all of its tables listed.
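The validation query counts the non-forked projects per language. To show what that means, here is a small sketch that runs the same query against an in-memory SQLite table instead of the real MySQL database. The rows are made up, and the three-column projects table is a simplified assumption, not the full MSR14 schema.

```python
import sqlite3

# Stand-in for the msr14 database: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER, language TEXT, forked_from INTEGER)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [
        (1, "Ruby", None),   # base project
        (2, "Ruby", 1),      # fork of project 1, excluded by the query
        (3, "Java", None),   # base project
        (4, "Java", None),   # base project
    ],
)

# Same query as the validation step: only rows where forked_from is NULL are counted.
counts = dict(
    conn.execute(
        "SELECT language, COUNT(*) FROM projects "
        "WHERE forked_from IS NULL GROUP BY language"
    ).fetchall()
)
conn.close()
```

On the real data, a non-empty result with one row per language is a quick sign that the projects table was loaded.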

 

Challenges restoring MSR 2014 data

After finishing another article, I tried to import the MSR data but ran into errors on a Windows machine. The MSR site only provides instructions for Linux and Mac OS. It also assumes that the database server is already set up.

Now I am trying to set it up on Mac. Before I can follow the instructions at http://ghtorrent.org/msr14.html, I need to set up the MySQL Server installation on Mac.

I installed it manually by downloading the files from the MySQL site. I did something similar to this: http://blog.mclaughlinsoftware.com/2011/02/10/mac-os-x-mysql-install/

MySQL Community Server (GPL): I used the DMG version and installed both the MySQL server and the startup item.

MySQL Workbench (GPL)

Also add mysql to the PATH.

Version Information

  • mysql-5.6.19-osx10.7-x86_64

There is also an automated script that installs MySQL on Mac: "Install MySQL on OS X 10.9 Mavericks" (Mac Mini Vault). I didn’t try it since I got the manual approach to work.

The next step is to import the data using the instructions at http://ghtorrent.org/msr14.html.