This repository contains all the resources related to the MSR 2019 Data Showcase submission. The project creates a GitHub dataset comprising metadata and ASTs for hundreds of Python repositories.

sumonbis/MSR19-DataShowcase

MSR19-DataShowcase

The popularity of the Python programming language has surged in recent years due to its increasing use in Data Science. The availability of Python repositories on GitHub presents an opportunity for mining software repository research, e.g., suggesting best practices for developing Data Science applications, identifying bug patterns, and recommending code enhancements. To enable this research, we have created a new dataset of 1558 mature GitHub projects that develop Python software for Data Science tasks. By analyzing the metadata and code, we selected projects that use a diverse set of machine learning libraries and are managed by a variety of users and organizations. The dataset is publicly available through the Boa website infrastructure, both as a collection of raw projects and in a processed form that can be used for large-scale analysis with the Boa language. We also present an initial application that demonstrates the potential of the dataset for the community.

To access the dataset and write queries against it, visit: https://boa.cs.iastate.edu/boa/index.php.
