Repository for analysis and experiments in the BigCode project.

ocramz/bigcode-analysis

BigCode Analysis

This repository contains the analyses carried out in the BigCode project, covering datasets, models, architecture choices, and more.

Contents

  • Data analysis: the data_analysis folder contains code for:

    • Near deduplication
    • Python data analysis:
      • Natural language distribution in comments/docstrings
      • Data decontamination for HumanEval and MBPP benchmarks
      • Percentage of files that can be successfully compiled
      • Percentage of configuration and test files
    • Notebooks with early data and model loss analysis
  • Multi-Query Attention experiments: for details, please refer to multi_query_experiments/README.md.
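Near deduplication is commonly done by comparing documents' shingle sets with MinHash, which cheaply estimates their Jaccard similarity. This is a generic, stdlib-only sketch of that idea and not the code in data_analysis. The 3-token shingles, the 128 seeded SHA-1 hash functions and all function names are illustrative assumptions. A production pipeline would typically add locality-sensitive hashing (LSH) bands so that it does not compare every pair of documents.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of k-token shingles of a whitespace-tokenized text."""
    tokens = text.split()
    if len(tokens) < k:
        # Short texts become a single shingle so the set is never empty.
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of item; each seed acts as a different hash function."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: Set[str], num_perm: int = 128) -> List[int]:
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_seeded_hash(seed, item) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of intersection over size of union, used only to sanity-check the estimate."""
    return len(a & b) / len(a | b)


doc1 = "def add(a, b): return a + b  # add two numbers"
doc2 = "def add(a, b): return a + b  # add two values"
s1, s2 = shingles(doc1), shingles(doc2)
print(exact_jaccard(s1, s2))  # 0.8 (8 shared shingles out of 10 distinct)
print(estimated_jaccard(minhash_signature(s1), minhash_signature(s2)))
```

Pairs whose estimated similarity exceeds a chosen threshold would be treated as near duplicates. The threshold is a tuning choice and is not taken from this repository.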
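Benchmark decontamination for HumanEval and MBPP can be illustrated by flagging any training file that contains a benchmark snippet, such as a prompt or a canonical solution, verbatim. This minimal sketch assumes exact substring matching after whitespace normalization. The actual analysis in data_analysis may use different matching rules. The names `normalize`, `is_contaminated` and `decontaminate` are hypothetical.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> str:
    """Collapse whitespace runs to single spaces so reformatting doesn't hide a match."""
    return re.sub(r"\s+", " ", text).strip()


def is_contaminated(file_content: str, benchmark_snippets: Iterable[str]) -> bool:
    """True if any non-empty normalized snippet occurs verbatim in the normalized file."""
    haystack = normalize(file_content)
    return any(normalize(s) in haystack for s in benchmark_snippets if s.strip())


def decontaminate(files: Iterable[str], benchmark_snippets: List[str]) -> List[str]:
    """Drop every file that contains at least one benchmark snippet."""
    return [f for f in files if not is_contaminated(f, benchmark_snippets)]


snippets = ["def add(a, b):\n    return a + b"]  # toy stand-in for a benchmark solution
files = [
    "def add(a,  b):\n        return a + b\n",  # same code, different spacing
    "def sub(a, b):\n    return a - b\n",
]
print(decontaminate(files, snippets))  # keeps only the `sub` file
```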
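The "percentage of files that can be successfully compiled" metric can be approximated by counting how many Python sources pass the built-in `compile()` check. This is a minimal sketch of that assumption, not the notebook's actual code, and `compile_rate` is a hypothetical name. `compile()` only checks syntax; it does not execute the code. The check also depends on the running interpreter version, so Python 2 code such as `print 'x'` counts as a failure.

```python
from typing import Iterable


def compile_rate(sources: Iterable[str]) -> float:
    """Return the fraction of source strings that byte-compile without errors."""
    total = 0
    ok = 0
    for src in sources:
        total += 1
        try:
            compile(src, "<file>", "exec")  # syntax check only; nothing is executed
            ok += 1
        except (SyntaxError, ValueError):
            # SyntaxError: invalid syntax; ValueError: e.g. null bytes on older Pythons.
            pass
    return ok / total if total else 0.0


files = ["print('hello')\n", "def f(:\n    pass\n", "print 'py2 style'\n"]
print(f"{100 * compile_rate(files):.1f}% compile")  # 33.3% compile
```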
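Multi-query attention (MQA) differs from standard multi-head attention in one respect: every query head shares a single key projection and a single value projection, which shrinks the key/value cache during decoding. This is a minimal, dependency-free sketch of the forward pass. It assumes no causal mask and no batching, and it omits the final concatenation and output projection. The function and argument names are illustrative and are not taken from multi_query_experiments.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) and an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_query_attention(x: Matrix, wq_heads: List[Matrix], wk: Matrix, wv: Matrix) -> List[Matrix]:
    """Return one (seq_len x d_head) output per query head.

    x: seq_len x d_model; wq_heads: one d_model x d_head query projection per head;
    wk, wv: a single d_model x d_head key/value projection shared by every head.
    """
    k = matmul(x, wk)  # computed once and reused by all heads (the MQA saving)
    v = matmul(x, wv)
    scale = 1.0 / math.sqrt(len(wk[0]))
    outputs = []
    for wq in wq_heads:
        q = matmul(x, wq)
        # Scaled dot-product scores q @ k^T / sqrt(d_head); no causal mask here.
        scores = [[scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k] for q_row in q]
        weights = [softmax(r) for r in scores]
        outputs.append(matmul(weights, v))
    return outputs


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
heads = multi_query_attention(x, [eye, half], wk=eye, wv=eye)
print(len(heads), len(heads[0]), len(heads[0][0]))  # 2 3 2
```

A full layer would concatenate the head outputs and apply an output projection. Compared with multi-head attention, only one `k` and one `v` need to be cached per token, rather than one per head.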

Languages

  • Jupyter Notebook 94.0%
  • Python 5.9%
  • Shell 0.1%