EnCoSum: Enhanced Semantic Features for Multi-scale Multi-modal Source Code Summarization

The source code and datasets for the EnCoSum.

Datasets

EnCoSum includes Java and Python datasets. If you want to train the model, you must download the datasets.

Java dataset

You can get it on: https://github.com/xing-hu/TL-CodeSum

Python dataset

You can get it on: https://github.com/EdinburghNLP/code-docstring-corpus

CodeXGLUE dataset

You can get it on: https://github.com/microsoft/CodeXGLUE/tree/main

Data process

EnCoSum uses the JDK compiler to parse java methods as ASTs, and the Treelib toolkit to prase Python functions as ASTs.

Get AST

In datapre file, the get_java_ast.py generates ASTs for the Java dataset and get_python_ast.py generates ASTs for Python datasets. You can run the following command：

python3 source.code ast.json

Get E-AST

The file get_adj.py gets ASTs with added control flow and data flow.

BERT embedding

You can use BERT method to embed AST nodes, detail in here

pip install bert-serving-server  
pip install bert-serving-client

Train-Test

In Model file, the main.py enables to train the model. Train and test model:

Python3 main.py

The nlg-eval can be set up in the following way, detail in here.
Install Java 1.8.0 (or higher).
Install the Python dependencies, run:

pip install git+https://github.com/Maluuba/nlg-eval.git@master

Requirements

If you want to run the model, you will need to install the following packages.

pytorch 1.7.1  
bert-serving-client 1.10.0  
bert-serving-server 1.10.0  
javalang 0.13.0  
nltk 3.5  
networkx 2.5  
scipy 1.1.0  
treelib 1.6.1

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
Mode		Mode
datapre		datapre
dataset		dataset
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EnCoSum: Enhanced Semantic Features for Multi-scale Multi-modal Source Code Summarization

Datasets

Java dataset

Python dataset

CodeXGLUE dataset

Data process

Get AST

Get E-AST

BERT embedding

Train-Test

Requirements

About

Releases

Packages

Languages

TranSMS/EnCoSum

Folders and files

Latest commit

History

Repository files navigation

EnCoSum: Enhanced Semantic Features for Multi-scale Multi-modal Source Code Summarization

Datasets

Java dataset

Python dataset

CodeXGLUE dataset

Data process

Get AST

Get E-AST

BERT embedding

Train-Test

Requirements

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages