"Software product <--> Requirements specification" verification model

This repository contains a model for automatically verifying whether software requirements have been accomplished. We use the cosine similarity between latent representations of Java methods and functional requirements to estimate the degree of accomplishment. The repository provides full implementations of three source code encoders of different granularity, which were the subject of a 2020 Innopolis master's thesis project, along with one natural language encoder. Additional details on the problem, related work, model structure and a demo can be found in the presentation.

Requirements

Python >= 3.6
TensorFlow >= 2.0
h5py
dpu-utils

Training

The training script train.py performs all the essential operations, including:

preparing and generating data with --data-path and the optional --data-folder argument, or loading already preprocessed data with --data-path and the optional --preprocessed-data-folder argument
specifying the model structure through a set of hyperparameters and selecting the source code encoder with the --model argument (ngram, api or bert)
training the model, with the option to resume from a checkpoint using --load-cp
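
To show how the arguments listed above fit together, here is a minimal argparse sketch. Only the flag names and the three --model choices come from this README; the help texts, the required/optional settings and the build_parser name are assumptions for illustration, and the real train.py may define them differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: only the flag names are taken from the README.
    parser = argparse.ArgumentParser(description="Training options (illustrative sketch)")
    parser.add_argument("--data-path", required=True,
                        help="root location of the data")
    parser.add_argument("--data-folder", default=None,
                        help="optional folder with raw data to preprocess")
    parser.add_argument("--preprocessed-data-folder", default=None,
                        help="optional folder with already preprocessed data")
    parser.add_argument("--model", required=True, choices=["ngram", "api", "bert"],
                        help="source code encoder to train")
    parser.add_argument("--load-cp", default=None,
                        help="checkpoint to continue training from")
    return parser


# Parse an example command line instead of sys.argv so the sketch runs standalone.
args = build_parser().parse_args(["--data-path", "data", "--model", "api"])
print(args.data_path, args.model, args.load_cp)
```

The authoritative list of options is the output of the --help command described below.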

The list of all training options and arguments can be retrieved using the following command:

python train.py --help

Evaluation

The evaluation script evaluate.py scores the performance of a trained model loaded from a provided checkpoint. We evaluate the model's ability to retrieve Java methods given the description of a functional requirement. For this purpose we use the Mean Reciprocal Rank, Relevance@k and First-Rank metrics. To evaluate the model's ability to distinguish between relevant and irrelevant pairs of Java methods and functional requirements, we score their cosine similarity.
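
As an illustration of the retrieval metrics, the sketch below computes Mean Reciprocal Rank and Relevance@k from the 1-based rank of the correct Java method for each requirement. It assumes the common definitions (Relevance@k taken as the fraction of queries whose correct method appears in the top k); the function names are hypothetical and evaluate.py may compute these metrics differently.

```python
def rank_of_target(similarities, target_index):
    """1-based rank of the target candidate when sorted by descending similarity."""
    target = similarities[target_index]
    return 1 + sum(1 for s in similarities if s > target)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def relevance_at_k(ranks, k):
    """Assumed definition: share of queries whose correct method is in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Three example queries; each row holds similarities of one requirement
# to three candidate methods, and the correct method index is given separately.
similarity_rows = [[0.9, 0.1, 0.3], [0.2, 0.9, 0.5], [0.4, 0.6, 0.1]]
correct_indices = [0, 2, 2]
ranks = [rank_of_target(row, idx) for row, idx in zip(similarity_rows, correct_indices)]

print(ranks)
print(mean_reciprocal_rank(ranks))
print(relevance_at_k(ranks, k=2))
```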

Additional evaluation arguments can be retrieved using the command:

python evaluate.py --help

Model structure

We propose a Siamese artificial neural network that learns joint embeddings of Java methods and functional requirements written in natural language. The experimental results demonstrate that cosine similarity with an empirically determined threshold is an adequate measure for verifying the accomplishment of functional requirements.
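
The verification step can be sketched as a cosine similarity between two embedding vectors compared against a threshold. This is a minimal pure-Python illustration: the vectors, the function names and the threshold value 0.5 are placeholders, not values from the thesis, and the real embeddings come from the trained encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def is_accomplished(method_vec, requirement_vec, threshold):
    """Treat the requirement as accomplished if the similarity reaches the threshold."""
    return cosine_similarity(method_vec, requirement_vec) >= threshold


method_embedding = [1.0, 1.0]        # placeholder for an encoded Java method
requirement_embedding = [1.0, 0.0]   # placeholder for an encoded requirement
THRESHOLD = 0.5                      # hypothetical; the README says it is found empirically

print(cosine_similarity(method_embedding, requirement_embedding))
print(is_accomplished(method_embedding, requirement_embedding, THRESHOLD))
```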

N-gram encoder

The N-gram encoder treats a Java method as an unordered bag of contexts.
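
One way to picture an unordered bag is a multiset of token n-grams, where only the counts matter and not the positions. The sketch below assumes that a "context" is a token n-gram and that the method is already tokenized; the exact notion of context used by the encoder is not specified in this README, and bag_of_ngrams is a hypothetical helper.

```python
from collections import Counter


def bag_of_ngrams(tokens, n=2):
    """Count every run of n consecutive tokens; positions are discarded."""
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


# Tokens of a tiny Java method body: "return a + b ;"
tokens = ["return", "a", "+", "b", ";"]
bag = bag_of_ngrams(tokens, n=2)
print(bag)
```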

API encoder

The API encoder builds representations from an extracted sequence of API calls, augmented with the function name and body tokens.
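
To illustrate what a sequence of API calls might look like, the sketch below pulls receiver.method( patterns out of Java source with a regular expression. This is a deliberately rough approximation and an assumption on our part: it does not resolve receiver types, it misses chained calls and constructors, and it omits the augmentation with function name and body tokens. The repository's actual extraction may rely on a proper parser.

```python
import re

# Matches "receiver.method(" and captures both identifiers.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\.\s*([A-Za-z_]\w*)\s*\(")


def api_call_sequence(java_source):
    """Return calls in source order as 'receiver.method' strings."""
    return [f"{receiver}.{method}" for receiver, method in CALL_PATTERN.findall(java_source)]


java_method = (
    "public String readFirstLine(File file) throws IOException {\n"
    "    BufferedReader reader = Files.newBufferedReader(file.toPath());\n"
    "    String line = reader.readLine();\n"
    "    reader.close();\n"
    "    return line;\n"
    "}\n"
)

calls = api_call_sequence(java_method)
print(calls)
```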

BERT Encoder

To embed functional requirements, we use BERT small.

Data details

A modified version of the CodeSearchNet challenge dataset, augmented with Github-Data-Collector repositories. Additional processing applied to the original dataset consists of the following steps:

removal of exotic symbols
filtering of @param, @link and other description references
exclusion of the 5% longest (outlying) descriptions
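
The three processing steps above can be sketched as follows. The details are assumptions made for illustration: "exotic symbols" is taken to mean anything outside printable ASCII, Javadoc references are dropped entirely, and description length is measured in whitespace-separated words. The function names are hypothetical and do not refer to code in this repository.

```python
import re

INLINE_TAG = re.compile(r"\{@\w+[^}]*\}")      # e.g. {@link Foo#bar}
BLOCK_TAGS = re.compile(r"@\w+.*", re.DOTALL)  # everything from the first @param/@return on
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]+")


def clean_description(text):
    """Strip Javadoc references and non-ASCII symbols, then normalize whitespace."""
    text = INLINE_TAG.sub(" ", text)
    text = BLOCK_TAGS.sub(" ", text)
    text = NON_PRINTABLE_ASCII.sub(" ", text)
    return " ".join(text.split())


def drop_longest(descriptions, fraction=0.05):
    """Drop roughly the longest `fraction` of descriptions (ties at the cutoff are kept)."""
    if not descriptions:
        return []
    lengths = sorted(len(d.split()) for d in descriptions)
    keep = max(1, len(lengths) - round(len(lengths) * fraction))
    cutoff = lengths[keep - 1]
    return [d for d in descriptions if len(d.split()) <= cutoff]


raw = "Computes the checksum\u2122 of the buffer.\n@param buf input buffer\n@return checksum"
print(clean_description(raw))

samples = [" ".join(["w"] * (i + 1)) for i in range(20)]  # lengths 1..20 words
print(len(drop_longest(samples)))
```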
