
Generating Python Mutants from Bug Fixes using Neural Machine Translation

This repository contains the code for the paper "Generating Python Mutants from Bug Fixes using Neural Machine Translation" by Sergen AŞIK and Uğur YAYAN.

Dataset

The Dataset folder contains the Python scripts used to create the dataset. The dataset is created in the following steps:

  • commit_to_diff.py (Extraction) downloads the diff files of the commits listed in the commit list file and saves them to the diffTexts folder. Each line of the commit list file contains a commit id and the URL of its repository, in the following format:
commit_id1,repository_url1
commit_id2,repository_url2
...
commit_idn,repository_urln
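The extraction step above can be sketched roughly as follows. The helper names (`parse_commit_list`, `diff_url`) and the GitHub `.diff` URL convention are assumptions for illustration; the actual commit_to_diff.py may differ.

```python
def parse_commit_list(text):
    """Parse 'commit_id,repository_url' lines into (id, url) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        commit_id, repo_url = line.split(",", 1)
        pairs.append((commit_id.strip(), repo_url.strip()))
    return pairs

def diff_url(commit_id, repo_url):
    """Build the URL of a commit's .diff file (GitHub convention, assumed)."""
    return f"{repo_url.rstrip('/')}/commit/{commit_id}.diff"
```

Each resulting URL can then be fetched and saved under diffTexts.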
  • diff_to_tp.py (Transform) splits the downloaded diff texts into individual file changes, then calls the sep_file function to recover the buggy and fixed version of each file. Each buggy/fixed source code pair must share the same number; global counters are used to keep this numbering consistent.
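The core of this separation can be sketched as below, assuming standard unified-diff hunks: removed lines (`-`) belong to the buggy version, added lines (`+`) to the fixed version, and context lines to both. This is a simplified stand-in for the repository's sep_file, which additionally handles the pair numbering.

```python
def sep_file(hunk_lines):
    """Split unified-diff hunk body lines into buggy and fixed versions."""
    buggy, fixed = [], []
    for line in hunk_lines:
        if line.startswith("-"):
            buggy.append(line[1:])       # only in the buggy version
        elif line.startswith("+"):
            fixed.append(line[1:])       # only in the fixed version
        else:                            # context line: present in both
            stripped = line[1:] if line.startswith(" ") else line
            buggy.append(stripped)
            fixed.append(stripped)
    return buggy, fixed
```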

  • edit_actions.py (Classifier) extracts the actions performed when converting the buggy code into the fixed code. Both versions are translated into ASTs and compared with xml_diff to obtain the actions, which are saved in the mutation_types folder.
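To illustrate the idea of labelling edits as Update, Insert, or Delete, here is a simplified stand-in that diffs token sequences with `difflib` instead of ASTs with xml_diff; it is not the repository's classifier.

```python
import difflib

def edit_actions(buggy_tokens, fixed_tokens):
    """Label the edits that turn the buggy tokens into the fixed tokens."""
    sm = difflib.SequenceMatcher(a=buggy_tokens, b=fixed_tokens)
    actions = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "replace":
            actions.append(("Update", buggy_tokens[i1:i2], fixed_tokens[j1:j2]))
        elif tag == "delete":
            actions.append(("Delete", buggy_tokens[i1:i2], []))
        elif tag == "insert":
            actions.append(("Insert", [], fixed_tokens[j1:j2]))
    return actions
```

These three labels correspond to the Update/Delete/Insert mutation types used later when organizing the dataset.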

  • abstraction.py (Abstraction) abstracts the source code to reduce the vocabulary size.
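A common form of such abstraction replaces identifiers and literals with numbered placeholder tokens, so that many concrete programs map to the same small vocabulary. The placeholder scheme (`VAR_n`, `LIT_n`) below is an assumption for illustration, not necessarily what abstraction.py emits.

```python
import io
import keyword
import tokenize

def abstract(source):
    """Replace identifiers and literals with numbered placeholder tokens."""
    ids, lits, out = {}, {}, []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Same identifier always maps to the same placeholder.
            out.append(ids.setdefault(tok.string, f"VAR_{len(ids) + 1}"))
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append(lits.setdefault(tok.string, f"LIT_{len(lits) + 1}"))
        elif tok.type in (tokenize.NAME, tokenize.OP):
            out.append(tok.string)  # keywords and operators pass through
    return " ".join(out)
```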

  • ex_abs_main.py is the main script of the dataset creation process: it calls the other scripts in the correct order.

Dataset Interface

Requirements

Python 3.8 or above is required. You can install the requirements using the following command:

pip install -r Dataset/requirements.txt

Usage

To start the dataset creation process, first create a commit list file in the format described above, then run:

python Dataset/main.py

Transformer Model

The Transformer is a neural network architecture that solves sequence-to-sequence problems using attention mechanisms. Here it is used to translate fixed source code into buggy source code. The model is implemented with the PyTorch library and trained on the dataset created in the previous step. Its architecture is shown in the following figure:

Transformer Architecture
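The core operation of this architecture is scaled dot-product attention. The pure-Python sketch below shows the computation on plain lists of floats; the repository's model performs the same operation with PyTorch tensors.

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # Softmax over the scores (subtract max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights are a softmax, each output row is a convex combination of the value vectors, weighted toward the keys most similar to the query.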

Requirements

Python 3.8 or above is required. You can install the requirements using the following command:

pip install -r Model/requirements.txt

Training

Let's see how to train the Transformer model from scratch using the code in this repo. First, download the dataset. The dataset folder contains the following subfolders:

  • Update: contains the update mutation type source and target files
  • Delete: contains the delete mutation type source and target files
  • Insert: contains the insert mutation type source and target files

The files are separated into two parts: Formatted and Unformatted. Formatted files use special tokens such as NEWLINE, INDENT and DEDENT to make whitespace structure explicit; Unformatted files do not. We used the formatted files in our experiments.
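These special tokens correspond to the ones Python's own `tokenize` module emits, which is one way such formatted files can be produced (an illustrative sketch, not necessarily the repository's exact preprocessing):

```python
import io
import tokenize

def format_tokens(source):
    """Tokenize source, making NEWLINE/INDENT/DEDENT explicit tokens."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            names.append("NEWLINE")
        elif tok.type == tokenize.INDENT:
            names.append("INDENT")
        elif tok.type == tokenize.DEDENT:
            names.append("DEDENT")
        elif tok.type != tokenize.ENDMARKER:
            names.append(tok.string)
    return names
```

For example, `format_tokens("if x:\n    y = 1\n")` yields the token sequence `if x : NEWLINE INDENT y = 1 NEWLINE DEDENT`, from which the original indentation can be reconstructed exactly.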

Then train the model:

python Model/main.py

License

MIT
