
Generating Python Mutants from Bug Fixes using Neural Machine Translation

This repository contains the code for the paper "Generating Python Mutants from Bug Fixes using Neural Machine Translation" by Sergen AŞIK and Uğur YAYAN.

Dataset

The Dataset folder contains the Python scripts used to create the dataset. The dataset is created in the following steps:

  • commit_to_diff.py (Extraction) downloads the diff files of the commits listed in the commit list file and saves them to the diffTexts folder. Each line of the commit list file contains a commit id and the URL of its repository, in the following format:
commit_id1,repository_url1
commit_id2,repository_url2
...
commit_idn,repository_urln
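The extraction step above can be sketched roughly as follows. The helper names (`parse_commit_list`, `diff_url`) and the GitHub `.diff` URL convention are assumptions for illustration; the actual commit_to_diff.py may differ.

```python
def parse_commit_list(text):
    """Parse 'commit_id,repository_url' lines into (id, url) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        commit_id, repo_url = line.split(",", 1)
        pairs.append((commit_id.strip(), repo_url.strip()))
    return pairs

def diff_url(commit_id, repo_url):
    """Build the URL of a commit's .diff file (GitHub convention, assumed)."""
    return f"{repo_url.rstrip('/')}/commit/{commit_id}.diff"
```

Each resulting URL can then be fetched and saved under diffTexts.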
  • diff_to_tp.py (Transform) splits the downloaded diff texts into individual file changes, then calls the sep_file function to recover the buggy and fixed version of each file. Each buggy/fixed source code pair must share the same number; global counters are used to keep this numbering consistent.
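The core of this separation can be sketched as below, assuming standard unified-diff hunks: removed lines (`-`) belong to the buggy version, added lines (`+`) to the fixed version, and context lines to both. This is a simplified stand-in for the repository's sep_file, which additionally handles the pair numbering.

```python
def sep_file(hunk_lines):
    """Split unified-diff hunk body lines into buggy and fixed versions."""
    buggy, fixed = [], []
    for line in hunk_lines:
        if line.startswith("-"):
            buggy.append(line[1:])       # only in the buggy version
        elif line.startswith("+"):
            fixed.append(line[1:])       # only in the fixed version
        else:                            # context line: present in both
            stripped = line[1:] if line.startswith(" ") else line
            buggy.append(stripped)
            fixed.append(stripped)
    return buggy, fixed
```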

  • edit_actions.py (Classifier) extracts the actions performed when converting the buggy code into the fixed code. Both versions are translated into ASTs and compared with xml_diff to obtain the actions, which are saved in the mutation_types folder.
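To illustrate the idea of labelling edits as Update, Insert, or Delete, here is a simplified stand-in that diffs token sequences with `difflib` instead of ASTs with xml_diff; it is not the repository's classifier.

```python
import difflib

def edit_actions(buggy_tokens, fixed_tokens):
    """Label the edits that turn the buggy tokens into the fixed tokens."""
    sm = difflib.SequenceMatcher(a=buggy_tokens, b=fixed_tokens)
    actions = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "replace":
            actions.append(("Update", buggy_tokens[i1:i2], fixed_tokens[j1:j2]))
        elif tag == "delete":
            actions.append(("Delete", buggy_tokens[i1:i2], []))
        elif tag == "insert":
            actions.append(("Insert", [], fixed_tokens[j1:j2]))
    return actions
```

These three labels correspond to the Update/Delete/Insert mutation types used later when organizing the dataset.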

  • abstraction.py (Abstraction) abstracts the source code to reduce the vocabulary size.
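A common form of such abstraction replaces identifiers and literals with numbered placeholder tokens, so that many concrete programs map to the same small vocabulary. The placeholder scheme (`VAR_n`, `LIT_n`) below is an assumption for illustration, not necessarily what abstraction.py emits.

```python
import io
import keyword
import tokenize

def abstract(source):
    """Replace identifiers and literals with numbered placeholder tokens."""
    ids, lits, out = {}, {}, []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Same identifier always maps to the same placeholder.
            out.append(ids.setdefault(tok.string, f"VAR_{len(ids) + 1}"))
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append(lits.setdefault(tok.string, f"LIT_{len(lits) + 1}"))
        elif tok.type in (tokenize.NAME, tokenize.OP):
            out.append(tok.string)  # keywords and operators pass through
    return " ".join(out)
```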

  • ex_abs_main.py is the main script of the dataset creation process: it calls the other scripts in the correct order.

Dataset Interface

Requirements

Python 3.8 or above is required. You can install the requirements using the following command:

pip install -r Dataset/requirements.txt

Usage

To start the dataset creation process, first create a commit list file in the format described above, then run:

python Dataset/main.py

Transformer Model

The Transformer is a neural network architecture that solves sequence-to-sequence problems using attention mechanisms. Here it is used to translate fixed source code into buggy source code. The model is implemented with the PyTorch library and trained on the dataset created in the previous step. Its architecture is shown in the following figure:

Transformer Architecture
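The core operation of this architecture is scaled dot-product attention. The pure-Python sketch below shows the computation on plain lists of floats; the repository's model performs the same operation with PyTorch tensors.

```python
import math

def attention(Q, K, V):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # Softmax over the scores (subtract max for numerical stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights are a softmax, each output row is a convex combination of the value vectors, weighted toward the keys most similar to the query.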

Requirements

Python 3.8 or above is required. You can install the requirements using the following command:

pip install -r Model/requirements.txt

Training

Let's see how to train the Transformer model from scratch using the code in this repo. First, download the dataset. The dataset folder contains the following subfolders:

  • Update: contains the update mutation type source and target files
  • Delete: contains the delete mutation type source and target files
  • Insert: contains the insert mutation type source and target files

The files are separated into two parts: Formatted and Unformatted. Formatted files use special tokens such as NEWLINE, INDENT and DEDENT to make whitespace structure explicit; Unformatted files do not. We used the formatted files in our experiments.
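These special tokens correspond to the ones Python's own `tokenize` module emits, which is one way such formatted files can be produced (an illustrative sketch, not necessarily the repository's exact preprocessing):

```python
import io
import tokenize

def format_tokens(source):
    """Tokenize source, making NEWLINE/INDENT/DEDENT explicit tokens."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NEWLINE, tokenize.NL):
            names.append("NEWLINE")
        elif tok.type == tokenize.INDENT:
            names.append("INDENT")
        elif tok.type == tokenize.DEDENT:
            names.append("DEDENT")
        elif tok.type != tokenize.ENDMARKER:
            names.append(tok.string)
    return names
```

For example, `format_tokens("if x:\n    y = 1\n")` yields the token sequence `if x : NEWLINE INDENT y = 1 NEWLINE DEDENT`, from which the original indentation can be reconstructed exactly.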

Then train the model:

python Model/main.py

License

MIT
