
Security Attacks on LLM-based Code Completion Tools

This is the official repository of Security Attacks on LLM-based Code Completion Tools (also titled: While GitHub Copilot Excels at Coding, Does It Ensure Responsible Output?).

(Demo image generated by Bing.)

Important Notice

All code provided in this repository is for research purposes ONLY. Please keep LOVE & PEACE.

Brief Abstract

The rapid development of large language models (LLMs) has significantly advanced code completion capabilities, giving rise to a new generation of LLM-based Code Completion Tools (LCCTs). Unlike general-purpose LLMs, these tools possess unique workflows, integrating multiple information sources as input and prioritizing code suggestions over natural language interaction, which introduces distinct security challenges. Additionally, LCCTs often rely on proprietary code datasets for training, raising concerns about the potential exposure of sensitive data. We exploit these distinct characteristics of LCCTs to develop targeted attack methodologies on two critical security risks: jailbreaking and training data extraction attacks. Our experimental results expose significant vulnerabilities within LCCTs, including a 99.4% success rate in jailbreaking attacks on GitHub Copilot and a 46.3% success rate on Amazon Q. Furthermore, we successfully extracted sensitive user data from GitHub Copilot, including 54 real email addresses and 314 physical addresses associated with GitHub usernames. Our study also demonstrates that these code-based attack methods are effective against general-purpose LLMs, such as the GPT series, highlighting a broader security misalignment in the handling of code by modern LLMs.

For more details, please refer to our paper.

Usage

Requirements

For LCCT attacks, please ensure you have installed the GitHub Copilot and Amazon Q extensions from the VS Code Marketplace and that you have working access to them. No additional packages are required.

For attacks on general-purpose LLMs, please ensure you have the corresponding API access key. The requirements for OpenAI's GPT series are listed in requirements.txt and can be installed with:

pip install -r requirements.txt
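
For reference, below is a minimal sketch of how one might send a code-completion-style prompt to a GPT model. It assumes the openai (>=1.0) Python client and an OPENAI_API_KEY environment variable; the model name and prompt are illustrative only and are not part of this repository's scripts.

# Minimal sketch (illustrative, not part of this repo): query a GPT model
# with a code-completion-style prompt.
# Assumes `pip install openai` (>=1.0) and the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompt: ask the model to continue a code snippet.
code_prefix = "def greet(name):\n    "

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "user",
         "content": f"Complete the following Python code:\n{code_prefix}"}
    ],
)
print(response.choices[0].message.content)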

Forbidden Questions

The forbidden-question dataset used in our experiments is provided in the data folder.
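
As an illustration, the snippet below sketches one way to load such a dataset. The file name forbidden_questions.csv and the column name question are assumptions for illustration; check the data folder for the actual file layout.

# Minimal sketch: load the forbidden-question dataset from the data folder.
# The file name and column name below are assumptions; adjust to the actual files.
import csv

questions = []
with open("data/forbidden_questions.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        questions.append(row["question"])  # assumed column name

print(f"Loaded {len(questions)} forbidden questions")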

Notes for Conducting Attacks on LCCTs

You may manually trigger LCCTs to complete the attack code for demonstration. If you want to run the attacks more efficiently, we recommend designing an automation script based on PyAutoGUI; we do not provide such a script due to concerns about misuse.

Citation

If you use this codebase, or if our ideas inspire your work, please cite:

@misc{cheng2024securityattacksllmbasedcode,
      title={Security Attacks on LLM-based Code Completion Tools}, 
      author={Wen Cheng and Ke Sun and Xinyu Zhang and Wei Wang},
      year={2024},
      eprint={2408.11006},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.11006}, 
}
