
PyEGo

Description

PyEGo: Inferring Environment Dependencies for Python Programs

Introduction

PyEGo is a tool that automatically infers environment dependencies for Python programs.
A Python program's environment dependencies consist mainly of three parts:

  • A compatible Python interpreter version;
  • The third-party Python packages it depends on;
  • The system libraries it depends on.

For example, the following snippet prints an emoji on the terminal:
import emoji
print emoji.emojize('Python is :thumbs_up:')

This snippet is compatible only with Python 2, because it uses print as a statement (there are no parentheses after "print"). If we run the snippet in Python 3:

$ python example/example.py
File "example/example.py", line 2
  print emoji.emojize('Python is :thumbs_up:')
          ^
SyntaxError: invalid syntax
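
The interpreter-version dependency can be observed without running the program by asking Python 3 whether it can parse the source. The following is a minimal sketch for illustration, not PyEGo's inference method; the function name `parses_as_python3` is hypothetical, and the check only works when the sketch itself runs under Python 3.

```python
import ast


def parses_as_python3(source):
    """Return True if the running Python 3 interpreter can parse `source`."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The snippet from this README, and a Python 3 variant of it.
PY2_SNIPPET = "import emoji\nprint emoji.emojize('Python is :thumbs_up:')\n"
PY3_SNIPPET = "import emoji\nprint(emoji.emojize('Python is :thumbs_up:'))\n"
```

A parse failure only shows that the source is not valid Python 3; it does not by itself prove the source is valid Python 2.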

The snippet also depends on a third-party Python package, emoji. If we run the snippet without installing emoji:

$ python example/example.py
Traceback (most recent call last):
  File "example/example.py", line 1, in <module>
    import emoji
ImportError: No module named emoji
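
A rough idea of which modules a program imports can be obtained from its syntax tree. The sketch below is an illustration under stated assumptions, not how PyEGo resolves packages: `imported_modules` is a hypothetical helper, it needs source that the running interpreter can parse, it also reports standard-library modules, and a module name is not always the name of the package that provides it.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local code.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)
```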

PyEGo can build a runtime environment for the snippet:

$ python PyEGo.py -r example/example.py

It then outputs a Dockerfile:

FROM python:2.7
RUN apt-get clean
RUN apt-get update
RUN pip install --upgrade pip
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install emoji==0.6.0
ADD example.py example.py
# add CMD command to run your programs here
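
To make the structure of this Dockerfile explicit, the sketch below renders a similar file from an interpreter version and a list of pinned packages. It is a hypothetical illustration, not PyEGo's generator: `render_dockerfile` and its parameters are assumed names, and the pip index mirror line is left out.

```python
def render_dockerfile(python_version, pip_packages, program_file):
    """Render a Dockerfile for one program file.

    `pip_packages` is a list of (name, version) pairs.
    """
    lines = [
        "FROM python:{}".format(python_version),
        "RUN apt-get clean",
        "RUN apt-get update",
        "RUN pip install --upgrade pip",
    ]
    for name, version in pip_packages:
        lines.append("RUN pip install {}=={}".format(name, version))
    lines.append("ADD {0} {0}".format(program_file))
    return "\n".join(lines) + "\n"
```

Calling `render_dockerfile("2.7", [("emoji", "0.6.0")], "example.py")` yields the same FROM, pip install, and ADD lines as the file above.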

Add a CMD instruction to run the snippet, then build the Docker image:

$ echo "CMD python example.py" >> example/Dockerfile
$ cd example
$ docker build -t ego .

Now, run it!

$ docker run ego
Python is 👍

Installation

Local installation

  • Install Python>=3.6
  • Install dependent Python packages:
$ pip install -r requirements.txt
  • Install Neo4j (>=3.5.13, <4)
  • Merge PyKG: our knowledge graph, PyKG, is split into 2 files because of the file size limit. Merge them before loading it:
$ cat PyKG/PyKG.dump.a* >> PyKG.dump
  • Load the database (PyKG):
$ cp PyKG.dump /PATH/TO/NEO4J/data/databases/
$ cd /PATH/TO/NEO4J
$ bin/neo4j stop
$ bin/neo4j-admin load --from=data/databases/PyKG.dump
  • Configure PyEGo:
    Edit config.py and set the Neo4j connection:
NEO4J_URI = "YOUR NEO4J URI"
NEO4J_PWD = "YOUR NEO4J PASSWORD"
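
For platforms without `cat`, the PyKG merge step from the list above can be done in Python. This is a minimal sketch, assuming the parts sort correctly by file name (as `PyKG.dump.aa`, `PyKG.dump.ab` would); `merge_parts` is a hypothetical helper, not part of PyEGo.

```python
import glob
import shutil


def merge_parts(pattern, target):
    """Concatenate all files matching `pattern`, in sorted order, into `target`."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError("no files match {!r}".format(pattern))
    # "wb" truncates the target, so running the merge twice does not
    # append the parts a second time.
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


# Usage, from the repository root:
#   merge_parts("PyKG/PyKG.dump.a*", "PyKG.dump")
```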

Docker

We also provide a Docker image of PyEGo. Build the Docker image with:

$ docker build -t ego -f Docker/Dockerfile .

Instructions

Local

Start neo4j before running PyEGo:

$ cd /PATH/TO/NEO4J
$ bin/neo4j start
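
Before running PyEGo, it can help to confirm that Neo4j is accepting connections. The sketch below is an assumption-laden illustration: it presumes NEO4J_URI has the form `scheme://host[:port]` (for example `bolt://localhost:7687`) and falls back to 7687, Neo4j's default Bolt port. The helper names are hypothetical and PyEGo does not ship them.

```python
import socket
from urllib.parse import urlsplit


def neo4j_endpoint(uri, default_port=7687):
    """Extract (host, port) from a URI such as bolt://localhost:7687."""
    parts = urlsplit(uri)
    if not parts.hostname:
        raise ValueError("cannot find a host in {!r}".format(uri))
    return parts.hostname, parts.port or default_port


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection shows only that something is listening on that port; it does not verify the password or that PyKG is loaded.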

If you installed PyEGo locally, run it with:

$ cd /PATH/TO/PyEGo
$ python PyEGo.py [-h] [-t OUTPUT_TYPE] [-p OUTPUT_PATH] -r PROGRAM_ROOT
             
  • PROGRAM_ROOT can be either a single .py file or a Python project folder.
  • PyEGo provides two types of output: a Dockerfile and dependency.json. The default output type is Dockerfile.
    For Dockerfile output, set --output_type=Dockerfile (-t Dockerfile); for JSON output, set --output_type=json.
  • --output_path (-p) sets the output path of the Dockerfile or dependency.json. By default, PyEGo generates the file in the parent folder of PROGRAM_ROOT.

For more help, see:
$ python PyEGo.py -h
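
The command-line interface described above can be mirrored with argparse as follows. This is a sketch reconstructed from the usage line only, not PyEGo's actual parser: the `program_root` attribute name is an assumption (the usage line shows only `-r`), and the `choices` list is assumed from the two output types mentioned.

```python
import argparse


def build_parser():
    """Build a parser matching: [-h] [-t OUTPUT_TYPE] [-p OUTPUT_PATH] -r PROGRAM_ROOT"""
    parser = argparse.ArgumentParser(prog="PyEGo.py")
    parser.add_argument(
        "-t", "--output_type",
        choices=["Dockerfile", "json"],  # assumed from the README text
        default="Dockerfile",
        help="type of output to generate",
    )
    parser.add_argument(
        "-p", "--output_path",
        default=None,
        help="where to write the Dockerfile or dependency.json",
    )
    parser.add_argument(
        "-r",
        dest="program_root",  # hypothetical attribute name
        required=True,
        help="a single .py file or a Python project folder",
    )
    return parser
```

For example, `build_parser().parse_args(["-r", "example/example.py"])` leaves `output_type` at its default, `"Dockerfile"`.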

Docker

If you built the Docker image of PyEGo, run it with:

$ docker run -v /PATH/TO/PROGRAM/ROOT:/INPUT/IN/CONTAINER \
             -v /PATH/TO/OUTPUT:/OUTPUT/IN/CONTAINER \
             ego /INPUT/IN/CONTAINER /OUTPUT/IN/CONTAINER
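
When scripting this command, the argument list can be assembled in Python. The sketch below is illustrative: `docker_run_command` is a hypothetical helper, and the container-side paths `/input` and `/output` are arbitrary choices standing in for /INPUT/IN/CONTAINER and /OUTPUT/IN/CONTAINER. Host paths are made absolute because Docker treats a relative name in `-v` as a named volume rather than a bind mount.

```python
import os


def docker_run_command(program_root, output_dir, image="ego",
                       container_input="/input", container_output="/output"):
    """Return the argv list for running the PyEGo image on `program_root`."""
    return [
        "docker", "run",
        "-v", "{}:{}".format(os.path.abspath(program_root), container_input),
        "-v", "{}:{}".format(os.path.abspath(output_dir), container_output),
        image, container_input, container_output,
    ]


# Usage:
#   import subprocess
#   subprocess.run(docker_run_command("example", "out"), check=True)
```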

Reproduce our experiments

Experiment on Hard-gists

Experimental results are available in another repository, exp-gist.

Run PyEGo on Hard-gists

  • Edit experiment/exp_config.py and set the hard-gists root:
EGO_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PYEGO"
  • Run PyEGo
$ cd /PATH/TO/PYEGO
$ python experiment/tests_gist.py --run

Compare PyEGo results with DockerizeMe and Pipreqs

  • Run DockerizeMe and Pipreqs
    We provide the bash scripts we used to run DockerizeMe and Pipreqs in our experiment.
    script/dockerizeme_gen_df.sh uses DockerizeMe to generate Dockerfiles for the gists. Run this script inside the DockerizeMe Vagrant environment (provided by DockerizeMe):
# Run the script in DockerizeMe vagrant
# Edit line2: cd /YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME
$ cd /PATH/TO/PyEGo/script
$ bash dockerizeme_gen_df.sh

script/pipreqs_gen_df.sh uses Pipreqs to generate requirements.txt files and Dockerfiles for the gists. Run this script after installing pipreqs (pip install pipreqs) in Python 2.7:

# Edit line2 and line3: /YOUR/HARD/GISTS/ROOT/OF/PIPREQS
$ cd /PATH/TO/PyEGo/script
$ bash pipreqs_gen_df.sh

script/dockerize_all.sh builds Docker images from the DockerizeMe-generated or Pipreqs-generated Dockerfiles, runs the Docker containers, checks the results, and records them in log.txt.

# Edit line2: cd /YOUR/HARD/GISTS/ROOT
$ cd /PATH/TO/PyEGo/script
$ bash dockerize_all.sh
  • Edit experiment/exp_config.py and set the hard-gists roots and log paths:
EGO_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PYEGO"
ME_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME"
REQS_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PIPREQS"

EGO_GISTS_LOG = "/YOUR/HARD/GISTS/LOG/PATH/OF/PYEGO"
ME_GISTS_LOG = "/YOUR/HARD/GISTS/LOG/PATH/OF/DOCKERIZEME"
REQS_GISTS_LOG = "/YOUR/HARD/GISTS/LOG/PATH/OF/PIPREQS"
  • Compare results
$ cd /PATH/TO/PYEGO
$ python experiment/tests_gist.py --compare
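
Since the comparison relies on several configured dataset roots, a quick check that each one exists can save a failed run. The sketch below is a hypothetical helper, not part of PyEGo; it takes the settings as a plain dict so it makes no assumption about how exp_config.py is imported.

```python
import os


def missing_directories(settings):
    """Return the sorted names of settings whose value is not an existing directory."""
    return sorted(
        name for name, path in settings.items() if not os.path.isdir(path)
    )


# Usage, with the root settings named in this README:
#   missing_directories({
#       "EGO_GISTS_ROOT": "/YOUR/HARD/GISTS/ROOT/OF/PYEGO",
#       "ME_GISTS_ROOT": "/YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME",
#       "REQS_GISTS_ROOT": "/YOUR/HARD/GISTS/ROOT/OF/PIPREQS",
#   })
```

Only the root settings are checked here; whether the log paths must exist beforehand is not stated in this README.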

Experiment on Github dataset

Results of experiments are available in another repository, exp-github.

Download dataset

Our dataset is available on https://drive.google.com/file/d/1oHr6mbm0d5jIlVxeDkY6iyvow_Q63L_w/view.

  • Unzip dataset:
$ tar -xvf GithubProjects.tar.gz
  • Make copies for experiments:
$ cp -r GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/EGO
$ cp -r GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS

The experiments need separate copies of the dataset. Using a single copy also works, but the results of one tool would be overwritten by the next.
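
The dataset copies can also be made from Python. This is a minimal sketch with a hypothetical helper name; note that `shutil.copytree` refuses to copy into a destination that already exists, which differs from `cp` but guards against mixing results from different runs.

```python
import shutil


def make_copies(source, destinations):
    """Copy the directory `source` to each path in `destinations`.

    Raises FileExistsError if a destination already exists.
    """
    for destination in destinations:
        shutil.copytree(source, destination)


# Usage:
#   make_copies("GithubProjects", [
#       "/YOUR/GITHUB/DATASET/ROOT/OF/EGO",
#       "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS",
#   ])
```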

Run PyEGo on Github dataset

  • Edit experiment/exp_config.py and set the GitHub dataset root:
EGO_GITHUB_ROOT = "/YOUR/GITHUB/DATASET/ROOT/OF/EGO"
  • Run PyEGo
$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --run --tool=PyEGo

Compare PyEGo results with DockerizeMe and Pipreqs

  • Run pipreqs
    Install pipreqs in Python 3.6+. Then edit experiment/exp_config.py and set the GitHub dataset root and the pipreqs path:
REQS_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS"
PIPREQS_PATH = "/YOUR/PIPREQS/PATH"

You can find the pipreqs path with:

$ which pipreqs
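
The same lookup is available from Python through `shutil.which`, which searches PATH much as the shell command does. The wrapper below is a hypothetical convenience that fails loudly when the tool is not installed.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` on PATH, or raise FileNotFoundError."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError("{!r} was not found on PATH".format(name))
    return path


# Usage:
#   PIPREQS_PATH = find_executable("pipreqs")
```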

Run pipreqs

$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --run --tool=Pipreqs

We provide results of DockerizeMe in exp-github.

  • Edit experiment/exp_config.py and set the GitHub dataset roots and log paths:
EGO_GITHUB_ROOT = "/YOUR/GITHUB/DATASET/ROOT/OF/EGO"
REQS_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS"
ME_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/DOCKERIZEME"

EGO_GITHUB_LOG = "/YOUR/GITHUB/DATASET/LOG/PATH/OF/EGO"
REQS_GITHUB_LOG_39 = "/YOUR/GITHUB/DATASET/LOG/PATH/OF/PIPREQS"
ME_GITHUB_LOG_39 = "/YOUR/GITHUB/DATASET/LOG/PATH/OF/DOCKERIZEME"
  • Compare results
$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --compare

Experiment running PyEGo with different strategies

Results of experiments are available in exp-gist.

  • Here are our 2 strategies:

    id            select strategy
    1 (default)   select-one
    2             select-all

  • Edit experiment/exp_config.py and set the hard-gists roots:
EGO_GISTS_ROOT = "/YOUR/HARD/GIST/DATASET/ROOT/OF/PYEGO/STRATEGY1"
EGO_GISTS_ROOT_2 = "/YOUR/HARD/GIST/DATASET/ROOT/OF/PYEGO/STRATEGY2"
  • Run a strategy on Hard-gists, replacing X with the strategy id (1 or 2):
$ cd /PATH/TO/PYEGO
$ python experiment/tests_strategies.py --strategy=X
