PyEGo
Description
PyEGo: Inferring Environment Dependencies for Python Programs
EGo-system can be visited at http://39.105.157.9:10350/ , and here is the README of PyEGo command line tool
Introduction
PyEGo is a tool of automatically inferring environment dependencies for Python programs.
A Python program’s environment dependencies mainly consists of three parts:
- Compatible Python interpreter version;
- Dependent Python third-party packages;
- Dependent System libraries.
For example, the following snippet print emoji on the terminal:import emoji
print emoji.emojize('Python is :thumbs_up:')
This snippet is only compatible with Python2, because there are no parentheses after “print”.
If we run the snippet in Python3:$ python example/example.py
File "example/example.py", line 2
print emoji.emojize('Python is :thumbs_up:')
^
SyntaxError: invalid syntax
On the other hand, the snippet depends on a Python third-party package emoji.
If we run the snippet without installing emoji:$ python example.py
Traceback (most recent call last):
File "example/example.py", line 1, in <module>
import emoji
ImportError: No module named emoji
PyEGo can build a runtime environment for the snippet:$ python PyEGo.py -r example/example.py
And then, output a Dockerfile:FROM python:2.7
RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list
RUN apt-get clean
RUN apt-get update
RUN pip install --upgrade pip
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install emoji==0.6.0
ADD example.py example.py
# add CMD command to run your programs here
Add CMD instruction to run the snippet, build docker image:$ echo "CMD python example.py" >> example/Dockerfile
$ cd example
$ docker build -t ego .
Now, run it!$ docker run ego
Python is 👍
Installation
Install local
- Install Python>=3.6
- Install dependent Python packages:
$ pip install -r requirements.txt
- Install NEO4J>=3.5.13, <4
- Merge PyKG:
Our knowledge graph, PyKG, is split into 2 files because of file size limit, merge them before load it:
$ cat PyKG/PyKG.dumpa* >> PyKG.dump
- Load database(PyKG):
$ cp PyKG.dump /PATH/TO/NEO4J/data/databases/
$ cd /PATH/TO/NEO4J
$ bin/neo4j stop
$ bin/neo4j-admin load --from=data/databases/PyKG.dump
- Config PyEGo
Edit config.py, config neo4j connection:NEO4J_URI = "YOUR NEO4J URI"
NEO4J_PWD = "YOUR NEO4J PASSWORD"
Docker
We also provide a Docker image of PyEGo. Build Docker image by:$ docker build -t ego -f Docker/Dockerfile .
Instructions
Local
Start neo4j before running PyEGo:
$ cd /PATH/TO/NEO4J
$ bin/neo4j start
If you installed PyEGo local, you can use PyEGo by:
$ cd /PATH/TO/PyEGo
$ python PyEGo.py [-h] [-t OUTPUT_TYPE] [-p OUTPUT_PATH] -r PROGRAM_ROOT
- Program root can be either a single .py file or a Python project folder.
- PyEGo provides two types of output: Dockerfile, and dependency.json. Default output type is Dockerfile.
For a Dockerfile output, set –output_type=Dockerfile(-t Dockerfile), and for a json output, set –output_type=json.
- –output_path(-p) indicate the output path of the Dockerfile or dependency.json. PyEGo generates the file in the parent folder of PROGRAM_ROOT by default.
For more help, see:
$ python PyEGo.py -h
Docker
If you built Docker image of PyEGo, you can use PyEGo by:
$ docker run -v /PATH/TO/PROGRAM/ROOT:/INPUT/IN/CONTAINER \
-v /PATH/TO/OUTPUT:/OUTPUT/IN/CONTAINER \
ego /INPUT/IN/CONTAINER /OUTPUT/IN/CONTAINER
Replay our experiment
Experiment on Hard-gists
Experimental results are available in another repository, exp-gist.
Run PyEGo on Hard-gists
- Edit experiment/exp_config.py, config hard-gists root
EGO_GISTS_ROOT = "/YOUR/HARD/GISTS/ROOT/OF/PYEGO"
- Run PyEGo
$ cd /PATH/TO/PYEGO
$ python experiment/tests_gist.py --run
Compare PyEGo results with DockerizeMe and Pipreqs
- Run DockerizeMe and Pipreqs
We provide our experiment bash script of DockerizeMe and Pipreqs
- script/dockerizeme_gen_df.sh* uses DockerizeMe to generate Dockerfiles for gists. Note that run the script in DockerizeMe vagrant(Provided by DockerizeMe)
# Run the script in DockerizeMe vagrant
# Edit line2: cd /YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME
$ cd /PATH/TO/PyEGo/script
$ bash dockerizeme_gen_df.sh
- script/pipreqs_gen_df.sh* uses Pipreqs to generate requirements.txt and Dockerfiles for gists. Note that run the script after install pipreqs(pip install pipreqs) in Python2.7
# Edit line2 and line3: /YOUR/HARD/GISTS/ROOT/OF/PIPREQS
$ cd /PATH/TO/PyEGo/script
$ bash pipreqs_gen_df.sh
- script/dockerize_all.sh* builds Docker images by DockerizeMe-generated or Pipreqs-generated Dockerfile, runs Docker containers, checks results and records results in log.txt.
# Edit line2: cd /YOUR/HARD/GISTS/ROOT
$ cd /PATH/TO/PyEGo/script
$ bash dockerize_all.sh
- Edit experiment/exp_config.py, config hard-gists root and log path
```python
EGO_GISTS_ROOT = “/YOUR/HARD/GISTS/ROOT/OF/PYEGO”
ME_GISTS_ROOT = “/YOUR/HARD/GISTS/ROOT/OF/DOCKERIZEME”
REQS_GISTS_ROOT = “/YOUR/HARD/GISTS/ROOT/OF/PIPREQS”
EGO_GISTS_LOG = “/YOUR/HARD/GISTS/LOG/PATH/OF/PYEGO”
ME_GISTS_LOG = “/YOUR/HARD/GISTS/LOG/PATH/OF/DOCKERIZEME”
REQS_GISTS_LOG = “/YOUR/HARD/GISTS/LOG/PATH/OF/PIPREQS”
* Compare results
```$xslt
$ cd /PATH/TO/PYEGO
$ python experiment/tests_gist.py --compare
Experiment on Github dataset
Results of experiments are available in another repository, exp-github.
Download dataset
Our dataset is available on https://drive.google.com/file/d/1oHr6mbm0d5jIlVxeDkY6iyvow_Q63L_w/view.
- Unzip dataset:
$ tar -xvf GithubProjects.tar.gz
- Make copies for experiments:
$ cp GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/EGO
$ cp GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS/PYTHON38
$ cp GithubProjects /YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS/PYTHON39
We need 3 copies of the dataset for our experiments. It’s OK to use only one copy, but results would be overwriten.Run PyEGo on Github dataset
- Edit experiment/exp_config.py github dataset root
EGO_GITHUB_ROOT = "/YOUR/GITHUB/DATASET/ROOT/OF/EGO"
- Run PyEGo
$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --run --tool=PyEGo
Compare PyEGo results with DockerizeMe and Pipreqs
- Run pipreqs
Install pipreqs in Python3.6+
Edit experiment/exp_config.py, config github dataset root and pipreqs pathREQS_GITHUB_ROOT_38 = "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS/PYTHON38"
REQS_GITHUB_ROOT_39 = "/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS/PYTHON39"
PIPREQS_PATH = "/YOUR/PIPREQS/PATH"
You can simply find pipreqs path by$ which pipreqs
Run pipreqs$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --run --tool=Pipreqs --pyver=<3.8 or 3.9>
We provide results of DockerizeMe in exp-github.
- Edit experiment/exp_config.py, config github dataset root and log path
```python
EGO_GITHUB_ROOT = “/YOUR/GITHUB/DATASET/ROOT/OF/EGO”
REQS_GITHUB_ROOT_38 = “/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS/PYTHON38”
REQS_GITHUB_ROOT_39 = “/YOUR/GITHUB/DATASET/ROOT/OF/PIPREQS/PYTHON39”
ME_GITHUB_ROOT_38 = “/YOUR/GITHUB/DATASET/ROOT/OF/DOCKERIZEME/PYTHON38”
ME_GITHUB_ROOT_39 = “/YOUR/GITHUB/DATASET/ROOT/OF/DOCKERIZEME/PYTHON39”
EGO_GITHUB_LOG = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/EGO”
REQS_GITHUB_LOG_38 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/PIPREQS/PYTHON38”
REQS_GITHUB_LOG_39 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/PIPREQS/PYTHON39”
ME_GITHUB_LOG_38 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/DOCKERIZEME/PYTHON38”
ME_GITHUB_LOG_39 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/DOCKERIZEME/PYTHON39”
* Compare results
```$xslt
$ cd /PATH/TO/PYEGO
$ python experiment/tests_github.py --compare
Experiment running PyEGo with different strategies
Results of experiments are available in exp-gist.
- Here are our 2 strategies:
id |
select strategy |
1(default) |
select-one |
2 |
select-all |
PyEGo
Description
PyEGo: Inferring Environment Dependencies for Python Programs
EGo-system can be visited at http://39.105.157.9:10350/ , and here is the README of PyEGo command line tool
Introduction
PyEGo is a tool of automatically inferring environment dependencies for Python programs.
A Python program’s environment dependencies mainly consists of three parts:
For example, the following snippet print emoji on the terminal: This snippet is only compatible with Python2, because there are no parentheses after “print”. If we run the snippet in Python3: On the other hand, the snippet depends on a Python third-party package emoji. If we run the snippet without installing emoji: PyEGo can build a runtime environment for the snippet: And then, output a Dockerfile: Add CMD instruction to run the snippet, build docker image: Now, run it!
Installation
Install local
Edit config.py, config neo4j connection:
Docker
We also provide a Docker image of PyEGo. Build Docker image by:Instructions
Local
Start neo4j before running PyEGo:
If you installed PyEGo local, you can use PyEGo by:
For a Dockerfile output, set –output_type=Dockerfile(-t Dockerfile), and for a json output, set –output_type=json.
Docker
If you built Docker image of PyEGo, you can use PyEGo by:
Replay our experiment
Experiment on Hard-gists
Experimental results are available in another repository, exp-gist.
Run PyEGo on Hard-gists
Compare PyEGo results with DockerizeMe and Pipreqs
We provide our experiment bash script of DockerizeMe and Pipreqs
EGO_GISTS_LOG = “/YOUR/HARD/GISTS/LOG/PATH/OF/PYEGO” ME_GISTS_LOG = “/YOUR/HARD/GISTS/LOG/PATH/OF/DOCKERIZEME” REQS_GISTS_LOG = “/YOUR/HARD/GISTS/LOG/PATH/OF/PIPREQS”
Experiment on Github dataset
Results of experiments are available in another repository, exp-github.
Download dataset
Our dataset is available on https://drive.google.com/file/d/1oHr6mbm0d5jIlVxeDkY6iyvow_Q63L_w/view.
Run PyEGo on Github dataset
Compare PyEGo results with DockerizeMe and Pipreqs
Install pipreqs in Python3.6+ Edit experiment/exp_config.py, config github dataset root and pipreqs path You can simply find pipreqs path by Run pipreqs We provide results of DockerizeMe in exp-github.
EGO_GITHUB_LOG = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/EGO” REQS_GITHUB_LOG_38 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/PIPREQS/PYTHON38” REQS_GITHUB_LOG_39 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/PIPREQS/PYTHON39” ME_GITHUB_LOG_38 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/DOCKERIZEME/PYTHON38” ME_GITHUB_LOG_39 = “/YOUR/GITHUB/DATASET/LOG/PATH/OF/DOCKERIZEME/PYTHON39”
Experiment running PyEGo with different strategies
Results of experiments are available in exp-gist.