Themis is a collection of real-world, reproducible crash bugs (collected from
open-source Android apps) and a unified, extensible infrastructure
for benchmarking automated GUI testing for Android and beyond.
@inproceedings{themis,
author = {Ting Su and
Jue Wang and
Zhendong Su},
title = {Benchmarking Automated GUI Testing for Android against Real-World Bugs},
booktitle = {Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium
on the Foundations of Software Engineering (ESEC/FSE)},
pages = {119--130},
year = {2021},
doi = {10.1145/3468264.3468620}
}
News
- We are now actively developing an automated analysis tool to understand the effectiveness of GUI testing tools. We will release the tool soon!
- Themis is being used by ByteDance’s FastBot to evaluate and improve its bug-finding abilities! (Nov. 2021)
- Themis’s paper was accepted to ESEC/FSE’21! (Aug. 2021)
- We released Themis’s dataset and infrastructure (Mar. 2021)
1. Contents of Themis
Themis’s bug dataset
Themis now contains 52 reproducible crash bugs. All these bugs are labeled by
the app developers as “critical bugs” (i.e., important bugs): they affected
major app functionalities and a large percentage of app users.
For each bug, we provide:
The original bug report
An executable APK (Jacoco-instrumented for coverage collection)
A bug-reproducing script and a video
The stack trace of the bug
The app source code w.r.t. each bug
Metadata for supporting evaluation (e.g., app login scripts and configuration files used by Themis for code coverage computation)
Themis’s infrastructure
Themis contains a unified, extensible infrastructure for benchmarking automated GUI testing
for Android. Any testing tool can be easily integrated into this infrastructure and
deployed on a given machine with a single command line.
usage: themis.py [-h] [--avd AVD_NAME] [--apk APK] [-n NUMBER_OF_DEVICES] [--apk-list APK_LIST] -o O [--time TIME] [--repeat REPEAT] [--max-emu MAX_EMU] [--no-headless] [--login LOGIN_SCRIPT]
[--wait IDLE_TIME] [--monkey] [--ape] [--timemachine] [--combo] [--combo-login] [--humanoid] [--stoat] [--sapienz] [--qtesting] [--weighted] [--offset OFFSET]
optional arguments:
-h, --help show this help message and exit
--avd AVD_NAME the device name
--apk APK
-n NUMBER_OF_DEVICES number of emulators created for testing, default: 1
--apk-list APK_LIST list of apks under test
-o O output dir
--time TIME the fuzzing time in hours (e.g., 6h), minutes (e.g., 6m), or seconds (e.g., 6s), default: 6h
--repeat REPEAT the repeated number of runs, default: 1
--max-emu MAX_EMU the maximum allowed number of emulators
--no-headless show gui
--login LOGIN_SCRIPT the script for app login
--wait IDLE_TIME the idle time to wait before starting the fuzzing
--monkey
--ape
--timemachine
--combo
--combo-login
--humanoid
--stoat
--sapienz
--qtesting
--offset OFFSET device offset number w.r.t emulator-5554
Implementation details
The directory structure of Themis is as follows:
Themis
|
|--- esecfse2021-paper1009.pdf the accepted paper of Themis
|
|--- scripts: scripts for running testing tools and analyzing testing results.
|
|--- themis.py: the main script for deploying themis.
|
|--- check_crash.py: the script to check whether a tool finds the bugs.
|
|--- compute_coverage.py: the script to compute the code coverage achieved by a tool.
|
|--- compare_bug_triggering_time.py: the script to perform pairwise comparisons of bug-triggering times between different tools.
|
|--- run_monkey.sh the internal shell scripts to invoke Monkey, Ape, Humanoid, Q-testing, TimeMachine and ComboDroid, respectively
|--- run_ape.sh
|--- run_humanoid.sh
|--- run_qtesting.sh
|--- run_timemachine.sh
|--- run_combodroid.sh
|
|--- tools: the supported automated testing tools.
|
|--- Humanoid the tool Humanoid
|
|--- TimeMachine the tool TimeMachine
|
|--- Q-testing the tool Q-testing
|
|--- Ape the tool Ape
|
|--- ComboDroid the tool ComboDroid
|
|--- Monkey the tool Monkey
|
|--- app_1: The bugs collected from app_1.
|
|--- app_2: The bugs collected from app_2.
|
|--- ...
|
|--- app_N The bugs collected from app_N.
2. Instructions for Using Themis
The instructions in this section were used for artifact evaluation.
For the artifact evaluation, we ran Themis in a virtual machine with all the required software already installed and prepared.
You can follow the instructions in this section to get familiar with Themis.
You can download the VM image Themis.ova (15GB) from this link on Google Drive.
To use Themis in your own research, we recommend building and running Themis on a native machine (see 3. Instructions for Reusing Themis).
Prerequisite
You need to enable the virtualization technology in your computer’s BIOS (see this link for how to enable the virtualization technology). Most computers by default already have this virtualization option turned on.
Your computer needs at least 8 CPU cores (4 cores may also work), 16GB of memory, and at least 40GB of storage.
We built our artifact by using VirtualBox v6.1.20. Please install VirtualBox based on your OS type. After installing VirtualBox, you may need to reboot the computer.
Setup Virtual Machine
Open VirtualBox, click “File”, click “Import Appliance”, then import the file named Themis.ova (this step may take about five to ten minutes to complete).
After the import is completed, you should see “vm” as one of the listed VMs in your VirtualBox.
Click “Settings”, click “System”, click “Processor”, allocate 4-8 CPU cores (8 cores preferred), and check “Enable Nested VT-x/AMD-V”. Click “Memory”, and set the memory size to at least 8GB (16GB preferred). Overall, you can allocate more memory and CPU cores if your system permits, to ensure smooth evaluation.
Run the virtual machine. The username is themis and the password is themis-benchmark.
If you cannot run the VM with the “Nested VT-x/AMD-V” option enabled in VirtualBox, check whether the Hyper-V option is enabled; you can disable the Hyper-V option (see this link for more information).
Getting Started
Take the quick test to get familiar with Themis and validate whether it is ready.
Step 1. open a terminal and switch to Themis’s scripts directory
cd the-themis-benchmark/scripts
Step 2. run Monkey on one target bug for 10 minutes
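The concrete command is missing here; by analogy with the Q-testing command shown later in this README (only the output directory and tool flag differ), it should be:

```shell
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../monkey-results/ --monkey
```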
--no-headless shows the emulator GUI.
--avd Android7.1 specifies the name of the emulator (which has already been created in the VM).
--apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk specifies the target bug, which is ActivityDiary’s bug #118 in v1.1.8.
--time 10m allocates 10 minutes for the testing tool to find the bug
-o ../monkey-results/ specifies the output directory of testing results
--monkey specifies the testing tool
Expected results: you should see that (1) an Android emulator is started, (2) the app ActivityDiary is installed and started, (3) Monkey starts to test the app, (4) progress messages are printed on the terminal during testing, and (5) the emulator is automatically closed at the end.
Step 3. inspect the output files
If Step 2 succeeds, you can see the outputs under ../monkey-results/ (i.e., /home/themis/the-themis-benchmark/monkey-results).
$ cd ../monkey-results/
$ ls
ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27/ # the output directory
$ cd ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27/
$ ls
coverage_1.ec # the coverage data file (used for computing coverage)
coverage_2.ec
install.log # the log of app installation
logcat.log # the system log of the emulator (this file contains the crash stack traces if the target bug was triggered)
monkey.log # the log of Monkey (including the events that Monkey generates)
monkey_testing_time_on_emulator.txt # the first line is the starting testing time, and the second line is the ending testing time
How to validate: If you can see all these files and these files are non-empty (use ls -l to check), the quick test succeeds. Note that the number of coverage data files (e.g., coverage_1.ec) varies according to the testing time. In practice, Themis notifies an app to dump coverage data every five minutes.
Please note that the output files of different testing tools may vary (but all tools produce similar types of output files to Monkey’s).
Detailed Instructions
I. The supported tools
Themis now supports and maintains 6 state-of-the-art fully-automated testing tools for Android (see below). These tools can be cloned from Themis’s repositories and are put under the-themis-benchmark/tools.
Monkey: distributed with the Android SDK
Ape: https://github.com/the-themis-benchmarks/ape-bin
ComboDroid: https://github.com/the-themis-benchmarks/combodroid
Humanoid: https://github.com/the-themis-benchmarks/Humanoid, which depends on droidbot (https://github.com/the-themis-benchmarks/droidbot/tree/themis-branch)
Q-testing: https://github.com/the-themis-benchmarks/Q-testing
TimeMachine: https://github.com/the-themis-benchmarks/TimeMachine
Note that these tools are modified/enhanced versions of their originals because we coordinated with the tools’ authors to ensure a correct and rigorous setup (e.g., reporting encountered tool bugs to the authors for fixing). We made our best effort to minimize bias and ensure that each tool is at “its best state” in bug finding (see Section 3.2 of Themis’s paper).
Specifically, we track the tool modifications to facilitate review and validation. Only minor effort was needed to integrate Monkey, Ape, Humanoid and Q-testing into Themis. ComboDroid was modified by its author and integrated into Themis, while TimeMachine was modified by us and later verified by its authors (view this commit to check all the modifications/enhancements made to TimeMachine).
II. Running the supported tools
In the following, we take Monkey as the tool and ActivityDiary-1.1.8-debug-#118.apk as the target bug to illustrate
how to replicate the whole evaluation, and how to validate the artifact if you do not have enough resources/time.
[Replicate the whole evaluation]:
Step 1. run Monkey on ActivityDiary-1.1.8-debug-#118.apk for 6 hours and repeat this process for 5 runs.
This step will take 30 hours to finish because of the 5 runs of testing on one emulator. We do not recommend running
more than one emulator in the VM because of the limited memory and performance. If you do not have enough resources/time, see the instructions below.
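The command itself is missing here; a reconstruction from the flag descriptions that follow (each flag is explained below):

```shell
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk -n 1 --repeat 5 --time 6h -o ../monkey-results/ --monkey
```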
--no-headless shows the emulator GUI.
--avd Android7.1 specifies the name of the emulator (which has already been created in the VM).
--apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk specifies the target bug, which is ActivityDiary’s bug #118 in v1.1.8.
-n 1 denotes one emulator instance will be created (in practice at most 16 emulators are allowed to run in parallel on one native machine)
--repeat 5 denotes the testing process will be repeated for 5 runs (these 5 runs will be distributed to the available emulator instances)
--time 6h allocates 6 hours for the testing tool to find the bug
-o ../monkey-results/ specifies the output directory of testing results
--monkey specifies the testing tool
[Validate the artifact if you do not have enough resources/time: run 1-2 tools on 1-2 bugs at your will with limited testing time]:
For example, in Step 1, we recommend shortening the testing time (e.g., use --time 1h for 1 hour or --time 30m for 30 minutes).
Thus, you can use the following command (this step will take 2 hours to finish because of the 2 runs of testing on one emulator).
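Concretely, a sketch of the shortened command (adjust --time and --repeat to your budget):

```shell
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk -n 1 --repeat 2 --time 1h -o ../monkey-results/ --monkey
```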
Step 2. When the testing terminates, you can inspect whether the target bug was found in each run, how long it took to find the bug, and how many times the bug was found, by using the command below.
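A plausible invocation, assuming check_crash.py accepts the same -o and tool flags as compute_coverage.py (run python3 check_crash.py -h to confirm the exact options):

```shell
python3 check_crash.py -o ../monkey-results/ --monkey --app ActivityDiary --id \#118 --simple
```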
--app ActivityDiary specifies the target app (You can omit this option to check all the tested apps and their bugs).
--id \#118 specifies the target bug id (You can omit this option to check all the target bugs of app ActivityDiary).
--simple outputs the checking result to the terminal for quick check (You can substitute --simple with --csv FILE_PATH to output the checking results into a CSV file).
Use -h to see the detailed list of command options.
An example output could be (in this case, the target bug, ActivityDiary’s #118, was not found by Monkey in any of the five runs):
Sample output:
ActivityDiary
=========
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5554.Android7.1#2020-06-24-20:39:27)
[ActivityDiary, #118] testing time: 2020-06-24-20:39:33
the start testing time is: 2020-06-24-20:39:33
the start testing time (parsed) is: 2020-06-24 20:39:33
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5556.Android7.1#2020-06-24-20:39:32)
[ActivityDiary, #118] testing time: 2020-06-24-20:39:36
the start testing time is: 2020-06-24-20:39:36
the start testing time (parsed) is: 2020-06-24 20:39:36
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5558.Android7.1#2020-06-24-20:39:37)
[ActivityDiary, #118] testing time: 2020-06-24-20:39:43
the start testing time is: 2020-06-24-20:39:43
the start testing time (parsed) is: 2020-06-24 20:39:43
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5560.Android7.1#2020-06-24-20:39:42)
[ActivityDiary, #118] testing time: 2020-06-24-20:39:47
the start testing time is: 2020-06-24-20:39:47
the start testing time (parsed) is: 2020-06-24 20:39:47
[ActivityDiary, #118] scanning (ActivityDiary-1.1.8-debug-#118.apk.monkey.result.emulator-5562.Android7.1#2020-06-24-20:39:47)
[ActivityDiary, #118] testing time: 2020-06-24-20:39:52
the start testing time is: 2020-06-24-20:39:52
the start testing time (parsed) is: 2020-06-24 20:39:52
Another example output could be (in this case, the target bug, AnkiDroid’s #4451, was found once after running Monkey for 55 minutes in one run):
Sample output:
AnkiDroid
=========
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5554.Android7.1#2020-06-26-00:59:31)
[AnkiDroid, #4451] testing time: 2020-06-26-00:59:32
[AnkiDroid, #4451] testing time: 2020-06-26-06:59:32
the start testing time is: 2020-06-26-00:59:32
the start testing time (parsed) is: 2020-06-26 00:59:32
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5576.Android7.1#2020-06-26-12:04:55)
[AnkiDroid, #4451] testing time: 2020-06-26-12:04:57
[AnkiDroid, #4451] testing time: 2020-06-26-18:04:57
the start testing time is: 2020-06-26-12:04:57
the start testing time (parsed) is: 2020-06-26 12:04:57
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5574.Android7.1#2020-06-26-12:04:55)
[AnkiDroid, #4451] testing time: 2020-06-26-12:04:56
[AnkiDroid, #4451] testing time: 2020-06-26-18:04:56
the start testing time is: 2020-06-26-12:04:56
the start testing time (parsed) is: 2020-06-26 12:04:56
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5558.Android7.1#2020-06-26-00:59:38)
[AnkiDroid, #4451] testing time: 2020-06-26-00:59:39
[AnkiDroid, #4451] testing time: 2020-06-26-06:59:40
the start testing time is: 2020-06-26-00:59:39
the start testing time (parsed) is: 2020-06-26 00:59:39
[AnkiDroid, #4451] scanning (AnkiDroid-debug-2.7beta1-#4451.apk.monkey.result.emulator-5556.Android7.1#2020-06-26-00:59:33)
[AnkiDroid, #4451] testing time: 2020-06-26-00:59:34
[AnkiDroid, #4451] testing time: 2020-06-26-06:59:35
the start testing time is: 2020-06-26-00:59:34
the start testing time (parsed) is: 2020-06-26 00:59:34
[AnkiDroid, #4451] the crash was triggered (1) times
[AnkiDroid, #4451] the time duration: ['55'] (mins)
Notes
(1) You can substitute --monkey with --ape or --combo in the command line in Step 1 to directly run the corresponding tool. You may need to change the output directory -o ../monkey-results/ to a distinct directory, e.g., -o ../ape-results. You can follow similar steps to those described in Step 2 to inspect whether the target bug was found and the related info.
(2) For Humanoid, before running, you need to set up Humanoid’s specific virtual Python environment. Open a terminal, and run:
cd /home/themis/the-themis-benchmark/tools/Humanoid-tool
source venv/bin/activate # Humanoid depends on tensorflow 1.12, which requires a specific Python version
cd Humanoid
python3 agent.py -c config.json # start the server of Humanoid
Open a new terminal and run Humanoid (which internally runs droidbot) on the emulator Android7.1_Humanoid (which has the specific screen size Humanoid requires):
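The concrete command is missing here; by analogy with the Monkey/Q-testing commands, it should be of this shape (the AVD name comes from the text above, the output directory is an assumption):

```shell
python3 themis.py --no-headless --avd Android7.1_Humanoid --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../humanoid-results/ --humanoid
```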
Remember to execute deactivate when you finish running, to exit the specific virtual Python environment.
(3) For Q-testing, before running, you need to set up Q-testing’s specific virtual Python environment. Open a terminal, and run:
cd /home/themis/the-themis-benchmark/tools/Q-testing
source venv/bin/activate # Q-testing depends on Python 2.7
And then, run the command line:
cd /home/themis/the-themis-benchmark/scripts/
python3 themis.py --no-headless --avd Android7.1 --apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk --time 10m -o ../qtesting-results/ --qtesting
Remember to execute deactivate when you finish running, to exit the specific virtual Python environment.
(4) For TimeMachine, we cannot build TimeMachine within this VM because TimeMachine tests apps by using Docker, on top of which it runs another layer of VirtualBox. Thus, we strongly recommend building TimeMachine on a native machine if you want to evaluate it (see the instructions in its repo: https://github.com/the-themis-benchmarks/TimeMachine).
(5) If the app under test requires user login (see the table of the bug dataset), you should specify the login script. Themis will call the login script before testing. For example, if we run Monkey on ../nextcloud/nextcloud-#5173.apk, which requires user login, the command line should be:
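The command is missing here; a reconstruction (the apk and login-script paths come from the surrounding text, the rest mirrors the earlier Monkey commands):

```shell
python3 themis.py --no-headless --avd Android7.1 --apk ../nextcloud/nextcloud-#5173.apk --login ../nextcloud/login-#5173.py --time 6h -o ../monkey-results/ --monkey
```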
--login ../nextcloud/login-#5173.py specifies the login script (which will be executed before GUI testing)
In practice, we use the emulator snapshot to save the app login state directly.
3. Instructions for Reusing Themis
Build and Use Themis from Scratch
In practice, we strongly recommend setting up our artifact on a local native machine or remote server rather than a virtual machine to ensure (1) optimal testing performance and (2) evaluation efficiency. Thus, we provide instructions to set up Themis from scratch.
Prerequisite
Ubuntu 18.04/20.04
Python 3
Android environment (Android 7.1 or above)
Docker (needed by TimeMachine)
Steps
create an Android emulator before running Themis (see this link for creating an emulator using avdmanager).
An example: create an Android emulator Android7.1 with SDK version 7.1 (API level 25), X86 ABI image and Google APIs:
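For example, with the standard SDK command-line tools (the package id assumes the stock naming scheme for API level 25 images):

```shell
sdkmanager "system-images;android-25;google_apis;x86"
echo no | avdmanager create avd -n Android7.1 -k "system-images;android-25;google_apis;x86"
```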
(optional) modify the emulator configuration to ensure optimal testing performance of testing tools:
In our evaluation, we set up an emulator with 2GB RAM, a 1GB SD card, 1GB internal storage and a 256MB heap size (the file to modify is usually ~/.android/avd/Android7.1.avd/config.ini)
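The corresponding config.ini entries would look roughly like this (key names from the stock emulator; verify against your emulator version):

```ini
hw.ramSize=2048
sdcard.size=1024M
disk.dataPartition.size=1024M
vm.heapSize=256
```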
install uiautomator2, which is used for executing login scripts
pip3 install --upgrade --pre uiautomator2
If you run Themis on a remote server, omit the option --no-headless (which shows the emulator GUI) so that the emulator runs headless.
Install all the necessary dependencies required by the respective testing tools. Please see the README.md of each tool in its Themis repository.
We provide detailed building instructions.
Your contributions are welcome!
Extend Themis for our research community
1. Add new crash bugs into Themis
Taking ActivityDiary-1.1.8-debug-#118.apk as an example, the basic steps to add such a bug into Themis’s dataset include:
build the buggy app version into an executable apk file (i.e., ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk, where ActivityDiary is the app name, 1.1.8 is the code version, and #118 is the original issue id)
reproduce the bug and record the stack trace (i.e., ActivityDiary/crash_stack_#118.txt)
write a bug-triggering script in uiautomator2 and record a bug-triggering video (i.e., ActivityDiary/script-#118.py and ActivityDiary/video-#118.mp4)
add a JSON file to facilitate coverage computation at runtime (this step is only required by TimeMachine) which describes its class files and source files (i.e., ActivityDiary/class_files.json)
The basic steps to add such a bug into Themis’s infrastructure include:
In scripts/check_crash.py, one should add the app name into the list ALL_APPS and the crash signature (i.e., the crash type info and the partial crash trace related to the app itself) into the dict app_crash_data.
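The registration step above can be sketched as follows. ALL_APPS and app_crash_data are the names used in scripts/check_crash.py; the entry layout shown here (a list of pairs of exception type and partial stack-trace string) is illustrative, so follow the structure of the existing entries in the script:

```python
# Names mirror those in scripts/check_crash.py; the entry layout is an assumption.
ALL_APPS = ["ActivityDiary", "AnkiDroid"]   # existing app names (abridged)
app_crash_data = {}

# Register a hypothetical new app "MyApp" and its bug's crash signature:
ALL_APPS.append("MyApp")
app_crash_data["MyApp"] = [
    ("java.lang.IllegalStateException",     # crash type from the recorded stack trace
     "com.example.myapp.ui.MainActivity"),  # frame identifying the app's own code
]
```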
In the future, we plan to (1) add the crash bugs from other existing benchmarks, and (2) add crash bugs with different levels of severity rather than only critical ones.
2. Add new testing tools into Themis
Taking Monkey as an example, the basic steps to add a new tool into Themis’s infrastructure include:
add an internal shell script to invoke the new tool (see scripts/run_monkey.sh) which defines (1) the concrete command line of invoking the tool, (2) the outputs of the tool, and (3) the tool-specific configurations before running
add the call of this shell script in scripts/themis.py (see function run_monkey)
add the code of parsing the outputs of the new tool in scripts/check_crash.py.
In fact, we have already successfully integrated FastBot, an industrial testing tool developed by ByteDance, into Themis. See the script for supporting FastBot.
3. Optimize and enhance existing supported tools
Sections 4.2 and 4.3 of Themis’s paper point out many optimization opportunities and future research directions for improving existing testing tools. By using Themis,
Tool authors or other researchers can debug/validate tool improvements and evaluate/compare new testing tools
We can contribute new enhancement features to the original tools via pull requests, because Themis forked the testing tools from their original repositories.
4. Computing code coverage
usage: compute_coverage.py [-h] -o O [-v] [--monkey] [--ape] [--timemachine] [--combo] [--humandroid] [--qtesting] [--stoat] [--app APP_NAME] [--id ISSUE_ID] [--acc_csv ACC_CSV] [--single_csv SINGLE_CSV]
[--average_csv AVERAGE_CSV]
optional arguments:
-h, --help show this help message and exit
-o O the output directory of testing results
-v
--monkey
--ape
--timemachine
--combo
--humandroid
--qtesting
--app APP_NAME
--id ISSUE_ID
--acc_csv ACC_CSV compute the accumulative coverage of all runs
--single_csv SINGLE_CSV
compute the coverage of single runs
--average_csv AVERAGE_CSV
compute the average coverage of all runs
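For example, to compute the accumulative coverage of the Monkey runs from Step 1 (all flags as listed in the usage above; the CSV file name is an arbitrary choice):

```shell
python3 compute_coverage.py -o ../monkey-results/ --monkey --app ActivityDiary --id \#118 --acc_csv coverage.csv
```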
By leveraging the results, one can inspect the detailed coverage report generated by Jacoco.
5. Other research purposes
Themis can also benefit other research (e.g., fault localization, program repair, etc.)
We welcome any feedback or questions on Themis. Feel free to share your ideas, open issues or pull requests. We are actively maintaining Themis
to benefit our community.
The Themis Benchmark
Themis is a collection of real-world, reproducible crash bugs (collected from open-source Android apps) and a unified, extensible infrastructure for benchmarking automated GUI testing for Android and beyond.
Publication (Presentation Video)
[1] “Benchmarking Automated GUI Testing for Android against Real-World Bugs“. Ting Su, Jue Wang, Zhendong Su. 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021)
News
- We are now actively developping an automated analysis tool to understand the effectiveness of GUI testing tools. We will release the tool soon!
- Themis is using by ByteDance’s FastBot to evaluate and improve its bug finding abilities! (Nov. 2021)
- Themis’s paper was accepted to ESEC/FSE’21! (Aug. 2021)
- We released Themis’s dataset and infrastructure (Mar. 2021)
1. Contents of Themis
Themis’s bug dataset
Themis now contains 52 reproducible crash bugs. All these bugs are labeled by the app developers as “critical bugs” (i.e., important bugs), which affected the major app functionalities and the larger percentage of app users.
For each bug, we provide:
The original bug report of the bug
An executable APK (Jacoco-instrumented for coverage collection),
A bug-reproducing script and a video
The stack trace of the bug
Metadata for supporting evaluation (e.g., app login scripts and configuration files used by Themis for code coverage computation)
The app source code w.r.t each bug
List of crash bugs
Themis’s Infrastructure
Themis contains a unified, extensible infrastructure for benchmarking automated GUI testing for Android. Any testing tool can be easily integrated into this infrastructure and deployed on a given machine with one line of command.
List of Supported Tools
The command line for deployment:
Implementation details
The directory structure of Themis is as follows:
2. Instructions for Using Themis
The instructions in this section was used for artifact evaluation. In the artifact evaluation, we run Themis in Virtual Machine with all the required stuffs already installed and prepared. You can follow the instructions in this section to get familar with Themis. You can download the VM image
Themis.ova
(15GB) from this link on Google Drive.For using Themis for your own research, we recommend you to build and run Themis on a native machine (see 3. Instructions for Reusing Themis).
Prerequisite
Setup Virtual Machine
Themis.ova
(this step may take about five to ten minutes to complete).themis
and the password isthemis-benchmark
.Getting Started
Take the quick test to get familar with Themis and validate whether it is ready.
Step 1. open a terminal and switch to Themis’s scripts directory
Step 2. run Monkey on one target bug for 10 minutes
Here,
--no-headless
shows the emulator GUI.--avd Android7.1
specifies the name of the emulator (which has already been created in the VM).--apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
specifies the target bug which isActivityDiary
‘s bug#118
inv1.1.8
.--time 10m
allocates 10 minutes for the testing tool to find the bug-o ../monkey-results/
specifies the output directory of testing results--monkey
specifies the testing toolExpected results: you should see (1) an Android emulator is started, (2) the app
ActivityDiary
is installed and started, (3) Monkey is started to test the app, (4) the following sample texts are outputted on the terminal during testing, and (5) the emulator is automatically closed at the end.**click to see the sample output on the terminal of a successful run.**
Step 3. inspect the output files
If Step 2 succeeds, you can see the outputs under
../monkey-results/
(i.e.,/home/themis/the-themis-benchmark/monkey-results
).How to validate: If you can see all these files and these files are non-empty (use
ls -l
to check), the quick test succeeds. Note that the number of coverage data files (e.g.,coverage_1.ec
) varies according to the testing time. In practice, Themis notifies an app to dump coverage data every five minutes. Please note that the outuput files of different testing tools may vary (but all the other tools have these similar types of output files likeMonkey
).Detailed Instructions
I. The supported tools
Themis now supports and maintains 6 state-of-the-art fully-automated testing tools for Android (see below). These tools can be cloned from Themis’s repositories and are put under
the-themis-benchmark/tools
.Monkey
: distributed with Android SDKsApe
: https://github.com/the-themis-benchmarks/ape-bincombodroid
: https://github.com/the-themis-benchmarks/combodroidHumanoid
: https://github.com/the-themis-benchmarks/Humanoid, which depends ondroidbot
(https://github.com/the-themis-benchmarks/droidbot/tree/themis-branch)Q-testing
: https://github.com/the-themis-benchmarks/Q-testingTimeMachine
: https://github.com/the-themis-benchmarks/TimeMachineNote that these tools are the modified/enhanced versions of their originals because we coordinated with the authors of these tools to assure correct and rigorous setup (e.g., report the encountered tool bugs to the authors for fixing). We tried our best efforts to minimize the bias and ensure that each tool is at “its best state” in bug finding (see Section 3.2 in the Themis’s paper).
Specifically, we track the tool modifications to facilitate review and validation. We spent slight efforts to integrate
Monkey
,Ape
,Humanoid
andQ-testing
into Themis.Combodroid
was modifled by its author and intergrated into Themis, whileTimeMachine
was modified by us and later verified by its authors (view this commit to check all the modifications/enhancements made inTimeMachine
).II. Running the supported tools
** In the following, we take
Monkey
as a tool andActivityDiary-1.1.8-debug-#118.apk
as a target bug to illustrate how to replicate the whole evaluation, and how to validate the artifact if you do not have enough resources/time**[Replicate the whole evaluation]:
Step 1. run
Monkey
onActivityDiary-1.1.8-debug-#118.apk
for 6 hours and repeat this process for 5 runs. This step will take 30 hours to finish because of 5 runs of testing on one emulator. We do not recommend to run more than one emulators in the VM because of the limited memory and performance. If you do not have enough resource/time, see the instruction below.Here,
--no-headless
shows the emulator GUI--avd Android7.1
specifies the name of the emulator (which has already been created in the VM).--apk ../ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk
specifies the target bug which isActivityDiary
‘s bug#118
inv1.1.8
.-n 1
denotes one emulator instance will be created (in practice at most 16 emulators are allowed to run in parallel on one native machine)--repeat 5
denotes the testing process will be repeated for 5 runs (these 5 runs will be distributed to the available emulator instances)--time 6h
allocates 6 hours for the testing tool to find the bug-o ../monkey-results/
specifies the output directory of testing results--monkey
specifies the testing tool[Validate the artifact if you do not have enough resources/time: run 1-2 tools on 1-2 bugs at your will with limited testing time]:
For example, in Step 1, we recommend you to shorten the testing time (e.g., use
--time 1h
for 1 hour or--time 30m
for 30 minutes). Thus, you can use the following command (this step will take 2 hours to finish because of 2 runs of testing on one emulator).Step 2. When the testing terminates, you can inspect whether the target bug was found or not in each run, how long does it take to find the bug, and how many times the bug was found by using the command below.
Here,
--app ActivityDiary
specifies the target app (You can omit this option to check all the tested apps and their bugs).--id \#118
specifies the target bug id (You can omit this option to check all the target bugs of appActivityDiary
).--simple
outputs the checking result to the terminal for quick check (You can substitute--simple
with--csv FILE_PATH
to output the checking results into a CSV file).-h
to see the detailed list of command options.An example output could be (In this case, the target bug,
ActivityDiary
‘s#118
, was not found by Moneky in all the five runs):**click to see the sample output.**
Another example output could be (In this case, the target bug,
AnkiDroid
‘s#4451
, was found by 1 time after running Monkey for 55 minutes in one run):**click to see the sample output.**
Notes

(1) You can substitute `--monkey` with `--ape` or `--combo` in the command line of Step 1 to directly run the corresponding tool. You may need to change the output directory `-o ../monkey-results/` to a distinct directory, e.g., `-o ../ape-results`. You can then follow the steps described in Step 2 to inspect whether the target bug was found and the related info.

(2) For `humanoid`, before running, you need to set up Humanoid's specific virtual Python environment. Open a terminal and run the setup commands. Then open a new terminal and run `Humanoid` (which internally runs `droidbot`) on the emulator `Android7.1_Humanoid` (with a specific screen size). Remember to execute `deactivate` when you finish, to exit the virtual Python environment.

(3) For `Q-testing`, similarly, before running you need to set up Q-testing's specific virtual Python environment. Open a terminal, run the setup commands, and then run the tool's command line. Again, execute `deactivate` when you finish, to exit the virtual Python environment.

(4) For `TimeMachine`, we cannot build `TimeMachine` within this VM because `TimeMachine` tests apps via Docker, inside which it runs another layer of VirtualBox. Thus, we strongly recommend building `TimeMachine` on a native machine if you want to evaluate it (see the instructions in its repo: https://github.com/the-themis-benchmarks/TimeMachine).

(5) If the app under test requires user login (see the table of the bug dataset), you should specify the login script; Themis will call the login script before testing. For example, if you run `Monkey` on `../nextcloud/nextcloud-#5173.apk`, which requires user login, add `--login ../nextcloud/login-#5173.py` to the command line to specify the login script (which will be executed before GUI testing).

3. Instructions for Reusing Themis
Build and Use Themis from Scratch

In practice, we strongly recommend setting up our artifact on local native machines or remote servers rather than on virtual machines, to ensure (1) optimal testing performance and (2) evaluation efficiency. Thus, we provide instructions to set up Themis from scratch.

Prerequisite

(needed by `TimeMachine`)

Steps

Create an Android emulator `Android7.1` with SDK version 7.1 (API level 25), an X86 ABI image and Google APIs. In our evaluation, we set up the emulator with 2GB RAM, 1GB SD card, 1GB internal storage and 256MB heap size (the file to modify is usually `~/.android/avd/Android7.1.avd/config.ini`). Use `--no-headless` to disable headless mode (i.e., show the emulator GUI).

We welcome your contributions!
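For reference, the emulator settings used in our evaluation (2GB RAM, 1GB SD card, 1GB internal storage, 256MB heap) map to `config.ini` entries roughly like the following; the key names come from the stock AVD configuration format, so double-check them against your SDK version:

```ini
hw.ramSize=2048
sdcard.size=1024M
disk.dataPartition.size=1024M
vm.heapSize=256
```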
Extend Themis for our research community
1. Add new crash bugs into Themis
Take `ActivityDiary-1.1.8-debug-#118.apk` as an example; the basic steps to add such a bug into Themis's dataset include:
- Add the app's APK (i.e., `ActivityDiary/ActivityDiary-1.1.8-debug-#118.apk`, where `ActivityDiary` is the app name, `1.1.8` is the code version, and `#118` is the original issue id)
- Add the crash stack trace of the bug (i.e., `ActivityDiary/crash_stack_#118.txt`)
- Write a bug-reproducing script with `uiautomator2` and record a bug-triggering video (i.e., `ActivityDiary/script-#118.py` and `ActivityDiary/video-#118.mp4`)
- Add a configuration file (used by `TimeMachine`) which describes the app's class files and source files (i.e., `ActivityDiary/class_files.json`)

The basic steps to add such a bug into Themis's infrastructure include:
- In `scripts/check_crash.py`, add the app name into the list `ALL_APPS` and the crash signature (i.e., the crash type info and the partial crash trace related to the app itself) into the dict `app_crash_data`.

In the future, we plan to (1) add the crash bugs from other existing benchmarks, and (2) add crash bugs with different levels of severity rather than only critical ones.
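Conceptually, the signature check amounts to matching a captured stack trace against the recorded crash type plus app-specific trace fragments. The sketch below follows that idea; the dict layout and the package name are illustrative, and the real structures in `scripts/check_crash.py` may differ:

```python
# Minimal sketch of signature-based crash matching, loosely modeled on the
# ALL_APPS / app_crash_data idea above; the actual structures in
# check_crash.py may differ.
ALL_APPS = ["ActivityDiary"]

app_crash_data = {
    "ActivityDiary": {
        "#118": {
            "exception": "java.lang.NullPointerException",
            # partial trace lines tied to the app's own code (package assumed)
            "trace_contains": ["de.rampro.activitydiary"],
        }
    }
}

def matches_signature(app: str, bug_id: str, logcat_trace: str) -> bool:
    """Return True if a captured stack trace matches the bug's signature."""
    sig = app_crash_data[app][bug_id]
    if sig["exception"] not in logcat_trace:
        return False
    return all(frag in logcat_trace for frag in sig["trace_contains"])

trace = (
    "java.lang.NullPointerException: Attempt to invoke virtual method\n"
    "    at de.rampro.activitydiary.ui.history.HistoryActivity.onCreate\n"
)
print(matches_signature("ActivityDiary", "#118", trace))
```

Matching on a partial, app-local trace (rather than the full stack) keeps the signature robust to framework-level frames that vary across Android versions.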
2. Add new testing tools into Themis
Take `Monkey` as an example; the basic steps to add a new tool into Themis's infrastructure include:
- Provide a tool-specific runner script (e.g., `scripts/run_monkey.sh`) which defines (1) the concrete command line for invoking the tool, (2) the outputs of the tool, and (3) the tool-specific configuration needed before running
- Register the tool in `scripts/themis.py` (see the function `run_monkey`)
- Hook the tool's results into `scripts/check_crash.py`.

In fact, we have already successfully integrated FastBot, an industrial testing tool developed by ByteDance, into Themis. See the script supporting FastBot.
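The registration step boils down to mapping a tool name to a command builder. The following is a hypothetical sketch of that pattern, not `themis.py`'s actual code: every function name, flag, and value below is a placeholder.

```python
# Hypothetical sketch of themis.py-style tool registration; none of these
# names or flag values are the actual implementation.
def run_monkey(package: str, output_dir: str, seconds: int) -> list[str]:
    # Monkey is driven via adb; throttle and event count are placeholders.
    return ["adb", "shell", "monkey", "-p", package,
            "--throttle", "200", "-v", "100000"]

def run_ape(package: str, output_dir: str, seconds: int) -> list[str]:
    # A tool wrapped by its own runner script, as described above.
    return ["bash", "scripts/run_ape.sh", package, output_dir, str(seconds)]

# Adding a new tool = one entry here plus its runner script.
TOOL_RUNNERS = {
    "monkey": run_monkey,
    "ape": run_ape,
}

cmd = TOOL_RUNNERS["monkey"]("com.example.app", "../monkey-results/", 3600)
print(" ".join(cmd))
```

Keeping each tool behind a uniform builder is what lets the whole benchmark be driven from one `themis.py` entry point.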
3. Optimize and enhance existing supported tools
In Themis’s paper, Sections 4.2 and 4.3 point out many optimization opportunities and future research directions for improving existing testing tools. By using Themis, one can implement such optimizations and measure their effect against real-world bugs.
4. Coverage profiling and analysis
Themis supports coverage profiling of the tested apps (the provided APKs are Jacoco-instrumented for coverage collection).
By leveraging the results, one can inspect the detailed coverage report generated by Jacoco.
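As an illustration of what the analysis step involves, the snippet below assembles a JaCoCo CLI `report` invocation from a `class_files.json`-style description. The JSON keys (`classfiles`, `sourcefiles`) and paths are assumptions about that file's layout, not its documented schema:

```python
# Hypothetical sketch: turn a class_files.json-style description into a
# `jacococli report` command. The dict keys and paths are assumed.
meta = {
    "classfiles": ["ActivityDiary/app/build/intermediates/classes/debug"],
    "sourcefiles": ["ActivityDiary/app/src/main/java"],
}

# jacococli's report mode takes .ec exec files plus class/source locations.
cmd = ["java", "-jar", "jacococli.jar", "report", "coverage.ec"]
for cf in meta["classfiles"]:
    cmd += ["--classfiles", cf]
for sf in meta["sourcefiles"]:
    cmd += ["--sourcefiles", sf]
cmd += ["--html", "coverage-report"]

print(" ".join(cmd))
```

The generated HTML report then gives per-class and per-line coverage for the testing run.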
5. Other research purposes
Themis can also benefit other research directions (e.g., fault localization and program repair).
Main maintainers/contributors
Ting Su, East China Normal University, China
Jue Wang, Nanjing University, China
Zhendong Su, ETH Zurich, Switzerland
We appreciate the contributions from:
Enze Ma, Beijing Forestry University, China
Weigang He, East China Normal University, China
Shan Huang, East China Normal University, China
We welcome any feedback or questions on Themis. Feel free to share your ideas, open issues or pull requests. We are actively maintaining Themis to benefit our community.