<div align=center>
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/hetu.png?raw=true" width="300" />
</div>

# HETU
[Documentation](https://hetu-doc.readthedocs.io) | [Examples](https://hetu-doc.readthedocs.io/en/latest/Overview/performance.html)
Hetu is a high-performance distributed deep learning system for training DL models with trillions of parameters, developed by <a href="http://net.pku.edu.cn/~cuibin/" target="_blank" rel="nofollow">DAIR Lab</a> at Peking University. It balances the high availability demanded in industry with innovation from academia, and has a number of advanced characteristics:
- Applicability. DL model definition with a standard dataflow graph; many basic CPU and GPU operators; efficient implementations of a wide range of DL models and at least 10 popular ML algorithms.

- Efficiency. Achieves at least 30% speedup compared to TensorFlow on DNN, CNN, and RNN benchmarks.

- Flexibility. Supports various parallel training protocols and distributed communication architectures, such as data/model/pipeline parallelism, and both parameter server & AllReduce.

- Scalability. Deploys on more than 100 computation nodes; trains giant models with trillions of parameters, e.g., on Criteo Kaggle and Open Graph Benchmark.

- Agility. Automatic ML pipeline: feature engineering, model selection, hyperparameter search.
We welcome everyone interested in machine learning or graph computing to contribute code, create issues, or open pull requests. Please refer to the [Contribution Guide](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/CONTRIBUTING.md) for more details.
## Installation
1. Clone the repository.
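The public clone URL below is taken from the repository page; the repo also carries a `.gitmodules` file, so a recursive clone may be needed to fetch the submodules (e.g., under `third_party/`):

```bash
# Clone URL as listed on the hosting page; --recursive pulls in the
# submodules referenced by .gitmodules.
git clone --recursive https://gitlink.org.cn/PKU-DAIR/Hetu.git
cd Hetu
```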
2. Prepare the environment. We use Anaconda to manage packages. The following command creates the conda environment to be used:
```bash
conda env create -f environment.yml
```
Please prepare the CUDA toolkit and cuDNN in advance.
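Before building the GPU version, it may help to verify that the toolkit is actually visible; a minimal sanity check, assuming a typical installation under /usr/local/cuda (adjust to your system):

```bash
# Both checks are illustrative; installation paths differ across systems.
nvcc --version                        # CUDA toolkit compiler on PATH?
ls /usr/local/cuda/include/cudnn*.h   # cuDNN headers installed?
```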
3. We use CMake to compile Hetu. Copy the example configuration for compilation with `cp cmake/config.example.cmake cmake/config.cmake`. You can modify the configuration file to enable/disable the compilation of each module. For advanced users (those not using the provided conda environment), the prerequisites for the different modules of Hetu are listed in the appendix.

```bash
# modify paths and configurations in cmake/config.cmake

# generate Makefile
mkdir build && cd build && cmake ..

# compile
# make all
make -j 8
# make hetu, version is specified in cmake/config.cmake
make hetu -j 8
# make allreduce module
make allreduce -j 8
# make ps module
make ps -j 8
# make geometric module
make geometric -j 8
# make hetu-cache module
make hetu_cache -j 8
```
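The authoritative list of the options mentioned in the first comment above lives in cmake/config.example.cmake. As a hedged illustration, the path variables named in the appendix below can be located and edited like this (the `set(...)` lines in the comment are placeholders, not the file's exact contents):

```bash
# List the tunable entries in your copy of the configuration:
grep -n "set(" cmake/config.cmake
# The Appendix refers to these variables; an edited file might contain
# lines of roughly this shape (all paths are placeholders):
#   set(MPI_HOME  /path/to/openmpi/build)
#   set(MKL_ROOT  /path/to/mkl/root)
#   set(MKL_BUILD /path/to/mkl/build)
#   set(ZMQ_ROOT  /path/to/zeromq/build)
```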
4. Prepare the environment for running. Edit the hetu.exp file and set the environment path for Python and the path to the mpirun executable if necessary (for advanced users not using the provided conda environment). Then execute `source hetu.exp`.
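What hetu.exp exports depends on your setup; a minimal sketch of the two paths step 4 refers to, with placeholder locations (the real file ships with the repository):

```bash
# Hypothetical hetu.exp contents -- illustrative only.
export PYTHONPATH=/path/to/Hetu/python:$PYTHONPATH   # make Hetu's Python package importable
export PATH=/path/to/openmpi/build/bin:$PATH         # directory containing mpirun
```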
## Usage
Train logistic regression on GPU:

```bash
bash examples/cnn/scripts/hetu_1gpu.sh logreg MNIST
```

Train a 3-layer MLP on GPU:

```bash
bash examples/cnn/scripts/hetu_1gpu.sh mlp CIFAR10
```

Train a 3-layer CNN on GPU:

```bash
bash examples/cnn/scripts/hetu_1gpu.sh cnn_3_layers MNIST
```

Train a 3-layer MLP with AllReduce on 8 GPUs (using mpirun):

```bash
bash examples/cnn/scripts/hetu_8gpu.sh mlp CIFAR10
```

Train a 3-layer MLP with PS on 1 server and 2 workers:

```bash
# the script launches the scheduler, the server, and two workers
bash examples/cnn/scripts/hetu_2gpu_ps.sh mlp CIFAR10
```
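All of these wrappers share a `<model> <dataset>` argument convention, so other bundled combinations can be launched the same way; whether a particular pair is supported is an assumption to verify against the scripts themselves:

```bash
# Same convention: <script> <model> <dataset>.
# The pair below is illustrative; check examples/cnn/scripts for what is supported.
bash examples/cnn/scripts/hetu_1gpu.sh mlp MNIST
```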
## More Examples
Please refer to the examples directory, which contains CNN, NLP, CTR, and GNN training scripts. For distributed training, please refer to the CTR and GNN tasks.
## Community
* Email: xupeng.miao@pku.edu.cn
* Slack: coming soon
* Hetu homepage: https://hetu-doc.readthedocs.io
* [Committers & Contributors](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/COMMITTERS.md)
* [Contributing to Hetu](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/CONTRIBUTING.md)
* [Development plan](https://hetu-doc.readthedocs.io/en/latest/plan.html)
## Enterprise Users

If you are an enterprise user and find Hetu useful in your work, please let us know; we would be glad to add your company logo here.

<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/tencent.png?raw=true" width = "200"/>
<br><br>
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/alibabacloud.png?raw=true" width = "200"/>
<br><br>
<img src="https://forgeplus.trustie.net/repo/PKU-DAIR/Hetu/raw/branch/master/img/kuaishou.png?raw=true" width = "200"/>
## License

The entire codebase is under this [license](https://forgeplus.trustie.net/projects/PKU-DAIR/Hetu/tree/master/LICENSE).
## Papers
 1. Xupeng Miao, Linxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang. [CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs](https://ieeexplore.ieee.org/document/9261124). TKDE 2021, ICDE 2021.
 2. Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui. [Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce](https://doi.org/10.1145/3448016.3452773). SIGMOD 2021.
 3. coming soon
## Acknowledgements

We learned and borrowed insights from a few open-source projects, including [TinyFlow](https://github.com/tqchen/tinyflow), [autodist](https://github.com/petuum/autodist), [tf.distribute](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/distribute), and [Angel](https://github.com/Angel-ML/angel).
## Appendix
The prerequisites for the different modules in Hetu are listed as follows:
 ```
 "*" means you should prepare it yourself, while the others support auto-download

 Hetu: OpenMP(*), CMake(*)
 Hetu (version mkl): MKL 1.6.1
 Hetu (version gpu): CUDA 10.1(*), CUDNN 7.5(*)
 Hetu (version all): both

 Hetu-AllReduce: MPI 3.1, NCCL 2.8(*), this module needs the GPU version

 Hetu-PS: Protobuf(*), ZeroMQ 4.3.2

 Hetu-Geometric: Pybind11(*), Metis(*)

 Hetu-Cache: Pybind11(*), this module needs the PS module

 ##################################################################
 Tips for preparing the prerequisites

 Preparing CUDA, CUDNN, NCCL (NCCL is already in the conda environment):
 1. download from https://developer.nvidia.com
 2. install
 3. modify paths in cmake/config.cmake if necessary

 Preparing OpenMP:
 You just need to ensure that your compiler supports OpenMP.

 Preparing CMake, Protobuf, Pybind11, Metis:
 Install with anaconda:
 conda install cmake=3.18 libprotobuf pybind11=2.6.0 metis

 Preparing OpenMPI (optional):
 install with anaconda: `conda install -c conda-forge openmpi=4.0.3`
 or
 1. download from https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz
 2. build openmpi with `./configure --prefix=/path/to/build && make -j8 && make install`
 3. set MPI_HOME to /path/to/build in cmake/config.cmake

 Preparing MKL (optional):
 install with anaconda: `conda install -c conda-forge onednn`
 or
 1. download from https://github.com/intel/mkl-dnn/archive/v1.6.1.tar.gz
 2. build mkl with `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8`
 3. set MKL_ROOT to /path/to/root and MKL_BUILD to /path/to/build in cmake/config.cmake

 Preparing ZeroMQ (optional):
 install with anaconda: `conda install -c anaconda zeromq=4.3.2`
 or
 1. download from https://github.com/zeromq/libzmq/releases/download/v4.3.2/zeromq-4.3.2.zip
 2. build zeromq with `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8`
 3. set ZMQ_ROOT to /path/to/build in cmake/config.cmake
 ```