Mar. 16th, 2021. Fluid v0.5.0 is RELEASED! It provides various new features, such as on-the-fly dataset scale-out/in, metadata backup, and support for FUSE global mode. Please check the CHANGELOG for details.
Nov. 6th, 2020. Fluid v0.4.0 is RELEASED! It provides various features and bug fixes, such as automatically prefetching a dataset before it is used. Please check the CHANGELOG for details.
Oct. 1st, 2020. Fluid v0.3.0 is RELEASED! It provides various features and bug fixes, such as data access acceleration for Persistent Volumes and hostPath mode in K8s. Please check the CHANGELOG for details.
What is Fluid?
Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications.
Features
Native Support for DataSet Abstraction
Provides the capabilities needed by data-intensive applications as natively supported functions, to achieve efficient data access and reduce the cost of multidimensional management.
Cloud Data Warming up and Accessing Acceleration
Fluid empowers distributed cache capacity (Alluxio inside) in Kubernetes with observability, portability, and horizontal scalability.
Co-Orchestration for Data and Application
During application scheduling and data placement in the cloud, Fluid takes both the application's characteristics and the data location into consideration to improve performance.
Support Multiple Namespaces Management
Users can create and manage datasets in multiple namespaces.
Support Heterogeneous Data Source Management
Unifies data access for OSS, HDFS, Ceph, and other underlying storage systems.
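To make the co-orchestration idea above concrete, here is a toy sketch of locality-aware placement. It is not Fluid's scheduler: the node dictionaries, the `cached` fractions, and the `pick_node` function are all hypothetical names invented for this illustration. It only shows the general principle of weighing an application's resource needs together with where its data is already cached.

```python
def pick_node(nodes, dataset, cpu_request):
    """Pick a node that fits the CPU request and caches the most of `dataset`.

    `nodes` is a list of dicts with hypothetical keys:
      name     - node name
      free_cpu - CPU cores still available on the node
      cached   - mapping of dataset name -> fraction of it cached locally
    Returns the chosen node name, or None if no node fits.
    """
    # Application characteristic: keep only nodes that fit the CPU request.
    candidates = [n for n in nodes if n["free_cpu"] >= cpu_request]
    if not candidates:
        return None
    # Data location: prefer the largest cached fraction, then more free CPU.
    best = max(
        candidates,
        key=lambda n: (n["cached"].get(dataset, 0.0), n["free_cpu"]),
    )
    return best["name"]


if __name__ == "__main__":
    nodes = [
        {"name": "node-a", "free_cpu": 2, "cached": {"imagenet": 0.9}},
        {"name": "node-b", "free_cpu": 8, "cached": {"imagenet": 0.4}},
        {"name": "node-c", "free_cpu": 8, "cached": {}},
    ]
    print(pick_node(nodes, "imagenet", cpu_request=4))
```

In this example node-a holds most of the data but cannot fit the request, so the sketch falls back to node-b, the best-cached node among those that can run the application.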
Key Concepts
Dataset: A set of logically related data that will be used by a computing engine, such as Spark for big data and TensorFlow for AI scenarios. Dataset management has multiple dimensions, such as security, version management and data acceleration. We hope to start with data acceleration and provide support for the management of datasets.
Runtime: The execution engine that provides dataset security, version management and data acceleration, and defines a series of life cycle interfaces that you can implement.
AlluxioRuntime: Derived from Alluxio, it is an engine which supports data management and caching of datasets. Fluid manages and schedules the Alluxio Runtime to achieve dataset visibility, elastic scaling, and data migration.
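As a rough illustration of how a Dataset and an AlluxioRuntime relate, the sketch below builds the two manifests as plain Python dictionaries and prints them as JSON. The field names (`data.fluid.io/v1alpha1`, `mounts`, `tieredstore`, and so on) follow the v1alpha1 samples as best recalled and should be checked against the CRDs installed in your cluster. The helper functions, the `demo` name, and the `https://example.com/data/` mount point are placeholders made up for this example.

```python
import json

# Assumption: API group/version as used by Fluid's v1alpha1 samples.
FLUID_API_VERSION = "data.fluid.io/v1alpha1"


def build_dataset(name, mount_point):
    """Return a Dataset manifest that points at one remote data source."""
    return {
        "apiVersion": FLUID_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def build_alluxio_runtime(name, replicas, cache_quota):
    """Return an AlluxioRuntime manifest with a single in-memory cache tier."""
    return {
        "apiVersion": FLUID_API_VERSION,
        "kind": "AlluxioRuntime",
        # Assumption: the runtime is paired with the Dataset of the same name.
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # number of cache workers
            "tieredstore": {
                "levels": [
                    {
                        "mediumtype": "MEM",
                        "path": "/dev/shm",
                        "quota": cache_quota,
                    }
                ]
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and URL; replace with a real data source.
    dataset = build_dataset("demo", "https://example.com/data/")
    runtime = build_alluxio_runtime("demo", replicas=2, cache_quota="2Gi")
    for manifest in (dataset, runtime):
        print(json.dumps(manifest, indent=2))
```

Each printed object can be saved to its own file and applied with kubectl, which accepts JSON manifests as well as YAML. Changing `replicas` is the knob that corresponds to scaling the cache out or in.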
Prerequisites
Kubernetes version > 1.14, with CSI support
Golang 1.12+
Helm 3
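The version requirements above can be checked with a small script. The sketch below is only an illustration: `parse_version` and `check_prerequisites` are hypothetical helpers, the version strings are passed in by hand rather than read from any tool, and "> 1.14" is interpreted here as a minor version strictly greater than 1.14. CSI support cannot be inferred from a version string and must be verified separately.

```python
import re


def parse_version(text):
    """Extract (major, minor) from strings such as 'v1.18.3' or 'go1.15.2'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no major.minor version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites(kubernetes, golang, helm):
    """Return a dict mapping each prerequisite to True if it is satisfied."""
    return {
        # Kubernetes version > 1.14 (interpreted as strictly newer minor).
        "kubernetes": parse_version(kubernetes) > (1, 14),
        # Golang 1.12+
        "golang": parse_version(golang) >= (1, 12),
        # Helm 3
        "helm": parse_version(helm)[0] == 3,
    }


if __name__ == "__main__":
    # Example version strings typed in by hand for illustration.
    results = check_prerequisites("v1.18.3", "go1.15.2", "v3.5.0")
    for name, ok in results.items():
        print(f"{name}: {'ok' if ok else 'NOT satisfied'}")
```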
Quick Start
You can follow our Get Started guide to quickly start a testing Kubernetes cluster.
Documentation
See our documentation at docs for more in-depth installation and production instructions.
Quick Demo
Demo 1: Accelerate Remote File Accessing with Fluid
Demo 2: Machine Learning with Fluid
Demo 3: Accelerate PVC with Fluid
Demo 4: Preload dataset with Fluid
Demo 5: On-the-fly dataset cache scaling
Roadmap
See ROADMAP.md for the roadmap details. It may be updated from time to time.
Community
Feel free to reach out if you have any questions. The maintainers of this project are reachable via:
DingTalk:
Contributing
Contributions are highly welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.
Adopters
If you are interested in Fluid and would like to share your experiences with others, you are warmly welcome to add your information to the ADOPTERS.md page. We will continuously discuss new requirements and feature designs with you in advance.
Open Source License
Fluid is under the Apache 2.0 license. See the LICENSE file for details. It is vendor-neutral.
Code of Conduct
Fluid adopts CNCF Code of Conduct.