hetu-core/hetu-docs/en/connector/datacenter.md

14 KiB

Data Center Connector

The Data Center connector allows querying a remote openLooKeng data center. This can be used to join data between different openLooKeng clusters from the local openLooKeng environment.

Local DC Connector Configuration

To configure the Data Center connector, create a catalog properties file in etc/catalog named, for example, <dc-name>.properties, to mount the Data Center connector as the <dc-name> catalog. Create the file with the following contents, replacing the connection properties as appropriate for your setup:

Basic Configuration

connector.name=dc
connection-url=http://example.net:8080
connection-user=<The User Name of Remote openLooKeng>
connection-password=<The Password of remote openLooKeng>
Property Name Description Required Default Value
connection-url URL of openLooKeng cluster to connect Yes
connection-user User Name to use when connecting to openLooKeng cluster No
connection-password Password to use when connecting to openLooKeng cluster No

Security Configuration

When the remote openLooKeng has enabled security authentication or TLS / SSL, the corresponding security configuration should be carried out on the <dc-name>.properties.

Kerberos Authentication Mode

Property Name Description Default Value
dc.kerberos.config.path Kerberos configuration file
dc.kerberos.credential.cachepath Kerberos credential cache
dc.kerberos.keytab.path Kerberos keytab file
dc.kerberos.principal The principal to use when authenticating to the openLooKeng coordinator
dc.kerberos.remote.service.name openLooKeng coordinator Kerberos service name. This parameter is required for Kerberos authentication
dc.kerberos.service.principal.pattern openLooKeng coordinator Kerberos service principal pattern. The default is ${SERVICE}@${HOST}.${SERVICE} is replaced with the value of dc.kerberos.remote.service.name and ${HOST} is replaced with the host name of the coordinator (after canonicalization if enabled) ${SERVICE}@${HOST}
dc.kerberos.use.canonical.hostname Use the canonical host name of the openLooKeng coordinator for the Kerberos service principal by first resolving the host name to an IP address and then doing a reverse DNS lookup for that IP address. false

Token Authentication Mode

Property Name Description Default Value
dc.accesstoken Access token for token based authentication

External Certificate Authentication Mode

Property Name Description Default Value
dc.extra.credentials Extra credentials for connecting to external services. The extra.credentials is a list of key-value pairs. Example: foo:bar;abc:xyz will create credentials abc=xyz and foo=bar

SSL/TLS Configuration

Property Name Description Default Value
dc.ssl Use HTTPS for connections false
dc.ssl.keystore.password The keystore password
dc.ssl.keystore.path The location of the Java keystore file that contains the certificate and private key to use for authentication
dc.ssl.truststore.password The truststore password
dc.ssl.truststore.path The location of the Java truststore file that will be used to validate HTTPS server certificates

Proxy Configuration

Property Name Description Default Value
dc.socksproxy SOCKS proxy host and port. Example: localhost:1080
dc.httpproxy HTTP proxy host and port. Example: localhost:8888

Performance Optimization Configuration

Property Name Description Default Value
dc.metadata.cache.enabled Metadata Cache Enabled true
dc.metadata.cache.maximum.size Metadata Cache Maximum Size 10000
dc.metadata.cache.ttl Metadata Cache TTL 1.00s
dc.query.pushdown.enabled Enable sub-query push down to this data center. If this property is not set, by default sub-queries are pushed down true
dc.query.pushdown.module FULL_PUSHDOWN: All push down. BASE_PUSHDOWN: Partial push down, which indicates that filter, aggregation, limit, topN and project can be pushed down. FULL_PUSHDOWN
dc.http-compression Whether use zstd compress response body, default value is false false

Other Properties

Property Name Description Default Value
dc.http-request-connectTimeout HTTP request connect timeout, default value is 30s 30.00s
dc.http-request-readTimeout HTTP request read timeout, default value is 30s 30.00s
dc.httpclient.maximum.idle.connections Maximum idle connections to be kept open in the HTTP client 20
dc.http-client-timeout Time until the client keeps retrying to fetch the data, default value is 10 min 10.00m
dc.max.anticipated.delay Maximum anticipated delay between two requests for a query in the cluster. If the remote openLooKeng did not receive a request for more than this delay, it may cancel the query 10.00m
dc.application.name.prefix Prefix to append to any specified ApplicationName client info property, which is used to Set source name for the openLooKeng query. If neither this property nor ApplicationName are set, the source for the query will be hetu-dc hetu-dc
dc.remote-http-server.max-request-header-size This property should be equivalent to the value of http-server.max-request-header-size in the remote server
dc.remote.cluster.id A unique id for the remote cluster

Remote openLooKeng Configuration

openLooKeng Configuration

You can set following properties in the etc/config.properties:

Property Name Description Default Value
hetu.data.center.split.count Maximum number of splits allowed per query 5
hetu.data.center.consumer.timeout The maximum delay of waiting to be taken after the data is obtained by executing the query 10min

Nginx Configuration

When HA is enabled at the remote end and Nginx is used as the proxy, the configuration of Nginx needs to be modified:

http {
    upstream for_aa {
        ip_hash;
        server 192.168.0.101:8090;   #coordinator-1;
        server 192.168.0.102:8090;   #coordinator-2;
        check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    }

    upstream for_cross_region {
        hash $hashKey consistent;
        server 192.168.0.101:8090;   #coordinator-1;
        server 192.168.0.102:8090;   #coordinator-2;
        check interval=3000 rise=2 fall=5 timeout=1000 type=http;
    }
    
    server {
        listen nginx_ip:8888; # nginx port
        
        location / {
            proxy_pass http://for_aa;
            proxy_redirect off;
            proxy_set_header Host $host:$server_port;
        }
        
        location ^~/v1/dc/(.*)/(.*) {
            set $hashKey $2;
            proxy_redirect off;
            proxy_pass http://for_cross_region;
		    proxy_set_header Host $host:$server_port;
        }
        
        location ^~/v1/dc/statement/(.*)/(.*)/(.*) {
            set $hashKey $3;
            proxy_redirect off;
            proxy_pass http://for_cross_region;
		    proxy_set_header Host $host:$server_port;
        }
    }
}

Multiple openLooKeng Clusters

You can have as many catalogs as you need, so if you have additional data centers, simply add another properties file to etc/catalog with a different name (making sure it ends in .properties). For example, if you name the property file sales.properties, openLooKeng will create a catalog named sales using the configured connector.

Global Dynamic Filter

The global dynamic filtering is enabled, when the cross openLooKeng query is executed, the filter is generated locally and sent to the remote openLooKeng for data filtering to reduce the amount of data pulled from the remote openLooKeng. It is necessary to ensure that state store is enabled in openLooKeng environment (please refer to the configuration document of state store for relevant configuration). There are two ways to enable global dynamic filtering:

Method 1: You can set following properties in the etc/config.properties:

Property Name Description Default Value
enable-dynamic-filtering Whether the dynamic filtering feature is enabled false
dynamic-filtering-max-per-driver-row-count If the maximum number of rows per driver is exceeded, the dynamic filtering feature of the query will be automatically cancelled 100
dynamic-filtering-max-per-driver-size If the maximum amount of data allowed to be processed by each driver exceeds this value, the dynamic filtering feature of the query will be automatically cancelled 10KB

Method 2: Set properties in session

  • By openLooKeng CLI

    java -jar hetu-cli-*-execute.jar --server ip:port --session enable-dynamic-filter=ture --session dynamic-filtering-max-per-driver-row-count=10000 --session dynamic-filtering-max-per-driver-size=1MB
    
  • By openLooKeng JDBC:

    Properties properties = new Properties();
    properites.setProperties("enable-dynamic-filter", "true");
    properites.setProperties("dynamic-filtering-max-per-driver-row-count", "10000");
    properites.setProperties("dynamic-filtering-max-per-driver-size", "1MB");
    
    String url = "jdbc:lk://127.0.0.0:8090/hive/default";
    Connection connection = DriverManager.getConnection(url, properties);
    

Querying Remote Data Center

The Data Center connector provides a catalog prefixed with the property file name for every catalog in the remote data center. Treat each prefixed remote catalogs as a separate catalog in the local cluster. You can see the available remote catalogs by running SHOW CATALOGS:

SHOW CATALOGS;

If you have catalog named mysql in the remote data center, you can view the schemas in this remote catalog by running SHOW SCHEMAS:

SHOW SCHEMAS FROM dc.mysql;

If you have a schema named web in the remote catalog mysql, you can view the tables in that catalog by running SHOW TABLES:

SHOW TABLES FROM dc.mysql.web;

You can see a list of the columns in the clicks table in the web schema using either of the following:

DESCRIBE dc.mysql.web.clicks;
SHOW COLUMNS FROM dc.mysql.web.clicks;

Finally, you can access the clicks table in the web schema:

SELECT * FROM dc.mysql.web.clicks;

If you used a different name for your catalog properties file, use that catalog name instead of dc in the above examples.

Data Center Connector Limitations

Data Center connector is a read-only connector. The following SQL statements are not yet supported:

ALTER SCHEMA, ALTER TABLE, ANALYZE, CACHE TABLE, COMMENT, CREATE SCHEMA, CREATE TABLE, CREATE TABLE AS, CREATE VIEW, DELETE, DROP CACHE, DROP SCHEMA, DROP TABLE, DROP VIEW, GRANT, INSERT, INSERT OVERWRITE, REVOKE, SHOW CACHE, SHOW CREATE VIEW, SHOW GRANTS, SHOW ROLES, SHOW ROLE GRANTS, UPDATE, VACUUM