HugeGraph-Computer Config

Computer Config Options

| config option | default value | description |
|---------------|---------------|-------------|
| algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of message passed when computing a vertex. |
| algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | The class used to transfer the algorithm’s parameters before the algorithm is run. |
| algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of the vertex’s value; the instance is used to store the computation result of the vertex. |
| allocator.max_vertices_per_thread | 10000 | Maximum number of vertices per thread processed in each memory allocator. |
| bsp.etcd_endpoints | http://localhost:2379 | The endpoints to access etcd. |
| bsp.log_interval | 30000 | The log interval (in ms) to print the log while waiting for the bsp event. |
| bsp.max_super_step | 10 | The max super step of the algorithm. |
| bsp.register_timeout | 300000 | The max timeout to wait for master and workers to register. |
| bsp.wait_master_timeout | 86400000 | The max timeout (in ms) to wait for the master bsp event. |
| bsp.wait_workers_timeout | 86400000 | The max timeout to wait for the workers bsp event. |
| hgkv.max_data_block_size | 65536 | The max byte size of an hgkv-file data block. |
| hgkv.max_file_size | 2147483648 | The max number of bytes in each hgkv-file. |
| hgkv.max_merge_files | 10 | The max number of files to merge at one time. |
| hgkv.temp_file_dir | /tmp/hgkv | This folder is used to store temporary files; temporary files will be generated during the file merging process. |
| hugegraph.name | hugegraph | The graph name to load data from and write results back to. |
| hugegraph.url | http://127.0.0.1:8080 | The hugegraph url to load data from and write results back to. |
| input.edge_direction | OUT | The direction of edges to load; when the value is BOTH, the edges in both OUT and IN direction will be loaded. |
| input.edge_freq | MULTIPLE | The frequency of edges that can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means only one edge can exist between a pair of vertices, identified by sourceId + targetId; SINGLE_PER_LABEL means one edge can exist per edge label between a pair of vertices, identified by sourceId + edgelabel + targetId; MULTIPLE means many edges can exist between a pair of vertices, identified by sourceId + edgelabel + sortValues + targetId. |
| input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class to create the input-filter object; the input-filter is used to filter vertex edges according to user needs. |
| input.loader_schema_path | | The schema path of loader input, only takes effect when input.source_type=loader is enabled. |
| input.loader_struct_path | | The struct path of loader input, only takes effect when input.source_type=loader is enabled. |
| input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex; the adjacent edges will be stored and transferred together as a batch unit. |
| input.source_type | hugegraph-server | The source type to load input data, allowed values: [‘hugegraph-server’, ‘hugegraph-loader’]. ‘hugegraph-loader’ means using hugegraph-loader to load data from HDFS or files; in that case please config ‘input.loader_struct_path’ and ‘input.loader_schema_path’. |
| input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits. |
| input.split_max_splits | 10000000 | The maximum number of input splits. |
| input.split_page_size | 500 | The page size for streamed load of input split data. |
| input.split_size | 1048576 | The input split size in bytes. |
| job.id | local_0001 | The job id on the Yarn cluster or K8s cluster. |
| job.partitions_count | 1 | The partitions count for computing one graph algorithm job. |
| job.partitions_thread_nums | 4 | The number of threads for parallel partition compute. |
| job.workers_count | 1 | The workers count for computing one graph algorithm job. |
| master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is the computation that can determine whether to continue to the next superstep. It runs at the end of each superstep on the master. |
| output.batch_size | 500 | The batch size of output. |
| output.batch_threads | 1 | The number of threads used for batch output. |
| output.hdfs_core_site_path | | The hdfs core-site path. |
| output.hdfs_delimiter | , | The delimiter of hdfs output. |
| output.hdfs_kerberos_enable | false | Whether Kerberos authentication is enabled for HDFS. |
| output.hdfs_kerberos_keytab | | The HDFS keytab file for Kerberos authentication. |
| output.hdfs_kerberos_principal | | The HDFS principal for Kerberos authentication. |
| output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file. |
| output.hdfs_merge_partitions | true | Whether to merge the output files of multiple partitions. |
| output.hdfs_path_prefix | /hugegraph-computer/results | The directory of the hdfs output result. |
| output.hdfs_replication | 3 | The replication number of hdfs. |
| output.hdfs_site_path | | The hdfs site path. |
| output.hdfs_url | hdfs://127.0.0.1:9000 | The hdfs url of output. |
| output.hdfs_user | hadoop | The hdfs user of output. |
| output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. It is called after the iteration computation. |
| output.result_name | value | The value is assigned dynamically by #name() of the instance created by WORKER_COMPUTATION_CLASS. |
| output.result_write_type | OLAP_COMMON | The result write-type to output to hugegraph, allowed values: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE]. |
| output.retry_interval | 10 | The retry interval when output fails. |
| output.retry_times | 3 | The retry times when output fails. |
| output.single_threads | 1 | The number of threads used for single output. |
| output.thread_pool_shutdown_timeout | 60 | The timeout in seconds of output thread pool shutdown. |
| output.with_adjacent_edges | false | Whether to output the adjacent edges of the vertex. |
| output.with_edge_properties | false | Whether to output the properties of the edge. |
| output.with_vertex_properties | false | Whether to output the properties of the vertex. |
| sort.thread_nums | 4 | The number of threads performing internal sorting. |
| transport.client_connect_timeout | 3000 | The timeout (in ms) for the client to connect to the server. |
| transport.client_threads | 4 | The number of transport threads for the client. |
| transport.close_timeout | 10000 | The timeout (in ms) to close the server or client. |
| transport.finish_session_timeout | 0 | The timeout (in ms) to finish a session, 0 means using (transport.sync_request_timeout * transport.max_pending_requests). |
| transport.heartbeat_interval | 20000 | The minimum interval (in ms) between heartbeats on the client side. |
| transport.io_mode | AUTO | The network IO mode, either ‘NIO’, ‘EPOLL’ or ‘AUTO’; ‘AUTO’ means selecting the proper mode automatically. |
| transport.max_pending_requests | 8 | The max number of unacknowledged requests on the client; sending becomes unavailable if the number of unacknowledged requests >= max_pending_requests. |
| transport.max_syn_backlog | 511 | The capacity of the SYN queue on the server side, 0 means using the system default value. |
| transport.max_timeout_heartbeat_count | 120 | The maximum number of heartbeat timeouts on the client side; if the number of consecutive timeouts waiting for a heartbeat response > max_timeout_heartbeat_count, the channel will be closed from the client side. |
| transport.min_ack_interval | 200 | The minimum interval (in ms) of server reply ack. |
| transport.min_pending_requests | 6 | The minimum number of unacknowledged requests on the client; sending becomes available again if the number of unacknowledged requests < min_pending_requests. |
| transport.network_retries | 3 | The number of retry attempts for network communication if the network is unstable. |
| transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider, currently only supports Netty. |
| transport.receive_buffer_size | 0 | The size of the socket receive-buffer in bytes, 0 means using the system default value. |
| transport.recv_file_mode | true | Whether to enable receive buffer-file mode; if enabled, the received buffer will be written to file from the socket by zero-copy. |
| transport.send_buffer_size | 0 | The size of the socket send-buffer in bytes, 0 means using the system default value. |
| transport.server_host | 127.0.0.1 | The server hostname or ip to listen on for data transfer. |
| transport.server_idle_timeout | 360000 | The max timeout (in ms) of server idle. |
| transport.server_port | 0 | The server port to listen on for data transfer. The system will assign a random port if it’s set to 0. |
| transport.server_threads | 4 | The number of transport threads for the server. |
| transport.sync_request_timeout | 10000 | The timeout (in ms) to wait for a response after sending a sync-request. |
| transport.tcp_keep_alive | true | Whether to enable TCP keep-alive. |
| transport.transport_epoll_lt | false | Whether to enable EPOLL level-trigger. |
| transport.write_buffer_high_mark | 67108864 | The high water mark for the write buffer in bytes; sending becomes unavailable if the number of queued bytes > write_buffer_high_mark. |
| transport.write_buffer_low_mark | 33554432 | The low water mark for the write buffer in bytes; sending becomes available again if the number of queued bytes < write_buffer_low_mark. |
| transport.write_socket_timeout | 3000 | The timeout (in ms) to write data to the socket buffer. |
| valuefile.max_segment_size | 1073741824 | The max number of bytes in each segment of the value-file. |
| worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | Combiner can combine messages into one value for a vertex; for example, the page-rank algorithm can combine the messages of a vertex into a sum value. |
| worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class to create the worker-computation object; worker-computation is used to compute each vertex in each superstep. |
| worker.data_dirs | [jobs] | The directories, separated by ‘,’, that received vertices and messages can persist into. |
| worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner that can combine several properties of the same edge into one set of properties at the input step. |
| worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in. |
| worker.received_buffers_bytes_limit | 104857600 | The limit in bytes of buffers of received data; the total size of all buffers can’t exceed this limit. If received buffers reach this limit, they will be merged into a file. |
| worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner that can combine several properties of the same vertex into one set of properties at the input step. |
| worker.wait_finish_messages_timeout | 86400000 | The max timeout (in ms) the message-handler waits for the finish-message of all workers. |
| worker.wait_sort_timeout | 600000 | The max timeout (in ms) the message-handler waits for the sort-thread to sort one batch of buffers. |
| worker.write_buffer_capacity | 52428800 | The initial size of the write buffer that is used to store vertices or messages. |
| worker.write_buffer_threshold | 52428800 | The threshold of the write buffer; exceeding it will trigger sorting. The write buffer is used to store vertices or messages. |
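As an illustrative sketch only (the option keys come from the table above, the values are placeholders), a few of these options might be supplied as string key/value pairs, e.g. via the computerConf map of the K8s CRD described below:

```yaml
# Illustrative values only; the option keys are taken from the table above.
computerConf:
  job.partitions_count: "20"                 # split the graph into 20 partitions
  job.workers_count: "5"                     # run the algorithm with 5 workers
  bsp.max_super_step: "10"                   # stop after at most 10 supersteps
  hugegraph.url: "http://127.0.0.1:8080"     # server to load data from and write results to
  hugegraph.name: "hugegraph"                # graph to load data from and write results to
  output.output_class: "org.apache.hugegraph.computer.core.output.LogOutput"
```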

K8s Operator Config Options

NOTE: These options need to be converted and supplied as environment variables, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL (see the sketch below).
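For example, a sketch of an operator container spec fragment; only INTERNAL_ETCD_URL is given verbatim above, the other variable names are derived from the same conversion rule and should be verified against the operator manifests:

```yaml
# Sketch only: variable names other than INTERNAL_ETCD_URL are derived by
# dropping the 'k8s.' prefix and uppercasing, per the conversion rule above.
env:
  - name: INTERNAL_ETCD_URL            # k8s.internal_etcd_url
    value: "http://127.0.0.1:2379"
  - name: WATCH_NAMESPACE              # k8s.watch_namespace
    value: "hugegraph-computer-system"
  - name: AUTO_DESTROY_POD             # k8s.auto_destroy_pod
    value: "true"
```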

| config option | default value | description |
|---------------|---------------|-------------|
| k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed. |
| k8s.close_reconciler_timeout | 120 | The max timeout (in ms) to close the reconciler. |
| k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd url for the operator system. |
| k8s.max_reconcile_retry | 3 | The max retry times of reconcile. |
| k8s.probe_backlog | 50 | The maximum backlog for serving health probes. |
| k8s.probe_port | 9892 | The port that the controller binds to for serving health probes. |
| k8s.ready_check_internal | 1000 | The time interval (in ms) of the ready check. |
| k8s.ready_timeout | 30000 | The max timeout (in ms) of the ready check. |
| k8s.reconciler_count | 10 | The max number of reconciler threads. |
| k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled. |
| k8s.timezone | Asia/Shanghai | The timezone of the computer job and operator. |
| k8s.watch_namespace | hugegraph-computer-system | Watch custom resources in this namespace and ignore other namespaces; ‘*’ means all namespaces will be watched. |

HugeGraph-Computer CRD

CRD: https://github.com/apache/hugegraph-computer/blob/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

| spec | default value | description | required |
|------|---------------|-------------|----------|
| algorithmName | | The name of the algorithm. | true |
| jobId | | The job id. | true |
| image | | The image of the algorithm. | true |
| computerConf | | The map of computer config options. | true |
| workerInstances | | The number of worker instances; it takes precedence over the ‘job.workers_count’ option. | true |
| pullPolicy | Always | The pull-policy of the image, for details please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false |
| pullSecrets | | The pull-secrets of the image, for details please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false |
| masterCpu | | The cpu limit of the master, the unit can be ‘m’ or without unit, for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| workerCpu | | The cpu limit of the worker, the unit can be ‘m’ or without unit, for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| masterMemory | | The memory limit of the master, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| workerMemory | | The memory limit of the worker, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, for details please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| log4jXml | | The content of log4j.xml for the computer job. | false |
| jarFile | | The jar path of the computer algorithm. | false |
| remoteJarUri | | The remote jar uri of the computer algorithm; it will overlay the algorithm image. | false |
| jvmOptions | | The java startup parameters of the computer job. | false |
| envVars | | Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false |
| envFrom | | Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false |
| masterCommand | bin/start-computer.sh | The run command of the master, equivalent to the ‘Entrypoint’ field of Docker. | false |
| masterArgs | [“-r master”, “-d k8s”] | The run args of the master, equivalent to the ‘Cmd’ field of Docker. | false |
| workerCommand | bin/start-computer.sh | The run command of the worker, equivalent to the ‘Entrypoint’ field of Docker. | false |
| workerArgs | [“-r worker”, “-d k8s”] | The run args of the worker, equivalent to the ‘Cmd’ field of Docker. | false |
| volumes | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| volumeMounts | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| secretPaths | | The map of k8s-secret name and mount path. | false |
| configMapPaths | | The map of k8s-configmap name and mount path. | false |
| podTemplateSpec | | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false |
| securityContext | | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |
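A minimal sketch of a custom resource using the spec fields above; the apiVersion/kind and the image, algorithm, and namespace values are placeholders that should be checked against the linked CRD manifest and your own deployment:

```yaml
apiVersion: hugegraph.apache.org/v1                  # verify against the linked CRD manifest
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-system
  name: pagerank-sample
spec:
  jobId: pagerank-sample                             # required
  algorithmName: page_rank                           # required
  image: hugegraph/hugegraph-computer:latest         # required: algorithm image (placeholder)
  workerInstances: 3                                 # required: takes precedence over job.workers_count
  computerConf:                                      # required: map of computer config options
    job.partitions_count: "20"
  pullPolicy: Always                                 # optional
  jvmOptions: "-Xmx4g"                               # optional
```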

KubeDriver Config Options

| config option | default value | description |
|---------------|---------------|-------------|
| k8s.build_image_bash_path | | The path of the command used to build the image. |
| k8s.enable_internal_algorithm | true | Whether to enable internal algorithms. |
| k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image url of the computer framework. |
| k8s.image_repository_password | | The password for logging in to the image repository. |
| k8s.image_repository_registry | | The address for logging in to the image repository. |
| k8s.image_repository_url | hugegraph/hugegraph-computer | The url of the image repository. |
| k8s.image_repository_username | | The username for logging in to the image repository. |
| k8s.internal_algorithm | [pageRank] | The name list of all internal algorithms. |
| k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image url of internal algorithms. |
| k8s.jar_file_dir | /cache/jars/ | The directory where the algorithm jar is uploaded. |
| k8s.kube_config | ~/.kube/config | The path of the k8s config file. |
| k8s.log4j_xml_path | | The log4j.xml path for the computer job. |
| k8s.namespace | hugegraph-computer-system | The namespace of the hugegraph-computer system. |
| k8s.pull_secret_names | [] | The names of pull-secrets for pulling images. |
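As a sketch (the option keys come from the table above, the values are illustrative), a driver-side configuration might set:

```yaml
# Illustrative driver-side values; the option keys are taken from the table above.
k8s.kube_config: "~/.kube/config"                          # kubeconfig used to submit jobs
k8s.namespace: "hugegraph-computer-system"                 # namespace where jobs are created
k8s.framework_image_url: "hugegraph/hugegraph-computer:latest"
k8s.enable_internal_algorithm: "true"
k8s.internal_algorithm: "[pageRank]"
```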
