HugeGraph-Computer Quick Start

1 HugeGraph-Computer Overview

HugeGraph-Computer is a distributed graph processing system for HugeGraph (OLAP). It is an implementation of Pregel and runs on Kubernetes.

Features

  • Supports distributed MPP graph computing and integrates with HugeGraph as graph input/output storage.
  • Based on the BSP (Bulk Synchronous Parallel) model: an algorithm computes through multiple parallel iterations, and each iteration is a superstep.
  • Automatic memory management. The framework avoids running out of memory (OOM) by spilling some data to disk when memory cannot hold all of it.
  • Part of the edges or messages of a super node can be held in memory, so they are never lost.
  • You can load data from HDFS, HugeGraph, or any other system.
  • You can output results to HDFS, HugeGraph, or any other system.
  • Easy to develop a new algorithm. You only need to focus on the processing of a single vertex, as on a single server, without worrying about message transfer or memory/storage management.

2 Dependency for Building/Running

2.1 Install Java 11 (JDK 11)

Java 11 or later is required to run Computer; you need to install and configure it yourself.

Run the java -version command to check the JDK version before reading on.
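The version check can also be scripted. The following is a minimal sketch and not part of HugeGraph-Computer: the function names java_major_version, detect_java_version and check_java_11 are made up for illustration, and the parsing assumes the usual quoted version string that java -version prints to stderr.

```shell
# Extract the major version from a Java version string.
# Handles both the legacy "1.8.0_292" scheme and the modern "11.0.2" scheme.
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;   # "17-ea" -> "17"
    esac
}

# Read the quoted version string from `java -version` (printed on stderr).
detect_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the detected JDK is version 11 or newer.
check_java_11() {
    version=$(detect_java_version)
    if [ -z "$version" ]; then
        echo "java not found or version unreadable" >&2
        return 1
    fi
    major=$(java_major_version "$version")
    if [ "$major" -ge 11 ]; then
        echo "OK: Java $version"
    else
        echo "Java $version is too old; Java 11 or later is required" >&2
        return 1
    fi
}
```

Calling check_java_11 before the steps in section 3 returns a non-zero status when the JDK is missing or older than 11.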

3 Get Started

3.1 Run PageRank algorithm locally

To run an algorithm with HugeGraph-Computer, you need Java 11 or later.

You also need to deploy HugeGraph-Server and Etcd.

There are two ways to get HugeGraph-Computer:

  • Download the compiled tarball
  • Clone source code then compile and package

3.1.1 Download the compiled archive

Download the latest version of the HugeGraph-Computer release package:

  wget https://downloads.apache.org/incubator/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
  tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer
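The two commands above can be wrapped so that the version is passed once. This is a sketch under assumptions: the helper names tarball_name, tarball_url and fetch_computer are hypothetical, and the mkdir step is added here because tar -C needs the target directory to exist already.

```shell
# Build the release tarball name from a version string.
tarball_name() {
    echo "apache-hugegraph-computer-incubating-$1.tar.gz"
}

# Build the download URL, following the URL pattern shown above.
tarball_url() {
    echo "https://downloads.apache.org/incubator/hugegraph/$1/$(tarball_name "$1")"
}

# Download a release and unpack it into ./hugegraph-computer.
fetch_computer() {
    version=$1
    wget "$(tarball_url "$version")" || return 1
    mkdir -p hugegraph-computer        # tar -C fails if the directory is missing
    tar zxvf "$(tarball_name "$version")" -C hugegraph-computer
}
```

A call such as fetch_computer "$version" then performs both steps, with $version set to the release you want to install.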

3.1.2 Clone source code to compile and package

Clone the latest HugeGraph-Computer source code:

  git clone https://github.com/apache/hugegraph-computer.git

Compile and generate tar package:

  cd hugegraph-computer
  mvn clean package -DskipTests

3.1.3 Start master node

You can use the -c parameter to specify a configuration file. For more configuration options, see: Computer Config Options

  cd hugegraph-computer
  bin/start-computer.sh -d local -r master

3.1.4 Start worker node

  bin/start-computer.sh -d local -r worker

3.1.5 Query algorithm results

3.1.5.1 Enable OLAP index query for server

If the OLAP index is not enabled, enable it first. For details, see: modify-graphs-read-mode

  PUT http://localhost:8080/graphs/hugegraph/graph_read_mode
  "ALL"
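The request above is written in HTTP notation. One way to send it from a terminal is with curl, as in the sketch below. The function name set_graph_read_mode and the HUGEGRAPH_URL and GRAPH_NAME variables are names chosen here, and the Content-Type header is an assumption about what the server expects for a JSON string body.

```shell
# Server address and graph name from the request above; override via environment.
HUGEGRAPH_URL=${HUGEGRAPH_URL:-http://localhost:8080}
GRAPH_NAME=${GRAPH_NAME:-hugegraph}

# Send the read mode as a JSON string body, e.g. "ALL".
set_graph_read_mode() {
    mode=$1
    curl -s -X PUT \
        -H "Content-Type: application/json" \
        -d "\"$mode\"" \
        "$HUGEGRAPH_URL/graphs/$GRAPH_NAME/graph_read_mode"
}
```

Running set_graph_read_mode ALL issues the same PUT request with "ALL" as the body.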

3.1.5.2 Query page_rank property value:

  curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip

3.2 Run PageRank algorithm in Kubernetes

To run an algorithm with HugeGraph-Computer, you need to deploy HugeGraph-Server first.

3.2.1 Install HugeGraph-Computer CRD

  # Kubernetes version >= v1.16
  kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml
  # Kubernetes version < v1.16
  kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
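The choice between the two manifests can be made from the cluster version string. The sketch below is illustrative only: crd_manifest_url and CRD_BASE are hypothetical names, and the version string is expected to be supplied by the caller (for example, the server version reported by kubectl version).

```shell
CRD_BASE=https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest

# Print the CRD manifest URL for a Kubernetes version such as "v1.16.3" or "1.15".
crd_manifest_url() {
    version=${1#v}                  # drop an optional leading "v"
    major=${version%%.*}            # text before the first dot
    rest=${version#*.}              # text after the first dot
    minor=${rest%%.*}               # text before the next dot
    minor=${minor%%[!0-9]*}         # strip suffixes such as "+" in "21+"
    if [ "$major" -gt 1 ] || [ "$minor" -ge 16 ]; then
        echo "$CRD_BASE/hugegraph-computer-crd.v1.yaml"
    else
        echo "$CRD_BASE/hugegraph-computer-crd.v1beta1.yaml"
    fi
}
```

The result can be passed straight to kubectl, for example kubectl apply -f "$(crd_manifest_url v1.22.4)", where v1.22.4 is only a sample value.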

3.2.2 Show CRD

  kubectl get crd
  NAME                                         CREATED AT
  hugegraphcomputerjobs.hugegraph.apache.org   2021-09-16T08:01:08Z

3.2.3 Install hugegraph-computer-operator & etcd-server

  kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml

3.2.4 Wait for hugegraph-computer-operator & etcd-server deployment to complete

  kubectl get pod -n hugegraph-computer-operator-system
  NAME                                                              READY   STATUS    RESTARTS   AGE
  hugegraph-computer-operator-controller-manager-58c5545949-jqvzl   1/1     Running   0          15h
  hugegraph-computer-operator-etcd-28lm67jxk5                       1/1     Running   0          15h
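Instead of re-running kubectl get pod by hand, the wait can be delegated to kubectl wait. This is a sketch: the function name wait_for_operator, the OPERATOR_NS variable and the 300s default timeout are choices made here, not values from the project.

```shell
OPERATOR_NS=hugegraph-computer-operator-system

# Block until every pod in the operator namespace reports Ready, or time out.
wait_for_operator() {
    timeout=${1:-300s}
    kubectl wait --for=condition=Ready pod --all \
        -n "$OPERATOR_NS" --timeout="$timeout"
}
```

wait_for_operator 120s returns a non-zero status if the pods are not Ready within two minutes, which makes it usable as a gate before submitting a job.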

3.2.5 Submit job

For more on the Computer CRD, see: Computer CRD

For more configuration options, see: Computer Config Options

  cat <<EOF | kubectl apply --filename -
  apiVersion: hugegraph.apache.org/v1
  kind: HugeGraphComputerJob
  metadata:
    namespace: hugegraph-computer-operator-system
    name: &jobName pagerank-sample
  spec:
    jobId: *jobName
    algorithmName: page_rank
    image: hugegraph/hugegraph-computer:latest # algorithm image url
    jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar # algorithm jar path
    pullPolicy: Always
    workerCpu: "4"
    workerMemory: "4Gi"
    workerInstances: 5
    computerConf:
      job.partitions_count: "20"
      algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
      hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port} # hugegraph server url (replace the placeholders before applying)
      hugegraph.name: hugegraph # hugegraph graph name
  EOF
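One detail of the heredoc is easy to miss: because the EOF delimiter is unquoted, the shell expands ${hugegraph-server-host} itself, reading it as "the value of hugegraph, or the text server-host if that variable is unset". The sketch below first shows that behaviour and then renders the same manifest from ordinary shell variables. The names placeholder_pitfall, render_pagerank_job, HUGEGRAPH_HOST and HUGEGRAPH_PORT are hypothetical and chosen for this example.

```shell
# Show what the shell makes of the placeholders when left as-is.
placeholder_pitfall() {
    unset hugegraph
    echo "http://${hugegraph-server-host}:${hugegraph-server-port}"
}

# Render the job manifest with the server address passed as arguments.
render_pagerank_job() {
    HUGEGRAPH_HOST=$1
    HUGEGRAPH_PORT=$2
    cat <<EOF
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-sample
spec:
  jobId: *jobName
  algorithmName: page_rank
  image: hugegraph/hugegraph-computer:latest
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
  pullPolicy: Always
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5
  computerConf:
    job.partitions_count: "20"
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    hugegraph.url: http://${HUGEGRAPH_HOST}:${HUGEGRAPH_PORT}
    hugegraph.name: hugegraph
EOF
}
```

With a reachable server, the output can be piped to kubectl in the same way as above, for example render_pagerank_job 10.0.0.5 8080 | kubectl apply --filename -, where the address is only a sample value.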

3.2.6 Show job

  kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system
  NAME              JOBID             JOBSTATUS
  pagerank-sample   pagerank-sample   RUNNING
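For scripting, the JOBSTATUS column can be pulled out of the table shown above. The helper below is a sketch with a made-up name, job_status. It relies only on the column layout in the sample output (NAME, JOBID, JOBSTATUS) and makes no assumption about status values other than the RUNNING shown there.

```shell
# Read `kubectl get hcjob` table output on stdin and print the JOBSTATUS
# column (third field) of the row whose NAME matches the given job name.
job_status() {
    awk -v job="$1" '$1 == job { print $3 }'
}
```

For example, kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system | job_status pagerank-sample prints only the status word.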

3.2.7 Show log of nodes

  # Show the master log
  kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system
  # Show the worker log
  kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system
  # Show diagnostic log of a job
  # NOTE: the diagnostic log exists only when the job fails, and it is only kept for one hour.
  kubectl get event --field-selector reason=ComputerJobFailed,involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system

3.2.8 Show success event of a job

NOTE: the event is only kept for one hour.

  kubectl get event --field-selector reason=ComputerJobSucceed,involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system

3.2.9 Query algorithm results

If the output goes to HugeGraph-Server, query the results the same way as in the local run (section 3.1.5). If the output goes to HDFS, check the result files under the /hugegraph-computer/results/{jobId} directory.

4 Built-In algorithms document

4.1 Supported algorithms list:

Centrality Algorithm:
  • PageRank
  • BetweennessCentrality
  • ClosenessCentrality
  • DegreeCentrality
Community Algorithm:
  • ClusteringCoefficient
  • Kcore
  • Lpa
  • TriangleCount
  • Wcc
Path Algorithm:
  • RingsDetection
  • RingsDetectionWithFilter

For more algorithms, see: Built-In algorithms

4.2 Algorithm description

TODO

5 Algorithm development guide

TODO

6 Note

  • If some classes under computer-k8s cannot be found, run mvn compile first to generate them.

Last modified December 13, 2024: chore: update version to 1.5.0 (#385) (5e7803ce)