HugeGraph BenchMark Performance

Note:

The current performance metrics are based on an earlier version. The latest version has significant improvements in both performance and functionality. We encourage you to refer to the most recent release featuring autonomous distributed storage and enhanced computational push down capabilities. Alternatively, you may wait for the community to update the data with these enhancements.

1 Test environment

1.1 Hardware information

CPUMemory网卡磁盘
48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz128G10000Mbps750GB SSD

1.2 Software information

1.2.1 Test cases

Testing is done using the graphdb-benchmark, a benchmark suite for graph databases. This benchmark suite mainly consists of four types of tests:

  • Massive Insertion, which involves batch insertion of vertices and edges, with a certain number of vertices or edges being submitted at once.
  • Single Insertion, which involves the immediate insertion of each vertex or edge, one at a time.
  • Query, which mainly includes the basic query operations of the graph database:
    • Find Neighbors, which queries the neighbors of all vertices.
    • Find Adjacent Nodes, which queries the adjacent vertices of all edges.
    • Find the Shortest Path, which queries the shortest path from the first vertex to 100 random vertices.
  • Clustering, which is a community detection algorithm based on the Louvain Method.
1.2.2 Test dataset

Tests are conducted using both synthetic and real data.

The size of the datasets used in this test is not mentioned.

NameNumber of VerticesNumber of EdgesFile Size
email-enron.txt36,691367,6614MB
com-youtube.ungraph.txt1,157,8062,987,62438.7MB
amazon0601.txt403,3933,387,38847.9MB
com-lj.ungraph.txt399796134681189479MB

1.3 Service configuration

  • HugeGraph version: 0.5.6, RestServer and Gremlin Server and backends are on the same server

    • RocksDB version: rocksdbjni-5.8.6
  • Titan version: 0.5.4, using thrift+Cassandra mode

    • Cassandra version: cassandra-3.10, commit-log and data use SSD together
  • Neo4j version: 2.0.1

The Titan version adapted by graphdb-benchmark is 0.5.4.

2 Test results

2.1 Batch insertion performance

Backendemail-enron(30w)amazon0601(300w)com-youtube.ungraph(300w)com-lj.ungraph(3000w)
HugeGraph0.6295.7115.24367.033
Titan10.15108.569150.2661217.944
Neo4j3.88418.93824.890281.537

Instructions

  • The data scale is in the table header in terms of edges
  • The data in the table is the time for batch insertion, in seconds
  • For example, HugeGraph(RocksDB) spent 5.711 seconds to insert 3 million edges of the amazon0601 dataset.
Conclusion
  • The performance of batch insertion: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)

2.2 Traversal performance

2.2.1 Explanation of terms
  • FN(Find Neighbor): Traverse all vertices, find the adjacent edges based on each vertex, and use the edges and vertices to find the other vertices adjacent to the original vertex.
  • FA(Find Adjacent): Traverse all edges, get the source vertex and target vertex based on each edge.
2.2.2 FN performance
Backendemail-enron(3.6w)amazon0601(40w)com-youtube.ungraph(120w)com-lj.ungraph(400w)
HugeGraph4.07245.11866.006609.083
Titan8.08492.507184.5431099.371
Neo4j2.42410.53711.609106.919

Instructions

  • The data in the table header “()” represents the data scale, in terms of vertices.
  • The data in the table represents the time spent traversing vertices in seconds.
  • For example, HugeGraph uses the RocksDB backend to traverse all vertices in amazon0601, and search for adjacent edges and another vertex, which takes a total of 45.118 seconds.
2.2.3 FA performance
Backendemail-enron(30w)amazon0601(300w)com-youtube.ungraph(300w)com-lj.ungraph(3000w)
HugeGraph1.54010.76411.243151.271
Titan7.36193.344169.2181085.235
Neo4j1.6734.7754.28440.507

Explanation

  • The data size in the header “()” is based on the number of vertices.
  • The data in the table is the time it takes to traverse the vertices in seconds.
  • For example, HugeGraph with RocksDB backend traverses all vertices in the amazon0601 dataset, and it looks up adjacent edges and other vertices, taking a total of 45.118 seconds.
Conclusion
  • Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)

2.3 Performance of Common Graph Analysis Methods in HugeGraph

Terminology Explanation
  • FS (Find Shortest Path): finding the shortest path between two vertices
  • K-neighbor: all vertices that can be reached by traversing K hops (including 1, 2, 3…(K-1) hops) from the starting vertex
  • K-out: all vertices that can be reached by traversing exactly K out-edges from the starting vertex.
FS performance
Backendemail-enron(30w)amazon0601(300w)com-youtube.ungraph(300w)com-lj.ungraph(3000w)
HugeGraph0.4940.1033.3648.155
Titan11.8180.239377.709575.678
Neo4j1.7191.8001.9568.530

Explanation

  • The data in the header “()” represents the data scale in terms of edges
  • The data in the table is the time it takes to find the shortest path from the first vertex to 100 randomly selected vertices in seconds
  • For example, HugeGraph using the RocksDB backend to find the shortest path from the first vertex to 100 randomly selected vertices in the amazon0601 graph took a total of 0.103s.
Conclusion
  • In scenarios with small data size or few vertex relationships, HugeGraph outperforms Neo4j and Titan.
  • As the data size increases and the degree of vertex association increases, the performance of HugeGraph and Neo4j tends to be similar, both far exceeding Titan.
K-neighbor Performance
VertexDepthDegree 1Degree 2Degree 3Degree 4Degree 5Degree 6
v1Time0.031s0.033s0.048s0.500s11.27sOOM
v111Time0.027s0.034s0.115s1.36sOOM
v1111Time0.039s0.027s0.052s0.511s10.96sOOM

Explanation

  • HugeGraph-Server’s JVM memory is set to 32GB and may experience OOM when the data is too large.
K-out performance
VertexDepth1st Degree2nd Degree3rd Degree4th Degree5th Degree6th Degree
v1Time0.054s0.057s0.109s0.526s3.77sOOM
Degree10133245350,8301,128,688
v111Time0.032s0.042s0.136s1.25s20.62sOOM
Degree1021149441131502,629,970
v1111Time0.039s0.045s0.053s1.10s2.92sOOM
Degree101402555508251,070,230

Explanation

  • The JVM memory of HugeGraph-Server is set to 32GB, and OOM may occur when the data is too large.
Conclusion
  • In the FS scenario, HugeGraph outperforms Neo4j and Titan in terms of performance.
  • In the K-neighbor and K-out scenarios, HugeGraph can achieve results returned within seconds within 5 degrees.

2.4 Comprehensive Performance Test - CW

DatabaseSize 1000Size 5000Size 10000Size 20000
HugeGraph(core)20.804242.099744.7801700.547
Titan45.790820.6332652.2359568.623
Neo4j5.91350.267142.354460.880

Explanation

  • The “scale” is based on the number of vertices.
  • The data in the table is the time required to complete community discovery in seconds. For example, if HugeGraph uses the RocksDB backend and operates on a dataset of 10,000 vertices, and the community aggregation is no longer changing, it takes 744.780 seconds.
  • The CW test is a comprehensive evaluation of CRUD operations.
  • In this test, HugeGraph, like Titan, did not use the client and directly operated on the core.
Conclusion
  • Performance of community detection algorithm: Neo4j > HugeGraph > Titan

Last modified April 23, 2024: refact: add 3 SEC issues & enhance the intro/perf doc (#358) (329c566b)