HugeGraph BenchMark Performance
Note:
The current performance metrics are based on an earlier version. The latest version has significant improvements in both performance and functionality. We encourage you to refer to the most recent release featuring autonomous distributed storage and enhanced computational push down capabilities. Alternatively, you may wait for the community to update the data with these enhancements.
1 Test environment
1.1 Hardware information
CPU | Memory | NIC | Disk |
---|---|---|---|
48 Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz | 128G | 10000Mbps | 750GB SSD |
1.2 Software information
1.2.1 Test cases
Testing is done using the graphdb-benchmark, a benchmark suite for graph databases. This benchmark suite mainly consists of four types of tests:
- Massive Insertion (MIW), which inserts vertices and edges in batches, submitting a fixed number of vertices or edges at once.
- Single Insertion (SIW), which inserts vertices and edges one at a time, committing each immediately.
- Query (QW), which covers the basic query operations of a graph database:
  - Find Neighbors, which queries the neighbors of all vertices.
  - Find Adjacent Nodes, which queries the adjacent vertices of all edges.
  - Find the Shortest Path, which queries the shortest path from the first vertex to 100 random vertices.
- Clustering (CW), which is a community detection algorithm based on the Louvain Method.
1.2.2 Test dataset
Tests are conducted using both synthetic and real data.
- MIW, SIW, and QW use SNAP datasets.
- CW uses synthetic data generated by the LFR-Benchmark generator.
The sizes of the datasets used in this test are:
Name | Number of Vertices | Number of Edges | File Size |
---|---|---|---|
email-enron.txt | 36,691 | 367,661 | 4MB |
com-youtube.ungraph.txt | 1,157,806 | 2,987,624 | 38.7MB |
amazon0601.txt | 403,393 | 3,387,388 | 47.9MB |
com-lj.ungraph.txt | 3,997,961 | 34,681,189 | 479MB |
1.3 Service configuration
HugeGraph version: 0.5.6; RestServer, Gremlin Server, and the backend all run on the same server
- RocksDB version: rocksdbjni-5.8.6
Titan version: 0.5.4, using thrift+Cassandra mode
- Cassandra version: cassandra-3.10, commit-log and data use SSD together
- Neo4j version: 2.0.1
The Titan version adapted by graphdb-benchmark is 0.5.4.
2 Test results
2.1 Batch insertion performance
Backend | email-enron (300K) | amazon0601 (3M) | com-youtube.ungraph (3M) | com-lj.ungraph (30M) |
---|---|---|---|---|
HugeGraph | 0.629 | 5.711 | 5.243 | 67.033 |
Titan | 10.15 | 108.569 | 150.266 | 1217.944 |
Neo4j | 3.884 | 18.938 | 24.890 | 281.537 |
Explanation
- The figures in parentheses in the table header are the data scale, in number of edges.
- The values in the table are the time taken for batch insertion, in seconds.
- For example, HugeGraph (RocksDB) took 5.711 seconds to insert the 3 million edges of the amazon0601 dataset.
Conclusion
- The performance of batch insertion: HugeGraph(RocksDB) > Neo4j > Titan(thrift+Cassandra)
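The insertion times above can be converted into throughput and relative speedup, which makes the ranking easier to compare across datasets. The following is a minimal sketch, not part of the benchmark suite. It assumes the exact edge counts from the dataset table were inserted (the table header gives only rounded scales), and the names `EDGES`, `INSERT_SECONDS`, `throughput`, and `speedup` are illustrative.

```python
# Edge counts are taken from the dataset table (assumption: every edge was inserted).
EDGES = {
    "email-enron": 367_661,
    "amazon0601": 3_387_388,
    "com-youtube.ungraph": 2_987_624,
    "com-lj.ungraph": 34_681_189,
}

# Batch insertion times in seconds, copied from the table in section 2.1.
INSERT_SECONDS = {
    "HugeGraph": {"email-enron": 0.629, "amazon0601": 5.711,
                  "com-youtube.ungraph": 5.243, "com-lj.ungraph": 67.033},
    "Titan": {"email-enron": 10.15, "amazon0601": 108.569,
              "com-youtube.ungraph": 150.266, "com-lj.ungraph": 1217.944},
    "Neo4j": {"email-enron": 3.884, "amazon0601": 18.938,
              "com-youtube.ungraph": 24.890, "com-lj.ungraph": 281.537},
}


def throughput(backend, dataset):
    """Edges inserted per second for one backend and dataset."""
    return EDGES[dataset] / INSERT_SECONDS[backend][dataset]


def speedup(baseline, candidate, dataset):
    """How many times faster `candidate` is than `baseline` on `dataset`."""
    return INSERT_SECONDS[baseline][dataset] / INSERT_SECONDS[candidate][dataset]


for backend in INSERT_SECONDS:
    for dataset in EDGES:
        rate = throughput(backend, dataset)
        print(f"{backend:10s} {dataset:22s} {rate:>12,.0f} edges/s")
```

Under these assumptions, `speedup("Titan", "HugeGraph", "amazon0601")` is simply the ratio of the two table entries for that dataset.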
2.2 Traversal performance
2.2.1 Explanation of terms
- FN(Find Neighbor): Traverse all vertices, find the adjacent edges based on each vertex, and use the edges and vertices to find the other vertices adjacent to the original vertex.
- FA(Find Adjacent): Traverse all edges, get the source vertex and target vertex based on each edge.
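To make the two workloads concrete, here is a minimal in-memory sketch of FN and FA over a plain edge list. It does not use the HugeGraph, Titan, or Neo4j APIs. The function names and the treatment of edges as undirected are assumptions made for illustration only.

```python
from collections import defaultdict


def build_adjacency(edge_list):
    """Index an edge list by vertex so incident edges can be looked up per vertex."""
    incident = defaultdict(list)
    for src, dst in edge_list:
        incident[src].append((src, dst))
        incident[dst].append((src, dst))
    return incident


def find_neighbors(incident):
    """FN: for every vertex, follow each incident edge to the vertex at its other end."""
    neighbors = {}
    for vertex, edges in incident.items():
        neighbors[vertex] = {dst if src == vertex else src for src, dst in edges}
    return neighbors


def find_adjacent(edge_list):
    """FA: for every edge, read its source vertex and its target vertex."""
    return [(src, dst) for src, dst in edge_list]


edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
incident = build_adjacency(edges)
print(find_neighbors(incident))
print(find_adjacent(edges))
```

FN does work proportional to the sum of vertex degrees, while FA touches each edge once, which is one reason the two workloads are timed separately.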
2.2.2 FN performance
Backend | email-enron (36K) | amazon0601 (400K) | com-youtube.ungraph (1.2M) | com-lj.ungraph (4M) |
---|---|---|---|---|
HugeGraph | 4.072 | 45.118 | 66.006 | 609.083 |
Titan | 8.084 | 92.507 | 184.543 | 1099.371 |
Neo4j | 2.424 | 10.537 | 11.609 | 106.919 |
Explanation
- The figures in parentheses in the table header are the data scale, in number of vertices.
- The values in the table are the time taken to traverse all vertices, in seconds.
- For example, HugeGraph with the RocksDB backend traverses all vertices in amazon0601, looking up the adjacent edges and the vertex at the other end of each, in a total of 45.118 seconds.
2.2.3 FA performance
Backend | email-enron (300K) | amazon0601 (3M) | com-youtube.ungraph (3M) | com-lj.ungraph (30M) |
---|---|---|---|---|
HugeGraph | 1.540 | 10.764 | 11.243 | 151.271 |
Titan | 7.361 | 93.344 | 169.218 | 1085.235 |
Neo4j | 1.673 | 4.775 | 4.284 | 40.507 |
Explanation
- The figures in parentheses in the table header are the data scale, in number of edges.
- The values in the table are the time taken to traverse all edges, in seconds.
- For example, HugeGraph with the RocksDB backend traverses all edges in the amazon0601 dataset and looks up the source and target vertex of each edge in a total of 10.764 seconds.
Conclusion
- Traversal performance: Neo4j > HugeGraph(RocksDB) > Titan(thrift+Cassandra)
2.3 Performance of Common Graph Analysis Methods in HugeGraph
Terminology Explanation
- FS (Find Shortest Path): finding the shortest path between two vertices
- K-neighbor: all vertices that can be reached by traversing K hops (including 1, 2, 3…(K-1) hops) from the starting vertex
- K-out: all vertices that can be reached by traversing exactly K out-edges from the starting vertex.
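The K-neighbor and K-out definitions above can be sketched as a level-by-level breadth-first search. This is an illustrative in-memory version, not HugeGraph's implementation. It assumes K-out means vertices whose shortest distance from the start is exactly K, which is one possible reading of the definition, and the names `bfs_levels`, `k_out`, and `k_neighbor` are hypothetical.

```python
def bfs_levels(out_edges, start, max_depth):
    """Return a list whose entry d-1 holds the vertices first reached after d hops."""
    visited = {start}
    frontier = {start}
    levels = []
    for _ in range(max_depth):
        next_frontier = set()
        for vertex in frontier:
            for neighbor in out_edges.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.add(neighbor)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


def k_out(out_edges, start, k):
    """Vertices at exactly k hops (k >= 1), under the shortest-distance assumption."""
    return bfs_levels(out_edges, start, k)[k - 1]


def k_neighbor(out_edges, start, k):
    """Vertices reachable within 1..k hops (k >= 1), excluding the start vertex."""
    result = set()
    for level in bfs_levels(out_edges, start, k):
        result |= level
    return result


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
print(k_out(graph, "a", 2))
print(k_neighbor(graph, "a", 2))
```

Because every level is held in memory before the next one is expanded, the size of the frontier drives memory use, which matters for the deeper traversals measured later in this section.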
FS performance
Backend | email-enron (300K) | amazon0601 (3M) | com-youtube.ungraph (3M) | com-lj.ungraph (30M) |
---|---|---|---|---|
HugeGraph | 0.494 | 0.103 | 3.364 | 8.155 |
Titan | 11.818 | 0.239 | 377.709 | 575.678 |
Neo4j | 1.719 | 1.800 | 1.956 | 8.530 |
Explanation
- The data in the header “()” represents the data scale in terms of edges
- The data in the table is the time it takes to find the shortest path from the first vertex to 100 randomly selected vertices in seconds
- For example, HugeGraph using the RocksDB backend to find the shortest path from the first vertex to 100 randomly selected vertices in the amazon0601 graph took a total of 0.103s.
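The FS workload described above (shortest paths from the first vertex to 100 randomly selected vertices) can be sketched with an unweighted breadth-first search. This is a simplified in-memory illustration, not the code used by graphdb-benchmark or by any of the tested databases. The function names, the fixed random seed, and the use of BFS are assumptions.

```python
import random
from collections import deque


def shortest_path(adjacency, source, target):
    """Unweighted BFS shortest path; returns a list of vertices or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in parents:
                parents[neighbor] = vertex
                queue.append(neighbor)
    return None


def run_fs_workload(adjacency, vertices, sample_size=100, seed=0):
    """Find paths from the first vertex to up to `sample_size` random other vertices."""
    rng = random.Random(seed)
    source = vertices[0]
    candidates = vertices[1:]
    targets = rng.sample(candidates, min(sample_size, len(candidates)))
    return {target: shortest_path(adjacency, source, target) for target in targets}


graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(run_fs_workload(graph, [1, 2, 3, 4, 5]))
```

The time reported in the table corresponds to the whole batch of 100 queries, not to a single path lookup.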
Conclusion
- In scenarios with small data size or few vertex relationships, HugeGraph outperforms Neo4j and Titan.
- As the data size increases and the degree of vertex association increases, the performance of HugeGraph and Neo4j tends to be similar, both far exceeding Titan.
K-neighbor Performance
Vertex | Depth | Degree 1 | Degree 2 | Degree 3 | Degree 4 | Degree 5 | Degree 6 |
---|---|---|---|---|---|---|---|
v1 | Time | 0.031s | 0.033s | 0.048s | 0.500s | 11.27s | OOM |
v111 | Time | 0.027s | 0.034s | 0.115s | 1.36s | OOM | – |
v1111 | Time | 0.039s | 0.027s | 0.052s | 0.511s | 10.96s | OOM |
Explanation
- HugeGraph-Server’s JVM memory is set to 32GB and may experience OOM when the data is too large.
K-out performance
Vertex | Depth | 1st Degree | 2nd Degree | 3rd Degree | 4th Degree | 5th Degree | 6th Degree |
---|---|---|---|---|---|---|---|
v1 | Time | 0.054s | 0.057s | 0.109s | 0.526s | 3.77s | OOM |
v1 | Degree | 10 | 133 | 2,453 | 50,830 | 1,128,688 | – |
v111 | Time | 0.032s | 0.042s | 0.136s | 1.25s | 20.62s | OOM |
v111 | Degree | 10 | 211 | 4,944 | 113,150 | 2,629,970 | – |
v1111 | Time | 0.039s | 0.045s | 0.053s | 1.10s | 2.92s | OOM |
v1111 | Degree | 10 | 140 | 2,555 | 50,825 | 1,070,230 | – |
Explanation
- The JVM memory of HugeGraph-Server is set to 32GB, and OOM may occur when the data is too large.
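The per-depth counts in the K-out table show why deeper traversals run out of memory: the result set grows by more than an order of magnitude with each additional hop. The sketch below computes that growth factor from the table values. It assumes the Degree rows report the number of vertices returned at each depth, and the names `K_OUT_COUNTS` and `growth_factors` are illustrative.

```python
# Counts copied from the "Degree" rows of the K-out table, depths 1 through 5.
K_OUT_COUNTS = {
    "v1": [10, 133, 2_453, 50_830, 1_128_688],
    "v111": [10, 211, 4_944, 113_150, 2_629_970],
    "v1111": [10, 140, 2_555, 50_825, 1_070_230],
}


def growth_factors(counts):
    """Ratio between the result size at each depth and the previous depth."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


for vertex, counts in K_OUT_COUNTS.items():
    factors = ", ".join(f"{factor:.1f}x" for factor in growth_factors(counts))
    print(f"{vertex}: {factors}")
```

With growth of roughly 13x to 23x per hop in these measurements, a sixth hop would have to expand a frontier of over a million vertices, which is consistent with the OOM entries under a 32GB heap.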
Conclusion
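- In the FS scenario, HugeGraph is faster than Titan on every dataset and faster than Neo4j on three of the four (Neo4j is faster on com-youtube.ungraph).
- In the K-neighbor and K-out scenarios, HugeGraph returns results in under 2 seconds for depths up to 4. At depth 5 the measured times range from about 3 to 21 seconds, with one K-neighbor query running out of memory.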
2.4 Comprehensive Performance Test - CW
Database | Size 1000 | Size 5000 | Size 10000 | Size 20000 |
---|---|---|---|---|
HugeGraph(core) | 20.804 | 242.099 | 744.780 | 1700.547 |
Titan | 45.790 | 820.633 | 2652.235 | 9568.623 |
Neo4j | 5.913 | 50.267 | 142.354 | 460.880 |
Explanation
- The sizes in the table header are numbers of vertices.
- The values in the table are the time, in seconds, needed to complete community detection. For example, HugeGraph with the RocksDB backend takes 744.780 seconds on a dataset of 10,000 vertices to reach the point where the community assignment no longer changes.
- The CW test is a comprehensive evaluation of CRUD operations.
- In this test, HugeGraph, like Titan, did not use the client and directly operated on the core.
Conclusion
- Performance of community detection algorithm: Neo4j > HugeGraph > Titan
Last modified April 23, 2024: refact: add 3 SEC issues & enhance the intro/perf doc (#358) (329c566b)