HugeGraph configuration

1. Overview

The directory for the configuration files is hugegraph-release/conf, and all the configurations related to the service and the graph itself are located in this directory.

The main configuration files include gremlin-server.yaml, rest-server.properties, and hugegraph.properties.

The HugeGraphServer integrates the GremlinServer and RestServer internally, and gremlin-server.yaml and rest-server.properties are used to configure these two servers.

  • GremlinServer: GremlinServer accepts Gremlin statements from users, parses them, and then invokes the Core code.
  • RestServer: It provides a RESTful API that, based on different HTTP requests, calls the corresponding Core API. If the user’s request body is a Gremlin statement, it will be forwarded to GremlinServer to perform operations on the graph data.

Now let’s introduce these three configuration files one by one.

2. gremlin-server.yaml

The default content of the gremlin-server.yaml file is as follows:

  # host and port of gremlin server, need to be consistent with host and port in rest-server.properties
  #host: 127.0.0.1
  #port: 8182
  # timeout in ms of gremlin query
  evaluationTimeout: 30000
  channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
  # don't set graph at here, this happens after support for dynamically adding graph
  graphs: {
  }
  scriptEngines: {
    gremlin-groovy: {
      staticImports: [
        org.opencypher.gremlin.process.traversal.CustomPredicates.*,
        org.opencypher.gremlin.traversal.CustomFunctions.*
      ],
      plugins: {
        org.apache.hugegraph.plugin.HugeGraphGremlinPlugin: {},
        org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
        org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {
          classImports: [
            java.lang.Math,
            org.apache.hugegraph.backend.id.IdGenerator,
            org.apache.hugegraph.type.define.Directions,
            org.apache.hugegraph.type.define.NodeRole,
            org.apache.hugegraph.traversal.algorithm.CollectionPathsTraverser,
            org.apache.hugegraph.traversal.algorithm.CountTraverser,
            org.apache.hugegraph.traversal.algorithm.CustomizedCrosspointsTraverser,
            org.apache.hugegraph.traversal.algorithm.CustomizePathsTraverser,
            org.apache.hugegraph.traversal.algorithm.FusiformSimilarityTraverser,
            org.apache.hugegraph.traversal.algorithm.HugeTraverser,
            org.apache.hugegraph.traversal.algorithm.JaccardSimilarTraverser,
            org.apache.hugegraph.traversal.algorithm.KneighborTraverser,
            org.apache.hugegraph.traversal.algorithm.KoutTraverser,
            org.apache.hugegraph.traversal.algorithm.MultiNodeShortestPathTraverser,
            org.apache.hugegraph.traversal.algorithm.NeighborRankTraverser,
            org.apache.hugegraph.traversal.algorithm.PathsTraverser,
            org.apache.hugegraph.traversal.algorithm.PersonalRankTraverser,
            org.apache.hugegraph.traversal.algorithm.SameNeighborTraverser,
            org.apache.hugegraph.traversal.algorithm.ShortestPathTraverser,
            org.apache.hugegraph.traversal.algorithm.SingleSourceShortestPathTraverser,
            org.apache.hugegraph.traversal.algorithm.SubGraphTraverser,
            org.apache.hugegraph.traversal.algorithm.TemplatePathsTraverser,
            org.apache.hugegraph.traversal.algorithm.steps.EdgeStep,
            org.apache.hugegraph.traversal.algorithm.steps.RepeatEdgeStep,
            org.apache.hugegraph.traversal.algorithm.steps.WeightedEdgeStep,
            org.apache.hugegraph.traversal.optimize.ConditionP,
            org.apache.hugegraph.traversal.optimize.Text,
            org.apache.hugegraph.traversal.optimize.TraversalUtil,
            org.apache.hugegraph.util.DateUtil,
            org.opencypher.gremlin.traversal.CustomFunctions,
            org.opencypher.gremlin.traversal.CustomPredicate
          ],
          methodImports: [
            java.lang.Math#*,
            org.opencypher.gremlin.traversal.CustomPredicate#*,
            org.opencypher.gremlin.traversal.CustomFunctions#*
          ]
        },
        org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {
          files: [scripts/empty-sample.groovy]
        }
      }
    }
  }
  serializers:
    - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1,
        config: {
          serializeResultToString: false,
          ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
        }
    }
    - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0,
        config: {
          serializeResultToString: false,
          ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
        }
    }
    - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0,
        config: {
          serializeResultToString: false,
          ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
        }
    }
    - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
        config: {
          serializeResultToString: false,
          ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
        }
    }
  metrics: {
    consoleReporter: {enabled: false, interval: 180000},
    csvReporter: {enabled: false, interval: 180000, fileName: ./metrics/gremlin-server-metrics.csv},
    jmxReporter: {enabled: false},
    slf4jReporter: {enabled: false, interval: 180000},
    gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
    graphiteReporter: {enabled: false, interval: 180000}
  }
  maxInitialLineLength: 4096
  maxHeaderSize: 8192
  maxChunkSize: 8192
  maxContentLength: 65536
  maxAccumulationBufferComponents: 1024
  resultIterationBatchSize: 64
  writeBufferLowWaterMark: 32768
  writeBufferHighWaterMark: 65536
  ssl: {
    enabled: false
  }

The file contains many options. For now, focus on two of them: graphs and channelizer.

  • graphs: This option specifies the graphs that need to be opened when the GremlinServer starts. It is a map structure where the key is the name of the graph and the value is the configuration file path for that graph.
  • channelizer: The GremlinServer supports two communication modes with clients: WebSocket and HTTP (default). If WebSocket is chosen, users can quickly experience the features of HugeGraph using Gremlin-Console, but it does not support importing large-scale data. It is recommended to use HTTP for communication, as all peripheral components of HugeGraph are implemented based on HTTP.

By default, the GremlinServer serves at localhost:8182. If you need to modify it, configure the host and port settings.

  • host: The hostname or IP address of the machine where the GremlinServer is deployed. Currently, HugeGraphServer does not support distributed deployment, and GremlinServer is not directly exposed to users.
  • port: The port number of the machine where the GremlinServer is deployed.

Additionally, you need to add the corresponding configuration gremlinserver.url=http://host:port in rest-server.properties.
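As a minimal sketch of keeping the two files in step, the helper below derives the gremlinserver.url value implied by the text of gremlin-server.yaml. The function name and the line-based parsing are assumptions made for illustration; it only reads top-level, uncommented host: and port: lines and falls back to the defaults shown in the listing above.

```python
import re


def gremlin_url_from_yaml(yaml_text, default_host="127.0.0.1", default_port=8182):
    """Build the URL that rest-server.properties should use for gremlinserver.url.

    Only top-level, uncommented `host:` and `port:` lines are considered;
    commented lines (starting with '#') and indented lines are ignored.
    """
    host, port = default_host, default_port
    for line in yaml_text.splitlines():
        match = re.match(r"^(host|port):\s*(\S+)\s*$", line)
        if not match:
            continue
        if match.group(1) == "host":
            host = match.group(2)
        else:
            port = int(match.group(2))
    return f"http://{host}:{port}"
```

The returned string is the value to place after gremlinserver.url= in rest-server.properties.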

3. rest-server.properties

The default content of the rest-server.properties file is as follows:

  # bind url
  restserver.url=http://127.0.0.1:8080
  # gremlin server url, need to be consistent with host and port in gremlin-server.yaml
  #gremlinserver.url=http://127.0.0.1:8182
  # graphs list with pair NAME:CONF_PATH
  graphs=[hugegraph:conf/hugegraph.properties]
  # authentication
  #auth.authenticator=
  #auth.admin_token=
  #auth.user_tokens=[]
  server.id=server-1
  server.role=master

  • restserver.url: The URL at which the RestServer provides its services. Modify it according to the actual environment.
  • graphs: The RestServer also needs to open graphs when it starts. This option is a map structure where the key is the name of the graph and the value is the configuration file path for that graph.

Note: Both gremlin-server.yaml and rest-server.properties contain the graphs configuration option, and the init-store command initializes based on the graphs specified in the graphs section of gremlin-server.yaml.
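To make the NAME:CONF_PATH form of the graphs option concrete, here is a small sketch that reads properties-style text and turns a value such as [hugegraph:conf/hugegraph.properties] into a name-to-path mapping. Both function names are hypothetical, and the parsing is a simplification that ignores escapes and line continuations.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def parse_graphs_option(value):
    """Turn '[name:path, name2:path2]' into a {name: path} dict."""
    inner = value.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    graphs = {}
    for item in inner.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first ':' only, so the path may itself contain ':'.
        name, _, path = item.partition(":")
        graphs[name.strip()] = path.strip()
    return graphs
```

For the default file shown above, parse_graphs_option(parse_properties(text)["graphs"]) yields a single entry that maps hugegraph to conf/hugegraph.properties.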

The gremlinserver.url configuration option is the URL at which the GremlinServer provides services to the RestServer. By default, it is set to http://localhost:8182. If you need to modify it, it should match the host and port settings in gremlin-server.yaml.

4. hugegraph.properties

hugegraph.properties is a per-graph configuration file: if the system has multiple graphs, there is one such file for each graph. It configures parameters related to graph storage and querying. The default content of the file is as follows:

  # gremlin entrance to create graph
  gremlin.graph=org.apache.hugegraph.HugeFactory
  # cache config
  #schema.cache_capacity=100000
  # vertex-cache default is 1000w, 10min expired
  #vertex.cache_capacity=10000000
  #vertex.cache_expire=600
  # edge-cache default is 100w, 10min expired
  #edge.cache_capacity=1000000
  #edge.cache_expire=600
  # schema illegal name template
  #schema.illegal_name_regex=\s+|~.*
  #vertex.default_label=vertex
  backend=rocksdb
  serializer=binary
  store=hugegraph
  raft.mode=false
  raft.safe_read=false
  raft.use_snapshot=false
  raft.endpoint=127.0.0.1:8281
  raft.group_peers=127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283
  raft.path=./raft-log
  raft.use_replicator_pipeline=true
  raft.election_timeout=10000
  raft.snapshot_interval=3600
  raft.backend_threads=48
  raft.read_index_threads=8
  raft.queue_size=16384
  raft.queue_publish_timeout=60
  raft.apply_batch=1
  raft.rpc_threads=80
  raft.rpc_connect_timeout=5000
  raft.rpc_timeout=60000
  # if use 'ikanalyzer', need download jar from 'https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
  search.text_analyzer=jieba
  search.text_analyzer_mode=INDEX
  # rocksdb backend config
  #rocksdb.data_path=/path/to/disk
  #rocksdb.wal_path=/path/to/disk
  # cassandra backend config
  cassandra.host=localhost
  cassandra.port=9042
  cassandra.username=
  cassandra.password=
  #cassandra.connect_timeout=5
  #cassandra.read_timeout=20
  #cassandra.keyspace.strategy=SimpleStrategy
  #cassandra.keyspace.replication=3
  # hbase backend config
  #hbase.hosts=localhost
  #hbase.port=2181
  #hbase.znode_parent=/hbase
  #hbase.threads_max=64
  # mysql backend config
  #jdbc.driver=com.mysql.jdbc.Driver
  #jdbc.url=jdbc:mysql://127.0.0.1:3306
  #jdbc.username=root
  #jdbc.password=
  #jdbc.reconnect_max_times=3
  #jdbc.reconnect_interval=3
  #jdbc.ssl_mode=false
  # postgresql & cockroachdb backend config
  #jdbc.driver=org.postgresql.Driver
  #jdbc.url=jdbc:postgresql://localhost:5432/
  #jdbc.username=postgres
  #jdbc.password=
  # palo backend config
  #palo.host=127.0.0.1
  #palo.poll_interval=10
  #palo.temp_dir=./palo-data
  #palo.file_limit_size=32

Pay attention to the following items (some of them are commented out in the default file):

  • gremlin.graph: The entry point for GremlinServer startup. Users should not modify this item.
  • backend: The backend storage used, with options including memory, cassandra, scylladb, mysql, hbase, postgresql, and rocksdb.
  • serializer: Mainly for internal use, used to serialize schema, vertices, and edges to the backend. The corresponding options are text, cassandra, scylladb, and binary (Note: The rocksdb backend should have a value of binary, while for other backends, the values of backend and serializer should remain consistent. For example, for the hbase backend, the value should be hbase).
  • store: The name of the database used for storing the graph in the backend. In Cassandra and ScyllaDB, it corresponds to the keyspace name. The value of this item is unrelated to the graph name in GremlinServer and RestServer, but for clarity, it is recommended to use the same name.
  • cassandra.host: This item is only meaningful when the backend is set to cassandra or scylladb. It specifies the seeds of the Cassandra/ScyllaDB cluster.
  • cassandra.port: This item is only meaningful when the backend is set to cassandra or scylladb. It specifies the native port of the Cassandra/ScyllaDB cluster.
  • rocksdb.data_path: This item is only meaningful when the backend is set to rocksdb. It specifies the data directory for RocksDB.
  • rocksdb.wal_path: This item is only meaningful when the backend is set to rocksdb. It specifies the log directory for RocksDB.
  • admin.token: A token used to retrieve server configuration information. For example: http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55
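The backend/serializer pairing described in the list above can be checked mechanically. The sketch below encodes only what that note states: rocksdb pairs with binary, and other backends use a serializer with the same name as the backend. Both function names are hypothetical, and the memory backend is skipped as an assumption because the note does not spell out its serializer.

```python
def expected_serializer(backend):
    """Serializer implied by the note above, or None when it gives no rule."""
    if backend == "rocksdb":
        return "binary"
    if backend == "memory":
        return None  # assumption: the note does not cover the memory backend
    return backend


def check_backend_config(props):
    """Return a list of problems found in a parsed hugegraph.properties dict."""
    problems = []
    for key in ("backend", "serializer", "store"):
        if not props.get(key):
            problems.append(f"missing option '{key}'")
    backend, serializer = props.get("backend"), props.get("serializer")
    expected = expected_serializer(backend) if backend else None
    if expected and serializer and serializer != expected:
        problems.append(
            f"backend '{backend}' expects serializer '{expected}', got '{serializer}'"
        )
    return problems
```

An empty list means the three options are present and the pairing is consistent with the note; it says nothing about whether the backend itself is reachable.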

5. Multi-Graph Configuration

The system can host multiple graphs, and each graph can use a different backend. For example, with two graphs named hugegraph_rocksdb and hugegraph_mysql, hugegraph_rocksdb uses RocksDB as its backend and hugegraph_mysql uses MySQL.

The configuration method is simple:

[Optional]: Modify rest-server.properties

You can change the directory that holds the graph configuration files through the graphs option of rest-server.properties. The default is graphs=./conf/graphs; to use another directory, set the option accordingly, for example graphs=/etc/hugegraph/graphs. The default setting looks like this:

  graphs=./conf/graphs
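Before running init-store.sh it can help to see which configuration files a graphs directory actually contains. The following is a rough sketch, not HugeGraph's own scanning logic: scan_graph_configs is a hypothetical helper that lists every *.properties file in a directory together with the store and backend values it declares.

```python
from pathlib import Path


def scan_graph_configs(graphs_dir):
    """Map each *.properties file name in graphs_dir to its (store, backend)."""
    found = {}
    for path in sorted(Path(graphs_dir).glob("*.properties")):
        props = {}
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            # Keep only uncommented key=value lines.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                props[key.strip()] = value.strip()
        found[path.name] = (props.get("store"), props.get("backend"))
    return found
```

Calling scan_graph_configs("./conf/graphs") returns a dictionary keyed by file name, which makes a missing store or backend entry easy to spot.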

In the conf/graphs directory, modify hugegraph_mysql_backend.properties and hugegraph_rocksdb_backend.properties, using hugegraph.properties as the template.

The modified part of hugegraph_mysql_backend.properties is as follows:

  backend=mysql
  serializer=mysql
  store=hugegraph_mysql
  # mysql backend config
  jdbc.driver=com.mysql.cj.jdbc.Driver
  jdbc.url=jdbc:mysql://127.0.0.1:3306
  jdbc.username=root
  jdbc.password=123456
  jdbc.reconnect_max_times=3
  jdbc.reconnect_interval=3
  jdbc.ssl_mode=false

The modified part of hugegraph_rocksdb_backend.properties is as follows:

  backend=rocksdb
  serializer=binary
  store=hugegraph_rocksdb
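The per-graph files above differ from hugegraph.properties in only a few keys, so they can be derived from the template rather than edited by hand. The helper below is a sketch under that assumption; apply_overrides is a hypothetical name, and it treats a commented-out line such as #jdbc.url=... as the slot for that key.

```python
def apply_overrides(template_text, overrides):
    """Return template_text with the given keys set to new values.

    The first line that defines a key (commented out or not) is replaced
    in place; keys that never appear in the template are appended.
    """
    remaining = dict(overrides)
    out = []
    for raw in template_text.splitlines():
        candidate = raw.lstrip("#").strip()
        key = candidate.partition("=")[0].strip()
        if "=" in candidate and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(raw)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, passing the template text with {"backend": "mysql", "serializer": "mysql", "store": "hugegraph_mysql"} produces the MySQL variant of the three core options while leaving every other line untouched.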

Stop the server, execute init-store.sh (to create a new database for the new graph), and restart the server.

  $ ./bin/stop-hugegraph.sh

  $ ./bin/init-store.sh
  Initializing HugeGraph Store...
  2023-06-11 14:16:14 [main] [INFO] o.a.h.u.ConfigUtil - Scanning option 'graphs' directory './conf/graphs'
  2023-06-11 14:16:14 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_rocksdb_backend.properties
  ...
  2023-06-11 14:16:15 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_rocksdb' has been initialized
  2023-06-11 14:16:15 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_mysql_backend.properties
  ...
  2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_mysql' has been initialized
  2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Close graph standardhugegraph[hugegraph_rocksdb]
  ...
  2023-06-11 14:16:16 [main] [INFO] o.a.h.HugeFactory - HugeFactory shutdown
  2023-06-11 14:16:16 [hugegraph-shutdown] [INFO] o.a.h.HugeFactory - HugeGraph is shutting down
  Initialization finished.

  $ ./bin/start-hugegraph.sh
  Starting HugeGraphServer...
  Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)...OK
  Started [pid 21614]

Check out created graphs:

  curl http://127.0.0.1:8080/graphs/
  {"graphs":["hugegraph_rocksdb","hugegraph_mysql"]}

Get details of each graph:

  curl http://127.0.0.1:8080/graphs/hugegraph_mysql_backend
  {"name":"hugegraph_mysql","backend":"mysql"}

  curl http://127.0.0.1:8080/graphs/hugegraph_rocksdb_backend
  {"name":"hugegraph_rocksdb","backend":"rocksdb"}
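The graph listing request can also be scripted. The sketch below assumes the response has the {"graphs": [...]} shape shown in the curl output above; list_graphs and its fetch parameter are hypothetical names, with fetch acting as an injectable transport so the parsing can be exercised without a running server.

```python
import json
from urllib.request import urlopen


def list_graphs(base_url="http://127.0.0.1:8080", fetch=None):
    """Return the graph names reported by GET <base_url>/graphs."""
    if fetch is None:
        # Default transport: a plain HTTP GET with a short timeout.
        def fetch(url):
            with urlopen(url, timeout=5) as response:
                return response.read().decode("utf-8")

    body = fetch(base_url.rstrip("/") + "/graphs")
    return json.loads(body)["graphs"]
```

With the server from this section running, list_graphs() should return the same names that the curl call prints; in a test, pass a fetch function that returns a canned JSON string.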

Last modified November 1, 2023: doc: optimize description about preload, init-store and others (#293) (62101543)