Neo4j Import
The Neo4j Import transform runs an import command using the provided CSV files.
Check the neo4j-admin-import docs for full details.
Option | Default | Description |
---|---|---|
Transform name | | the name to use for this transform in the pipeline |
Filename field | | the field to get the file name to import from |
File type field | | the field to get the file type to import from |
Database filename | neo4j | the Neo4j database to import to |
neo4j-admin command path | neo4j-admin | the (full) path to the |
Base folder (below import/ folder) | | the folder to read the import files from |
Verbose output | | Enable verbose output. |
High IO | true | Ignore environment-based heuristics, and specify whether the target storage subsystem can support parallel IO with high throughput. Typically this is true for SSDs, large RAID arrays and network-attached storage. |
Cache on heap? | false | Determines whether or not to allow allocating memory for the cache on heap. If false, then caches will still be allocated off-heap, but the additional free memory inside the JVM will not be allocated for the caches. Use this to have better control over the heap memory. |
Ignore Empty Strings | false | Determines whether or not empty string fields, such as "", from the input source are ignored (treated as null). |
Ignore extra columns? | false | Determines whether or not unspecified columns are ignored during the import. |
Legacy style quoting? | false | Determines whether or not a backslash-escaped quote, e.g. \", is interpreted as an inner quote. |
Fields can have multi-line data? | false | Determines whether or not fields from input source can span multiple lines, i.e. contain newline characters. Setting |
Normalize types? | false | Determines whether or not to normalize property types to Cypher types, e.g. int becomes long and float becomes double. |
Skip logging bad entries during import? | | Determines whether or not to skip logging bad entries detected during import. |
Skip bad relationships? | false | Determines whether or not to skip importing relationships that refer to missing node IDs, i.e. either start or end node ID/group referring to node that was not specified by the node input data. Skipped relationships will be logged, containing at most the number of entities specified by |
Skip duplicate nodes? | false | Determines whether or not to skip importing nodes that have the same ID/group. In the event of multiple nodes within the same group having the same ID, the first encountered will be imported, whereas consecutive such nodes will be skipped. Skipped nodes will be logged, containing at most the number of entities specified by |
Trim strings? | false | Determines whether or not strings should be trimmed of whitespace. |
Bad tolerance | 1000 | Number of bad entries before the import is considered failed. This tolerance threshold is about relationships referring to missing nodes. Format errors in input data are still treated as errors. |
Max memory | false | Maximum memory that neo4j-admin can use for various data structures and caching to improve performance. Values can be plain numbers such as 10000000, or byte unit strings such as 20G for 20 gigabytes. It can also be specified as a percentage of the available memory, for example 70%. |
Read buffer size | 4M | Size of each buffer for reading input data. It has to at least be large enough to hold the biggest single value in the input data. Value can be a plain number or byte units string, e.g. 128k, 1m. |
Processors | 90% | Maximum number of processors used by the importer. Defaults to the number of available processors reported by the JVM. The importer needs a certain minimum number of threads regardless of this setting, so the value has no lower bound. For optimal performance, this value shouldn't be greater than the number of available processors. |
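
To make the relationship between these options and the underlying command more concrete, here is a minimal Python sketch of how such settings might be assembled into a `neo4j-admin import` argument list. It assumes the Neo4j 4.x flag names (`--high-io`, `--bad-tolerance`, `--read-buffer-size`, `--nodes`, `--relationships` and so on). The helper `build_import_command` and the file names are hypothetical and only illustrate the idea. This is not the code the transform itself runs.

```python
def build_import_command(node_files, relationship_files, database="neo4j",
                         admin_path="neo4j-admin", options=None):
    """Build an argument list for a Neo4j 4.x `neo4j-admin import` call.

    Hypothetical helper: `options` maps flag names (without the leading
    dashes) to values, mirroring rows of the options table above.
    """
    args = [admin_path, "import", f"--database={database}"]
    for flag, value in (options or {}).items():
        # Booleans are rendered as lowercase true/false, other values as-is.
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        args.append(f"--{flag}={rendered}")
    # One --nodes / --relationships argument per CSV file.
    args += [f"--nodes={path}" for path in node_files]
    args += [f"--relationships={path}" for path in relationship_files]
    return args


if __name__ == "__main__":
    command = build_import_command(
        ["import/nodes.csv"],          # assumed example file names
        ["import/relationships.csv"],
        options={"high-io": True, "bad-tolerance": 1000, "read-buffer-size": "4M"},
    )
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` as-is. Building a list rather than a single string avoids shell-quoting problems with file paths that contain spaces.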