String Processing

RegexMatch

Usage

This function extracts the content that matches a given regular expression from the text.

Name: REGEXMATCH

Input Series: Only a single input series is supported. The data type is TEXT.

Parameter:

  • regex: The regular expression to match in the text. Any syntax supported by Java is accepted; for example, \d+\.\d+\.\d+\.\d+ is expected to match any IPv4 address.
  • group: The index of the wanted group in the matched result. As in java.util.regex, group 0 is the whole pattern, and the following groups are numbered by the order in which their left parentheses appear. For example, the groups in A(B(CD)) are: 0-A(B(CD)), 1-B(CD), 2-CD.

Output Series: Output a single series. The type is TEXT.

Note: Points whose value is null or does not match the given pattern produce no result.
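
The group numbering and the no-result behaviour described above can be reproduced directly with java.util.regex. The following is a minimal sketch, not the function's actual implementation: the class RegexMatchSketch and the method regexMatch are hypothetical names, and searching anywhere in the text with find() is an assumption that is consistent with the example in this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics the described behaviour; not IoTDB code.
final class RegexMatchSketch {

    /** Returns the requested group of the first match, or null when no result is produced. */
    static String regexMatch(String text, String regex, int group) {
        if (text == null) {
            return null; // null points yield no result
        }
        Matcher m = Pattern.compile(regex).matcher(text);
        // Assumption: the pattern may match anywhere in the text (find, not matches).
        if (m.find() && group <= m.groupCount()) {
            return m.group(group);
        }
        return null; // no match, or the requested group does not exist
    }

    public static void main(String[] args) {
        String ipv4 = "\\d+\\.\\d+\\.\\d+\\.\\d+";
        System.out.println(regexMatch("[192.168.0.1] [SUCCESS]", ipv4, 0)); // 192.168.0.1
        // Groups are numbered by the order of their opening parentheses.
        System.out.println(regexMatch("ABCD", "A(B(CD))", 0)); // ABCD
        System.out.println(regexMatch("ABCD", "A(B(CD))", 1)); // BCD
        System.out.println(regexMatch("ABCD", "A(B(CD))", 2)); // CD
    }
}
```

Returning null here stands in for "no output row"; a real query simply omits the timestamp.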

Examples

Input series:

  +-----------------------------+-------------------------------+
  |                         Time|                root.test.d1.s1|
  +-----------------------------+-------------------------------+
  |2021-01-01T00:00:01.000+08:00|        [192.168.0.1] [SUCCESS]|
  |2021-01-01T00:00:02.000+08:00|       [192.168.0.24] [SUCCESS]|
  |2021-01-01T00:00:03.000+08:00|           [192.168.0.2] [FAIL]|
  |2021-01-01T00:00:04.000+08:00|        [192.168.0.5] [SUCCESS]|
  |2021-01-01T00:00:05.000+08:00|      [192.168.0.124] [SUCCESS]|
  +-----------------------------+-------------------------------+

SQL for query:

  select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1

Output series:

  +-----------------------------+----------------------------------------------------------------------+
  |                         Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")|
  +-----------------------------+----------------------------------------------------------------------+
  |2021-01-01T00:00:01.000+08:00|                                                           192.168.0.1|
  |2021-01-01T00:00:02.000+08:00|                                                          192.168.0.24|
  |2021-01-01T00:00:03.000+08:00|                                                           192.168.0.2|
  |2021-01-01T00:00:04.000+08:00|                                                           192.168.0.5|
  |2021-01-01T00:00:05.000+08:00|                                                         192.168.0.124|
  +-----------------------------+----------------------------------------------------------------------+

RegexReplace

Usage

This function replaces the matches of a given regular expression with a given string.

Name: REGEXREPLACE

Input Series: Only a single input series is supported. The data type is TEXT.

Parameter:

  • regex: The regular expression whose matches are replaced. Any syntax supported by Java is accepted.
  • replace: The replacement string. Java back-reference notation is supported; for example, ‘$1’ refers to group 1 of the regex and is filled with the corresponding matched text.
  • limit: The number of matches to replace. It must be an integer no less than -1. The default is -1, which replaces all matches.
  • offset: The number of matches to skip; the first offset matches are not replaced. The default is 0.
  • reverse: Whether to count the matches in reverse order, starting from the last match. The default is ‘false’.

Output Series: Output a single series. The type is TEXT.
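
How limit, offset and reverse interact is easiest to see in code. The sketch below is one possible reading of the parameter descriptions above, written with java.util.regex; it is not the function's actual implementation. The class RegexReplaceSketch and the method regexReplace are hypothetical names, and the sketch assumes that offset is not negative and that reverse only changes which end of the match list the counting starts from.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating limit, offset and reverse; not IoTDB code.
final class RegexReplaceSketch {

    static String regexReplace(String text, String regex, String replace,
                               int limit, int offset, boolean reverse) {
        Pattern pattern = Pattern.compile(regex);

        // First pass: count the matches so reverse counting can be mapped to forward indices.
        int total = 0;
        Matcher counter = pattern.matcher(text);
        while (counter.find()) {
            total++;
        }

        // Matches whose forward index lies in [from, to) are replaced (assumes offset >= 0).
        int skip = Math.min(offset, total);
        int count = limit < 0 ? total - skip : Math.min(limit, total - skip);
        int from = reverse ? total - skip - count : skip;
        int to = from + count;

        // Second pass: rebuild the text, substituting only the selected matches.
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        for (int i = 0; m.find(); i++) {
            boolean selected = i >= from && i < to;
            // Unselected matches are copied back literally; quoteReplacement protects '$' and '\'.
            m.appendReplacement(out, selected ? replace : Matcher.quoteReplacement(m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // "$1" in the replacement refers to group 1 of the regex.
        System.out.println(regexReplace("[192.168.0.1] [SUCCESS]",
                "192\\.168\\.0\\.(\\d+)", "cluster-$1", 1, 0, false)); // [cluster-1] [SUCCESS]
        // Counting from the end: skip one match, then replace one.
        System.out.println(regexReplace("a1 a2 a3", "a(\\d)", "b$1", 1, 1, true)); // a1 b2 a3
    }
}
```

The two-pass structure is chosen only for clarity: counting first lets the reverse case reuse the same left-to-right replacement loop.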

Examples

Input series:

  +-----------------------------+-------------------------------+
  |                         Time|                root.test.d1.s1|
  +-----------------------------+-------------------------------+
  |2021-01-01T00:00:01.000+08:00|        [192.168.0.1] [SUCCESS]|
  |2021-01-01T00:00:02.000+08:00|       [192.168.0.24] [SUCCESS]|
  |2021-01-01T00:00:03.000+08:00|           [192.168.0.2] [FAIL]|
  |2021-01-01T00:00:04.000+08:00|        [192.168.0.5] [SUCCESS]|
  |2021-01-01T00:00:05.000+08:00|      [192.168.0.124] [SUCCESS]|
  +-----------------------------+-------------------------------+

SQL for query:

  select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1

Output series:

  +-----------------------------+-----------------------------------------------------------+
  |                         Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",|
  |                             |                       "replace"="cluster-$1", "limit"="1")|
  +-----------------------------+-----------------------------------------------------------+
  |2021-01-01T00:00:01.000+08:00|                                      [cluster-1] [SUCCESS]|
  |2021-01-01T00:00:02.000+08:00|                                     [cluster-24] [SUCCESS]|
  |2021-01-01T00:00:03.000+08:00|                                         [cluster-2] [FAIL]|
  |2021-01-01T00:00:04.000+08:00|                                      [cluster-5] [SUCCESS]|
  |2021-01-01T00:00:05.000+08:00|                                    [cluster-124] [SUCCESS]|
  +-----------------------------+-----------------------------------------------------------+

RegexSplit

Usage

This function splits the text with a given regular expression and returns either the number of resulting elements or a specific element.

Name: REGEXSPLIT

Input Series: Only a single input series is supported. The data type is TEXT.

Parameter:

  • regex: The regular expression used to split the text. Any syntax supported by Java is accepted; for example, ['"] is expected to match ' and ".
  • index: The index of the wanted element in the split result. It must be an integer no less than -1. The default is -1, which returns the length of the result array; any non-negative integer returns the element at that index, counting from 0.

Output Series: Output a single series. The type is INT32 when index is -1 and TEXT when it is a valid index.

Note: When index is out of the range of the result array, for example when 0,1,2 is split with , and index is set to 3, no result is returned for that record.
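
The two modes of index, and the out-of-range case from the note, can be sketched with java.util.regex.Pattern.split. This is an illustration under assumptions rather than the function's actual implementation: RegexSplitSketch, splitCount and splitElement are hypothetical names, and Java's split drops trailing empty strings, which this document does not confirm for REGEXSPLIT.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the two index modes; not IoTDB code.
final class RegexSplitSketch {

    /** Mirrors index = -1: the length of the split result. */
    static int splitCount(String text, String regex) {
        return Pattern.compile(regex).split(text).length;
    }

    /** Mirrors index >= 0: the element at that index, or null when out of range. */
    static String splitElement(String text, String regex, int index) {
        String[] parts = Pattern.compile(regex).split(text);
        return (index >= 0 && index < parts.length) ? parts[index] : null;
    }

    public static void main(String[] args) {
        System.out.println(splitCount("A,B,A+,B-", ","));      // 4
        System.out.println(splitElement("A,B,A+,B-", ",", 3)); // B-
        // Only three elements exist, so index 3 is out of range and nothing is returned.
        System.out.println(splitElement("B+,B,B", ",", 3));    // null
        // A character class splits on either quote character.
        System.out.println(splitCount("a'b\"c", "['\"]"));     // 3
    }
}
```

Two methods are used instead of one because the output type changes with index (INT32 versus TEXT), which a single Java return type would hide.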

Examples

Input series:

  +-----------------------------+---------------+
  |                         Time|root.test.d1.s1|
  +-----------------------------+---------------+
  |2021-01-01T00:00:01.000+08:00|      A,B,A+,B-|
  |2021-01-01T00:00:02.000+08:00|      A,A+,A,B+|
  |2021-01-01T00:00:03.000+08:00|         B+,B,B|
  |2021-01-01T00:00:04.000+08:00|      A+,A,A+,A|
  |2021-01-01T00:00:05.000+08:00|       A,B-,B,B|
  +-----------------------------+---------------+

SQL for query:

  select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1

Output series:

  +-----------------------------+------------------------------------------------------+
  |                         Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
  +-----------------------------+------------------------------------------------------+
  |2021-01-01T00:00:01.000+08:00|                                                     4|
  |2021-01-01T00:00:02.000+08:00|                                                     4|
  |2021-01-01T00:00:03.000+08:00|                                                     3|
  |2021-01-01T00:00:04.000+08:00|                                                     4|
  |2021-01-01T00:00:05.000+08:00|                                                     4|
  +-----------------------------+------------------------------------------------------+

Another SQL for query:

  select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1

Output series:

  +-----------------------------+-----------------------------------------------------+
  |                         Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")|
  +-----------------------------+-----------------------------------------------------+
  |2021-01-01T00:00:01.000+08:00|                                                   B-|
  |2021-01-01T00:00:02.000+08:00|                                                   B+|
  |2021-01-01T00:00:04.000+08:00|                                                    A|
  |2021-01-01T00:00:05.000+08:00|                                                    B|
  +-----------------------------+-----------------------------------------------------+

StrReplace

Usage

This function replaces occurrences of a given substring with a given string.

Name: STRREPLACE

Input Series: Only a single input series is supported. The data type is TEXT.

Parameter:

  • target: The substring to be replaced.
  • replace: The replacement string.
  • limit: The number of matches to replace. It must be an integer no less than -1. The default is -1, which replaces all matches.
  • offset: The number of matches to skip; the first offset matches are not replaced. The default is 0.
  • reverse: Whether to count the matches in reverse order, starting from the last match. The default is ‘false’.

Output Series: Output a single series. The type is TEXT.
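
For a literal target, the same limit, offset and reverse selection can be sketched without any regular expression. The code below is an interpretation that reproduces the two examples in this section; it is not the function's actual implementation. StrReplaceSketch and strReplace are hypothetical names, and the sketch assumes that offset is not negative and that occurrences are located left to right without overlap, which may differ from the real behaviour for self-overlapping targets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for literal replacement with limit, offset and reverse; not IoTDB code.
final class StrReplaceSketch {

    static String strReplace(String text, String target, String replace,
                             int limit, int offset, boolean reverse) {
        if (target.isEmpty()) {
            return text; // an empty target has no meaningful occurrences
        }
        // Start positions of non-overlapping occurrences, scanned left to right.
        List<Integer> starts = new ArrayList<>();
        for (int i = text.indexOf(target); i >= 0; i = text.indexOf(target, i + target.length())) {
            starts.add(i);
        }

        // Select `count` occurrences after skipping `skip`, counted from either end.
        int total = starts.size();
        int skip = Math.min(offset, total);
        int count = limit < 0 ? total - skip : Math.min(limit, total - skip);
        int from = reverse ? total - skip - count : skip;

        // Copy the text, swapping in the replacement at each selected occurrence.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int k = from; k < from + count; k++) {
            int start = starts.get(k);
            out.append(text, pos, start).append(replace);
            pos = start + target.length();
        }
        return out.append(text, pos, text.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(strReplace("A,B,A+,B-", ",", "/", 2, 0, false)); // A/B/A+,B-
        // Counted from the end: the last comma is skipped, the one before it is replaced.
        System.out.println(strReplace("A,B,A+,B-", ",", "/", 1, 1, true));  // A,B/A+,B-
        System.out.println(strReplace("B+,B,B", ",", "/", 1, 1, true));     // B+/B,B
    }
}
```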

Examples

Input series:

  +-----------------------------+---------------+
  |                         Time|root.test.d1.s1|
  +-----------------------------+---------------+
  |2021-01-01T00:00:01.000+08:00|      A,B,A+,B-|
  |2021-01-01T00:00:02.000+08:00|      A,A+,A,B+|
  |2021-01-01T00:00:03.000+08:00|         B+,B,B|
  |2021-01-01T00:00:04.000+08:00|      A+,A,A+,A|
  |2021-01-01T00:00:05.000+08:00|       A,B-,B,B|
  +-----------------------------+---------------+

SQL for query:

  select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1

Output series:

  +-----------------------------+-----------------------------------------+
  |                         Time|strreplace(root.test.d1.s1, "target"=",",|
  |                             |              "replace"="/", "limit"="2")|
  +-----------------------------+-----------------------------------------+
  |2021-01-01T00:00:01.000+08:00|                                A/B/A+,B-|
  |2021-01-01T00:00:02.000+08:00|                                A/A+/A,B+|
  |2021-01-01T00:00:03.000+08:00|                                   B+/B/B|
  |2021-01-01T00:00:04.000+08:00|                                A+/A/A+,A|
  |2021-01-01T00:00:05.000+08:00|                                 A/B-/B,B|
  +-----------------------------+-----------------------------------------+

Another SQL for query:

  select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1

Output series:

  +-----------------------------+-----------------------------------------------------+
  |                         Time|strreplace(root.test.d1.s1, "target"=",", "replace"= |
  |                             |    "/", "limit"="1", "offset"="1", "reverse"="true")|
  +-----------------------------+-----------------------------------------------------+
  |2021-01-01T00:00:01.000+08:00|                                            A,B/A+,B-|
  |2021-01-01T00:00:02.000+08:00|                                            A,A+/A,B+|
  |2021-01-01T00:00:03.000+08:00|                                               B+/B,B|
  |2021-01-01T00:00:04.000+08:00|                                            A+,A/A+,A|
  |2021-01-01T00:00:05.000+08:00|                                             A,B-/B,B|
  +-----------------------------+-----------------------------------------------------+