Data Profiling

ACF

Usage

This function is used to calculate the autocorrelation function of the input time series, which equals the cross-correlation of the series with itself. For more information, please refer to the XCorr function.

Name: ACF

Input Series: Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Output Series: Output a single series. The type is DOUBLE. There are $2N-1$ data points in the series, and the values are interpreted in detail in the XCorr function.

Note:

  • null and NaN values in the input series will be treated as 0.
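
As a concrete illustration of this definition, the following is a minimal Python sketch (not the IoTDB implementation) that treats null/NaN as 0 and computes the full cross-correlation of the series with itself, divided by N:

  import numpy as np

  # Illustrative sketch: null/NaN -> 0, then the full self cross-correlation
  # divided by N, which yields 2N-1 output points.
  def acf(values):
      x = np.nan_to_num(np.array(values, dtype=float))
      n = len(x)
      return np.correlate(x, x, mode="full") / n

  print(acf([1, float("nan"), 3, float("nan"), 5]))
  # [1.  0.  3.6 0.  7.  0.  3.6 0.  1. ]  -- matches the example below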

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:01.000+08:00| 1|
  5. |2020-01-01T00:00:02.000+08:00| null|
  6. |2020-01-01T00:00:03.000+08:00| 3|
  7. |2020-01-01T00:00:04.000+08:00| NaN|
  8. |2020-01-01T00:00:05.000+08:00| 5|
  9. +-----------------------------+---------------+

SQL for query:

  1. select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05

Output series:

  1. +-----------------------------+--------------------+
  2. | Time|acf(root.test.d1.s1)|
  3. +-----------------------------+--------------------+
  4. |1970-01-01T08:00:00.001+08:00| 1.0|
  5. |1970-01-01T08:00:00.002+08:00| 0.0|
  6. |1970-01-01T08:00:00.003+08:00| 3.6|
  7. |1970-01-01T08:00:00.004+08:00| 0.0|
  8. |1970-01-01T08:00:00.005+08:00| 7.0|
  9. |1970-01-01T08:00:00.006+08:00| 0.0|
  10. |1970-01-01T08:00:00.007+08:00| 3.6|
  11. |1970-01-01T08:00:00.008+08:00| 0.0|
  12. |1970-01-01T08:00:00.009+08:00| 1.0|
  13. +-----------------------------+--------------------+

Distinct

Usage

This function returns all unique values in a time series.

Name: DISTINCT

Input Series: Only support a single input series. The type is arbitrary.

Output Series: Output a single series. The type is the same as the input.

Note:

  • The timestamp of the output series is meaningless. The output order is arbitrary.
  • Missing points and null points in the input series will be ignored, but NaN will not.
  • Values are case sensitive.

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d2.s2|
  3. +-----------------------------+---------------+
  4. |2020-01-01T08:00:00.001+08:00| Hello|
  5. |2020-01-01T08:00:00.002+08:00| hello|
  6. |2020-01-01T08:00:00.003+08:00| Hello|
  7. |2020-01-01T08:00:00.004+08:00| World|
  8. |2020-01-01T08:00:00.005+08:00| World|
  9. +-----------------------------+---------------+

SQL for query:

  1. select distinct(s2) from root.test.d2

Output series:

  1. +-----------------------------+-------------------------+
  2. | Time|distinct(root.test.d2.s2)|
  3. +-----------------------------+-------------------------+
  4. |1970-01-01T08:00:00.001+08:00| Hello|
  5. |1970-01-01T08:00:00.002+08:00| hello|
  6. |1970-01-01T08:00:00.003+08:00| World|
  7. +-----------------------------+-------------------------+

Histogram

Usage

This function is used to calculate the distribution histogram of a single column of numerical data.

Name: HISTOGRAM

Input Series: Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • min: The lower limit of the requested data range. The default value is -Double.MAX_VALUE.
  • max: The upper limit of the requested data range. The default value is Double.MAX_VALUE. The value of min must be less than or equal to max.
  • count: The number of buckets of the histogram. The default value is 1. It must be a positive integer.

Output Series: The value of each bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min + (i-1) \cdot \frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$.

Note:

  • If the value is lower than min, it will be put into the 1st bucket. If the value is larger than max, it will be put into the last bucket.
  • Missing points, null points and NaN in the input series will be ignored.
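
The following minimal Python sketch (illustrative only, not the IoTDB implementation) applies the bucketing rule above, clamping out-of-range values to the first and last buckets:

  # Illustrative sketch of the bucketing rule: value v falls into the bucket
  # with 0-based index floor((v - min) / ((max - min) / count)), clamped.
  def histogram(values, lo=1.0, hi=20.0, count=10):
      width = (hi - lo) / count
      buckets = [0] * count
      for v in values:
          i = int((v - lo) // width)
          i = max(0, min(count - 1, i))   # out-of-range values go to the edge buckets
          buckets[i] += 1
      return buckets

  print(histogram([float(v) for v in range(1, 21)]))  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]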

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:00.000+08:00| 1.0|
  5. |2020-01-01T00:00:01.000+08:00| 2.0|
  6. |2020-01-01T00:00:02.000+08:00| 3.0|
  7. |2020-01-01T00:00:03.000+08:00| 4.0|
  8. |2020-01-01T00:00:04.000+08:00| 5.0|
  9. |2020-01-01T00:00:05.000+08:00| 6.0|
  10. |2020-01-01T00:00:06.000+08:00| 7.0|
  11. |2020-01-01T00:00:07.000+08:00| 8.0|
  12. |2020-01-01T00:00:08.000+08:00| 9.0|
  13. |2020-01-01T00:00:09.000+08:00| 10.0|
  14. |2020-01-01T00:00:10.000+08:00| 11.0|
  15. |2020-01-01T00:00:11.000+08:00| 12.0|
  16. |2020-01-01T00:00:12.000+08:00| 13.0|
  17. |2020-01-01T00:00:13.000+08:00| 14.0|
  18. |2020-01-01T00:00:14.000+08:00| 15.0|
  19. |2020-01-01T00:00:15.000+08:00| 16.0|
  20. |2020-01-01T00:00:16.000+08:00| 17.0|
  21. |2020-01-01T00:00:17.000+08:00| 18.0|
  22. |2020-01-01T00:00:18.000+08:00| 19.0|
  23. |2020-01-01T00:00:19.000+08:00| 20.0|
  24. +-----------------------------+---------------+

SQL for query:

  1. select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1

Output series:

  1. +-----------------------------+---------------------------------------------------------------+
  2. | Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")|
  3. +-----------------------------+---------------------------------------------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 2|
  5. |1970-01-01T08:00:00.001+08:00| 2|
  6. |1970-01-01T08:00:00.002+08:00| 2|
  7. |1970-01-01T08:00:00.003+08:00| 2|
  8. |1970-01-01T08:00:00.004+08:00| 2|
  9. |1970-01-01T08:00:00.005+08:00| 2|
  10. |1970-01-01T08:00:00.006+08:00| 2|
  11. |1970-01-01T08:00:00.007+08:00| 2|
  12. |1970-01-01T08:00:00.008+08:00| 2|
  13. |1970-01-01T08:00:00.009+08:00| 2|
  14. +-----------------------------+---------------------------------------------------------------+

Integral

Usage

This function is used to calculate the integration of time series, which equals to the area under the curve with time as X-axis and values as Y-axis.

Name: INTEGRAL

Input Series: Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • unit: The unit of time used when computing the integral. The value should be chosen from “1S”, “1s”, “1m”, “1H”, “1d”(case-sensitive), and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral.

Output Series: Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration.

Note:

  • The integral value equals the sum of the areas of the right-angled trapezoids formed by each pair of adjacent points and the time axis. Choosing a different unit only rescales the time axis, so results computed with different units differ by a constant factor.

  • NaN values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point.
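
The trapezoidal rule above can be sketched in a few lines of Python (illustrative only), assuming NaN points are skipped and the unit parameter simply rescales the time axis:

  # Illustrative sketch: sum of trapezoid areas between adjacent valid points,
  # divided by the chosen time unit (expressed here in seconds).
  def integral(times_s, values, unit_s=1.0):
      pts = [(t, v) for t, v in zip(times_s, values) if v == v]  # drop NaN
      area = 0.0
      for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
          area += (v0 + v1) * (t1 - t0) / 2.0
      return area / unit_s

  times = [1, 2, 3, 4, 5, 8, 9, 10]
  values = [1, 2, 5, 6, 7, 8, float("nan"), 10]
  print(integral(times, values))        # 57.5, as in the first example below
  print(integral(times, values, 60.0))  # ~0.958, as in the "1m" example below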

Examples

Default Parameters

With default parameters, this function will take one second as 1.0.

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:01.000+08:00| 1|
  5. |2020-01-01T00:00:02.000+08:00| 2|
  6. |2020-01-01T00:00:03.000+08:00| 5|
  7. |2020-01-01T00:00:04.000+08:00| 6|
  8. |2020-01-01T00:00:05.000+08:00| 7|
  9. |2020-01-01T00:00:08.000+08:00| 8|
  10. |2020-01-01T00:00:09.000+08:00| NaN|
  11. |2020-01-01T00:00:10.000+08:00| 10|
  12. +-----------------------------+---------------+

SQL for query:

  1. select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10

Output series:

  1. +-----------------------------+-------------------------+
  2. | Time|integral(root.test.d1.s1)|
  3. +-----------------------------+-------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 57.5|
  5. +-----------------------------+-------------------------+

Calculation expression:

$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$

Specific time unit

With time unit specified as “1m”, this function will take one minute as 1.0.

Input series is the same as above, the SQL for query is shown below:

  1. select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10

Output series:

  1. +-----------------------------+-------------------------+
  2. | Time|integral(root.test.d1.s1)|
  3. +-----------------------------+-------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 0.958|
  5. +-----------------------------+-------------------------+

Calculation expression:

$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$

IntegralAvg

Usage

This function is used to calculate the time-weighted average of a time series. The output equals the area under the curve divided by the time interval, using the same time unit. For more information on the area under the curve, please refer to the Integral function.

Name: INTEGRALAVG

Input Series: Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Output Series: Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average.

Note:

  • The time-weighted average equals the integral value under any unit divided by the time interval of the input series. The result is independent of the time unit used in the integral, and it is consistent with the timestamp precision of IoTDB by default.

  • NaN values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point.

  • If the input series is empty, the output value will be 0.0; if there is only one data point, the value will equal the input value.

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:01.000+08:00| 1|
  5. |2020-01-01T00:00:02.000+08:00| 2|
  6. |2020-01-01T00:00:03.000+08:00| 5|
  7. |2020-01-01T00:00:04.000+08:00| 6|
  8. |2020-01-01T00:00:05.000+08:00| 7|
  9. |2020-01-01T00:00:08.000+08:00| 8|
  10. |2020-01-01T00:00:09.000+08:00| NaN|
  11. |2020-01-01T00:00:10.000+08:00| 10|
  12. +-----------------------------+---------------+

SQL for query:

  1. select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10

Output series:

  1. +-----------------------------+----------------------------+
  2. | Time|integralavg(root.test.d1.s1)|
  3. +-----------------------------+----------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 5.75|
  5. +-----------------------------+----------------------------+

Calculation expression:

$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$

Mad

Usage

The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the absolute deviations of the elements from the elements' median.

Take the dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the absolute deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2.

Name: MAD

Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.

Parameter:

  • error: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking error=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, then $0.99a \le b \le 1.01a$. With error=0, the output is the exact MAD.

Output Series: Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD.

Note: Missing points, null points and NaN in the input series will be ignored.
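
For the exact case (error=0), the computation can be sketched in Python as follows (illustrative only; the approximate case uses a sketch-based algorithm that is not shown here):

  import numpy as np

  # Exact MAD: the median of the absolute deviations from the median.
  def mad(values):
      x = np.array([v for v in values if v == v], dtype=float)  # drop NaN
      return float(np.median(np.abs(x - np.median(x))))

  print(mad([1, 3, 3, 5, 5, 6, 7, 8, 9]))  # 2.0, as in the worked example above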

Examples

Exact Query

With the default error (error=0), the function queries the exact MAD.

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s0|
  3. +-----------------------------+------------+
  4. |2021-03-17T10:32:17.054+08:00| 0.5319929|
  5. |2021-03-17T10:32:18.054+08:00| 0.9304316|
  6. |2021-03-17T10:32:19.054+08:00| -1.4800133|
  7. |2021-03-17T10:32:20.054+08:00| 0.6114087|
  8. |2021-03-17T10:32:21.054+08:00| 2.5163336|
  9. |2021-03-17T10:32:22.054+08:00| -1.0845392|
  10. |2021-03-17T10:32:23.054+08:00| 1.0562582|
  11. |2021-03-17T10:32:24.054+08:00| 1.3867859|
  12. |2021-03-17T10:32:25.054+08:00| -0.45429882|
  13. |2021-03-17T10:32:26.054+08:00| 1.0353678|
  14. |2021-03-17T10:32:27.054+08:00| 0.7307929|
  15. |2021-03-17T10:32:28.054+08:00| 2.3167255|
  16. |2021-03-17T10:32:29.054+08:00| 2.342443|
  17. |2021-03-17T10:32:30.054+08:00| 1.5809103|
  18. |2021-03-17T10:32:31.054+08:00| 1.4829416|
  19. |2021-03-17T10:32:32.054+08:00| 1.5800357|
  20. |2021-03-17T10:32:33.054+08:00| 0.7124368|
  21. |2021-03-17T10:32:34.054+08:00| -0.78597564|
  22. |2021-03-17T10:32:35.054+08:00| 1.2058644|
  23. |2021-03-17T10:32:36.054+08:00| 1.4215064|
  24. |2021-03-17T10:32:37.054+08:00| 1.2808295|
  25. |2021-03-17T10:32:38.054+08:00| -0.6173715|
  26. |2021-03-17T10:32:39.054+08:00| 0.06644377|
  27. |2021-03-17T10:32:40.054+08:00| 2.349338|
  28. |2021-03-17T10:32:41.054+08:00| 1.7335888|
  29. |2021-03-17T10:32:42.054+08:00| 1.5872132|
  30. ............
  31. Total line number = 10000

SQL for query:

  1. select mad(s0) from root.test

Output series:

  1. +-----------------------------+------------------+
  2. | Time| mad(root.test.s0)|
  3. +-----------------------------+------------------+
  4. |1970-01-01T08:00:00.000+08:00|0.6806197166442871|
  5. +-----------------------------+------------------+

Approximate Query

By setting error within (0,1), the function queries the approximate MAD.

SQL for query:

  1. select mad(s0, "error"="0.01") from root.test

Output series:

  1. +-----------------------------+---------------------------------+
  2. | Time|mad(root.test.s0, "error"="0.01")|
  3. +-----------------------------+---------------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 0.6806616245859518|
  5. +-----------------------------+---------------------------------+

Median

Usage

The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample.

Name: MEDIAN

Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.

Parameter:

  • error: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with error=0.01 is the value of the element with rank percentage 0.49~0.51. With error=0, the output is the exact median.

Output Series: Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median.

Examples

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s0|
  3. +-----------------------------+------------+
  4. |2021-03-17T10:32:17.054+08:00| 0.5319929|
  5. |2021-03-17T10:32:18.054+08:00| 0.9304316|
  6. |2021-03-17T10:32:19.054+08:00| -1.4800133|
  7. |2021-03-17T10:32:20.054+08:00| 0.6114087|
  8. |2021-03-17T10:32:21.054+08:00| 2.5163336|
  9. |2021-03-17T10:32:22.054+08:00| -1.0845392|
  10. |2021-03-17T10:32:23.054+08:00| 1.0562582|
  11. |2021-03-17T10:32:24.054+08:00| 1.3867859|
  12. |2021-03-17T10:32:25.054+08:00| -0.45429882|
  13. |2021-03-17T10:32:26.054+08:00| 1.0353678|
  14. |2021-03-17T10:32:27.054+08:00| 0.7307929|
  15. |2021-03-17T10:32:28.054+08:00| 2.3167255|
  16. |2021-03-17T10:32:29.054+08:00| 2.342443|
  17. |2021-03-17T10:32:30.054+08:00| 1.5809103|
  18. |2021-03-17T10:32:31.054+08:00| 1.4829416|
  19. |2021-03-17T10:32:32.054+08:00| 1.5800357|
  20. |2021-03-17T10:32:33.054+08:00| 0.7124368|
  21. |2021-03-17T10:32:34.054+08:00| -0.78597564|
  22. |2021-03-17T10:32:35.054+08:00| 1.2058644|
  23. |2021-03-17T10:32:36.054+08:00| 1.4215064|
  24. |2021-03-17T10:32:37.054+08:00| 1.2808295|
  25. |2021-03-17T10:32:38.054+08:00| -0.6173715|
  26. |2021-03-17T10:32:39.054+08:00| 0.06644377|
  27. |2021-03-17T10:32:40.054+08:00| 2.349338|
  28. |2021-03-17T10:32:41.054+08:00| 1.7335888|
  29. |2021-03-17T10:32:42.054+08:00| 1.5872132|
  30. ............
  31. Total line number = 10000

SQL for query:

  1. select median(s0, "error"="0.01") from root.test

Output series:

  1. +-----------------------------+------------------------------------+
  2. | Time|median(root.test.s0, "error"="0.01")|
  3. +-----------------------------+------------------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 1.021884560585022|
  5. +-----------------------------+------------------------------------+

MinMax

Usage

This function is used to normalize the input series using min-max normalization: the minimum value is transformed to 0 and the maximum value is transformed to 1.

Name: MINMAX

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • compute: When set to “batch”, normalization is performed after all data points are loaded; when set to “stream”, the minimum and maximum values must be provided. The default method is “batch”.
  • min: The minimum value when compute is set to “stream”.
  • max: The maximum value when compute is set to “stream”.

Output Series: Output a single series. The type is DOUBLE.
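
A minimal Python sketch of batch min-max normalization is shown below (illustrative only); in stream mode the user-provided min and max would be used instead of the observed ones:

  # Illustrative sketch of batch min-max normalization: (v - min) / (max - min).
  def minmax(values):
      lo, hi = min(values), max(values)
      return [(v - lo) / (hi - lo) for v in values]

  print(minmax([-2.0, 0.0, 1.0, 10.0]))  # [0.0, 0.1666..., 0.25, 1.0]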

Examples

Batch computing

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s1|
  3. +-----------------------------+------------+
  4. |1970-01-01T08:00:00.100+08:00| 0.0|
  5. |1970-01-01T08:00:00.200+08:00| 0.0|
  6. |1970-01-01T08:00:00.300+08:00| 1.0|
  7. |1970-01-01T08:00:00.400+08:00| -1.0|
  8. |1970-01-01T08:00:00.500+08:00| 0.0|
  9. |1970-01-01T08:00:00.600+08:00| 0.0|
  10. |1970-01-01T08:00:00.700+08:00| -2.0|
  11. |1970-01-01T08:00:00.800+08:00| 2.0|
  12. |1970-01-01T08:00:00.900+08:00| 0.0|
  13. |1970-01-01T08:00:01.000+08:00| 0.0|
  14. |1970-01-01T08:00:01.100+08:00| 1.0|
  15. |1970-01-01T08:00:01.200+08:00| -1.0|
  16. |1970-01-01T08:00:01.300+08:00| -1.0|
  17. |1970-01-01T08:00:01.400+08:00| 1.0|
  18. |1970-01-01T08:00:01.500+08:00| 0.0|
  19. |1970-01-01T08:00:01.600+08:00| 0.0|
  20. |1970-01-01T08:00:01.700+08:00| 10.0|
  21. |1970-01-01T08:00:01.800+08:00| 2.0|
  22. |1970-01-01T08:00:01.900+08:00| -2.0|
  23. |1970-01-01T08:00:02.000+08:00| 0.0|
  24. +-----------------------------+------------+

SQL for query:

  1. select minmax(s1) from root.test

Output series:

  1. +-----------------------------+--------------------+
  2. | Time|minmax(root.test.s1)|
  3. +-----------------------------+--------------------+
  4. |1970-01-01T08:00:00.100+08:00| 0.16666666666666666|
  5. |1970-01-01T08:00:00.200+08:00| 0.16666666666666666|
  6. |1970-01-01T08:00:00.300+08:00| 0.25|
  7. |1970-01-01T08:00:00.400+08:00| 0.08333333333333333|
  8. |1970-01-01T08:00:00.500+08:00| 0.16666666666666666|
  9. |1970-01-01T08:00:00.600+08:00| 0.16666666666666666|
  10. |1970-01-01T08:00:00.700+08:00| 0.0|
  11. |1970-01-01T08:00:00.800+08:00| 0.3333333333333333|
  12. |1970-01-01T08:00:00.900+08:00| 0.16666666666666666|
  13. |1970-01-01T08:00:01.000+08:00| 0.16666666666666666|
  14. |1970-01-01T08:00:01.100+08:00| 0.25|
  15. |1970-01-01T08:00:01.200+08:00| 0.08333333333333333|
  16. |1970-01-01T08:00:01.300+08:00| 0.08333333333333333|
  17. |1970-01-01T08:00:01.400+08:00| 0.25|
  18. |1970-01-01T08:00:01.500+08:00| 0.16666666666666666|
  19. |1970-01-01T08:00:01.600+08:00| 0.16666666666666666|
  20. |1970-01-01T08:00:01.700+08:00| 1.0|
  21. |1970-01-01T08:00:01.800+08:00| 0.3333333333333333|
  22. |1970-01-01T08:00:01.900+08:00| 0.0|
  23. |1970-01-01T08:00:02.000+08:00| 0.16666666666666666|
  24. +-----------------------------+--------------------+

Mode

Usage

This function is used to calculate the mode of time series, that is, the value that occurs most frequently.

Name: MODE

Input Series: Only support a single input series. The type is arbitrary.

Output Series: Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is the same as that of the first occurrence of the mode and whose value is the mode.

Note:

  • If there are multiple values with the most occurrences, an arbitrary one will be output.
  • Missing points and null points in the input series will be ignored, but NaN will not.

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d2.s2|
  3. +-----------------------------+---------------+
  4. |1970-01-01T08:00:00.001+08:00| Hello|
  5. |1970-01-01T08:00:00.002+08:00| hello|
  6. |1970-01-01T08:00:00.003+08:00| Hello|
  7. |1970-01-01T08:00:00.004+08:00| World|
  8. |1970-01-01T08:00:00.005+08:00| World|
  9. |1970-01-01T08:00:01.600+08:00| World|
  10. |1970-01-15T09:37:34.451+08:00| Hello|
  11. |1970-01-15T09:37:34.452+08:00| hello|
  12. |1970-01-15T09:37:34.453+08:00| Hello|
  13. |1970-01-15T09:37:34.454+08:00| World|
  14. |1970-01-15T09:37:34.455+08:00| World|
  15. +-----------------------------+---------------+

SQL for query:

  1. select mode(s2) from root.test.d2

Output series:

  1. +-----------------------------+---------------------+
  2. | Time|mode(root.test.d2.s2)|
  3. +-----------------------------+---------------------+
  4. |1970-01-01T08:00:00.004+08:00| World|
  5. +-----------------------------+---------------------+

MvAvg

Usage

This function is used to calculate the moving average of the input series.

Name: MVAVG

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • window: Length of the moving window. The default value is 10.

Output Series: Output a single series. The type is DOUBLE.
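
As a rough illustration, the sketch below computes a generic trailing moving average over a fixed-length window in Python; the exact window alignment used by MVAVG may differ, so it is not guaranteed to reproduce the example output exactly:

  # Illustrative sketch: trailing moving average over the last `window` values.
  def moving_average(values, window=10):
      out = []
      for i in range(window - 1, len(values)):
          out.append(sum(values[i - window + 1:i + 1]) / window)
      return out

  print(moving_average([0.0, 0.0, 1.0, -1.0, 0.0, 0.0], window=3))
  # [0.333..., 0.0, 0.0, -0.333...]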

Examples

Batch computing

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s1|
  3. +-----------------------------+------------+
  4. |1970-01-01T08:00:00.100+08:00| 0.0|
  5. |1970-01-01T08:00:00.200+08:00| 0.0|
  6. |1970-01-01T08:00:00.300+08:00| 1.0|
  7. |1970-01-01T08:00:00.400+08:00| -1.0|
  8. |1970-01-01T08:00:00.500+08:00| 0.0|
  9. |1970-01-01T08:00:00.600+08:00| 0.0|
  10. |1970-01-01T08:00:00.700+08:00| -2.0|
  11. |1970-01-01T08:00:00.800+08:00| 2.0|
  12. |1970-01-01T08:00:00.900+08:00| 0.0|
  13. |1970-01-01T08:00:01.000+08:00| 0.0|
  14. |1970-01-01T08:00:01.100+08:00| 1.0|
  15. |1970-01-01T08:00:01.200+08:00| -1.0|
  16. |1970-01-01T08:00:01.300+08:00| -1.0|
  17. |1970-01-01T08:00:01.400+08:00| 1.0|
  18. |1970-01-01T08:00:01.500+08:00| 0.0|
  19. |1970-01-01T08:00:01.600+08:00| 0.0|
  20. |1970-01-01T08:00:01.700+08:00| 10.0|
  21. |1970-01-01T08:00:01.800+08:00| 2.0|
  22. |1970-01-01T08:00:01.900+08:00| -2.0|
  23. |1970-01-01T08:00:02.000+08:00| 0.0|
  24. +-----------------------------+------------+

SQL for query:

  1. select mvavg(s1, "window"="3") from root.test

Output series:

  1. +-----------------------------+---------------------------------+
  2. | Time|mvavg(root.test.s1, "window"="3")|
  3. +-----------------------------+---------------------------------+
  4. |1970-01-01T08:00:00.300+08:00| 0.3333333333333333|
  5. |1970-01-01T08:00:00.400+08:00| 0.0|
  6. |1970-01-01T08:00:00.500+08:00| -0.3333333333333333|
  7. |1970-01-01T08:00:00.600+08:00| 0.0|
  8. |1970-01-01T08:00:00.700+08:00| -0.6666666666666666|
  9. |1970-01-01T08:00:00.800+08:00| 0.0|
  10. |1970-01-01T08:00:00.900+08:00| 0.6666666666666666|
  11. |1970-01-01T08:00:01.000+08:00| 0.0|
  12. |1970-01-01T08:00:01.100+08:00| 0.3333333333333333|
  13. |1970-01-01T08:00:01.200+08:00| 0.0|
  14. |1970-01-01T08:00:01.300+08:00| -0.6666666666666666|
  15. |1970-01-01T08:00:01.400+08:00| 0.0|
  16. |1970-01-01T08:00:01.500+08:00| 0.3333333333333333|
  17. |1970-01-01T08:00:01.600+08:00| 0.0|
  18. |1970-01-01T08:00:01.700+08:00| 3.3333333333333335|
  19. |1970-01-01T08:00:01.800+08:00| 4.0|
  20. |1970-01-01T08:00:01.900+08:00| 0.0|
  21. |1970-01-01T08:00:02.000+08:00| -0.6666666666666666|
  22. +-----------------------------+---------------------------------+

PACF

Usage

This function is used to calculate the partial autocorrelation of the input series by solving the Yule-Walker equations. In some cases the equations cannot be solved, and NaN will be output.

Name: PACF

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • lag: The maximum lag of the PACF to calculate. The default value is $\min(10\log_{10}n, n-1)$, where $n$ is the number of data points.

Output Series: Output a single series. The type is DOUBLE.
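
A minimal Yule-Walker sketch in Python (illustrative only, not the IoTDB implementation): estimate the autocorrelations, then for each lag k solve the k-th order Yule-Walker system and keep the last coefficient. When the system is singular, the result is undefined, which corresponds to the NaN output mentioned above.

  import numpy as np

  # Illustrative PACF via the Yule-Walker equations.
  def pacf(values, lag):
      x = np.asarray(values, dtype=float)
      x = x - x.mean()
      n = len(x)
      # biased autocorrelation estimates r[0..lag], normalized so that r[0] = 1
      r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(lag + 1)])
      r = r / r[0]
      out = [1.0]                                   # lag 0
      for k in range(1, lag + 1):
          R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
          phi = np.linalg.solve(R, r[1:k + 1])      # k-th order Yule-Walker system
          out.append(float(phi[-1]))                # the lag-k partial autocorrelation
      return out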

Examples

Assigning maximum lag

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s1|
  3. +-----------------------------+------------+
  4. |2019-12-27T00:00:00.000+08:00| 5.0|
  5. |2019-12-27T00:05:00.000+08:00| 5.0|
  6. |2019-12-27T00:10:00.000+08:00| 5.0|
  7. |2019-12-27T00:15:00.000+08:00| 5.0|
  8. |2019-12-27T00:20:00.000+08:00| 6.0|
  9. |2019-12-27T00:25:00.000+08:00| 5.0|
  10. |2019-12-27T00:30:00.000+08:00| 6.0|
  11. |2019-12-27T00:35:00.000+08:00| 6.0|
  12. |2019-12-27T00:40:00.000+08:00| 6.0|
  13. |2019-12-27T00:45:00.000+08:00| 6.0|
  14. |2019-12-27T00:50:00.000+08:00| 6.0|
  15. |2019-12-27T00:55:00.000+08:00| 5.982609|
  16. |2019-12-27T01:00:00.000+08:00| 5.9652176|
  17. |2019-12-27T01:05:00.000+08:00| 5.947826|
  18. |2019-12-27T01:10:00.000+08:00| 5.9304347|
  19. |2019-12-27T01:15:00.000+08:00| 5.9130435|
  20. |2019-12-27T01:20:00.000+08:00| 5.8956523|
  21. |2019-12-27T01:25:00.000+08:00| 5.878261|
  22. |2019-12-27T01:30:00.000+08:00| 5.8608694|
  23. |2019-12-27T01:35:00.000+08:00| 5.843478|
  24. ............
  25. Total line number = 18066

SQL for query:

  1. select pacf(s1, "lag"="5") from root.test

Output series:

  1. +-----------------------------+-----------------------------+
  2. | Time|pacf(root.test.s1, "lag"="5")|
  3. +-----------------------------+-----------------------------+
  4. |2019-12-27T00:00:00.000+08:00| 1.0|
  5. |2019-12-27T00:05:00.000+08:00| 0.3528915091942786|
  6. |2019-12-27T00:10:00.000+08:00| 0.1761346122516304|
  7. |2019-12-27T00:15:00.000+08:00| 0.1492391973294682|
  8. |2019-12-27T00:20:00.000+08:00| 0.03560059645868398|
  9. |2019-12-27T00:25:00.000+08:00| 0.0366222998995286|
  10. +-----------------------------+-----------------------------+

Percentile

Usage

The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is the value of the element at a certain rank in the sorted series.

Name: PERCENTILE

Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.

Parameter:

  • rank: The rank percentage of the percentile. It should be within (0,1] and the default value is 0.5. For instance, a percentile with rank=0.5 is the median.
  • error: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with error=0.01 is the value of the element with rank percentage 0.49~0.51. With error=0, the output is the exact percentile.

Output Series: Output a single series. The type is the same as the input series. If error=0, there is only one data point in the series, whose timestamp is the same as that of the first element equal to the percentile and whose value is the percentile; otherwise, the timestamp of the only data point is 0.

Note: Missing points, null points and NaN in the input series will be ignored.

Examples

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s0|
  3. +-----------------------------+------------+
  4. |2021-03-17T10:32:17.054+08:00| 0.5319929|
  5. |2021-03-17T10:32:18.054+08:00| 0.9304316|
  6. |2021-03-17T10:32:19.054+08:00| -1.4800133|
  7. |2021-03-17T10:32:20.054+08:00| 0.6114087|
  8. |2021-03-17T10:32:21.054+08:00| 2.5163336|
  9. |2021-03-17T10:32:22.054+08:00| -1.0845392|
  10. |2021-03-17T10:32:23.054+08:00| 1.0562582|
  11. |2021-03-17T10:32:24.054+08:00| 1.3867859|
  12. |2021-03-17T10:32:25.054+08:00| -0.45429882|
  13. |2021-03-17T10:32:26.054+08:00| 1.0353678|
  14. |2021-03-17T10:32:27.054+08:00| 0.7307929|
  15. |2021-03-17T10:32:28.054+08:00| 2.3167255|
  16. |2021-03-17T10:32:29.054+08:00| 2.342443|
  17. |2021-03-17T10:32:30.054+08:00| 1.5809103|
  18. |2021-03-17T10:32:31.054+08:00| 1.4829416|
  19. |2021-03-17T10:32:32.054+08:00| 1.5800357|
  20. |2021-03-17T10:32:33.054+08:00| 0.7124368|
  21. |2021-03-17T10:32:34.054+08:00| -0.78597564|
  22. |2021-03-17T10:32:35.054+08:00| 1.2058644|
  23. |2021-03-17T10:32:36.054+08:00| 1.4215064|
  24. |2021-03-17T10:32:37.054+08:00| 1.2808295|
  25. |2021-03-17T10:32:38.054+08:00| -0.6173715|
  26. |2021-03-17T10:32:39.054+08:00| 0.06644377|
  27. |2021-03-17T10:32:40.054+08:00| 2.349338|
  28. |2021-03-17T10:32:41.054+08:00| 1.7335888|
  29. |2021-03-17T10:32:42.054+08:00| 1.5872132|
  30. ............
  31. Total line number = 10000

SQL for query:

  1. select percentile(s0, "rank"="0.2", "error"="0.01") from root.test

Output series:

  1. +-----------------------------+------------------------------------------------------+
  2. | Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")|
  3. +-----------------------------+------------------------------------------------------+
  4. |2021-03-17T10:35:02.054+08:00| 0.1801469624042511|
  5. +-----------------------------+------------------------------------------------------+

Quantile

Usage

The function is used to compute the approximate quantile of a numeric time series. A quantile is the value of the element at a certain rank in the sorted series.

Name: QUANTILE

Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.

Parameter:

  • rank: The rank of the quantile. It should be within (0,1] and the default value is 0.5. For instance, a quantile with rank=0.5 is the median.
  • K: The size of the KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 is a value with rank percentage 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases.

Output Series: Output a single series. The type is the same as input series. The timestamp of the only data point is 0.

Note: Missing points, null points and NaN in the input series will be ignored.

Examples

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s0|
  3. +-----------------------------+------------+
  4. |2021-03-17T10:32:17.054+08:00| 0.5319929|
  5. |2021-03-17T10:32:18.054+08:00| 0.9304316|
  6. |2021-03-17T10:32:19.054+08:00| -1.4800133|
  7. |2021-03-17T10:32:20.054+08:00| 0.6114087|
  8. |2021-03-17T10:32:21.054+08:00| 2.5163336|
  9. |2021-03-17T10:32:22.054+08:00| -1.0845392|
  10. |2021-03-17T10:32:23.054+08:00| 1.0562582|
  11. |2021-03-17T10:32:24.054+08:00| 1.3867859|
  12. |2021-03-17T10:32:25.054+08:00| -0.45429882|
  13. |2021-03-17T10:32:26.054+08:00| 1.0353678|
  14. |2021-03-17T10:32:27.054+08:00| 0.7307929|
  15. |2021-03-17T10:32:28.054+08:00| 2.3167255|
  16. |2021-03-17T10:32:29.054+08:00| 2.342443|
  17. |2021-03-17T10:32:30.054+08:00| 1.5809103|
  18. |2021-03-17T10:32:31.054+08:00| 1.4829416|
  19. |2021-03-17T10:32:32.054+08:00| 1.5800357|
  20. |2021-03-17T10:32:33.054+08:00| 0.7124368|
  21. |2021-03-17T10:32:34.054+08:00| -0.78597564|
  22. |2021-03-17T10:32:35.054+08:00| 1.2058644|
  23. |2021-03-17T10:32:36.054+08:00| 1.4215064|
  24. |2021-03-17T10:32:37.054+08:00| 1.2808295|
  25. |2021-03-17T10:32:38.054+08:00| -0.6173715|
  26. |2021-03-17T10:32:39.054+08:00| 0.06644377|
  27. |2021-03-17T10:32:40.054+08:00| 2.349338|
  28. |2021-03-17T10:32:41.054+08:00| 1.7335888|
  29. |2021-03-17T10:32:42.054+08:00| 1.5872132|
  30. ............
  31. Total line number = 10000

SQL for query:

  1. select quantile(s0, "rank"="0.2", "K"="800") from root.test

Output series:

  1. +-----------------------------+------------------------------------------------------+
  2. | Time|quantile(root.test.s0, "rank"="0.2", "K"="800")|
  3. +-----------------------------+------------------------------------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 0.1801469624042511|
  5. +-----------------------------+------------------------------------------------------+

Period

Usage

The function is used to compute the period of a numeric time series.

Name: PERIOD

Input Series: Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.

Output Series: Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period.

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d3.s1|
  3. +-----------------------------+---------------+
  4. |1970-01-01T08:00:00.001+08:00| 1.0|
  5. |1970-01-01T08:00:00.002+08:00| 2.0|
  6. |1970-01-01T08:00:00.003+08:00| 3.0|
  7. |1970-01-01T08:00:00.004+08:00| 1.0|
  8. |1970-01-01T08:00:00.005+08:00| 2.0|
  9. |1970-01-01T08:00:00.006+08:00| 3.0|
  10. |1970-01-01T08:00:00.007+08:00| 1.0|
  11. |1970-01-01T08:00:00.008+08:00| 2.0|
  12. |1970-01-01T08:00:00.009+08:00| 3.0|
  13. +-----------------------------+---------------+

SQL for query:

  1. select period(s1) from root.test.d3

Output series:

  1. +-----------------------------+-----------------------+
  2. | Time|period(root.test.d3.s1)|
  3. +-----------------------------+-----------------------+
  4. |1970-01-01T08:00:00.000+08:00| 3|
  5. +-----------------------------+-----------------------+

QLB

Usage

This function is used to calculate the Ljung-Box statistic $Q_{LB}$ for a time series and convert it to a p value.

Name: QLB

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • lag: The maximum lag to calculate. Legal input is an integer from 1 to n-2, where n is the sample size. The default value is n-2.

Output Series: Output a single series. The type is DOUBLE. Each value in the output series is a p value, and the timestamp indicates the lag.

Note: If you want to calculate the Ljung-Box statistic $Q_{LB}$ instead of the p value, you may use the ACF function.
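
The statistic and its p value can be sketched in Python as follows (illustrative only): $Q_{LB}(m) = n(n+2)\sum_{k=1}^{m}\hat{\rho}_k^2/(n-k)$, compared against a chi-squared distribution with $m$ degrees of freedom.

  import numpy as np
  from scipy.stats import chi2

  # Illustrative sketch: p values of the Ljung-Box statistic for lags 1..max_lag.
  def qlb_p_values(values, max_lag):
      x = np.asarray(values, dtype=float)
      x = x - x.mean()
      n = len(x)
      denom = np.dot(x, x)
      p_values, acc = [], 0.0
      for k in range(1, max_lag + 1):
          rho_k = np.dot(x[:n - k], x[k:]) / denom       # lag-k autocorrelation
          acc += rho_k ** 2 / (n - k)
          q_lb = n * (n + 2) * acc
          p_values.append(1.0 - chi2.cdf(q_lb, df=k))
      return p_values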

Examples

Using Default Parameter

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |1970-01-01T00:00:00.100+08:00| 1.22|
  5. |1970-01-01T00:00:00.200+08:00| -2.78|
  6. |1970-01-01T00:00:00.300+08:00| 1.53|
  7. |1970-01-01T00:00:00.400+08:00| 0.70|
  8. |1970-01-01T00:00:00.500+08:00| 0.75|
  9. |1970-01-01T00:00:00.600+08:00| -0.72|
  10. |1970-01-01T00:00:00.700+08:00| -0.22|
  11. |1970-01-01T00:00:00.800+08:00| 0.28|
  12. |1970-01-01T00:00:00.900+08:00| 0.57|
  13. |1970-01-01T00:00:01.000+08:00| -0.22|
  14. |1970-01-01T00:00:01.100+08:00| -0.72|
  15. |1970-01-01T00:00:01.200+08:00| 1.34|
  16. |1970-01-01T00:00:01.300+08:00| -0.25|
  17. |1970-01-01T00:00:01.400+08:00| 0.17|
  18. |1970-01-01T00:00:01.500+08:00| 2.51|
  19. |1970-01-01T00:00:01.600+08:00| 1.42|
  20. |1970-01-01T00:00:01.700+08:00| -1.34|
  21. |1970-01-01T00:00:01.800+08:00| -0.01|
  22. |1970-01-01T00:00:01.900+08:00| -0.49|
  23. |1970-01-01T00:00:02.000+08:00| 1.63|
  24. +-----------------------------+---------------+

SQL for query:

  1. select QLB(s1) from root.test.d1

Output series:

  1. +-----------------------------+--------------------+
  2. | Time|QLB(root.test.d1.s1)|
  3. +-----------------------------+--------------------+
  4. |1970-01-01T00:00:00.001+08:00| 0.2168702295315677|
  5. |1970-01-01T00:00:00.002+08:00| 0.3068948509261751|
  6. |1970-01-01T00:00:00.003+08:00| 0.4217859150918444|
  7. |1970-01-01T00:00:00.004+08:00| 0.5114539874276656|
  8. |1970-01-01T00:00:00.005+08:00| 0.6560619525616759|
  9. |1970-01-01T00:00:00.006+08:00| 0.7722398654053280|
  10. |1970-01-01T00:00:00.007+08:00| 0.8532491661465290|
  11. |1970-01-01T00:00:00.008+08:00| 0.9028575017542528|
  12. |1970-01-01T00:00:00.009+08:00| 0.9434989988192729|
  13. |1970-01-01T00:00:00.010+08:00| 0.8950280161464689|
  14. |1970-01-01T00:00:00.011+08:00| 0.7701048398839656|
  15. |1970-01-01T00:00:00.012+08:00| 0.7845536060001281|
  16. |1970-01-01T00:00:00.013+08:00| 0.5943030981705825|
  17. |1970-01-01T00:00:00.014+08:00| 0.4618413512531093|
  18. |1970-01-01T00:00:00.015+08:00| 0.2645948244673964|
  19. |1970-01-01T00:00:00.016+08:00| 0.3167530476666645|
  20. |1970-01-01T00:00:00.017+08:00| 0.2330010780351453|
  21. |1970-01-01T00:00:00.018+08:00| 0.0666611237622325|
  22. +-----------------------------+--------------------+

Resample

Usage

This function is used to resample the input series according to a given frequency, including up-sampling and down-sampling. Currently, the supported up-sampling methods are NaN (filling with NaN), FFill (filling with previous value), BFill (filling with next value) and Linear (filling with linear interpolation). Down-sampling relies on group aggregation, which supports Max, Min, First, Last, Mean and Median.

Name: RESAMPLE

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • every: The frequency of resampling, which is a positive number with a unit. The unit is ‘ms’ for millisecond, ‘s’ for second, ‘m’ for minute, ‘h’ for hour and ‘d’ for day. This parameter is required.
  • interp: The interpolation method of up-sampling, which is ‘NaN’, ‘FFill’, ‘BFill’ or ‘Linear’. By default, NaN is used.
  • aggr: The aggregation method of down-sampling, which is ‘Max’, ‘Min’, ‘First’, ‘Last’, ‘Mean’ or ‘Median’. By default, Mean is used.
  • start: The start time (inclusive) of resampling with the format ‘yyyy-MM-dd HH:mm:ss’. By default, it is the timestamp of the first valid data point.
  • end: The end time (exclusive) of resampling with the format ‘yyyy-MM-dd HH:mm:ss’. By default, it is the timestamp of the last valid data point.

Output Series: Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency every.

Note: NaN in the input series will be ignored.
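
For the up-sampling case with linear interpolation, a minimal Python sketch looks like this (illustrative only; down-sampling would instead group the points falling into each interval and apply the chosen aggregation):

  import numpy as np

  # Illustrative sketch of "interp"="linear" up-sampling onto a grid spaced by every_ms.
  # Note: np.interp clamps values outside the input range, whereas RESAMPLE fills
  # them according to the interpolation method (NaN by default).
  def resample_linear(times_ms, values, every_ms, start_ms=None, end_ms=None):
      start = times_ms[0] if start_ms is None else start_ms
      end = times_ms[-1] if end_ms is None else end_ms
      grid = np.arange(start, end + 1, every_ms)
      return grid, np.interp(grid, times_ms, values)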

Examples

Up-sampling

When the frequency of resampling is higher than the original frequency, up-sampling starts.

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2021-03-06T16:00:00.000+08:00| 3.09|
  5. |2021-03-06T16:15:00.000+08:00| 3.53|
  6. |2021-03-06T16:30:00.000+08:00| 3.5|
  7. |2021-03-06T16:45:00.000+08:00| 3.51|
  8. |2021-03-06T17:00:00.000+08:00| 3.41|
  9. +-----------------------------+---------------+

SQL for query:

  1. select resample(s1,'every'='5m','interp'='linear') from root.test.d1

Output series:

  1. +-----------------------------+----------------------------------------------------------+
  2. | Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")|
  3. +-----------------------------+----------------------------------------------------------+
  4. |2021-03-06T16:00:00.000+08:00| 3.0899999141693115|
  5. |2021-03-06T16:05:00.000+08:00| 3.2366665999094644|
  6. |2021-03-06T16:10:00.000+08:00| 3.3833332856496177|
  7. |2021-03-06T16:15:00.000+08:00| 3.5299999713897705|
  8. |2021-03-06T16:20:00.000+08:00| 3.5199999809265137|
  9. |2021-03-06T16:25:00.000+08:00| 3.509999990463257|
  10. |2021-03-06T16:30:00.000+08:00| 3.5|
  11. |2021-03-06T16:35:00.000+08:00| 3.503333330154419|
  12. |2021-03-06T16:40:00.000+08:00| 3.506666660308838|
  13. |2021-03-06T16:45:00.000+08:00| 3.509999990463257|
  14. |2021-03-06T16:50:00.000+08:00| 3.4766666889190674|
  15. |2021-03-06T16:55:00.000+08:00| 3.443333387374878|
  16. |2021-03-06T17:00:00.000+08:00| 3.4100000858306885|
  17. +-----------------------------+----------------------------------------------------------+

Down-sampling

When the frequency of resampling is lower than the original frequency, down-sampling starts.

Input series is the same as above, the SQL for query is shown below:

  1. select resample(s1,'every'='30m','aggr'='first') from root.test.d1

Output series:

  1. +-----------------------------+--------------------------------------------------------+
  2. | Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")|
  3. +-----------------------------+--------------------------------------------------------+
  4. |2021-03-06T16:00:00.000+08:00| 3.0899999141693115|
  5. |2021-03-06T16:30:00.000+08:00| 3.5|
  6. |2021-03-06T17:00:00.000+08:00| 3.4100000858306885|
  7. +-----------------------------+--------------------------------------------------------+

Specify the time period

The time period of resampling can be specified with start and end. The part of the period outside the actual time range will be interpolated (with the default method, it is filled with NaN).

Input series is the same as above, the SQL for query is shown below:

  1. select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1

Output series:

  1. +-----------------------------+-----------------------------------------------------------------------+
  2. | Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")|
  3. +-----------------------------+-----------------------------------------------------------------------+
  4. |2021-03-06T15:00:00.000+08:00| NaN|
  5. |2021-03-06T15:30:00.000+08:00| NaN|
  6. |2021-03-06T16:00:00.000+08:00| 3.309999942779541|
  7. |2021-03-06T16:30:00.000+08:00| 3.5049999952316284|
  8. |2021-03-06T17:00:00.000+08:00| 3.4100000858306885|
  9. +-----------------------------+-----------------------------------------------------------------------+

Sample

Usage

This function is used to sample the input series, that is, to select a specified number of data points from the input series and output them. Currently, three sampling methods are supported. Reservoir sampling randomly selects data points; every point has the same probability of being sampled. Isometric sampling selects data points at equal index intervals. Triangle sampling assigns data points to buckets based on the number of samples, computes the area of the triangle formed by the points inside each bucket, and selects the point with the largest triangle area. For more detail, please refer to the paper.

Name: SAMPLE

Input Series: Only support a single input series. The type is arbitrary.

Parameters:

  • method: The method of sampling, which is ‘reservoir’, ‘isometric’ or ‘triangle’. By default, reservoir sampling is used.
  • k: The number of samples, which is a positive integer. By default, it is 1.

Output Series: Output a single series. The type is the same as the input. The length of the output series is k. Each data point in the output series comes from the input series.

Note: If k is greater than the length of input series, all data points in the input series will be output.
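
Minimal Python sketches of the reservoir and isometric methods are shown below (illustrative only); the triangle method is omitted for brevity:

  import random

  # Reservoir sampling: every input point is kept with equal probability k / n.
  def reservoir_sample(points, k):
      reservoir = []
      for i, p in enumerate(points):
          if i < k:
              reservoir.append(p)
          else:
              j = random.randint(0, i)   # uniform in [0, i]
              if j < k:
                  reservoir[j] = p
      return reservoir

  # Isometric sampling: pick points at equal index intervals.
  def isometric_sample(points, k):
      n = len(points)
      return [points[i * n // k] for i in range(k)]

  print(isometric_sample(list(range(1, 11)), 5))  # [1, 3, 5, 7, 9]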

Examples

Reservoir Sampling

When method is ‘reservoir’ or the default, reservoir sampling is used. Due to the randomness of this method, the output series shown below is only a possible result.

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:01.000+08:00| 1.0|
  5. |2020-01-01T00:00:02.000+08:00| 2.0|
  6. |2020-01-01T00:00:03.000+08:00| 3.0|
  7. |2020-01-01T00:00:04.000+08:00| 4.0|
  8. |2020-01-01T00:00:05.000+08:00| 5.0|
  9. |2020-01-01T00:00:06.000+08:00| 6.0|
  10. |2020-01-01T00:00:07.000+08:00| 7.0|
  11. |2020-01-01T00:00:08.000+08:00| 8.0|
  12. |2020-01-01T00:00:09.000+08:00| 9.0|
  13. |2020-01-01T00:00:10.000+08:00| 10.0|
  14. +-----------------------------+---------------+

SQL for query:

  1. select sample(s1,'method'='reservoir','k'='5') from root.test.d1

Output series:

  1. +-----------------------------+------------------------------------------------------+
  2. | Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")|
  3. +-----------------------------+------------------------------------------------------+
  4. |2020-01-01T00:00:02.000+08:00| 2.0|
  5. |2020-01-01T00:00:03.000+08:00| 3.0|
  6. |2020-01-01T00:00:05.000+08:00| 5.0|
  7. |2020-01-01T00:00:08.000+08:00| 8.0|
  8. |2020-01-01T00:00:10.000+08:00| 10.0|
  9. +-----------------------------+------------------------------------------------------+

Isometric Sampling

When method is ‘isometric’, isometric sampling is used.

Input series is the same as above, the SQL for query is shown below:

  1. select sample(s1,'method'='isometric','k'='5') from root.test.d1

Output series:

  1. +-----------------------------+------------------------------------------------------+
  2. | Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")|
  3. +-----------------------------+------------------------------------------------------+
  4. |2020-01-01T00:00:01.000+08:00| 1.0|
  5. |2020-01-01T00:00:03.000+08:00| 3.0|
  6. |2020-01-01T00:00:05.000+08:00| 5.0|
  7. |2020-01-01T00:00:07.000+08:00| 7.0|
  8. |2020-01-01T00:00:09.000+08:00| 9.0|
  9. +-----------------------------+------------------------------------------------------+

Segment

Usage

This function is used to segment a time series into subsequences according to linear trends, and returns the linearly fitted value of the first data point in each subsequence or of every data point.

Name: SEGMENT

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • output: “all” to output all fitted points; “first” to output only the first fitted point in each subsequence.

  • error: The error allowed in linear regression, defined as the mean absolute error of a subsequence.

Output Series: Output a single series. The type is DOUBLE.

Note: This function treats the input series as sampled at equal intervals. All data are loaded into memory, so down-sample the input series first if there are too many data points.

Examples

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s1|
  3. +-----------------------------+------------+
  4. |1970-01-01T08:00:00.000+08:00| 5.0|
  5. |1970-01-01T08:00:00.100+08:00| 0.0|
  6. |1970-01-01T08:00:00.200+08:00| 1.0|
  7. |1970-01-01T08:00:00.300+08:00| 2.0|
  8. |1970-01-01T08:00:00.400+08:00| 3.0|
  9. |1970-01-01T08:00:00.500+08:00| 4.0|
  10. |1970-01-01T08:00:00.600+08:00| 5.0|
  11. |1970-01-01T08:00:00.700+08:00| 6.0|
  12. |1970-01-01T08:00:00.800+08:00| 7.0|
  13. |1970-01-01T08:00:00.900+08:00| 8.0|
  14. |1970-01-01T08:00:01.000+08:00| 9.0|
  15. |1970-01-01T08:00:01.100+08:00| 9.1|
  16. |1970-01-01T08:00:01.200+08:00| 9.2|
  17. |1970-01-01T08:00:01.300+08:00| 9.3|
  18. |1970-01-01T08:00:01.400+08:00| 9.4|
  19. |1970-01-01T08:00:01.500+08:00| 9.5|
  20. |1970-01-01T08:00:01.600+08:00| 9.6|
  21. |1970-01-01T08:00:01.700+08:00| 9.7|
  22. |1970-01-01T08:00:01.800+08:00| 9.8|
  23. |1970-01-01T08:00:01.900+08:00| 9.9|
  24. |1970-01-01T08:00:02.000+08:00| 10.0|
  25. |1970-01-01T08:00:02.100+08:00| 8.0|
  26. |1970-01-01T08:00:02.200+08:00| 6.0|
  27. |1970-01-01T08:00:02.300+08:00| 4.0|
  28. |1970-01-01T08:00:02.400+08:00| 2.0|
  29. |1970-01-01T08:00:02.500+08:00| 0.0|
  30. |1970-01-01T08:00:02.600+08:00| -2.0|
  31. |1970-01-01T08:00:02.700+08:00| -4.0|
  32. |1970-01-01T08:00:02.800+08:00| -6.0|
  33. |1970-01-01T08:00:02.900+08:00| -8.0|
  34. |1970-01-01T08:00:03.000+08:00| -10.0|
  35. |1970-01-01T08:00:03.100+08:00| 10.0|
  36. |1970-01-01T08:00:03.200+08:00| 10.0|
  37. |1970-01-01T08:00:03.300+08:00| 10.0|
  38. |1970-01-01T08:00:03.400+08:00| 10.0|
  39. |1970-01-01T08:00:03.500+08:00| 10.0|
  40. |1970-01-01T08:00:03.600+08:00| 10.0|
  41. |1970-01-01T08:00:03.700+08:00| 10.0|
  42. |1970-01-01T08:00:03.800+08:00| 10.0|
  43. |1970-01-01T08:00:03.900+08:00| 10.0|
  44. +-----------------------------+------------+

SQL for query:

  1. select segment(s1, "error"="0.1") from root.test

Output series:

  1. +-----------------------------+------------------------------------+
  2. | Time|segment(root.test.s1, "error"="0.1")|
  3. +-----------------------------+------------------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 5.0|
  5. |1970-01-01T08:00:00.200+08:00| 1.0|
  6. |1970-01-01T08:00:01.000+08:00| 9.0|
  7. |1970-01-01T08:00:02.000+08:00| 10.0|
  8. |1970-01-01T08:00:03.000+08:00| -10.0|
  9. |1970-01-01T08:00:03.200+08:00| 10.0|
  10. +-----------------------------+------------------------------------+

Skew

Usage

This function is used to calculate the population skewness.

Name: SKEW

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Output Series: Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness.

Note: Missing points, null points and NaN in the input series will be ignored.
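
Population skewness is the third central moment divided by the cube of the population standard deviation; a minimal Python sketch (illustrative only):

  import numpy as np

  # Population skewness: E[(X - mean)^3] / std^3, using the biased (divide-by-n) std.
  def skew(values):
      x = np.array([v for v in values if v == v], dtype=float)  # drop NaN
      m = x.mean()
      s = np.sqrt(((x - m) ** 2).mean())
      return float(((x - m) ** 3).mean() / s ** 3)

  data = [float(v) for v in range(1, 11)] + [10.0] * 10
  print(skew(data))  # ~ -0.9998, matching the example below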

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:00.000+08:00| 1.0|
  5. |2020-01-01T00:00:01.000+08:00| 2.0|
  6. |2020-01-01T00:00:02.000+08:00| 3.0|
  7. |2020-01-01T00:00:03.000+08:00| 4.0|
  8. |2020-01-01T00:00:04.000+08:00| 5.0|
  9. |2020-01-01T00:00:05.000+08:00| 6.0|
  10. |2020-01-01T00:00:06.000+08:00| 7.0|
  11. |2020-01-01T00:00:07.000+08:00| 8.0|
  12. |2020-01-01T00:00:08.000+08:00| 9.0|
  13. |2020-01-01T00:00:09.000+08:00| 10.0|
  14. |2020-01-01T00:00:10.000+08:00| 10.0|
  15. |2020-01-01T00:00:11.000+08:00| 10.0|
  16. |2020-01-01T00:00:12.000+08:00| 10.0|
  17. |2020-01-01T00:00:13.000+08:00| 10.0|
  18. |2020-01-01T00:00:14.000+08:00| 10.0|
  19. |2020-01-01T00:00:15.000+08:00| 10.0|
  20. |2020-01-01T00:00:16.000+08:00| 10.0|
  21. |2020-01-01T00:00:17.000+08:00| 10.0|
  22. |2020-01-01T00:00:18.000+08:00| 10.0|
  23. |2020-01-01T00:00:19.000+08:00| 10.0|
  24. +-----------------------------+---------------+

SQL for query:

  1. select skew(s1) from root.test.d1

Output series:

  1. +-----------------------------+-----------------------+
  2. | Time| skew(root.test.d1.s1)|
  3. +-----------------------------+-----------------------+
  4. |1970-01-01T08:00:00.000+08:00| -0.9998427402292644|
  5. +-----------------------------+-----------------------+

Spline

Usage

This function is used to calculate the cubic spline interpolation of the input series.

Name: SPLINE

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • points: Number of resampling points.

Output Series: Output a single series. The type is DOUBLE.

Note: The output series retains the first and last timestamps of the input series. Interpolation points are selected at equal intervals. The function only calculates when there are at least 4 points in the input series.
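
A minimal Python sketch of cubic-spline resampling using SciPy is shown below (illustrative only); the spline boundary conditions may differ from IoTDB's implementation, so the interpolated values may not match the example exactly:

  import numpy as np
  from scipy.interpolate import CubicSpline

  # Fit a cubic spline to (time, value) pairs and evaluate it at `points`
  # equally spaced timestamps between the first and last input timestamps.
  def spline_resample(times_ms, values, points):
      cs = CubicSpline(times_ms, values)
      grid = np.linspace(times_ms[0], times_ms[-1], points)
      return grid, cs(grid)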

Examples

Assigning number of interpolation points

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s1|
  3. +-----------------------------+------------+
  4. |1970-01-01T08:00:00.000+08:00| 0.0|
  5. |1970-01-01T08:00:00.300+08:00| 1.2|
  6. |1970-01-01T08:00:00.500+08:00| 1.7|
  7. |1970-01-01T08:00:00.700+08:00| 2.0|
  8. |1970-01-01T08:00:00.900+08:00| 2.1|
  9. |1970-01-01T08:00:01.100+08:00| 2.0|
  10. |1970-01-01T08:00:01.200+08:00| 1.8|
  11. |1970-01-01T08:00:01.300+08:00| 1.2|
  12. |1970-01-01T08:00:01.400+08:00| 1.0|
  13. |1970-01-01T08:00:01.500+08:00| 1.6|
  14. +-----------------------------+------------+

SQL for query:

  1. select spline(s1, "points"="151") from root.test

Output series:

  1. +-----------------------------+------------------------------------+
  2. | Time|spline(root.test.s1, "points"="151")|
  3. +-----------------------------+------------------------------------+
  4. |1970-01-01T08:00:00.000+08:00| 0.0|
  5. |1970-01-01T08:00:00.010+08:00| 0.04870000251134237|
  6. |1970-01-01T08:00:00.020+08:00| 0.09680000495910646|
  7. |1970-01-01T08:00:00.030+08:00| 0.14430000734329226|
  8. |1970-01-01T08:00:00.040+08:00| 0.19120000966389972|
  9. |1970-01-01T08:00:00.050+08:00| 0.23750001192092896|
  10. |1970-01-01T08:00:00.060+08:00| 0.2832000141143799|
  11. |1970-01-01T08:00:00.070+08:00| 0.32830001624425253|
  12. |1970-01-01T08:00:00.080+08:00| 0.3728000183105469|
  13. |1970-01-01T08:00:00.090+08:00| 0.416700020313263|
  14. |1970-01-01T08:00:00.100+08:00| 0.4600000222524008|
  15. |1970-01-01T08:00:00.110+08:00| 0.5027000241279602|
  16. |1970-01-01T08:00:00.120+08:00| 0.5448000259399414|
  17. |1970-01-01T08:00:00.130+08:00| 0.5863000276883443|
  18. |1970-01-01T08:00:00.140+08:00| 0.627200029373169|
  19. |1970-01-01T08:00:00.150+08:00| 0.6675000309944153|
  20. |1970-01-01T08:00:00.160+08:00| 0.7072000325520833|
  21. |1970-01-01T08:00:00.170+08:00| 0.7463000340461731|
  22. |1970-01-01T08:00:00.180+08:00| 0.7848000354766846|
  23. |1970-01-01T08:00:00.190+08:00| 0.8227000368436178|
  24. |1970-01-01T08:00:00.200+08:00| 0.8600000381469728|
  25. |1970-01-01T08:00:00.210+08:00| 0.8967000393867494|
  26. |1970-01-01T08:00:00.220+08:00| 0.9328000405629477|
  27. |1970-01-01T08:00:00.230+08:00| 0.9683000416755676|
  28. |1970-01-01T08:00:00.240+08:00| 1.0032000427246095|
  29. |1970-01-01T08:00:00.250+08:00| 1.037500043710073|
  30. |1970-01-01T08:00:00.260+08:00| 1.071200044631958|
  31. |1970-01-01T08:00:00.270+08:00| 1.1043000454902647|
  32. |1970-01-01T08:00:00.280+08:00| 1.1368000462849934|
  33. |1970-01-01T08:00:00.290+08:00| 1.1687000470161437|
  34. |1970-01-01T08:00:00.300+08:00| 1.2000000476837158|
  35. |1970-01-01T08:00:00.310+08:00| 1.2307000483103594|
  36. |1970-01-01T08:00:00.320+08:00| 1.2608000489139557|
  37. |1970-01-01T08:00:00.330+08:00| 1.2903000494873524|
  38. |1970-01-01T08:00:00.340+08:00| 1.3192000500233967|
  39. |1970-01-01T08:00:00.350+08:00| 1.3475000505149364|
  40. |1970-01-01T08:00:00.360+08:00| 1.3752000509548186|
  41. |1970-01-01T08:00:00.370+08:00| 1.402300051335891|
  42. |1970-01-01T08:00:00.380+08:00| 1.4288000516510009|
  43. |1970-01-01T08:00:00.390+08:00| 1.4547000518929958|
  44. |1970-01-01T08:00:00.400+08:00| 1.480000052054723|
  45. |1970-01-01T08:00:00.410+08:00| 1.5047000521290301|
  46. |1970-01-01T08:00:00.420+08:00| 1.5288000521087646|
  47. |1970-01-01T08:00:00.430+08:00| 1.5523000519867738|
  48. |1970-01-01T08:00:00.440+08:00| 1.575200051755905|
  49. |1970-01-01T08:00:00.450+08:00| 1.597500051409006|
  50. |1970-01-01T08:00:00.460+08:00| 1.619200050938924|
  51. |1970-01-01T08:00:00.470+08:00| 1.6403000503385066|
  52. |1970-01-01T08:00:00.480+08:00| 1.660800049600601|
  53. |1970-01-01T08:00:00.490+08:00| 1.680700048718055|
  54. |1970-01-01T08:00:00.500+08:00| 1.7000000476837158|
  55. |1970-01-01T08:00:00.510+08:00| 1.7188475466453037|
  56. |1970-01-01T08:00:00.520+08:00| 1.7373800457262996|
  57. |1970-01-01T08:00:00.530+08:00| 1.7555825448831923|
  58. |1970-01-01T08:00:00.540+08:00| 1.7734400440724702|
  59. |1970-01-01T08:00:00.550+08:00| 1.790937543250622|
  60. |1970-01-01T08:00:00.560+08:00| 1.8080600423741364|
  61. |1970-01-01T08:00:00.570+08:00| 1.8247925413995016|
  62. |1970-01-01T08:00:00.580+08:00| 1.8411200402832066|
  63. |1970-01-01T08:00:00.590+08:00| 1.8570275389817397|
  64. |1970-01-01T08:00:00.600+08:00| 1.8725000374515897|
  65. |1970-01-01T08:00:00.610+08:00| 1.8875225356492449|
  66. |1970-01-01T08:00:00.620+08:00| 1.902080033531194|
  67. |1970-01-01T08:00:00.630+08:00| 1.9161575310539258|
  68. |1970-01-01T08:00:00.640+08:00| 1.9297400281739288|
  69. |1970-01-01T08:00:00.650+08:00| 1.9428125248476913|
  70. |1970-01-01T08:00:00.660+08:00| 1.9553600210317021|
  71. |1970-01-01T08:00:00.670+08:00| 1.96736751668245|
  72. |1970-01-01T08:00:00.680+08:00| 1.9788200117564232|
  73. |1970-01-01T08:00:00.690+08:00| 1.9897025062101101|
  74. |1970-01-01T08:00:00.700+08:00| 2.0|
  75. |1970-01-01T08:00:00.710+08:00| 2.0097024933913334|
  76. |1970-01-01T08:00:00.720+08:00| 2.0188199867081615|
  77. |1970-01-01T08:00:00.730+08:00| 2.027367479995188|
  78. |1970-01-01T08:00:00.740+08:00| 2.0353599732971155|
  79. |1970-01-01T08:00:00.750+08:00| 2.0428124666586482|
  80. |1970-01-01T08:00:00.760+08:00| 2.049739960124489|
  81. |1970-01-01T08:00:00.770+08:00| 2.056157453739342|
  82. |1970-01-01T08:00:00.780+08:00| 2.06207994754791|
  83. |1970-01-01T08:00:00.790+08:00| 2.067522441594897|
  84. |1970-01-01T08:00:00.800+08:00| 2.072499935925006|
  85. |1970-01-01T08:00:00.810+08:00| 2.07702743058294|
  86. |1970-01-01T08:00:00.820+08:00| 2.081119925613404|
  87. |1970-01-01T08:00:00.830+08:00| 2.0847924210611|
  88. |1970-01-01T08:00:00.840+08:00| 2.0880599169707317|
  89. |1970-01-01T08:00:00.850+08:00| 2.0909374133870027|
  90. |1970-01-01T08:00:00.860+08:00| 2.0934399103546166|
  91. |1970-01-01T08:00:00.870+08:00| 2.0955824079182768|
  92. |1970-01-01T08:00:00.880+08:00| 2.0973799061226863|
  93. |1970-01-01T08:00:00.890+08:00| 2.098847405012549|
  94. |1970-01-01T08:00:00.900+08:00| 2.0999999046325684|
  95. |1970-01-01T08:00:00.910+08:00| 2.1005574051201332|
  96. |1970-01-01T08:00:00.920+08:00| 2.1002599065303778|
  97. |1970-01-01T08:00:00.930+08:00| 2.0991524087846245|
  98. |1970-01-01T08:00:00.940+08:00| 2.0972799118041947|
  99. |1970-01-01T08:00:00.950+08:00| 2.0946874155104105|
  100. |1970-01-01T08:00:00.960+08:00| 2.0914199198245944|
  101. |1970-01-01T08:00:00.970+08:00| 2.0875224246680673|
  102. |1970-01-01T08:00:00.980+08:00| 2.083039929962151|
  103. |1970-01-01T08:00:00.990+08:00| 2.0780174356281687|
  104. |1970-01-01T08:00:01.000+08:00| 2.0724999415874406|
  105. |1970-01-01T08:00:01.010+08:00| 2.06653244776129|
  106. |1970-01-01T08:00:01.020+08:00| 2.060159954071038|
  107. |1970-01-01T08:00:01.030+08:00| 2.053427460438006|
  108. |1970-01-01T08:00:01.040+08:00| 2.046379966783517|
  109. |1970-01-01T08:00:01.050+08:00| 2.0390624730288924|
  110. |1970-01-01T08:00:01.060+08:00| 2.031519979095454|
  111. |1970-01-01T08:00:01.070+08:00| 2.0237974849045237|
  112. |1970-01-01T08:00:01.080+08:00| 2.015939990377423|
  113. |1970-01-01T08:00:01.090+08:00| 2.0079924954354746|
  114. |1970-01-01T08:00:01.100+08:00| 2.0|
  115. |1970-01-01T08:00:01.110+08:00| 1.9907018211101906|
  116. |1970-01-01T08:00:01.120+08:00| 1.9788509124245144|
  117. |1970-01-01T08:00:01.130+08:00| 1.9645127287932083|
  118. |1970-01-01T08:00:01.140+08:00| 1.9477527250665083|
  119. |1970-01-01T08:00:01.150+08:00| 1.9286363560946513|
  120. |1970-01-01T08:00:01.160+08:00| 1.9072290767278735|
  121. |1970-01-01T08:00:01.170+08:00| 1.8835963418164114|
  122. |1970-01-01T08:00:01.180+08:00| 1.8578036062105014|
  123. |1970-01-01T08:00:01.190+08:00| 1.8299163247603802|
  124. |1970-01-01T08:00:01.200+08:00| 1.7999999523162842|
  125. |1970-01-01T08:00:01.210+08:00| 1.7623635841923329|
  126. |1970-01-01T08:00:01.220+08:00| 1.7129696477516976|
  127. |1970-01-01T08:00:01.230+08:00| 1.6543635959181928|
  128. |1970-01-01T08:00:01.240+08:00| 1.5890908816156328|
  129. |1970-01-01T08:00:01.250+08:00| 1.5196969577678319|
  130. |1970-01-01T08:00:01.260+08:00| 1.4487272772986044|
  131. |1970-01-01T08:00:01.270+08:00| 1.3787272931317647|
  132. |1970-01-01T08:00:01.280+08:00| 1.3122424581911272|
  133. |1970-01-01T08:00:01.290+08:00| 1.251818225400506|
  134. |1970-01-01T08:00:01.300+08:00| 1.2000000476837158|
  135. |1970-01-01T08:00:01.310+08:00| 1.1548000470995912|
  136. |1970-01-01T08:00:01.320+08:00| 1.1130667107899999|
  137. |1970-01-01T08:00:01.330+08:00| 1.0756000393033045|
  138. |1970-01-01T08:00:01.340+08:00| 1.043200033187868|
  139. |1970-01-01T08:00:01.350+08:00| 1.016666692992053|
  140. |1970-01-01T08:00:01.360+08:00| 0.9968000192642223|
  141. |1970-01-01T08:00:01.370+08:00| 0.9844000125527389|
  142. |1970-01-01T08:00:01.380+08:00| 0.9802666734059655|
  143. |1970-01-01T08:00:01.390+08:00| 0.9852000023722649|
  144. |1970-01-01T08:00:01.400+08:00| 1.0|
  145. |1970-01-01T08:00:01.410+08:00| 1.023999999165535|
  146. |1970-01-01T08:00:01.420+08:00| 1.0559999990463256|
  147. |1970-01-01T08:00:01.430+08:00| 1.0959999996423722|
  148. |1970-01-01T08:00:01.440+08:00| 1.1440000009536744|
  149. |1970-01-01T08:00:01.450+08:00| 1.2000000029802322|
  150. |1970-01-01T08:00:01.460+08:00| 1.264000005722046|
  151. |1970-01-01T08:00:01.470+08:00| 1.3360000091791153|
  152. |1970-01-01T08:00:01.480+08:00| 1.4160000133514405|
  153. |1970-01-01T08:00:01.490+08:00| 1.5040000182390214|
  154. |1970-01-01T08:00:01.500+08:00| 1.600000023841858|
  155. +-----------------------------+------------------------------------+

Spread

Usage

This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value.

Name: SPREAD

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Output Series: Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread.

Note: Missing points, null points and NaN in the input series will be ignored.
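
For readers who want to see how the result is derived, the following is a minimal sketch of the spread computation in Python. It is an illustration only, not the IoTDB implementation; `values` stands for the non-null points of the input series.

```python
import math

def spread(values):
    """Spread of a series: maximum value minus minimum value.
    NaN points are skipped, mirroring the Note above; null points
    are assumed to have been dropped before this call."""
    clean = [v for v in values if not math.isnan(v)]
    return max(clean) - min(clean)

# Values from the example below (the trailing NaN is ignored):
print(spread([100.0, 101.0, 102.0, 104.0, 126.0, 108.0, 112.0, 113.0,
              114.0, 116.0, 118.0, 120.0, 124.0, 126.0, float("nan")]))
# 26.0  (maximum 126.0 minus minimum 100.0)
```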

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:02.000+08:00| 100.0|
  5. |2020-01-01T00:00:03.000+08:00| 101.0|
  6. |2020-01-01T00:00:04.000+08:00| 102.0|
  7. |2020-01-01T00:00:06.000+08:00| 104.0|
  8. |2020-01-01T00:00:08.000+08:00| 126.0|
  9. |2020-01-01T00:00:10.000+08:00| 108.0|
  10. |2020-01-01T00:00:14.000+08:00| 112.0|
  11. |2020-01-01T00:00:15.000+08:00| 113.0|
  12. |2020-01-01T00:00:16.000+08:00| 114.0|
  13. |2020-01-01T00:00:18.000+08:00| 116.0|
  14. |2020-01-01T00:00:20.000+08:00| 118.0|
  15. |2020-01-01T00:00:22.000+08:00| 120.0|
  16. |2020-01-01T00:00:26.000+08:00| 124.0|
  17. |2020-01-01T00:00:28.000+08:00| 126.0|
  18. |2020-01-01T00:00:30.000+08:00| NaN|
  19. +-----------------------------+---------------+

SQL for query:

  1. select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30

Output series:

  1. +-----------------------------+-----------------------+
  2. | Time|spread(root.test.d1.s1)|
  3. +-----------------------------+-----------------------+
  4. |1970-01-01T08:00:00.000+08:00| 26.0|
  5. +-----------------------------+-----------------------+

Stddev

Usage

This function is used to calculate the population standard deviation.

Name: STDDEV

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Output Series: Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population standard deviation.

Note: Missing points, null points and NaN in the input series will be ignored.
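
For reference, the population standard deviation of $n$ points $x_1,\dots,x_n$ with mean $\bar{x}$ is

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}$$

For the series 1.0 through 20.0 in the example below, $\bar{x}=10.5$ and $\sigma=\sqrt{33.25}\approx 5.7663$, which agrees with the query output.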

Examples

Input series:

  1. +-----------------------------+---------------+
  2. | Time|root.test.d1.s1|
  3. +-----------------------------+---------------+
  4. |2020-01-01T00:00:00.000+08:00| 1.0|
  5. |2020-01-01T00:00:01.000+08:00| 2.0|
  6. |2020-01-01T00:00:02.000+08:00| 3.0|
  7. |2020-01-01T00:00:03.000+08:00| 4.0|
  8. |2020-01-01T00:00:04.000+08:00| 5.0|
  9. |2020-01-01T00:00:05.000+08:00| 6.0|
  10. |2020-01-01T00:00:06.000+08:00| 7.0|
  11. |2020-01-01T00:00:07.000+08:00| 8.0|
  12. |2020-01-01T00:00:08.000+08:00| 9.0|
  13. |2020-01-01T00:00:09.000+08:00| 10.0|
  14. |2020-01-01T00:00:10.000+08:00| 11.0|
  15. |2020-01-01T00:00:11.000+08:00| 12.0|
  16. |2020-01-01T00:00:12.000+08:00| 13.0|
  17. |2020-01-01T00:00:13.000+08:00| 14.0|
  18. |2020-01-01T00:00:14.000+08:00| 15.0|
  19. |2020-01-01T00:00:15.000+08:00| 16.0|
  20. |2020-01-01T00:00:16.000+08:00| 17.0|
  21. |2020-01-01T00:00:17.000+08:00| 18.0|
  22. |2020-01-01T00:00:18.000+08:00| 19.0|
  23. |2020-01-01T00:00:19.000+08:00| 20.0|
  24. +-----------------------------+---------------+

SQL for query:

  1. select stddev(s1) from root.test.d1

Output series:

  1. +-----------------------------+-----------------------+
  2. | Time|stddev(root.test.d1.s1)|
  3. +-----------------------------+-----------------------+
  4. |1970-01-01T08:00:00.000+08:00| 5.7662812973353965|
  5. +-----------------------------+-----------------------+

ZScore

Usage

This function is used to standardize the input series using the z-score.

Name: ZSCORE

Input Series: Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

Parameters:

  • compute: When set to “batch”, the mean and standard deviation used for standardization are computed from the whole series after all data points have been imported; when set to “stream”, the user must provide the mean and standard deviation in advance. The default value is “batch”.
  • avg: The mean used for standardization when compute is set to “stream”.
  • sd: The standard deviation used for standardization when compute is set to “stream”.

Output Series: Output a single series. The type is DOUBLE.
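
As a hint to how the numbers in the example are obtained, the following is a minimal sketch of batch z-score standardization in Python. It is an illustration only, not the IoTDB implementation; it assumes batch mode uses the population mean and standard deviation of the whole series, which is consistent with the example output below.

```python
import math

def zscore_batch(values):
    """Standardize a series: z = (x - mean) / sd, where mean and sd are the
    population mean and standard deviation over all points (batch mode).
    In stream mode the caller would supply mean and sd via the avg / sd
    parameters instead of computing them here."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# For the batch example below: mean = 0.5, sd ≈ 2.419, so 10.0 maps to ≈ 3.928
# and each 0.0 maps to ≈ -0.207, matching the query output.
```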

Examples

Batch computing

Input series:

  1. +-----------------------------+------------+
  2. | Time|root.test.s1|
  3. +-----------------------------+------------+
  4. |1970-01-01T08:00:00.100+08:00| 0.0|
  5. |1970-01-01T08:00:00.200+08:00| 0.0|
  6. |1970-01-01T08:00:00.300+08:00| 1.0|
  7. |1970-01-01T08:00:00.400+08:00| -1.0|
  8. |1970-01-01T08:00:00.500+08:00| 0.0|
  9. |1970-01-01T08:00:00.600+08:00| 0.0|
  10. |1970-01-01T08:00:00.700+08:00| -2.0|
  11. |1970-01-01T08:00:00.800+08:00| 2.0|
  12. |1970-01-01T08:00:00.900+08:00| 0.0|
  13. |1970-01-01T08:00:01.000+08:00| 0.0|
  14. |1970-01-01T08:00:01.100+08:00| 1.0|
  15. |1970-01-01T08:00:01.200+08:00| -1.0|
  16. |1970-01-01T08:00:01.300+08:00| -1.0|
  17. |1970-01-01T08:00:01.400+08:00| 1.0|
  18. |1970-01-01T08:00:01.500+08:00| 0.0|
  19. |1970-01-01T08:00:01.600+08:00| 0.0|
  20. |1970-01-01T08:00:01.700+08:00| 10.0|
  21. |1970-01-01T08:00:01.800+08:00| 2.0|
  22. |1970-01-01T08:00:01.900+08:00| -2.0|
  23. |1970-01-01T08:00:02.000+08:00| 0.0|
  24. +-----------------------------+------------+

SQL for query:

  1. select zscore(s1) from root.test

Output series:

  1. +-----------------------------+--------------------+
  2. | Time|zscore(root.test.s1)|
  3. +-----------------------------+--------------------+
  4. |1970-01-01T08:00:00.100+08:00|-0.20672455764868078|
  5. |1970-01-01T08:00:00.200+08:00|-0.20672455764868078|
  6. |1970-01-01T08:00:00.300+08:00| 0.20672455764868078|
  7. |1970-01-01T08:00:00.400+08:00| -0.6201736729460423|
  8. |1970-01-01T08:00:00.500+08:00|-0.20672455764868078|
  9. |1970-01-01T08:00:00.600+08:00|-0.20672455764868078|
  10. |1970-01-01T08:00:00.700+08:00| -1.033622788243404|
  11. |1970-01-01T08:00:00.800+08:00| 0.6201736729460423|
  12. |1970-01-01T08:00:00.900+08:00|-0.20672455764868078|
  13. |1970-01-01T08:00:01.000+08:00|-0.20672455764868078|
  14. |1970-01-01T08:00:01.100+08:00| 0.20672455764868078|
  15. |1970-01-01T08:00:01.200+08:00| -0.6201736729460423|
  16. |1970-01-01T08:00:01.300+08:00| -0.6201736729460423|
  17. |1970-01-01T08:00:01.400+08:00| 0.20672455764868078|
  18. |1970-01-01T08:00:01.500+08:00|-0.20672455764868078|
  19. |1970-01-01T08:00:01.600+08:00|-0.20672455764868078|
  20. |1970-01-01T08:00:01.700+08:00| 3.9277665953249348|
  21. |1970-01-01T08:00:01.800+08:00| 0.6201736729460423|
  22. |1970-01-01T08:00:01.900+08:00| -1.033622788243404|
  23. |1970-01-01T08:00:02.000+08:00|-0.20672455764868078|
  24. +-----------------------------+--------------------+