Geo Search
Table of Contents
Introduction
CrateDB can be used to store and query geographical information of many kinds using the geo_point and geo_shape types. With these it is possible to store geographical locations, ways, shapes, areas and other entities. These can be queried for distance, containment, intersection and so on, making it possible to create apps and services with rich geographical features.
Geographic shapes are stored using special indices. ref:Geographic points <geo_point_data_type> are represented by their coordinates. They are represented as columns of the respective datatypes.
Geographic indices for geo_shape columns are used in order to speed up geographic searches even on complex shapes. This indexing process results in a representation that is not exact (See geo_shape for details). CrateDB does not operate on vector shapes but on a kind of a grid with the given precision as resolution.
Creating tables containing geographic information is straightforward:
cr> CREATE TABLE country (
... name string,
... country_code string primary key,
... shape geo_shape INDEX USING "geohash" WITH (precision='100m'),
... capital string,
... capital_location geo_point
... ) WITH (number_of_replicas=0);
CREATE OK, 1 row affected (... sec)
This table will contain the shape of a country and the location of its capital alongside with other metadata. The shape is indexed with a maximum precision of 100 meters using a geohash
index (For more information, see Geo Shape Index Structure).
Let’s insert Austria:
cr> INSERT INTO country (mame, country_code, shape, capital, capital_location)
... VALUES (
... 'Austria',
... 'at',
... {type='Polygon', coordinates=[
... [[16.979667, 48.123497], [16.903754, 47.714866],
... [16.340584, 47.712902], [16.534268, 47.496171],
... [16.202298, 46.852386], [16.011664, 46.683611],
... [15.137092, 46.658703], [14.632472, 46.431817],
... [13.806475, 46.509306], [12.376485, 46.767559],
... [12.153088, 47.115393], [11.164828, 46.941579],
... [11.048556, 46.751359], [10.442701, 46.893546],
... [9.932448, 46.920728], [9.47997, 47.10281],
... [9.632932, 47.347601], [9.594226, 47.525058],
... [9.896068, 47.580197], [10.402084, 47.302488],
... [10.544504, 47.566399], [11.426414, 47.523766],
... [12.141357, 47.703083], [12.62076, 47.672388],
... [12.932627, 47.467646], [13.025851, 47.637584],
... [12.884103, 48.289146], [13.243357, 48.416115],
... [13.595946, 48.877172], [14.338898, 48.555305],
... [14.901447, 48.964402], [15.253416, 49.039074],
... [16.029647, 48.733899], [16.499283, 48.785808],
... [16.960288, 48.596982], [16.879983, 48.470013],
... [16.979667, 48.123497]]
... ]},
... 'Vienna',
... [16.372778, 48.209206]
... );
INSERT OK, 1 row affected (... sec)
Caution
Geoshapes has to be fully valid by ISO 19107. If you have problems importing geo data, they may not be fully valid. In most cases they could be repaired using this tool: https://github.com/tudelft3d/prepair
Note
When using a polygon shape that resembles a rectangle, and that rectangle is wider than 180 degrees the CrateDB geoshape validator will convert it into a multipolygon consisting of 2 rectangular shapes covering the narrower area between the 4 original points split by the dateline (+/- 180deg).
This is due to CrateDB operating in the geospatial context of the earth.
Geographic points can be inserted as double array with lon and lat as seen above or as WKT string.
Geographic shapes can be inserted as GeoJSON object literal or parameter as seen above and as WKT string.
When it comes to get some meaningful insights into your geographical data CrateDB supports different kinds of geographic queries.
Fast queries that leverage the geographic index are done using the Match Predicate:
Match Predicate
The MATCH
predicate can be used to perform multiple kinds of searches on indices or indexed columns. While it can be used to perform fulltext searches on analyzed indices of type string, it is also handy for operating on geographic indices, querying for relations between geographical shapes and points.
MATCH (column_ident, query_term) [ using match_type ]
The Match predicate for geographical search supports a single column_ident
of a geo_shape indexed column as first argument.
The second argument, the query_term
is taken to match against the indexed geo_shape.
The matching operation is determined by the match_type
which determines the spatial relation we want to match. Available match_types
are:
intersects: | (Default) If the two shapes share some points and/or area, they are intersecting and considered matching using this match_type . This also precludes containment or complete equality. |
---|---|
disjoint: | If the two shapes share no single point and not area, they are disjoint. This is the opposite of intersects . |
within: | If the indexed column_ident shape is completely inside the query_term shape they are considered matching using this match_type . |
Note
The MATCH
predicate can only be used in the WHERE Clause and on user-created tables. Using the MATCH
predicate on system tables is not supported.
One MATCH
predicate cannot combine columns of both relations of a join.
Additionally, MATCH
predicates cannot be used on columns of both relations of a join if they cannot be logically applied to each of them separately. For example:
This is allowed:
FROM t1, t2
WHERE match(t1.shape, 'POINT(1.1 2.2)')
AND match(t2.shape, 'POINT(3.3 4.4)')
But this is not:
FROM t1, t2
WHERE match(t1.shape, 'POINT(1.1 2.2)')
OR match(t2.shape, 'POINT(3.3 4.4)')``
Having a table countries
with a GEO_SHAPE
column “geo” indexed using geohash
you can query that column using the MATCH
predicate with different match types as described above:
cr> SELECT name from countries
... WHERE match("geo",
... 'LINESTRING (13.3813 52.5229, 11.1840 51.5497, 8.6132 50.0782, 8.3715 47.9457, 8.5034 47.3685)'
... );
+---------+
| name |
+---------+
| Germany |
+---------+
SELECT 1 row in set (... sec)
cr> SELECT name from countries
... WHERE match("geo",
... 'LINESTRING (13.3813 52.5229, 11.1840 51.5497, 8.6132 50.0782, 8.3715 47.9457, 8.5034 47.3685)'
... ) USING disjoint
... ORDER BY name;
+--------------+
| name |
+--------------+
| Austria |
| France |
| South Africa |
| Turkey |
+--------------+
SELECT 4 rows in set (... sec)
Exact Queries
Exact queries are done using the following scalar functions:
They are exact, but this comes at the price of performance.
They do not make use of the index but work on the GeoJSON that was inserted to compute the shape vector. This access is quite expensive and may significantly slow down your queries.
For fast querying, use the Match Predicate.
But executed on a limited result set, they will help you get precise insights into your geographic data:
cr> SELECT within(capital_location, shape) AS capital_in_country
... FROM country;
+--------------------+
| capital_in_country |
+--------------------+
| TRUE |
+--------------------+
SELECT 1 row in set (... sec)
cr> SELECT distance(capital_location, 'POINT(0.0 90.0)') as from_northpole
... FROM country ORDER BY country_code;
+--------------------+
| from_northpole |
+--------------------+
| 4646930.6709925765 |
+--------------------+
SELECT 1 row in set (... sec)
cr> SELECT intersects(
... {type='LineString', coordinates=[[13.3813, 52.5229],
... [11.1840, 51.5497],
... [8.6132, 50.0782],
... [8.3715, 47.9457],
... [8.5034, 47.3685]]},
... shape) as berlin_zurich_intersects
... FROM country ORDER BY country_code;
+--------------------------+
| berlin_zurich_intersects |
+--------------------------+
| FALSE |
+--------------------------+
SELECT 1 row in set (... sec)
Nonetheless these scalars can be used everywhere in a SQL query where scalar functions are allowed.