Sample Dataset

The previous steps used only a few movies. Now import a larger dataset:

  • More movies to discover more queries.
  • Theaters to discover the geospatial capabilities.
  • Users to do some aggregations.

Dataset Description

Movies

The file sample-app/redisearch-docker/dataset/import_movies.redis is a script that creates 922 Hashes.

The movie hashes contain the following fields.

  • movie:id : The unique ID of the movie, internal to this database (used as the key of the hash)
  • title : The title of the movie.
  • plot : A summary of the movie.
  • genre : The genre of the movie; for now, each movie has a single genre.
  • release_year : The year the movie was released as a numerical value.
  • rating : A numeric value representing the public’s rating for this movie.
  • votes : Number of votes.
  • poster : Link to the movie poster.
  • imdb_id : The ID of the movie in the IMDb database.

Sample Data: movie:343
Field         Value
title         Spider-Man
plot          When bitten by a genetically modified spider a nerdy shy and awkward high school student gains spider-like abilities that he eventually must use to fight evil as a superhero after tragedy befalls his family.
genre         Action
release_year  2002
rating        7.3
votes         662219
poster        https://m.media-amazon.com/images/M/MV5BZDEyN2NhMjgtMjdhNi00MmNlLWE5YTgtZGE4MzNjMTRlMGEwXkEyXkFqcGdeQXVyNDUyOTg3Njg@._V1_SX300.jpg
imdb_id       tt0145487
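
Redis hashes store every field value as a string, so numeric fields such as release_year, rating, and votes come back as text when read by a client. The following is a minimal Python sketch of converting a raw movie hash into typed values. The Movie class and the movie_from_hash helper are hypothetical names used only for illustration, and the input dict stands in for whatever a client library would return.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    """Typed view of a movie hash (hypothetical helper, not part of the sample app)."""
    title: str
    genre: str
    release_year: int
    rating: float
    votes: int
    imdb_id: str


def movie_from_hash(raw: dict) -> Movie:
    """Convert the string values of a movie hash into Python types."""
    return Movie(
        title=raw["title"],
        genre=raw["genre"],
        release_year=int(raw["release_year"]),
        rating=float(raw["rating"]),
        votes=int(raw["votes"]),
        imdb_id=raw["imdb_id"],
    )


# Values copied from the movie:343 sample (plot and poster omitted for brevity).
raw_movie = {
    "title": "Spider-Man",
    "genre": "Action",
    "release_year": "2002",
    "rating": "7.3",
    "votes": "662219",
    "imdb_id": "tt0145487",
}
movie = movie_from_hash(raw_movie)
print(movie)
```

Converting once at the edge of the application keeps later comparisons and arithmetic on these fields free of string handling.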

Theaters

The file sample-app/redisearch-docker/dataset/import_theaters.redis is a script that creates 117 Hashes (used for geospatial queries). This dataset lists New York theaters rather than movie theaters, which does not matter for this project.

The theater hashes contain the following fields.

  • theater:id : The unique ID of the theater, internal to this database (used as the key of the hash)
  • name : The name of the theater.
  • address : The street address.
  • city : The city; in this sample dataset all the theaters are in New York.
  • zip : The zip code.
  • phone : The phone number.
  • url : The URL of the theater.
  • location : The longitude and latitude, stored as a single comma-separated "longitude,latitude" string, used to create the geo-indexed field.

Sample Data: theater:20
Field     Value
name      Broadway Theatre
address   1681 Broadway
city      New York
zip       10019
phone     212 944-3700
url       http://www.shubertorganization.com/theatres/broadway.asp
location  -73.98335054631019,40.763270202723625
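
The location value puts the longitude first and the latitude second, separated by a comma. A minimal Python sketch of splitting such a value into numbers is shown here. The parse_location name is hypothetical, and the range checks use the generic geographic limits (±180 for longitude, ±90 for latitude), which is an assumption and may be looser than what the server accepts.

```python
from typing import Tuple


def parse_location(value: str) -> Tuple[float, float]:
    """Split a "longitude,latitude" string into floats and check both ranges."""
    lon_text, lat_text = value.split(",")
    lon, lat = float(lon_text), float(lat_text)
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    return lon, lat


# Value copied from the theater:20 sample.
lon, lat = parse_location("-73.98335054631019,40.763270202723625")
print(f"longitude={lon} latitude={lat}")
```

Note that the range checks cannot detect swapped coordinates when both numbers fit either range, so the field order still has to be right at the source.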

Users

The file sample-app/redisearch-docker/dataset/import_users.redis is a script that creates 5996 Hashes.

The user hashes contain the following fields.

  • user:id : The unique ID of the user.
  • first_name : The first name of the user.
  • last_name : The last name of the user.
  • email : The email of the user.
  • gender : The gender of the user (female/male).
  • country : The country name of the user.
  • country_code : The country code of the user.
  • city : The city of the user.
  • longitude : The longitude of the user.
  • latitude : The latitude of the user.
  • last_login : The last login time for the user, as a Unix epoch timestamp.
  • ip_address : The IP address of the user.

Sample Data: user:3233
Field         Value
first_name    Rosetta
last_name     Olyff
email         rolyff6g@163.com
gender        female
country       China
country_code  CN
city          Huangdao
longitude     120.04619
latitude      35.872664
last_login    1570386621
ip_address    218.47.90.79
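
The last_login field is easier to read once converted to a calendar date. The sketch below does that in Python, assuming the value counts seconds (not milliseconds) since the Unix epoch, which is consistent with the 10-digit sample value. The login_time helper is a hypothetical name.

```python
from datetime import datetime, timezone


def login_time(last_login: str) -> datetime:
    """Convert an epoch-seconds string (assumed unit) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(last_login), tz=timezone.utc)


# Value copied from the user:3233 sample.
when = login_time("1570386621")
print(when.isoformat())  # 2019-10-06T18:30:21+00:00
```

Keeping the raw number in the hash is still useful: a numeric field can be indexed as NUMERIC and filtered by range, while the conversion above is only needed for display.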


Importing the Movies, Theaters and Users

Before importing the data, flush the database. Note that FLUSHALL deletes every key in all databases on the server:

  > FLUSHALL

The easiest way to import the files is to use redis-cli with the following terminal commands:

  $ redis-cli -h localhost -p 6379 < ./sample-app/redisearch-docker/dataset/import_movies.redis
  $ redis-cli -h localhost -p 6379 < ./sample-app/redisearch-docker/dataset/import_theaters.redis
  $ redis-cli -h localhost -p 6379 < ./sample-app/redisearch-docker/dataset/import_users.redis

Using Redis Insight or redis-cli, you can inspect the dataset:

  > HMGET "movie:343" title release_year genre
  1) "Spider-Man"
  2) "2002"
  3) "Action"
  > HMGET "theater:20" name location
  1) "Broadway Theatre"
  2) "-73.98335054631019,40.763270202723625"
  > HMGET "user:343" first_name last_name last_login
  1) "Umeko"
  2) "Castagno"
  3) "1574769122"

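You can also use the DBSIZE command to see how many keys are in your database. After a clean import into a flushed database, it should report 7035 keys (922 movies + 117 theaters + 5996 users).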


Create Indexes

Create the idx:movie index:

  > FT.CREATE idx:movie ON hash PREFIX 1 "movie:" SCHEMA title TEXT SORTABLE plot TEXT WEIGHT 0.5 release_year NUMERIC SORTABLE rating NUMERIC SORTABLE votes NUMERIC SORTABLE genre TAG SORTABLE
  "OK"

The movies have now been indexed. Run the FT.INFO "idx:movie" command and check the returned num_docs value, which should be 922.
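
When read over the classic RESP2 protocol, FT.INFO replies with a flat list of alternating attribute names and values, so num_docs has to be picked out of that list. The Python sketch below pairs the items up and checks the count. The sample reply is a hand-written, heavily abbreviated stand-in rather than captured output, the helper names are hypothetical, and some client libraries may already return a mapping, in which case the pairing step is unnecessary.

```python
def info_to_dict(flat_reply: list) -> dict:
    """Pair up alternating name/value items from a flat FT.INFO-style reply."""
    names = flat_reply[0::2]
    values = flat_reply[1::2]
    return dict(zip(names, values))


def has_expected_docs(info: dict, expected: int) -> bool:
    """Return True when the index reports exactly the expected document count."""
    return int(info["num_docs"]) == expected


# Hand-written, abbreviated stand-in for an FT.INFO reply (not real output).
sample_reply = ["index_name", "idx:movie", "num_docs", "922"]
info = info_to_dict(sample_reply)
print("idx:movie complete:", has_expected_docs(info, 922))
```

A count lower than expected right after FT.CREATE can simply mean that indexing of existing hashes is still in progress, so it is worth re-running the check before investigating further.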

Create the idx:theater index:

This index will mostly be used to show the geospatial capabilities of RediSearch.

The previous examples created indexes with three field types:

  • Text
  • Numeric
  • Tag

You will now discover a new type of field: Geo.

The theater hashes contain a location field holding the longitude and latitude, which is used in the index as follows:

  > FT.CREATE idx:theater ON hash PREFIX 1 "theater:" SCHEMA name TEXT SORTABLE location GEO
  "OK"

The theaters have been indexed. Run the FT.INFO "idx:theater" command and check the returned num_docs value, which should be 117.
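
A GEO field lets queries filter documents by their distance from a point. To illustrate the kind of distance involved, the Python sketch below computes a great-circle distance with the haversine formula. This is an illustration only: how RediSearch computes distances internally is not covered on this page, the query point is an approximate location near Times Square assumed for the example, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lon1, lat1, lon2, lat2, radius_km: float) -> bool:
    """Return True when the two points are at most radius_km apart."""
    return haversine_km(lon1, lat1, lon2, lat2) <= radius_km


# Broadway Theatre, from the theater:20 sample (longitude first, then latitude).
theater_lon, theater_lat = -73.98335054631019, 40.763270202723625
# Approximate query point near Times Square (assumed for illustration).
query_lon, query_lat = -73.9855, 40.7580

distance = haversine_km(query_lon, query_lat, theater_lon, theater_lat)
print(f"distance: {distance:.2f} km")
print("within 1 km:", within_radius(query_lon, query_lat, theater_lon, theater_lat, 1.0))
```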

Create the idx:user index:

  > FT.CREATE idx:user ON hash PREFIX 1 "user:" SCHEMA gender TAG country TAG SORTABLE last_login NUMERIC SORTABLE location GEO
  "OK"
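
The idx:user schema declares a location GEO field, whereas the user hashes described earlier store longitude and latitude as separate fields. If the imported hashes do lack a combined field (an assumption worth checking against the import script), the GEO attribute would have nothing to index for them. The Python sketch below shows one way a combined "longitude,latitude" value, and a matching HSET command, could be derived. The helper names are hypothetical and the sketch only builds strings; it does not talk to Redis.

```python
def make_location(longitude: str, latitude: str) -> str:
    """Join longitude and latitude into a "longitude,latitude" string."""
    # Converting to float rejects non-numeric input early; the original text is kept as-is.
    float(longitude)
    float(latitude)
    return f"{longitude},{latitude}"


def hset_location_command(key: str, user: dict) -> str:
    """Build an HSET command that adds a combined location field to a user hash."""
    location = make_location(user["longitude"], user["latitude"])
    return f'HSET "{key}" location "{location}"'


# Values copied from the user:3233 sample.
sample_user = {"longitude": "120.04619", "latitude": "35.872664"}
command = hset_location_command("user:3233", sample_user)
print(command)
```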

Next: Querying the movie database