Running the Application
There are many flavors of time series data. Some can be windowed in the stream, others can not be windowed in the stream
because queries are not by time slice but by specific year,month,day,hour. Spark Streaming lets you do both.
Cassandra in particular is excellent for time series data, working with raw data, transformations with Spark to aggregate data,
and so forth. In some cases, using Spark with Cassandra (and the right data model) reduces the number of Spark
transformations necessary on your data because Cassandra does that for you in its cluster.
When using Apache Spark & Apache Cassandra together, it is best practice to co-locate Spark and Cassandra nodes for data-locality and decreased network calls, resulting in overall reduced latency.
Setup
Download and install the latest Cassandra release
- Configuration Step: Modify apache-cassandra-{latest.version}/conf/cassandra.yaml to increase batch_size_warn_threshold_in_kb to 64.
Start Cassandra.
./apache-cassandra-{latest.version}/bin/cassandra -f
Note: If you get an error - you may need to prepend with sudo, or chown /var/lib/cassandra.
Run the setup cql scripts to create the schema and populate the weather stations table.
Go to the timeseries data folder and start a cqlsh sell there:% cd /path/to/reference-apps/timeseries/scala/data
% /path/to/apache-cassandra-{latest.version}/bin/cqlsh
You should see:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh {latest.version} | Cassandra {latest.version} | CQL spec {latest.version} | Native protocol {latest.version}]
Use HELP for help.
cqlsh>
Then run the script:
cqlsh> source ‘create-timeseries.cql’;
cqlsh> quit;
See this Github repo to find out more about the weather stations table data.
Running the WeatherApp and WeatherClientApp
To Run from an IDE
- Start com.databricks.apps.WeatherApp
- Then start com.databricks.apps.WeatherClientApp
To Run from Command Line
- Use SBT to run the app.
You should see:% cd /path/to/reference-apps/timeseries/scala
% sbt weather/run
Select option 1 to open the weather app.Multiple main classes detected, select one to run:
[1] com.databricks.apps.WeatherApp
[2] com.databricks.apps.WeatherClientApp
- Use SBT to run the client app.
Run the same commands above, but select option 2.
About The Time Series Data Model
See this github repo to find out more about the Time Series Data Model