Examine the Tweets and Train a Model

2018-04-15 22:39:49

Part 2: Examine Tweets and Train a Model


The second program examines the tweet data and trains a language classifier by clustering the tweets with K-Means:

Examine - Spark SQL is used to gather data about the tweets: to display a few of them and to count the tweets for each of the most common user languages.
Train - Spark MLlib is used to apply the K-Means algorithm to cluster the tweets. The number of clusters and the number of iterations of the algorithm are configurable. After the model is trained, a few sample tweets from the different clusters are shown.
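The counting half of the Examine step is a grouped aggregation: group the tweets by the user's language, count each group, and keep the largest counts. The sketch below reproduces that logic in plain Python (no Spark) so it runs on its own. The record layout with `text` and `user_lang` keys, the sample tweets, and the `top_languages` helper are all hypothetical, and the SQL in the docstring only approximates what such a query could look like; it is not the program's actual query.

```python
from collections import Counter

# Hypothetical sample records; the real program queries previously collected tweets.
tweets = [
    {"text": "Hello world", "user_lang": "en"},
    {"text": "Hola a todos", "user_lang": "es"},
    {"text": "Good morning", "user_lang": "en"},
    {"text": "Bonjour", "user_lang": "fr"},
    {"text": "Buenos días", "user_lang": "es"},
    {"text": "See you soon", "user_lang": "en"},
]


def top_languages(records, n):
    """Return the n most common user languages with their tweet counts.

    Roughly the same result as a query like:
      SELECT user_lang, COUNT(*) AS cnt FROM tweets
      GROUP BY user_lang ORDER BY cnt DESC LIMIT n
    """
    counts = Counter(r["user_lang"] for r in records)
    return counts.most_common(n)


# Look at a few tweets, then print the per-language counts.
for r in tweets[:3]:
    print(r["text"])
for lang, cnt in top_languages(tweets, 2):
    print(lang, cnt)
```

`Counter.most_common` sorts by count in descending order, which plays the role of `ORDER BY ... DESC LIMIT n`; in the actual program the same aggregation runs through Spark SQL so it scales to the full tweet collection.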
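The Train step can be illustrated without a Spark cluster. The sketch below is plain Python, not the MLlib API. It turns each tweet into a vector of hashed character-bigram counts (a featurization assumed here for illustration, since this section does not describe the program's own features) and runs Lloyd's K-Means with the two configurable knobs: the number of clusters `k` and `max_iterations`. The names `featurize`, `train`, `predict`, and `NUM_FEATURES` are hypothetical. For simplicity it seeds the centroids with the first `k` points, whereas MLlib's `KMeans` defaults to k-means|| initialization.

```python
import zlib

NUM_FEATURES = 1000  # hypothetical size of the hashed feature space


def featurize(text, num_features=NUM_FEATURES):
    """Count the character bigrams of `text`, hashed into a fixed-length vector."""
    vec = [0.0] * num_features
    for i in range(len(text) - 1):
        bigram = text[i:i + 2]
        # zlib.crc32 is stable across runs, unlike Python's salted str hash().
        vec[zlib.crc32(bigram.encode("utf-8")) % num_features] += 1.0
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(centroids, point):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def train(points, k, max_iterations):
    """Lloyd's K-Means: returns a list of k centroids."""
    if len(points) < k:
        raise ValueError("need at least k points")
    # Simplest deterministic seeding: the first k points become the centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: every point joins its nearest centroid.
        new_assignments = [predict(centroids, p) for p in points]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical sample tweets; the real program clusters the collected tweet text.
tweets = [
    "hello there my friend",
    "hola amigo que tal",
    "hello again my friend",
    "hola que pasa amigo",
]
points = [featurize(t) for t in tweets]
centroids = train(points, k=2, max_iterations=20)

# Show a few sample tweets from each cluster.
for c in range(len(centroids)):
    samples = [t for t, p in zip(tweets, points) if predict(centroids, p) == c]
    print(c, samples[:2])
```

Character bigrams are a plausible language signal because each language favors its own letter pairs, so tweets in the same language tend to fall near the same centroid. With only a handful of short tweets, though, a clean split by language is not guaranteed.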

See the run instructions for the command to run Part 2.

