Information out: search and analyze

While you can use Elasticsearch as a document store and retrieve documents and their metadata, the real power comes from being able to easily access the full suite of search capabilities built on the Apache Lucene search engine library.

Elasticsearch provides a simple, coherent REST API for managing your cluster and indexing and searching your data. For testing purposes, you can easily submit requests directly from the command line or through the Developer Console in Kibana. From your applications, you can use the Elasticsearch client for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python or Ruby.
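As a rough sketch of what submitting a request looks like, the following builds (but does not send) a search request using only the Python standard library. The cluster address `http://localhost:9200` and the `employee` index are assumptions for illustration. In an application you would normally use the official client for your language instead.

```python
import json
import urllib.request


def build_search_request(base_url, index, body):
    """Build, without sending, a POST request to an index's _search endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local cluster address and index name.
req = build_search_request(
    "http://localhost:9200", "employee", {"query": {"match_all": {}}}
)

# With a cluster running at that address, this would send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```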

Searching your data

The Elasticsearch REST APIs support structured queries, full-text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the gender and age fields in your employee index and sort the matches by the hire_date field. Full-text queries find all documents that match the query string and return them sorted by relevance, that is, by how good a match they are for your search terms.
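To make the distinction concrete, here is a minimal sketch of the three kinds of request body, written as Python dictionaries that serialize to JSON. The `gender`, `age`, and `hire_date` fields come from the example above. The `about` field, the search text, and the specific values are hypothetical.

```python
def structured_query(gender, min_age, max_age):
    """Exact-value filters plus a sort, much like a SQL WHERE ... ORDER BY."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"gender": gender}},
                    {"range": {"age": {"gte": min_age, "lte": max_age}}},
                ]
            }
        },
        "sort": [{"hire_date": {"order": "desc"}}],
    }


def full_text_query(field, text):
    """Analyzed text match; hits come back ordered by relevance score."""
    return {"query": {"match": {field: text}}}


def combined_query(field, text, gender):
    """Full-text match scored by relevance, narrowed by a structured filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"term": {"gender": gender}}],
            }
        }
    }


# Hypothetical values for illustration.
body = structured_query("F", 30, 40)
```

Clauses under `filter` only include or exclude documents, while clauses under `must` also contribute to the relevance score.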

In addition to searching for individual terms, you can perform phrase searches, similarity searches, and prefix searches, and get autocomplete suggestions.
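The request bodies below sketch how these search types might look in Query DSL. The `title` and `title_suggest` fields and the search strings are hypothetical. The fuzzy query is only one possible reading of "similarity search", and the completion suggester assumes `title_suggest` is mapped as a completion field.

```python
# Phrase search: the terms must appear together, in this order.
phrase_search = {"query": {"match_phrase": {"title": "embroidery needle"}}}

# Prefix search: matches terms that begin with the given characters.
prefix_search = {"query": {"prefix": {"title": {"value": "embr"}}}}

# Fuzzy search: tolerates small spelling differences ("neddle" vs "needle").
fuzzy_search = {
    "query": {"fuzzy": {"title": {"value": "neddle", "fuzziness": "AUTO"}}}
}

# Autocomplete: a completion suggester keyed under an arbitrary name.
autocomplete = {
    "suggest": {
        "title_suggestions": {
            "prefix": "emb",
            "completion": {"field": "title_suggest"},
        }
    }
}
```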

Have geospatial or other numerical data that you want to search? Elasticsearch indexes non-textual data in optimized data structures that support high-performance geo and numerical queries.
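For example, a geo query and a numeric range query can be combined as filters in one request. This is a sketch under assumptions: the `location` field (assumed to be mapped as a geo point), the `price` field, and the coordinates are all hypothetical.

```python
def nearby_in_price_range(lat, lon, distance, min_price, max_price):
    """Filter to documents within `distance` of a point and inside a price band."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                    {"range": {"price": {"gte": min_price, "lt": max_price}}},
                ]
            }
        }
    }


# Hypothetical point and limits.
geo_body = nearby_in_price_range(40.7, -74.0, "10km", 10, 50)
```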

You can access all of these search capabilities using Elasticsearch’s comprehensive JSON-style query language (Query DSL). You can also construct SQL-style queries to search and aggregate data natively inside Elasticsearch, and JDBC and ODBC drivers enable a broad range of third-party applications to interact with Elasticsearch via SQL.
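As an illustration, the structured employee search described earlier could be expressed as a SQL statement and sent to the SQL REST endpoint (`_sql`). The sketch below only builds the JSON body. The index name, field names, and values are assumptions carried over from that example.

```python
import json


def sql_request_body(sql, fetch_size=100):
    """Body for a POST to /_sql?format=json; fetch_size caps rows per page."""
    return {"query": sql, "fetch_size": fetch_size}


# Hypothetical statement against the employee index.
sql_body = sql_request_body(
    "SELECT gender, age, hire_date FROM employee "
    "WHERE gender = 'F' AND age BETWEEN 30 AND 40 "
    "ORDER BY hire_date DESC",
    fetch_size=50,
)

payload = json.dumps(sql_body)  # what would be sent over HTTP
```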

Analyzing your data

Elasticsearch aggregations enable you to build complex summaries of your data and gain insight into key metrics, patterns, and trends. Instead of just finding the proverbial “needle in a haystack”, aggregations enable you to answer questions like:

  • How many needles are in the haystack?
  • What is the average length of the needles?
  • What is the median length of the needles, broken down by manufacturer?
  • How many needles were added to the haystack in each of the last six months?
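The four questions above map fairly directly onto a single aggregation request. The following is a sketch, not a definitive recipe: the `length`, `manufacturer`, and `added_at` fields are hypothetical, and the median is approximated with a 50th-percentile aggregation.

```python
haystack_summary = {
    "size": 0,                 # return no documents, only aggregation results
    "track_total_hits": True,  # exact total: how many needles are there?
    "aggs": {
        # Average needle length.
        "avg_length": {"avg": {"field": "length"}},
        # Median length per manufacturer (50th percentile inside each bucket).
        "by_manufacturer": {
            "terms": {"field": "manufacturer"},
            "aggs": {
                "median_length": {
                    "percentiles": {"field": "length", "percents": [50]}
                }
            },
        },
        # Needles added per calendar month, limited to the last six months.
        "last_six_months": {
            "filter": {"range": {"added_at": {"gte": "now-6M/M"}}},
            "aggs": {
                "per_month": {
                    "date_histogram": {
                        "field": "added_at",
                        "calendar_interval": "month",
                    }
                }
            },
        },
    },
}
```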

You can also use aggregations to answer more subtle questions, such as:

  • What are your most popular needle manufacturers?
  • Are there any unusual or anomalous clumps of needles?

Because aggregations leverage the same data structures used for search, they are also very fast. This enables you to analyze and visualize your data in real time. Your reports and dashboards update as your data changes so you can take action based on the latest information.

What’s more, aggregations operate alongside search requests. You can search documents, filter results, and perform analytics at the same time, on the same data, in a single request. And because aggregations are calculated in the context of a particular search, you’re not just displaying a count of all size 70 needles; you’re displaying a count of the size 70 needles that match your users’ search criteria (for example, all size 70 non-stick embroidery needles).
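A single request body can carry both the query and the aggregation. In this sketch the `description` and `size` fields are hypothetical. Because the aggregation only sees documents that match the query, the bucket for size 70 counts just the matching needles.

```python
def search_with_size_counts(user_text, page_size=10):
    """Return matching documents and per-size counts of those same matches."""
    return {
        "size": page_size,
        "query": {"match": {"description": user_text}},
        "aggs": {
            # Buckets are computed over the query's matches only.
            "needles_by_size": {"terms": {"field": "size"}}
        },
    }


needle_request = search_with_size_counts("non-stick embroidery")
```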

But wait, there’s more

Want to automate the analysis of your time series data? You can use machine learning features to create accurate baselines of normal behavior in your data and identify anomalous patterns. With machine learning, you can detect:

  • Anomalies related to temporal deviations in values, counts, or frequencies
  • Statistical rarity
  • Unusual behaviors for a member of a population

And the best part? You can do this without having to specify algorithms, models, or other data science-related configurations.
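To give a feel for how little configuration is involved, here is a sketch of an anomaly detection job body with one detector for each kind of anomaly listed above. The job would be created by sending this body to the machine learning job API. The field names (`status_code`, `bytes`, `client_ip`, `timestamp`) and the 15-minute bucket span are hypothetical.

```python
anomaly_job = {
    "analysis_config": {
        "bucket_span": "15m",  # width of each time bucket that gets analyzed
        "detectors": [
            # Temporal deviation: unusual event counts compared with history.
            {"function": "count"},
            # Statistical rarity: values of a field that are rarely seen.
            {"function": "rare", "by_field_name": "status_code"},
            # Population analysis: one member behaving unlike its peers.
            {
                "function": "mean",
                "field_name": "bytes",
                "over_field_name": "client_ip",
            },
        ],
    },
    "data_description": {"time_field": "timestamp"},
}
```

Each detector names a function and the fields to analyze. No algorithm or model parameters appear in the configuration.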