SQL Like Search Queries With Elasticsearch

E

lasticsearch is an amazing piece of technology. Built on top of luecine it offers all of he incredible search facilities that you'd expect from a full featured search. What makes elasticsearch so powerful, however, is the fact that it stores the actual data that was originally index as JSON documents. Basically, it is a Full Text Search Database more so than a search engine. This allows elastic search to do things that other search engines can't do like aggregations, scripted queries, multi-query searches, etc; All in addition to the expected searching capabilities like suggestions, spelling corrections, faceting, and so on. For these reasons people are using elasticsearch as the primary data store for massive amounts of data.

One thing that is also uniquely different with elasticsearch is that it's query language is just JSON objects. You'll see examples like this:

{
  "query": {
    "match" : {
      "message" : "this is a test"
    }
  }
}

This is a pretty simple search query. match is a full text search that support wildcard matching, partial word matching, fuzzy matching and the like. Many queries work this way and this is great when your search criteria are broad. Very much like how you would expect any search engine to work today. If you are using elasticsearch as a primary data store, you'll want to do more complicated things like you are probably used to doing with SQL. At first glance it is not apparently obvious that elasticsearch can do this. However, the query DSL has a fantastic gem of a query type - compound bool that lets you combine multiple query fragments together. The meat of it looks like this:

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": []
          "should": []
        }
      }
    }
  }
}

Search (sərCH) -v, verb., -n, noun;

  1. try to find something by looking or otherwise seeking carefully and thoroughly
  2. an act of searching for someone or something

The two important parts of the bool query, are the should and must. These keys have very SQL-like behaviors behind them.

  • should <> sql OR
  • must <> sql AND

Let's take a look at some more concrete examples.

Shirt Document

Lets say we have an index of shirts for some made up e-commerce application. It's a basic data set, but we want to let our uses find exactly what they are looking for with laser like precision.

{
  "brand": "Gildan"
, "size": "XL"
, "color": "black"
, "type": "long-sleeve"
, "price": 19.00
, "gender": "men"
, "age": "adult"
}, {
  "brand": "Hanes"
, "size": "M"
, "color": "purple"
, "type": "hooded-sweatshrt"
, "price": 22.50
, "gender": "female"
, "age": "toddler"
}

If we want to find all of the Large mens apparel we could add 2 term queries to the must block.

{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [{
            "term": {
              "size": "L"
            }
          }, { 
            "term": {
              "gender": "men"
            }
          }]
        }
      }
    }
  }
}

This translates to SQL almost literally, and will return documents for Men's Large apparel.

SELECT *  
FROM shirt  
WHERE (size = 'L' AND  gender = 'men')  

The queries are going to get a little complex, so we can leave off the query envelope bits and deal with the filter block itself. From here, if we just move the query fragments from the must block to the should block and we'll a different query.

Elasticsearch

{
  "bool": {
    "should": [{
      "term": {
        "size": "L"
      }
    }, {
      "term": {
        "gender": "men"
      }
    }]
  }
}

SQL

This will return documents for apparel of size Large, OR apparel for men

SELECT *  
FROM shirt  
WHERE (size = 'L' OR  gender = 'men')  

Red Pill (rəd' pil) -n., --noun;

Take the red pill - you stay in wonderland and we shall see how deep the rabbit hole goes

  1. A dose of the truth
  2. An eye opening experience

Simple enough, and there is a lot that you can do with this. Now, there is one simple fact that isn't apparently obvious, and it is a bit of a red pill. The should and must arrays take any number of query fragments. Bool is a query fragment. This means they are nest-able, and the rabbit hold runs deep.

Elasticsearch

{
  "bool": {
    "should": [{
      "bool": {
        "must": [{
          "term": {
            "color": "black"
          }
        , {
            "term": {
              "gender": "men"
            }
          }
        }]
      }
    , {
        "bool": {
          "must": [{
            "term": {
              "size": "XL"
            }
          , {
              "term": {
                "type": "hooded-sweatshirt"
              }
            }
          }]
        }      
      }
    }]
  }
}

SQL

SELECT *  
FROM shirt  
WHERE  (  
  ( color = 'black' AND gender = 'male') 
  OR
  ( size = 'XL' AND type = 'hooded-sweatshirt')
)

You aren't limited to single term queries either. You can use any legal query - terms, multi_match, range, etc., at any level.

Elasticsearch

{
  "bool": {
    "should" [{
      "bool": {
        "must": [{
          "terms": {
            "color": [ "red", "black", "white"]
          }
        }, {
          "term": {
            "gender": "men"
          }
        }, {
          "terms": {
            "type": ["hooded-sweatshirts", "long-sleeve"]
          }
        }]
      }      
    }, {
      "bool": {
        "must": [{
          "term": {
            "color": "blue"
          }
        }]
      , "should": [{
          "bool": {
            "must": [{
              "gender": "female"
            }, {
              "type": "hooded-sweatshirt"
            }, {
              "range": {
                "lte": 45.00
              }
            }]
          }
        }, {
          "bool": {
            "must": [{
              "term": {
                "age": "toddler"
              }
            }, {
              "term": {
                "brand": "Hanes"
              }
            }]
          }
        }]
      }
    }]
  }
}

Complex, but still legal. This translates rather easily and still does what you think it would.

SQL

SELECT *  
FROM shirt  
WHERE (  
    ( 
      color IN ('red', 'black', 'white') 
      AND
        gender = 'men'
      AND
        type IN ('hooded-sweatshirt', 'long-sleeve')
    )
    OR 
    (
      (color = 'blue')
      AND (
        ( gender = 'female' AND type = 'hooded-sweatshirt' AND price <= 45.00)
        OR 
        (age = 'todler' AND brand = 'Hanes')
      )
    )
)

Crazy! Moreover, these are still search queries using your indexed content. Your queries will return in under ~10ms over millions of records. It's power and speed make elasticsearch one of most popular the data stores of choice for real-time big data analytics, monitoring and sensor data aggregations and big commerce.

Elasticsearch - It's pretty good.

elasticsearch sql