Search Operationsedit

Well…it isn’t called elasticsearch for nothing! Let’s talk about search operations in the PHP client.

The client gives you full access to every query and parameter exposed by the REST API, following the naming scheme as much as possible. Let’s look at a few examples so you can become familiar with the syntax.

Match Queryedit

Here is a standard curl for a Match query:

curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
    "query" : {
        "match" : {
            "testField" : "abc"
        }
    }
}'

And here is the same query constructed in the client:

$params['index'] = 'my_index';
$params['type']  = 'my_type';
$params['body']['query']['match']['testField'] = 'abc';

$results = $client->search($params);

The search results that come back are simply elasticsearch response elements serialized into an array. Working with the search results is as simple as iterating over the array values:

$milliseconds = $results['took'];
$maxScore     = $results['hits']['max_score'];

$score = $results['hits']['hits'][0]['_score'];
$doc   = $results['hits']['hits'][0]['_source'];

Bool Queriesedit

Bool queries can be easily constructed using the client. For example, this query:

curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
    "query" : {
        "bool" : {
            "must": {
                "match" : {
                    "testField" : "abc"
                },
                "match" : {
                    "anotherTestField" : "xyz"
                }
            }
        }
    }
}'

Would be structured like this (Note the position of the square brackets):

$params['index'] = 'my_index';
$params['type']  = 'my_type';
$params['body']['query']['bool']['must'] = array(
    array('match' => array('testField' => 'abc')),
    array('match' => array('anotherTestField' => 'xyz')),
);

$results = $client->search($params);

A more complicated exampleedit

Let’s construct a slightly more complicated example: a filtered query that contains both a filter and a query. This is a very common activity in elasticsearch queries, so it will be a good demonstration.

The curl version of the query:

curl -XGET 'localhost:9200/my_index/my_type/_search' -d '{
    "query" : {
        "filtered" : {
            "filter" : {
                "term" : {
                    "my_field" : "abc"
                }
            },
            "query" : {
                "match" : {
                    "my_other_field" : "xyz"
                }
            }
        }
    }
}'

And in PHP:

$params['index'] = 'my_index';
$params['type']  = 'my_type';

$filter = array();
$filter['term']['my_field'] = 'abc';

$query = array();
$query['match']['my_other_field'] = 'xyz';

$params['body']['query']['filtered'] = array(
    "filter" => $filter,
    "query"  => $query
);

$results = $client->search($params);

For clarity and ease of readability, the filter and query sections were allocated individually as variables and then composed together later. This is often a good design pattern for applications, since it lets you treat the queries and filters as building blocks that can be passed around your application.

Of course, at the end of the day, it is built into a single array. You could easily build the entire array in one definition of nested array blocks, or build them line-by-line.

All the client requires is an associative array with a structure that matches the JSON query structure.

Function_Score queryedit

A special note needs to be made about the function_score query. Due to the way PHP handles JSON encoding, everything is converted to an array of one for or another. This is usually not a problem, since most places in the Elasticsearch API accept arrays or empty objects interchangeably.

However, the function_score is a little different and needs to differentiate between empty arrays and empty objects. For example, consider this query:

{
   "query":{
      "function_score":{
         "functions":[
            {
               "random_score":{}
            }
         ],
         "boost_mode":"replace",
         "query":{
            "match_all":{}
         }
      }
   }
}

The function_score defines an array of objects, and the random_score key has an empty object as it’s value. PHP’s json_encode will convert that query to this:

{
   "query":{
      "function_score":{
         "functions":[
            {
               "random_score":[]
            }
         ],
         "boost_mode":"replace",
         "query":{
            "match_all":[]
         }
      }
   }
}

Which will result in a parse exception from Elasticsearch. What we need to do is tell PHP that random_score contains an empty object, not an array. To do so, we need to specify an explicitly empty object:

$params['body'] = array(
    'query' => array(
        'function_score' => array(
            'functions' => array(
                array("random_score" => (object) array())
            ),
            'query' => array('match_all' => array())
        )
    )
);
$results = $client->search($params);

Now, the JSON will be encoded properly and your query will no longer generate a parser exception.

Scan/Scrolledit

The Scan/Scroll functionality of Elasticsearch is similar to search, but different in many ways. It works by executing a search query with a search_type of scan. This initiates a "scan window" which will remain open for the duration of the scan. This allows proper, consistent pagination.

Once a scan window is open, you may start _scrolling) over that window. This returns results matching your query…but returns them in random order. This random ordering is important to performance. Deep pagination is expensive when you need to maintain a sorted, consistent order across shards. By removing this obligation, Scan/Scroll can efficiently export all the data from your index.

This is an example which can be used as a template for more advanced operations:

$client = new Elasticsearch\Client();
$params = array(
    "search_type" => "scan",    // use search_type=scan
    "scroll" => "30s",          // how long between scroll requests. should be small!
    "size" => 50,               // how many results *per shard* you want back
    "index" => "my_index",
    "body" => array(
        "query" => array(
            "match_all" => array()
        )
    )
);

$docs = $client->search($params);   // Execute the search
$scroll_id = $docs['_scroll_id'];   // The response will contain no results, just a _scroll_id

// Now we loop until the scroll "cursors" are exhausted
while (\true) {

    // Execute a Scroll request
    $response = $client->scroll(
        array(
            "scroll_id" => $scroll_id,  //...using our previously obtained _scroll_id
            "scroll" => "30s"           // and the same timeout window
        )
    );

    // Check to see if we got any search hits from the scroll
    if (count($response['hits']['hits']) > 0) {
        // If yes, Do Work Here

        // Get new scroll_id
        // Must always refresh your _scroll_id!  It can change sometimes
        $scroll_id = $response['_scroll_id'];
    } else {
        // No results, scroll cursor is empty.  You've exported all the data
        break;
    }
}