This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

Deep pagination for the Elasticsearch client

Ekman/es-pagination

Elasticsearch pagination

A library to deep paginate an Elasticsearch search operation. There are three ways to paginate:

  1. Scroll
  2. From
  3. Search after

Which one to use depends on the context; see the Elasticsearch documentation for details.

The library holds at most pageSize hits in memory at a time. A smaller page size uses less memory but sends more requests to Elasticsearch; a larger page size does the opposite. Hits are returned page by page, so the library never exhausts the whole index before returning results.
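The trade-off between page size and request count can be illustrated with a minimal sketch of lazy paging. This is not the library's implementation: `paginate` and `$fetchPage` are hypothetical names, and `$fetchPage` stands in for a real search request by slicing an in-memory array.

```php
<?php
/**
 * Lazily yields hits one page at a time.
 * $fetchPage is a hypothetical callable standing in for a search request:
 * it receives (offset, pageSize) and returns an array of hits.
 */
function paginate(callable $fetchPage, int $pageSize): Generator
{
    $offset = 0;
    do {
        $hits = $fetchPage($offset, $pageSize);
        foreach ($hits as $hit) {
            yield $hit; // only the current page is held in memory
        }
        $offset += count($hits);
    } while (count($hits) === $pageSize); // a short page means no more hits
}

// Stand-in data source: 10 fake hits and a request counter.
$index = array_map(fn (int $i) => ['_id' => (string) $i], range(1, 10));
$requests = 0;
$fetchPage = function (int $offset, int $pageSize) use ($index, &$requests): array {
    $requests++;
    return array_slice($index, $offset, $pageSize);
};

$ids = [];
foreach (paginate($fetchPage, 4) as $hit) {
    $ids[] = $hit['_id'];
}
```

With a page size of 4, the ten stand-in hits take three requests; a page size of 10 or more would hold more hits in memory per request but need fewer requests.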

Usage

The first step is to construct an $elasticsearchClient (an instance of Elasticsearch\Client). See the documentation of the official Elasticsearch PHP driver for how to do this.
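As a rough sketch, a client can be built with the official driver's `ClientBuilder`. The helper name `buildElasticsearchClient` and the host `localhost:9200` are assumptions for illustration; the snippet also assumes the driver is installed and Composer's autoloader is already loaded.

```php
<?php
use Elasticsearch\Client;
use Elasticsearch\ClientBuilder;

/**
 * Hypothetical helper, not part of this library.
 * Builds a client for the given hosts using the official PHP driver.
 */
function buildElasticsearchClient(array $hosts = ['localhost:9200']): Client
{
    return ClientBuilder::create()
        ->setHosts($hosts) // assumed host; adjust to your cluster
        ->build();
}
```

The result of `buildElasticsearchClient()` can then be passed as `$elasticsearchClient` to any of the cursor factories below.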

Scroll

use Nekman\EsPagination\CursorFactories\EsScrollCursorFactory;

$cursorFactory = new EsScrollCursorFactory(
    $elasticsearchClient,
    $pageSize = 1000,
    $scrollDuration = "1m"
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $cursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}

From

use Nekman\EsPagination\CursorFactories\EsFromCursorFactory;

$cursorFactory = new EsFromCursorFactory(
    $elasticsearchClient,
    $pageSize = 1000
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $cursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}
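For context, from-based paging asks Elasticsearch to skip `from` hits and return the next `size` hits. The sketch below shows how the offsets for a given page could be computed; it is an illustration, not the library's internals. The function `fromPageParams` and the index name `my-index` are hypothetical. By default Elasticsearch rejects requests where `from + size` exceeds the `index.max_result_window` setting (10,000), which the sketch checks up front.

```php
<?php
// Default value of Elasticsearch's index.max_result_window setting.
const DEFAULT_MAX_RESULT_WINDOW = 10000;

/**
 * Hypothetical helper: returns a copy of $params that requests the given
 * zero-based page using from/size.
 */
function fromPageParams(
    array $params,
    int $page,
    int $pageSize,
    int $maxWindow = DEFAULT_MAX_RESULT_WINDOW
): array {
    $from = $page * $pageSize;
    if ($from + $pageSize > $maxWindow) {
        throw new OutOfRangeException(
            "from + size exceeds the result window of {$maxWindow}"
        );
    }
    $params['body']['from'] = $from;
    $params['body']['size'] = $pageSize;
    return $params;
}

$params = [
    'index' => 'my-index', // assumed index name
    'body' => ['query' => ['match_all' => new stdClass()]],
];

// Third page (zero-based page 2) of 1000 hits each.
$third = fromPageParams($params, 2, 1000);
```

This window limit is one reason to prefer scroll or search after when paging deep into a large result set.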

Search after

use Nekman\EsPagination\CursorFactories\EsSearchAfterCursorFactory;

$cursorFactory = new EsSearchAfterCursorFactory(
    $elasticsearchClient,
    $pageSize = 1000
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $cursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}
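For context, Elasticsearch's search after mechanism requires a sorted search: each request for the next page carries the `sort` values of the last hit from the previous page in `search_after`. The sketch below shows that single step and is not the library's implementation. The function `withSearchAfter`, the index name, and the sort fields `created_at` and `order_id` are hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns a copy of $params that continues
 * after $lastHit, the final hit of the previous page.
 */
function withSearchAfter(array $params, array $lastHit): array
{
    if (!isset($lastHit['sort'])) {
        throw new InvalidArgumentException(
            'Hit has no sort values; the search must specify a sort.'
        );
    }
    // The next page starts right after the last hit's sort values.
    $params['body']['search_after'] = $lastHit['sort'];
    return $params;
}

$params = [
    'index' => 'my-index', // assumed index name
    'body' => [
        'size' => 1000,
        // Assumed fields; the second one acts as a unique tiebreaker.
        'sort' => [['created_at' => 'asc'], ['order_id' => 'asc']],
    ],
];

// Stand-in for the last hit of the previous page.
$lastHit = ['_id' => 'abc', 'sort' => [1700000000000, 'abc']];

$next = withSearchAfter($params, $lastHit);
```

A sort that includes a unique tiebreaker field keeps the ordering stable, so no hits are skipped or repeated between pages.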

Point in time (PIT)

An Elasticsearch point in time (PIT) is a lightweight view into the state of the data as it existed when the PIT was opened. Create a cursor factory and decorate it with PIT:

use Nekman\EsPagination\CursorFactories\EsPitCursorFactory;

$cursorFactory = /* Create cursor factory, see above */;

$pitCursorFactory = new EsPitCursorFactory(
    $cursorFactory,
    $elasticsearchClient,
    $pitKeepAlive = "1m"
);

$params = [
    /*
     * Same params as a normal Elasticsearch search operation.
     * See Elasticsearch documentation for more information.
     */
];

$cursor = $pitCursorFactory->hits($params);

foreach ($cursor as $hit) {
    echo "Hit {$hit['_id']}";
}

Versioning

This project complies with Semantic Versioning.

Changelog

For a complete list of changes and instructions on migrating between major versions, see the releases page.