Make Petabytes Searchable — Elasticsearch Data Tiering Made Simple and Fast
Elastic searchable snapshots with fast S3 object storage
In this blog, I would like to talk about how to make petabytes of data quickly searchable with Elasticsearch searchable snapshots and fast S3 object storage.
In an Elasticsearch cluster, we can define data tiers, including the hot, warm, cold and frozen tiers. For example, we can define a policy in Elasticsearch that keeps data in the hot tier for the first month after ingestion, moves it to the warm tier after 3 months, then to the cold tier after 6 months, and finally to the frozen tier after a year.
The purpose of doing this is to trade off performance against cost. The traditional way of Elasticsearch tiering is to use different server profiles for different data tiers. For example, we may use high-spec servers with NVMe SSDs for the hot tier, lower-spec servers with HDDs for the warm or cold tier, and even slower, cheaper servers for the frozen tier. Although this serves the purpose, it becomes quite complex to operate as the cluster scales, because there are so many servers, cables and disks to manage, and each of these components introduces a failure point in the cluster.
There is a way to dramatically simplify Elasticsearch data tiering with fast S3 object storage. With this approach, we still keep the first month of data in the hot tier, but instead of going through the warm and cold tiers, after a month the data moves directly to the frozen tier. We then use the Elasticsearch searchable snapshots feature to search the frozen data directly. This reduces the number of components not only by skipping the warm and cold tiers, but also by relying on the S3 object storage as the main data repository for all but the hot data.
In a way, the Elasticsearch frozen tier, backed by searchable snapshots on fast S3 object storage, elegantly combines the traditional warm, cold and frozen tiers into one. This can be configured using the index lifecycle management (ILM) UI in Kibana (an Elastic Enterprise license is required).
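Below is a minimal sketch of what such an ILM policy could look like if defined through the API instead of the Kibana UI. The policy name, rollover thresholds and ages here are only for illustration, and reddot-s3-repo is the S3 snapshot repository registered later in this post.
# Illustrative policy: hot for roughly a month, then straight to frozen as a searchable snapshot
PUT _ilm/policy/hot-to-frozen
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "frozen": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "reddot-s3-repo"
          }
        }
      }
    }
  }
}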
Because the frozen data in S3 does not rely on Elasticsearch replicas for data durability, and because it scales separately from the frozen nodes, we can easily retain petabytes of Elasticsearch data.
But what about search performance on the frozen data? Two key points keep search fast:
- A fast S3 object storage like FlashBlade//S provides high throughput (tens of GB/s) and low latency (2-4 ms in most cases) for Elasticsearch.
- Elasticsearch uses a local cache containing only recently searched parts of the snapshotted index’s data.
Together, these make petabytes of data quickly searchable.
Elasticsearch searchable snapshots in fast S3 in action
To use Elasticsearch frozen data as searchable snapshots in S3, I first add frozen nodes to an Elasticsearch cluster. In the example below, I have one frozen node with 90 GB of local cache.
- name: data-frozen
  count: 1
  config:
    node.roles: ["data_frozen"]
    thread_pool.snapshot.max: 4
    xpack.searchable.snapshot.shared_cache.size: 90gb
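Once the node is up, one quick way to confirm it joined the cluster with the frozen role is the _cat/nodes API.
# List nodes and their roles; the frozen node should show f (data_frozen) in the role column
GET _cat/nodes?v&h=name,node.role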
I then register a FlashBlade S3 bucket as an Elasticsearch snapshot repository, which I call reddot-s3-repo in this example.
PUT _snapshot/reddot-s3-repo?pretty
{
  "type": "s3",
  "settings": {
    "bucket": "deephub",
    "base_path": "elastic/snapshots",
    "endpoint": "192.168.170.11",
    "protocol": "http"
  }
}
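For the registration to work, Elasticsearch needs credentials for the bucket. In my setup these go into the Elasticsearch keystore as the default S3 client's access and secret keys (depending on how the cluster is deployed, this may instead be done through the orchestrator's secure settings); the repository can then be verified from all nodes.
# Add the FlashBlade S3 access keys to the Elasticsearch keystore (on every node)
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key

# Verify that all nodes can read and write to the repository
POST _snapshot/reddot-s3-repo/_verify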
After that, I create a snapshot in the S3 repository for testing.
PUT /_snapshot/reddot-s3-repo/demo
{
  "indices": "elasticlogs_q_01-000001",
  "include_global_state": false
}
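The snapshot request returns immediately and runs in the background (unless wait_for_completion=true is passed), so its progress can be checked with the snapshot status API.
# Check snapshot progress; the state is SUCCESS once it completes
GET _snapshot/reddot-s3-repo/demo/_status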
At this point, for comparison purposes, I perform a regular full restore to verify the snapshot.
POST /_snapshot/reddot-s3-repo/demo/_restore
{
  "indices": "elasticlogs_q_01-000001",
  "rename_pattern": "elasticlogs_q_01-000001",
  "rename_replacement": "elasticlogs_q_01-000001-fullrestore"
}
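The restore runs as a regular shard recovery, so we can follow its progress with the _cat/recovery API (using the renamed index from above).
# Follow restore (recovery) progress for the restored index
GET _cat/recovery/elasticlogs_q_01-000001-fullrestore?v&active_only=true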
The restore downloads the snapshot data from FlashBlade S3 to a regular (non-frozen) Elasticsearch data node, which may take some time depending on the data size. Until the restore finishes, a search query on the restoring index, like the one below, will fail.
# This will fail until the restore is finished
GET /elasticlogs_q_01-000001-fullrestore/_search
This error is expected for a regular snapshot restore. However, searchable snapshots behave differently: we can start searching without waiting for any restore to finish, because the index is searched directly from the S3 repository.
Elasticsearch searchable snapshots require an Enterprise license. We can start a trial license for testing.
# Start a 30-day trial license
POST /_license/start_trial?acknowledge=true
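We can confirm the license is active with the license API.
# Check the current license type, status and expiry
GET /_license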
With searchable snapshots, we no longer restore the snapshot locally. Instead, we use a shared cache on a frozen node, which downloads only the parts of the snapshotted index's data that searches actually need from S3 and caches them locally.
POST /_snapshot/reddot-s3-repo/demo/_mount?storage=shared_cache
{
  "index": "elasticlogs_q_01-000001",
  "renamed_index": "elasticlogs_q_01-000001-partialmount"
}
Note the storage=shared_cache parameter in the _mount operation. This tells Elasticsearch to create a partially mounted, searchable snapshot index on the frozen tier. We can then start searching immediately; behind the scenes, Elasticsearch caches the fetched parts of the index files on the frozen data node.
# Search a searchable snapshot index
GET /elasticlogs_q_01-000001-partialmount/_search
This time, the search succeeds. We can also see the snapshot objects in the FlashBlade S3 bucket, along with the I/O generated by the search.
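On the Elasticsearch side, we can also inspect how the frozen tier's shared cache and the mounted index are behaving using the searchable snapshot statistics APIs (the exact fields in the output vary by version).
# Frozen tier shared cache usage across nodes
GET /_searchable_snapshots/cache/stats

# Per-index searchable snapshot statistics for the mounted index
GET /elasticlogs_q_01-000001-partialmount/_searchable_snapshots/stats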
Another interesting thing to notice is that searchable snapshot indices use zero bytes of disk space on the frozen nodes. We can verify this with the following API call.
# Searchable snapshot indices use zero bytes on disk
GET /_cat/shards/elasticlogs_q_01-000001*?v&h=index,shard,prirep,state,docs,store,node
The result shows the partially mounted shards with a store size of 0 bytes. This indicates that no shard data is stored locally on the frozen nodes; the query fetches the data it needs directly from the S3 repository, through the shared cache.
Conclusion
In this blog, I described how to retain and quickly search a large amount of Elasticsearch data using searchable snapshots with fast S3 object storage like FlashBlade//S. While more tests are ongoing, the results and benefits look promising:
- Retain Elasticsearch frozen data practically indefinitely while keeping search fast.
- Simplify Elastic tiering by skipping the warm and cold tiers.
- Reduce data rebalancing time on node failure, since frozen nodes hold no shard data locally.
- Reduce storage cost for frozen data, since shard replicas are no longer needed for data durability.
- Scale frozen nodes and storage independently.
While I was writing this blog, Pure Storage announced the new FlashBlade//E system, a unified file and object all-flash storage optimised for capacity and cost. FlashBlade//E could also be a good fit for Elasticsearch frozen data if search performance is not the primary concern. I am thinking FlashBlade//S for fast search, and FlashBlade//E for longer-term retention with decent search speed.
I look forward to getting my hands on it. Stay tuned.