---
title: 03 - Managing Documents
---
Creating and Deleting Indexes
PUT /pages
DELETE /pages
# Specify index settings while creating the index
PUT /products
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 2
}
}
Indexing Documents
POST /products/_doc
{
"name": "Cell Phone",
"price": 100,
"inStock": 10
}
PUT /products/_doc/100
{
"name": "Toaster",
"price": 20,
"inStock": 100
}
GET /products/_doc/100
POST /products/_update/100
{
"doc": {
"inStock": 3
}
}
POST /products/_update/100
{
"doc": {
"isAvailable": "true"
}
}
How the Update API works
- The current document is retrieved
- The field values are changed
- The existing document is replaced with the modified document
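The three steps above can be sketched as a small in-memory simulation. This is only an illustration in Python, not how Elasticsearch is implemented: the store dict and the partial_update helper are hypothetical names, and the shallow merge ignores nested objects.

```python
# Hypothetical in-memory stand-in for an index: document ID -> source document.
store = {"100": {"name": "Toaster", "price": 20, "inStock": 100}}

def partial_update(store, doc_id, partial_doc):
    """Mimic the update flow: retrieve, change fields, replace the document."""
    current = store[doc_id]                 # 1. retrieve the current document
    modified = {**current, **partial_doc}   # 2. change the field values (shallow merge)
    store[doc_id] = modified                # 3. replace the existing document
    return modified

partial_update(store, "100", {"inStock": 3})
print(store["100"])
```

The point of the sketch is that a "partial" update still ends with the whole document being replaced.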
Introduction to routing
- Routing is the process of resolving a shard for a document
- A formula is used when indexing, retrieving and updating documents
- Routing may be customized
- The default routing strategy distributes documents evenly
- One reason why the number of primary shards of an index cannot be changed is that the routing formula would then yield a different shard number for existing documents
shard_num = hash(_routing) % num_primary_shards
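A rough Python sketch of the formula above shows why the shard count is fixed. The shard_for helper is a hypothetical name and zlib.crc32 is only a stand-in hash chosen for this example (Elasticsearch uses its own hash function), so the shard numbers printed here will not match a real cluster.

```python
import zlib

def shard_for(routing, num_primary_shards):
    """Resolve a shard number from a routing value (the document ID by default)."""
    # zlib.crc32 is a stand-in hash for this sketch, not the hash Elasticsearch uses.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards

doc_ids = ["100", "101", "102", "103"]
with_two_shards = {doc_id: shard_for(doc_id, 2) for doc_id in doc_ids}
with_three_shards = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}

print(with_two_shards)
print(with_three_shards)
```

The same ID always resolves to the same shard while the shard count stays fixed. With a different count, the modulo typically sends some IDs elsewhere, so lookups for documents indexed earlier would go to the wrong shard.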
How Elasticsearch reads data
- A read request is received and handled by a coordinating node
- Routing is used to resolve the document’s replication group
- Adaptive Replica Selection (ARS) is used to send the query to the best available shard
- ARS helps reduce query response times
- ARS is essentially an intelligent load balancer
- The coordinating node collects the response and sends it to the client
How Elasticsearch writes data
- Write operations are sent to primary shards
- The primary shard forwards the operation to its replica shards
- Primary terms and sequence numbers are used to recover from failures
- Global and local checkpoints help speed up the recovery process
- Primary terms and sequence numbers are available within responses
Optimistic Concurrency Control
- Sending write requests to Elasticsearch concurrently may overwrite changes made by other concurrent processes
- The _primary_term and _seq_no fields are used for optimistic concurrency control
- Elasticsearch will reject a write operation if it contains the wrong primary term or sequence number
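The rejection rule can be illustrated with a simplified simulation. In a real request the expected values are passed as the if_primary_term and if_seq_no query parameters. Everything else here (the Doc class, conditional_write, VersionConflict) is a hypothetical stand-in, and the per-document counter is a simplification of how sequence numbers are actually assigned.

```python
class VersionConflict(Exception):
    """Raised when the caller's primary term or sequence number is stale."""

class Doc:
    def __init__(self, source):
        self.source = source
        self.primary_term = 1
        self.seq_no = 0

def conditional_write(doc, new_source, if_primary_term, if_seq_no):
    """Apply the write only if the caller saw the latest version of the document."""
    if (doc.primary_term, doc.seq_no) != (if_primary_term, if_seq_no):
        raise VersionConflict("document changed since it was read")
    doc.source = new_source
    doc.seq_no += 1  # simplified: each successful write bumps the sequence number

doc = Doc({"name": "Toaster", "inStock": 100})
term, seq = doc.primary_term, doc.seq_no  # two clients read the same version

# The first client writes with the values it read and succeeds.
conditional_write(doc, {"name": "Toaster", "inStock": 99}, term, seq)

# The second client still holds the old values, so its write is rejected.
try:
    conditional_write(doc, {"name": "Toaster", "inStock": 99}, term, seq)
    second_write_rejected = False
except VersionConflict:
    second_write_rejected = True

print(second_write_rejected, doc.source)
```

On a rejection, the client would typically re-read the document, recompute its change, and retry with the fresh values.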
Update By Query / Delete By Query
- A snapshot of the index is taken when the query starts; it is used for optimistic concurrency control
- Search queries and bulk requests are sent to replication groups sequentially
- Elasticsearch retries these queries up to 10 times
- If the queries still fail, the whole query is aborted
- Any changes already made to documents are not rolled back
- The API returns information about failures
- If a document has been modified since taking the snapshot, the query is aborted. This is checked with the document’s primary term and sequence number
- To count version conflicts instead of aborting the query, the conflicts option can be set to proceed
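The difference between aborting and proceeding on version conflicts can be sketched with a toy simulation. This is not the real implementation: the update_by_query function, the run helper and the document layout are hypothetical, and only the sequence number is compared to keep the example short.

```python
def update_by_query(docs, snapshot, update, conflicts="abort"):
    """docs: id -> {"seq_no", "source"}; snapshot: id -> seq_no seen at query start."""
    result = {"updated": 0, "version_conflicts": 0, "aborted": False}
    for doc_id, seen_seq_no in snapshot.items():
        doc = docs[doc_id]
        if doc["seq_no"] != seen_seq_no:   # modified since the snapshot was taken
            result["version_conflicts"] += 1
            if conflicts == "proceed":
                continue                   # count the conflict and move on
            result["aborted"] = True       # default behaviour: abort the query
            break                          # earlier updates are NOT rolled back
        update(doc["source"])
        doc["seq_no"] += 1
        result["updated"] += 1
    return result

def decrement_stock(source):
    source["inStock"] -= 1

def run(conflicts):
    docs = {i: {"seq_no": 0, "source": {"inStock": 10}} for i in ("a", "b", "c")}
    snapshot = {doc_id: doc["seq_no"] for doc_id, doc in docs.items()}
    docs["b"]["seq_no"] += 1  # another process modifies "b" after the snapshot
    return update_by_query(docs, snapshot, decrement_stock, conflicts), docs

aborted_result, aborted_docs = run("abort")
proceed_result, proceed_docs = run("proceed")
print(aborted_result)
print(proceed_result)
```

In the aborted run, document "a" keeps its change even though the query stopped at "b", which mirrors the point that changes are not rolled back.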
Bulk API
- The HTTP Content-Type header should be set to application/x-ndjson
- A failed action does NOT affect the other actions, nor is the bulk request as a whole aborted
- The Bulk API returns detailed information about each action
- Bulk API use case
- When you need to perform lots of write operations at the same time
- Bulk API is more efficient than sending individual write requests
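As an illustration of the NDJSON format the Bulk API expects, the sketch below builds a request body in Python. The build_bulk_body helper and the sample documents are made up for this example. Each action goes on its own line, followed by a source line where the action needs one, and the body ends with a newline.

```python
import json

def build_bulk_body(actions):
    """Serialize (action, source) pairs as NDJSON: one JSON object per line."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:        # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"    # the body must end with a newline

# Illustrative actions only; the IDs and field values are invented.
body = build_bulk_body([
    ({"index": {"_id": "200"}}, {"name": "Espresso Machine", "price": 199, "inStock": 5}),
    ({"update": {"_id": "100"}}, {"doc": {"price": 25}}),
    ({"delete": {"_id": "200"}}, None),
])
print(body, end="")
```

A body like this could be written to a file and sent to /products/_bulk with the application/x-ndjson content type.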
Importing Data with Curl
curl -k -u username:password -H 'Content-Type: application/x-ndjson' -XPOST https://localhost:9200/products/_bulk --data-binary @products-bulk.json
Working Document