Post-Indexing Updates

Jozef Soročin, founder of Spatialized
Updated 07/13/2025

Documents often require updating — fields need to be incremented, modified, overwritten, or deleted. In this section we'll discuss situations where you:

  1. know the ID of the doc in question and want to update only that particular document
  2. want to update a group of documents that have something in common (i.e. match a query)

I'm tracking visits to my site based on the slug. A sample entry in my site_visits index:

POST site_visits/_doc/home_id
{
  "slug": "/home",
  "visits": 0,
  "tags": ["landing", "ab_test_100"],
  "unneeded_attribute": "old"
}

How do I

  • add a modified_at field and set it to now
  • increment the visits count
  • remove the ab_test_100 tag
  • and delete the unneeded attribute?

For adding new fields such as modified_at, you may be tempted to repeat the POST call from above with only the new field being present:

POST site_visits/_doc/home_id
{
  "modified_at": "2020-12-05T14:11:41.634Z"
}

While this is a perfectly valid call, it would completely overwrite the existing doc: slug, visits, tags, and unneeded_attribute would all be gone, leaving only modified_at.

What's needed instead is a request to the _update API:

POST site_visits/_update/home_id
{
  "doc": {
    "modified_at": "2020-12-05T14:11:41.634Z"
  }
}

Notice that the new field had to be wrapped inside of doc, and also that the contents of the other fields were left untouched.
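
To make the difference between the two calls concrete, here is a small Python sketch that models them on plain dictionaries. It does not talk to Elasticsearch at all; index_doc and partial_update are hypothetical helper names that only mimic the semantics described above, assuming that nested objects are merged key by key while arrays and scalars are replaced.

```python
import copy

def index_doc(stored, body):
    """Model of POST <index>/_doc/<id>: the body replaces the stored doc."""
    return copy.deepcopy(body)

def partial_update(stored, doc):
    """Model of POST <index>/_update/<id> with {"doc": ...}.

    Nested objects are merged key by key; any other value
    (including arrays) replaces the old one.
    """
    merged = copy.deepcopy(stored)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

stored = {
    "slug": "/home",
    "visits": 0,
    "tags": ["landing", "ab_test_100"],
    "unneeded_attribute": "old",
}
new_field = {"modified_at": "2020-12-05T14:11:41.634Z"}

overwritten = index_doc(stored, new_field)
updated = partial_update(stored, new_field)

print(sorted(overwritten))  # ['modified_at']
print(sorted(updated))      # ['modified_at', 'slug', 'tags', 'unneeded_attribute', 'visits']
```

The first result keeps only modified_at, which is the overwrite problem; the second keeps every original field and adds the new one.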


Now, as to modifying existing fields, all of that can be done in one go inside a script which targets the same URI path as above:

POST site_visits/_update/home_id
{
  "script": {
    "source": """
      // incrementing a number
      ctx._source.visits++;

      // removing an array list entry
      if (ctx._source.tags.contains(params.tag_to_remove)) {
        ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag_to_remove));
      }

      // removing a field
      ctx._source.remove(params.field_to_remove);

      // assigning a timestamp: ctx._now is epoch milliseconds, and
      // Instant.toString() renders it as an ISO-8601 string in UTC
      // (e.g. 2020-12-05T14:11:41.634Z), independent of the node's time zone
      ctx._source.modified_at = Instant.ofEpochMilli(ctx._now).toString();
    """,
    "params": {
      "tag_to_remove": "ab_test_100",
      "field_to_remove": "unneeded_attribute"
    }
  }
}
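
To see what the document should look like afterwards, the four steps can be mirrored in plain Python. This is only a stand-in for the Painless script, not something Elasticsearch runs: apply_script is a hypothetical name, now_millis plays the role of ctx._now, and the second-precision timestamp pattern is a choice made for this sketch.

```python
import copy
from datetime import datetime, timezone

def apply_script(source, params, now_millis):
    """Python stand-in for the Painless script; returns the new _source."""
    src = copy.deepcopy(source)

    # incrementing a number
    src["visits"] += 1

    # removing the first matching list entry, if present
    if params["tag_to_remove"] in src["tags"]:
        src["tags"].remove(params["tag_to_remove"])

    # removing a field (no error if it is already absent)
    src.pop(params["field_to_remove"], None)

    # assigning a timestamp from epoch milliseconds, rendered in UTC
    now = datetime.fromtimestamp(now_millis // 1000, tz=timezone.utc)
    src["modified_at"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return src

before = {
    "slug": "/home",
    "visits": 0,
    "tags": ["landing", "ab_test_100"],
    "unneeded_attribute": "old",
}
params = {"tag_to_remove": "ab_test_100", "field_to_remove": "unneeded_attribute"}

after = apply_script(before, params, now_millis=1607177501634)
print(after)
# {'slug': '/home', 'visits': 1, 'tags': ['landing'], 'modified_at': '2020-12-05T14:11:41Z'}
```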

I've used ctx._now only to illustrate that it is available in the _update API. That's essentially the only place where a system-level now is safe to use in a script, because ES is (often) distributed. If, instead of updating, we were to compare dates, say, and our query were executed on multiple nodes, it'd be very hard to keep now synchronized across them. So, for all intents and purposes, it's safer to work with a parametrized now: add a runtime-generated now attribute to the params dictionary so that every worker running the script sees the same value.
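
A minimal sketch of the parametrized approach, written in Python purely to build the request body (nothing is sent anywhere). The helper build_update_body and the parameter name now are assumptions made for illustration; the point is that the timestamp is generated once on the client and every script invocation reads the same params.now.

```python
import json
from datetime import datetime, timezone

def build_update_body(now_iso):
    """Build an _update request body whose script reads now from params."""
    return {
        "script": {
            "source": "ctx._source.modified_at = params.now;",
            "params": {"now": now_iso},
        }
    }

# Generate now exactly once, on the client, as an ISO-8601 UTC string.
now_iso = (
    datetime.now(timezone.utc)
    .isoformat(timespec="milliseconds")
    .replace("+00:00", "Z")
)

# Reuse the same value for every request that should share a timestamp.
body_home = build_update_body(now_iso)
body_about = build_update_body(now_iso)

print(json.dumps(body_home, indent=2))
```

Because both bodies carry the same string, it no longer matters which node runs the script or when.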
