Mapping Automation

Document types have been deprecated since v7.x, so it's recommended to treat ES indices as single-purpose tables: each one holds documents of a single type whose shared fields have clearly defined data types. In this book we'll be focusing on single-type documents/indices only.
Elasticsearch's effectiveness is often attributed to its ability to efficiently index and store your documents' fields. Each field has its own data type and the resulting list of these field definitions is called a mapping. The mapping can be:
- defined up front
  - as an explicit mapping (we know what we know)
  - or as a dynamic template mapping (we know what we expect)
- or guessed by ES at ingest time (we don't know what we don't know; rather rare IRL)
Remember that if ES encounters any fields that aren't defined in the mapping, the system will still guess the data types and auto-add the definitions to the already defined mapping. This feature can be turned off using the dynamic setting, which can keep your mapping locked in place and even reject new documents if the strict mode is activated.
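As a minimal sketch of the strict mode, assuming a hypothetical index called projects_strict with a single known field, the first request below locks the mapping and the second one should be rejected with a strict_dynamic_mapping_exception because owner isn't defined:

```json
PUT /projects_strict
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "name": { "type": "keyword" }
    }
  }
}

POST /projects_strict/_doc
{
  "name": "website",
  "owner": "jane"
}
```

Setting dynamic to false instead would accept the document but leave owner out of the mapping: the value stays in _source but isn't searchable.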
In my experience, letting ES guess the mappings is not good enough because, by default,
- any epoch timestamp will be auto-mapped as a long instead of a date
- every text field will be mapped as text and keyword, but you may need just the keyword, esp. in short attributes like tags and categories
- intentionally nested fields will not be recognized as nested; ES will default to an object instead. More on nested documents in Median Duration of a Project Build.
Plus there's the overhead of having to drop the index, adjust the mapping and reindex each time you've incrementally improved what you needed to improve (there are workarounds). Yes, you can change the mappings after you've synced the docs, but the updated mapping won't apply to the old docs, only to the ones that'll be indexed from that point onwards.
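To illustrate, the three defaults listed above could be overridden up front with an explicit mapping along these lines. The index name projects and the fields tags, builds, status and durationMs are hypothetical; createdAt is assumed to arrive as epoch milliseconds:

```json
PUT /projects
{
  "mappings": {
    "properties": {
      "createdAt": { "type": "date", "format": "epoch_millis" },
      "tags": { "type": "keyword" },
      "builds": {
        "type": "nested",
        "properties": {
          "status": { "type": "keyword" },
          "durationMs": { "type": "long" }
        }
      }
    }
  }
}
```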
A more dependable strategy is to:
- Explicitly define as many fields as is reasonably practical
- Dynamically define the rest based on whether:
  - you've got an object containing X smaller objects, all/most of which share the same properties (more in Path Matching in Objects)
  - you know what field name patterns you're expecting (more in Regex Pattern Field Names)
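As a rough preview of both dynamic options, here is a sketch that assumes a hypothetical index projects_dynamic, a builds object keyed by arbitrary names (e.g. builds.linux.status), and counter fields named like count_errors. The template names are made up as well:

```json
PUT /projects_dynamic
{
  "mappings": {
    "dynamic_templates": [
      {
        "build_statuses_as_keywords": {
          "path_match": "builds.*.status",
          "mapping": { "type": "keyword" }
        }
      },
      {
        "counters_as_longs": {
          "match_pattern": "regex",
          "match": "^count_[a-z]+$",
          "mapping": { "type": "long" }
        }
      }
    ]
  }
}
```

Note that path_match is tested against the full dotted path of a field, whereas match only looks at the field name itself.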
There are use cases which will require you to spread your data across multiple indices (e.g. logs: logs-2021-01, logs-2021-02, logs-2021-*, …).
If your indices already exist and you'd like to update them in one go, you can multi-target them like so:
PUT /logs-2020*,logs-2021*/_mapping?allow_no_indices=true
{
  "properties": {
    ...
  }
}
When the indices don't exist yet but you know what their names will look like, it's customary to use an index template with index patterns:
PUT _index_template/logs_template
{
  "index_patterns": [
    "logs-2021-*",
    "logs-k8s-2021*"
  ],
  "template": {
    "mappings": {
      "properties": {
        "createdAt": {
          "type": "date"
        }
      }
    }
  }
}
which can of course be combined with the mapping strategies introduced in the next 2 chapters.
The multi-target principle applies to _search requests too. But let's pace ourselves: searching multiple indices is discussed in more detail later on.
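For a quick taste, a multi-target search might look like the sketch below. It reuses the index name patterns and the createdAt field from the examples above; the date range is purely illustrative:

```json
GET /logs-2020*,logs-2021*/_search
{
  "query": {
    "range": {
      "createdAt": {
        "gte": "2020-12-01",
        "lt": "2021-02-01"
      }
    }
  }
}
```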