Summary
Recently we upgraded to Spring Data Elasticsearch 4.x. As part of this major release, Jackson is no longer used to convert our domain objects to JSON; the MappingElasticsearchConverter is used instead [1]. As a result, we are now forced to add a new id field to all our documents.
Previously we had domain objects like this:
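    import com.fasterxml.jackson.annotation.JsonIgnore;
    import org.springframework.data.annotation.Id;

    public class ESDocument {

        @Id
        private String id;

        private String field1;

        // Keeps the id out of the serialized JSON (honoured by Jackson only)
        @JsonIgnore
        public String getId() {
            return id;
        }

        public String getField1() {
            return field1;
        }
    }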
Which resulted in documents like this in ES:
    {
      "_index" : "test_index",
      "_type" : "_doc",
      "_id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
      "_score" : 1.0,
      "_source" : {
        "field1" : "blabla"
      }
    }
Note that:
- The @JsonIgnore annotation was used to ensure that we were not required to have an id field in the _source.
- We are setting the document id ourselves and it ends up in _id.
Problem
With Spring Data Elasticsearch 4.x the @JsonIgnore annotation is no longer respected, which means we are now forced to have an id field in the _source, as shown below:
    {
      "_index" : "test_index",
      "_type" : "_doc",
      "_id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
      "_score" : 1.0,
      "_source" : {
        "id" : "d5bf7b5c-7a44-42f9-94d6-d59fe3988482",
        "field1" : "blabla"
      }
    }
Questions
- Is it still possible to avoid duplicating the document identifier (i.e. in both the _id and id fields)? If so, how? (Note that we already tried @org.springframework.data.annotation.Transient, which does not work because spring-data-elastic then thinks our document does not have an id.)
- Was our previous approach of suppressing the id field in _source incorrect or problematic?
Versions
java: 1.8.0_252
elasticsearch: 7.6.2
spring-boot: 2.3.1.RELEASE
spring-data-elastic: 4.0.1.RELEASE
References
[1] – https://spring.io/blog/2020/05/27/what-s-new-in-spring-data-elasticsearch-4-0
Answer
Question 1:
To omit the id field from the _source, you would normally use the @Transient annotation, but as you wrote, this does not work for the id property. Transient properties are ignored in Spring Data modules (not only Spring Data Elasticsearch).
But you can use the org.springframework.data.annotation.ReadOnlyProperty annotation for this:
    @Id
    @ReadOnlyProperty
    private String id;
To be honest, I didn’t know until now that this annotation existed. It comes from Spring Data Commons as well, and it is checked in the isWritable() method of the property when properties are written by the MappingElasticsearchConverter.
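To make the mechanism a bit more concrete, here is a minimal plain-Java sketch of a writer that skips properties marked as read-only. It is not Spring Data's actual implementation: the ReadOnly annotation, the EsDoc class and the toSource method are hypothetical stand-ins, and the real converter works on Spring Data's mapping metadata rather than on raw reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReadOnlySketch {

    // Hypothetical stand-in for Spring Data's @ReadOnlyProperty.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface ReadOnly {}

    // Simplified version of the entity from the question.
    static class EsDoc {
        @ReadOnly
        private final String id;
        private final String field1;

        EsDoc(String id, String field1) {
            this.id = id;
            this.field1 = field1;
        }
    }

    // Builds the map that would become _source, skipping non-writable fields.
    static Map<String, Object> toSource(Object entity) {
        Map<String, Object> source = new LinkedHashMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || field.isAnnotationPresent(ReadOnly.class)) {
                continue; // not writable: leave it out of _source
            }
            try {
                field.setAccessible(true);
                source.put(field.getName(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return source;
    }

    public static void main(String[] args) {
        EsDoc doc = new EsDoc("d5bf7b5c-7a44-42f9-94d6-d59fe3988482", "blabla");
        System.out.println(toSource(doc)); // prints {field1=blabla}
    }
}
```

The id still exists on the entity, so it can be used as the document's _id; it is only left out when the body of the document is assembled.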
Question 2:
Surely not incorrect, but problematic, as you found out. We always consider the whole entity when storing it, so we never thought about not writing the id. Strictly speaking, you are right that it is not necessary: we always get the id back in the _id field together with the _source, so we can easily put the entity back together. We just never considered this a necessary feature to have.
Note:
When you look at the data in your ES index, you will find that with the MappingElasticsearchConverter an additional _source field named _class is written, which contains the name of the entity class (or a defined alias). This allows for mapping generics; for further info check the documentation. I mention it just in case you wonder where this field comes from.
Edit 18.11.2022:
Recently (with version 4.4.3) we made a change that fixed incorrect behaviour in Spring Data Elasticsearch: Spring Data Elasticsearch must not write data into a property that is marked with @ReadOnlyProperty. As a consequence, the solution proposed above no longer works on its own, because the id property is no longer filled when data is read from Elasticsearch.
To have the id property set in this case, it is necessary to add an AfterConvertCallback to your application:
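    import org.springframework.data.elasticsearch.core.document.Document;
    import org.springframework.data.elasticsearch.core.event.AfterConvertCallback;
    import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
    import org.springframework.stereotype.Component;

    @Component
    public class EntityAfterConvertCallback implements AfterConvertCallback<ESDocument> {

        @Override
        public ESDocument onAfterConvert(ESDocument entity, Document document, IndexCoordinates indexCoordinates) {
            // Copy the document's _id into the entity; requires a setId(String) method on ESDocument
            entity.setId(document.getId());
            return entity;
        }
    }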
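What such a callback achieves can be illustrated without Spring. The following plain-Java sketch assumes a converter that maps only the _source fields and a post-conversion step that copies the _id into the entity. All names here (Hit, EsDoc, convert, read) are hypothetical; only the onAfterConvert idea mirrors the callback described above.

```java
import java.util.HashMap;
import java.util.Map;

public class AfterConvertSketch {

    // Simplified entity; the setter is what the callback relies on.
    static class EsDoc {
        private String id;
        private String field1;

        String getId() { return id; }
        void setId(String id) { this.id = id; }
        String getField1() { return field1; }
        void setField1(String field1) { this.field1 = field1; }
    }

    // Hypothetical stand-in for a returned document: _id plus _source.
    static class Hit {
        final String id;
        final Map<String, Object> source;

        Hit(String id, Map<String, Object> source) {
            this.id = id;
            this.source = source;
        }
    }

    // Mimics a converter that does not populate read-only properties:
    // only the fields found in _source are mapped, the id stays null.
    static EsDoc convert(Hit hit) {
        EsDoc doc = new EsDoc();
        doc.setField1((String) hit.source.get("field1"));
        return doc;
    }

    // Mimics the callback: copy _id into the entity after conversion.
    static EsDoc onAfterConvert(EsDoc entity, Hit hit) {
        entity.setId(hit.id);
        return entity;
    }

    // Full read path: convert first, then apply the callback.
    static EsDoc read(Hit hit) {
        return onAfterConvert(convert(hit), hit);
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("field1", "blabla");
        Hit hit = new Hit("d5bf7b5c-7a44-42f9-94d6-d59fe3988482", source);

        System.out.println(convert(hit).getId()); // prints null
        System.out.println(read(hit).getId());    // prints d5bf7b5c-7a44-42f9-94d6-d59fe3988482
    }
}
```

The point of the sketch is the ordering: the id is not part of _source, so it has to be restored from the document metadata in a separate step after the entity has been built.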