
Why would serialized objects update the version of a schema on Schema Registry?

I’ve built an Avro schema that I’ve stored on the Confluent Schema Registry (version 5.5.1) running locally. However, the schema being used to serialize the record appears to be different from the one I expect. The schema definition is several hundred lines long, so I’m sharing a very pared-down version here that shows how it is structured:

[
    {
        "name": "AddressBase",
        "type": "record",
        "namespace": "com.namespace",
        "fields": [
            {
                "name": "line1",
                "type": "string"
            }
        ]
    },
    {
        "name": "Address",
        "type": "record",
        "namespace": "com.namespace",
        "fields": [
            {
                "name": "addressBase",
                "type": "AddressBase"
            }
        ]
    },
    {
        "name": "SchemaName",
        "fields": [
            {
                "name": "agency",
                "type": {
                    "fields": [
                        {
                            "name": "code",
                            "type": "string"
                        },
                        {
                            "name": "name",
                            "type": "string"
                        },
                        {
                            "name": "currentMailingAddress",
                            "type": "Address"
                        }
                    ],
                    "name": "Agency",
                    "type": "record"
                }
            }
        ],
        "namespace": "com.namespace",
        "type": "record"
    }
]
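One way to see why a mismatch can occur: the schema above is a top-level union of three named records, but a serializer working from generated classes derives the schema of the specific record being written, with the referenced named types expanded inline at first use. The derived text therefore does not match the registered entry. A minimal Python sketch of that expansion, using only the first two records from the schema above (the inline_named_types helper is hypothetical, for illustration only, not part of any Schema Registry API):

```python
import json

# The two referenced records from the pared-down schema above.
REGISTERED = json.loads("""
[
  {"name": "AddressBase", "type": "record", "namespace": "com.namespace",
   "fields": [{"name": "line1", "type": "string"}]},
  {"name": "Address", "type": "record", "namespace": "com.namespace",
   "fields": [{"name": "addressBase", "type": "AddressBase"}]}
]
""")

def inline_named_types(schema, defs):
    """Replace string references to named records with their full
    definitions, mimicking a schema derived from generated classes."""
    if isinstance(schema, str) and schema in defs:
        return inline_named_types(defs[schema], defs)
    if isinstance(schema, dict) and schema.get("type") == "record":
        return {**schema,
                "fields": [{**f, "type": inline_named_types(f["type"], defs)}
                           for f in schema["fields"]]}
    return schema

defs = {s["name"]: s for s in REGISTERED}
derived = inline_named_types(defs["Address"], defs)

# "Address" now embeds the full AddressBase record rather than a name
# reference, so its text no longer matches the registered union entry.
print(derived["fields"][0]["type"]["name"])  # prints AddressBase
```

When the registry is asked to look up this derived schema under the subject and finds no exact match, it returns the 40403 error; with auto-registration enabled, it registers the derived schema as a new version instead.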

Here are the steps I’ve taken to reproduce the problem:

  1. Saved schema in Schema Registry – this was version 2 of the schema for the topic

  2. Built local class files using that same schema

  3. Created POJO with appropriate values populated

  4. Ran the producer to store serialized object on Kafka, with auto.register.schemas set to “false”

  5. Received an error “schema not found”:

        Error (truncated):
        Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException:
        Schema not found; error code: 40403
    
    
  6. Set auto.register.schemas to “true”

  7. Ran again, serializing a new record

  8. This time, the message was stored successfully, but the schema was updated and is now on version 3.

I’ve searched quite a bit but am at a loss as to why this happens. Are there any particular things I should double-check in my schema definition that could cause this behavior?


Answer

After working through this issue some more, I discovered that the following two settings need to be made for this to work properly:

auto.register.schemas=false
use.latest.version=true

It’s that second setting that takes care of matching the record being written – which will have its nested type(s) expanded inline – with the appropriate registered schema. More information is available in the Confluent documentation for these serializer settings. After adding the version flag to my code, I was able to verify that this now works properly.
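As a sketch of what this looks like in client code – shown here with confluent-kafka-python rather than the Java serializer, and with the registry URL, broker address, topic name, schema path, and record values all assumed for illustration – the same two settings are passed to the AvroSerializer:

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

# Assumed local endpoints, matching a default Confluent Platform setup.
registry = SchemaRegistryClient({"url": "http://localhost:8081"})

with open("schema.avsc") as f:  # hypothetical path to the schema file
    schema_str = f.read()

serializer = AvroSerializer(
    registry,
    schema_str,
    conf={
        "auto.register.schemas": False,  # never register a new version from the producer
        "use.latest.version": True,      # serialize against the latest registered version
    },
)

producer = Producer({"bootstrap.servers": "localhost:9092"})

# Placeholder values shaped like the pared-down schema above.
record = {
    "agency": {
        "code": "001",
        "name": "Example Agency",
        "currentMailingAddress": {"addressBase": {"line1": "1 Main St"}},
    }
}

topic = "addresses"  # hypothetical topic name
producer.produce(
    topic=topic,
    value=serializer(record, SerializationContext(topic, MessageField.VALUE)),
)
producer.flush()
```

With auto-registration off and use.latest.version on, the serializer skips the exact-match lookup that produced the 40403 error and writes against the latest version already registered for the subject.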

Additionally – as suggested in the comments – writing the schema in Avro IDL and then generating the schema files with Avro Tools appears to avoid having to set those two configuration values, since the generated schemas can be used directly for serdes. This approach has worked best for me so far.
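For reference, the pared-down schema above could be expressed in IDL roughly as follows (the protocol name is arbitrary; everything else mirrors the records shown earlier):

```
// address.avdl – hypothetical IDL equivalent of the pared-down schema
@namespace("com.namespace")
protocol AddressProtocol {
  record AddressBase {
    string line1;
  }
  record Address {
    AddressBase addressBase;
  }
  record Agency {
    string code;
    string name;
    Address currentMailingAddress;
  }
  record SchemaName {
    Agency agency;
  }
}
```

Running the Avro Tools idl2schemata command (for example, java -jar avro-tools.jar idl2schemata address.avdl .) then emits one .avsc file per record; each generated file is self-contained, with referenced types expanded inline.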

User contributions licensed under: CC BY-SA