I have been trying to implement Avro deserialization without the Confluent Schema Registry. Some quick research shows that I can embed the schema in a header before sending the record to the topic. But the schema again has to be serialized to bytes before embedding it in the header, which makes this problematic. Is there any way to achieve this?
What are the cons associated with this approach?
How is it possible for confluent to extract schema from the data and store it in schema registry? I understood that the schema registry embeds a schema id on the record header while sending the data to topic. Isn’t the data serialized before sending to schema registry?
Again, if we take a look at the Confluent JsonDeserializer, it deserializes data without any schema and works with JsonNode. Why isn’t a similar mechanism possible for Avro?
Answer
Some quick research shows that I can embed the schema in a header before sending the record to the topic
You can, yes. Note that the Confluent serializers do not use the headers, if you are following their source code.
Header values must also be bytes (the keys are strings), last I checked.
the schema again has to be serialized to bytes before embedding it in the header
Not sure what you mean by “again”. It has to be done for every message, yes, and there’s no way around that, but it does not happen more than once while serializing a single record.
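To make the per-message cost concrete, here is a minimal sketch of the header approach using only the JDK. It assumes the schema is carried as its JSON text (what Avro's `Schema.toString()` returns) and that the consumer parses it back with `new Schema.Parser().parse(...)`. The class name, the header key `avro.schema`, and the sample schema are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Stand-alone sketch: no Kafka or Avro dependency, only the byte handling.
class SchemaHeaderSketch {
    // Hypothetical header key; any name the producer and consumer agree on works.
    static final String SCHEMA_HEADER_KEY = "avro.schema";

    // Producer side: the schema's JSON text becomes the header value.
    static byte[] encodeSchema(String schemaJson) {
        return schemaJson.getBytes(StandardCharsets.UTF_8);
    }

    // Consumer side: recover the schema text before parsing it into a schema object.
    static String decodeSchema(byte[] headerValue) {
        return new String(headerValue, StandardCharsets.UTF_8);
    }

    // Extra bytes every record carries when the schema travels in a header.
    static int overheadBytes(String schemaJson) {
        int keyBytes = SCHEMA_HEADER_KEY.getBytes(StandardCharsets.UTF_8).length;
        return keyBytes + encodeSchema(schemaJson).length;
    }

    public static void main(String[] args) {
        String schemaJson = "{\"type\":\"record\",\"name\":\"User\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}";
        byte[] headerValue = encodeSchema(schemaJson);
        // With the Kafka client, this is where you would call
        // record.headers().add(SCHEMA_HEADER_KEY, headerValue);
        System.out.println("Round trip ok: " + decodeSchema(headerValue).equals(schemaJson));
        System.out.println("Overhead per record in bytes: " + overheadBytes(schemaJson));
    }
}
```

The encoding step itself is cheap. The real cost is that the full schema text is repeated on every record, which for small records can easily exceed the size of the Avro payload.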
What are the cons associated with this approach?
There’s plenty of documentation about the pros of using a Registry. Its cons include maintaining additional infrastructure, and not every tool can integrate with it.
How is it possible for confluent to extract schema from the data and store it in schema registry?
Refer to the source code: the serializer extracts the schema as text, POSTs that text to the registry to get an ID, and then embeds the ID in the record itself (not in a header).
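As a rough illustration of the "embed the ID in the record" step, the sketch below frames a payload the way I understand the Confluent wire format: one magic byte (0), a 4-byte big-endian schema ID, then the Avro binary data. The registry call is not shown, and the class and method names are hypothetical, not Confluent's API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of the framing only; the schema ID would normally come from the registry.
class WireFormatSketch {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 1 + 4; // magic byte + 4-byte schema ID

    // Producer side: prefix the Avro-encoded payload with magic byte and schema ID.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + avroPayload.length);
        buf.put(MAGIC_BYTE);
        buf.putInt(schemaId); // ByteBuffer is big-endian by default
        buf.put(avroPayload);
        return buf.array();
    }

    // Consumer side: read the ID, which is then used to look up the writer schema.
    static int readSchemaId(byte[] framed) {
        if (framed.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Record too short to contain a schema ID");
        }
        ByteBuffer buf = ByteBuffer.wrap(framed);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getInt();
    }

    // Consumer side: the remaining bytes are the Avro binary payload.
    static byte[] readPayload(byte[] framed) {
        if (framed.length < HEADER_SIZE) {
            throw new IllegalArgumentException("Record too short to contain a schema ID");
        }
        return Arrays.copyOfRange(framed, HEADER_SIZE, framed.length);
    }
}
```

This is why the registry approach adds only five bytes per record: the schema text is stored once, and each message carries just the reference to it.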
Isn’t the data serialized before sending to schema registry?
If by “serialized” you mean as Avro, then yes, the data is, but the schema is sent to the registry as UTF-8 text.
JsonDeserializer, it deserializes data without any schema and works with JsonNode. Why isn’t a similar mechanism possible for Avro?
I think you should be comparing against the JsonSchemaDeserializer class. Obviously plain JSON has no concept of schemas. Avro requires a reader schema for deserialization, but there is a similar mechanism: GenericRecord operates similarly to JsonNode.