Why does Jackson convert a byte array to a Base64 string when converting to JSON?

When I have a byte array in a DTO and convert it to JSON using Jackson’s ObjectMapper, it automatically converts the byte array into a Base64 string. Example below:

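import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.AllArgsConstructor;
import lombok.Data;

import java.io.File;
import java.nio.file.Files;

// Lombok generates the getter that Jackson uses to find the "binaryFile" property.
@Data
@AllArgsConstructor
class TestDTO {
    private byte[] binaryFile;
}

class TestByteSerialization {
    public static void main(String[] args) throws Exception {
        ObjectMapper objectMapper = new ObjectMapper();
        byte[] bytes = Files.readAllBytes(new File("path/to/file/test.pdf").toPath());

        TestDTO dto = new TestDTO(bytes);

        // By default Jackson writes the byte[] field as a single Base64 string.
        String json = objectMapper.writeValueAsString(dto);
        System.out.println(json);
    }
}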

I expected Jackson to convert it to an array of integers like the following:

{
    "binaryFile" : [21, 45, 12, 65, 12, 37, etc]
}

Instead, I found it was converted to a Base64 string:

{
    "binaryFile" : "ZXhhbXBsZSB0ZXh0IG9ubHkuIEJpbmFyeSBmaWxlIHdhcyBkaWZmZXJlbnQgTE9MLg=="    
}
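To see that the Base64 value is just another spelling of the same bytes, here is a minimal sketch that uses the JDK’s `java.util.Base64` rather than Jackson (the class and method names are made up for illustration). It decodes the example value above, which happens to be ASCII text, and re-encodes it. Jackson’s default Base64 variant should use the same standard alphabet with `=` padding, but treat that match as an assumption to verify for your Jackson version.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, not part of Jackson.
final class Base64RoundTrip {
    // The "binaryFile" value from the JSON output above.
    static final String SAMPLE =
            "ZXhhbXBsZSB0ZXh0IG9ubHkuIEJpbmFyeSBmaWxlIHdhcyBkaWZmZXJlbnQgTE9MLg==";

    // Decode a Base64 string back to the raw bytes it represents.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // Encode raw bytes with the standard alphabet, '=' padding and no line breaks.
    static String encode(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] bytes = decode(SAMPLE);
        // This particular sample is ASCII text, so it can be printed as a string.
        System.out.println(new String(bytes, StandardCharsets.US_ASCII));
        // Re-encoding the decoded bytes reproduces the original string exactly.
        System.out.println(encode(bytes).equals(SAMPLE));
        // The first five bytes, shown as the numbers the question expected.
        System.out.println(Arrays.toString(Arrays.copyOf(bytes, 5)));
    }
}
```

Nothing is lost either way: the numbers in the array form and the characters in the Base64 form carry exactly the same 49 bytes.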

After researching a bit, it seems JSON does not support byte arrays, as mentioned here. That makes sense, because JSON is a text representation of data.

But I still could not find out why JSON doesn’t support byte arrays. It is still just an array of numbers, right? Why does it need to be converted to a Base64-encoded string? What is wrong with passing the byte array as-is into the JSON string as an array of numbers?

For those marking it as an opinion-based question:

Developers definitely wouldn’t have thought “Passing bytes as an array of numbers is boring. Let’s try some crazy looking encoded string”. There has to be some rationale behind this.

Answer

What is wrong in passing byte array as is to the json String as an array of numbers?

Nothing, if you’re happy with each byte of input taking 3.57 characters on average (assuming byte values are evenly distributed). That’s without a space after each comma; with one, it’s 4.57 characters.
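The 3.57 figure can be checked by counting decimal digits over every value from 0 to 255 (10 one-digit, 90 two-digit and 156 three-digit values, 658 digits in total) and adding one separator per byte. A minimal sketch, assuming uniformly distributed byte values as the answer does; the class name is made up for illustration:

```java
// Hypothetical helper class for checking the per-byte cost of a JSON number array.
final class DigitCost {
    // Number of decimal digits needed to print an unsigned byte value (0..255).
    static int digits(int value) {
        return Integer.toString(value).length();
    }

    // Average characters per byte, assuming every value 0..255 is equally likely.
    // separatorLength is 1 for "," and 2 for ", ". Brackets and the missing
    // separator after the last element are ignored, which is fine for large arrays.
    static double averageCharsPerByte(int separatorLength) {
        int totalDigits = 0;
        for (int v = 0; v <= 255; v++) {
            totalDigits += digits(v);
        }
        return (double) totalDigits / 256 + separatorLength;
    }

    public static void main(String[] args) {
        System.out.println(averageCharsPerByte(1)); // 3.5703125
        System.out.println(averageCharsPerByte(2)); // 4.5703125
    }
}
```

Both values are exact: 658 / 256 = 2.5703125 digits per byte, plus the separator.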

So compare these sizes for 10 KiB (10,240 bytes) of data:

  • Raw: 10240 bytes (can’t be represented directly in JSON)
  • Base64: 13656 characters
  • Array of numbers: 36556 characters
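The sizes in the list can be approximated with the JDK alone. This sketch (hypothetical class and helper names) builds 10,240 bytes in which every value from 0 to 255 occurs exactly 40 times, then measures the Base64 form, whose length is always 4 × ⌈n / 3⌉, and a comma-separated number array with no spaces. For this input it should print 10240, 13656 and 36561; the answer doesn’t say what data produced its 36556, and the exact array length depends on which byte values occur.

```java
import java.util.Base64;

// Hypothetical helper class comparing JSON representations of the same bytes.
final class SizeComparison {
    // Render bytes the way the question expected: "[b0,b1,...]" with no spaces.
    // Java's byte is signed, so "& 0xFF" maps each byte to the unsigned range 0..255.
    static String toNumberArray(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 4 + 2);
        sb.append('[');
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(bytes[i] & 0xFF);
        }
        return sb.append(']').toString();
    }

    // Test input in which each byte value appears equally often (for lengths that are multiples of 256).
    static byte[] evenlyDistributed(int length) {
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = (byte) i; // the cast wraps i modulo 256
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = evenlyDistributed(10240);
        System.out.println("Raw:    " + data.length);
        System.out.println("Base64: " + Base64.getEncoder().encodeToString(data).length());
        System.out.println("Array:  " + toNumberArray(data).length());
    }
}
```

The array count works out as 40 × 658 = 26,320 digits, plus 10,239 commas and 2 brackets.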

The 33% size increase for Base64 is painful enough; the array form is more than three and a half times the raw size, which is much, much worse. So the convention is to use Base64 instead. (It’s only a convention, not something baked into the JSON spec, but it’s followed by most JSON encoders and decoders.)

User contributions licensed under: CC BY-SA