Java Serialization Performance Showdown
In the world of modern software engineering, data is constantly in motion. It’s sent over networks between microservices, stored in caches, and persisted in databases. The process of converting in-memory objects into a format suitable for transmission or storage is called serialization; its counterpart, deserialization, converts that data back into objects.
Choosing the right serialization format is a critical architectural decision. It can have a profound impact on your application’s performance, network bandwidth consumption, and even ease of debugging. While JSON has become the de facto standard for its human-readability and ubiquity, is it always the right choice?
To answer this, we ran a comprehensive benchmark comparing some of the most popular serialization frameworks in the Java ecosystem.
The Contenders
We pitted four major players against each other, using the Java Microbenchmark Harness (JMH) to ensure accurate and reliable results.
- JSON (Jackson): The undisputed champion of web APIs and configuration files. It’s text-based, human-readable, and supported by virtually every language and platform.
- Protocol Buffers (Protobuf): Google’s binary serialization format. It’s schema-first, meaning you define your data structures in a .proto file, and it’s designed for high performance and compact payloads.
- Apache Avro (Binary): A powerful binary serialization format from the Apache ecosystem, widely used in big data platforms like Kafka and Hadoop. It also uses a schema-first approach but with more dynamic capabilities.
- Apache Avro (JSON): Avro also supports a JSON encoding. This provides a human-readable format while still leveraging Avro’s schema management capabilities.
The Test Subject: A Real-World SocialMediaPost
To ensure our benchmarks reflect real-world scenarios, we used a moderately complex SocialMediaPost object. This object was designed to include a mix of data types and structures that are common in modern applications, such as:
- Primitive types: Strings, longs
- A list of strings: likes
- Lists of nested objects: comments and mediaAttachments
Here is a sample of the SocialMediaPost object in JSON format:
{
  "postId": "some-random-string",
  "authorId": "some-random-string",
  "content": "some-random-string",
  "timestamp": 1678886400000,
  "mediaAttachments": [
    {
      "type": "IMAGE",
      "url": "http://example.com/some-random-image.jpg"
    },
    {
      "type": "VIDEO",
      "url": "http://example.com/some-random-video.mp4"
    }
  ],
  "likes": [
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string",
    "some-random-string"
  ],
  "comments": [
    {
      "commentId": "some-random-string",
      "commenterId": "some-random-string",
      "commentText": "some-random-string",
      "timestamp": 1678886400000
    },
    {
      "commentId": "some-random-string",
      "commenterId": "some-random-string",
      "commentText": "some-random-string",
      "timestamp": 1678886400000
    }
  ]
}
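For a sense of what the schema-first approach looks like, here is a rough sketch of the same structure declared as a Protobuf schema. The field names and numbering are illustrative assumptions, not the benchmark project’s actual .proto file:

syntax = "proto3";

// Sketch of a post-like message; names and field numbers are illustrative.
message SocialMediaPost {
  string post_id = 1;
  string author_id = 2;
  string content = 3;
  int64 timestamp = 4;
  repeated MediaAttachment media_attachments = 5;
  repeated string likes = 6;
  repeated Comment comments = 7;
}

message MediaAttachment {
  string type = 1;
  string url = 2;
}

message Comment {
  string comment_id = 1;
  string commenter_id = 2;
  string comment_text = 3;
  int64 timestamp = 4;
}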
By using an object with this level of complexity, we can be more confident that our benchmark results are representative of the performance you might see in your own applications. The benchmarks were run across multiple JDK versions (17, 21, and 25) to see how performance varies.
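If you haven’t used JMH before, each measurement boils down to a small annotated class. The sketch below shows the general shape for the Jackson case; it assumes a pre-built SocialMediaPost POJO and a hypothetical SampleData helper that constructs the test object, so it is illustrative rather than the project’s exact benchmark code:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class JacksonBenchmark {

    private final ObjectMapper mapper = new ObjectMapper();
    private SocialMediaPost post;
    private byte[] jsonBytes;

    @Setup
    public void setup() throws Exception {
        post = SampleData.createPost(); // hypothetical helper building the object shown above
        jsonBytes = mapper.writeValueAsBytes(post);
    }

    @Benchmark
    public byte[] serialize() throws Exception {
        return mapper.writeValueAsBytes(post); // reported as ns/op
    }

    @Benchmark
    public SocialMediaPost deserialize() throws Exception {
        return mapper.readValue(jsonBytes, SocialMediaPost.class);
    }
}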
The Results: A Deep Dive
You can explore the full, interactive results yourself on jmh.morethan.io. The raw data for the JMH report is also available in the Gist. But here’s the breakdown of what we found.
Serialization: The Need for Speed
When it comes to writing data, speed is paramount. Here, the binary formats demonstrated a staggering advantage.
- Winner: Protobuf and Avro Binary were neck-and-neck for the top spot, consistently clocking in as the fastest formats. On average, they took around 400-600 nanoseconds per operation.
- Middle Ground: Jackson JSON was significantly slower, taking roughly 1100-1200 ns/op—more than twice as long as the binary leaders.
- Slowest: Avro JSON was the clear laggard in the serialization race, requiring over 4000 ns/op.
Key Takeaway: The overhead of converting data to a text-based format like JSON is substantial. Binary formats are dramatically more efficient.
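To make that concrete, here is a hedged sketch of the two write paths side by side. The Jackson call works on any POJO; the Avro path assumes a SocialMediaPost class generated from an Avro schema (a SpecificRecord), which may differ from the benchmark project’s setup:

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumWriter;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Jackson walks the object graph and emits UTF-8 text, field names included.
static byte[] toJson(SocialMediaPost post) throws IOException {
    return new ObjectMapper().writeValueAsBytes(post);
}

// Avro binary lets the schema drive the encoding: the payload carries values
// only (no field names, no quoting, no number-to-text conversion).
static byte[] toAvroBinary(SocialMediaPost post) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DatumWriter<SocialMediaPost> writer = new SpecificDatumWriter<>(SocialMediaPost.class);
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(post, encoder);
    encoder.flush();
    return out.toByteArray();
}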
Deserialization: Reading the Data
The story here is very similar.
- Winner: Protobuf was the undisputed champion, consistently deserializing the object in around 700 ns/op.
- Runner-Up: Avro Binary was not far behind, but Protobuf maintained a consistent edge.
- The Rest: Jackson JSON and Avro JSON were again much slower, taking ~1600 ns/op and ~5900 ns/op, respectively.
Key Takeaway: For performance-critical applications, Protobuf offers a clear and measurable advantage. The cost of parsing text, validating syntax, and mapping fields is significant compared to the direct, structured approach of binary formats.
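The read paths mirror this. In practice each framework deserializes into its own generated or hand-written class; the shared name below is only for the comparison, and parseFrom is the standard entry point on any Protobuf-generated message:

// Protobuf: generated messages decode straight from tagged binary fields.
SocialMediaPost fromProto = SocialMediaPost.parseFrom(protoBytes);

// Jackson: tokenizes the text, validates the JSON syntax, matches field
// names to properties, and converts text back into numbers and strings.
SocialMediaPost fromJson = new ObjectMapper().readValue(jsonBytes, SocialMediaPost.class);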
Beyond Speed: Payload Size Matters
Performance isn’t just about CPU time; it’s also about resource utilization. A key, unmeasured factor in the JMH report is the size of the serialized payload.
- Binary Formats (Protobuf, Avro Binary): Produce highly compact payloads. In our tests, the object was serialized to ~750 bytes.
- Text Formats (JSON, Avro JSON): Are much more verbose, resulting in a payload size of ~1220 bytes—over 60% larger!
This difference is crucial. Smaller payloads mean less network bandwidth consumption and faster transmission times, especially for large objects or high-throughput systems.
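You can check this yourself with a few lines; this sketch reuses the hypothetical toJson and toAvroBinary helpers from the serialization example above:

byte[] json = toJson(post);
byte[] avro = toAvroBinary(post);
System.out.printf("JSON: %d bytes, Avro binary: %d bytes%n", json.length, avro.length);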
Conclusion: Which Format Should You Choose?
Based on the data, here are our recommendations:
- For High-Performance Internal Services (RPC, Microservices):
- Choose Protobuf or Avro Binary. The performance gains in both serialization and deserialization are too significant to ignore. Protobuf often has a slight edge in raw speed, while Avro’s schema evolution capabilities are top-notch.
- For Public APIs and Web Frontends:
- Stick with JSON. Its human-readability, vast ecosystem, and ease of debugging make it the ideal choice for interfaces that are consumed by external clients or browser applications. The performance trade-off is acceptable for the gain in usability.
- For Big Data and Streaming Pipelines (e.g., Kafka):
- Strongly consider Avro. While Protobuf is fast, Avro was practically designed for this world. Its robust schema evolution support is a massive advantage in environments where data structures change over time.
The great serialization debate doesn’t have a single winner, but it does have clear use cases. By understanding the trade-offs between performance, payload size, and usability, you can make an informed decision that best fits your needs.
Want to run the benchmarks yourself? Check out the full project on GitHub at GSSwain/benchmark-serialization-deserialization and dive into the code.