A Deep Dive into Java Serialization

Alexander Obregon
12 min read · Nov 6, 2023

Introduction

Java Serialization is a mechanism that converts Java objects into a byte stream, which can later be converted back into a copy of the original object. It is a fundamental concept in Java programming, especially for transmitting objects over a network or storing them in files. In this post, we will look at how Java Serialization works, how to implement it, its benefits, and its potential pitfalls.

Understanding Java Serialization

Java Serialization is more than just a conversion of Java objects into a byte stream; it is a sophisticated mechanism that plays a significant role in various Java applications, whether they be for persistence, communication, or even caching. In this section, we’ll expand upon what Java Serialization is, how it works, its integral components, and its practical use cases.

The Serializable Interface

The journey of an object from the heap to a sequence of bytes that can be stored or transmitted is made possible by implementing the java.io.Serializable interface. This is a marker interface: it declares no methods and merely signals to the serialization runtime that instances of the class are eligible for serialization.

Serialization Under the Hood

When an object is serialized, the serialization runtime traverses the objects via their fields. This process is recursive, following the graph of objects until all the reachable objects from the original source have been included in the serialization process. This does not only apply to the immediate fields of the current object but to all the objects that are reachable from the fields of all objects, down the chain.

The process can be illustrated in a simplified manner:

  1. The process begins only if the object to be serialized implements the Serializable interface; otherwise a NotSerializableException is thrown.
  2. Each object written to the stream is assigned a handle that the stream uses to ensure that each instance is serialized only once. If an object is referenced multiple times, later references are written as back-references to the data already serialized, maintaining the object graph’s integrity.
  3. For each object, the non-static and non-transient fields are included in the serialized representation.
  4. If any field refers to an object that also implements Serializable, the process is applied recursively to that object. A non-null field whose object is not serializable causes a NotSerializableException.
  5. A class may implement a private writeObject method to customize how its instances are serialized.
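To make the handle mechanism from step 2 concrete, here is a minimal sketch. The Address and Household classes and the roundTrip helper are hypothetical names invented for this illustration. It serializes a small graph in which two fields point at the same object and checks that the copy preserves that sharing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes used only for this illustration.
class Address implements Serializable {
    private static final long serialVersionUID = 1L;
    final String city;

    Address(String city) {
        this.city = city;
    }
}

class Household implements Serializable {
    private static final long serialVersionUID = 1L;
    final Address first;
    final Address second;

    Household(Address first, Address second) {
        this.first = first;
        this.second = second;
    }
}

class SharedReferenceDemo {
    // Serializes the whole graph reachable from the argument, then reads it back.
    static Household roundTrip(Household original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Household) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Address shared = new Address("Lisbon");
        Household copy = roundTrip(new Household(shared, shared));
        // Both fields of the copy still point at one Address instance.
        System.out.println(copy.first == copy.second); // prints true
        // The copy is a new object graph, not the original instances.
        System.out.println(copy.first == shared);      // prints false
    }
}
```

The second Address reference is written as a back-reference to the first, which is why the deserialized graph contains one Address rather than two.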

Here’s an example of how a class that implements Serializable would be constructed:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class Employee implements Serializable {
    private static final long serialVersionUID = 2L;

    private String name;
    private int age;
    private String department;
    // Assume getters, setters, and constructors are provided

    // Custom serialization hook; it must be private and have exactly this signature
    private void writeObject(ObjectOutputStream stream)
            throws IOException {
        stream.defaultWriteObject(); // writes name, age, and department
        // Custom code or additional parameters
    }

    // Custom deserialization hook
    private void readObject(ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        stream.defaultReadObject(); // restores name, age, and department
        // Custom read logic or additional validations
    }
}

In this example, writeObject and readObject are not part of the Serializable interface. Instead, they are optional hooks that the serialization mechanism calls if they are present in the class being serialized.
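To see the hooks being called, here is a small round-trip sketch. TrackedEmployee and HookDemo are hypothetical names introduced for this example, and the extra length value written by the hook is just an illustrative choice, not something the serialization API requires.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration; the name is assumed to be non-null.
class TrackedEmployee implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    // Transient: not written to the stream, set only by the readObject hook.
    private transient boolean restoredFromStream;

    TrackedEmployee(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    boolean isRestoredFromStream() {
        return restoredFromStream;
    }

    private void writeObject(ObjectOutputStream stream) throws IOException {
        stream.defaultWriteObject();     // writes the name field
        stream.writeInt(name.length());  // extra value appended by the hook
    }

    private void readObject(ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        stream.defaultReadObject();            // restores the name field
        int expectedLength = stream.readInt(); // read in the same order as written
        if (name == null || name.length() != expectedLength) {
            throw new InvalidObjectException("name length does not match stream");
        }
        restoredFromStream = true;
    }
}

class HookDemo {
    static TrackedEmployee roundTrip(TrackedEmployee original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original); // triggers TrackedEmployee.writeObject
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TrackedEmployee) in.readObject(); // triggers readObject
        }
    }

    public static void main(String[] args) throws Exception {
        TrackedEmployee copy = roundTrip(new TrackedEmployee("Ada"));
        System.out.println(copy.getName());              // prints Ada
        System.out.println(copy.isRestoredFromStream()); // prints true
    }
}
```

Note that anything the write hook appends must be read back in the same order by the read hook.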

Why Use Serialization?

Persistent Storage

Serialization is commonly used to persist the state of an object. Once an object is serialized and written to disk, it can be read and deserialized later, reconstructing the original object with its state intact. This is essential for applications that need to save user sessions, settings, or other dynamic information between runs.

Communication

In distributed systems, objects often need to be transmitted over the network to other JVMs running on different hosts. Serialization provides a way to convert objects into a format that can be easily transmitted and then reconstructed on the other end, making it foundational for Java RMI (Remote Method Invocation). Frameworks such as Apache Spark also use it by default, with faster serializers like Kryo available as an option.

Deep Cloning

While not its primary use, serialization can serve to deep clone objects. By serializing an object and then immediately deserializing it, you can create a new instance that is a deep copy of the original object.
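A minimal sketch of this technique follows. DeepCloner and deepCopy are hypothetical names, and the approach assumes every object in the graph is Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

class DeepCloner {
    // Copies an object graph by writing it to memory and reading it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Object graph could not be copied", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<ArrayList<String>> original = new ArrayList<>();
        original.add(new ArrayList<>(Arrays.asList("a", "b")));

        ArrayList<ArrayList<String>> copy = deepCopy(original);
        copy.get(0).add("c"); // mutating the copy leaves the original untouched

        System.out.println(original); // prints [[a, b]]
        System.out.println(copy);     // prints [[a, b, c]]
    }
}
```

This is convenient but comparatively slow, so it is usually better suited to tests and small graphs than to hot code paths.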

Serialization Internals and Process Flow

The mechanics of Java Serialization are governed by the Java Object Serialization Specification. Let’s take a deeper look at how the serialization process is orchestrated:

  1. Object Graph Analysis: The serialization runtime analyzes the object graph to determine the sequence of objects to serialize.
  2. Assigning SerialVersionUID: A serialVersionUID is computed if it’s not specified. This ID must match during deserialization to verify that the sender and receiver of a serialized object have loaded classes for that object that are compatible with respect to serialization.
  3. Writing Object Metadata: The class description of each object in the graph is written. This metadata includes the class name, the serialVersionUID, and the names and types of the serializable fields.
  4. Writing Class Data: The actual data associated with the object’s fields are written. If any of the fields are themselves serializable objects, the serialization of those objects is handled recursively.
  5. Handling transient and static Fields: Fields declared as transient or static are ignored by the serialization process, as transient fields are not part of the persistent state of the object, and static fields are part of the class state, not the individual object state.
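The handling of transient and static fields in step 5 can be observed with a short sketch. The Session class and its field names are hypothetical, chosen only to show one field of each kind.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with one field of each kind.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    static int activeSessions = 0;      // static: class state, never written
    private final String user;          // ordinary field: written
    private transient String authToken; // transient: skipped

    Session(String user, String authToken) {
        this.user = user;
        this.authToken = authToken;
    }

    String getUser() {
        return user;
    }

    String getAuthToken() {
        return authToken;
    }
}

class TransientStaticDemo {
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Session.activeSessions = 5;
        byte[] data = serialize(new Session("alice", "token-123"));
        Session.activeSessions = 9; // changed after the object was serialized

        Session copy = (Session) deserialize(data);
        System.out.println(copy.getUser());         // prints alice
        System.out.println(copy.getAuthToken());    // prints null
        System.out.println(Session.activeSessions); // prints 9
    }
}
```

The transient field comes back as its default value, and the static counter keeps whatever value the class currently holds, since it was never part of the stream.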

The Art of Custom Serialization

While default serialization is sufficient for many cases, there may be times when you need to customize the serialization process. For example:

  • Customizing Default Behavior: You can implement the writeObject and readObject methods to include additional logic during serialization and deserialization, such as encrypting data as it’s serialized or performing validations during deserialization.
  • Dealing with transient Fields: If you have sensitive information marked as transient that you do want to serialize securely, custom serialization methods can allow you to do so in a secure manner, such as encrypting the data before writing it to the output stream.
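As a rough sketch of the second point, the class below keeps its password field transient and writes an encrypted copy in writeObject. The Credentials class, the helper names, and above all the hard-coded demo key are assumptions made for illustration; a real application would obtain the key from a key store or a secrets manager.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class Credentials implements Serializable {
    private static final long serialVersionUID = 1L;
    // ASSUMPTION: hard-coded 128-bit demo key. Real code would load it from a key store.
    private static final SecretKeySpec DEMO_KEY =
            new SecretKeySpec("0123456789abcdef".getBytes(StandardCharsets.US_ASCII), "AES");

    private final String username;
    private transient String password; // never written in plain text

    Credentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    String getUsername() {
        return username;
    }

    String getPassword() {
        return password;
    }

    private static Cipher newCipher(int mode, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, DEMO_KEY, new GCMParameterSpec(128, iv));
        return cipher;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes username only
        try {
            byte[] iv = new byte[12]; // fresh random nonce for every serialization
            new SecureRandom().nextBytes(iv);
            byte[] encrypted = newCipher(Cipher.ENCRYPT_MODE, iv)
                    .doFinal(password.getBytes(StandardCharsets.UTF_8));
            out.writeObject(iv);
            out.writeObject(encrypted);
        } catch (GeneralSecurityException e) {
            throw new IOException("Could not encrypt password", e);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject(); // restores username
        try {
            byte[] iv = (byte[]) in.readObject();
            byte[] encrypted = (byte[]) in.readObject();
            byte[] plain = newCipher(Cipher.DECRYPT_MODE, iv).doFinal(encrypted);
            password = new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IOException("Could not decrypt password", e);
        }
    }
}

class CredentialsDemo {
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Credentials("alice", "s3cret-pw"));
        Credentials copy = (Credentials) deserialize(data);
        System.out.println(copy.getPassword()); // prints s3cret-pw
    }
}
```

AES in GCM mode is used here because it also detects tampering with the encrypted bytes; decryption fails if the ciphertext was modified.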

Serialization with Inheritance

If an object’s class has a superclass that is also serializable, then the fields of the superclass are included in the serialized representation of the object. The serialization process starts with the topmost serializable superclass and proceeds down the inheritance chain.

However, if the superclass is not serializable, the subclass can still be serialized, but the process will not include the fields of the non-serializable superclass. During deserialization, those fields are initialized by running the superclass’s no-argument constructor, which must be accessible to the subclass; if it is missing, deserialization fails with an InvalidClassException. Any superclass state that was set through a parameterized constructor is therefore not captured and has to be restored by other means when the subclass is deserialized.

Here is a brief example:

import java.io.Serializable;

// Employee.java
public class Employee extends Person implements Serializable {
    private static final long serialVersionUID = 1L;

    private int employeeId;
    private String department;
    // Assume getters, setters, and constructors are provided
}

// Person.java (superclass, not Serializable)
public class Person {
    private String name;
    private int age;

    // An accessible no-arg constructor is required so that Employee can be deserialized
    public Person() {
    }
    // Assume getters, setters, and other constructors are provided
}

In this case, Person is not serializable. If you were to serialize an Employee object, the state associated with the Person class would not be saved. To fully recover an Employee object during deserialization, you would need to ensure that Person either implements Serializable or that you handle the superclass's state yourself.
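One way to handle the superclass's state yourself is to write and read it explicitly in the subclass's hooks. The sketch below adapts the example above under a few assumptions: the Person fields are made protected so the subclass can reach them directly, department is left out for brevity, and InheritanceDemo is a hypothetical helper class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Deliberately NOT Serializable.
class Person {
    protected String name;
    protected int age;

    Person() {
        // Runs during deserialization of any Serializable subclass.
    }

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class Employee extends Person implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int employeeId;

    Employee(String name, int age, int employeeId) {
        super(name, age);
        this.employeeId = employeeId;
    }

    int getEmployeeId() {
        return employeeId;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes employeeId
        out.writeObject(name);    // superclass state, written by hand
        out.writeInt(age);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();   // restores employeeId
        name = (String) in.readObject(); // same order as in writeObject
        age = in.readInt();
    }
}

class InheritanceDemo {
    static Employee roundTrip(Employee original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Employee) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Employee copy = roundTrip(new Employee("Grace", 45, 7));
        System.out.println(copy.name + " " + copy.age + " " + copy.getEmployeeId());
        // prints Grace 45 7
    }
}
```

Without the two hooks, the copy's name would be null and its age 0, because only the no-argument Person constructor would run.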

The Role of serialVersionUID

serialVersionUID is a version identifier for a Serializable class. It is used to verify that the saved object and the loaded class are compatible in terms of serialization. If the serialVersionUID recorded in the stream does not match that of the loaded class, an InvalidClassException is thrown.

Understanding serialVersionUID

The serialization runtime associates a version number with each Serializable class called a serialVersionUID, which is used during the deserialization process to ensure that a loaded class is compatible with the serialized object. If no serialVersionUID is declared, the JVM will generate one at runtime based on various aspects of the class, including:

  • Class name
  • Class modifiers (e.g., public, abstract, etc.)
  • The list of interfaces
  • List of methods
  • List of fields
  • Other internal metadata

Because this auto-generated number can vary between different Java compiler implementations, it is strongly recommended to declare an explicit serialVersionUID for Serializable classes.
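If you want to see which value the runtime actually associates with a class, ObjectStreamClass can report it. In this small sketch the two sample classes and the UidInspector helper are hypothetical names used for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class WithExplicitUid implements Serializable {
    private static final long serialVersionUID = 42L;
    String name;
}

class WithDefaultUid implements Serializable {
    // No serialVersionUID declared: the value is computed from the class structure.
    String name;
}

class UidInspector {
    // Returns the serialVersionUID the serialization runtime uses for the given class.
    static long uidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) { // lookup returns null for non-serializable classes
            throw new IllegalArgumentException(type.getName() + " is not Serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(WithExplicitUid.class)); // prints 42
        // The computed value depends on the class details, so no fixed output is shown.
        System.out.println(uidOf(WithDefaultUid.class));
    }
}
```

The JDK also ships a serialver command-line tool that prints the same value for a compiled class.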

The Importance of Declaring serialVersionUID

Declaring an explicit serialVersionUID ensures that you maintain control over the serialization process, especially when you make subsequent changes to the class. When you update a class by adding methods or fields, you do not necessarily increment the serialVersionUID unless you want to invalidate the previous versions of the class.

When to Modify serialVersionUID

Whether you should modify the serialVersionUID depends on the kind of change:

  1. Incompatible Changes: When you make a change that is not backward compatible, such as removing fields or changing the type of fields, you should update the serialVersionUID. This way, you can ensure that only compatible versions of the class are used to deserialize objects.
  2. Compatible Changes: If you make a change that is backward compatible, such as adding new fields, you can leave the serialVersionUID unchanged. When an older stream is deserialized, the new fields receive their default values (null, 0, or false) unless you assign something else in the readObject method.

Default Serialization and serialVersionUID

In default serialization, if no serialVersionUID is provided, any change to the class will likely result in a different auto-generated serialVersionUID, which can cause problems when older serialized objects are deserialized with a newer version of the class.

Custom Serialization and serialVersionUID

In custom serialization, where the writeObject and readObject methods are used, you have more flexibility. You can manage backward and forward compatibility even without changing the serialVersionUID. For example, you could add optional data to the stream and manage this data in the readObject method, allowing newer versions of the class to read streams written by the older versions.
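One possible way to manage this in readObject is the readFields API, which lets you supply a fallback for fields that are missing from an older stream. The sketch below assumes a hypothetical EmployeeV2 class in which department was added in a later version, and it picks "UNASSIGNED" as an arbitrary default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class EmployeeV2 implements Serializable {
    // Unchanged from the earlier version: adding a field is a compatible change.
    private static final long serialVersionUID = 1L;

    private String name;
    private String department; // assumed to have been added in a later version

    EmployeeV2(String name, String department) {
        this.name = name;
        this.department = department;
    }

    String getName() {
        return name;
    }

    String getDepartment() {
        return department;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField fields = in.readFields();
        name = (String) fields.get("name", (Object) null);
        // The second argument is returned when the stream has no value for the field.
        String stored = (String) fields.get("department", (Object) null);
        department = (stored != null) ? stored : "UNASSIGNED";
    }
}

class VersioningDemo {
    static EmployeeV2 roundTrip(EmployeeV2 original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (EmployeeV2) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A null department stands in for a stream written before the field existed.
        System.out.println(roundTrip(new EmployeeV2("Ada", null)).getDepartment());
        // prints UNASSIGNED
        System.out.println(roundTrip(new EmployeeV2("Ada", "R&D")).getDepartment());
        // prints R&D
    }
}
```

Because both versions of the class share a single classpath here, the old stream is only simulated with a null value; the same readObject logic applies when the field is genuinely absent.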

Best Practices for serialVersionUID

  1. Always explicitly declare serialVersionUID for Serializable classes.
  2. Increment serialVersionUID if you introduce changes that are not compatible with the previous version of the class.
  3. Consider using a version control system to keep track of changes to Serializable classes and their associated serialVersionUID values.
  4. Treat the serialization of classes as a public API and handle changes with care to maintain backward and forward compatibility.

Example of Using serialVersionUID

Let’s consider a class Employee that has gone through several iterations:

import java.io.Serializable;

// Version 1
public class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    // ...
}

// Version 2 - added a new field 'department' (compatible, so the ID is unchanged)
public class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    private String name;
    private String department;
    // ...
}

// Version 3 - removed a field 'department', added 'division' (incompatible, so the ID is incremented)
public class Employee implements Serializable {
    private static final long serialVersionUID = 2L;
    private String name;
    private String division;
    // ...
}

Each time a backward-incompatible change is made, the serialVersionUID is incremented. This way, the serialization mechanism can prevent attempts to deserialize old serialized objects that are incompatible with the new class structure.

Security Concerns and Best Practices

Java Serialization has been under scrutiny due to various security concerns, with vulnerabilities that can lead to attacks such as denial of service, access control issues, and even remote code execution. Understanding these security concerns is critical to protecting your Java applications.

Security Concerns

  1. Deserialization of Untrusted Data: The most significant risk with Java Serialization comes from deserializing data from untrusted sources. It can lead to remote code execution because the readObject method may end up executing code without any explicit instruction from the programmer.
  2. Arbitrary Code Execution: During deserialization, the stream determines which classes are loaded and instantiated. The stream itself carries no bytecode, but an attacker can craft it to chain together classes already on the classpath (so-called gadget chains) whose deserialization logic ends up executing arbitrary code on the JVM.
  3. Denial of Service (DoS): Serialized objects can be crafted to consume an excessive amount of memory or CPU when they are deserialized, leading to DoS attacks.
  4. Sensitive Data Exposure: If sensitive data is serialized and the serialization data is intercepted, it can expose sensitive information to attackers.

Best Practices

Use of transient Keyword

For sensitive fields that should not be serialized, use the transient keyword. This will ensure they are not included in the serialized form of an object.

private transient String password;

Avoid Serialization of Sensitive Data

Do not serialize sensitive information. If you must serialize objects containing sensitive data, make sure the data is encrypted.

Implement Custom writeObject and readObject

Control the serialization process by implementing custom writeObject and readObject methods, and include your own validation logic to prevent deserialization of tampered or invalid data.
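Here is a minimal sketch of such validation. AgeRecord and ValidationDemo are hypothetical names, and the tampering step relies on an assumption spelled out in the comments: with default field layout and a single int field, the value occupies the last four bytes of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class AgeRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int age;

    AgeRecord(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        this.age = age;
    }

    int getAge() {
        return age;
    }

    // Deserialization bypasses the constructor, so the invariant is re-checked here.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (age < 0) {
            throw new InvalidObjectException("age must be non-negative");
        }
    }
}

class ValidationDemo {
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new AgeRecord(30));

        // Simulated tampering. ASSUMPTION: the int field is stored in the last
        // four bytes of the stream; 0xFFFFFFFF turns the age into -1.
        for (int i = data.length - 4; i < data.length; i++) {
            data[i] = (byte) 0xFF;
        }

        try {
            deserialize(data);
        } catch (InvalidObjectException e) {
            System.out.println("Rejected: " + e.getMessage());
            // prints Rejected: age must be non-negative
        }
    }
}
```

Without the readObject check, the tampered stream would produce an AgeRecord that the constructor would never have allowed.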

Avoid Default Java Serialization for Public APIs

If you are designing a system where objects are exchanged over networks or between different application components, consider using safer alternatives to Java’s default serialization, such as JSON or XML. These are not only safer but often more interoperable.

Use Serialization Proxies

Serialization proxies provide a way to serialize an instance of a class by creating a “proxy” that represents the logical state of the instance. This approach avoids the pitfalls of default serialization because the real object is always rebuilt through its constructor rather than directly from the byte stream.

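import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class MyClass implements Serializable {
    private final String data;

    public MyClass(String data) {
        this.data = data;
    }

    public String getData() {
        return data;
    }

    // Serialize the proxy instead of this instance
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    // Reject streams that try to bypass the proxy
    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    private static class SerializationProxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String data;

        SerializationProxy(MyClass myClass) {
            this.data = myClass.getData();
        }

        // Rebuild the real object through its public constructor
        private Object readResolve() {
            return new MyClass(this.data);
        }
    }
}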

Code Signing

In environments where code mobility is a feature (like mobile code systems), make sure to use code signing to ensure that the code being loaded during the deserialization process is from a trusted source.

Validate Input

When deserializing objects, especially in applications exposed to untrusted clients, validate the inputs before deserialization. You can use filtering based on classes, object graphs, or schema validation.

Use of ObjectInputFilter

Java 9 introduced ObjectInputFilter to provide a way to filter which classes can be deserialized. Use it to specify allowed classes or to check various attributes of the incoming serialized data (like array size, depth, etc.).

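// Allow classes from the java.base module and reject everything else
ObjectInputFilter filter = ObjectInputFilter.Config.createFilter("java.base/*;!*");
// bais is assumed to be an InputStream over the serialized bytes
ObjectInputStream ois = new ObjectInputStream(bais);
// Install the filter before the first call to readObject()
ois.setObjectInputFilter(filter);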
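For a fuller picture, here is a self-contained sketch that shows a filter accepting one stream and rejecting another. The pattern string, the depth limit, and the Untrusted and FilteredReader classes are illustrative assumptions; a real allow-list should name exactly the classes your application expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical class that is not on the allow-list.
class Untrusted implements Serializable {
    private static final long serialVersionUID = 1L;
}

class FilteredReader {
    // Limit nesting depth, allow java.lang and java.util classes, reject everything else.
    private static final ObjectInputFilter FILTER =
            ObjectInputFilter.Config.createFilter("maxdepth=10;java.lang.*;java.util.*;!*");

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object readFiltered(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(FILTER); // must happen before the first read
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] allowed = serialize(new ArrayList<>(Arrays.asList("a", "b")));
        System.out.println(readFiltered(allowed)); // prints [a, b]

        byte[] blocked = serialize(new Untrusted());
        try {
            readFiltered(blocked);
        } catch (InvalidClassException e) {
            System.out.println("Rejected by filter");
        }
    }
}
```

A rejected class surfaces as an InvalidClassException before any of its deserialization logic runs, which is what makes filtering effective against gadget chains.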

Security Managers

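A security manager can restrict the actions that the JVM is allowed to perform, which helps if you must deserialize from untrusted sources. Note, however, that the Security Manager was deprecated for removal in Java 17 (JEP 411), so it should not be the primary line of defense in new code.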

Regularly Update Java

Regularly update your Java runtime to ensure you have the latest security features and fixes.

Security Libraries

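Consider using libraries designed to mitigate serialization risks. For example, Apache Commons IO provides ValidatingObjectInputStream, which only deserializes classes that have been explicitly accepted. Note that Apache Commons Lang SerializationUtils is a convenience wrapper around standard serialization and does not add protection by itself.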

Serialization Alternatives and Future

Java Serialization’s security issues and performance overhead have led to the exploration and adoption of various alternatives. These alternatives aim to address some of the inherent problems of Java’s native serialization mechanism and also often provide more flexibility and efficiency.

Alternatives to Java Serialization

  1. JSON (JavaScript Object Notation): JSON is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. Libraries like Gson and Jackson can be used to serialize and deserialize Java objects to and from JSON.
  2. XML (eXtensible Markup Language): XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Libraries such as JAXB (Java Architecture for XML Binding) facilitate the conversion of Java objects to XML and vice versa.
  3. Protocol Buffers: Developed by Google, Protocol Buffers are a language-agnostic binary serialization format. They are designed to be more efficient, both in terms of serialization speed and the size of the serialized data, compared to traditional serialization.
  4. Apache Thrift: Created by Facebook and now part of the Apache Software Foundation, Thrift is an interface definition language and binary communication protocol that is used for defining and creating services for numerous languages. It is efficient and flexible.
  5. Avro: Apache Avro is a binary serialization format that is compact, fast, and suitable for serializing large amounts of data. Its schema is stored with the data which allows for full dynamic data structures.
  6. Kryo: Kryo is a fast and efficient object graph serialization framework for Java. It is not tied to a specific schema generation pattern, and it offers low overhead.
  7. MessagePack: It’s an efficient binary serialization format that’s like JSON but faster and smaller.
  8. Custom Serialization: For maximum control, developers can implement their own serialization mechanism. This is often done for systems with extremely high performance requirements or where the data format needs to be controlled at the byte level.
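As an illustration of the last option, a hand-rolled format can be built on DataOutputStream and DataInputStream. The wire layout below (a version byte, a UTF string, and an int) and the EmployeeData and EmployeeCodec names are invented for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value class for this sketch.
final class EmployeeData {
    final String name;
    final int age;

    EmployeeData(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class EmployeeCodec {
    private static final int FORMAT_VERSION = 1;

    static byte[] encode(EmployeeData employee) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(FORMAT_VERSION); // 1 byte: format version
            out.writeUTF(employee.name);   // 2-byte length prefix + modified UTF-8
            out.writeInt(employee.age);    // 4 bytes, big-endian
        }
        return bytes.toByteArray();
    }

    static EmployeeData decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readUnsignedByte();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported format version: " + version);
            }
            String name = in.readUTF();
            int age = in.readInt();
            return new EmployeeData(name, age);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(new EmployeeData("Ada", 36));
        System.out.println(data.length); // prints 10 (1 + 2 + 3 + 4 bytes)

        EmployeeData copy = decode(data);
        System.out.println(copy.name + " " + copy.age); // prints Ada 36
    }
}
```

Nothing in the stream names a class, so decoding can only ever produce an EmployeeData, which removes the class-loading risks discussed earlier at the cost of maintaining the format by hand.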

Future of Serialization in Java

The future of serialization in Java is likely to evolve in a few directions:

  1. Removal of Java Serialization: Given the security risks, there have been discussions about deprecating and eventually removing the native Java serialization API.
  2. Improved Safety Features: Alternatives within Java, such as records (introduced in Java 14 as a preview feature and standardized in Java 16), are aimed at making data aggregation and sharing safer and more straightforward.
  3. Data Format Libraries: The continued development and improvement of libraries for JSON, XML, and other formats will make these alternatives more robust, performant, and secure.
  4. Project Amber: This project is working on language productivity features. One of the goals of Project Amber is to make it easier to work with data in Java, potentially affecting how data is serialized and deserialized.
  5. Project Loom: With the introduction of virtual threads (lightweight threads, earlier called fibers), there might be implications for how data is serialized and shared across threads, leading to new or improved serialization mechanisms.
  6. Record Patterns: Record patterns, finalized in Java 21, provide a way to destructure records, which could make serialization and deserialization code more intuitive and less error-prone.
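To illustrate why records are considered safer in this context: a serializable record is always rebuilt through its canonical constructor, so constructor validation also applies to deserialized data. The sketch below needs Java 16 or later. Temperature and RecordDemo are hypothetical names, and the tampering step assumes that the single double component occupies the last eight bytes of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;

record Temperature(double kelvin) implements Serializable {
    // Compact canonical constructor: also runs when a record is deserialized.
    Temperature {
        if (kelvin < 0) {
            throw new IllegalArgumentException("kelvin must be non-negative");
        }
    }
}

class RecordDemo {
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Temperature(293.15));
        System.out.println(deserialize(data)); // prints Temperature[kelvin=293.15]

        // Simulated tampering. ASSUMPTION: the double is stored in the last
        // eight bytes of the stream, so this overwrites it with -1.0.
        ByteBuffer.wrap(data).putDouble(data.length - 8, -1.0);
        try {
            deserialize(data);
        } catch (InvalidObjectException e) {
            System.out.println("Rejected: " + e.getMessage());
            // prints Rejected: kelvin must be non-negative
        }
    }
}
```

Unlike an ordinary class, the record needs no readObject method to enforce its invariant; the check in the constructor is enough.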

Conclusion

Java Serialization is a powerful feature that facilitates a wide range of applications, from persistent storage to inter-process communication. Despite its usefulness, it’s essential to understand its workings, potential security risks, and best practices to ensure robust and secure Java applications. With the rise of new serialization techniques and paradigms, developers should weigh the benefits and drawbacks of traditional Java Serialization against modern alternatives.
