Overview of Serializers and Copiers
While Ehcache is a Java cache, it cannot always store its mappings as Java objects.
The on-heap store is capable of storing cached objects either by reference (where the given key and value references are stored) or by value (where a copy of the given key and value are made and those copies are then stored). All other stores are only capable of storing a byte representation of the key/value pair.
Serializer
and Copier
are the abstractions to enable these different storage options.
Serializers
All stores but the on-heap one need some form of serialization/deserialization of objects to be able to store and retrieve mappings. This is because they cannot internally store plain java objects but only binary representations of them.
Serializer
is the Ehcache abstraction solving this:
every cache that has at least one store that cannot store by reference is going to use a pair of Serializer
instances,
one for the key and another one for the value.
A Serializer
is scoped at the cache level and all stores of a cache will be using and sharing the same pair of serializers.
How is a serializer configured?
There are two places where serializers can be configured:
-
at the cache level where one can use
-
CacheConfigurationBuilder.withKeySerializer(Class<? extends Serializer<K>> keySerializerClass)
, -
CacheConfigurationBuilder.withKeySerializer(Serializer<K> keySerializer)
, -
CacheConfigurationBuilder.withValueSerializer(Class<? extends Serializer<V>> valueSerializerClass)
, -
and
CacheConfigurationBuilder.withValueSerializer(Serializer<V> valueSerializer)
, which allow by instance or by class configuration.
-
-
at the cache manager level where one can use
-
CacheManagerBuilder.withSerializer(Class<C> clazz, Class<? extends Serializer<C>> serializer)
-
If a serializer is configured directly at the cache level, it will be used, ignoring any cache manager level configuration.
If a serializer is configured at the cache manager level, upon initialization, a cache with no specifically configured serializer will search through its cache manager’s registered list of serializers and try to find one that directly matches the cache’s key or value type. If such search fails, all the registered serializers will be tried in the added order to find one that handles compatible types.
For instance, let’s say you have a Person
interface and two subclasses: Employee
and Customer
.
If you configure your cache manager as follows:
CacheManagerBuilder.newCacheManagerBuilder().withSerializer(Employee.class,
EmployeeSerializer.class).withSerializer(Person.class, PersonSerializer.class)
then configuring a Cache<Long, Employee>
would make it use the EmployeeSerializer
while a Cache<Long, Customer>
would make it use the PersonSerializer
.
A Serializer
configured at the cache level by class will not be shared to other caches when instantiated.
Given the above, it is recommended to limit Serializer registration to concrete classes and not aim for generality.
|
Bundled implementations
By default, cache managers are pre-configured with specially optimized Serializer
that can handle the following types, in the following order:
-
java.io.Serializable
-
java.lang.Long
-
java.lang.Integer
-
java.lang.Float
-
*java.lang.Double
-
java.lang.Character
-
java.lang.String
-
byte[]
All bundled Serializer
implementations support both persistent and transient caches.
A consequence of providing serializers registered by default is that you will not be able to register a generic However, registering a different |
Lifecycle: instances vs. class
When a Serializer
is configured by providing an instance, it is up to the provider of that instance to manage its lifecycle.
It will need to dispose of any resource the serializer might hold, persisting or reloading the serializer’s state.
When a Serializer
is configured by providing a class either at the cache or cache manager level,
since Ehcache is responsible for creating the instance, it also is responsible for disposing of it.
If the Serializer
implements java.io.Closeable
then close()
will be called when the cache is closed and the Serializer
no longer needed.
Writing your own serializer
Serializer
defines a very strict contract.
So if you’re planning to write your own implementation you have to keep in mind that
the class of the serialized object MUST be retained after deserialization, that is:
object.getClass().equals(mySerializer.read(mySerializer.serialize(object)).getClass())
This is especially important when you are planning to write a serializer for an abstract type, e.g. a serializer of type com.pany.MyInterface
should:
-
deserialize a
com.pany.MyClassImplementingMyInterface
when the serialized object is of classcom.pany.MyClassImplementingMyInterface
-
return a
com.pany.AnotherClassImplementingMyInterface
object when the serialized object is of classcom.pany.AnotherClassImplementingMyInterface
Implement the following interface, from package org.ehcache.spi.serialization
:
/**
* Defines the contract used to transform type instances to and from a serial form.
* <p>
* Implementations must be thread-safe.
* <p>
* When used within the default serialization provider, there is an additional requirement.
* The implementations must define a constructor that takes in a {@code ClassLoader}.
* The {@code ClassLoader} value may be {@code null}. If not {@code null}, the class loader
* instance provided should be used during deserialization to load classes needed by the deserialized objects.
* <p>
* The serialized object's class must be preserved; deserialization of the serial form of an object must
* return an object of the same class. The following contract must always be true:
* <p>
* {@code object.getClass().equals( mySerializer.read(mySerializer.serialize(object)).getClass())}
*
* @param <T> the type of the instances to serialize
*
* @see SerializationProvider
*/
public interface Serializer<T> {
/**
* Transforms the given instance into its serial form.
*
* @param object the instance to serialize
*
* @return the binary representation of the serial form
*
* @throws SerializerException if serialization fails
*/
ByteBuffer serialize(T object) throws SerializerException;
/**
* Reconstructs an instance from the given serial form.
*
* @param binary the binary representation of the serial form
*
* @return the de-serialized instance
*
* @throws SerializerException if reading the byte buffer fails
* @throws ClassNotFoundException if the type to de-serialize to cannot be found
*/
T read(ByteBuffer binary) throws ClassNotFoundException, SerializerException;
/**
* Checks if the given instance and serial form {@link Object#equals(Object) represent} the same instance.
*
* @param object the instance to check
* @param binary the serial form to check
*
* @return {@code true} if both parameters represent equal instances, {@code false} otherwise
*
* @throws SerializerException if reading the byte buffer fails
* @throws ClassNotFoundException if the type to de-serialize to cannot be found
*/
boolean equals(T object, ByteBuffer binary) throws ClassNotFoundException, SerializerException;
}
As the Javadoc states, there are some constructor rules, see the section Persistent vs. transient caches for that.
You can optionally implement java.io.Closeable
.
If you do, Ehcache will call close()
when a cache using such a serializer gets disposed of,
but only if Ehcache instantiated the serializer itself.
ClassLoaders
When Ehcache instantiates a serializer itself, it will pass it a ClassLoader
via the constructor.
Such class loader must be used to access the classes of the serialized types as they might not be available in the current class loader.
Persistent vs. transient caches
All custom serializers must have a constructor with the following signature:
public MySerializer(ClassLoader classLoader) {
}
Attempting to configure a serializer that lacks such a constructor on a cache using either of
CacheConfigurationBuilder.withKeySerializer(Class<? extends Serializer<K>> keySerializerClass)
or CacheConfigurationBuilder.withValueSerializer(Class<? extends Serializer<V>> valueSerializerClass)
will cause an exception upon cache initialization.
But if an instance of the serializer is configured using either of CacheConfigurationBuilder.withKeySerializer(Serializer keySerializer)
or CacheConfigurationBuilder.withValueSerializer(Serializer valueSerializer)
it will work since the instantiation is done by the user code itself.
Registering a serializer that lacks such a constructor at the cache manager level will prevent it from being chosen for caches.
Custom serializer implementations could have some state that is used in the serialization/deserialization process. When configured on a persistent cache, the state of such serializers needs to be persisted across restarts.
To address these requirements you can have a StatefulSerializer
implementation.
StatefulSerializer
is a specialized Serializer
with an additional init
method with the following signature:
public void init(StateRepository repository) {
}
The StateRepository.getPersistentStateHolder(String, Class<K>, Class<V>)
provides a StateHolder
(a map like structure) that you can use to store any relevant state.
The StateRepository
is provided by the authoritative tier of the cache and hence will have the same persistence properties of that tier.
For persistent caches it is highly recommended that all state is stored in these holders
as the users won’t have to worry about the persistence aspects of this state holder as it is taken care of by Ehcache.
-
In the case of a disk persistent cache, the contents of the state holder will be persisted locally on to the disk.
-
For clustered caches, the contents are persisted in the cluster itself so that other clients using the same cache can also access the contents of the state holder.
The constructor with the signature (ClassLoader classLoader, FileBasedPersistenceContext persistenceContext)
that existed till v3.1 has been removed since v3.2 in favor of `StatefulSerializer`s.
|
Copiers
As the on-heap store is capable of storing plain Java objects as such, it is not necessary to rely on a serialization mechanism to copy keys and values in order to provide by value semantics. Other forms of copy mechanism can be a lot more performant, such as using a copy constructor but it requires custom code to be able to copy user classes.
Copier
is the Ehcache abstraction solving this: it is specific to the on-heap store.
By default, the on-heap mappings are stored by reference. The way to store them by value is to configure copier(s) on the cache for the key, value or both.
Of course, the exact semantic of by value in this context depends heavily on the Copier
implementation.
How is a copier configured?
There are two places where copiers can be configured:
-
at the cache level where one can use
-
CacheConfigurationBuilder.withKeyCopier(Class<? extends Copier<K>> keyCopierClass)
, -
CacheConfigurationBuilder.withKeyCopier(Copier<K> keyCopier)
, -
CacheConfigurationBuilder.withValueCopier(Class<? extends Copier<V>> valueCopierClass)
, -
and
CacheConfigurationBuilder.withValueCopier(Copier<V> valueCopier)
, which allow by instance or by class configuration.
-
-
at the cache manager level where one can use
-
CacheManagerBuilder.withCopier(Class<C> clazz, Class<? extends Copier<C>> copier)
-
If a copier is configured directly at the cache level, it will be used, ignoring any cache manager level configuration.
If a copier is configured at the cache manager level, upon initialization, a cache with no specifically configured copier will search through its cache manager’s registered list of copiers and try to find one that directly matches the cache’s key or value type. If such search fails, all the registered copiers will be tried in the added order to find one that handles compatible types.
For instance, let’s say you have a Person
interface and two subclasses: Employee
and Customer
.
If you configure your cache manager as follows:
CacheManagerBuilder.newCacheManagerBuilder().withCopier(Employee.class,
EmployeeCopier.class).withCopier(Person.class, PersonCopier.class)
then configuring a Cache<Long, Employee>
would make it use the EmployeeCopier
while a Cache<Long, Customer> would make it use the PersonCopier
.
A Copier
configured at the cache level by class will not be shared to other caches when instantiated.
Given the above, it is recommended to limit Copier registration to concrete classes and not aim for generality. |
Bundled implementations
A SerializingCopier
class exists in case you want to configure store by value on-heap using the configured (or default) serializer.
Note that this implementation performs a serialization / deserialization on each read or write operation.
The CacheConfigurationBuilder
provides the following methods to make use of this specialized copier:
-
CacheConfigurationBuilder.withKeySerializingCopier()
for the key. -
CacheConfigurationBuilder.withValueSerializingCopier()
for the value.
Lifecycle: instances vs class
When a Copier
is configured by providing an instance, it is up to the provider of that instance to manage its lifecycle.
It will need to dispose of any resource it used after it is no longer required.
When a Copier
is configured by providing a class either at the cache or cache manager level,
since Ehcache is responsible for creating the instance, it also is responsible for disposing of it.
If the Copier
implements java.io.Closeable
then close()
will be called when the cache is closed and the Copier
no longer needed.
Writing your own Copier
Implement the following interface:
/**
* Defines the contract used to copy type instances.
* <p>
* The copied object's class must be preserved. The following must always be true:
* <p>
* <code>object.getClass().equals( myCopier.copyForRead(object).getClass() )</code>
* <code>object.getClass().equals( myCopier.copyForWrite(object).getClass() )</code>
*
* @param <T> the type of the instance to copy
*/
public interface Copier<T> {
/**
* Creates a copy of the instance passed in.
* <p>
* This method is invoked as a value is read from the cache.
*
* @param obj the instance to copy
* @return the copy of the {@code obj} instance
*/
T copyForRead(T obj);
/**
* Creates a copy of the instance passed in.
* <p>
* This method is invoked as a value is written to the cache.
*
* @param obj the instance to copy
* @return the copy of the {@code obj} instance
*/
T copyForWrite(T obj);
}
-
T copyForRead(T obj)
is invoked when a copy must be made upon a read operation (like a cacheget()
), -
T copyForWrite(T obj)
is invoked when a copy must be made upon a write operation (like a cacheput()
).
The separation between copying for read and for write can be useful when you want to store a lighter version of your objects into the cache.
Alternatively, you can extend from org.ehcache.impl.copy.ReadWriteCopier
if copying for read and copying for write implementations
are identical, in which case you only have to implement:
-
public abstract T copy(T obj)