By Antonio Warren, 2014-06-19 07:21
protostuff


    java serialization library, proto compiler, code generator, protobuf utilities, gwt overlays, j2me, android and kindle






     Best practices, tradeoffs, techniques


    Updated May 14, 2012 by

    Things you need to know

    Serialization Formats

    When choosing a format, you need to know the limitations and tradeoffs each format brings.

    protobuf

    - Messages cannot be streamed (this comes with its internal format).

    - What the official protobuf java implementation does:

      - It computes the size of each message and stores it in a private int field named "memoizedSerializedSize".

      - This means that when your message contains a lot of nested messages, it will traverse every message on the graph (iterating the list if the field is repeated) to compute the total size of the root message before it can perform any serialization.

      - The good thing about it is that it can validate each message's fields (while computing the size) before the message is actually written, which is the proper way to do it.

      - The downside? None. Protobuf was designed to have validation from the get-go, therefore you cannot be streaming messages anyway.

    - What protobuf/protostuff does:

      - The same applies: messages cannot be streamed (this comes with the format).

      - Validation and message size computation need to happen during serialization (a 3-in-1 setting).

      - But wait, isn't it a bad idea to perform serialization when the message itself has not yet been declared valid?

      - Does it mean I could be partially writing to the socket (OutputStream) when the UninitializedMessageException (required field missing) is thrown?

        - No. The writes are buffered to a series of logical buffers (one physical byte array if the message size <= the byte array size).

        - This means that the message needs to be fully computed and validated (while serializing to a byte array) before it can be written to the socket (OutputStream).

        - Even if you don't require validation, you still need to fully buffer the messages to reduce the serialization to 1 step instead of 2 (which is normally: 1. traverse the graph for validation/size computation, and 2. traverse the graph again for the actual serialization).
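    The two-pass approach of the official implementation can be pictured with a minimal sketch. SizedNode is a hypothetical message (field 1 = a required name, field 2 = repeated nested children) encoded with protobuf-style tags and varint length prefixes; it is not the protobuf library's code. Pass 1 traverses the whole graph, validating and memoizing each size (the same idea as "memoizedSerializedSize"); pass 2 writes the bytes using the memoized sizes, so an invalid graph fails before anything is written. The single-byte tags assume field numbers below 16.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical message: field 1 = name (required string), field 2 = children (repeated nested message).
// Assumes the message is not modified after its size is memoized (protobuf messages are immutable).
final class SizedNode {
    final String name;
    final List<SizedNode> children = new ArrayList<>();
    private int memoizedSize = -1; // same idea as protobuf's memoizedSerializedSize

    SizedNode(String name) {
        this.name = name;
    }

    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Pass 1: traverse the whole graph, validating fields and memoizing sizes.
    int computeSize() {
        if (memoizedSize != -1) {
            return memoizedSize;
        }
        if (name == null) {
            throw new IllegalStateException("required field 'name' is missing");
        }
        int nameLength = name.getBytes(StandardCharsets.UTF_8).length;
        int size = 1 + varintSize(nameLength) + nameLength;   // tag + length + bytes
        for (SizedNode child : children) {
            int childSize = child.computeSize();
            size += 1 + varintSize(childSize) + childSize;    // tag + length + nested bytes
        }
        memoizedSize = size;
        return size;
    }

    // Pass 2: write the bytes, using the memoized sizes as length prefixes.
    void writeTo(ByteArrayOutputStream out) {
        computeSize(); // validates the whole graph before the first byte is written
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        out.write((1 << 3) | 2);                              // field 1, length-delimited
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        for (SizedNode child : children) {
            out.write((2 << 3) | 2);                          // field 2, length-delimited
            writeVarint(out, child.computeSize());            // memoized: no re-traversal
            child.writeTo(out);
        }
    }
}
```

    Note that a root with a single child is only written after every node's size is known, which is exactly the up-front traversal described above.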


    protostuff

    - Messages can be streamed like most serialization formats (e.g. json).

    - If you need validation, then you need to buffer your writes before writing to the socket (OutputStream). The same rules apply to json, xml and other streaming formats.

     LinkedBuffer buffer = ...;
     try
     {
         int size = ProtostuffIOUtil.writeTo(buffer, message, schema);
         // you can prefix the message with the size (delimited message)
         LinkedBuffer.writeTo(outputStream, buffer);
     }
     catch(UninitializedMessageException e)
     {
         // your message was not written to the stream
         // because it was missing required fields.
     }
     finally
     {
         // reset the buffer so it can be re-used for the next message
         buffer.clear();
     }
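    The buffering described above can be sketched with the JDK alone. ChainedBuffer is a hypothetical stand-in for a chain of logical buffers, not protostuff's LinkedBuffer: writes go into fixed-size chunks, a new chunk is chained only when the current one is full, and nothing touches the OutputStream until the whole message has been serialized and validated. Because the size is known by then, the sketch also writes a varint size prefix, as the comment in the snippet above suggests for delimited messages. BufferedSender and its "null means a missing required field" rule are likewise assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chain of logical buffers (not protostuff's LinkedBuffer).
final class ChainedBuffer {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int used; // bytes used in the last chunk

    ChainedBuffer(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
        chunks.add(new byte[chunkSize]); // a single physical array until it overflows
    }

    void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            if (used == chunkSize) { // current chunk is full: chain a new one
                chunks.add(new byte[chunkSize]);
                used = 0;
            }
            int n = Math.min(chunkSize - used, data.length - offset);
            System.arraycopy(data, offset, chunks.get(chunks.size() - 1), used, n);
            used += n;
            offset += n;
        }
    }

    int size() { return (chunks.size() - 1) * chunkSize + used; }

    int chunkCount() { return chunks.size(); }

    // Writes a varint size prefix, then the used bytes of every chunk.
    void writeDelimitedTo(OutputStream out) throws IOException {
        int value = size();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        for (int i = 0; i < chunks.size(); i++) {
            int len = (i == chunks.size() - 1) ? used : chunkSize;
            out.write(chunks.get(i), 0, len);
        }
    }
}

final class BufferedSender {
    // Hypothetical message: field values in order; null marks a missing required field.
    static void send(List<String> fieldValues, int chunkSize, OutputStream out) throws IOException {
        ChainedBuffer buffer = new ChainedBuffer(chunkSize);
        for (String value : fieldValues) {
            if (value == null) {
                // validation fails mid-serialization, but the stream has not been touched yet
                throw new IllegalStateException("required field missing");
            }
            buffer.write(value.getBytes(StandardCharsets.UTF_8));
        }
        buffer.writeDelimitedTo(out); // reached only for a complete, valid message
    }
}
```

    A failed send leaves the stream untouched, which is the guarantee the validation snippet above relies on.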

    - Note that validation is intended for incoming messages.

      - When populating the messages, you as the developer should already know which fields are required. The snippet above is a safety net in case you forget.

      - Once your application has been tested on your staging server, you can stream the messages directly on the production server, without the full buffering done in the snippet above.

    - Validation is built into the schema API, but that does not mean you have to use it (especially if you're using protostuff as a serialization library for object graphs).

      - If messages are generated from .proto, simply avoid using "required" on any of your fields.

      - If using the runtime schemas, everything is "optional" (or "repeated" for collections).

    - When serializing utf8 strings, protostuff tries to do 2 things at once: writing to the buffer and computing the size at the same time.

      - Why? The size computation is required since a protobuf string requires that its length be written before the actual value.

      - How? The max size of a utf8 string can be computed, and protostuff uses it to decide how to perform the serialization efficiently.

      - In streaming mode (directly writing to the socket OutputStream), if a single utf8 string is too large to fit in the initial buffer (LinkedBuffer.allocate(size)), protostuff will attach session buffers to it.

        - What are session buffers?

          - These are small incremental buffers that are created, cached and re-used while writing to the OutputStream.

          - These buffers are flushed to the stream immediately after a complete utf8 string write (2-in-1: buffer write + utf8 size computation).

      - See the source code for protostuff-api (StreamedStringSerializer) for more details.

    - If you are fully buffering the writes and you have large messages, allocate a LinkedBuffer sized relative to the message size.

    - It is better to allocate large buffers and re-use them (application managed or thread-local) than to create a new one each time.



    - protostuff does a lot of atomic operations for speed, in exchange for incremental internal buffering when needed.

      - In buffering mode, internal buffering is not needed if the entire message fits in the buffer.

      - In streaming mode, incremental session buffers are not needed if there is no single utf8 string larger than the buffer.

    - These tradeoffs arguably make protostuff the fastest in terms of serialization speed.
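    The utf8 technique described above (bound the size up front, encode the string once while counting bytes, then fill in the length) can be sketched as follows. Utf8Prefix is a hypothetical JDK-only class, not protostuff's StringSerializer, and the "shift the payload left when the bound over-reserved" step is an assumption of this sketch. A Java char never needs more than 3 UTF-8 bytes (a surrogate pair, 2 chars, needs 4), so 3 * length is a safe upper bound, and it decides how many bytes to reserve for the varint length prefix before any character is written.

```java
import java.util.Arrays;

// Hypothetical sketch: varint(utf8Length) + UTF-8 bytes, encoding the string only once.
final class Utf8Prefix {
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            size++;
            value >>>= 7;
        }
        return size;
    }

    static void writeVarint(byte[] buf, int pos, int value) {
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos] = (byte) value;
    }

    static byte[] encode(String s) {
        int maxSize = 3 * s.length();          // upper bound on the UTF-8 size
        int reserved = varintSize(maxSize);    // prefix width decided before writing
        byte[] buf = new byte[reserved + maxSize];
        int pos = reserved;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                buf[pos++] = (byte) c;
            } else if (c < 0x800) {
                buf[pos++] = (byte) (0xC0 | (c >> 6));
                buf[pos++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                int cp = Character.toCodePoint(c, s.charAt(++i));
                buf[pos++] = (byte) (0xF0 | (cp >> 18));
                buf[pos++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                buf[pos++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[pos++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                // 3-byte form (lone surrogates are not treated specially in this sketch)
                buf[pos++] = (byte) (0xE0 | (c >> 12));
                buf[pos++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[pos++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        int actual = pos - reserved;           // size computed as a side effect of writing
        int prefix = varintSize(actual);       // never larger than 'reserved'
        if (prefix < reserved) {
            // the bound over-reserved: shift the payload left to close the gap
            System.arraycopy(buf, reserved, buf, prefix, actual);
        }
        writeVarint(buf, 0, actual);
        return Arrays.copyOf(buf, prefix + actual);
    }
}
```

    For short strings the bound and the actual length usually need the same number of varint bytes, so no shift happens; a 50-character ASCII string reserves 2 bytes (bound 150) but only needs 1.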


    graph

    - Use it if you're serializing object graphs with cyclic references.

    - Same tradeoffs as protostuff, plus the small overhead of using an IdentityHashMap to identify cyclic references.

    json

    - Use it if you're talking to a browser.

    - An internal set of buffers is kept and recycled by jackson's BufferRecycler when writing/reading to/from streams.

    - JsonIOUtil has methods where you can include the LinkedBuffer as an extra arg to effectively re-use your existing buffers when writing/reading to/from streams.


    xml

    - Not as fast as the other formats, but could be useful for legacy comms.


    yaml

    - The most human-readable format supported by protostuff.

    - If you want to visualize the messages coming from your services/server, this is the format to use. The default indentation is 2 spaces.

    - For better readability, you can increase the indentation with the system property -Dyamloutput.extra_indent=2, which makes it 4 spaces.

    Serializing object graphs

    - Not portable to the other formats.

    - The graph/reference logic is embedded in the serialization format (protostuff) to achieve high-performance graph ser/deser.

    - When you serialize the root object, its nested messages should not have a reference to it. See SerializingObjectGraphs for more details.
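    The reference tracking that the graph format relies on can be illustrated with a minimal sketch. GraphWriter and its token output are hypothetical, not GraphIOUtil's wire format: each object gets an id the first time it is written, and any later occurrence is written as a back-reference to that id, so a cycle terminates instead of recursing forever. An IdentityHashMap is used because it compares keys with == and System.identityHashCode, so it never calls the objects' own equals/hashCode, which could merge distinct objects or even recurse endlessly on a cyclic graph.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph writer illustrating identity-based reference tracking.
final class GraphWriter {
    // Hypothetical pojo; children may point back to ancestors (cycles).
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Emits "start:<name>" ... "end" on a first visit and "ref:<id>" for repeats.
    static List<String> write(Node root) {
        List<String> out = new ArrayList<>();
        write(root, new IdentityHashMap<Node, Integer>(), out);
        return out;
    }

    private static void write(Node node, Map<Node, Integer> ids, List<String> out) {
        Integer id = ids.get(node);
        if (id != null) {
            out.add("ref:" + id); // already written: emit a back-reference only
            return;
        }
        ids.put(node, ids.size()); // ids are assigned in first-visit order
        out.add("start:" + node.name);
        for (Node child : node.children) {
            write(child, ids, out);
        }
        out.add("end");
    }
}
```

    For a pair of nodes that point at each other, a reader would rebuild the cycle by resolving each "ref:<id>" to the object created at that position.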

    Schema Evolution

    Generated pojos

    - To remove a field, simply remove it from the .proto (or comment it out). Make sure its field number is not re-used. (The same rules apply when adding new fields.)

    - If you're mixing this with protostuff-runtime, it's best that you do not use the field number 127.

    Runtime pojos

    - To remove a field, annotate it with @Deprecated.

    - When adding new fields, append them to the end of your pojo's field declarations (field numbers follow declaration order, top to bottom). The limit on the number of fields is 126.

    - If you are using the @Tag annotation, then there is no such limit. The only requirement is that you don't use 127.

    - If you are not using @Tag, and your messages inherit from base classes, make sure that the base classes are not subject to any change.

      - When a pojo inherits from a parent, the field numbers are assigned to the parent's fields first, then to the child's.

      - If you add a field to the parent, the field numbering of the subclass will be messed up.

    - The whole concept is to preserve the field numbers. It is best to use the @Tag annotation to fully support schema evolution on object hierarchies.
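    Why adding a field to a base class breaks its subclasses can be shown with a small simulation of the numbering rule described above: numbers start at 1, the parent's fields come first, then the child's, in declaration order. FieldNumbering is hypothetical and works on lists of field names rather than real reflection, since Class.getDeclaredFields() does not guarantee declaration order; the 126-field cap mirrors the limit stated in the text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of runtime field numbering without @Tag.
final class FieldNumbering {
    static final int MAX_FIELDS = 126; // limit stated in the text (127 is not used)

    static Map<String, Integer> assign(List<String> parentFields, List<String> childFields) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        int next = 1;
        for (String field : parentFields) {
            numbers.put(field, next++); // base class fields first
        }
        for (String field : childFields) {
            numbers.put(field, next++); // then the subclass fields
        }
        if (numbers.size() > MAX_FIELDS) {
            throw new IllegalArgumentException("more than " + MAX_FIELDS + " fields");
        }
        return numbers;
    }
}
```

    With a parent declaring only "id", the child's "name" gets number 2; add "created" to the parent and "name" silently becomes 3, so data written by the old version no longer lines up. With @Tag, each number is fixed by the annotation and no such shift occurs.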

    Polymorphic serialization

    This is only available for protostuff-runtime (obviously). What does this really mean?

    A pojo can have a field that is not a concrete type:

     public interface Foo
     {
         // ...
     }

     public abstract class Bar
     {
         // ...
     }

     public final class Person
     {
         Foo foo;
         Bar bar;
         Object baz; // can be foo or bar
     }

    Runtime Options

    Runtime options for protostuff-runtime:


      - Your enums are serialized as strings instead of numbers.

      - By default, this is disabled for protobuf compatibility (enums are serialized using their number).


      - The collection (repeated field) will be serialized like a regular message (even if the collection is empty).

      - Disabled by default for protobuf compatibility (the collection itself is not serialized, only its values).

      - Here's an example (read the comments):

     public final class Foo
     {
         List<String> stringList;

         // equals and hashCode methods
     }

     Schema<Foo> schema = RuntimeSchema.getSchema(Foo.class);
     LinkedBuffer buffer = ...;
     Foo foo = new Foo();
     // empty list
     foo.stringList = new ArrayList<String>();
     final byte[] data;
     try
     {
         data = ProtostuffIOUtil.toByteArray(foo, schema, buffer);
     }
     finally
     {
         buffer.clear();
     }

     Foo f = new Foo();
     ProtostuffIOUtil.mergeFrom(data, f, schema);
     // This will fail if the option is not enabled, because f.stringList will be null.
     // It would have been an empty list if the option was enabled (which makes it equal).
     assertEquals(foo, f);


      - Polymorphic serialization includes the concrete type of the object being serialized. Upon deserialization, that className is read and used to fetch the derived schema.

     boolean autoLoad = RuntimeSchema.AUTO_LOAD_POLYMORHIC_CLASSES;
     String className = ...; // read from the input.
     Schema derivedSchema = RuntimeSchema.getSchema(className, autoLoad);
     // If the class has not been loaded and autoLoad is true, it will be loaded from the context class loader.
     // If autoLoad is false, a ProtostuffException is thrown (illegal operation, unknown message).

      - Enabled by default. For security purposes, you can pre-load all your known pojos and disable this. Here's how:

     // the code below preloads the schema of your pojos.
     RuntimeSchema.getSchema(Person.class);
     // and so on ...
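    The auto-load switch can be pictured with a JDK-only sketch. PolymorphicResolver is hypothetical, not RuntimeSchema: the class name read from the input is looked up among known classes; with auto-loading on, an unknown name is loaded through the context class loader, and with it off only pre-registered classes are accepted. That restriction is the security benefit the option above alludes to, since the input can no longer make you load arbitrary classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical resolver illustrating the autoLoad switch (not protostuff's RuntimeSchema).
final class PolymorphicResolver {
    private final Map<String, Class<?>> known = new ConcurrentHashMap<>();
    private final boolean autoLoad;

    PolymorphicResolver(boolean autoLoad) {
        this.autoLoad = autoLoad;
    }

    // Pre-register a pojo so it resolves even when autoLoad is false.
    void preload(Class<?> type) {
        known.put(type.getName(), type);
    }

    // Maps a class name read from the input to a class, or rejects it.
    Class<?> resolve(String className) {
        Class<?> type = known.get(className);
        if (type != null) {
            return type;
        }
        if (!autoLoad) {
            throw new IllegalStateException("unknown message: " + className);
        }
        try {
            type = Class.forName(className, true, Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("unknown message: " + className, e);
        }
        known.put(className, type); // cache for subsequent lookups
        return type;
    }
}
```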


      - Disabled by default (some devs have a habit of not using the final keyword).

      - Basically, when your pojo's class is not declared final, it can be subclassed (polymorphic).

      - Note that if you enable this option, every non-final pojo will have the overhead of including the type metadata on serialization.

      - So, if you know that a particular class will not be subclassed, mark it final.

      - See this issue for more details.

    -Dprotostuff.runtime.morph_collection_interfaces=true (since 1.0.7)

      - Disabled by default. Type metadata will not be included; instead, the collection will be mapped to a default impl:

        - Collection = ArrayList
        - List = ArrayList
        - Set = HashSet
        - SortedSet = TreeSet
        - NavigableSet = TreeSet
        - Queue = LinkedList
        - BlockingQueue = LinkedBlockingQueue
        - Deque = LinkedList
        - BlockingDeque = LinkedBlockingDeque

      - Enabling this is useful if you want to retain the actual impl used (type metadata will be included).

      - To enable/override this for a particular field, annotate the field with com.dyuproject.protostuff.Morph (since 1.0.7).

      - Since it is disabled by default, "List<String> names;" would be serialized to json like:

     "names": ["foo","bar"]

      - If enabled:

     "names": {"y":"ArrayList", "v":[{"i":"foo"},{"i":"bar"}]}

      - If you're using protostuff for webservices, then you'll probably want to leave it disabled and let protostuff map it to an ArrayList.

    -Dprotostuff.runtime.morph_map_interfaces=true (since 1.0.7)

      - Disabled by default. Type metadata will not be included; instead, the map will be mapped to a default impl:

        - Map = HashMap
        - SortedMap = TreeMap
        - NavigableMap = TreeMap
        - ConcurrentMap = ConcurrentHashMap
        - ConcurrentNavigableMap = ConcurrentSkipListMap

      - Enabling this is useful if you want to retain the actual impl used (type metadata will be included).

      - To enable/override this for a particular field, annotate the field with com.dyuproject.protostuff.Morph (since 1.0.7).
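    With the two morph options disabled, each declared interface is mapped to a fixed default implementation. A lookup table built from the two lists above might look like this; DefaultImpls and newInstanceFor are hypothetical names, and only the interface-to-implementation pairs come from the text.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Supplier;

// Hypothetical table of default impls used when no type metadata is written.
final class DefaultImpls {
    private static final Map<Class<?>, Supplier<?>> DEFAULTS = new HashMap<>();

    static {
        // collections (morph_collection_interfaces disabled)
        DEFAULTS.put(Collection.class, ArrayList::new);
        DEFAULTS.put(List.class, ArrayList::new);
        DEFAULTS.put(Set.class, HashSet::new);
        DEFAULTS.put(SortedSet.class, TreeSet::new);
        DEFAULTS.put(NavigableSet.class, TreeSet::new);
        DEFAULTS.put(Queue.class, LinkedList::new);
        DEFAULTS.put(BlockingQueue.class, LinkedBlockingQueue::new);
        DEFAULTS.put(Deque.class, LinkedList::new);
        DEFAULTS.put(BlockingDeque.class, LinkedBlockingDeque::new);
        // maps (morph_map_interfaces disabled)
        DEFAULTS.put(Map.class, HashMap::new);
        DEFAULTS.put(SortedMap.class, TreeMap::new);
        DEFAULTS.put(NavigableMap.class, TreeMap::new);
        DEFAULTS.put(ConcurrentMap.class, ConcurrentHashMap::new);
        DEFAULTS.put(ConcurrentNavigableMap.class, ConcurrentSkipListMap::new);
    }

    // Creates the default implementation for a declared field type.
    static Object newInstanceFor(Class<?> declaredType) {
        Supplier<?> supplier = DEFAULTS.get(declaredType);
        if (supplier == null) {
            throw new IllegalArgumentException("no default impl for " + declaredType.getName());
        }
        return supplier.get();
    }
}
```

    This is also why a field declared as List but populated with a LinkedList comes back as an ArrayList unless the option (or @Morph) is enabled.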

    -Dprotostuff.runtime.id_strategy_factory=com.dyuproject.protostuff.runtime.IncrementalIdStrategy$Factory

      - By default (if the property is not present), the DefaultIdStrategy is used, which means a polymorphic pojo is identified by serializing its type as a string (FQCN).

      - If you set the above property, int ids are generated on the fly (thread-safe/atomic) and are mapped to your polymorphic pojos. The end result is faster ser/deser and a smaller serialized size (around 1/3 to 1/4 the size of the default strategy).

      - You can also reserve the first few ids (via IncrementalIdStrategy.Registry) for your core pojos, as well as set the max size of the ArrayList which holds the ids.
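    The idea behind the incremental id strategy (int ids handed out on the fly, with the first few reserved up front) can be sketched with a ConcurrentHashMap and an AtomicInteger. IncrementalIds is a hypothetical class, not IncrementalIdStrategy or its Registry; it only shows how the assignment can be thread-safe and why a small int id is far shorter on the wire than a fully-qualified class name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of on-the-fly type ids (not protostuff's IncrementalIdStrategy).
final class IncrementalIds {
    private final ConcurrentHashMap<String, Integer> idByName = new ConcurrentHashMap<>();
    private final Map<Integer, String> nameById = new ConcurrentHashMap<>();
    private final AtomicInteger nextId;

    // Ids below firstDynamicId are left free for reserve().
    IncrementalIds(int firstDynamicId) {
        nextId = new AtomicInteger(firstDynamicId);
    }

    // Register a core pojo under a fixed id before any serialization happens.
    void reserve(String className, int id) {
        idByName.put(className, id);
        nameById.put(id, className);
    }

    // Serialization side: returns the existing id, or atomically assigns the next one.
    int idFor(String className) {
        return idByName.computeIfAbsent(className, name -> {
            int id = nextId.getAndIncrement();
            nameById.put(id, name);
            return id;
        });
    }

    // Deserialization side: maps an id read from the input back to a class name.
    String classNameFor(int id) {
        String name = nameById.get(id);
        if (name == null) {
            throw new IllegalStateException("unknown id " + id);
        }
        return name;
    }
}
```

    In this sketch the ids are only meaningful to a reader holding the same mapping, which is the price paid for the smaller output.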

    Collection fields

    - Null values are not serialized. (A deserialized collection coming from a collection with null values will fail on equals() against the original.)

    - Collections with simple values (scalar, enum, pojo, delegate) are serialized without type metadata (normal operation).

    - Complex values (e.g. Queue<...>, List<...>, Set<...>, Deque<...>) will be serialized with type metadata.

    Map fields

    - Allows null keys and values. (You can rely on the equality from Map.equals().)

    - Maps with simple keys and values (scalar, enum, pojo, delegate) are serialized without type metadata (normal operation).

    - Complex values (e.g. Map<...>, HashMap<..., float[]>, TreeMap<...>, SortedMap<...>) will be serialized with type metadata.
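    The equality consequences of the null rules above can be simulated without protostuff. The sketch assumes, as the text states, that null collection elements are simply skipped while null map keys and values are kept; roundTripList and roundTripMap are hypothetical stand-ins for a serialize/deserialize cycle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the null-handling rules for collection and map fields.
final class NullHandlingSketch {
    // Collection round trip: null elements are not written, so they are not read back.
    static <T> List<T> roundTripList(List<T> original) {
        List<T> copy = new ArrayList<>();
        for (T value : original) {
            if (value != null) {
                copy.add(value);
            }
        }
        return copy;
    }

    // Map round trip: null keys and values survive, so Map.equals() still holds.
    static <K, V> Map<K, V> roundTripMap(Map<K, V> original) {
        return new HashMap<>(original);
    }
}
```

    A list ["a", null, "b"] comes back as ["a", "b"] and is no longer equal to the original, while a map containing a null key and a null value compares equal after the round trip.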

    Comment by, Mar 18, 2011

     What do you mean by "protobuf messages cannot be streamed (comes with its internal format)"? I looked at protobuf's CodedOutputStream: the output is written to the OutputStream we pass to the "writeTo" method as soon as the 4K limit is reached (if the payload's serialized size is > 4K). I really don't get the idea of streaming from your point of view. Could you please help?

    -- Prabhakhar K

    Comment by project member, Mar 18, 2011

    If you notice in streaming formats (json and xml), there is no computation needed. Start-tags and end-tags are mostly used. Parsing will be a little bit harder.

    In protobuf, you need to compute before serialization. The size of the nested message is unknown simply because it does not have a fixed length. The upside is that parsing will be much easier. Protostuff uses start-tags and end-tags for nested messages, and at the same time still computing the size of the other fields during serialization.

    Basically, protostuff tries to be json only on nested messages.
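    The contrast drawn in the reply above can be made concrete with protobuf's wire types: a length-delimited field (wire type 2) needs its size before its bytes, while a start tag (wire type 3) and an end tag (wire type 4) let a nested message be written without knowing its size. The sketch below assumes those group wire types as the start/end tags; GroupWriter and its methods are hypothetical helpers, not protostuff's writer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating start/end tags for nested messages.
final class GroupWriter {
    static final int WIRETYPE_LENGTH_DELIMITED = 2;
    static final int WIRETYPE_START_GROUP = 3;
    static final int WIRETYPE_END_GROUP = 4;

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    private void writeVarint(int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    private void writeTag(int fieldNumber, int wireType) {
        writeVarint((fieldNumber << 3) | wireType);
    }

    // A scalar string: its own length is known before its bytes are written.
    void writeString(int fieldNumber, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        writeTag(fieldNumber, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // A nested message: no size prefix, so its fields can be streamed as they are produced.
    void startNested(int fieldNumber) {
        writeTag(fieldNumber, WIRETYPE_START_GROUP);
    }

    void endNested(int fieldNumber) {
        writeTag(fieldNumber, WIRETYPE_END_GROUP);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

    Writing a nested message in field 2 that holds the string "hi" in field 1 produces 0x13, 0x0A, 0x02, 'h', 'i', 0x14: only the string carries a length, and the nested message is bracketed by its start and end tags.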

    Comment by, Oct 21, 2011

    Hi. I'm trying to serialize an object that contains Maps, and some of the items inside a map have the same value. When I deserialize that object, only one of the items with the same value is successfully restored; the others just return null. I'm using GraphIOUtil and RuntimeSchema.

    Comment by project member, Oct 21, 2011

    Can you try against trunk? I've committed some recent fixes for runtime graph serialization.

    Comment by, Oct 23, 2011

    Hi. It works now. Thanks :D

    Comment by, Feb 11, 2012

    Your documentation is littered with this acronym "w/c" - what does it stand for? I can't find any likely candidates in a google search.

    Comment by project member, Feb 11, 2012

    w/c = which (at least from where I live). I tried googling and it indeed does not show. Must be something we made up here in ph.

    Comment by, Mar 2, 2012

    Just a question. I found <1>, which keeps the json deserializer from erring on unknown scalar attributes. Is there a way to run with the previous behavior (i.e., I want to bail on converting my json stream into a protobuf if it encounters an attribute in the json stream that is not defined in the protobuf)? I'm still pretty new to protostuff, and I could potentially find my answer soon. As of now, I'm generating schemas for my existing protobufs with java_v2protoc_schema, and handling the translation through the JsonIOUtil.mergeFrom(InputStream, T, Schema, numeric) method.

    Thanks in advance


Comment by project member, Mar 2, 2012

    The easiest way is to extend FilterInput (wrapping JsonInput) and override the handleUnknownField method. See GraphInput for reference.

    Comment by, Mar 2, 2012

    Thanks for the feedback. I was actually starting to look into doing something similar. I'm not sure that would actually work, now that I'm deep into the internals of the JsonInput class. First off, for my json stream, I'm using the names rather than the attribute numbers. So when it tries to look up the field number matching my 'unknown' name in the generated schema's field map, it's not found, and a '0' is returned for the key, which the generated BuilderSchema simply ignores.

    So, it looks like I can get the desired functionality by actually wrapping the JsonInput used and handling the "0" differently on calls to the public int readFieldNumber(Schema) method. This might be a bug in the generated schema, though. Rather than returning 0 from the BuilderSchema.getFieldNumber(String) method when the item can't be loaded, you could return a negative number or something (or 1 over the highest index), so that the handleUnknownField method will be called from the big "switch" statement in the BuilderSchema.mergeFrom method.

    Does that make sense?

    Comment by, Mar 2, 2012

    BTW, I'm using protostuff-json 1.0.4:

     <dependency>
       <groupId>com.dyuproject.protostuff</groupId>
       <artifactId>protostuff-json</artifactId>
       <version>1.0.4</version>
     </dependency>

    Comment by project member, Mar 2, 2012

    Greg, returning zero (field numbers start at 1) from the generated schema is not a bug.

    "So, it looks like I can get the desired functionality by actually wrapping the JsonInput used and handling the "0" differently on calls to the public int readFieldNumber(Schema) method"

    Yep. That's the right solution for json input handling.

    Comment by, Mar 3, 2012

    David, thanks for the response. The problem I'm having is that the schema returns a 0 if the lookup of the attribute in the fieldMap fails (via the Schema.getFieldNumber(String) method), which is the same reply the JsonInput.readFieldNumber(Schema, JsonParser) method returns when it reaches the JsonToken that marks the end of an object (JsonToken.END_OBJECT). So, in the custom FilterInput implementation, I can't tell the difference between the object being completely parsed and an unknown attribute. If I modify the return value as described, then the generated schema will treat the JsonToken.END_OBJECT reply case as an unknown field. With the generated schema returning a 0 as well, the case statement evaluating the field numbers won't ever reach the handleUnknownField case.

    Having the schema return something other than 0 or a valid field number when a lookup in the map fails is an obvious solution. With the replies for END_OBJECT and for a lookup miss in the internal map both being field number '0', the generated switch statement in mergeFrom(Input, Builder) can't tell the difference between the two cases.

    Comment by, Apr 28, 2012

    I'm getting duplicate entries in my ArrayList when deserializing through the Runtime schema. Is there a VM setting to fix this? I have version 1.0.5.

    Comment by project member, Apr 28, 2012

    Err. This is not the place to discuss this; please use the mailing list in the future. From the outside, it looks like you're basically merging into an ArrayList that is not empty in the first place. If you can attach a sample demo that triggers what you mentioned, it would be easier to spot the culprit.

    Comment by, Apr 30, 2012

    Apologies for posting in the wrong place David. My test had an error where it re-used the same instance in the deserialization, causing it to simply add onto the list. All is good now. This is a nice library and works well.

    Comment by, May 30, 2012


    Maybe I just don't get it, but is there a way to access the fieldMap of a Schema? Thanks, Andi


    Wiki pages (1 - 19 of 19)

    PageName | Summary + Labels | Changed
    PipeUsage | pipes explained, usage (Featured) | Nov 16
    ProtobufSerialization | protobuf serialization and deserialization, howto, usage | May 2012
    ProtostuffSerialization | protostuff binary serialization and deserialization, howto, usage | May 2012
    Delegate | runtime delegates howto/usage (Featured) | May 2012
    ThingsYouNeedToKnow | Best practices, tradeoffs, techniques (Featured) | May 2012
    ProtostuffRuntime | schemas for existing pojos/beans/objects/etc (Featured) | May 2012
    GwtJsonOverlays | gwt json overlays | Apr 2012
    MavenArchetypes | maven archetypes for rapid development | Apr 2012
    JsonSerialization | json serialization and deserialization, howto, usage | Apr 2012
    YamlSerialization | yaml serialization howto, usage | Apr 2012
    XmlSerialization | xml serialization and deserialization, howto, usage | Apr 2012
    JavaBeanModelCompiler | Flexible and Inheritable schema generation for your separated/independent models | Apr 2012
    ProtoToProtoCompiler | ProtoToProto Compiler Description | Apr 2012
    SerializingObjectGraphs | Ser/Deser for deep object graphs (references and cyclic dependencies) | Oct 2011
    CompilerOptions | Compiler Options (Featured) | Jul 2011
    Schema | Explaining the schema (Featured) | Jun 2011
    WritingCustomCodeGenerators | Writing custom code generators with stringtemplate (Featured) | Feb 2011
    CompilerViaAnt | using the proto compiler via ant | Feb 2011
    CompilerViaMaven | using the compiler via maven | Feb 2011
