Java object serialization is a way of freezing the state of an object graph and moving it around from one place to another. That seems like a pretty good idea, right? The motivation is good: allowing objects to break free of the boundaries of a single virtual machine is critical for object-oriented distributed computing. The practice is bad: there are three insidious evils that ride on the coat-tails of Java serialization:
1. It replicates the entire object graph
2. It breaks object identity
3. It invades the programming model
The first issue, full object-graph replication, is mostly a problem of efficiency. If objects are replicated across the JVM boundary using Java serialization, any change to those objects requires serializing, transporting, and deserializing the full object graph to communicate what might be just a small change.
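To make the cost concrete, here is a minimal sketch using only the standard `java.io` classes. The `Customer` and `Order` classes, and the figure of 1,000 orders, are hypothetical and exist only for illustration. The sketch serializes a customer graph, changes a single field, and serializes it again.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes, used only for illustration.
class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    final List<Order> orders = new ArrayList<>();

    Customer(String name) {
        this.name = name;
    }
}

class Order implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final String description;

    Order(int id, String description) {
        this.id = id;
        this.description = description;
    }
}

class GraphReplicationDemo {
    // Serializes everything reachable from root and returns the byte count.
    static int serializedSize(Object root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(root);
        }
        return bytes.size();
    }

    // Builds a customer that references 'count' orders.
    static Customer customerWithOrders(int count) {
        Customer customer = new Customer("Alice");
        for (int i = 0; i < count; i++) {
            customer.orders.add(new Order(i, "order number " + i));
        }
        return customer;
    }

    public static void main(String[] args) throws IOException {
        Customer customer = customerWithOrders(1000);
        int before = serializedSize(customer);

        customer.name = "Alicia"; // the only change: one short string

        int after = serializedSize(customer);
        System.out.println("before: " + before + " bytes, after: " + after + " bytes");
    }
}
```

The two sizes should come out nearly identical, because `writeObject` has no notion of "what changed": every order goes back over the wire just to communicate a new name.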
The second problem is the Achilles heel of Java serialization: it breaks object identity. The deep clone that is implicit in a serialization/deserialization round-trip necessarily creates copies of the objects in the object graph. Dealing with copies of objects when you don't want copies is a real pain, and you have to jump through a lot of hoops to make things turn out the way you want. A lot has already been written on this subject that I won't reiterate here. Instead, for a thorough treatment of the topic, I'll point you to Patrick Calahan's excellent series of blog entries on object identity.
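A minimal sketch of the identity problem, again using only standard `java.io` classes. The `Account` class and the `roundTrip` helper are hypothetical names made up for this illustration. A serialize/deserialize round-trip hands back a different object, so later changes to the original never reach the copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class, used only for illustration.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    int balance;

    Account(int balance) {
        this.balance = balance;
    }
}

class IdentityDemo {
    // Serializes an object to a byte array and reads it straight back,
    // which is what effectively happens when it crosses a JVM boundary.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account original = new Account(100);
        Account copy = roundTrip(original);

        System.out.println("same object? " + (copy == original));

        original.balance += 50; // update the original only
        System.out.println("original: " + original.balance + ", copy: " + copy.balance);
    }
}
```

The copy starts with the same state but is a distinct object, so `==` is false and the deposit made on the original is invisible to the copy. Keeping the two in step becomes the programmer's job.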
The third problem is the core one: Java serialization and RMI force programmers to deal with the effects of externalizing objects and to explicitly acknowledge the presence of foreign JVMs. That's not something a programmer should have to deal with.
The very notion of "Remote Method Invocation" burdens the programmer with the responsibility of treating external JVMs as separate, foreign entities. In the absence of JVM-level clustering, this is a necessary burden to bear. But now that there is JVM-level clustering (yes, I'm talking about Terracotta--what else?), we should no longer have to think of objects and method invocation as "remote." The "remoteness" of other JVMs should melt into the infrastructure in the same way that garbage collection allows memory management to disappear into the JVM. Developers should only have to think in terms of threads and objects; the JVM should take care of the heavy lifting to virtualize the heap and signal threads across JVMs.
Virtual heap, network-attached memory, and JVM-level clustering together are the easiest, most elegant, and most efficient way to harness the power of multiple JVMs.
Monday, January 22, 2007
Why Java Serialization is Bad
7 comments:
How about from the point of view of saving object graphs for later use, rather than RMI?
Serialisation is still bad then, but for reasons that you don't mention. Do you have a suggested solution there?
Is it natural to parallelise object orientation? In its base form, OO is about message passing. Perhaps distributed OO would depend on continuations, so that message passing between distributed objects can be non-blocking.
I think distributed OO will work like transactions in Postgres.
http://linuxgazette.net/issue68/mitchell.html
Multi-Version Concurrency Control gives us lock-free architecture for lessened contention and easier development while simultaneously avoiding the "dirty read" scenario.
Very kewl.
A bit extreme. The thing is to think before serializing.
The "transient" keyword can prevent object graph replication.
Object identity can be dealt with by a proper implementation of hashCode, equals, and readResolve depending on your project.
Maintaining state of a serialized object across RMI sounds like there may be a design issue.
;-)
As to whether or not a developer should have to think about how serialization works when performing RMI, well, that's a whole new debate I chose not to take part in.
If you are using Terracotta's JVM clustering, you aren't really performing RMI. However, I must admit, what those guys have done with Terracotta is pretty sweet.
Serialization, like threading and concurrency, is a powerful tool. As with all powerful tools, care must be taken lest disaster ensue.
Serialization is for lazy people; a real programmer would write an XML file with DOM or even by hand. But XML is bad too :( so don't do it, use a .properties file (nobody is complaining about them) :>
I think your problem is that you're using serialization the wrong way :(
This is a rehashing of the "impedance mismatch" argument that goes on betwixt RDBMS folk (yr hmble srvnt) and OO folk. OO folk say that the entirety of an Object has to be hauled around everywhere. RDBMS folk point out that only the instance data makes one Object of a Class different from another; they by definition have the same Method Text. And if you look at how *nix, for example, loads something, you see that all ThingA(s) run from the same Text Segment, differing only in their individual Data Segments.
It follows, it's always seemed to me, that a FooBar instance, no matter whether it's in My JVM or Your JVM, had best have identical Method Text regardless of where it's been loaded. That leaves only the data to make them different. And thus, the RDBMS becomes the Object Store without further tweaking.
This is not a good way to market your company.
You can say quite a few bad things about Java serialization, but you've somehow managed to not touch on one. You clearly don't understand serialization even enough to know you shouldn't try to compete with it.
For example, serialization's schema evolution story may not be ideal (you evolve your serialization logic, not your data as you would normally do with an RDBMS), but last time I checked, Terracotta's story was even scarier: Ari, who had just taken similar misguided stabs at serialization, told me to write scripts to port my data. Then again, maybe that's the reason you didn't mention it as a serialization fault.
Serialization seems to be one of the least understood core Java libraries. You might be surprised by what it's capable of. Read through the specification if you get a chance; it's not long.
The comments to this post are interesting and not exactly what I expected. I've posted a new entry to sort of restate my position and clarify my point: Why Java Serialization is Bad, Redux