Zbyněk Šlajchrt: April 2010

JPA is indisputably a great simplification for enterprise applications built on the Java platform. As a developer who had to cope with the intricacies of the old entity beans in J2EE, I see the inclusion of JPA among the Java EE specifications as a big leap forward. However, delving deeper into the details of JPA, I find things that are not so easy. In this article I compare the EntityManager’s merge and persist methods, whose overlapping behavior may confuse not only a newbie. Furthermore, I propose a generalization that treats both methods as special cases of a more general method, combine.

Persisting entities

In contrast to the merge method the persist method is pretty straightforward and intuitive. The most common scenario of the persist method's usage can be summed up as follows:

"A newly created instance of the entity class is passed to the persist method. After this method returns, the entity is managed and planned for insertion into the database. It may happen at or before the transaction commits or when the flush method is called. If the entity references another entity through a relationship marked with the PERSIST cascade strategy this procedure is applied to it also."

The specification goes into more detail; however, remembering all of it is not crucial, as these details cover only more or less exotic situations.

Note: If the entity has been removed (i.e. scheduled for deletion from the persistence context), it becomes managed again when passed to the persist method. If the entity is detached (i.e. it was managed earlier but no longer is), an exception may be thrown, either immediately or later at flush or commit time.
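The scenario and the note above can be mimicked with a small toy model in plain Java. This is only a sketch, not JPA: the ToyContext class and the ToyState enum are hypothetical names invented for illustration, the database is left out entirely, and the detached case always throws here, although the specification only says that an exception may be thrown.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Life-cycle states relevant to persist. Hypothetical illustration, not the JPA API.
enum ToyState { NEW, MANAGED, DETACHED, REMOVED }

class ToyContext {
    // Instances are tracked by object identity; anything unknown counts as NEW.
    private final Map<Object, ToyState> states = new IdentityHashMap<>();

    ToyState stateOf(Object entity) {
        return states.getOrDefault(entity, ToyState.NEW);
    }

    void persist(Object entity) {
        switch (stateOf(entity)) {
            case NEW:
            case REMOVED:
                // New and previously removed entities become managed.
                states.put(entity, ToyState.MANAGED);
                break;
            case MANAGED:
                break; // an already managed entity is ignored
            case DETACHED:
                // Assumption of this toy: always fail for detached entities.
                throw new IllegalStateException("detached entity passed to persist");
        }
    }

    // Only the transitions needed for this sketch are modelled.
    void remove(Object entity) {
        if (stateOf(entity) == ToyState.MANAGED) states.put(entity, ToyState.REMOVED);
    }

    void detach(Object entity) {
        if (stateOf(entity) == ToyState.MANAGED) states.put(entity, ToyState.DETACHED);
    }
}
```

Note that the very instance passed to persist is the one that becomes managed; nothing is copied and nothing is returned.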

Merging entities

Compared to persist, the behavior of merge is not so simple to describe. There is no single main scenario, as there is for persist, and a programmer must remember all the scenarios in order to write correct code. It seems to me that the JPA designers wanted a method whose primary concern would be handling detached entities (as opposed to the persist method, which deals primarily with newly created entities). The major task of the merge method is to transfer the state from an unmanaged entity (passed as the argument) to its managed counterpart within the persistence context. This task, however, divides further into several scenarios, which makes the overall behavior of the method harder to understand.

Instead of repeating paragraphs from the JPA specification I have prepared a flow diagram that schematically depicts the behaviour of the merge method:
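In code, the branches of merge as the JPA specification describes them might be sketched roughly as follows. This is a toy model under stated assumptions, not JPA: Album, MergeToyContext and its methods are hypothetical, the context is just a map from primary key to managed instance, and both cascading and the database lookup that a real provider performs are omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical entity used only for this sketch.
class Album {
    final long id;
    String title;
    Album(long id, String title) { this.id = id; this.title = title; }
}

// Toy persistence context: primary key -> managed instance. Not the JPA API.
class MergeToyContext {
    private final Map<Long, Album> managed = new HashMap<>();
    // Removed instances, tracked by object identity.
    private final Set<Album> removed =
            Collections.newSetFromMap(new IdentityHashMap<Album, Boolean>());

    boolean isManaged(Album a) { return managed.get(a.id) == a; }

    void remove(Album a) {
        if (isManaged(a)) {
            managed.remove(a.id);
            removed.add(a);
        }
    }

    Album merge(Album passed) {
        if (removed.contains(passed)) {
            // A removed entity must not be merged.
            throw new IllegalArgumentException("removed entity passed to merge");
        }
        if (isManaged(passed)) {
            return passed; // already managed: ignored (cascading is omitted here)
        }
        Album target = managed.get(passed.id);
        if (target == null) {
            // New entity, or a detached one with no managed counterpart yet:
            // create a managed copy (a real provider would consult the database).
            target = new Album(passed.id, passed.title);
            managed.put(passed.id, target);
        } else {
            // Detached entity with a managed counterpart: copy the state over.
            target.title = passed.title;
        }
        return target;
    }
}
```

The instance passed in never becomes managed in this model; the caller has to continue working with the returned one.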

Comparison

persist deals with new entities (passing a detached entity may end in an exception).
merge deals with both new and detached entities

persist always causes an SQL INSERT to be executed (i.e. an exception may be thrown if the entity has already been inserted and a primary key violation occurs).
merge causes either an INSERT or an UPDATE, depending on the sub-scenario (on the one hand this is more robust; on the other hand, this robustness is not always wanted).

Note: Both SQL operations are deferred; they are executed at or before transaction commit, or when flush is called.

persist makes a previously removed entity managed again
merge throws an exception if a previously removed entity is passed

persist makes the passed entity managed
merge copies the state of the passed entity to the managed entity

persist does not return any value
merge returns the managed entity, which carries a copy of the state of the passed entity

Both methods ignore an entity that is already managed and turn their attention to the entities it references through relationships marked PERSIST or MERGE, respectively.

In contrast to merge, passing a detached entity to persist may lead to throwing an exception.

So, when should I use persist and when merge?

persist

You want the method always to create a new entity and never to update an existing one. Otherwise, the method throws an exception as a consequence of a primary key uniqueness violation.
Batch processes, handling entities in a stateful manner (see Gateway pattern)
Performance optimization

merge

You want the method to either insert or update an entity in the database.
You want to handle entities in a stateless manner (data transfer objects in services)
You want to insert a new entity that may have a reference to another entity that may or may not have been created yet (the relationship must be marked MERGE). For example, inserting a new photo with a reference to either a new or a preexisting album.

Design flaws

The persist method implements inserting a new entity. The merge method implements both inserting and updating. One method is apparently missing: one that would implement updating without inserting. I can generalize further and consider the possibility of defining custom behavior that occurs when an entity is combined with the persistence context. From this point of view, persisting, merging and updating would be merely three strategies for combining an incoming entity with the content of the persistence context. The EntityManager interface would contain one general method, let's call it combine(entity, strategy), that would take two arguments: the entity and the strategy used for combining the entity with the persistence context. The strategy would be an interface with two main implementations, PersistStrategy and MergeStrategy, corresponding to the persist and merge methods respectively. In this design both methods would simply delegate to the combine method, passing the corresponding strategy instance.
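The proposal can be sketched in plain Java. Nothing here exists in JPA: CombineStrategy, ToyEntityManager, the Customer entity and the map-based context are hypothetical stand-ins, and the strategies only mimic the insert-or-update decision, not real persistence.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity for the sketch.
class Customer {
    final long id;
    String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// A strategy decides how an incoming entity is combined with the context.
interface CombineStrategy {
    Customer combine(Map<Long, Customer> context, Customer incoming);
}

// Insert only: fails if the key is already present.
class PersistStrategy implements CombineStrategy {
    public Customer combine(Map<Long, Customer> context, Customer incoming) {
        if (context.containsKey(incoming.id)) {
            throw new IllegalStateException("entity already exists: " + incoming.id);
        }
        context.put(incoming.id, incoming);
        return incoming;
    }
}

// Insert or update: copies state onto the managed counterpart, creating it if needed.
class MergeStrategy implements CombineStrategy {
    public Customer combine(Map<Long, Customer> context, Customer incoming) {
        Customer target = context.get(incoming.id);
        if (target == null) {
            target = new Customer(incoming.id, incoming.name);
            context.put(incoming.id, target);
        } else {
            target.name = incoming.name;
        }
        return target;
    }
}

// The "missing" behavior: update only, never insert.
class UpdateOnlyStrategy implements CombineStrategy {
    public Customer combine(Map<Long, Customer> context, Customer incoming) {
        Customer target = context.get(incoming.id);
        if (target == null) {
            throw new IllegalStateException("no such entity: " + incoming.id);
        }
        target.name = incoming.name;
        return target;
    }
}

class ToyEntityManager {
    private final Map<Long, Customer> context = new HashMap<>();

    // The one general method; the strategy carries all the behavior.
    Customer combine(Customer entity, CombineStrategy strategy) {
        return strategy.combine(context, entity);
    }

    // persist and merge merely delegate to combine.
    void persist(Customer entity) { combine(entity, new PersistStrategy()); }
    Customer merge(Customer entity) { return combine(entity, new MergeStrategy()); }
}
```

With this shape, an update-only operation needs no new method on the manager; a caller would simply invoke combine(entity, new UpdateOnlyStrategy()).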
The concept of cascade policies could also be generalized: instead of using the values of the CascadeType enumeration, a programmer would use the class of the strategy itself as the value of a strategy attribute (or similar) of the relationship annotations.
Instead of:

    @OneToOne(cascade=CascadeType.PERSIST)
    public Address getAddress() {
        return address;
    }

the code would look like this:

    @OneToOne(cascadeStrategy=PersistStrategy.class)
    public Address getAddress() {
        return address;
    }

If the programmer wanted to declare a relationship through which the update-only entity combination strategy would be propagated to an associated entity, he/she would do it as follows:

    @OneToOne(cascadeStrategy=UpdateOnlyStrategy.class)
    public Address getAddress() {
        return address;
    }

Some generalization would also be needed in the generation of SQL commands. Since the persist and merge strategies result in INSERT or UPDATE commands, there should also be a mechanism that allows a custom strategy to generate its own SQL commands.

A Java EE greenhorn may find it hard to quickly determine the appropriate transaction attribute for a method in an enterprise bean. He or she must remember the definitions of all six attributes well in order to choose the right one. Some attribute names are not very self-describing and may be confusing. For example, the names of the Required and Mandatory attributes sound very similar, as both say that the method must run within a transaction. Of course, there is a nuance that gives the Mandatory attribute a stronger sense: Required starts a new transaction when none is pending, whereas Mandatory throws an exception. However, especially for programmers whose native language is not English, it may take longer to become familiar with the meanings of all the attributes.

I personally prefer to pick the correct attribute in two steps: in each step I ask myself what I want the EJB container to do when the method is about to be invoked 1) if there is no pending transaction and 2) if there is a pending transaction. For each step I choose one answer from these four options:

Create a new transaction
Nothing
Throw an exception
Suspend any pending transaction (makes sense in the second step only)

Once I have chosen the two answers, I consult the following table to pick the correct attribute:

The rows in the table represent the two situations (i.e. no transaction and a pending transaction). The columns indicate the possible answers, and a cell corresponds to the selected answer. There are six red arrows in the table that join all feasible pairs of answers across the two situations. Each arrow is accompanied by an attribute’s acronym. Once I have answered both questions, I simply select the arrow connecting the two cells and that’s it.
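The lookup itself can also be written down in code. The sketch below encodes the six feasible answer pairs using the standard semantics of the EJB transaction attributes. The Answer and TxAttribute enums and the TxGadget class are hypothetical names for illustration (the real enumeration is javax.ejb.TransactionAttributeType), and the sketch assumes that answering "create a new transaction" while one is pending means RequiresNew, i.e. the container suspends the pending transaction first.

```java
import java.util.Optional;

// The four possible answers from the list above.
enum Answer { NEW_TRANSACTION, NOTHING, THROW_EXCEPTION, SUSPEND }

// Local stand-in mirroring the names of the six EJB transaction attributes.
enum TxAttribute { REQUIRED, REQUIRES_NEW, MANDATORY, NOT_SUPPORTED, SUPPORTS, NEVER }

class TxGadget {
    // Returns the attribute for a pair of answers, or empty if the pair is not feasible.
    static Optional<TxAttribute> pick(Answer noTx, Answer pendingTx) {
        switch (noTx) {
            case NEW_TRANSACTION:
                if (pendingTx == Answer.NOTHING) return Optional.of(TxAttribute.REQUIRED);
                if (pendingTx == Answer.NEW_TRANSACTION) return Optional.of(TxAttribute.REQUIRES_NEW);
                break;
            case NOTHING:
                if (pendingTx == Answer.NOTHING) return Optional.of(TxAttribute.SUPPORTS);
                if (pendingTx == Answer.SUSPEND) return Optional.of(TxAttribute.NOT_SUPPORTED);
                if (pendingTx == Answer.THROW_EXCEPTION) return Optional.of(TxAttribute.NEVER);
                break;
            case THROW_EXCEPTION:
                if (pendingTx == Answer.NOTHING) return Optional.of(TxAttribute.MANDATORY);
                break;
            default:
                break; // SUSPEND makes no sense when there is no pending transaction
        }
        return Optional.empty();
    }
}
```

For instance, TxGadget.pick(Answer.THROW_EXCEPTION, Answer.NOTHING) yields MANDATORY, while a pair with no corresponding attribute, such as throwing an exception in both situations, yields an empty result.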

The table can also be useful as part of the documentation. A cell may contain an explanation of and rationale for choosing it, as shown in the following example:

The table can even become part of the method’s JavaDoc. For example:

<h2>TX table</h2>
<table>
<tr>
<td></td>
<td><b>New transaction</b></td>
<td><b>Nothing</b></td>
<td><b>Throw an exception</b></td>
<td><b>Suspend any pending TX</b></td>
</tr>
<tr>
<td><b>No pending TX</b></td>
<td>This method commits changes in the persistence context.</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><b>A pending TX</b></td>
<td>This method commits the changes even if there is a pending transaction.</td>
<td></td>
<td></td>
<td></td>
</tr>
</table>

Saturday, April 17, 2010

JPA: persisting vs. merging entities

Monday, April 12, 2010

A Gadget for Determining Transaction Attributes for EJB methods
