Decoding strings with an unknown encoding

Unicode SnowmanWe’ve all been there, the system is humming along when you bring up the UI only to see “Espa?ol” staring you back in the face. What is this? Why is there a question mark in there? Well, you’ve been bitten by they mystical world of unicode. Usually this issue is an easy fix; always use UTF-8 encoding. However, what to do if you don’t own the whole code path? What if you have to support poorly written third party client libraries? What if you have client libraries in the wild that will never be updated? What if these client libraries out right lie about what encoding they are using? Follow me and we’ll find out.

Bytes To String

On the JVM there are two standard ways of converting bytes to Strings; new String(bytes[], encoding) and using the NIO Charset. Since they both accomplish the same feat my decision came down to performance. Luckily someone else did the heavy lifting and figured out that new String(bytes[], encoding) ekes out a small win over NIO Charset. However, this option poses a challenge. How do we know if the decoding succeeded? The NIO option can throw an Exception (effective but slow) or insert a Unicode Replacement Character (easy to find with a String.contains()) if it encounters a some bytes that cannot be decoded . new String(bytes[], encoding) does no such thing. It blindly decodes characters and will output random garbage in the resulting string if the decoding chokes on some bad byte values. We need a way to find those garbage characters.

Regex for non unicode characters

The clue that got me on the right track was an odd looking pattern in the javadoc for Pattern.

[\p{L}&&[^\p{Lu}]] Any letter except an uppercase letter (subtraction)

It seemed that \p{L} was some magical regex incantation for any unicode letter and after some additional searching it appeared that that was exactly the case. Of course what we really want to find are characters that are not unicode letters, spaces, punctuation or digits. Lucky for us there are matching groups for all of these, leading us to this regex:

[^\p{L}\p{Space}\p{Punct}\p{Digit}]

Performance Testing

Since performance is of particular concern it was important to test the overhead of this regex check. I fired up the Scala REPL and tested the using new String(bytes[], encoding) with the above regex as compared to NIO Coded using String.contains() to check for the replacement character. After all that work it turns out that the regex was significantly more expensive than String.contains(). So much so that the NIO code was about 2x as fast. So, in the end, I ended up going with the simpler NIO option.

The Code

import io.Codec
import java.nio.ByteBuffer

val UTF8 = "UTF-8"
val ISO8859 = "ISO-8859-1"
val REPLACEMENT_CHAR = '\uFFFD'

def bytesToString(bytes: Array[Byte], encoding: String) = {
  val upper = encoding.toUpperCase
  val codec = if (ISO8859 == upper) Codec.ISO8859 else Codec.UTF8
  val decoded = codec.decode(ByteBuffer.wrap(bytes)).toString
  if (!decoded.contains(REPLACEMENT_CHAR)) {
    decoded
  } else {
    val otherCodec = if (ISO8859 == upper) Codec.UTF8 else Codec.ISO8859
    val otherDecoded = otherCodec.decode(ByteBuffer.wrap(bytes)).toString
    if (!otherDecoded.contains(REPLACEMENT_CHAR)) {
      otherDecoded
    } else {
      val utf8 = if (ISO8859 == upper) otherDecoded else decoded
      utf8.replace(REPLACEMENT_CHAR, '?')
    }
  }
}

Update

Thanks to Joni Salonen for pointing out that I was wrong about new String() not inserting the unicode replacement char. In light of that info the following code is just a touch faster.

val UTF8 = "UTF-8"
val ISO8859 = "ISO-8859-1"
val REPLACEMENT_CHAR = '\uFFFD'
def bytesToString(bytes: Array[Byte], encoding: String) = {
  val upper = encoding.toUpperCase
  val firstEncoding = if (ISO8859 == upper) ISO8859 else UTF8
  val firstDecoded = new String(bytes, firstEncoding)
  if (!firstDecoded.contains(REPLECEMENT_CHAR)) {
    firstDecoded
  } else {
    val secondEncoding = if (ISO8859 == upper) UTF8 else ISO8859
    val secondDecoded = new String(bytes, secondEncoding)
    if (!secondDecoded.contains(REPLECEMENT_CHAR)) {
      secondDecoded
    } else {
      val utf8 = if (ISO8859 == upper) secondDecoded else firstDecoded
      utf8.replace(REPLECEMENT_CHAR, '?')
    }
  }
}

Software Development is not Just Coding

CodingThroughout my career as a startup software developer I have constantly come across fellow developers who seem to have a confused concept of our common profession. Specifically, they believe coding to be the single activity which comprises software development. Now its not hard to see how this misconception comes about. Too frequently developers are interrupted by frivolous meetings causing a backlash against anything that hints at time away from the desk. Moreover, at the end of the day, the top priority is getting features out the door and any time spent not coding smells of time lost. However, this narrow view of software development is holding us back and cramping our productivity. Specifically, there are three realms in which we need to spend more time up front so we can run faster over all.

Quality assurance and the rise of unit tests

I have yet to meet a startup that had enough resources for a QA team and rightly so. The rise of unit test and continuous integration tools throws into doubt the benefit of a dedicated QA team in all but the most extreme cases. However, I still see far too many developers who pay lip service to their value and then, in the name of speed, proceed to code without any. What they are missing is the insane productivity gains to be found in having a suit of tests running against every checkin. Being able to refactor and hack away without needing weeks of subsequent manual testing or months of sleepless nights fixing live bugs is something that is missed by many, particularly in the startup world, and is a huge drain on productivity.

Ops becomes DevOps

More recently we are beginning to see the influence of tools effecting the operations side as well. These days, through the power of Amazon Web Services, Rackspace, and others, developers no longer need to rely on in house operations for their hardware needs. However, operations provided more services than simply assembling servers and connecting them to a network. Developers are now in the position where they must take on the responsibility of deploying and monitoring their software. Fortunately, hardware is something no startup can run without so, unlike QA, understanding the necessity of AWS or Rackspace is a no brainer. Unfortunately, understanding the need to spend time setting up monitoring and automating deployment is still a work in progress. Too often clients and customers are the first ones to discover a service outage and deploying more servers takes days and not hours. The seemingly never ending march of minor crises which is operations can destroy productivity if some judicious work and planning is not done upfront.

What are we doing and we will we get there?

Finally we come to the most important and also the most contentious issue, planning. Many developers approach any sort of planning with apprehension if not outright disgust. However, these are the same developers that end up taking days if not weeks longer to launch a product or wrap up a milestone because there was not enough coordination and thought ahead of time to plan the multitude of steps that goes into such an event. At a higher level I have seen whole companies spin while different developers over engineer and over refine products based on vague requirements causing a jump in coding “productivity” due to less meetings and yet a serious drop in output as nothing is delivered and what is delivered is inevitably not what anyone wanted. If we want to step up our game and get things done we need to stop shying away from all process and being adopting the right processes. There are any number of options (SCRUM, Lean, Kanban, etc.) to learn and borrow from. Make your own process. Start with the simplest thing that works. Add (or remove!) when something isn’t working. Whatever it might be we need to move beyond the “all process is evil” mentality. Yes there will be meetings but you will shock yourself at how much faster you are in the end.

Monit alerts sent from a Dreamhost email account

This little snippet of configuration took way to long to figure out so I’m sharing it in hopes of saving others the trouble. If you are using monit and trying to send alerts from a Dreamhost email account you will need to use the following settings.

set mailserver mail.your.host.name port 465
username "monit@your.host.name"
password "password"
using SSLV3
with timeout 15 seconds
set alert alerts@your.host.name
with mail-format { from: monit@your.host.name }

It turns out that monit and Dreamhost don’t get along on the default port 587. The solution according to Dreamhost is to use port 465. This works as long as you use SSLv3 instead of TLSv1. The other important part of this configuration is making sure you set the from address in your mail format. Otherwise monit tries to send from monit@<hostname> which in most cases is not a fully qualified name and will cause Dreamhost to fail.

ConnectionFactories and Caching with Spring and ActiveMQ

© Sean Ryan

In my previous two posts on ActiveMQ and Spring I’ve shown how to implement both asynchronous and synchronous messaging using MessageListenerContainers and JmsTemplates. Since then I have continued to tweak and improve my use of Spring’s JMS support and have uncovered a few more details about the ConnectionFactories supplied by both ActiveMQ and Spring and how they play together nicely and not so nicely. Some of this info is new while some was covered in my second post and lead to an edit in my initial post but I will reiterate that information here to keep it all in one place.

PooledConnectionFactory

This bad boy is the ActiveMQ provide connection pooler. It pools connections, sessions and producers and does a fine job of it. There is nothing particularly wrong with this connection factory though it does strike me as odd that it caches connections since, according to the JMS spec, connections support concurrent use.

SingleConnectionFactory

It might seem odd to have a connection factory that ensures only one connection but in reality this is quite useful. When using multiple JmsTemplates and MessageListenerContainers in Spring they will end up each creating their own connection since the default ConnectionFactory will create a new connection for each createConnection() call. Initially I thought this would be fine but I ran into some very weird issues where producers and consumers couldn’t see each other. I’m not sure if this had something to do with using embedded brokers and the VM Transport but it wasn’t good. Using a SingleConnectionFactory as the initial decorator around the default ConnectionFactory solved these issues.

CachingConnectionFactory

This is the real heavy lifter. It will cache sessions and can optionally cache consumers and producers. As any post on JmsTemplate will tell you, you MUST use caching for any sort of reasonable performance if you are not running in a JCA container. As such, this is the factory to use when using a JmsTemplate. A few points should be made clear though. First, the default is for only one session to be cached. The javadoc claims this is sufficient for low concurrency but if you are doing something more high end you probably want to increase the SessionCacheSize. Additionally, there is some weirdness around cached consumers. They don’t get closed until they are removed from the pool and they are cached using a key that contains the destination and the selector. This lead to essentially a memory leak when trying to use cached consumers for request response semantics using JMSCorrelationIds. In general caching consumers is hard which is why MessageListenerContainers should be used instead.

How’s it all work

<amq:connectionFactory id="connectionFactory" brokerURL="${jms.url}"/>
<bean id="singleConnectionFactory" p:targetConnectionFactory-ref="connectionFactory" p:reconnectOnException="true"/>
<bean id="cachingConnectionFactory" p:targetConnectionFactory-ref="singleConnectionFactory" p:sessionCacheSize="100"/>
<bean id="jmsTemplate" p:connectionFactory-ref="cachingConnectionFactory"/>
<jms:listener-container connection-factory="singleConnectionFactory">
<jms:listener id="QueueHandler" destination="Queue" ref="queueHandler"/>
</jms:listener-container>

First we start with a basic ActiveMQ ConnectionFactory. From there we wrap it in a SingleConnectionFactory and then wrap that in a CachingConnectionFactory. I’ve increased the SessionCacheSize on the CachingConnectionFactory to 100, this might be overkill but my application is anything but memory hungry so I figured I could err on the side of too many sessions. Additionally, and this is important, ReconnectOnException is set to true on the SingleConnectionFactory. The CachingConnectionFactory requires this and will complain loudly but in such as way as to make it unclear what is going on if this is not set. From there we use the CachingConnectionFactory to create a JmsTemplate and we use the SingleConnectionFactory to create a MessageListenerContainer since it takes care of any caching of sessions and consumers that it needs.

ActiveRecord Naming Conventions with Spring JDBC

java ruby on railsOne of the best choices we made at my current job was splitting the web development from the heavy message processing system, allowing the former to be written in Ruby on Rails and the latter to be written in Java. However, for ease of development, we allowed the two code bases share a single database. What then to do about database naming conventions? Those familiar with ActiveRecord will know that it is very particular about its database naming conventions so it was up to the Java code to play nicely. The first solution to present itself required hardcoding database names in each Java object but a more elegant solution seemed possible and, after the discovery of one small but important 3rd party project, it revealed itself.

Inflection

In Rails, the class responsible for converting object names to database names is called Inflections. For those unfamiliar with the term (as I was), inflection is the modification of a word to express different grammatical categories. Armed with this new knowledge I set about googling and was relieved to find that Tom White had already implemented a Java Inflector. However, the Java Inflector only implements the pluralization aspect of the Rails Inflections; for this to work I was going to need some additional string manipulation tools. Luckily the old Apache Commons StringUtils was there to lend a helping hand. Finally, confident in my string manipulation tools, the next step was to look into the JDBC side of things.

ORMs

When dealing with mapping objects to and from a database Hibernate is always the name that pops up. I’ve worked with it in the past and, while incredibly powerful, I always felt like I was wielding a hatchet when what I wanted was a surgical knife. I’ve heard its configuration has been simplified with JPA but its still a beast of a project that relies on a bit more magic than I’m comfortable with. In a more recent project I played with iBatis (now myBatis) and enjoyed the simplicity but felt overwhelmed by the xml configuration. Version 3.0 supports annotation configuration which greatly simplifies this but I was looking for something based on naming conventions so I wouldn’t have to supply the mapping myself. I don’t mind getting down and dirty with some SQL so I opted for Spring JDBC. Spring provides a super thin layer on top of raw JDBC which takes what is a fairly developer hostile API and makes it a breeze to use, all while adding some wonderful utilities to speed development.

Putting It All Together

AbstractDbObj

The first step was to create an AbstractDbObj to allow all other database objects to inherit from. This is where the logic for creating db names using the Inflector and StringUtils lives along with some other shared code.

import static com.google.common.collect.Collections2.filter;
import static com.google.common.collect.Collections2.transform;
import static org.apache.commons.lang.StringUtils.join;
import static org.apache.commons.lang.StringUtils.lowerCase;
import static org.apache.commons.lang.StringUtils.splitByCharacterTypeCamelCase;
import static org.jvnet.inflector.Noun.pluralOf;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.commons.beanutils.BeanMap;
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.apache.commons.lang.builder.ReflectionToStringBuilder;
import org.apache.commons.lang.builder.ToStringStyle;
import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.collect.ImmutableSet;

public abstract class AbstractDbObj  implements DbObj {
  private static final String[] EXCLUDED_FIELD_NAMES = new String[] { "columnNames", "map", "fieldNames", "tableName" };
  private static final Predicate<String> IS_COLUMN = new Predicate<String>() {
    private final ImmutableSet<String> notColumns = ImmutableSet.of( "class", "tableName", "columnNames", "fieldNames" );
    public boolean apply( final String propName ) {
      return !notColumns.contains( propName );
    }
  };

  private static final ConcurrentMap<Class<? extends LocaDbObj>, String> TABLE_NAME_LOOKUP = new ConcurrentHashMap<Class<? extends LocaDbObj>, String>();

  private static final Function<String, String> TO_COLUMN = new Function<String, String>() {
    public String apply( final String fieldName ) {
      return underscore( fieldName );
    }
  };

  private static String tableName( final Class<? extends LocaDbObj> clazz ) {
    return pluralOf( underscore( clazz.getSimpleName() ) );
  }

  protected static String underscore( final String camelCasedWord ) {
    return lowerCase( join( splitByCharacterTypeCamelCase( camelCasedWord ), '_' ) );
  }

  private Integer id;
  private final Collection<String> fieldNames;
  private final String tableName;
  private final Map<String, Object> map;
  private final Collection<String> columnNames;

  @SuppressWarnings( "unchecked" )
  protected AbstractLocaDbObj() {
    super();
    final Class<? extends LocaDbObj> clazz = getClass();
    String tblName = TABLE_NAME_LOOKUP.get( clazz );
    if ( tblName == null ) {
      tblName = tableName( clazz );
      TABLE_NAME_LOOKUP.put( clazz, tblName );
    }
    tableName = tblName;
    final BeanMap beanMap = new BeanMap( this );
    fieldNames = filter( beanMap.keySet(), IS_COLUMN );
    columnNames = transform( fieldNames, TO_COLUMN );
    map = beanMap;
}

  @SuppressWarnings( "unchecked" )
  protected AbstractLocaDbObj( final String tableName ) {
    super();
    this.tableName = tableName;
    final BeanMap beanMap = new BeanMap( this );
    fieldNames = filter( beanMap.keySet(), IS_COLUMN );
    columnNames = transform( fieldNames, TO_COLUMN );
    map = beanMap;
  }

  @Override
  public boolean equals( final Object obj ) {
    return EqualsBuilder.reflectionEquals( this, obj, EXCLUDED_FIELD_NAMES );
  }

  public String getColumnName( final String fieldName ) {
    return underscore( fieldName );
  }

  public Collection<String> getColumnNames() {
    return columnNames;
  }

  public Collection<String> getFieldNames() {
    return fieldNames;
  }

  public Integer getId() {
    return id;
  }

  public String getTableName() {
    return tableName;
  }

  @Override
  public int hashCode() {
    return HashCodeBuilder.reflectionHashCode( this, EXCLUDED_FIELD_NAMES );
  }

  public void setId( final Integer id ) {
    this.id = id;
  }

  public Map<String, Object> toMap() {
    return map;
  }

  @Override
  public String toString() {
    return new ReflectionToStringBuilder( this ).setExcludeFieldNames(EXCLUDED_FIELD_NAMES ).toString();
  }
}

A few things to point out. First, I’m making use of the Google Collections Function and Predicate. For more info on that check out the developer.com article. Second, I’m using Apache BeanUtils BeanMap which uses reflection to easily create a property to value map of a JavaBean. Finally, I’m storing the table names in a lookup since the pluralization a very expensive operation. The rest should hopefully be somewhat straight forward.

SimpleJdbcInsert

Using Spring’s SimpleJdbcInsert, combined with the heavy lifting in the abstract class, insertions are practically free as you can see in the following code.

// Get Datasource from Spring Context
SimpleJdbcInsert jdbcInsert = new SimpleJdbcInsert( dataSource ).withTableName( dbObj.getTableName() ).usingGeneratedKeyColumns( "id" );
final int id = jdbcInsert.executeAndReturnKey( new BeanPropertySqlParameterSource( dbObj ) ).intValue();
dbObj.setId( id );

Note: you can reuse a SimpleJdbcInsert since it is thread safe, I created a new one in the above code just to show how it is done.

SimpleJdbcTemplate

After inserts, the other two pieces of required functionality are query and update, both of which are supplied by Spring’s SimpleJdbcTemplate. The trick here is to make generous use of Spring’s BeanPropertySqlParameterSource and BeanPropertyRowManager which effortlessly map between JavaBean properties and database columns.

Lets start with the simple case of deleting an object.


simpleJdbcTemplate.update( "DELETE FROM " + dbObj.getTableName() + " WHERE id = :id", new BeanPropertySqlParameterSource( dbObj ) );

You’ll notice the string “id = :id” in the query. This allows Spring to do value substitution, using the BeanPropertySqlParameterSource to replace :id with the actual value of .getId() from the dbObj. With this in mind we can step up to the more complicated update case.

final Collection<String> parts = buildUpdateParts( dbObj );
if ( parts.isEmpty() ) { return; }
final String query = "UPDATE " + dbObj.getTableName() + " SET " + join( parts, ", " ) + " WHERE id = :id";
simpleJdbcTemplate.update( query, new BeanPropertySqlParameterSource( dbObj ) );

private static final Collection<String> buildUpdateParts( final LocaDbObj dbObj ) {
  return transform( filter( dbObj.getFieldNames(), not( IS_ID ) ), new BuildPart( dbObj ) );
}

private static final class BuildPart implements Function<String, String> {
  private final LocaDbObj dbObj;
  public BuildPart( final LocaDbObj dbObj ) {
    this.dbObj = dbObj;
  }
  public String apply( final String propName ) {
    return dbObj.getColumnName( propName ) + " = :" + propName;
  }
}

This code again makes use of google collections but the point is that it takes the field names and the column names from the DbObj and creates snips of sql setting the column name to the field name identifier (i.e. the field name prefixed with ‘:’) just like was done in the deletion case with “id = :id”. Continuing to build on the previous examples we finally come to the query case. This one is a bit difference since we are interested in reading results from the database as well as building a query so we will have to deal with the BeanPropertyRowMapper.

final Collection<String> parts = buildQueryParts( dbObj );
if ( parts.isEmpty() ) { return ImmutableList.of(); }
final String query = "SELECT " + columnNames + " FROM " + dbObj.getTableName() + " WHERE " + join( parts, " AND " );
return result = simpleJdbcTemplate.query( query, new BeanPropertyRowMapper<DbObj>( dbObj.getClass() ), new BeanPropertySqlParameterSource( dbObj ) );

private static final Collection<String> buildQueryParts( final LocaDbObj dbObj ) {
  return transform( filter( dbObj.getFieldNames(), new HasValue( dbObj.toMap() ) ), new BuildPart( dbObj ) );
}

private static final class HasValue implements Predicate<String> {
  private final Map<String, Object> map;
  public HasValue( final Map<String, Object> map ) {
    this.map = map;
  }
  public boolean apply( final String propName ) {
    return map.get( propName ) != null;
  }
}

Hopefully by this point the google collections syntax is becoming familiar. We want to include only object fields that are non null so we use the HasValue Predicate to check each property by name. From there on the rest is quite similar to the update method where we build parts setting each column name equal to its field name identifier. The only other difference is we use the BeanPropertyRowMapper so Spring can return us a list of the correct type of DbObj.

While certainly a bit heavy on the code I hope this will be useful to those trying to create a simple, straight forward database layer with Spring and ActiveRecord naming conventions. As a bit of a tease, the full code base that this is pulled from contains an genericized DAO object that contains basic annotation driven caching as well as the ability to execute CRUD commands via example objects, ids, sql snippets (i.e. where clauses), or full raw SQL all while keeping things nice and type safe as well as sql injection free using PreparedStatements and value substitution.

Approximating Java Case Objects without Project Lombok

LombokOver the past few months Dick Wall of the Java Posse has been talking up Project Lombok and for good reason. As he points out, its great for reducing boilerplate code in basic Java data objects. However, the more important point that Dick makes is that it prevents stupid errors, particularly when adding new object fields later on in the development cycle. The only unfortunate issue is that Project Lombok requires a compiler hack that can be rather confusing to those not using a supported IDE or for those coding outside of an IDE. However, Apache Commons provides an elegant solution to this issue.

Project Lombok

Lets start with a quick overview of Project Lombok. At its most basic level it provides a set of annotations (i.e. @Data) that you add to your Java data objects. You simply create private fields in an annotated object and the annotation will cause a full set of constructors, getters/setters, equals/hashCode and toString to be generated in the class file at compile time while keeping a simple and minimal source file. Its an elegant solution but it comes with a fair amount of magic surrounding it.

Alternatives

The simple alternative to Project Lombok is to have the IDE generate the code for you. Eclipse, for instance, is very good at doing this with auto generation of constructors, getters/setters, hashCode/equals and toString. This works wonders when creating the object for the first time but, as Dick points out, its too easy to add a new field and forget about updating toString and hashCode/equals leading to confusing and subtle bugs down the road.

The other alternative, and the one I am promoting, strikes a balance between Project Lombok and IDE generation by auto generating the hashCode/equals and toString and leaving the getters/setters and constructors up to the developer as these are necessary for the new field to be of any use.

Apache Commons

Buried deep in the Apache Commons is the builder package containing utilities such as EqualsBuilder, HashCodeBuilder, and ReflectionToStringBuilder. I don’t remember when I first discovered these gems but once I did the lightbulb went off; if used them in an Abstract base class they would ensure proper implementations of equals/hashCode and toString without having to mess with every data object I create.

import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;
import org.apache.commons.lang.builder.ReflectionToStringBuilder;
import org.apache.commons.lang.builder.ToStringStyle;

public abstract class AbstractBaseObj {

    @Override
    public boolean equals( final Object obj ) {
        return EqualsBuilder.reflectionEquals( this, obj );
    }

    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode( this );
    }

    @Override
    public String toString() {
        return new ReflectionToStringBuilder( this).toString();
    }
}

Drawbacks

Of course this option is not with out its own drawbacks compared to Project Lombok. First off, as mentioned before, it does not deal with getters/setters or constructors. However, as I pointed out, I don’t believe this to be a huge imposition since creating a new field without creating getters/setters and/or a constructor using that field is fairly useless. Additionally, I like that this gives ease and flexibility over which constructors are created. This has been quite handy in the most recent project I’ve been working on.

Another draw back is that it requires an Abstract base class from which all objects must inherit. Annotations are certainly a more elegant solution since it doesn’t require a given object hierarchy. However there are often good reasons for a project specific base object for reasons beyond just equals/hashCode and toString. Additionally, the base class is fully owned by the project with just the few methods necessary being implemented. This is significantly better than having to inherit from a class provided by some 3rd party jar.

The important drawback, however,  is that the Apache Commons solution uses reflection instead of actually generating code to implement equals/hashCode and toString. Depending on how frequently these are used and how performant they need to be this might be a real consideration to use either Project Lombok or to write all the code directly. However, I have not found this to be the case in my project as it never seems to show up as a hot spot in any profiling I have done.

Synchronous Request Response with ActiveMQ and Spring

activemq-5-x-box-reflectionA few months ago I did a deep dive into Efficient Lightweight JMS with Spring and ActiveMQ where I focused on the details for asynchronous sending and receiving of messages. In an ideal world all messaging would be asynchronous. If you need a response then you should set up an asynchronous listener and either have enough state stored in the service or in the message that you can continue processing once the response has been received. However, when ideals lead to complexity, we have to make the decision of how much complexity can we tolerate for the performance we need. Sometimes it just makes more sense to use a synchronous request/response.

Request Response with JMS

ActiveMQ documentation actually has a pretty good overview of how request-response semantics work in JMS.

You might think at first that to implement request-response type operations in JMS that you should create a new consumer with a selector per request; or maybe create a new temporary queue per request.

Creating temporary destinations, consumers, producers and connections are all synchronous request-response operations with the broker and so should be avoided for processing each request as it results in lots of chat with the JMS broker.

The best way to implement request-response over JMS is to create a temporary queue and consumer per client on startup, set JMSReplyTo property on each message to the temporary queue and then use a correlationID on each message to correlate request messages to response messages. This avoids the overhead of creating and closing a consumer for each request (which is expensive). It also means you can share the same producer & consumer across many threads if you want (or pool them maybe).

This is a pretty good start but it requires some tweaking to work best in Spring. It also should be noted that Lingo and Camel are also suggested as options when using ActiveMQ. In my previous post I addressed why I don’t use either of these options. In short Camel is more power than is needed for basic messaging and Lingo is built on Jencks, neither of which have been updated in years.

Request Response in Spring

The first thing to notice is that its infeasible to create a consumer and temporary queue per client in Spring since pooling resources is required overcome the JmsTemplate gotchas. To get around this, I suggest using predefined request and response queues, removing the overhead of creating a temporary queue for each request/response. To allow for multiple consumers and producers on the same queue the JMSCorrelationId is used to correlated the request with its response message.

At this point I implemented the following naive solution:

@Component
public class Requestor {

    private static final class CorrelationIdPostProcessor implements MessagePostProcessor {

        private final String correlationId;

        public CorrelationIdPostProcessor( final String correlationId ) {
            this.correlationId = correlationId;
        }

        @Override
        public Message postProcessMessage( final Message msg ) throws JMSException {
            msg.setJMSCorrelationID( correlationId );
            return msg;
        }
    }

    private final JmsTemplate jmsTemplate;

    @Autowired
    public RequestGateway( JmsTemplate jmsTemplate ) {
        this.jmsTemplate = jmsTemplate;
    }

    public String request( final String request, String queue ) throws IOException {
        final String correlationId = UUID.randomUUID().toString();
        jmsTemplate.convertAndSend( queue+".request", request, new CorrelationIdPostProcessor( correlationId ) );
        return (String) jmsTemplate.receiveSelectedAndConvert( queue+".response", "JMSCorrelationID='" + correlationId + "'" );
    }
}

This worked for a while until the system started occasionally timing out when making a request against a particularly fast responding service. After some debugging it became apparent that the service was responding so quickly that the receive() call was not fully initialized, causing it to miss the message. Once it finished initializing, it would wait until the timeout and fail. Unfortunately, there is very little in the way of documentation for this and the best suggestion I could find still seemed to leave open the possibility for the race condition by creating the consumer after sending the message. Luckily, according to the JMS spec, a consumer becomes active as soon as it is created and, assuming the connection has been started, it will start consuming messages. This allows for the reordering of the method calls leading to the slightly more verbose but also more correct solution. (NOTE: Thanks to Aaron Korver for pointing out that ProducerConsumer needs to implement SessionCallback and that true needs to be passed to the JmsTemplate.execute() for the connection to be started.)

@Component
public class Requestor {

    private static final class ProducerConsumer implements SessionCallback<Message> {

        private static final int TIMEOUT = 5000;

        private final String msg;

        private final DestinationResolver destinationResolver;

        private final String queue;

        public ProducerConsumer( final String msg, String queue, final DestinationResolver destinationResolver ) {
            this.msg = msg;
            this.queue = queue;
            this.destinationResolver = destinationResolver;
        }

        public Message doInJms( final Session session ) throws JMSException {
            MessageConsumer consumer = null;
            MessageProducer producer = null;
            try {
                final String correlationId = UUID.randomUUID().toString();
                final Destination requestQueue =
                        destinationResolver.resolveDestinationName( session, queue+".request", false );
                final Destination replyQueue =
                        destinationResolver.resolveDestinationName( session, queue+".response", false );
                // Create the consumer first!
                consumer = session.createConsumer( replyQueue, "JMSCorrelationID = '" + correlationId + "'" );
                final TextMessage textMessage = session.createTextMessage( msg );
                textMessage.setJMSCorrelationID( correlationId );
                textMessage.setJMSReplyTo( replyQueue );
                // Send the request second!
                producer = session.createProducer( requestQueue );
                producer.send( requestQueue, textMessage );
                // Block on receiving the response with a timeout
                return consumer.receive( TIMEOUT );
            }
            finally {
                // Don't forget to close your resources
                JmsUtils.closeMessageConsumer( consumer );
                JmsUtils.closeMessageProducer( producer );
            }
        }
    }

    private final JmsTemplate jmsTemplate;

    @Autowired
    public Requestor( final JmsTemplate jmsTemplate ) {
        this.jmsTemplate = jmsTemplate;
     }

    public String request( final String request, String queue ) {
        // Must pass true as the second param to start the connection
        return (String) jmsTemplate.execute( new ProducerConsumer( msg, queue, jmsTemplate.getDestinationResolver() ), true );
    }
}

About Pooling

Once the request/response logic was correct it was time to load test. Almost instantly, memory usage exploded and the garbage collector started thrashing. Inspecting ActiveMQ with the Web Console showed that MessageConsumers were hanging around even though they were being explicitly closed using Spring’s own JmsUtils. Turns out, the CachingConnectionFactory‘s JavaDoc held the key to what was going on: “Note also that MessageConsumers obtained from a cached Session won’t get closed until the Session will eventually be removed from the pool.” However, if the MessageConsumers could be reused this wouldn’t be an issue. Unfortunately, CachingConnectionFactory caches MessageConsumers based on a hash key which contains the selector among other values. Obviously each request/response call, with its necessarily unique selector, was creating a new consumer that could never be reused. Luckily ActiveMQ provides a PooledConnectionFactory which does not cache MessageConsumers and switching to it fixed the problem instantly. However, this means that each request/response requires a new MessageConsumer to be created. This is adds overhead but its the price that must be payed to do synchronous request/response.

Efficient Lightweight JMS with Spring and ActiveMQ

Spring

© Darwin Bell

Asynchronicity, its the number one design principal for highly scalable systems, and for Java that means JMS, which in turn means ActiveMQ. But how do I use JMS efficiently? One can quickly  become overwhelmed with talk of containers, frameworks, and a plethora of options, most of which are outdated. So lets pick it apart.

Frameworks

The ActiveMQ documentation makes mention of two frameworks; Camel and Spring. The decision here comes down to simplicity vs functionality. Camel supports an immense amount of Enterprise Integration Patterns that can greatly simplify integrating a variety of services and orchestrating complicated message flows between components. Its certainly a best of breed if your system requires such functionality. However, if you are looking for simplicity and support for the basic best practices then Spring has the upper hand. For me, simplicity wins out any day of the week.

JCA (Use It Or Loose It)

Reading through ActiveMQ’s spring support one is instantly introduced to the idea of a JCA container and ActiveMQ’s various proxies and adaptors for working inside of one. However, this is all a red herring. JCA is part of the EJB specification and as with most of the EJB specification, Spring doesn’t support it. Then there is a mention of Jencks, a “lightweight JCA container for Spring”, which was spun off of ActiveMQ’s JCA container. At first this seems like the ideal solution, but let me stop you there.  Jencks was last updated on January 3rd 2007. At that time ActiveMQ was at version 4.1.x and Spring was at version 2.0.x and things have come a long way, a very long way. Even trying to get Jencks from the maven repository fails due to dependencies on ActiveMQ 4.1.x jars that no longer exist. The simple fact is there are better and simpler ways to ensure resource caching.

Sending Messages

The core of Spring’s message sending architecture is the JmsTemplate. In typical Spring template fashion, the JmsTemplate abstracts away all the cruft of opening and closing sessions and producers so all the application developer needs to worry about is the actual business logic. However, ActiveMQ is quick to point out the JmsTemplate gotchas, mostly that JmsTemplate is designed to open and close the session and producer on each call. To prevent this from absolutely destroying the messaging performance the documentation recommends using ActiveMQ’s PooledConnectionFactory which caches the sessions and message producers. However this too is outdated. Starting with version 2.5.3, Spring started shipping its own CachingConnectionFactory which I believe to be the preferred caching method. (UPDATE: In my more recent post, I talk about when you might want to use PooledConnectionFactory.) However, there is one catch to point out. By default the CachingConnectionFactory only caches one session which the javadoc claims to be sufficient for low concurrency situations. By contrast, the PooledConnectionFactory defaults to 500. As with most settings of this type, some amount of experimentation is probably in order. I’ve started with 100 which seems like a good compromise.

Receiving Messages

As you may have noticed, the JmsTemplate gotchas strongly discourages using the recieve() call on the JmsTemplate, again, since there is no pooling of sessions and consumers.  Moreover, all calls on the JmsTemplate are synchronous which means the calling thread will block until the method returns. This is fine when using JmsTemplate to send messages since the method returns almost instantly. However, when using the recieve() call, the thread will block until a message is received, which has a huge impact on performance. Unfortunately, neither the JmsTemplate gotchas nor the spring support documentation mentions the simple Spring solution for these problems. In fact they both recommend using Jencks, which we already debunked. The actual solution, using the DefaultMessageListenerContainer, is buried in the how do I use JMS efficiently documentation. The DefaultMessageListenerContainer allows for the asynchronous receipt of messages as well as caching sessions and message consumers. Even more interesting, the DefaultMessageListenerContainer can dynamically grow and shrink the number of listeners based on message volume. In short, this is why we can completely ignore JCA.

Putting It All Together

Spring Context XML

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:amq="http://activemq.apache.org/schema/core"
xmlns:jms="http://www.springframework.org/schema/jms"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
http://activemq.apache.org/schema/core http://activemq.apache.org/schema/core/activemq-core-5.2.0.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
http://www.springframework.org/schema/jms http://www.springframework.org/schema/jms/spring-jms-2.5.xsd">

<!-- enables annotation based configuration -->
<context:annotation-config />
<!-- scans for annotated classes in the com.company package -->
<context:component-scan base-package="com.company"/>
<!-- allows for ${} replacement in the spring xml configuration from the system.properties file on the classpath -->
<context:property-placeholder location="classpath:system.properties"/>
<!-- creates an activemq connection factory using the amq namespace -->
<amq:connectionFactory id="amqConnectionFactory" brokerURL="${jms.url}" userName="${jms.username}" password="${jms.password}" />
<!-- CachingConnectionFactory Definition, sessionCacheSize property is the number of sessions to cache -->
<bean id="connectionFactory" class="org.springframework.jms.connection.CachingConnectionFactory">
    <constructor-arg ref="amqConnectionFactory" />
    <property name="exceptionListener" ref="jmsExceptionListener" />
    <property name="sessionCacheSize" value="100" />
</bean>
<!-- JmsTemplate Definition -->
<bean id="jmsTemplate" class="org.springframework.jms.core.JmsTemplate">
   <constructor-arg ref="connectionFactory"/>
</bean>
<!-- listener container definition using the jms namespace, concurrency is the max number of concurrent listeners that can be started -->
<jms:listener-container concurrency="10" >
    <jms:listener id="QueueListener" destination="Queue.Name" ref="queueListener" />
</jms:listener-container>
</beans>

There are two things to notice here. First, I’ve added the amq and jms namespaces to the opening beans tag. Second, I’m using the Spring 2.5 annotation based configuration. By using the annotation based configuration I can simply add @Component annotation to my Java classes instead of having to specify them in the spring context xml explicitly. Additionally, I can add @Autowired on my constructors to have objects such as JmsTemplate automatically wired into my objects.

QueueSender

package com.company;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.stereotype.Component;

@Component
public class QueueSender
{
    private final JmsTemplate jmsTemplate;

    @Autowired
    public QueueSender( final JmsTemplate jmsTemplate )
    {
        this.jmsTemplate = jmsTemplate;
    }

    public void send( final String message )
    {
        jmsTemplate.convertAndSend( "Queue.Name", message );
    }
}

Queue Listener


package com.company;

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

import org.springframework.stereotype.Component;

@Component
public class QueueListener implements MessageListener
{
    public void onMessage( final Message message )
    {
        if ( message instanceof TextMessage )
        {
            final TextMessage textMessage = (TextMessage) message;
            try
            {
                System.out.println( textMessage.getText() );
            }
            catch (final JMSException e)
            {
                e.printStackTrace();
            }
        }
    }
}

JmsExceptionListener

package com.company;

import javax.jms.ExceptionListener;
import javax.jms.JMSException;

import org.springframework.stereotype.Component;

@Component
public class JmsExceptionListener implements ExceptionListener
{
    public void onException( final JMSException e )
    {
        e.printStackTrace();
    }
}

Update

I have finally updated the wiki at activemq.apache.org. The following pages now recommend using MessageListenerContainers and JmsTemplate with a Pooling ConnectionFactory instead of JCA and Jencks.

Top 5 Static Analysis Plugins for Eclipse

Eclipse Icon

Static Analysis

How is it that static analysis is still a best kept secret while so much lips service is paid to code reviews?  We have long since understood that boring repetitive jobs should be left for the machines, but when it comes to code inspection there seems to be a mental block.  Humans hold the advantage when reviewing architecture, use of design patterns and code organization, but a machine will find when a known null value is dereferenced 100% of the time and that can hardly be said for even the OCD coders I know.  Therefore, I present the top five static analysis tools for Eclipse.  While you might not have the delight of working on a project running continuous integration with a full unit test suite, code coverage, and static analysis, you can at least run your own private versions and rest comfortably knowing you’ve done your best.

Code Coverage

I assume at the very least that you’ve written some unit tests in Junit or maybe TestNG, the next step is to see if you are actually testing enough of your code.  Code coverage has saved me a number of times when I’ve been a little lax in my TDD discipline.  Having written some code, followed by some tests, its very easy to forget that one code branch and fail to exercise it; EclEmma to the rescue.  EclEmma installs a Coverage mode right next to the run and debug modes that are standard to Eclipse.  Simply run your unit test with the Coverage mode and bang, your code shows up bright green, yellow or red, letting you quickly know where you forgot a test or two.

Bytecode Analysis

The next level of analysis comes from the well respected FindBugs tool.  After installing the plugin, FindBugs inspects the actual .class files of your compiled code for well know bugs and other suspicious behavior.  I’m constantly surprised at the lack of false positives and overall quality of the bugs it finds, from comparing strings with ==, to failing to close resources, and more, its spot on.

Code Complexity Analysis

Next in my list of favorites is complexity analysis.  These metrics, from simple lines of code in a method to the lesser know but more powerful Cyclomatic Complexity and Efferent Coupling, are the closest thing to a “stinky code” test that software engineering has come up with.  If you strive for loose coupling and DRY code, then EclipseMetrics is right up your alley.  Once it is installed you will see warnings where your methods are overly long or complex, or your classes are poorly organized.  Let the OCD run wild.

Dependancy Analysis

Dependancies are one of those things that can go unnoticed until they bite you in the rear.  Once circular dependancies creep in it can become incredibly hard to break apart and modularize your code.  This might not seem important in a small to medium sized project but once a project scales up, the need to break it apart becomes critical and keeping your ducks in a line from the get go makes life that much easier.  For this task we employ JDepend4Eclipse.

Source Code Analysis

Finally we come to PMD. This is the twin brother of FindBugs.  Where FindBugs checks the .class file, PMD checks the .java file.  There is a lot of overlap here but you can find some interesting things like unused code that javac might have optimized away in the .class file.

Monitoring ActiveMQ Using JMX Over SSH

Frustration If you want to know true confusion and frustration I suggest you attempt to monitor ActiveMQ using JMX over SSH port forwarding.

For those who use ActiveMQ, the JMX monitoring is some pretty impressive stuff.  You can dive into connections, queues, topics, subscriptions, etc and get stats about the current state of the system.  However, this nirvana of information is practically unreachable when AMQ is in a data center.  For the first six months or so working on AMQ I was able to get around this using the web console which gives some level of detail on topics and queues but once we moved into trying to debug bigger and bigger problems the console wasn’t cutting it any longer.

The Investigation Begins

As with anything AMQ, I started looking into the JMX documentaion.  Step one was to enable JMX in the activemq.xml file.

<broker useJmx="true" brokerName="BROKER1">

Fire it up, forward port 1099 (the default JMX port), give it a try, and, no go.  Come to find out, JMX uses two ports, one which is defaults to 1099 and one which is negotiated at runtime.  No worries, there is an “Advanced JMX Configuration” seciton in the documentation which lead me to this xml configuration.

<managementContext>
    <managementContext connectorPort="2011" jmxDomainName="test.domain"/>
</managementContext>

Awesome, I can change the default 1099 port but nothing else.  The AMQ documentation was failing me, as usual.  Google, here I come.  After some searching around I was able to dig up AMQ-892.  (Power user tip, search AMQ’s Jira site once the documentation has inevitably failed.)  It turns out you can configure the RMI server port and have been able to do so since version 4.1.

<managementContext connectorPort="11099" rmiServerPort="11119" jmxDomainName="org.apache.activemq"/>

I particularly liked the “This necessitates a documentation update in the wiki.” comment from 2007.  Way to jump on that.  Confident that this is bound to work I forward port 11099 and 11109, give it a whirl, and, failure.  It was at this point that I started to despair.

Deeper Down The Rabbit Hole

Quickly running out of options I decided to try running the same configuration on a dev box in the office, and of course I could connect to it flawlessly.  To add insult to injury I tried another dev box and this one failed.  The only difference I could easily see was one was running a VM image and the other was native but this didn’t point at any easily understandable target.  I was done.  I conceded defeat.  I would soldier on without JMX.

Meanwhile, my co-worker had been pressuring me to start packet sniffing with wireshark.  I had resisted, mostly because I dislike getting that low level, but also because RMI is a binary protocol and I didn’t think I could learn much.  However, I was bloody and beaten at this point and gave in. I downloaded the package, fired it up and started sniffing away.  Sure enough the binary was a bunch of gibberish, but tucked away in there was one string, the hostname of the amq server I was trying to connect to.  What the hell is this?!?  That question is left to the reader but at least the issue was clear, the hostname was inside the data center and not something I could connect to from my machine.  It also jived with the test on the local dev boxes.  The box with the VM image had an internal hostname that was not in the office dns, where as the native machine had an addressable hostname.

I attacked the problem with renewed vigor.  I found a JConsole FAQ on the AMQ website that I had previously dismissed for its cryptic message.  However, this time it made sense.  I quickly added

-Djava.rmi.server.hostname=127.0.0.1

to the Java Service Wrapper configuration for AMQ so communication would be forced back through the SSH tunnel, rebooted AMQ, and, wait for it, VICTORY.

The Morale of the Story

So the short answer is add the broker configuration, the management context (the second on up there), and the rmi hostname configuration, port forward 11099 and 11109, and fire up jconsole

jconsole service:jmx:rmi://127.0.0.1:11119/jndi/rmi://127.0.0.1:11099/jmxrmi

As with most of these posts, so easy once you know, but so hard to get there.

Follow

Get every new post delivered to your Inbox.