Mike Sullivan

01/18/2012

NTLMv2 authentication from Java: A developer's odyssey

Mike Sullivan // in Technology

A project in the works here at Vodori involves making a set of SOAP web service calls to an external system. This is generally a pretty routine exercise: set up the correct client code from the WSDL, integrate some kind of delivery mechanism, and then make the calls. At least, it's routine when it isn't hitting one roadblock after another. 

Getting started: Axis to Axis2 to Spring-WS

For our purposes, we started out using Apache Axis since it was a pre-existing piece of our platform. We used the existing WSDL2JAVA process to generate the client and connectivity code. 

However, when we tried to actually connect to the system, we received a 401 response—our credentials were denied for failing an NTLM authentication. Digging around, I determined that the most likely culprit was a version mismatch between the NTLM support in Axis and the latest NTLMv2 support by the web service server. 

We tried switching to the newer Axis2 project, which boasted better NTLM support. Unfortunately, a change in the WSDL2JAVA process consolidated all of the code into a massive 11 MB, 226k-line Java source file. Including that file in our project ground IntelliJ to a halt, and it started throwing OutOfMemoryErrors whenever it tried to compile. 

At this point, we switched to JAXB2's xjc jar to recreate the client files and Spring's Spring-WS package to handle the transport. This provided a simpler code setup and much greater visibility into how the calls were being made. Since we're living in 2012 and not 2006, we probably should have started here in the first place. 
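
To give a sense of how little code the Spring-WS side needs, here is a minimal sketch; the generated package name and endpoint URI are placeholders, not the real project's values.

import org.springframework.oxm.jaxb.Jaxb2Marshaller;
import org.springframework.ws.client.core.WebServiceTemplate;

public class ExampleSoapClient {

    public Object call(Object request) throws Exception {
        // Marshaller bound to the xjc-generated classes (package name is hypothetical)
        Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
        marshaller.setContextPath("com.example.generated");
        marshaller.afterPropertiesSet(); // normally the Spring container does this

        // The template handles the SOAP envelope and transport details
        WebServiceTemplate template = new WebServiceTemplate(marshaller);
        template.setDefaultUri("https://example.com/soap/endpoint"); // placeholder endpoint
        return template.marshalSendAndReceive(request);
    }
}

In practice these are wired up as Spring beans rather than instantiated by hand, but the shape of the call is the same.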

Programmer vs. programming, stage 1: The die is cast.


More problems

With the simpler setup in place, we were able to isolate the source of our troubles: Apache's HTTPClient. This library is pretty rock-solid and has been around for a while, and everyone uses it. It is the default (i.e., only) option for Axis, Axis2, and Spring-WS's latest release. 

One drawback: the library doesn't support the latest authentication schemes, and Apache has since replaced it with the HTTPComponents project and its own HTTPClient class. A major side effect of this change was a wholesale break from the old 3.X HTTPClient codebase and package structure, precluding its use as a drop-in replacement. 

Of course, we aren't the only people encountering such snags. The ubiquity of these tools within the Java and Spring ecosystems has generated plenty of discussion and advice to draw upon. Looking at the Spring-WS site and its JIRA, I came across issue SWS-563, which is a request for exactly what I needed.  

The bad news is, it won't see the light of day until 2.1 (or 2.1M1 for now), and Spring hasn't yet published that version's release calendar. The good news is, attached to SWS-563 are the three files the Spring team updated to solve this issue. Downloading them and dropping them into my local codebase overrode the originals on the classpath, letting me (for now) use the most recent HTTPClient code. 
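
For anyone trying the same trick, the wiring ends up looking roughly like this. I'm using the class name that ships with Spring-WS 2.1 and assuming the patched sender exposes the same credentials property as the old CommonsHttpMessageSender did; the credential values are obviously placeholders.

<bean id="messageSender" class="org.springframework.ws.transport.http.HttpComponentsMessageSender">
  <property name="credentials">
    <!-- user, password, workstation, domain: placeholders -->
    <bean class="org.apache.http.auth.NTCredentials">
      <constructor-arg index="0" value="svc_user"/>
      <constructor-arg index="1" value="secret"/>
      <constructor-arg index="2" value="WORKSTATION"/>
      <constructor-arg index="3" value="DOMAIN"/>
    </bean>
  </property>
</bean>

That bean then gets plugged into the WebServiceTemplate's messageSender property.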

Believing I had solved the issue, I ran my unit tests once more to validate the connection. 

They failed.

Programmer vs. programming, stage 2: This time, it's personal.


Third… fourth… eighth time's the charm

Being stubborn—and a geek—I decided to take a deeper look at the authentication messages. I logged out the actual NTLM handshake messages, ran them through a Base64Decoder, and analyzed the structure. 
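
The inspection itself is nothing fancy: take the Base64 token that rides after "NTLM " in the Authorization/WWW-Authenticate headers, decode it, and look at the NTLMSSP signature and message type. A rough sketch of that step (the class is made up; any Base64 decoder will do):

import javax.xml.bind.DatatypeConverter;

public class NtlmMessageDump {

    // token is the Base64 blob following "NTLM " in the auth header
    public static void dump(String token) {
        byte[] raw = DatatypeConverter.parseBase64Binary(token);
        // Bytes 0-6 spell out the literal "NTLMSSP" signature (byte 7 is a null terminator)
        String signature = new String(raw, 0, 7);
        // The little-endian int at offset 8 is the message type: 1, 2 or 3
        int messageType = raw[8] & 0xFF;
        System.out.println("signature=" + signature
                + ", type=" + messageType + ", length=" + raw.length + " bytes");
    }
}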

I found that part of the NTLMv2 handshake, our Type-3 message, was failing due to incorrect formatting. I chalked this up to the HTTPClient still not working correctly, so I went back to the Apache site and noticed this message in their NTLM guide: 

"HttpClient as of version 4.1 supports NTLMv1 and NTLMv2 authentication protocols out of the box using a custom authentication engine. However, there are still known compatibility issues with newer Microsoft products as the default NTLM engine implementation is still relatively new."

Luckily, they include some sample code that uses the Samba team's JCIFS implementation as their NTLM engine. Pulling down that code, creating the relevant classes, and wiring it in was easier than I expected. With all of these components in place—JAXB2 2.X generated files, Spring-WS transport code, the Spring 2.1M1 files, HTTPComponents, and JCIFS—the web services connected successfully.
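
For reference, the engine from the guide boils down to delegating the Type-1 and Type-3 message generation to JCIFS. The sketch below follows the published sample but simplifies the negotiation flags to JCIFS's defaults, so treat it as a starting point rather than the exact code we shipped.

import jcifs.ntlmssp.Type1Message;
import jcifs.ntlmssp.Type2Message;
import jcifs.ntlmssp.Type3Message;
import jcifs.util.Base64;
import org.apache.http.impl.auth.NTLMEngine;
import org.apache.http.impl.auth.NTLMEngineException;

public class JcifsNtlmEngine implements NTLMEngine {

    public String generateType1Msg(String domain, String workstation) throws NTLMEngineException {
        // Type-1: announce ourselves, using JCIFS's default negotiation flags
        Type1Message type1 = new Type1Message(Type1Message.getDefaultFlags(), domain, workstation);
        return Base64.encode(type1.toByteArray());
    }

    public String generateType3Msg(String username, String password, String domain,
            String workstation, String challenge) throws NTLMEngineException {
        try {
            // Type-2: the server's challenge, arriving Base64-encoded
            Type2Message type2 = new Type2Message(Base64.decode(challenge));
            // Type-3: the response JCIFS computes from the challenge and credentials
            Type3Message type3 = new Type3Message(type2, password, domain, username, workstation, 0);
            return Base64.encode(type3.toByteArray());
        } catch (java.io.IOException e) {
            throw new NTLMEngineException("Invalid Type-2 (challenge) message", e);
        }
    }
}

The engine is then registered on the HttpClient through an AuthSchemeFactory that returns a new NTLMScheme wrapping it for the "ntlm" scheme, exactly as the guide shows.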

 


Mike Sullivan

02/04/2011

Setting up OpenJPA 2 with Spring, Junit, Maven 3 and Tomcat

Mike Sullivan // in Technology

Recently, my team and I started a new project, and in an effort to stay on top of things, we decided to upgrade the underlying persistence layer from OpenJPA 1.2 to OpenJPA 2.0.1. While I was familiar with the JPA specification and had used other ORM libraries in the past, OpenJPA was new to me, so we decided to leverage the existing configurations and extend them first before we attempted a wholesale upgrade. That way we could inspect our existing setup, test some of the limits of OpenJPA 1, and hopefully make the transition to OpenJPA 2 a little smoother.

While setting up the application with an in-memory HSQL database for our testing, the JPA code was simply not cooperating. First we loaded up a test case and saw the SQL CREATE TABLE and ALTER TABLE statements; then we got the following error:

org.apache.openjpa.persistence.ArgumentException: Table "MY_SCHEMA.MY_TABLE" given for "com.package.classes.Class" does not exist.

It took me the better part of two days to realize OpenJPA's support for schemas in generated tables is somewhere between awful and non-existent. Then I got hung up on runtime class enhancement: unenhanced classes are functional but much slower than their enhanced counterparts. Adding a javaagent to the JVM enabled dynamic enhancement, and it worked.

Now on to the upgrade

Once we swapped out the underlying libraries, the only things breaking were smaller issues such as annotation changes, nothing major. After some re-working of the META-INF/persistence.xml file and the Spring declarations for the EntityManagerFactory and related classes, things looked okay. And once I updated the javaagent path from

-javaagent:/<PATH>/openjpa-1.2.2.jar

to

-javaagent:/<PATH>/openjpa-2.0.1.jar

our test cases ran perfectly inside of IntelliJ.
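
For reference, the re-worked META-INF/persistence.xml ends up looking roughly like this; the unit name is made up, and the entity list and properties from our real file are omitted.

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="2.0">
  <persistence-unit name="myPersistenceUnit" transaction-type="RESOURCE_LOCAL">
    <!-- OpenJPA's JPA provider -->
    <provider>org.apache.openjpa.persistence.PersistenceProviderImpl</provider>
    <!-- entity classes and openjpa.* properties omitted -->
  </persistence-unit>
</persistence>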

Since I don't get paid to just build running test cases, I had to get this up and running in Tomcat. 

Enhancing OpenJPA 2 Entities in Tomcat

This proved to be a lot more difficult. In order to get the javaagent working, I added the openjpa-2.0.1.jar to the /lib directory of the Tomcat installation and updated our catalina.bat file (yes, this was on Windows) to include the line:

set JAVA_OPTS=%JAVA_OPTS% -javaagent:"%CATALINA_HOME%\lib\openjpa-2.0.1.jar"

This enables runtime enhancement for the OpenJPA 2 entities in our persistence.xml file. On startup I got a host of NoClassDefFoundErrors, and with the help of Maven, IntelliJ and Google I realized I needed to add the following jars to my Tomcat /lib folder:

commons-lang-2.4.jar
geronimo-jpa_2.0_spec-1.1.jar
geronimo-jta_1.1_spec-1.1.1.jar
serp-1.13.1.jar
log4j-1.2.14.jar
commons-collections-3.2.jar

And update the catalina.bat file again with:

set CLASSPATH=%CLASSPATH%;%CATALINA_HOME%\lib\commons-lang-2.4.jar
set CLASSPATH=%CLASSPATH%;%CATALINA_HOME%\lib\geronimo-jpa_2.0_spec-1.1.jar
set CLASSPATH=%CLASSPATH%;%CATALINA_HOME%\lib\geronimo-jta_1.1_spec-1.1.1.jar
set CLASSPATH=%CLASSPATH%;%CATALINA_HOME%\lib\serp-1.13.1.jar
set CLASSPATH=%CLASSPATH%;%CATALINA_HOME%\lib\log4j-1.2.14.jar
set CLASSPATH=%CLASSPATH%;%CATALINA_HOME%\lib\commons-collections-3.2.jar

The app started up; however, the JPA entities weren't being enhanced at load time. OpenJPA essentially doesn't support unenhanced entities, so any attempt to access them failed.

Enabling Load Time Weaving

After a lot of digging without results, I decided to track down an error I had seen at startup that hadn't prevented OpenJPA 1 from working:

Caused by: java.lang.IllegalStateException: Cannot apply class transformer without LoadTimeWeaver specified

I initially dismissed this error since it is caught and ignored by OpenJPA and doesn't prevent my test cases from running properly. Unfortunately, OpenJPA 2 requires load-time weaving to be enabled for it to work in an application server. I still haven't found where this is documented outside of Spring's ORM documentation, but using that as a guide I went into my Tomcat /conf/context.xml and added:

<Loader loaderClass="org.springframework.instrument.classloading.tomcat.TomcatInstrumentableClassLoader"/>

and moved the spring-instrument-tomcat-3.0.5.RELEASE.jar file into my Tomcat /lib directory. This step adds a second ClassLoader to your Tomcat installation, since the default Tomcat ClassLoader doesn't support the load-time class transformation that Spring and OpenJPA require. 
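
On the Spring side, the ORM documentation has you hand the EntityManagerFactory a LoadTimeWeaver that knows how to talk to that instrumentable ClassLoader. A sketch, with the unit name made up to match the persistence.xml above:

<bean id="entityManagerFactory"
      class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
  <property name="persistenceUnitName" value="myPersistenceUnit"/>
  <!-- ReflectiveLoadTimeWeaver picks up the addTransformer method exposed by
       TomcatInstrumentableClassLoader; <context:load-time-weaver/> is the shorthand form -->
  <property name="loadTimeWeaver">
    <bean class="org.springframework.instrument.classloading.ReflectiveLoadTimeWeaver"/>
  </property>
</bean>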

With that, it all seemed to work... until we started our automated build process using Maven 3 and JUnit 4.7 test cases. 

Running OpenJPA 2 JUnit Tests in Maven 3

Tests were failing with the runtime enhancement errors we saw above. I edited the pom.xml file and updated the maven-surefire-plugin configuration to include:

<argLine>-javaagent:"${user.home}/.m2/repository/org/apache/openjpa/openjpa/2.0.1/openjpa-2.0.1.jar"</argLine>
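
For context, that argLine sits inside the surefire plugin declaration; a trimmed sketch, with the plugin version left to whatever the build already uses:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-javaagent:"${user.home}/.m2/repository/org/apache/openjpa/openjpa/2.0.1/openjpa-2.0.1.jar"</argLine>
  </configuration>
</plugin>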

I re-ran our build and it failed again, this time with the LoadTimeWeaver issue. The solution is to use the Spring agent jar as a second javaagent. Thankfully, you can chain as many javaagents as you want on the command line, so I updated the argLine tag to:

<argLine>-javaagent:"${user.home}/.m2/repository/org/springframework/spring-agent/2.5.6/spring-agent-2.5.6.jar" -javaagent:"${user.home}/.m2/repository/org/apache/openjpa/openjpa/2.0.1/openjpa-2.0.1.jar"</argLine>

and it ran fine locally. 

Moving it to the Build Server

When I finally committed all of these changes, our build server kicked off a build… and it failed. Since spring-agent-2.5.6.jar and openjpa-2.0.1.jar aren't used in the application itself, they weren't declared in the pom.xml file and consequently didn't exist on the build server. To fix this, I added:

<dependency>
  <groupId>org.apache.openjpa</groupId>
  <artifactId>openjpa</artifactId>
  <version>2.0.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-agent</artifactId>
  <version>2.5.6</version>
  <scope>test</scope>
</dependency>

to my pom.xml and everything worked perfectly. Now all I have to do is actually write the application that uses it.

 


Mike Sullivan

07/30/2010

The Value of Debugging Part 2: What They Don't Teach You in School

Mike Sullivan // in Technology

In my first post on this subject, I introduced the concept of separating debugging and development as related but different skills. I will further explore this distinction and hopefully highlight the value of debugging as a tool in your technical arsenal.

Debugging has always been one of my stronger skills as a developer, so naturally I think it is pretty important. I view it as an extension of tinkering - taking things apart to see how they work, modifying them and putting them back together. Most people enjoy the creative process. Taking something from your head and transforming it into something real is amazing, but I relish the chance to take something apart and improve it. If it was my creation to begin with, I also get to see how I initially thought about the problem and how my knowledge or approach has changed over time.

Debugging makes the difference

When I was in school (both as a student and as a teaching assistant), nearly all of the focus of the software development instruction was on languages, algorithms, efficiency and organization. All of those things are vital, but one of the largest variances I saw among the best performing students was their ability to quickly and consistently debug problems. When implementing a recursive search routine or Dijkstra's algorithm for graph traversals, the objects and code may be simple, but the resulting behavior is complex and likely to result in confusing errors.

Most students would go from algorithm to code to execution in about the same time, but as soon as something broke, the differences became apparent. Naturally good debuggers would quickly isolate the issue, correct it and start the loop all over again - while other students struggled to step through the code in their head or on the computer. This process is sometimes referred to as the OODA loop for observe, orient, decide and act.

With the bulk of the instruction on analyzing problems, distilling solutions and implementing those solutions as efficiently as possible, there is little room left for the nuances of debugging. If we were all perfect, those lessons would be all we needed. But in reality, we all introduce faults into our code - misunderstanding the problem, logic errors, typos, incorrect variable references, race conditions and a number of other things. These lead to incorrect and possibly erratic behavior. Learning how to isolate and recreate the error, to intelligently crack open the black box, is an equally important skill. 

Be a better developer: learn a multitude of debugging techniques

Because it is so often overlooked pedagogically, differences in debugging ability among new developers are largely a matter of talent or personal diligence. Going from your mostly-homegrown academic work to an established, and potentially old, suite of existing code is really going to test your ability to poke around the edges of a problem and make strategic forays into the depths of the system. Sometimes you won't be able to fire up GDB or step through your code in debug mode. Other times, you won't be able to insert print statements or modify the logging. There will be times when you won't be able to do either. The better your arsenal of tools, techniques, intuitions and experiences, the more effective a debugger and, ultimately, developer you'll be. 

 


Mike Sullivan

07/20/2010

The Value of Debugging Part 1: A Puzzle in a Black Box

Mike Sullivan // in Technology

A few years back, an application my company supported was suffering some serious operational issues. While this was lighting up our customer service lines, I was scheduled to meet my girlfriend for dinner. Not one to keep a lady waiting, I called her to discuss my plans for the evening. In describing the situation keeping me at the office when everyone else had left, I needed something to explain why I didn't know when I would be free.

Then it came to me, partly inspired by Churchill's "It is a riddle, wrapped in a mystery, inside an enigma." I was dealing with a puzzle wrapped in a black box. Not the airplane-crash-surviving black box, an ominous, bug-hiding, potentially relationship-straining black box.

The bug

After a few days of uptime, errors showed up - to all appearances randomly. The same user, performing the same action, on the same data, just a few seconds apart, would fail or succeed without any seeming difference. The pattern of successes and failures seemed to rule out any single point of commonality. Nearly every page served by the application was vulnerable, but none were consistent. 

Worst possible scenario

These 'random' issues are the worst possible scenario - repeatable things can be tracked, fixed and tested with a high degree of confidence. Without reliable processes to introduce the behavior, you can never really be sure you've found the root cause of the problem, never mind fixed it.

So I'm holding this black box, and clients and my girlfriend are asking how long it will take me to complete the puzzle inside. I can't see inside it - I don't have any details on what awaits me. It could be one of those 5-piece puzzles you give infants. It could be a 500-piece skyline of Paris. It could be a 5000-piece surrealist painting without a picture on the box. And that was my dilemma: debugging a complex system behaving in ways I didn't understand and trying to estimate how long it would take to fix.

It separates good developers from great developers

That experience is hardly unique. Every day, developers, engineers and consultants are brought in to solve difficult problems in myriad environments with differing levels of control, access and information. And it is these situations where an often overlooked skill, debugging, separates a good developer from a great one. Most professional developers can solve the puzzles once they know what is broken, how it is failing, and what the intended behavior is. Gauging effort and implementing a solution may be time-consuming, but it is ultimately finite. Getting to the puzzle is the hard (and to me, fun) part of this whole process.

Back to my dinner-delaying dilemma: I did the only thing I could - buy time, cross my fingers and get to work. Thankfully, after an hour or so of stepping through logs from both our production and testing environments, and multiple iterations of write/run/test on my local machines, I managed to construct a scenario that would consistently cause the application to behave erratically. Getting it into that state was the key to unlocking the black box, however complicated my multi-user, multi-step scenario was. Once done, I quickly surveyed the underlying system and scoped out the issue. 

Armed with that knowledge, I picked up the phone, changed my reservations and confirmed the new time with my girlfriend. It was still a few hours away, but that was the point; it wasn't an indeterminate amount of time away. It wasn't forever away. It was three hours. I arrived at dinner on time, exhausted but accomplished.

 


Mike Sullivan

05/07/2010

Object-Relational Mapping in Enterprise Applications

Mike Sullivan // in Technology

A core piece of every modern application is the manipulation of data; this commonly means interacting with a relational database using SQL. Until about a decade ago, these interactions were performed either through Enterprise JavaBeans (EJBs) or on an ad hoc basis, using explicit SQL statements in code or in an external text file. Due to the infrastructure requirements needed to host an EJB application, and the complications that came with using them, many solutions avoided EJBs.

Without a framework to replace EJBs, the direct SQL approach was used. This approach required a very tight coupling between the database and the application code. Some applications found ways to limit these interactions to as few places as possible, but even then the relationship was inherently fragile and prone to breaking whenever either the database or the application's needs changed.

In the late '90s and early 2000s, the idea of using a common approach to handle the inner workings of these connection points became popular. A set of open source tools known as object-relational mapping (ORM) frameworks was developed and gained traction with developers, most notably TOPLink and Hibernate. These allowed users to configure their domain-object-to-database relationships in one place and remove all of their database code without the overhead of EJBs. The ability to use plain old Java objects (POJOs) to serve as both your application model and your database model was a major breakthrough.

By enabling a simple and unified place to configure the ways your application interacts with the database, these tools have improved the pace of software development and resulted in more reliable products. For basic CRUD (create, read, update, and delete) operations these mappings allow you to develop software without ever having to spend time on routine database operations or connection management.
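
To make that concrete, here is a bare-bones, hypothetical mapping using JPA annotations for the TABLE_A that appears in the example below; saving and loading it goes through the ORM with no hand-written SQL or connection handling.

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "TABLE_A")
public class ObjectA {

    @Id
    private Long id;      // maps to TABLE_A.ID

    private String name;  // maps to TABLE_A.NAME by default naming

    // getters and setters omitted
}

// Basic CRUD through a JPA EntityManager:
//   entityManager.persist(objectA);                     // INSERT
//   ObjectA a = entityManager.find(ObjectA.class, 1L);  // SELECT by primary key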

Unfortunately, with those improvements come drawbacks. With all of the mappings embedded in either external configuration files or (as of Java 5) annotations, it can be harder to discern and modify the underlying relationships. In simplifying the common operations, the less common operations have become more difficult. This is a good tradeoff, and in most cases you never run into the downsides. 

However, when you do run into them, they can be extremely difficult to diagnose or work around. One great example of those tradeoffs is querying summary data from multiple tables. To illustrate this, assume you have three tables defined like this:

TABLE_A[
 ID: NUMBER,
 NAME: STRING
]

TABLE_B[
 ID: NUMBER,
 A_ID: NUMBER,
 VALUE: NUMBER
]

TABLE_C[
 ID: NUMBER,
 B_ID: NUMBER,
 VALUE: NUMBER
]

Here, TABLE_A has a one-to-many relationship to TABLE_B on column A_ID, and TABLE_B has a one-to-many relationship to TABLE_C on column B_ID. If you wanted to list, for each row of TABLE_A, the name, the number of TABLE_B rows and the sum of their values, and the number of TABLE_C rows and the sum of their values, here's what the SQL could look like:

SELECT
 TABLE_A.NAME,
 COUNT(TABLE_B.ID) as B_ROWS, SUM(TABLE_B.VALUE) as B_VALUE,
 COUNT(TABLE_C.ID) as C_ROWS, SUM(TABLE_C.VALUE) as C_VALUE
FROM TABLE_A, TABLE_B, TABLE_C
WHERE
 TABLE_A.ID = TABLE_B.A_ID AND
 TABLE_B.ID = TABLE_C.B_ID
GROUP BY TABLE_A.NAME

The query would be run and the result would then be processed directly into a result set or custom made object to hold the data.

However, using Hibernate, you would need to set up the relationships between OBJECT_A, OBJECT_B and OBJECT_C and their respective tables, and between each other. For our exercise we will leave the object field names and table column names the same. Using Hibernate's CRUD features, we would have to load all of the data into memory and process it iteratively, which would be inefficient in both memory and time. You can bypass those features by using HQL, Hibernate's query language modeled after SQL, and querying the database directly. The HQL would look like this:

SELECT
 OBJECT_A.NAME,
 COUNT(OBJECT_B.ID), SUM(OBJECT_B.VALUE),
 COUNT(OBJECT_C.ID), SUM(OBJECT_C.VALUE)
FROM OBJECT_A, OBJECT_B, OBJECT_C
GROUP BY OBJECT_A.NAME

This would then be loaded into a set of maps with the column names serving as the keys. This results in a larger, more complicated data structure and a potentially much larger data footprint, though smaller than the CRUD alternative. In addition, the HQL is slightly simpler, but by disaggregating the relationships of the objects from the queries, it makes it more likely that the N-dimensional relationships are misunderstood, misused or modified incorrectly. 

In addition to still needing to write database queries, basic loading operations can present problems with these structures. Depending on how the data relationships are set up, loading or accessing data from these structures can be difficult or impossible to do without extensive debugging and customization.

The simplest way to set up the relationships would be to use lazy loading to populate the set of OBJECT_B's only when needed, and the same for the OBJECT_C's. However, if the OBJECT_B's are accessed outside of the database transaction (such as in the view layer of an application), this will quickly result in lazy loading exceptions. The first response to this is to force the loading of the OBJECT_B's through a non-lazy mapping. This fixes the immediate problem, but at the expense of potentially increasing the number of database calls and definitely increasing the size and processing time of each load.

After loading a collection of OBJECT_B's every time you want to access an OBJECT_A, you will still be left with lazy loading issues when you try to access the OBJECT_C's stored inside the OBJECT_B's. Using the same logic as before, we can make that relationship non-lazy as well. This further complicates the loading process, so that even a simple load of one instance of OBJECT_A can result in dozens, hundreds or thousands of other rows being queried, loaded and processed.
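
To make the lazy-versus-eager choice concrete, here is roughly what it looks like with JPA annotations; the class and field names are made up to match the OBJECT_A/OBJECT_B example.

import java.util.Set;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
public class ObjectA {

    @Id
    private Long id;

    // LAZY defers the query for the OBJECT_B rows until the collection is touched,
    // which throws if that happens outside the transaction; EAGER avoids the
    // exception, but every load of an ObjectA now drags the whole collection with it.
    @OneToMany(mappedBy = "objectA", fetch = FetchType.LAZY)
    private Set<ObjectB> objectBs;
}

@Entity
class ObjectB {

    @Id
    private Long id;

    @ManyToOne
    private ObjectA objectA;
}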

These tradeoffs between lazy and eager loading, between ease of retrieval and efficiency, are the complexities you inherit with any framework. The more complex your data structures, the more complex these tradeoffs become, and the more complex your configurations become as well.  Ultimately, these tradeoffs are common when using a third-party component instead of writing your own solution. The value you receive from them is dependent on the specifics of your situation, though the use of ORM tools is widely accepted.

 
