JNBridgePro and Java 8

Tuesday, Mar. 25th 2014

Java 8 has a couple of new features (particularly, static and default methods in interfaces) that create problems for our current JNBridgePro 7.1. We will be coming out shortly with a new version that supports Java 8 along with previous versions of Java, but if you’re currently using JNBridgePro 7.1 and are having problems with Java 8, contact us and we’ll send you a patch.
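
For readers who haven't yet looked at Java 8, here is a minimal sketch of the two interface features in question (Greeter is a made-up interface, purely for illustration):

public interface Greeter {
    // A default method: an instance method with a body, new in Java 8
    default String greet(String name) {
        return "Hello, " + name;
    }

    // A static method in an interface, also new in Java 8
    static Greeter standard() {
        return new Greeter() {};
    }
}

Interfaces like this one are what create problems for JNBridgePro 7.1, since before Java 8 an interface could not contain method bodies.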

Posted by Wayne | in Java, JNBridgePro | No Comments »

Decoding encodings: XML, JMS and the BizTalk Adapter

Monday, Mar. 10th 2014

How does one view an XML document? Notepad, or the XML editor in Visual Studio? XML is text, after all, so theoretically any text editor will do the job. Right?

It’s not really WYSIWYG

Simply put, XML is a binary format. Consider this simple XML document composed in the Visual Studio XML editor:

<?xml version="1.0" encoding="utf-8"?>
<SimpleTest xmlns="http://www.jnbridge.com/simple/test/schema">
  <degreeSymbol>°</degreeSymbol>
</SimpleTest>

If this file is saved to disk, the encoding is truly UTF-8, including the UTF-8 Byte Order Mark. However, if I compose this same document in a generic text editor and save it to disk, the encoding will most likely be Windows 1252, not UTF-8. Let's say I highlight and copy the XML from the Visual Studio XML editor to the clipboard. When I paste the XML into another editor and save it to disk, the resulting encoding will not be UTF-8; the pasted text is saved as Windows 1252. My point is that the very first line of the document, <?xml version="1.0" encoding="utf-8"?>, doesn't necessarily indicate that what you see is what you get.

For the most part, the UTF-8 and Windows 1252 encodings are identical as long as all the characters are single byte. However, the degree symbol, °, is different. In Windows 1252 (and many other encodings), the degree symbol is the single byte 0xB0. In UTF-8, the degree symbol is the multi-byte sequence 0xC2 0xB0. If the document is composed without regard to the underlying encoding, it's very easy to end up with a UTF-8 document containing an illegal character. The problem is that just viewing the document in a text editor isn't going to tell you that. To be absolutely sure, you need a good binary editor that can convert between encodings according to the specifications behind them.
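
A quick way to convince yourself of the difference is to encode the character under both charsets. This is our own minimal Java sketch, not code from the scenario below:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DegreeBytes {
    public static void main(String[] args) {
        String degree = "\u00B0"; // the degree symbol
        // UTF-8 encodes it as two bytes: prints "C2 B0"
        for (byte b : degree.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
        // Windows 1252 encodes it as one byte: prints "B0"
        for (byte b : degree.getBytes(Charset.forName("windows-1252"))) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}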

When encoding meets code

Recently, a customer using the JNBridge JMS Adapter for BizTalk Server ran into some unexpected behavior. The JMS BizTalk adapter was replacing the File Adapter as the customer moved to a messaging solution centered on JMS. Occasionally, a UTF-8 XML document would contain an illegal character. When the File Adapter was used, the XML pipeline would throw an exception during disassembly when the illegal UTF-8 character was encountered. The failed message was routed to a directory where the illegal characters were presumably fixed and the message resubmitted. When the customer moved to the JMS adapter, there were no failed messages, even for documents containing an illegal character, in this case the one-byte degree symbol. The final XML document, wherever it was routed to, now contained ‘ï¿½’ instead: the byte 0xB0 had been replaced by three bytes, 0xEF, 0xBF and 0xBD. The customer, understandably, was confused.

The problem got its start when the message was published to the JMS queue. At some point, Java code similar to the following (Java 7) executed:

byte[] rawBytes = Files.readAllBytes(Paths.get(somePath));
String jmsMessageBody = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(rawBytes)).toString();

A UTF-8 XML file is read as raw bytes, then explicitly decoded from UTF-8 into a java.lang.String. The string, jmsMessageBody, is used to create a JMS Text Message that will be published to a queue. Though not entirely obvious, the above code has just performed a conversion: a Java string uses UTF-16 encoding. During the conversion, any illegal UTF-8 sequences, like the lone byte 0xB0, are converted to the UTF-16 replacement character, 0xFFFD. This mechanism is part of the Unicode specification.

When the JMS BizTalk adapter receives the JMS Text Message, it must convert the contained text to the expected encoding, UTF-8, before submitting the message to the BizTalk DB. As per the Unicode specification, the UTF-16 replacement character, 0xFFFD, is encoded in UTF-8 as the three bytes 0xEF, 0xBF and 0xBD. When the File Adapter was used, the pipeline threw an exception because the byte 0xB0 was illegal. When the JMS adapter is used, the UTF-8 replacement characters are perfectly legal, so the pipeline disassembles the document without complaint.
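
The whole round trip is easy to reproduce in a few lines. In this sketch (ours, not the customer's code), the single illegal byte 0xB0 is decoded as UTF-8 and then re-encoded:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    public static void main(String[] args) {
        // 0xB0 is the Windows 1252 degree sign, but an illegal UTF-8 sequence
        byte[] rawBytes = { (byte) 0xB0 };
        // Charset.decode() substitutes U+FFFD for malformed input by default
        String s = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(rawBytes)).toString();
        System.out.printf("%04X%n", (int) s.charAt(0)); // prints "FFFD"
        // Re-encoding the string as UTF-8 yields the three replacement bytes
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF); // prints "EF BF BD"
        }
    }
}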

Conclusion

The JMS specification says this about JMS Text Messages:

The inclusion of this message type is based on our presumption that String messages will be used extensively. One reason for this is that XML will likely become a popular mechanism for representing the content of JMS messages.

Of course, this was written in 1999, when XML was still used to mark up content rather than define it. Should one use Text Messages to carry XML documents as payloads? I believe the answer is ‘no’. If the customer had used JMS Byte Messages instead of Text Messages, the behavior after the switch to JMS would have been the same as with the File Adapter: illegal characters would have been caught by the pipeline and subsequently fixed with the process that was already in place. Instead, the document ends up with the garbage characters ‘ï¿½’ in place of the intended degree symbol.

Using Byte Messages for XML is also more efficient. Not only are there no unnecessary conversions from UTF-8 to UTF-16 and back again, there is no message bloat: converting UTF-8 to UTF-16 can double the size of the XML document. Why incur the cost to build, publish and deliver JMS messages that are twice the size of the original document?
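
Publishing code that follows this advice never decodes the file at all. Here is a minimal sketch (the Session and MessageProducer are assumed to have been set up elsewhere; the method name is ours):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.jms.BytesMessage;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Session;

public class XmlPublisher {
    static void publishXml(Session session, MessageProducer producer, String path)
            throws JMSException, IOException {
        byte[] rawBytes = Files.readAllBytes(Paths.get(path));
        // The payload travels byte-for-byte; no UTF-8 to UTF-16 round trip
        BytesMessage msg = session.createBytesMessage();
        msg.writeBytes(rawBytes);
        producer.send(msg);
    }
}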

Posted by William Heinzman | in Adapters, BizTalk Server, Java | No Comments »

Forcing Visual Studio 2013 support for the WCF JMS Adapter

Thursday, Dec. 5th 2013

The JNBridge JMS Adapter for .NET depends on the Microsoft Windows Communication Foundation Line-Of-Business Adapter SDK. The SDK is released with versions of BizTalk Server, not Visual Studio. This year, BizTalk 2013 was released in the spring, but Visual Studio 2013 was released in the fall, so the current version of the SDK doesn’t support VS 2013. While it would be easy to tell our customers interested in integrating JMS into .NET that they must use VS 2012 or 2010, that doesn’t help the customers who are using VS 2013.

For the new release of the JNBridge JMS Adapter for .NET, the solution was for us to explicitly install and configure the WCF LOB SDK for VS 2013 as part of the adapter installation. This consists of four assemblies and registry entries in the Visual Studio registry hive, HKLM\SOFTWARE\Microsoft\VisualStudio\12. Along with the work inherent in getting the SDK to support VS 2013, JNBridge must also explicitly support the SDK for our end users.

Couldn’t we just wait until the SDK was refreshed by Microsoft? We could, but the point is that by installing and supporting the SDK now, the customer, JNBridge and Microsoft all benefit.

Posted by William Heinzman | in Adapters, New releases, WCF JMS Adapter | Comments Off

Announcing JNBridgePro 7.1 and versions 3.1 of the JMS Adapters for .NET and BizTalk Server

Thursday, Dec. 5th 2013

JNBridgePro version 7.1 and versions 3.1 of the JMS Adapters for .NET and for BizTalk Server are released!

JNBridgePro now supports Visual Studio 2013, and completes the “any-CPU” feature by allowing separate 32-bit and 64-bit JVMs to be specified in a single shared-memory application.

The JMS Adapters for .NET and for BizTalk now support the JMS 2.0 specification and come with a new unified installer for both 32-bit and 64-bit systems.

Download a full-featured 30-day trial today from www.jnbridge.com/downloads.htm.

Posted by JNBridge | in Adapters, Announcements, JNBridgePro, New releases | Comments Off

Creating a .NET-based Visual Monitoring System for Hadoop

Monday, Oct. 21st 2013

Summary

Generic Hadoop doesn’t provide any out-of-the-box visual monitoring systems that report on the status of all the nodes in a Hadoop cluster. This JNBridge Lab demonstrates how to create a .NET-based monitoring application that utilizes an existing Microsoft Windows product to provide a snapshot of the entire Hadoop cluster in real time.

Download the source code for this lab here, or get a PDF of this lab here. Click the image below to see a GIF preview of the app in action.

[Image: visualizer]

Introduction

One of the ideals of distributed computing is to have a cluster of machines that is utterly self-monitoring, self-healing, and self-sustaining. If something goes wrong, the system reports it, attempts to repair the problem, and, if nothing works, reports the problem to the administrator — all without causing any tasks to fail. Distributed systems like Hadoop are so popular in part because they approach these ideals to an extent that few systems have before. Hadoop in particular is expandable, redundant (albeit not in a terribly fine-grained manner), easy to use, and reliable. That said, it isn’t perfect, and any Hadoop administrator knows the importance of additional monitoring to maintaining a reliable cluster.

Part of that monitoring comes from Hadoop’s built-in webservers. These have direct access to the internals of the cluster and can tell you what jobs are running, what files are in the distributed system, and various other bits of important information, albeit in a somewhat obtuse spreadsheet format. A number of Hadoop packages also come with generic distributed system monitoring apps such as Ganglia, but these aren’t integrated with Hadoop itself. Finally, there are products like Apache Ambari from Hortonworks that do a great deal of visual monitoring, but are tied to particular companies’ versions of Hadoop. In this lab we will look at the basics of producing a custom app that is integrated into the fabric of your own Hadoop cluster. In particular, being JNBridge, we are interested in building a .NET-based monitoring app that interfaces over a TCP connection with Java-based Hadoop using JNBridgePro. To expedite the process of creating a GUI for our monitoring app, we will use Microsoft Visio to easily create a visual model of the Hadoop cluster. This way we can create a rudimentary monitoring app that works as a cluster visualizer as well.

The app that we’re aiming to create for this lab is fairly simple. It will present a graph-like view of the logical topology of the cluster where each worker node displays its status (OK or not OK), the amount of local HDFS space used up, and the portion of Mappers and Reducers that are in use. We’re not looking for hard numbers here — that information is attainable through the webservers — rather, our goal is to create a schematic that can be used to quickly determine the status of various components of the cluster.

Before we begin, please bear in mind two things:

  1. We are not proposing our solution or our code as an actual option for monitoring your Hadoop cluster. We are simply proposing certain tools that can be used in the production of your own monitoring app.
  2. Our solution was produced for Hortonworks’ HDP 1.3 distribution of Hadoop 1.2.

Even in our limited testing we noticed a distinct lack of portability between different Hadoop distributions and versions — particularly where directory locations and shell-script specifications are concerned. Hopefully our explanations are clear enough that you can adjust to the needs of your own cluster, but that might not always be the case. We’re also going to assume a passing familiarity with Hadoop and Visio, since explaining either system and its internal logic in great detail would make this lab much longer than need be.

What You’ll Need

  1. Apache Hadoop (we used the Hortonworks distribution, though any will work with some effort)
  2. Visual Studio 2012
  3. Microsoft Visio 2013
  4. Visio 2013 SDK
  5. JNBridgePro 7.0

Digging into Hadoop

To begin, in order to get as complete information about the cluster as possible, we need to get hold of the NameNode and JobTracker objects — which manage the HDFS and MapReduce portions of Hadoop, respectively — that are currently running on the cluster. This exposes the rich APIs of both the JobTracker and the NameNode, as well as those of the individual nodes of the cluster. It’s these APIs that the JSP code uses to create the built-in webserver pages, and they provide more than enough information for our purposes.

However, accessing these objects directly is somewhat difficult. By and large, Hadoop is built so the end user can only interface with the cluster via particular sockets that only meter certain information about the cluster out and only allow certain information in. Thus getting direct access to and using the APIs of the running NameNode and JobTracker isn’t something that you’re supposed to be able to do. This is a sensible safety precaution, but it makes getting the kind of information required for a monitoring app somewhat complicated. Granted, there is the org.apache.hadoop.mapred.ClusterStatus class that passes status information over the network, but the information it provides isn’t enough to create a truly robust monitoring app. Our solution to this dilemma involves a lightweight hack of Hadoop itself. Don’t worry, you’re not going to need to recompile source code, but some knowledge of that source code and the shell scripts used to run it would be helpful.

Our goal is to wedge ourselves between the scripts that run Hadoop and the process of actually instantiating the NameNode and JobTracker. In so doing, we can write a program that breaks through the walled garden and allows us to serve up those objects to the .NET side directly. Technically, a similar process could be used to code a similar monitoring app in pure Java, but that’s not what we’re interested in here. If things still seem a little fuzzy, hopefully you’ll get a better idea of our solution as we explain it.

When the $HADOOP_INSTALL/hadoop/bin/hadoop script is called to start the NameNode and JobTracker, it simply runs NameNode.main() and JobTracker.main(). These main functions, in turn, call just a handful of lines of code to start the two master nodes. Note that this process is usually further obfuscated by a startup script such as start-all.sh or, in our case with Hortonworks, hadoop-daemon.sh, but they all ultimately call the same $HADOOP_INSTALL/hadoop/bin/hadoop script. In our solution, instead of having the script call NameNode.main() and JobTracker.main(), we instead call the main functions of our own wrapper classes that contain the code from the original main functions in addition to setting up the remote Java-side servers of JNBridgePro. These wrapper classes are then able to serve up the JobTracker and NameNode instances to our Windows-based monitoring app.

The JobTracker wrapper class looks like this:

import java.io.IOException;
import java.util.Properties;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobTracker;
import com.jnbridge.jnbcore.server.ServerException;

public class JnbpJobTrackerWrapper {

	private static JobTracker theJobTracker = null;

	public static void main(String[] args) {

		Properties props = new Properties();
		props.put("javaSide.serverType", "tcp");
		props.put("javaSide.port", "8085");
		try {
			com.jnbridge.jnbcore.JNBMain.start(props);
		} catch (ServerException e) {

			e.printStackTrace();
		}

		try {
			theJobTracker = JobTracker.startTracker(new JobConf());
			theJobTracker.offerService();
		} catch (Throwable e) {
			// Report any failure to start the JobTracker
			e.printStackTrace();
		}
	}

	public static JobTracker getJobTracker()
	{
		return theJobTracker;
	}
}

And the NameNode wrapper class looks like this:

import java.util.Properties;
import org.apache.hadoop.hdfs.server.namenode.NameNode;
import com.jnbridge.jnbcore.server.ServerException;

public class JnbpNameNodeWrapper {

	private static NameNode theNameNode = null;

	public static void main(String[] args) {

		Properties props = new Properties();
		props.put("javaSide.serverType", "tcp");
		props.put("javaSide.port", "8087");
		try {
			com.jnbridge.jnbcore.JNBMain.start(props);
		} catch (ServerException e) {

			e.printStackTrace();
		}

		try {
			theNameNode = NameNode.createNameNode(args, null);
			if (theNameNode != null)
			{
				theNameNode.join();
			}
		} catch (Throwable e) {
			// Report any failure to start the NameNode
			e.printStackTrace();
		}
	}

	public static NameNode getNameNode()
	{
		return theNameNode;
	}
}

To have the $HADOOP_INSTALL/hadoop/bin/hadoop script call our classes instead, we alter the following lines of code:

elif [ "$COMMAND" = "jobtracker" ] ; then
  #CLASS=org.apache.hadoop.mapred.JobTracker
  CLASS=com.jnbridge.labs.visio.JnbpJobTrackerWrapper
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOBTRACKER_OPTS"

and

elif [ "$COMMAND" = "namenode" ] ; then
  #CLASS=org.apache.hadoop.hdfs.server.namenode.NameNode
  CLASS=com.jnbridge.labs.visio.JnbpNameNodeWrapper
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"

The replacement lines are right below the commented-out original lines.

In order to finish up the Java side of this solution, we need to add our wrapper classes as well as the JNBridge .jars to the Hadoop classpath. In our case that meant simply adding the wrapper class .jars along with jnbcore.jar and bcel-5.1-jnbridge.jar to the $HADOOP_INSTALL/hadoop/lib directory. Since the Hadoop startup scripts automatically include that directory as part of the Java classpath, we don’t need to do anything else. The startup scripts that came with the Hortonworks distribution work exactly as they did before, and the cluster starts up without a hitch. Only now, our master nodes are listening for method calls from the .NET side of our monitoring app.

Monitoring on the Windows Side

To begin building the Java proxies, we need to add the .jars that contain the appropriate classes to the JNBridge classpath. Below is a screenshot from the JNBridge Visual Studio plugin of the .jars we added to the classpath used to create the proxies. Note that this includes not just the requisite JNBridge .jars and the Hadoop .jars (scraped from the machines in our cluster), but also our wrapper class .jars as well (here called proto1.jar).

[Screenshot: classpath]

From here we need to pick the classes that should be proxied so that they can be called natively from our C# code. The easiest way to do this is to select the two wrapper classes (JnbpJobTrackerWrapper and JnbpNameNodeWrapper) in the left window of the JNBridge Visual Studio proxy tool and click the Add+ button. JNBridge will take care of the rest automatically.

[Screenshot: classes]

Now we can build the monitoring app itself. When beginning your new project, make sure to add the correct references. You need to reference the correct JNBShare.dll for your version of .NET, the .dll created by the JNBridge proxy process you performed earlier, and the Microsoft.Office.Interop.Visio.dll for your version of Microsoft Visio. We used Visio 2013 for this project along with its SDK (which is a newer version than what came with Visual Studio 2012). Also, be sure to add the JNBridge license .dll to your classpath. Here’s what our references looked like (note that HadoopVis is the name we gave to our Java proxy .dll):

[Screenshot: references]

The overall flow of our code is fairly simple: get the JobTracker and NameNode objects, use them to build an internal representation of the cluster, draw that cluster in Visio, and update that drawing with the latest information about the cluster.

In order to get the JobTracker and NameNode objects, we need to connect to the wrapper objects running on our cluster. We do this as follows:

// Connect to the two Java sides
// The jobTracker side is automatically named "default"
JNBRemotingConfiguration.specifyRemotingConfiguration(JavaScheme.binary,
        "obiwan.local", 8085);
// We name the nameNode side "NameNode"
JavaSides.addJavaServer("NameNode", JavaScheme.binary, "obiwan.local", 8087);

Note that we are connecting to two Java sides here even though they are both located on the same physical machine (obiwan.local on our cluster). In JNBridgePro, the system needs to know which Java side to communicate with in order to function properly. If you have an object that exists on one remote JVM and you try to call one of its methods while your code is pointing at a different JVM, your program will crash. To manage this, use JavaSides.setJavaServer() to point to the correct JVM. You’ll see this sprinkled throughout our code as we switch between pointing to the JobTracker and the NameNode JVMs.

Once we’ve connected to the Java side, we just need to get the objects and build our internal representation of the cluster. The overall program flow looks like this:

JobTracker jobtracker = JnbpJobTrackerWrapper.getJobTracker();
java.util.Collection temp = jobtracker.activeTaskTrackers();
java.lang.Object[] tts = temp.toArray();
JavaSides.setJavaServer("NameNode");
NameNode namenode = JnbpNameNodeWrapper.getNameNode();
HadoopTree.buildTree(jobtracker, "obiwan.local", "obiwan.local", tts,
        namenode.getDatanodeReport(FSConstants.DatanodeReportType.ALL));
// closedFlag is True if a user closes the Visio window in use.
while (!closedFlag)
{
    JavaSides.setJavaServer("NameNode");
    DatanodeInfo[] dnReport = namenode.getDatanodeReport(
            FSConstants.DatanodeReportType.ALL);
    JavaSides.setJavaServer("default");
    HadoopTree.updateTree(jobtracker.activeTaskTrackers().toArray(),
            jobtracker.blacklistedTaskTrackers().toArray(), dnReport);
    System.Threading.Thread.Sleep(3000);
}

The buildTree() and updateTree() methods build and update the internal representation of the cluster and hold it within the HadoopTree class. These methods also invoke the VisioDrawer class that, in turn, draws that internal representation to Visio. We’re not going to go into detail here about how the HadoopTree class builds our internal representation of the cluster. The simple tree-building algorithm we use isn’t terribly pertinent to our current discussion, but we encourage you to look at our code, especially if you’re curious about which methods we use to extract information from the JobTracker and NameNode objects (a few of those methods can be seen in the above code snippet). Keep in mind there are a number of ways to pull information about the cluster from these two objects, and we encourage you to explore the published APIs to figure out how to get the information you want for your app. On a side note, the API for the NameNode isn’t currently published as part of the official Hadoop API, so you’ll have to go back to the source code to figure out what methods to call. The API for the NameNode is considerably different from that for the JobTracker, too, so don’t expect similar functionality between the two.

Drawing Everything in Visio

Once we have an internal representation of the cluster, we need to draw it in Visio to complete our rudimentary monitoring/visualization app. We begin by opening a new instance of Visio, creating a new Document, and adding a new Page:

// Start application and open new document
VisioApp = new Application();
VisioApp.BeforeDocumentClose += new EApplication_BeforeDocumentCloseEventHandler(quit);
ActiveWindow = VisioApp.ActiveWindow;
ActiveDoc = VisioApp.Documents.Add("");
ActivePage = ActiveDoc.Pages.Add();
ActivePage.AutoSize = true;

We then open our custom network template (which we’ve included with the code for your use) and pull out all the Masters we need to draw our diagram of the Hadoop cluster:

// Open visual templates
networkTemplate = VisioApp.Documents.OpenEx(@"$Template_Directory\NetTemplate.vstx",
        (short)VisOpenSaveArgs.visOpenHidden);
Document pcStencil = VisioApp.Documents["COMPME_M.VSSX"];
Document networkStencil = VisioApp.Documents["PERIME_M.VSSX"];
Shape PageInfo = ActivePage.PageSheet;
PageInfo.get_CellsSRC((short)VisSectionIndices.visSectionObject,
        (short)VisRowIndices.visRowPageLayout,
        (short)(VisCellIndices.visPLOPlaceStyle)).set_Result(VisUnitCodes.visPageUnits,
                (double)VisCellVals.visPLOPlaceTopToBottom);
PageInfo.get_CellsSRC((short)VisSectionIndices.visSectionObject,
        (short)VisRowIndices.visRowPageLayout,
        (short)(VisCellIndices.visPLORouteStyle)).set_Result(VisUnitCodes.visPageUnits,
                (double)VisCellVals.visLORouteFlowchartNS);
ActivePage.SetTheme("Whisp");

// Get all the master shapes
masterNode = pcStencil.Masters.get_ItemU("PC");
slaveNode = networkStencil.Masters.get_ItemU("Server");
rack = networkStencil.Masters.get_ItemU("Switch");
dynamicConnector = networkStencil.Masters.get_ItemU("Dynamic connector");

// Open data visualization template and shape
slaveBase = VisioApp.Documents.OpenEx(@"$Template_Directory\DataGraphicTemplate.vsdx",
        (short)Microsoft.Office.Interop.Visio.VisOpenSaveArgs.visOpenHidden);
slaveDataMaster = slaveBase.Pages[1].Shapes[1].DataGraphic;

There are two important things in this snippet that don’t crop up in a lot of examples using Visio. First are these two statements:

PageInfo.get_CellsSRC((short)VisSectionIndices.visSectionObject,
        (short)VisRowIndices.visRowPageLayout,
        (short)(VisCellIndices.visPLOPlaceStyle)).set_Result(VisUnitCodes.visPageUnits,
                (double)VisCellVals.visPLOPlaceTopToBottom);
PageInfo.get_CellsSRC((short)VisSectionIndices.visSectionObject,
        (short)VisRowIndices.visRowPageLayout,
        (short)(VisCellIndices.visPLORouteStyle)).set_Result(VisUnitCodes.visPageUnits,
                (double)VisCellVals.visLORouteFlowchartNS);

These statements tell Visio how to lay out the diagram as it’s being drawn. The first statement tells Visio to create the drawing from top-to-bottom (with the master nodes on top and the slave nodes on the bottom) while the second tells Visio to arrange everything in a flowchart-style pattern (we found this to be the most logical view of the cluster). Logically what we’re doing is editing two values in the current Page’s Shapesheet that Visio refers to when making layout decisions for that Page.

The second thing we want to draw your attention to are these lines:

slaveBase = VisioApp.Documents.OpenEx(@"$Template_Directory\DataGraphicTemplate.vsdx",
        (short)Microsoft.Office.Interop.Visio.VisOpenSaveArgs.visOpenHidden);
slaveDataMaster = slaveBase.Pages[1].Shapes[1].DataGraphic;

This code opens a prior Visio project (that we’ve also included with our code) where we’ve simply tied a series of DataGraphics to a single Shape. These DataGraphics can then be scraped from the old project and tied to Shapes in our new project. Our prefabricated DataGraphics are used to display information about individual nodes in the cluster including HDFS space, Mappers/Reducers in use, and overall status of the TaskTracker and DataNode. We have to create these DataGraphics ahead of time since they can’t be created programmatically.

We can then draw the cluster on the Page that we’ve created. Again, we are going to skip over this portion of the process since it is largely standard Visio code. The cluster representation is drawn mostly using the Page.DropConnected() method, and since we’ve already told Visio how to format the drawing, we don’t need to mess with its layout too much. All we have to do is call Page.Layout() once all the Shapes have been drawn to make sure everything is aligned correctly.

The last interesting bit we want to touch on is updating the drawing with the most recent data from the cluster. First we need to get the latest data from the cluster and update our internal representation of the cluster:

public static void updateTree(Object[] taskNodeInfo, Object[] deadTaskNodes,
        DatanodeInfo[] dataNodeInfo)
{
    JavaSides.setJavaServer("NameNode");
    foreach (DatanodeInfo dn in dataNodeInfo)
    {
        HadoopNode curNode;
        leaves.TryGetValue(dn.getHostName(), out curNode);
        if (dn.isDecommissioned())
        {
            curNode.dataActive = false;
        }
        else
        {
            curNode.setHDSpace(dn.getRemainingPercent());
            curNode.dataActive = true;
        }
    }
    JavaSides.setJavaServer("default");
    foreach (TaskTrackerStatus tt in taskNodeInfo)
    {
        HadoopNode curNode;
        leaves.TryGetValue(tt.getHost(), out curNode);
        curNode.setMapUse(tt.getMaxMapSlots(), tt.countOccupiedMapSlots());
        curNode.setReduceUse(tt.getMaxReduceSlots(), tt.countOccupiedReduceSlots());
        curNode.taskActive = true;
    }
    foreach (TaskTrackerStatus tt in deadTaskNodes)
    {
        HadoopNode curNode;
        leaves.TryGetValue(tt.getHost(), out curNode);
        curNode.taskActive = false;
    }
    VisioDrawer.updateData(leaves);
}

Once the data has been gathered, the Visio drawing is updated:

public static void updateData(Dictionary<string, HadoopNode> leaves)
{
    foreach (KeyValuePair<string, HadoopNode> l in leaves)
    {
        HadoopNode leaf = l.Value;
        Shape leafShape = leaf.getShape();
        // Update HDFS information
        if (leaf.dataActive)
        {
            leafShape.get_CellsSRC(243, 20, 0).set_Result(0, leaf.getHDSpace());
            leafShape.get_CellsSRC(243, 0, 0).set_Result(0, 1);
        }
        // If the DataNode has failed, turn the bottom checkmark to a red X
        else
        {
            leafShape.get_CellsSRC(243, 0, 0).set_Result(0, 0);
        }
        // Update mapred information
        if (leaf.taskActive)
        {
            leafShape.get_CellsSRC(243, 17, 0).set_Result(0, leaf.getMapUse());
            leafShape.get_CellsSRC(243, 18, 0).set_Result(0, leaf.getReduceUse());
            leafShape.get_CellsSRC(243, 12, 0).set_Result(0, 1);
        }
        // If the Tasktracker has failed, turn the bottom checkmark to a red X
        else
        {
            leafShape.get_CellsSRC(243, 12, 0).set_Result(0, 0);
        }
    }
}

Logically, we are just changing certain values in each Shape’s Shapesheet that are tied to the DataGraphics we added earlier. Which cells of the Shapesheet correspond to which DataGraphic had to be decided in advance, when we created the DataGraphics by hand. This way we can address those indices directly in our code.

This updating process (as you saw in an earlier code segment) is done in a simple while loop polling system that updates every three seconds. We used this method rather than a callback/event handling strategy largely for ease of implementation. The NameNode and JobTracker classes don’t implement a listener interface for notifying when values change. As a result, in order to add this functionality, we would have to do significantly more Hadoop hacking than we’ve already done. We could also implement an asynchronous update system in pure C# that would use events to notify the graphic to update, but that would still require polling the Java side for changes somewhere within our program flow. Such a system would lighten the load on Visio by decreasing the number of times we draw to the Page, but wouldn’t increase efficiency overall. While both ways of implementing callbacks are interesting exercises, they’re somewhat outside the scope of this lab.

The Result

For our small, four-virtual-machine cluster, this is the result (as you saw above):

[Image: visualizer]

Here a Map/Reduce job is running such that 100% of the Mappers are in use and none of the Reducers are being used yet. Also, notice that the middle worker node has used up almost all of its local HDFS space. That should probably be addressed.

For larger, enterprise-size clusters Visio will likely become an even less viable option for handling the visualization, but for our proof-of-concept purposes it works just fine. For larger clusters, building a visualizer with WPF would probably be the better answer for a .NET-based solution.

We hope this lab has been a springboard for your own ideas related to creating Hadoop monitoring/visualization applications.

~Ian Heinzman

Posted by ianheinzman | in Labs | 1 Comment »

Build 2013 Impressions

Wednesday, Jul. 24th 2013

I recently came back from Microsoft’s Build 2013 conference in San Francisco, where Microsoft’s latest technologies are introduced to developers. Much of the conference was devoted to technology related to Metro/Modern Windows/Windows Store apps, and also to Windows Phone, neither of which is relevant to JNBridge’s mission. However, there were a few things that caught my eye, and which are worth mentioning.

First, Windows 8.1 (formerly, Windows Blue) was unveiled.  It’s definitely an improvement for desktop users.  We currently run Windows 7 on our development machines and do our Windows 8 tests on virtual machines.  While nothing I saw at Build regarding Windows 8.1 will change that, it’s definitely getting closer to being a system that I would feel comfortable using every day.

Visual Studio 2013 was introduced during the Build keynote, and I was quite impressed with the new code navigation features, including the enhanced scrollbars and the look-ahead capability that allows developers looking at a method call to open a mini-window containing the text of the called method.  I can certainly see using these features in our future development.

It appears that the JNBridgePro Visual Studio plugin should work just fine with VS 2013, although we realize that this is a very early version of the software, and things could change.  We will certainly be tracking that.  Similarly, the new version of the .NET Framework that will be released with VS 2013 seems to run JNBridgePro just fine.

Finally, given our interest in interoperability issues relating to Hadoop, we were intrigued to see this talk on Azure HDInsight.  We’ve been thinking of new ways in which JNBridgePro can be integrated with Big Data applications, and the talk suggested some scenarios that we’ll be working on in the coming months.

Were you at Build, and, if so, did you see anything interesting related to interoperability?  Let us know.

Posted by Wayne | in Events, General | Comments Off

What does “Any CPU” really mean?

Thursday, Jun. 13th 2013

There’s a new “Prefer 32-bit” option in Visual Studio 2012 that tripped us up, and can trip you up too.

Running through some standard JNBridgePro test examples recently, we were surprised that the examples didn’t work: the embedded Java side failed. The examples were created using Visual Studio 2012 with the platform set to Any CPU, and, since we were running the tests on a 64-bit machine and using shared memory, we supplied a 64-bit JRE as part of the configuration.

After a bit of investigation, we discovered a new setting in the VS 2012 project that isn’t in previous versions of Visual Studio: in the project’s properties, under the Build tab, there is a new checkbox: “Prefer 32-bit.” The checkbox only seems to be enabled when Any CPU is selected, and it was automatically checked. And indeed the running process was 32-bit. What was happening? Didn’t “Any CPU” mean that the application would run as 64 bit on a 64-bit system, and 32 bit on a 32-bit system? And how does “Any CPU/Prefer 32-bit” differ from simple x86?

We did some research and discovered an explanation in a Microsoft blog post here. It turns out that the meaning of “Any CPU” has changed a bit. I won’t go into too many details, but would suggest that anyone doing .NET development read the blog post. In a nutshell, the new “Prefer 32-bit” is connected to Microsoft’s new support for ARM architectures as of .NET 4.5. It also seems to have something to do with Microsoft’s retreat from encouraging 64-bit and “Any CPU” development in deference to 32-bit development, as the complexity of supporting both 32-bit and 64-bit environments has become apparent. In making the changes, Microsoft has made some design decisions that we feel are to the detriment of users. (Also note that we still strongly believe in supporting “Any CPU” and x64 development. Read more about the bitness challenge in this blog post.)

In the first design decision we take issue with, Microsoft has decided to make “Any CPU/Prefer 32-bit” the default in Visual Studio 2012 when creating .NET 4.5 applications. This is unfortunate, because users assume “Any CPU” means the same thing as it previously did (the application will run as 64 bit on 64-bit systems), and because the new “Prefer 32-bit” setting is somewhat hidden and not immediately obvious. In our experience, most users set the Target Platform (Any CPU/x86/x64/etc.) in the configuration manager, where there’s no mention of the “Prefer 32-bit” setting – “Prefer 32-bit” is only visible (and settable) in the project properties, where most developers don’t have a reason to look, but where it’s already been set, without telling the developer.

Second, setting “Prefer 32-bit” as the default leads to inconsistencies between creating new projects and migrating existing ones. “Prefer 32-bit” is set in new “Any CPU” projects created in VS 2012, but it isn’t set when migrating an existing project from VS 2010. Props to Microsoft for not altering the behavior of existing projects when migrating from VS 2010 to 2012, but why not go a step further and make “Any CPU” behavior consistent by leaving the “Prefer 32-bit” setting turned off by default in new projects created with VS 2012?

The inconsistent behavior is really what makes this change so annoying. Microsoft could have offered the “Prefer 32-bit” capability without surprising unsuspecting users, simply by leaving the setting turned off by default. Users would happily create new applications without problems, and without suspecting that “Any CPU” could possibly mean anything different than what it meant before. Interestingly enough, users could still target ARM platforms if “Any CPU” running on ARM was guaranteed to run as 32 bit (until 64-bit ARM chips become generally available, at which time such applications would run as 64 bit). Microsoft could also allow developers to specifically target ARM in the same way that they can now specifically target x86, x64, and IA64. I can almost guarantee that this would be less confusing than the current use of “Any CPU” and “Prefer 32-bit,” particularly since very few .NET developers are going to be targeting ARM, at least not for a long time.

So what does this mean for users of JNBridgePro and the JNBridge adapters? First, as mentioned earlier, we still strongly believe in “Any CPU” and x64 development, and are working hard to create products that can be used in “Any CPU” applications. There are some subtleties and complications in making this transparent to the user, but we’ve done a lot of work on this, which you can see in JNBridgePro 7.0, and will soon see in new releases of the adapters. We’ve discussed the work that we’ve done here. The next version of JNBridgePro will have even more support for “Any CPU” applications; particularly, when using shared memory, you’ll be able to specify paths to both a 32-bit and a 64-bit jvm.dll, and the proper JRE will be loaded depending on whether it’s a 32-bit or a 64-bit process. This will make Any CPU applications using shared memory even easier to deploy on any system without changes.

Second, we want all our users to know that JNBridgePro will still work in all “Any CPU” applications using shared memory, even when “Prefer 32-bit” is turned on. When it is, simply use a 32-bit JRE, and when it isn’t turned on, use a 32-bit or 64-bit JRE as is appropriate to the platform. The V.next of JNBridgePro, with the aforementioned ability to specify both 32-bit and 64-bit JREs, will make this process even more transparent.

We’re always trying to stay ahead of Microsoft’s changes, whether they’re ill-advised or not. The change to the meaning of “Any CPU,” and the new default “Prefer 32-bit” setting are just one example of how we’ve stayed on top of .NET’s evolution, so that your applications using JNBridgePro and the adapters continue to work despite the changes to .NET, and will continue to work in the future.

Posted by Wayne | in JNBridgePro, Tips and examples | Comments Off

Software Development and the Bitness Challenge

Monday, Jun. 3rd 2013

Since 64-bit processors were introduced about ten years ago, they have become the default on both desktop and server machines. With a larger addressable process memory space, wider range of representable integers, and greater floating point precision, what’s not to like? For software developers that need to support both 32- and 64-bit machines, it’s a bit complicated.

Applications written in pure managed code (whether in .NET Intermediate Language or Java bytecodes)  will typically work just fine regardless of whether they’re running as 32-bit or 64-bit processes. However many applications also contain non-managed or native code that is targeted to a specific “bitness” (where “bitness” describes the distinction between 32-bit and 64-bit qualities of a platform, process, or system). In these cases, the developer needs to be careful that the right version of the non-managed code executes when the application is run, depending on the bitness of the running process. For software vendors, like JNBridge, which produce both self-contained applications and components that are integrated inside other users’ applications, the situation becomes even more tricky – the right version of the components needs to be included in the application, and it’s a common and easy mistake for users to include the components with the wrong bitness.  The user sees an error, and we get a support call. Better to reduce the chances of an error, and everybody will be happy.

After facing these issues for a number of years, we now have a workable solution, and our new JNBridgePro 7.0 release reflects this experience. We’d like to share what we’ve learned, so that other developers can successfully address the bitness challenge.

Create a single unified installer for both 32-bit and 64-bit scenarios. Developers’ first instinct is to create separate 32-bit and 64-bit builds and distribute separate installers for each. While this may seem logical, it leads to a lot of user confusion. In our experience, users faced with the decision as to whether to download the installer for the 32-bit or 64-bit version frequently download the wrong one. When they discover this error, they have to go back and download the other version, often after engaging support. Combining the 32- and 64-bit versions into a single installer is a win for the user and a win for us.

The challenge in building a single installer is that most installer generator packages either don’t allow 32-bit and 64-bit components to be combined in a single installer that would run on either system, or if they do support it, the resulting installation is too complicated. In particular, the usual technique of combining 32-bit and 64-bit installers, and a bootstrapper, into a single installer won’t work in our case, since we want 64-bit components to be installed on 32-bit machines, too. We resolved this by creating a single 32-bit installer (that is, one that would run on both 32-bit and 64-bit machines), and fooling the installer generator into thinking that the 64-bit components were simply “content,” so that the installer generator wouldn’t reject them. The “content” is all the 64-bit components packed into a single zip file. When the installer is run, a custom action extracts the 64-bit components then removes the zip file. While there are subtleties in getting this right (Google ‘advertised shortcut’ for pointers to issues that can arise), when done correctly this approach works perfectly and yields a single installer that runs on both 32-bit and 64-bit machines, and installs both 32-bit and 64-bit components on both machines. (Why do we need to install 64-bit components on 32-bit machines? Because, while the build machine may be 32-bit, the developer may be generating 64-bit applications.)

Ensure that users can use your components to create “Any CPU” applications. “Any CPU” applications are designed to automatically run as 32-bit processes on 32-bit machines and as 64-bit processes on 64-bit machines. This is very attractive, as it allows the application to take advantage of whatever platform it runs on, and suggests that the code is portable. “Any CPU” implies that the code is completely managed, but there’s nothing to prevent unmanaged code, including third-party components, from being included. We’ve encountered scenarios where a developer used JNBridgePro to create an Any CPU application on their 64-bit development machine, successfully tested it there, and then ran into problems when deploying the application to a 32-bit machine. Situations like this suggest that developers should develop both 32-bit and 64-bit versions of their components or applications, but this just kicks the developer’s problem down the road, forcing the user to deal with using the right version. This should be as unacceptable to the software developer as it is to the user.

In order to allow users to create and run “Any CPU” applications, we changed JNBridgePro so that the appropriate 32-bit or 64-bit components would be loaded at run time depending on the bitness of the running process. However, platform vendors don’t make this easy. We discovered that the simplest way to test bitness at run time in .NET is to check the value returned by IntPtr.Size: 4 means you’re running as a 32-bit process; 8 means it’s a 64-bit process. (Note that the .NET alternative of calling Environment.Is64BitProcess won’t work in our case, since that API is only supported in .NET Framework 4.0 and later, and our components also need to be able to run in .NET Framework 2.0 through 3.5.) In Java, simply check the value of the system property os.arch. “x86” means that it’s a 32-bit process; “x86_64” means it’s a 64-bit process. These tests allow us to load the appropriate components at run-time, thereby supporting users who want to create applications that run anywhere, even though some of our components contain 32-bit or 64-bit targeted code.
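
For example, the Java-side check can be wrapped up in a one-line helper like this sketch (as noted, the exact os.arch value can vary between JVM vendors, so the string comparison is deliberately loose):

public class Bitness {
    public static boolean is64BitJvm() {
        // Typically "x86" on 32-bit JVMs and "x86_64" (or "amd64",
        // depending on the vendor) on 64-bit JVMs
        String arch = System.getProperty("os.arch");
        return arch != null && arch.contains("64");
    }
}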

Be careful using the Windows registry. When Microsoft created 64-bit Windows, they made the fateful decision to provide separate 32-bit and 64-bit registries, accessible only to 32-bit and 64-bit processes respectively. The problem arises when the registry is used to share information between applications: information deposited by a 32-bit application is inaccessible to a 64-bit application, and vice versa. The answer is to use the registry only for information used by a single application, and only when the application’s bitness on that machine will always be the same. Save information shared between applications in files; do not use the registry. JNBridge used to store information in the registry, but this became a constant source of support headaches once 64-bit platforms were introduced. Now that we store information in files, not the registry, these bitness-related headaches have gone away.
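
A plain properties file is one simple file-based alternative. This is a generic Java sketch, not JNBridgePro’s actual mechanism; the file name and key are made up:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Properties;

public class SharedSettings {
    public static void main(String[] args) throws Exception {
        // Write a setting that any process, 32-bit or 64-bit, can read
        Properties props = new Properties();
        props.setProperty("install.dir", "C:\\Program Files\\MyProduct");
        try (FileOutputStream out = new FileOutputStream("settings.properties")) {
            props.store(out, "shared settings");
        }
        // Read it back, from a process of either bitness
        Properties loaded = new Properties();
        try (FileInputStream in = new FileInputStream("settings.properties")) {
            loaded.load(in);
        }
        System.out.println(loaded.getProperty("install.dir"));
    }
}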

In conclusion: If all the processes in the world were 64-bit processes, the problems I’ve mentioned above wouldn’t exist.  However, as long as 32-bit processes and platforms still exist and must be supported, software developers must be careful and watch out for pitfalls that can trip up users. When developers adhere to the guidelines discussed above, users’ bitness problems should disappear, to the great benefit of both users and software support teams.

Posted by Wayne | in Commentary, General, Tips and examples | Comments Off

Building an Excel add-in for HBase MapReduce

Monday, May. 6th 2013

Summary

This latest project from JNBridge Labs investigates building an Excel add-in for Hadoop HBase. As a Java framework, HBase applications must use Java APIs, resulting in single-platform solutions. A cross-platform HBase integrated solution, particularly one that provides business intelligence on the desktop, like Microsoft Excel, is unable to leverage the HBase remote client API. This means using a lower level interoperability mechanism, like implementing a .NET Thrift client. The current project uses JNBridgePro for .NET-to-Java interoperability. It also leverages concepts and code from the previous lab, Building a LINQ provider for HBase MapReduce, which investigated a LINQ extension for HBase.  

Introduction

Hadoop allows businesses to quickly analyze very large data sets. Hadoop can reduce ludicrous amounts of data to a meaningful answer in a short amount of time; however, without understanding the shape of your data, you run the risk of garbage in, garbage out. Analysis itself is an iterative process relying on investigation. Tools that aid data investigation provide a means to quickly view, sort, filter/reduce and represent data, making it possible to quickly find and understand patterns, trends and relationships.

Microsoft Excel has always been the ubiquitous off-the-shelf tool for data analysis, and it makes a ready-to-go front end for Hadoop. Excel can be extended using add-ins developed in Visual Studio using VSTO, Visual Studio Tools for Office. This lab will explore a simple Excel front end to HBase MapReduce. The front end will allow a user to view HBase tables and execute MapReduce jobs. The goal is to make the add-in generic with respect to the column definitions and data in an HBase table.

Getting Started

The components required for this lab are identical to those required in the previous lab, Building a LINQ provider for HBase MapReduce. Here’s a quick list of the components.

  1. Apache Hadoop Stack (see the previous lab’s Getting Started section for more information)
  2. Visual Studio 2012
  3. Eclipse
  4. JNBridgePro 7.0
  5. Office Developer Tools for Visual Studio 2012 (this includes VSTO).
  6. Microsoft Office 2010

Calling Java from .NET: Creating proxies using JNBridgePro

Since the Excel add-in is written in C#/.NET and needs to call several Java class APIs, the first step is to use the JNBridgePro plug-in for Visual Studio to create an assembly of proxies that represent the Java API. When a proxy of a Java class is instantiated in .NET, the real Java object is instantiated in the Java Virtual Machine. The JNBridgePro run-time manages communications, i.e. invoking methods, and syncing garbage collection between the .NET CLR and the JVM.

For this development step, as well as during run-time, a bunch of Hadoop, HBase and ZooKeeper JAR files must be available on the Windows machine. These can be scraped from a machine running the Hadoop stack (look in /usr/lib/hadoop/lib, /usr/lib/hbase/lib, etc.)

This is a screen shot of the Edit Class Path dialog for the JNBridgePro Visual Studio plug-in.

These are the JAR files required to create the .NET proxies. During run-time, three additional JAR files must be included in the JVM’s class path when initiating the bridge between the JVM and the CLR: avro-1.5.4.jar, commons-httpclient-3.1.jar and slf4j-nop-1.6.1.jar (the last JAR file inhibits logging by Hadoop and HBase).

Below is a screen shot of the JNBridgePro proxy tool in Visual Studio. The left-hand pane shows all the namespaces found in the JAR files shown in the above dialog. The required namespaces are org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.filter. In addition, individual classes like org.apache.hadoop.hbase.HBaseConfiguration are required (see the link at the end of this blog to download the source).


Clicking the Add+ button finds the chosen classes, as well as every dependent class, and displays them in the center pane. The right-hand pane displays the public members and methods of the Java HTable class. The last step is to build the proxy assembly, DotNetToJavaProxies.dll.

Creating and populating an HBase Table

It would be nice to have an HBase table loaded with data, providing an opportunity to test calling various HBase Java APIs from .NET. The simple data will consist of an IP address, like "88.240.129.183", and the requested web page, for example "/zebra.html". This lab will use the same table, access_logs, created for the previous lab, Building a LINQ provider for HBase MapReduce. Please see the previous lab’s section, Creating and populating an HBase Table, for the code used to build this table.
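
For reference, inserting rows like these uses only a few calls from the HBase client API of that era. This is a hedged sketch with made-up row and column names; the previous lab contains the actual table-building code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LoadAccessLogs {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        HTable table = new HTable(config, "access_logs");
        // Row key, column family and cell names here are hypothetical
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("details"), Bytes.toBytes("ip"),
                Bytes.toBytes("88.240.129.183"));
        put.add(Bytes.toBytes("details"), Bytes.toBytes("page"),
                Bytes.toBytes("/zebra.html"));
        table.put(put);
        table.close();
    }
}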

Building an Excel add-in

The Excel add-in will consist of a single control pane. As the user interacts with the pane, underlying code accesses the Excel data model consisting of workbooks, worksheets and charts. Here’s what the completed add-in looks like.

The class HBasePane is a .NET User Control. It consists of two groups, View Table and Map Reduce. The above screen shot shows the user controls labeled Zookeeper Host, Table Name and Number of Records, which all have user entered values. By clicking on the button, View Records, the user has loaded in 20 rows from the HBase table, access_logs.

Here’s the handler code for the button click event.

        private void viewTableButtonClick(object sender, EventArgs e)
        {
            Excel.Worksheet activeWorksheet 
                 = ((Excel.Worksheet)Globals.ExcelHBaseAddIn.Application.ActiveSheet);
            activeWorksheet.Name = "Records";
            Excel.Range navigator = activeWorksheet.get_Range("A1");
            int numRows = Decimal.ToInt32(this.numberOfRecords.Value);
            // most of the work done here
            this.columns = ViewHBaseTable.populateWorkSheet(navigator
                , this.hostName.Text
                , this.tableName.Text
                , numRows);
            // autofit the range
            int numCols = this.columns.Count<string>();
            Excel.Range c1 = activeWorksheet.Cells[1, 1];
            Excel.Range c2 = activeWorksheet.Cells[numRows, numCols];
            this.cols = activeWorksheet.get_Range(c1, c2); 
            this.cols.EntireColumn.AutoFit();
            // populate the user controls with the column names
            this.filterComboBox.Items.AddRange(this.columns);
            this.frequencyComboBox.Items.AddRange(this.columns);
        }

All the work is done in the method, ViewHBaseTable.populateWorkSheet(). The user controls are hostName, tableName and numberOfRecords. The hostName control contains the address of the machine that’s running Zookeeper, which is responsible for managing connections from the HBase client API. Below is code from populateWorkSheet(). Notice that the HBase table column family and cell names are obtained using the methods getFamily() and getQualifier() along with the cell values. The method returns an array of strings that represents the column and cell names in the table. These are used to populate the combo box controls filterComboBox and frequencyComboBox in the group Map Reduce.

            Configuration hbaseConfig = HBaseConfiguration.create();
            hbaseConfig.set("hbase.zookeeper.quorum", hostName);
            try
            {
                HTable tbl = new HTable(hbaseConfig, tableName);
                Scan scan = new Scan();
                ResultScanner scanner = tbl.getScanner(scan);
                Result r;
                while (((r = scanner.next()) != null) && ndx++ < numRecords)
                {
                    List aList = r.list();
                    ListIterator li = aList.listIterator();
                    while (li.hasNext())
                    {
                        kv = (KeyValue)li.next();
                        familyName = Bytes.toString(kv.getFamily());
                        cellName = Bytes.toString(kv.getQualifier());
                        value = Bytes.toString(kv.getValue());
                        // make a unique list of all the column names
                        if (!names.Contains(familyName + ":" + cellName))
                        {
                            names.Add(familyName + ":" + cellName);
                        }
                        // add headers
                        if (currentRow == 2)
                        {
                            currentCell = navigator.Cells[1, currentColumn];
                            currentCell.Value2 = cellName;
                        }
                        currentCell = navigator.Cells[currentRow, currentColumn++];
                        currentCell.Value2 = value;
                    }
                    currentRow++;
                    currentColumn = 1;
                }
                scanner.close();
                tbl.close();
            }
            catch (Exception ex)
            {
                throw ex;
            }
            return names.ToArray<string>();
        }

Generic filtering and frequency user interface

Below is a close-up screenshot of the HBase pane. The interface in the View Table group allows the user to point to a Hadoop implementation, choose a table and the number of records to load into the active worksheet. Once that is done, the user can then define a MapReduce job using the controls in the Map Reduce group.

The user interface allows filtering on any one column. The combo box control labeled Choose filter column contains all the column names in the form family:cell. The text box labeled FilterValue holds the filter, which removes all rows where the chosen column doesn’t match the filter value. The combo box labeled Column to Count is used to choose the column whose values will be grouped and counted. The above values ask the question: “What are the pages (specifically, the frequencies of the pages) visited by the IP address 80.240.129.183?”

When the button, Map Reduce, is clicked, this handler is invoked:

        private void onMapRedButtonClick(object sender, EventArgs e)
        {
            this.filterColumn = this.filterComboBox.Text;
            this.filterValue = this.filterValueTextBox.Text;
            this.frequencyColumn = this.frequencyComboBox.Text;
            Excel.Worksheet activeWorksheet 
                = ((Excel.Worksheet)Globals.ExcelHBaseAddIn.Application.Worksheets[2]);
            activeWorksheet.Name = "Frequency";
            Excel.Range navigator = activeWorksheet.get_Range("A1");
            // most of the fun stuff happens here
            int numRows = MapReduce.executeMapReduce(navigator
                , this.filterColumn
                , this.filterValue
                , this.frequencyColumn
                , this.hostName.Text
                , this.tableName.Text);
            // autofit the range
            Excel.Range c1 = activeWorksheet.Cells[1, 1];
            Excel.Range c2 = activeWorksheet.Cells[numRows, 2];
            this.cols = activeWorksheet.get_Range(c1, c2); 
            this.cols.EntireColumn.AutoFit();
            // bring the worksheet to the top
            activeWorksheet.Activate();
        }

All the work is done by the method MapReduce.executeMapReduce(), partially shown below. The .NET-to-Java method call, HBaseToLinq.FrequencyMapRed.executeMapRed(), is almost the same Java code used in the previous lab, Building a LINQ provider for HBase MapReduce. The only modifications have been to remove hard-coded column names, instead using the programmatic column names for filtering and frequency counts chosen by the user. The method then scans the results of the MapReduce job stored in the table, summary_user, and loads them into a worksheet, returning the number of records in the results table.

            try
            {
                HBaseToLinq.FrequencyMapRed.executeMapRed(hostName
                    , tableName
                    , frequencyColumn
                    , columnToFilter
                    , filterValue);
            }
            catch(Exception ex)
            {
                throw ex;
            }
            Configuration hbaseConfig = HBaseConfiguration.create();
            hbaseConfig.set("hbase.zookeeper.quorum", hostName);
            try
            {
                string cellName = 
                     frequencyColumn.Substring(frequencyColumn.IndexOf(":") +1);
                string familyName = 
                     frequencyColumn.Substring(0, frequencyColumn.IndexOf(":"));
                HTable tbl = new HTable(hbaseConfig, "summary_user");
                Scan scan = new Scan();
                ResultScanner scanner = tbl.getScanner(scan);
                Result r;
                while ((r = scanner.next()) != null)
                {
                    rowKey = Bytes.toString(r.getRow());
                    count = Bytes.toInt(r.getValue(Bytes.toBytes(familyName)
                         , Bytes.toBytes("total")));
                    currentCell = navigator.Cells[currentRow, currentColumn++];
                    currentCell.Value2 = rowKey;
                    currentCell = navigator.Cells[currentRow++, currentColumn];
                    currentCell.Value2 = count;
                    currentColumn = 1;
                }
                scanner.close();
                tbl.close();
            }
            catch (Exception ex)
            {
                throw ex;
            }
            return currentRow - 1;

Here’s a screen shot of the Excel add-in after performing the MapReduce.

Visualizing data

Data visualization through graphs and charts is an important final step when investigating and analyzing data. Clicking on the button Chart Frequencies causes the add-in to create a stacked column chart of the Frequency worksheet. Here’s the code for the handler, onChartFrequenciesClick().

        private void onChartFrequenciesClick(object sender, EventArgs e)
        {
            Excel.Workbook wb = Globals.ExcelHBaseAddIn.Application.ActiveWorkbook;
            Excel.Chart chart = (Excel.Chart)wb.Charts.Add();
            chart.ChartType = Excel.XlChartType.xlColumnStacked;
            chart.SetSourceData(this.cols, Excel.XlRowCol.xlColumns);
            chart.HasTitle = true;
            string filterName = this.filterColumn.Substring(this.filterColumn.IndexOf(":") + 1);
            string frequencyName 
                 = this.frequencyColumn.Substring(this.frequencyColumn.IndexOf(":") + 1);
            chart.ChartTitle.Text = "Frequency of " 
                  + frequencyName 
                  + " when " + filterName 
                  + " = " + this.filterValue;
        }

This screen shot of the add-in shows the resulting chart. Notice that the MapReduce columns for filtering and frequency are different from those in the previous example. Here, the question being asked is “What is the frequency of visiting IP addresses for the page /cats.html?”

Conclusion

Building an Excel add-in that supports viewing any HBase table of column families, and that provides filtering and MapReduce frequency counts, is relatively straightforward. Leveraging the HBase Java client APIs through JNBridgePro-generated .NET proxies is key to that simplicity. By keeping the MapReduce job on the Java side and keeping it generic, any table can be filtered and reduced to the frequencies of one particular column.

The source for this example can be downloaded here.

Posted by William Heinzman | in JNBridgePro, Labs | Comments Off

JNBridge JMS Adapter for BizTalk Server supports BTS 2013

Monday, Apr. 15th 2013

Microsoft has released BizTalk Server 2013, and the good news for users of our JMS adapter for BizTalk Server is that it already works with the new BTS release.

Simply install and configure the adapter in exactly the same way as you did with earlier versions of BizTalk Server.  It just works!

Posted by Wayne | in Adapters, Announcements, BizTalk Server | Comments Off