Simple database operations via HTTP with Node JS, MongoDB and Bluemix

In my previous posting I mentioned that I planned to harvest some fragments from a home project that might be useful to others. This time I thought I’d capture the simple mechanism I created to keep an audit database for my application, using MongoDB running on Bluemix.

I opted for Mongo due to its simplicity and close integration with JavaScript — a perfect option for creating something quickly and easily to run in a Node JS environment. Mongo is a NoSQL database, meaning that you don’t have to define a specific schema for your data; you simply store what you need as a JSON object in the form of key/value pairs. This means you can store and retrieve a wide variety of different data objects in the same database, and aren’t constrained by decisions made early on in the project if your needs change. Whilst it wasn’t a design point of mine, Mongo is also designed to scale.

As described previously, I’m using the Node JS Web Starter boilerplate as my starting point. I’ve previously added the Twilio service; now, to add Mongo, I simply select the MongoLab service from the Data Management set of services in the Bluemix console and add it to my application.

When you create the MongoLab service for the app, Bluemix provides a link to the MongoLab admin page. The nice thing about MongoLab as a service provider is that it gives you user-friendly tools for creating Collections, reviewing documents and so on. I created a collection in there called easyFiddle using the web console.


Having configured the Mongo Collection, the next step is to make sure that the Mongo libraries are available to the Node JS environment. As with Twilio before, we simply make sure we have an entry in the package.json file.

{
   "name": "NodejsStarterApp",
   "version": "0.0.1",
   "description": "A sample nodejs app for BlueMix",
   "dependencies": {
      "mongodb": "1.4.6",
      "express": "3.4.7",
      "twilio": "^1.6.0",
      "jade": "1.1.4"
   },
   "engines": {
      "node": "0.10.26"
   },
   "repository": {}
}

Just as before, Bluemix will handle the installation of the packages for us when we push the updates to the server.

Within our code, we now need to instantiate the Mongo driver objects with the credentials generated by Bluemix for the MongoDB instance running in MongoLab. Bluemix supplies the credentials for the connection via the VCAP_SERVICES environment variable.

var mongodb = require('mongodb'); // the Mongo driver declared in package.json
...
var services = JSON.parse(process.env.VCAP_SERVICES || "{}");
...
var mongo = services['mongolab'][0].credentials;

We will reference the mongo object to retrieve the credentials when we connect to the database.
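For reference, the mongolab entry in VCAP_SERVICES looks something like the sketch below (the values here are invented for illustration; the only field the code relies on is credentials.uri):

{
   "mongolab": [
      {
         "name": "MongoLab-a1",
         "label": "mongolab",
         "plan": "sandbox",
         "credentials": {
            "uri": "mongodb://user:password@ds012345.mongolab.com:27017/mydb"
         }
      }
   ]
}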

As I did with Twilio, I am using a simple HTTP-based service that will in this case create an audit record in a database. I’m using Express again (as described previously), together with the same basic authentication scheme. My service works on HTTP GET requests to /audit with two query parameters device and event.

// Leave out the auth parameter if you're not using an 
// authentication scheme
app.get('/audit', auth, function(req, res) {
   var theDevice = req.param("device");
   var theEvent = req.param("event");

Now it’s a case of connecting to Mongo, and inserting a JSON object as a document to contain the two parameters.

   mongodb.MongoClient.connect(mongo.uri, function(err,db) {
      // We'd put error handling in here -- simply check if 
      // err is set to something
      var collection = db.collection('easyFiddle');
		
      var doc = {
         "device": theDevice, 
         "event": theEvent, 
         "date": new Date()
      };
		
      collection.insert(doc, {w:1}, function(err, result) {
         // Again, we'd put error handling in here
         res.json(doc);
      });			
   });
});

And that’s it. We can now create an audit entry using our browser if we choose, with a URL that looks like:

http://appname.mybluemix.net/audit?device=TEST&event=TESTING

I’ve added other services using the same method to variously query and delete all the records in the Collection. Whilst I’ll not include them all here, note that the syntax for deleting all the records in a collection is a bit non-obvious: the examples show you how to delete a record matching a given key/value pair, but are less clear on how to delete them all. You do so simply by supplying null instead of a name/value pair in the method call:

collection.remove(null, {w:1}, function(err, result) {
   // Error handling
   // ...
});

Note that the result variable will contain the number of records deleted.
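For completeness, a query service along the same lines might look like the sketch below (the /auditList route name is my own invention, and error handling is elided as before):

app.get('/auditList', auth, function(req, res) {
   mongodb.MongoClient.connect(mongo.uri, function(err, db) {
      // As before, check whether err is set in real code
      var collection = db.collection('easyFiddle');
      // find() with no arguments matches every document
      collection.find().toArray(function(err, docs) {
         res.json(docs);
      });
   });
});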

Hopefully this posting has helped get you going. A great resource to help you navigate your way around the Node JS API for Mongo can be found in the MongoDB documentation.

Sending SMS messages using Twilio and Bluemix

I’ve been tinkering with an Internet of Things project at home for a while which I’ll write up in due course, but in the course of doing so have knocked up a few useful fragments of function that I thought I’d share in case other people need them. The first of these is a simple Node.js app to send an SMS message via Twilio using IBM Bluemix.

There’s lots of material on Twilio and Bluemix but by way of a very rapid summary, Twilio provide nice, friendly APIs over telephony-type services (such as sending SMS messages), and Bluemix is IBM’s Cloud Foundry-based Platform-as-a-Service offering to enable developers to build applications rapidly in the cloud. Twilio have created a service within Bluemix that developers can pick up and use to enable their applications with the Twilio services. One of the things I wanted for my application was a way of notifying me that something had happened, and a text message suited my needs nicely. Twilio provide a free trial service with a few restrictions which you can upgrade once you want to do something serious.

To begin with, I created myself a Node application on Bluemix using the Node JS Web Starter application boilerplate provided.

My approach was to create a simple HTTP-based service that I could invoke with the destination phone number and the message itself as parameters. To make the Twilio service available to my Node application, it was simply a case of adding the service to my application in Bluemix. Twilio is listed as one of the Mobile services.

Once you have added the Twilio service, you configure it in Bluemix by providing the Account SID and Auth Token values that you find on the account details page once you have registered and logged in to Twilio.

The Node JS Web Starter boilerplate creates a simple template for a web server that serves up pages using the Express framework on top of Node. Express is handy, in that it provides a useful framework for handling HTTP requests, so I decided to stick with it for my HTTP service. The first change I needed to make to the boilerplate was to add a reference to Twilio in the package.json file so that the modules would be available to my code.

 
{
   "name": "NodejsStarterApp",
   "version": "0.0.1",
   "description": "A sample nodejs app for BlueMix",
   "dependencies": {
      "mongodb": "1.4.6",
      "express": "3.4.7",
      "twilio": "^1.6.0",
      "jade": "1.1.4"
   },
   "engines": {
      "node": "0.10.26"
   },
   "repository": {}
}

When you push your updated code, Bluemix automatically runs npm install to fetch the modules based on the package.json.

Within the app, you then need to set up the Twilio package ready for sending messages. First, we need to require the Twilio package so we can access the service from our code, and then retrieve the Account SID and Auth Token values configured in Bluemix from the VCAP_SERVICES environment variable that Bluemix provides to the Node runtime.

var twilio = require('twilio'); // Twilio API
...
var services = JSON.parse(process.env.VCAP_SERVICES || "{}");
...
var twilioSid, twilioToken;
services['user-provided'].forEach(function(service) {
   if (service.name == 'Twilio-ph') { 
      twilioSid = service.credentials.accountSID;
      twilioToken = service.credentials.authToken;
   }
});

Note that Twilio-ph is the name I gave to the Twilio service when I added it to my application in Bluemix; yours may differ, so remember to change it accordingly.
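For reference, the Twilio service appears under the user-provided key of VCAP_SERVICES in roughly the following shape (values invented for illustration):

{
   "user-provided": [
      {
         "name": "Twilio-ph",
         "label": "user-provided",
         "credentials": {
            "accountSID": "ACxxxxxxxxxxxxxxxxxxxx",
            "authToken": "xxxxxxxxxxxxxxxxxxxx"
         }
      }
   ]
}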

The environment is now set up, so next we need to create our HTTP handler using Express to form the basis of our service. I’ve added Basic Authentication to my handler to prevent abuse of my Twilio account; this is nice and easy to do using Express.

// Configured environment variables to protect Twilio requests
var USER = process.env.USER;
console.log("USER = "+USER);
var PASSWORD = process.env.PASSWORD;
console.log("PASSWORD = "+PASSWORD);

// Basic authentication to restrict access to my services.
var auth = express.basicAuth(USER, PASSWORD);

I’ve used environment variables that I’ve set in the Bluemix environment; clearly, in a production system one would use a proper user directory. You can set your own environment variables within your application by going to the Runtime view of your application and selecting the USER-DEFINED button on the main panel.

The HTTP handler simply looks for the URI pattern /twilio as a GET request and reads the destination telephone number and the message content as query parameters. The auth object passed in applies the Basic Authentication rule defined previously.

app.get('/twilio', auth, function(req, res) {
   var toNum = req.param("number");
   var twilioMessage = req.param("message");
   var fromNum = 'your Twilio number';

Your Twilio number can be found on the Twilio account details page.

Twilio make it really easy to send a message using their twilio.RestClient API wrapper class. You simply instantiate an instance of the twilio.RestClient class, and invoke the sendMessage method with a JSON object containing parameters describing who the message is to, the number it is sent from and the message to be included in the SMS. You provide a callback function that is invoked when the request is completed.

   var client = new twilio.RestClient(twilioSid, twilioToken);
   client.sendMessage(
      {
         to: toNum,
         from: fromNum,
         body: twilioMessage
      },
      function(err, message) {
         if (err) {
            console.error("Problem: "+err+": "+message);
            res.send("Error! "+err+": "+message);
            return;
         } else {
            res.send("Done!");
            console.log("Twilio message sent from "+fromNum+
            " to "+toNum+": "+twilioMessage);
         }
      }
   );
});

Once deployed, the service can be invoked with a URL in the form described below:

http://appname.mybluemix.net/twilio?number=number&message=mess

If invoked programmatically, the authentication credentials will be required in the HTTP Authorization header. If tried from a browser, the browser will prompt for a username and password combination. Ultimately you’ll receive an SMS on your phone.
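By way of example, here is a minimal sketch of invoking the service from another Node app; the hostname, number, message and credentials are all placeholders:

var http = require('http');

var options = {
   host: 'appname.mybluemix.net',
   path: '/twilio?number=' + encodeURIComponent('+441234567890') +
         '&message=' + encodeURIComponent('Hello from Node'),
   headers: {
      // Basic authentication: base64-encoded "user:password"
      'Authorization': 'Basic ' +
         new Buffer('myuser:mypassword').toString('base64')
   }
};

http.get(options, function(res) {
   console.log('HTTP status: ' + res.statusCode);
});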
And there it is. I can now trigger SMS messages from my Node-RED flows, a browser, or any other apps I might write.

Fixing jQuery Mobile headers in a Worklight app on iOS 7

One of the fun bits (depending on your inclination) of working cross-platform is discovering and mitigating the nuanced differences as you try your app on different devices. One such difference in iOS 7 is the transparency of the iOS status bar that contains the wifi strength, battery life and so on.

iOS header area

If you’re not including a header in your app then this won’t make a whole lot of difference to you, but if you are, you’ll find the iOS status bar overlays your header, which can mess up your carefully placed buttons, iconography and header text.

I’ve come up with a simple workaround for jQuery Mobile running in a Worklight environment that I’ve posted here for ease of reuse, and in case anybody else is looking for something similar. The same principle should apply equally in a vanilla Cordova app.

My example uses a simple jQuery Mobile header on a page.

<div data-role="page" id="mypage">
   <div data-role="header" data-position="fixed" 
      data-fullscreen="false">
      <a href="#home" 
class="ui-btn ui-icon-back ui-btn-icon-notext ui-corner-all"
>Back</a>
      <h1>My heading</h1>
   </div>
   <div data-role="main" class="ui-content">
      Blah blah
   </div>
</div>

The overlap of the status bar is 20 points, so when the app renders we need to first detect whether we’re on a version of iOS that needs adjusting, then fix the elements contained in the header to allow for the status bar height.
For the purposes of demonstration I’ve simplified the below just to test for the iOS version assuming an Apple device, but of course you can add further tests for other platforms.

<head>
<!-- ... head contents -->
<script>
function onDeviceReady() {
   if (parseFloat(window.device.version) >= 7.0) {
      $('H1').each(function() {
         // `this` is the h1; the padding goes on the
         // containing header div.
         $(this).parent().css("padding-top", "20px");
         // sort any buttons/icons rendered from A tags too
         $(this).siblings('A').css("margin-top", "20px");
      });
   }
}
// Fire when Cordova is up and ready
document.addEventListener("deviceready", onDeviceReady, false);
</script>
<!-- ... rest of head contents -->
</head>

The logic of the script searches for h1 tags on the assumption that they will be used for header text. If your interface is styled differently, you might want to come up with a different “eye-catcher” tag or attribute so jQuery can select all the header nodes in the app. Having found the h1 tags, it then adjusts the containing div nodes to pad out from the top by the required amount. My application has <a> tags for buttons in the header, which a bit of experimentation showed were not being adjusted along with the containing div, so I’ve adjusted them directly.

Notice that I’ve used CSS padding for the header div: this means that the additional offsetting will be filled with the natural background scheme of the header, rather than the blank white oblong that would appear if margin were used. The jQuery icons for my back link get distorted by tinkering with padding, so for those I’ve used margin, which works just fine as they layer over the top of the heading colour scheme.

Mobile web frameworks, and other religious debates

It is an interesting litmus test of the maturity of any given technology trend when it starts to develop its own set of heated points of debate and argument. We had the “browser wars” of the late ’90s, then proprietary plug-ins versus the AJAX/open web, and more recently which AJAX framework is “best”.

The rise of mobile apps as a party that everyone wants to be at has further amplified this frameworks debate, as the focus has evolved from AJAX on the desktop to the mobile platforms. A quick Google and you’ll find any number of fora debating the merits of jQuery Mobile vs Dojo Mobile vs Sencha Touch and so on.

So there are several, which one is best then?

In fact, participation in such debate in isolation is ultimately futile. That a particular topic becomes the subject of almost religious fervour in itself betrays that absolute truth is either very hard or impossible to prove. The key to finding an answer is understanding the context. What is best for one situation may not be best for another, and to suggest otherwise would do the asker of the question a disservice, assuming they are asking for help.

There are, though, a number of considerations that can help navigate towards what “best” might be.

You’re at the mercy of consumers

Technical debate is all fine and good, but in the mobile world, we know that consumers will decide the success or failure of the app. A poor experience by the end user will ultimately be its undoing. The framework must be able to meet the experience expected by the users. This is of course a key factor in determining whether a native or mobile web/hybrid approach is applicable in the first place, but that is another discussion entirely.

Don’t forget also that user experience and aesthetics are two different things – nice transitions or shading will never rectify a fundamentally flawed user experience. Rejecting a framework purely because it apparently contains less bundled eye candy than alternatives still may not mean you’ve chosen wisely.

A green field is increasingly rare

Even in the evolving world of mobile, it is increasingly likely that there will be some existing apps with which the new apps will have to live happily. A few things to consider might be:

  1. Is there already an incumbent framework?
  2. Is the existing framework capable of building what is required to the right quality in the given timescales?
  3. Are the developers effective using it?

If the answer to all of the above is a clean sweep of “yes”, then unless there is a non-technical reason why the existing framework should be abandoned, sticking with what is there is probably the best option.

A hygiene factor for any technology decision, but an important consideration nonetheless, is the current position of a given framework in the “marketplace”. Is the framework under consideration acknowledged by other developers (and vendors) as strategic, or are references thin on the ground?

Skills matter

The accelerated lifecycle of the mobile world means that development time is at a premium. Adopting a framework or approach that is a closer match to the skills available within the organisation means greater opportunity for reuse of both assets and skills, and shortens the time required for developers to get up to speed. Related to the previous consideration, if there is an incumbent framework and the decision is made to replace it, then selecting a replacement with some similar characteristics would make sense – e.g. script-centric vs markup-centric.

It’s still a browser

The growth of AJAX as a technique has placed far greater expectation on the browser environment in terms of its criticality to the delivery of the application. It is easy to forget that, for all the enhancement and development since the birth of the internet, fundamentally a browser renders documents, and JavaScript is there to augment that core purpose. I’ve always been fairly sceptical of attempting to layer more tiers of engineering into the browser than are absolutely necessary.

So when looking at the various frameworks, it should be borne in mind that this is not necessarily the same as a package selection exercise with enterprise software products. Looking at one framework for the UI, another for handling MVC, another for service invocation and so on may well be overcomplicating things unless that specific combination is absolutely the only way to deliver the experience. It is relatively straightforward, for example, to create a simple MVC mechanism within most mobile frameworks without introducing the complexity and bloat of yet another framework, as the sketch below shows.
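By way of illustration, here is a minimal observer-style model in plain JavaScript; it provides enough “MVC” for many simple mobile web apps, and all the names in it are my own invention:

// A minimal model with change listeners
function makeModel(initial) {
   var data = initial || {};
   var listeners = [];
   return {
      get: function(key) { return data[key]; },
      set: function(key, value) {
         data[key] = value;
         // notify any interested "views" that something changed
         listeners.forEach(function(fn) { fn(key, value); });
      },
      onChange: function(fn) { listeners.push(fn); }
   };
}

// Usage: keep a jQuery-rendered element in sync with the model
var model = makeModel({ count: 0 });
model.onChange(function(key, value) {
   if (key === 'count') { $('#counter').text(value); }
});
model.set('count', 1); // the view updates automatically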

Horses for courses

And finally a variation on the consultant’s answer of “it depends”, but it is certainly true that choosing the right framework depends on what you want to do with it.

For example, I like prototyping using jQuery for its lightweight CSS/HTML-centric approach, whereas for construction of reusable components in an enterprise app I can see where the Dojo Toolkit with its Java-like packaging and widget framework has its strengths. That’s not to say you can’t prototype that way in Dojo or indeed create widgets in jQuery, just they each have different strengths depending on the use for me personally. So a key consideration here when evaluating a framework is determining what its core use is going to be – for example, do you need to make a strategic decision for a new service or are you looking to put something disposable together quickly? In the latter case, depending on skill levels some may choose not to use a framework at all.

Systems of Engagement 101

The emerging trend of Systems of Engagement is growing increasingly popular in the field of consumer and business applications and has been a frequently occurring topic of conversation for me recently with clients. There is an expanding body of materials on the subject, not least this excellent presentation from its originator Geoffrey Moore, but I wanted to capture my own quick snapshot in the form of a simple primer on the subject.

What are Systems of Engagement?

Systems of Engagement refer to a new generation of IT systems to support consumers and knowledge workers in the achievement of their objectives. Systems of Engagement optimise the effectiveness of the user by providing the required responsiveness and flexibility to deal with the fluidity of everyday life.

Haven’t we had these for a long time?

For many years, organisations have invested in what are often referred to as Systems of Record, such as customer relationship management (CRM) tools and transactional consumer applications like online banking. These tools are clearly beneficial, but at the same time have limitations, since

  • they typically enable only a subset of the process needed to achieve the real outcome desired, and
  • are constructed in terms of the provider’s world view, rather than the consumer’s.

For example online banking systems offer access to transactions and products, whereas the consumer’s overall objective might be something far more complex, such as moving house. Systems of Record support a model of interaction through sporadic, episodic transactions.

So why Systems of Engagement now?

Systems of Record are largely built out, to the extent that they now offer diminishing competitive advantage because most organisations now have them. Cloud delivery models also mean that they are becoming increasingly commoditised, decreasing the competitive return on investment even further. Systems of Record grew out of a time when differentiation was achieved through the greater efficiency that IT systems brought. Consumer smartphones and social tools have since created far higher expectations of what IT can deliver, and this has shifted the emphasis for differentiation onto the systems that provide the greatest degree of effectiveness to the consumer. In contrast to Systems of Record, Systems of Engagement support a model of continuous interaction.

What are some attributes of Systems of Engagement?

Whilst opinions vary, the Harvard Business Review describes nine traits that define Systems of Engagement that I think serve as a good starting point:

  1. Design for sense and response.
  2. Address massive social scale.
  3. Foster conversation.
  4. Utilize a multitude of media styles for user experience.
  5. Deliver speed in real time.
  6. Reach to multi-channel networks.
  7. Factor in new types of information management.
  8. Apply a richer social orientation.
  9. Rely on smarter intelligence.

How are they constructed?

Clearly for systems such as that described above to be achievable, it follows that different technology is required to that of traditional Systems of Record. There are four major new technology trends that are key enablers for Systems of Engagement now and in the future:

  • Mobile devices that provide a ubiquitous entry point for the user wherever they are, and that can now provide richer context for the service provider (such as location) to offer better targeted services.
  • Social tools that provide “people integration” capabilities to glue together complex elements of the human workflow associated with achieving an outcome.
  • Analytics and Big Data to provide richer capabilities to engage with users with the benefit of a far broader supporting context, and proactively interact with the user with relevant beneficial services.
  • Cloud computing as a common delivery model for consuming services in a consistent way, wherever the user may be and from whichever device they choose. Cloud also enables organisations to move Systems of Record outside their premises and focus on differentiating Systems of Engagement.

Does this mean Systems of Record are obsolete?

Not by any means. Systems of Record have a key role to play since their efficiency and robust qualities of service will continue to underpin business processes. A bank will still need to reliably process transactions, and a retail store will still need to maintain inventory levels. The real power of this new trend will be the interactivity of Systems of Engagement and efficiency of Systems of Record harnessed together.

This sounds like a lot of work?

Certainly to re-engineer every existing touchpoint with every user would be many, many years of development and investment for any organisation. However, if Systems of Engagement will be the source of differentiation for organisations then doing nothing is also unlikely to be a sustainable option. The key will be identifying and understanding the most critical moments of engagement and looking to improve them in a prioritised and pragmatic fashion.

Who will benefit from Systems of Engagement?

Potentially all parties could benefit. There is certainly an upside both for the users of Systems of Engagement, be they enterprise users or consumers, and for the organisations that provide them. Systems of Engagement focus competitive differentiation on the effectiveness of the people using them, rather than purely on the organisation providing the service, as is most often the case with Systems of Record, so the trend is an indication of the increasing empowerment of the end user. In addition, in adopting a Systems of Engagement approach, organisations are in a position to steal further competitive advantage over and above what they achieve through their Systems of Record.

Trends in Big Data requirements

Big Data is still emerging and maturing as a style of solution for particular types of problems. The current challenge for both the IT industry and business leaders is to try and make sense of what opportunity Big Data thinking and related technology really creates in an applied sense. It may be that in fact one day we will simply drop the “Big” prefix – today’s “Big” data will naturally mature into augmentations of standard information management architectures. For today, however, as with all new things we are still learning about the possibilities.

Common patterns for Big Data

Even at this early stage on the Big Data journey, we have discovered some specific use cases. In the IBM ebook “Understanding Big Data”, the authors describe six recurring patterns or fruitful areas for Big Data that they have identified during client engagements:

  1. IT for IT log analytics.
  2. Fraud detection.
  3. Social media analytics.
  4. Call centre interaction analytics.
  5. Financial risk modelling and management.
  6. Big data and the energy sector – analytics of sensor data.

These reflect the collective experience with Big Data thinking and technology to date, and started me thinking about how that list could grow with new scenarios aligned to business outcomes that resonate within a variety of industries.

Take an example of a bank that is trying to attract new customers from a particular demographic to a premium product with various incentives. They want to select the right incentives to maximise the return on their investment in the new product, gain market share from competitors and attract “good” customers (and so on). None of that business intent contains the words “Big” or “Data” yet we know from our early experience that social media analytics has a role to play in terms of better understanding the target audience and, importantly, the competition during product development. So how did we get there?

From use cases to business themes

There will clearly be many more such scenarios that we have not yet unearthed, and so this has caused me to consider whether underlying the known set of patterns that we understand today there is a set of business themes that will help us identify future use cases for a Big Data style of solution. In taking a step back, we might hopefully become better equipped to take many steps forward into the specifics once again.

In order to test this theory, I’ve identified five such themes based on my own experiences with Big Data in the field to date and insight gathered from colleagues and various papers and lectures on the subject. They are as follows:

  1. Augmenting a partial view of an entity or process.
  2. Understanding people better.
  3. Improving management information.
  4. Increasing confidence in decision making.
  5. Supporting partnership and value creation.

The first thing I will note is that there is natural overlap between some (or indeed possibly all) of the above when listed together. Once taken to a suitably high level, the lines between any group of related concepts naturally blur. However the intent is that depending on the mindset and perspective over the business problem at hand, one may well recognise one (or some) more strongly than others. Having done so, one may hence consider that Big Data may have a role to play within a technology solution. This is based on personal perspective, so there may well be other themes I’ve not yet identified.

A short summary of each of the themes I have identified follows.

Augmenting a partial view of an entity or process

This theme speaks to the notion of “Big” as meaning that the underlying data is gathered from a broader variety of sources than the traditional enterprise data warehouse or other data sources within the firewall of an organisation. It is often the case that the success of a particular business process has critical dependencies on external factors outside the direct control of an organisation – for example the weather.

Whilst of course we cannot directly influence something like the weather, if we can analyse its relationship to our operations to understand, say, how it affects the performance of our logistics processes against service levels, we can better tune the elements of the process we do control based on that insight. This also speaks to the financial risk modelling pattern mentioned earlier: if we can glean any further insight from external sources as to the position of the counterparties upon which we are dependent, we are far better informed to manage our risk position effectively.

Understanding people better

Whatever the core business of the organisation, it is highly likely that at some point meeting a particular business challenge requires a better understanding of people. Possible scenarios might range from a deeper understanding of customer preferences and needs, to understanding the morale of the workforce. Human beings are of course not digital entities and as such operate in an inherently unstructured, unpredictable and fluid manner, whether that is in written text, spoken word or implicitly via their actions.

We can try and impose a structured approach such as a survey or questionnaire, but that is a model that is inherently limited in its breadth and also its ability to capture the finer nuances of opinions implicit in behaviour or the spoken word. By gathering a large volume of data from a variety of sources, be that social media, call centre logs, explicit surveys and the digital footprints of individuals (e.g. entering and leaving a building), we are likely to build a much more accurate picture. Furthermore, we start to build an implicit picture rather than one aligned to the set of explicit questions or pathways we may have led them to.

Improving management information

Closely related to the first theme of an augmented view of a key entity, it is often the reality that an organisation lacks the level of basic information from its core systems that it would ideally desire to run the business effectively. In seeking to address this issue, we discover that the supporting IT systems were not designed to support the reporting required, or indeed are constructed from such a variety of technology that the solution is complex and costly to modify (or replace) to meet the business need.

Whilst the formal metrics may not be explicitly codified into the solution, a Big Data approach views the vast quantities of “digital exhaust” typically generated by the IT systems as a valuable source to be harvested. By harvesting this output, we can begin to deduce some of the key performance indicators required in a more cost-effective fashion. An approach based on Big Data principles offers at least an alternative to a long and costly integration or replacement exercise, and has the potential to deliver more benefits more quickly. It is important to note also that this theme applies both to applications supporting the line of business, and to the business of IT within the organisation. For example, harvesting server logs in conjunction with support ticket data and call records could yield valuable insights into driving operational efficiency within IT support functions.

Increasing confidence in decision making

Rather than decision making in general (which it could be argued all analytics or business intelligence supports), this theme refers to specific, fine grained business decisions such as whether to extend a line of credit, whether a loan application might be fraudulent or indeed where to allocate stock in a retail chain. Today such decisions are supported by IT systems that are fuelled by large quantities of structured data gathered from a discrete set of sources closely related to the business.

This theme, therefore, derives from the recognition that in addition to these traditional, structured data sources, confidence can be further increased by assessing a broader variety of inputs. For example, mixing social media data with traditional forecasting and inventory data in retail could provide invaluable early insight into coming retail trends in regions ahead of the demand. This could be the difference between sales won and sales (and customers) lost to competitors. Similarly, building a richer picture of an individual (or demographic) or an organisation can only lead to a refined decision making process when deciding whether to issue credit or check for fraudulent activity.

Supporting partnership and value creation

An alliance between two organisations opens up a spectrum of possibility in terms of business model innovation, and from an IT perspective necessarily has a multiplier effect on the data available and subsequently created. In this context, a Big Data approach can add considerable value, both in handling the increased variety of data consumed and created, and through the inherent flexibility and speed to value of Big Data technology.

Firstly, the data itself may have provided the original impetus for the alliance – each organisation holds pieces of the jigsaw, and by bringing the pieces together they both realise shared advantage. For example, a bank and a retail chain may decide to collaborate with a focus on driving increased revenues through richer customer analytics. Big Data thinking in this context provides the thought processes and technology tools to help realise that innovation quickly and cost effectively. Secondly, having developed a shared offering, the resulting service will generate a “digital exhaust” and by-products quite unlike anything either party could have produced alone.

In summary

We are at the beginning of the Big Data journey, and one of the most exciting aspects is that we are still scratching the surface of what might be possible if the current pace of technology evolution continues. The above list will doubtless look different in five months’ time, let alone five years, and is in no way meant to be exhaustive, but hopefully the approach will help identify further opportunities for Big Data to drive the business agenda forwards, and develop our set of applicable use cases further.

Big Data – what’s the Big Idea?

My first technology post (in fact, post of any kind) for a while. As in the past, I’ve decided to commit to my blog thoughts that are whirling around my head that I don’t want to lose, and am interested to share with others that might find them useful. Views, of course, are my own and not necessarily those of IBM.

I’ve recently been developing a paper for use inside IBM on the topic of Big Data in the context of Financial Services. I have been working with Big Data technologies in a variety of contexts for the past year or so, and the paper has been a good opportunity not only to explore the topic with my peers, but also to take stock of what I have learned in that time. Whilst the paper is an IBM-specific view, in the process I have been refining my own point of view, and that is what I’ve decided to record here as a series of observations that I’ve made in this time.

Thanks to Mark for his additional review and comments.

What’s in a name?

As technicians we are naturally wont to try and find the absolute meaning of any given piece of terminology, which means that when terms like “Big Data” or “Cloud” come along, a lot of time is spent deciding what the “true” meaning really is. Published definitions of Big Data vary, generally tend to be at a high level, and reflect the wider strategy of the organisation making them. For example, the IBM web site defines Big Data in the context of the increasingly connected and instrumented world, in alignment with the Smarter Planet agenda:

“Everyday, we create 2.5 quintillion bytes of data–so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: from sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and from cell phone GPS signals to name a few. This data is big data.”

A cursory look on Wikipedia yields a less applied definition as follows:

“Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.”

I could go on, but suffice to say that trying to tie Big Data down too firmly is clearly not helpful. What is interesting is examining some of the definitions of the term that I have heard myself in a variety of fora, such as:

  • Social media analytics.
  • Hadoop and MapReduce.
  • Stream analytics and complex event processing.
  • Unstructured data.
  • Data gathered from smart energy meters.

It is tempting in such circumstances to critique each example for accuracy and completeness against a chosen definition, but in the end I have concluded that Big Data is all of the above, and many more things besides. This leads me to my first conclusion:

Big Data as a term is deliberately open to interpretation to accommodate a variety of possible lenses through which to view it, and the many and varied definitions reflect this variety.

Noticeable traits of Big Data scenarios

The format and structure of the data are not constrained to those of traditional business data models

One of the key themes of Big Data is the removal of traditional constraints around the type of data that can be leveraged in support of the business. Taking a Hadoop-type environment as an example, a key advantage is that data of any kind can be harnessed quickly from its raw format, without the need for a full scale data modelling exercise.

It is important to position how some of the Big Data technologies fit with the traditional data warehouse approach. One clear difference is the nature of how the data is stored and made available for analysis. Traditional data warehouses store data in well-defined structures to support Online Analytical Processing (OLAP) in the context of business intelligence initiatives. Typically a data warehousing project involves significant analysis to determine the business data structures into which the data is to be loaded for consumption in this way.

In a Big Data scenario, the source data is typically accessed in its raw format e.g. log files, audio, text. There can be a number of reasons for this, ranging from the sheer volume of data that would make traditional handling inefficient and costly to the uncertainty of the requirements and primitive nature of the data which would render a traditional data modelling exercise extremely difficult. Furthermore, the rapidly changing nature of Big Data sources, the business pressures of time to market and agility and the fact that we are only just starting to understand the possibilities also means a traditional approach is unlikely to be effective.

Data may be drawn from a variety of sources inside and outside the enterprise, including the public internet.

Another key point is that, from a data ownership perspective, it may not just be about you any more. The “Big” in Big Data may refer to size, but may equally refer to scope, i.e. bigger than one organisation alone. It may of course refer simply to sources within an enterprise that have not been put together before, for example analysis of call centre records combined with an existing data warehouse. Social media analytics of the public internet is a good example where data beyond the “four walls” can be integrated with business-as-usual processes to improve performance.

The data itself may be analysed either in a static data store or as a continually changing data flow.

As discussed previously, Big Data embraces a multitude of interpretations, one of which is the concept of “Big” indicating speed of data movement, or at least that the underlying data set may otherwise be fluid and/or with a temporal element to the business use case.

Again, the field of social media analytics offers a good example, wherein we are harnessing a constantly varying source of data. This in turn may be coupled with a fluid stream of business queries — for example, measuring the impact of recently-launched or enhanced marketing campaigns. This is a good example of a varying data set where the analysis occurs on a static, point-in-time snapshot of the data — data “at rest”.

In Financial Markets, algorithmic trading is a well-known example where “Big” refers to the velocity of change and the demand for fast response times. In this scenario, the data is analysed “in motion” as a continuous stream, with the Big Data tools providing the capability to spot potentially valuable patterns indicating that particular circumstances are occurring, in this case so that an order can be made automatically at the right time.

Requirements for applications in the environment are often fluid and evolutionary.

As discussed above, to a large degree this is unsurprising given the emerging nature of the subject area. Technology-led exploration fosters an increased appreciation of “the art of the possible”, and technologies such as Hadoop are very amenable to agile, rapid experimentation; indeed, one of the key value propositions of Hadoop is the ability to get started quickly and cost effectively, and the agility of the environment.

The ability for technology to handle Big Data in solving business problems removes some of the traditional IT constraints on thinking, and this naturally tends towards an exploratory approach to innovation with analytics. The flexibility inherent in IT tools such as Hadoop enables new degrees of business innovation, potential for value creation, and differentiated products and services. Factor into this the highly competitive and market-driven nature of consumer-facing fields such as retail and consumer finance, and this is a recipe for an ever-changing set of requirements.

“Big” is a subjective measure and specific to the context in question.

“Big” is very much in the eye of the beholder. Earlier in this post I talked about the variety of definitions for the term Big Data, and largely this stems from the use of this inherently subjective word. “Big” to a business analyst at a bank may mean too many rows for their standard spreadsheet to handle any more. On the other hand, “Big” to a data-centric organisation like Google means something else entirely.

Another definition of “Big” is not as a measure as such, but as an indicator of being “outside of conventional bounds”, for example drawing in data from social media or third-party organisations. In this sense “Big” becomes synonymous with “uncharted” and possibly “hard to manage” within the confines of the traditional enterprise scope.

Having concluded that there are many possible perspectives on Big Data, there is an emerging set of recurring attributes of a Big Data environment when one drops down a level of detail to examine the technical requirements.

Business scenarios for Big Data

It is interesting to note that the terminology itself is inherently technical, which instinctively leads a lot of the current thinking into the world of implementation technology. This naturally leads to a “bottom up” view of the problem space: here is what a particular technology allows you to do, now think how you can apply that capability to your business and see what fits. From a technologist’s perspective this is exciting because one can see the possibilities, and it naturally enables an entrepreneurial approach to IT. It can, however, end up becoming the archetypal technical solution looking for a problem.

It is interesting to note that there is no one obvious place to start in terms of a business problem space addressed by Big Data. A few are emerging, for example those associated with social media analytics (marketing and campaign management, product development and so on), but it is likely that in many cases the Big Data thought is something one goes armed with when top-down analysis and requirements gathering begins, rather than a precise piece part that fits a specific problem. For example, there is not the same well-defined link as exists between a “single view of the customer” business problem and a master data solution.

It is that new art of the possible, and the suspension of judgement on what can be done, that is the real benefit of the Big Data thought from a top-down business perspective.

Whilst there is a growing family of technology pieces in the Big Data solution story, you may not realise you have a Big Data business problem until you get there.