More BIG Data

Not Just a Buzz Word for CIOs

Doug Harr



What do CIOs do with Big/Machine Data?

In 2010, most of us were deleting machine log data from our systems as soon as it was clear that processes had survived the night – very frequently this data was being tossed in the trash daily. Now, a short four years later, we’ve all learned that there is information in that data, and that by saving it and using search and analytics to mine it, an amazing number of things become possible.



As CIO at Splunk (a rapidly growing company that makes a platform aiming to make machine data available, usable and valuable for everyone), the first example I saw of the solution being used within the company itself was related to its go-to-market model. Splunk had, and has, a “freemium” model where customers and prospects can download Splunk software to their PC/Mac or host, then run machine data into it to search or analyze that data. We were “splunking” those downloads by combining three sources: the Apache web log from the Splunk web site, contact feeds from our CRM system (Salesforce) used as a lookup table, and the communications that Splunk itself sends back to our site once it is up and running. With just these three types of machine data records, one of them a “lookup” table to enrich the data, we were able to produce an amazing array of analytics and reporting. IT, product management, marketing, and others used it to measure the download experience, uptime, and capacity, and also to track the actual sales pipeline and understand the company’s prospects.
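To make the “lookup” idea concrete, here is a minimal sketch in plain Python (not Splunk’s own search language) of enriching web log events with CRM fields. It assumes, purely for illustration, that each log line carries the downloader’s email in the Apache user field and that the CRM lookup is keyed by that email; the sample lines, field names, and `crm_lookup` contents are invented.

```python
import re

# Apache Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

def enrich(events, crm_lookup):
    """Attach CRM fields to each event; unknown users get placeholder values."""
    for event in events:
        contact = crm_lookup.get(event["user"], {})
        yield {
            **event,
            "region": contact.get("region", "unknown"),
            "account": contact.get("account", "unknown"),
        }

# Hypothetical sample data, invented for illustration only.
sample_lines = [
    '203.0.113.7 - ana@example.com [10/Mar/2014:09:15:02 -0700] '
    '"GET /download/splunk-mac.dmg HTTP/1.1" 200 1048576',
    '198.51.100.4 - raj@example.org [10/Mar/2014:09:17:40 -0700] '
    '"GET /download/splunk-linux.tgz HTTP/1.1" 200 2097152',
]
crm_lookup = {
    "ana@example.com": {"region": "AMER", "account": "Example Corp"},
}

events = [e for e in map(parse_log_line, sample_lines) if e]
enriched = list(enrich(events, crm_lookup))
for row in enriched:
    print(row["user"], row["path"], row["region"], row["account"])
```

The point of the design is that the raw log stays untouched: the CRM data is joined in at search time, so a better lookup table improves every report without re-ingesting anything.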

Downloaded Experiences – Visualized

Downloads Experience


Since IT was responsible for making sure that the free Splunk software download function was operating properly, we were interested in the download experience – things like average minutes per download, and how that differed by platform.
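As a rough illustration of a metric like average minutes per download by platform, the sketch below groups download events and averages their durations. The `platform` and `seconds` field names and the sample records are assumptions made for this example, not the actual fields we used.

```python
from collections import defaultdict

def average_minutes_by_platform(downloads):
    """Return {platform: average download time in minutes}."""
    totals = defaultdict(lambda: [0.0, 0])  # platform -> [total minutes, count]
    for download in downloads:
        entry = totals[download["platform"]]
        entry[0] += download["seconds"] / 60.0
        entry[1] += 1
    return {platform: total / count for platform, (total, count) in totals.items()}

# Hypothetical download events, invented for illustration only.
downloads = [
    {"platform": "mac", "seconds": 120},
    {"platform": "mac", "seconds": 240},
    {"platform": "linux", "seconds": 90},
]

averages = average_minutes_by_platform(downloads)
for platform, minutes in sorted(averages.items()):
    print(f"{platform}: {minutes:.1f} min")
```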




We also liked seeing activity via geo-mapping, and other dashboard visualizations, as shown below:

Downloads by CRM Region








Real-time Data – Driving Business Excellence

Over the years, the internal use of Splunk expanded to address the needs of both IT and business constituents: providing customer insight, protecting against intrusion and malware, enhancing operational effectiveness, and more. These uses fall into the following categories:

  • Monitor and manage infrastructure – capacity, uptime, project delivery
  • Deliver application management – health of business apps, usage statistics, even filling in some missing reporting
  • Provide analytics on security posture – identify and eradicate malware, APTs (advanced persistent threats), and other threats
  • Provide business analytics – most of these derived by departments – people in sales, marketing, and engineering analyzing business trends, product delivery, customer support and more
  • Internet of Things – we even “splunked” our headquarters building to review temperature and CO2 levels

These examples roughly match the broad spectrum of what can be done when ingesting and analyzing machine data in real time. Stay tuned for more examples in posts to come. Now with StrataFusion, I will be consulting and teaching more on these topics!



Big Data

Not Just a Buzz Word for CIOs

Doug Harr


Four wonderful years at Splunk as CIO. Splunk? Splunk is a rapidly growing company that makes a platform aiming to make machine data available, usable and valuable for everyone. While there, I built the IT and Real Estate/Facilities teams and solidified an “all cloud” business applications portfolio. This advanced my knowledge of all things cloud, this time including the appropriate use of Amazon’s EC2 (Amazon Elastic Compute Cloud*) for compute and storage needs. At Splunk, everything but Engineering applications was delivered via cloud subscriptions, and half of the compute and storage needed for Engineering came from EC2. More on that in future posts.

Harness Opportunity

The most impactful thing I learned at Splunk is the tremendous opportunity CIOs have to harness what the market is calling “Big Data” and which Splunk refers to as “machine data.” In this context, “machine data” can be thought of as system logs, sensor readings, and the results of polling and measuring machine behavior. Every computer, storage system, device, web server, app, and database spews forth machine data – much of it delivered via a constant, real-time stream from the machine – and almost all of it in text format. The original application of Splunk was data center management. What was built worked equally well for application management, security, business and web analytics, and, more recently, monitoring and analyzing devices connected as “the internet of things.” Results come from searching through the data and formulating analytics from its content, answering questions that range from “Are the machines up? Are there signs of imminent failure? Are there attempts to infiltrate and hack the system?” to “Has Joe taken his heart monitor off?” Uses are limited only by the imagination. What can you do with your data? Learn more in my next post. Or, visit me at StrataFusion.

*Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) cloud.