Jo: . . . our round table webinar, “Selling Your Big Data Initiative to Your C-Suite”. This
webinar is brought to you by our sponsor, Infochimps and GigaOM Research. My name’s Jo Maitland [SP]. I will be your moderator for today.
We’ll start off the presentation with a round of introductions from our panel. We’ll then hear a brief message from our sponsor. Then we’re going to dig into the meat of this topic.
We’ll have a quick look at the history of BI, some of the things that were promised when we first got into the Business Intelligence marketplace and why those things never came to bear.
We’ll talk about the rise of managed cloud services for big data and what’s happening there now, why these things are interesting trends. Then, we’re going to dig a little bit more into the fact that not all big data cloud services are created equal.
What are the differences between some of the players in this space? What are the concerns that enterprises still have? Then we’re going to ask our esteemed panelists to sort of look in the crystal ball for a couple of moments and predict what they think is going to happen in this market over the next two to three years.
Now, as we go through the webinar, you guys will see the place to ask questions. Please ask questions all the way through as they come to you. I’ll try to take some as we go if they’re pertinent to where we are in the conversation or most likely we’ll save them up until the end.
You’ll also notice there’s going to be a couple of polls as we go through the webinar. These are interactive. They’re for the audience to take the polls and then we, on the analyst side, will review the results live and we’ll just kind of, sort of a gut check for an interesting stat for conversational purposes.
It’s not a huge sample but it’s nevertheless interesting data for the purpose of conversation so we really want to get your feedback on those questions.
Again, my name’s Jo Maitland. I’m the Research Director for Cloud and Big Data Infrastructure at GigaOM Research. Ron, would you like to say a couple of words?
Ron: Hi. I’m Ron Bodkin, founder and CEO of Think Big Analytics and a GigaOM Research
analyst. We started Think Big three years ago with an exclusive focus on helping enterprises get measurable value from the new wave of big data technology: Hadoop, NoSQL and distributed machine learning.
We’ve been working with a number of leading enterprises in technology, financial services, retail and other industries and are partners with Infochimps. We’re very excited about the opportunities to really drive meaningful upside in revenue, opportunities and competitive advantage from big data. We’re excited to share some of our experience in the discussion today.
Jo: Thank you. Dan?
Dan: Yes, hi, I’m Dan Olds with Gabriel Consulting Group. I’m the founder and principal
analyst. We do a lot of research into data centers, both enterprise and a fair amount into HPC, which has a very interesting relationship with big data in that those two worlds of scientific technical computing and the very largest big data are kind of coming together. It’s an interesting intersection.
Jo: Thanks, sir. And Jim?
Jim: Hi, I’m Jim Kaskade. I’m the CEO of Infochimps. I guess I’ve been a passionate
participant in the use of data since I started my career, ten years at Teradata and over the last decade in various cloud and analytic type of initiatives as the CEO of Infochimps.
I’m proud to participate in this panel and this discussion around big data since it is our focus as a company. Everybody who works for Infochimps is passionate about turning data into revenue.
Jo: Great. Thanks, Jim. Over to you for your message.
Jim: Great, great. For many people, when they hear the name “Infochimps”, they think of a
company that got its start as a data marketplace. In 2009 when the company was founded by Flip Kromer and his cofounders, the vision was to make data easily accessible.
Their way of implementing that was to amass over 15,000 data sets in a cloud service, which required the company to source data in large volume as well as velocity. It was one of the early consumers of the Twitter Firehose, for example, storing that data, curating it and then making it accessible through analytics and queries via an API.
Infochimps then moved to making the analytic platform that it used for this data marketplace application available to customers in 2010, and moved away from being a data marketplace to being a cloud service provider with a cloud purpose-built for big data analytics. As you see in this slide, our focus is around three different data analytic services.
Starting on the left, we have a cloud service called “Cloud Streams”. That is a real-time stream processing cloud service. It allows us to not only connect to any data source asynchronously but to process that data source in memory with either transport directly to your application or other data stores or applying special purpose analytics on the fly as the data is in motion.
The second cloud service is what we’ve referred to as “Cloud Queries”. It’s a platform that allows our customers to query in an ad hoc or interactive way on very large amounts of data elements but typically not all of your data so that you can understand what’s trending or provide visibility into days or weeks or maybe months of data.
Then on the far right is the technology that we all love and that really kind of defines what big data is today, which is our “Cloud Hadoop” service. That allows us to provide our customers a batch analytics service.
I think when you look at our cloud offering, the Infochimps Cloud, you have to think about the fact that we have offered a suite of analytic services that cover the continuum from real-time all the way to batch.
Then lastly, what makes us very different as a cloud service provider is that we’re focused on Fortune 1000 companies who require that our cloud service be co-located with their data and that means for us that we are in a network of Tier 4 data centers where our tenants’ data also resides.
Our cloud service actually sits on a very highly reliable and very highly secure set of infrastructure next to our customers’ data. That’s represented as a virtual private cloud offering.
For some marquee accounts we actually sit in their data centers and we manage our analytic services with remote hands, in some cases on-site, through a private cloud, and then for small proofs of concept we will operate in a public cloud setting as well.
That pretty much overviews the Infochimps cloud mix slide. I wanted to give the audience a kind of taste for a use case. This is one of our customers.
It’s a billion-dollar, publicly traded company for which we launched a SaaS application on our cloud service. We were approached by this company with a team that was in the process of scoping out their own internal big data project.
As you can see, the slide’s titled here “Do It Yourself vs. Managed Big Data Cloud”. Obviously, we are the managed big data cloud but I want to preface one thing. We’re not advocates of displacing internal IT-led big data initiatives.
What we are about is augmenting and providing a hybrid strategy. So when I present this comparison, it’s not like it’s an either/or. This is really an option that augments initiatives that probably are underway for virtually every global 2000 or North America Fortune 1000 company.
With this particular customer, they were looking at deploying within their own data center, so internal data center allocation. I’m showing a three-year total cost of ownership.
So for you C-level executives on this webinar, you’re looking at time to market, total cost, flexibility, a lot of the variables in terms of what you’re considering when you embark on these big data programs.
What this particular customer was looking at over three years was an allocated cost of internal data center infrastructure, connectivity, power and cooling, then also purchasing what we thought was an extremely high commodity hardware footprint.
It was actually a Supermicro rack of 29 1U servers that was expensed over three years for a TCO of $300K, and those servers were allocated across a stream processing technology and a NoSQL and Hadoop software stack, which they were looking at various different vendors to source from. That total three-year cost of ownership, again, was a very low cost of $1 million.
I’m throwing these numbers up here because I brag about this company’s view of how cheaply they could have deployed this infrastructure to support this application suite that they have now launched on our cloud service.
The other part in the cost component to what they analyzed was obviously network operations. That includes systems management and the NOC, which they had valued at about $550,000 in terms of expense over three years.
Then, a data science team plus others, so I’ll include architect, data engineering, data scientists, test/QA and developers, all in that for $1.8 million.
Then they had speculated that they could launch within 15 months. In month 16, they would have launched their application for a total cost of $4 million. Break-even on that, they had speculated, would come somewhat after the launch: with a revenue-generating opportunity of about $5 million per month for this application, it would be around month 19.
So what we presented as a company, Infochimps, obviously with our cloud services, we launched in a Tier 4 data center with managed service with all three of these cloud services for this customer and instead of $4.1 million TCO, we’re at a little less than $2.5 million in total cost and instead of 15 months in deployment, we deployed in 4 months.
The value to this particular company, the revenue they generated over three years, including the delays, was penciled out to be about $160 million when launching on our cloud service, versus $100 million being delayed and launching it through their own initiatives so that’s a strong $60 million of upside in revenue, a reduced cost of almost 50%.
I think the key message here that I want to provide the audience and the panelists is that for C-level folks, you really care about time to market. You obviously care about costs but time to market in terms of competitive advantage is probably your most important thing.
That’s our key value proposition of being a cloud service provider is clearly through this deployment method we can literally get our customers up and running in as little as 30 days.
That’s it for kind of an opening on Infochimps. Thanks, Jo, back to you.
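The three-year comparison Jim describes can be roughly reproduced with simple arithmetic. The sketch below is an illustration only, not the customer's actual model: it assumes revenue of $5 million for every month left in a 36-month window once deployment finishes, and the names `revenue_over_horizon`, `diy` and `managed` are made up for this example. Under that assumption the managed-cloud case comes out at the $160 million quoted, and the do-it-yourself case at about $105 million, close to the rounded $100 million in the talk.

```python
# Figures are the ones quoted in the talk (USD). The flat monthly revenue
# model is an assumption made for illustration.
HORIZON_MONTHS = 36
MONTHLY_REVENUE = 5_000_000


def revenue_over_horizon(deploy_months, horizon=HORIZON_MONTHS,
                         monthly=MONTHLY_REVENUE):
    """Revenue earned in the months that remain after deployment finishes."""
    earning_months = max(horizon - deploy_months, 0)
    return earning_months * monthly


# Hypothetical labels for the two options compared on the slide.
diy = {"tco": 4_100_000, "deploy_months": 15}
managed = {"tco": 2_500_000, "deploy_months": 4}

diy_revenue = revenue_over_horizon(diy["deploy_months"])
managed_revenue = revenue_over_horizon(managed["deploy_months"])

print("DIY revenue over 3 years:    ", diy_revenue)
print("Managed revenue over 3 years:", managed_revenue)
print("Revenue upside from launching sooner:", managed_revenue - diy_revenue)
print("TCO difference:", diy["tco"] - managed["tco"])
```

Most of the gap comes from the 11 extra months of revenue rather than from the lower cost, which is the point Jim makes about time to market.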
Jo: Thanks, Jim. Yes, that’s some pretty compelling numbers there. I think even though
people do talk about agility and time to market, I think there is also a tremendous amount of interest, even though people might not want to say it so publicly, around how they change the way they’ve currently been paying for databases, generally data warehouses.
They want to see alternatives to the traditional way that they have been locked in in terms of licensing and maintenance fees and there’s huge hunger for alternatives as well.
I agree with you. I think the agility thing is super exciting and the fact that you can actually make money faster if you get to market faster but I think people are hungry to hear alternative business models too. I think you have an interesting story there.
I want to take us back a little bit though and sort of where we’ve come from in this market and why maybe there’s a lot of promises being thrown around out there, why we think that actually we might be able to do things better this time around, if at all.
Ron, I kind of punt this one to you first. How companies kind of traditionally perform business intelligence and why hasn’t this worked? We’ve had sort of 20+ years, 20-30 years at this now. Why hasn’t it worked?
Ron: Sure, Jo. Well, definitely what we see is the classic model of business intelligence is
that you anticipate all the needs that you have of information up front and you organize databases to give you very fast, easy access to that information.
The reality is the cost and complexity of integrating the data and feeding it through and getting it out in front of the business means that the classic BI model has been fairly limited in the range of analysis and questions that can be answered. The more established, the more slow moving the space, the more successful it’s been.
So drill-down into all the fine cost structures for accounting has been an early success for business intelligence across organizations. The more you try to have an ability to use new data sets and you try to analyze new problems in new ways, and you want to do more exploration and support a more nimble approach of how do we launch new product and how do we understand what’s going on with our product in the field and our customers, the less effective the idea of, first, let’s get all the information together, organize it all and tell us up front what analysis we’re going to do has been. It’s too costly. It takes too long.
A big shift is we see big data as a way of embracing the notion of agility, being able to, instead of anticipating all the questions up front, let people explore the data, get some insights from that exploration.
Then once they’re getting results, once they’ve done some tests and found something useful out of the data and insight, then invest more in standardizing, regularizing and ultimately refining.
It represents working with the business, identifying what’s possible and then incrementally over time having more and more structure to support more and more standardized access to the data.
It’s a big shift using big data technologies from traditional business intelligence and one that lets you have a lot more responsive, nimble way of looking at information.
Jo: Right. Dan, Hadoop has definitely lit this market on fire over the last sort of three to five
years and NoSQL in general because of the sort of [inaudible 16:23] approach that Ron was just talking about. Do you think that it can address this issue of knocking down some of the technology silos?
We’ve heard people say that maybe Hadoop is going to be like this sort of data bit bucket or data lake or data kind of dumping ground. I’ve heard all these different expressions where it can be this almost sort of central repository.
Do you think that’s really true, that there is a potential here to break down some of the silos between the traditional platform?
Dan: I do. I think that kind of getting back to what Ron was saying, one of the big problems
that companies have had in doing truly an analytics-led business and having full BI to exploit the potential that they see there is that they’ve had all these different silos like you’re talking about.
They’ve tried to do it top down, anticipating every potential need and let’s do one, huge, massive multiyear project and make it work. Typically they start running into huge problems in data formats, different systems, different cultural problems with fiefdoms and things like that. It tends to die under its own weight.
Where Hadoop is different is it's kind of coming into the enterprise along the lines of the same way that Linux has come into the data center, kind of a little bit of a tortured analogy but it’s coming in as more of a targeted and granular use, kind of a bottom-up.
Because of that, I do think it has the potential to change the game because you can use it for just the masses of unstructured data and it doesn’t require a huge months-long or years-long project in order to get it in. It’s a try and buy. I think that’s going to prove to be very helpful.
Jo: Jim, why these projects in the past, why have they taken so long? Is it because the
technology’s been so hard to implement? Why traditionally has this been such a slow and arduous process?
Jim: I think when you’re talking about the infrastructure that’s required to run your business,
you want to make sure you make the right choices. When you embark on, let’s say, an enterprise data warehouse project, in the 10 years I worked at Teradata, it was a make or break company decision, huge investment.
I think part of the delay or the time it takes has to do with the fact that people want to make the right choices so there’s a lot that goes into the analysis, the testing, the trials.
Then, I think it has a lot to do with the technology. Let’s face it, relational database, as powerful as it is, does require a logical, physical data modeling exercise and then a huge amount of ETL and you’re months into the project before you issue your first SQL statement, SQL query.
There’s just traditionally, because of the infrastructure itself, a kind of locked in time period that’s fairly lengthy. Then lastly is the political friction within the organization that people tend to underestimate that tends to be the largest and most powerful delay maker.
As any C-level in this audience of this webinar will agree with, if you ask a question, you pose that to your organization and it involves including a new data element in your data infrastructure, you could be lucky to get a report that incorporates that new data element six months from the time you’ve asked the question.
Just purely because of who you have to go to and who owns what piece and who controls the DBA of Teradata versus the head of BI for Cognos and the head of ETL with Informatica and all those changes need to be made, boom, six months later that report comes out six months after you needed the answer.
Jo: Right. I would say though that the early rise of the SaaS marketplace people kind of,
shadow IT and people going around the IT organization and using SaaS services didn’t exactly endear those people to the IT department. I wonder how that might be changing a little bit now.
Ron, what can be done about some of the political challenges of you have a certain amount of infrastructure internally, there’s obviously all kinds of options now with different kinds of SaaS and managed services. How do people, in your experience, manage that politically, that shift?
Ron: Absolutely, Jo. Certainly our company is a services company. We help with roadmaps
and implementation on these big data systems, the engineering and data science.
The art is really getting people aligned, that fundamentally it’s around getting the business excited about creating value, how can it move the top line, drive competitive advantage, then on the IT side, how do they support that.
What you often find is you’ve got both parties very interested in creating value but often talking a different language.
You’ll see it’s important that the business and IT teams are working together to innovate. So often you’ll see one or the other complaining about the other organization where having a dialogue, having a way of using services and capabilities to get quick value can change a lot.
We think that having services, having some of these new big data standards coming in that let you go faster with shorter cycles of iteration make a big difference because the best way to drive that communication and the best way to drive a partnership is to get results quickly and then to iterate on that success.
It’s having some trust in kind of setting up an environment where you can have quick wins that you can build on.
Dan: That means starting one application at a time, ideally, right, Ron?
Ron: Yes, absolutely. We want to start with a success and then incrementally adopt. Go
from that application to doing a few more, building on the success of that, cross-pollinate the skills from the team both on the business side, engineering/technical side, the analytic/science side, taking those successes and building them out.
What you see is when organizations achieve that, then the next step is to get a lot of excitement. In fact, often you have to reel it in a little bit. There’s so much excitement in the organization on their first success that you still have to pick intelligently or what are the next bets that you want to make to get the most value and keep the momentum up.
Dan: Sorry, this is Dan. It’s grab that low-hanging fruit and take the easy wins and use that to
Ron: That’s right, absolutely.
Jo: Yes, people want to see proof, right? We’ve been through many generations of
technology now and people have been burned in the past and stuff so it’s kind of like, “Let’s do a pilot and really show me.”
I launched a poll just while we’ve been chatting here, guys, on the biggest issue with traditional BI tools, which we’ve been talking about here and the results are coming in.
Dan: Yes, I think when you say “BI tools”, everybody jumps to MicroStrategy, Cognos,
Business Objects, the new guys, Tableau and maybe even the SaaS players as well, GoodData, etc. I think when you think of business intelligence you’ve got to think of the full stack end-to-end, really.
I think that’s where a lot of the technologies that we’ve given birth to here in Silicon Valley, from the Facebooks to Twitter, etc., are really disrupting that entire stack and enabling even the legacy tools to an extent that should be disruptive to the internal organization, I mean, legacy presentation tools, like Cognos.
We see customers who still have a huge footprint of MicroStrategy, Cognos or what have you, but by providing more intelligent data analyses that feed into those systems, we can create a huge amount of change within the organization.
Jo: We’ve got the results in now. Ron, check out the results here, 43% of the audience
saying that it takes too long to get valuable insight out of the traditional BI tools. Any thought on that? Kind of what we’ve been saying.
Jim: Time to market and to Ron’s point, pick one application, show value and build on that. I
think there are so many boil-the-ocean projects out there and it’s very important, I think, to provide your organization new toolsets to allow them to experiment and kick the tires with but you also have to give the organization direction.
Pick a use case with the most amount of impact and the least amount of political friction and show a win and build on it.
Ron: Jim, to that point, what we often see is it’s so important to have two things. One is to
have the business priority of what you’re trying to achieve with sponsorship, how it aligns with goals and something that’s a new opportunity, where here’s data we’ve never been able to use.
Here’s a combination of data sets we never could combine. Putting those together, you’ve got new opportunities in data and a business case that really aligns with strategic objectives. We really see that as the recipe for the low-hanging fruit that gets the results.
Jim: Yes, we were meeting with a very large pharmaceutical and we had all the major
stakeholders, CMO, CIO, head of apps, etc., in the room for two hours. In the first hour, all we did was talk about use cases.
Matter of fact, when I opened, I said, “For the first hour we’re not going to talk about anything about big data or infrastructure,” and they all looked at me funny.
After the first hour, we picked one out of five use cases, we knew what the conversation was going to be about. Talking about information and data and big data technology was easy after that.
Jo: Yes, so I think that’s sort of a clear picture of the challenges and where people feel
they are in terms of their existing infrastructure. All right, so we know that there’s all this great Hadoop technology out there.
There’s 20 to 30 different flavors of NoSQL. There’s all kinds of brand new BI tools. Super, super awesome looking desktop tools now that you can use. Dan, why can’t I actually just, you know, I have an IT organization, I have DBAs, I have awesome people internally, why can’t I just build this new, big data infrastructure myself?
Dan: If you didn’t have to do everything else to keep the lights on and keep all the
systems running from a data center perspective, you probably could. If you had a long runway to where you had to prove ROI, sure, you could if you had the time. But, like we said several times so far, time-to-solution is very important.
Capturing that market opportunity is very important and most data centers [inaudible 28:28], time and again that’s where a data center bit has some slack capacity in terms of personnel and systems.
The vast majority of data centers don’t have any slack capacity. They are just working to keep running on the treadmill to keep everything going. That’s the real difficulty in trying to do it yourself, at least in my view.
Ron: And just to build on that, definitely we see that the skill set issue is a really important
one. A lot of our customers underestimate the magnitude of the shift. They think they can take skills from working with traditional relational databases, kind of mature packaged application models for how to develop these applications, how to do the analytics.
The reality is, as we know, there’s a lot of value but that these technologies are new, they’re less mature, they require more than a little bit of learning from the teams in place to ramp up, that there’s a deep need for retooling and tremendous, it’s not just the time to market issue, but it’s also a risk of failure issue if organizations try to do it themselves.
We think using the best technology, getting the best services that provide as much out of the box is really important and so is getting the right kind of help of experts that can work with the team to succeed. Of course, that’s why we created 'Think Big' to provide that.
Dan: I would also add real quick, I would also add that these new technologies that do
MapReduce, they’re not a panacea. It’s not going to be your only solution. We’ve mentioned this before too.
You’re still going to be using a lot of the existing relational databases and other tools that you’re still using and you’re going to need the employees that are experts in that to remain experts in that and bring in some help in integrating the new stuff.
Jim: I think that’s the big question for C-levels out there is, “Do I need to completely top
mine my organization? Do I need to replace? Do I need to hire these new crazy smart big data experts and try to pilfer people from Google, Yahoo!, Facebook, etc.?” I think that’s a tough question to answer.
My answer for it is add part expert like Think Big and definitely leverage your internal expertise and legacy infrastructure and attempt to train. Then, yes, to the extent that you can find this talent that’s so hard to find, I think what we’ll have to do, the reality is we’re going to have to bridge a gap by using the experts and the toolsets from companies that make it easier.
Our whole focus at Infochimps is really to take all the complexity of three types of analytic services and package them in a way that can accelerate your existing staff’s ability to answer your internal business [inaudible 31:33].
We still believe that you can’t do it just with the people you have but with some help from us and help from folks like Think Big, you can guarantee success and not be in that category of half of the projects that fail or 100% of the projects that went over budget and over schedule.
Jo: Ron, what would you say when you’re out there kind of advising customers? There’s
obviously some optimum use cases for big data in the cloud versus kind of building on prem. What would you say are the sort of ideal use cases for a cloud-based big data service?
Ron: Sure, I think a couple of the characteristics that are the most favorable to working
with cloud are, one, data residency. There’s a center of gravity to large data sets. If you have a large data set already in the cloud, that’s very favorable for doing big data in the cloud, whether it be in public cloud or in a co-location facility like Jim talked about for more of a private cloud. That’s one.
A second is variable processing needs. So if you have needs to provide a ramped up capacity quickly, that’s an area where you see a lot of value in cloud. An example of that is we have a number of customers of ours in the financial industry, the technology industry that have large data sets.
They’ve traditionally had a good business around providing a core capability, sometimes providing data to their customers but their customers are saying, “We want to be able to get analytical capabilities from you.”
Being able to provide analytics as a service to your customers, to add value, whether it be through benchmarking, whether it be through simply customers bringing some of their own data and combining it with a high value big data set that’s organized to make analysis easy, is a great example of a cloud use case.
You don’t want to set up a large infrastructure on premise and hope that you’ve built enough and reinvest it based on business plans. The elasticity of cloud is really compelling in those cases.
The case where we see the biggest resistance to cloud is, of course, around privacy and security, that organizations feel like they don’t have control of data. So pure public cloud, that’s the biggest impediment we see but having hybrid solutions where you can have some of the best of both could be very promising.
Jim: Yes, I think, Jo, if you advance to your next slide on the rise of managed cloud
services, I think when you define cloud in the way that we do at Infochimps, which is your trusted data center infrastructure, it’s not public cloud, it’s not multitenant, you’re not in a noisy neighborhood.
Also, you don’t have to worry about the petabytes being secret added into that cloud service. All of those things go away and it looks and smells just like your own data center infrastructure.
Then I think you can focus on taking advantage of all the aspects of what cloud means, elastic, you can expand quickly, you can experiment fast, you can pay as you go, and at the end of the day, you can find out quickly if your thesis around a business use case is accurate.
Then it can guide your internal initiatives within the enterprise and give you some further direction for your own IT groups that are spinning up their own big data platforms internally.
Jo: So we're starting a little to touch on the fact that not all cloud services are created equal.
Ron, there's a ton of options out there, everything from Amazon Elastic MapReduce, which is sort of raw Hadoop as a service, to then other players like SnapLogic and Oversight to SaaS players focused purely on helping people.
The new term is "insight as a service". Then Infochimps is kind of wrapping up a lot of this complex open-source software in these different infrastructure layers of big data around your streaming data, NoSQL and Hadoop.
In your mind, how is that going to sort of manifest itself on the enterprise side in terms of people buying here, like what sorts of enterprises are going to be buying Amazon's EMR versus something from Oversight versus a service from Infochimps? How do you see this sort of space evolving here? What are the categories and how do you see it shaking out?
Jim: Yes, that’s a good question. Are you asking me, Jo? Or are you asking the other
Jo: Actually, I was going to ask Ron first of all because he’s out there talking to a lot of
customers so I’m curious to get his input first.
Ron: Let me comment on the different kinds of cloud offerings that you articulated, Jo. I do
say that in general what we see is that the space is one of great innovation and so cloud offerings that are open, that allow customers to take advantage of great new capabilities and work with standard APIs have a lot of promise for big data.
You look at Infochimps, where it's integrating a number of established standards and emerging standards, that has a very different value proposition than some of the proprietary cloud offerings that try to provide data science magic in a box.
"Upload your data and magic answers come out here", which is a great story but isn't something that we believe is really being delivered.
I think there's a big difference between the different layers. People that are trying to provide some kind of software as a service for big data, building packaged applications in the cloud, are typically working on a very narrow focus with very constrained data problems that don't give nearly the same leverage as something that lets you build and assemble more of a custom application from best-in-class components.
We think that there's a big difference between that and platform-as-a-service and infrastructure-as-a-service offerings.
We see Infochimps and Amazon as both offering platform-as-a-service offerings that are high-value, that can allow you to build these kinds of analytic applications that really create differentiated business value.
But we think that the prepackaged analytics and prepackaged applications are generally fairly point capabilities that aren't going to create nearly that kind of strategic value, and we haven't seen a lot of adoption of the latter, of the point solutions.
We see a lot more interest in the enterprise in platforms that can support a range of big data capabilities because of the importance of that in the architectures.
Dan: It’s like building an appliance in the cloud and I don’t see an awful lot of adoption of some of the big, dedicated data analysis appliances, some of the large ones. There's some adoption but not as much as the vendors anticipate because there's a lock-in there and the proprietary nature of them, in a lot of cases, forecloses customer options down the road.
Jim: Yes, just to echo what's been said, I think when you provide a packaged application that underneath is taking advantage of an integrated big data stack, that's great for a long-tail customer, for somebody who's small, who has neither the desire nor the ability to build their own applications.
It just depends on who you're talking to, but if you are talking to the Fortune 1000, clearly they have the expertise on what their applications need to be and a lot of application development resources internally. What they need is access to the standards and big data technologies in such a way that it's easy for them to consume those and then help them build their applications.
I think when you think of "not all cloud services are created equal", the first thing that our customers talk to us about at Infochimps is, "Are you an Amazon-like service? Because we can't be in the public cloud." As for the public cloud label, if you sat down over dinner with Werner Vogels, you would hear a lot of the same things that he's doing that any CIO is doing for their infrastructure in terms of security.
But the perception is, "It's out of my control, it goes down, even though my stuff goes down too." It just gets this label of "I will not touch it if I'm a Fortune 1000 company." That's why we, Infochimps, have moved into the same data infrastructure with the same security risk assessment checklist and taken that issue off the table.
The other 'not created equal': some big data cloud services, other than the vertically integrated ones, are those that are just purely infrastructure as a service. That's just "I'm going to give you Hadoop as a service, you can log in, you can do things like EMR", but that's at the very basic level.
Because we believe that application developers need more than just Hadoop, we've broadened the number of big data services with a number of big data technologies.
There are some people who do just ETL as a service and are thought of as a big data service. Some people who are just doing analytics who are thought to be big data as a service.
There's a lot of flavors of what big data can be and I think at the end of the day, you need a pretty broad toolbox. You need those tools and that toolbox to be founded on the 'best in class' with the best standards and then you need to make those tools easy to consume and use by your internal organizations.
That's our mantra and I think that's what Think Big is doing for their clients. Those clients that embrace that philosophy are the ones that are going to be successful.
Ron: Jim, that's a great point. A lot of times people think that Hadoop is everything in big data, and while it's a really important standard and capability, the other capabilities, NoSQL, streaming real-time data, integration with external logs, unstructured data, as well as relational systems, are also really important.
Big data is about having the right architecture to support your use case and it does tend to mean integrating multiple standards to create value.
Jo: We've touched on this a little bit, guys, but I want to give us some more time here to talk about the security issues because I think, Ron, you mentioned it, Dan, you mentioned it and I think, Jim, you guys have quite a specific story here to tell.
What are the options when you have decided that you want to sort of get to market faster, you like some of the art of the TTO story? What's the security situation here? What's going to happen in terms of protecting my intellectual property and data when I'm in a managed cloud service?
Jim: Is that to me, Jo?
Jo: Yes, Jim.
Jim: Yes, so I think we take security very seriously. To be frank, Infochimps was born in Amazon, so as a company we cut our teeth on Amazon Web Services. But when we decided we wanted to address larger issues with bigger customers, we had to be very thoughtful in terms of which data center providers we would pick.
I remember the first day we walked into the first data center that we launched into, and you've got the electronic eye that's talking to you and you get the retinal scan and you've got the armed guard and all of those physical data center security measures being taken.
Then on top of that, it's making sure that you are still addressing the same requirements of identity access management, SSO, Active Directory, all the security pieces that need to be put into place so that when you are launching your cloud service and you're supporting analyses of your customers' data and that data is highly sensitive, you haven't unchecked any of the boxes in that risk assessment list.
From our perspective, we feel like we are as secure as our customers need. As a matter of fact, we wouldn't be deploying with large Fortune 100 companies, large Wall Street banks, large telecommunication companies, etc., if we hadn't provided the level of security that's required.
I think that's what it is, basically night and day. Public cloud is perceived to not be secure and private cloud and virtual private cloud can be made to be as secure or more secure than what you have.
Jo: Right. Sort of touching on the point about what are some of the key questions to ask your potential managed services provider, one would be can you mirror my internal security and governance and policies, I guess?
Jim: Yes, I think the first question would be are you running on top of Amazon? Are you just this layer on AWS? If you are, then most of these large customers are going to say, "Thank you, I've got to do something else." That would be question number one. Dan, you were going to say something?
Dan: No, I totally agree with that though. I mean, it's the Wild West out there. Everybody's making claims and you really need to, as a customer, drill down into the specifics of exactly what are you doing in terms of security, SLAs, performance and availability.
Jim: Absolutely. None of our customers are going to accept 99.95.
Jim: It’s just not going to work.
Dan: I would say that even, there are the early adopter enterprises that are starting to do more in public cloud, but what we see is a lot of them too really value the ability to have portability. It's not an uncommon scenario that an enterprise might start developing a proof of concept in the public cloud, or might intend to do that, and then there are further governance discussions around where the application ought to run.
Dan: Building something that is portable in the public cloud is a much better bet than building something that's locked into proprietary APIs in the public cloud.
Jo: Yes, Ron, you also touched on this a little bit earlier about sort of the optimum use case being data that already resides in the cloud. There’s obviously companies, too, that want to move some of their existing apps or they want to actually move some data and do processing in the cloud to just be able to do it way faster than they could if they had to start all the infrastructure up internally.
Is there advice there about actually getting the data into the cloud in the first place? What are your options there, really?
Ron: Sure. Well, certainly a lot of our customers at Think Big have distributed data centers that capture data. In some sense, many of the big data applications, whether it’s sensors on devices or it’s web and mobile logs, advertising logs, will be captured in many places anyway.
To some extent, there is always a question of moving data into a place where you can centralize it for analysis. Some of what is distinguishing there is do you have an environment where it’s easy to have high-bandwidth, fast data movement.
That becomes a factor when you’ve got data in Amazon’s data centers, where moving data among those tends to be easy.
Likewise, if you have other hosting providers, they tend to have high bandwidth, low cost connectivity between data centers to make it easier to back-haul the data. So that’s one factor.
There are, of course, a number of software technologies such as Kafka that’s integrated into Infochimps for allowing you to feed unstructured logs into a common place to make it easier to deal with some of the technical issues.
You’ve got both networking concerns in terms of what’s the cost and latency and bandwidth, as well as software architecture issues in terms of how do you integrate different data sets from diverse places into a common location in the cloud.
Jo: Right. Jim, I think you mentioned too, like if I was JP Morgan Chase and I’m in lower Manhattan, that you could actually bring your service to my data center potentially too.
Jim: Yes, we are working with very large customers who require us to operate our managed cloud service within their data centers. We have projects currently that force us to, well, I say “force us”, move us into London and Tokyo as well as North American data centers for clients.
I think those are customers where there are no options beyond their own four firewalls, so to speak, and I think we're happy to do that for applications that have great impact and obviously important economics that can justify the cost associated with providing that level of managed service.
It's easier for us as a company to operate within the environment of, say, a Switch or an Internap or a Telx, Savvis, Terremark or the like, because when we service one customer, we have the opportunity to service multiple customers.
But, I think we are very sensitive to the need in terms of security and availability and we will address those by going into your own data center.
I think some of the security goes beyond into availability and so there's a lot of discussions around fault tolerance with the level of customers that we're working with.
We as a company have embraced the use of standards but also applied some of our own know-how and our own ability to kind of raise the bar in providing no single points of failure, 100% fault tolerance, and guaranteed delivery of data and analysis and insights. Those are some key things as well.
Jo: Dan, Ron, I want to get you guys to sort of look in the crystal ball for a couple of minutes here and then we have a long list of questions too that I want to try to get in from the audience, at least a couple of those.
Dan, to you first, what do you expect to happen in terms of adoption for big data services in the cloud, particularly in the next sort of 2 to 3 years? How do you see this panning out?
Dan: I think more and more companies are going to dip their toes in the water there. I believe that you're going to see more big companies do it. That's sort of the question out there. Is it for big companies only? It's not for big companies only. In fact, a lot of small companies begin their big data experience with the cloud because they can't afford to do it any other way.
Big companies will do it as the cloud experience and the offerings mature and they start to get, like we talked about so much on this call, those service capabilities that enterprises have to have, starting with security, going on through availability and into performance.
With those guarantees in place, big companies are going to be more trusting and more willing to do this. What's really going to drive them through it is that fast ROI. That's going to make an awful lot of decisions.
Jo: Ron, how about sort of trends from your perspective? Is there like a killer use case that you can see cropping up here that’s going to trigger, sort of, big adoption on the enterprise side? How do you see the sort of adoption curve here and what do you predict in terms of the trends?
Ron: That's a great question, Jo. Certainly, I think that we see broad adoption across industries and use cases but one megatrend that we think is really the elephant in the room, if you will, is the generation of machine data.
When you think about it, increasingly companies are shipping products that have software on them and sensors, so all the logs, all the configuration from the software, all the sensors that are going out, whether it be industrial applications for equipment, machines, factories, smart software, consumer devices, energy, smart grid, we see that the machines, the use of machine data is really a critical area where companies are competing.
How can you provide more value to your customers with analytics? How can you build smart products? How can you use direct data about what's happening with your products out in the field to make them better, to gain competitive advantage?
We think that above everything else, that's the trend that's really going to drive the most value and the most adoption across the economy in big data.
Jo: And then, Jim, from the enterprise IT buying side, the CIOs and CFOs, what’s going to sort of differentiate, who are likely going to be the winners and losers here? You weigh in there, really.
Jim: Yes, I think the winners are going to be those who embrace a hybrid cloud strategy. I think everybody in the audience, everybody on the panel knows that there are the thought leaders who are embracing a hybrid cloud adoption, not just internal private clouds, but also the use of external cloud vendors.
We're big advocates of that. We don't want to displace the internal projects that are occurring within big data but we want to augment those with a hybrid cloud strategy that supports the organization's ability to show value quickly and to further fuel the initiatives going on internally.
I think making your big data infrastructure elastic and truly cloud-like is not easy, so there's going to be a lot of work there. My crystal ball is that's going to get better, and it's not just going to mean that there's going to be Hadoop as a service internally that's multi-tenant and easy to expand and consume, but it also means there's going to be NoSQL and stream processing, real-time types of analytics services that enterprises will be able to deploy for their intelligent applications.
We're there to help you learn how that works now, today. We can get you up and running in 30 days on any size, any volume, velocity or variety of data set.
Jo: Great. I want to grab a couple of questions here from the audience. There’s one on security, Jim, specifically to you.
What security measures does Infochimps provide to customers for multi-tenancy as Hadoop currently does not come with encryption?
Jim: There's a couple of questions there and a couple of comments, encryption and multi-tenancy, so let me kind of address those separately. I think any platform has its various abilities to support multi-tenancy.
I think we all know that when you have a Hadoop cluster and somebody goes in and issues a really poorly formed MapReduce job, you can bring the whole cluster to its knees, and every other tenant who's trying to shoot MapReduce jobs is going to be screaming and pounding their fists.
So one of the things that we've done as a company is we've provided a very simple way to create on-demand Hadoop clusters in our cloud that can operate against the exact same Hadoop file system or data sets, and they're tuned specifically to the interests and needs of each of those people who are requesting those clusters.
We've kind of created this pseudo way of making multi-tenancy work really well for our customers without having the pure kind of version of what people think is multi-tenancy.
Then in terms of security measures, with every customer that's deployed on our platform, we share a big data security practices guide that goes through our employee lifecycle monitoring, information, communication practices, physical security, environmental safeguards, configuration management policies, business continuity management, our backup policies, etc., etc.
It is quite thorough, and even though we think we've gone through what are the best practices, some of our bigger customers add to our lists, so I think one of the key messages to our audience is we will do what's necessary to address the security measures you require as you consume our cloud services.
Jo: Great. Listen, there’s a lot more questions here. Jim, we’re going to send this list of questions over from the audience to you. So folks listening, you will get all these questions answered.
To the panel, Ron and Dan, thank you very much for joining us. This has been a great conversation. Cheers, guys.
Jim: Cheers, everyone. Thank you.
Ron: Thank you.