Roger: So, I'm Roger Magoulas from O'Reilly Media. We're going to talk about data as a service. I'm going to start out with just one bit of context setting that I think will be helpful for the discussion.
When I think of the data space, the data ecosystem, I think it's got three major parts. The first part is data acquisition and management: how you get the data, how you organize it, and so forth.
There's another part which is the insight: how you make some sense of the data, how you work it, how you use it back in your company. Tying those together, and I think this is almost the most important part, is culture: your company's ability to understand data, to process it in the correct way.
And as we talk during this discussion, I'll probably bring up some culture issues, because I think they're really important.
I've been around this data space long enough that I've heard people say something to the effect of, "You know what, our company really needs to buy us a Hadoop." And I always think, that's going to be a project that goes nowhere, because they really don't know what it is they want, and they really should be putting it in a different language and framing it in a different way.
So, having said that, I'm going to go through introductions. What I'm going to ask everyone to do, because I think this is helpful, is to not only talk about their affiliation and what they do, but how they got where they did: what their academic and work background was to get there.
So, I'll start. I'm Roger Magoulas. I'm at O'Reilly Media; I've been there about ten years. I do quantitative and qualitative analytics to try to figure out technology adoption, technology adoption in a broad sense, as in "Python is a more popular language," not "this particular product is better."
I have a computer science degree and an MBA. I started doing data warehousing in the 90s, then I started working with a PhD in math, got into machine learning, switched over to the data-sciencey stuff, and found it really interesting. So, that's my background. Dhruv?
Dhruv: Hi everybody. Thank you for coming. My name is Dhruv Bansal. I'm one of the co-founders and currently the Chief Science Officer at an Austin-based startup called Infochimps. We make a big data platform that runs on the cloud and makes it really easy for folks to get started with big data.
My own background may be a little bit similar to yours, Roger. I started out as a PhD student in the physics department at the University of Texas; my background's in math and physics. I met my co-founder, Flip, there, so we're very much an engineering-driven, hard-core nerdy company. But I think what we've realized is that not everybody finds data as easy to play with as maybe we do, so there's been a great opportunity for us to take our expertise and bundle it up into something that makes it easy for other folks to get started.
Roger: Okay, Kerem.
Kerem: Kerem Tomak. I'm the Head of Marketing Analytics for Macys.com. I hope you have an hour to go through my background; how I came to where I am is a long path. I am from academia. My undergrad is in math and economics, with a [inaudible 00:03:19] metrics master's and an economics PhD. I decided economics and math were too abstract for me.
Then I switched into information systems, played around with machine learning and computer science, and said, okay, I did the PhD for five years, I'll put it to use and be a professor. After seven years of dabbling in academia, and again getting disillusioned that I was doing something not related to real life, I jumped ship and joined Yahoo!. In the Yahoo! Search Marketing group I did a lot of pricing, price optimization for sponsored search.
Then I did a bit of consulting for Wal-Mart stores on their price optimization algorithms for the marketing and merchandising divisions, and now I'm at Macys.com building their big data analytics capabilities, taking them to the next stage.
Ron: Great. I'm Ron Bodkin, founder and CEO of Think Big Analytics. We're a professional services firm with a mission of helping the enterprise create value out of its data. We're purpose-built to work with big data and to really enable companies to succeed. And how did I get here? How did we come to start Think Big?
Well, my background: I did math and computer science undergraduate at McGill, then a master's at MIT in the clinical decision making group. Then I went on to do enterprise consulting around data warehousing for companies like AT&T Wireless and Microsoft. I went back to MIT in the PhD program, where I spent one year until I led a team into the finals of the entrepreneurship competition and decided that this internet thing was pretty exciting.
So, I joined an incubator to be a founding CTO of a company called C-bridge that grew to about 900 people and IPO'd in December '99, with a mission of helping the enterprise use the then-new technology of the internet: how do you get business value out of that? And I see an amazing number of parallels between what we were doing back then and what's available now with big data, but the good news is that big data is under-hyped compared to the internet. Right?
Because everyone can see a consumer application and understand, "Hey, if I go to Amazon.com I can buy something. That can really change things." Whereas there are press stories about big data and people say, this is interesting, but I don't know what it means, so they kind of put it down. So there's a lot of awareness that something's happening, but the big trend is really that now we've got all this data, we can create value out of it.
So, following on from that, I ended up working at a company called Quantcast as the VP of Engineering. Quantcast is a pioneer in big data and a sponsor here, and it continues to do super well. We were among the first companies to put Hadoop in production back in 2006, and we built our own NoSQL technology right around 2007, when there was none available on the market.
We did advanced data science and machine learning and out of the experiences leading the teams that did those things, I felt there was a tremendous opportunity to bring these approaches to the enterprise, to more established companies.
So, at Think Big, we're very value focused. We work with companies to integrate best-of-breed products, platforms, services, and technologies to create differentiated value. And we are working with some fantastic customers, some of the biggest financial, retail, and technology firms, to build world-class solutions and to really help them succeed.
Derek: So, I'm so outclassed. I'm Derrick Harris, a Senior Writer at GigaOM, a technology media company. We have a blog, conferences, a research network, kind of the whole media package. So, I'm a journalist and a lawyer by training, and my introduction to technology was kind of a trial by fire: I was in journalism school covering supercomputing and high-performance computing, and that has since transitioned. Now I cover the big data space day in and day out. So, I am slowly teaching myself through Coursera and Udacity and edX, right? I'm honing my stats skills and my programming skills slowly but surely.
Roger: So this was by design; it's really a good segue into talking about data as a service. Because I think a lot of this started with a lot of PhDs trying to make sense of big data, and if you haven't read "The Unreasonable Effectiveness of Data" by Halevy and Norvig, who are both Google guys, you really should. The basic premise there is that when you have a lot of data, your algorithms don't matter as much; data kind of trumps algorithms in a lot of scenarios. And nothing creates data like the web, and all the sensors we have now, like our phones and all that.
And I think one of the changes going on, one of the democratization issues, is that you don't need to be a PhD anymore, and I think that's why data as a service is becoming a way to start bringing it down. You don't need to win a Kaggle competition, and I hope everyone knows what Kaggle is, the data science competitions, to actually get effective use out of your data. So, I think that you should feel bad maybe about being a lawyer. But . . .
Derek: I'm not licensed. I never took the bar exam. So, maybe I shouldn't say.
Roger: And I have an MBA.
Derek: So you can make all the lawyer jokes you want.
Roger: So, I have the same kind of embarrassing academic thing.
Ron: Roger, I'd point out, though, that while it is certainly true that you don't need the sophistication in, say, machine learning techniques, in fact the discipline and the process of analyzing the questions, framing your metrics clearly, and knowing how to let data speak is as essential as ever.
Roger: Well, this is great, because we talk about this all of the time. There are some people in the data community who say you don't need to know anything about the subject: you just give someone some data and they'll be able to figure something out about it. And there are others who think you need some domain expertise. And I am of the opinion, and this is a little bit of what we're talking about, that designing an experiment is an art.
It's really hard, and if you care about your health at all and you read the paper, you see, within weeks, completely opposite claims about what you should or shouldn't be eating. Well, that's the design of the experiments. It's all the thinking about how to organize the data. Are there biases? Was it a Gaussian distribution or not? Maybe we can just, I don't know if anyone has more to add to that.
Dhruv: I think there's a comment to make here. This phrase that you used, the unreasonable effectiveness of data, goes back to a phrase from the mathematician Eugene Wigner, "The Unreasonable Effectiveness of Mathematics," right? His point was that it's amazing that you can cook up stuff in your brain and it somehow describes the real world. But there's a piece missing there, which I think is Ron's point: you can cook up all sorts of stuff in your brain that doesn't describe the real world, and it requires the discipline, intuition, understanding, and experience to be able to cook up the right sorts of stuff that have the unreasonable effectiveness. Right?
So, I think it's the same thing in data. Almost anybody can get themselves a Hadoop cluster, run a couple of jobs, and claim that they figured something out. And that's often the first step, to do something easy like that, but to the same extent you've got to have that intuition. You've got to have that expertise. It helps to have firms like Ron's come in and guide you when you lack that expertise right off the bat. Because it's not a panacea. You can't just throw something at it and expect to get an answer, because you'll get a bad answer. Right? That unreasonable effectiveness comes from combining big data with the expertise to really look at it correctly.
Roger: Yeah. We have an expression we use at O'Reilly a lot: the best analysis begs more questions than it answers. And that means you're always in experiment mode. I don't know what you all think, but when you're working with data, everything looks like an experiment.
In fact, when you kill a project, what we always say is you've lost an opportunity to learn. It's not whether you saved money or whatever. It's, "We're not going to learn something we could have tried to figure out."
Kerem: I think it's the PhD phase which gives you a way to train the quant side of your brain. The important thing that everybody should think about is how you go from your quant side to your artistic, more creative side, and combine the two. That's the rarest kind of skill set, the one we are all after, and everybody who's hiring, including me, knows the pain of finding the right talent: people who can make that jump almost instantaneously between left and right, and create while thinking in a structured fashion, putting whatever expertise they have into something that actually works.
Dhruv: That has nothing to do with big data.
Kerem: Nope. Nothing to do with data.
Ron: It's a combination: the requirements, in terms of technical skills, have gone up. But the reality is that the opportunity to create value has gone up so dramatically that it makes sense to adopt these technologies. It is true that massive amounts of data matter. We'll work with customers that have spent ten years tuning a model in SAS on a small sample of data over a limited amount of history.
And they've got that model, and you can bet, these are very smart teams, they have tuned that model to just about optimal under those constraints. But it's always striking how, when you bring it to the broader context, when you bring in 100 times as much data and 1,000 times the processing power, you can beat what people spent decades working on, because you have the power to do more with it. Right?
So, this is the break point: we have the ability to do dramatic things with data. But I always think doing data science is a little like an evil genie. A machine-learned model will answer exactly what you asked, and you often discover with a little sanity check that what it did is exactly what you didn't want. It's like, well, I answered your question. I gave you exactly what you asked for. You wanted clickers. They won't buy anything, but boy do they click. That kind of thing, right?
So you have to look closely and engage both your left brain and your right brain. And one other thought: on the question of whether you let the data speak bottom-up or apply top-down domain insight, just as in any pursuit in the real world, there's room for both. Right? But at the same time, whatever technique you're using, it needs to have a rigorous approach. You need a methodology that makes sense, and you need to vet your results.
And the other thing we see a lot of people not doing enough of is testing and getting things automated. Right? So, we believe it's not just about human insights and human decision making; this revolution is about increasingly automating your interactions and having testing approaches to get things out there in front of customers, in front of partners, in front of devices, and seeing what's working and course-correcting. So, you want fast feedback loops in the application as well as in the simulation and the trying of new ideas.
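Ron's evil-genie point, that a model answers literally what you asked, is easy to see in miniature. This toy sketch uses made-up user stats (the names and numbers are hypothetical, not from the panel): asking for the biggest clicker surfaces a user who never buys anything.

```python
# Hypothetical engagement data: (user, clicks, purchases)
users = [("ann", 2, 3), ("bob", 50, 0), ("cat", 9, 1)]

# What we asked the "genie" for: the user who clicks the most.
top_clicker = max(users, key=lambda u: u[1])

# What we actually wanted: the user who buys the most.
top_buyer = max(users, key=lambda u: u[2])

print(top_clicker[0])  # bob: clicks constantly, purchases nothing
print(top_buyer[0])    # ann: the real revenue driver
```

The sanity check Ron describes is exactly this second query: re-ranking by the metric you truly care about and seeing whether the answers diverge.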
Questioner: It seems to me that you're saying that with the increased processing capacity, it's easier to explore, it's easier to experiment, it's easier to fail more quickly than you're able to in a SAS mode, especially [inaudible 00:14:58] resources, and then explore more quickly. Is that the exercise now?
Ron: Did people all hear that? No. So, the question was: am I saying that with the increased capacity you can experiment more, fail more quickly, and learn more than in a traditional, say, SAS modeling environment? And I would definitely agree that you have to have the capacity to use larger amounts of data, to run more experiments, to provide more resources. Even in environments where people had a petabyte data warehouse, very few people, only the high priests, got to access the data. Most people didn't have access to that; they got a 1%, 0.1% sample in a data mart they could work with. So, it's a big shift that you have the tools to get the insight, and now you need the approach to go with it.
Derek: Yeah. You mentioned Kaggle before; speaking of which, we just had our inaugural GigaOM WordPress Kaggle competition, right? And I spoke with the winner last week. He just started a blog called Overkill Analytics, and his whole premise was: he's in the insurance business, and advanced modeling techniques aren't what he does on a day-to-day basis. So he took the approach of, I can get a machine, a big beefy server from Amazon, throw simple techniques at this, and just throw as much processing at it as I need.
I mean, he won, and that's kind of the thing. It's generated a lot of discussion, actually, this whole idea of how you actually go about it: like I said, just doing everything faster, and using that to make up for the fact that you don't have a finely tuned model that took 10 or 15 years.
Kerem: I think that's the underlying message. I read here and there that there's this concept that big data now enables us to not sample anymore. Essentially, the underlying technical process, the rigor, the way you're using the models, machine learning versus traditional SAS, that really doesn't change. What changes is that your 1% sample is 100 terabytes now. Right? So you have to deal with that.
But essentially the thought process itself is not changing; it's the time that is compressing. The need to get to the insight, put it into action, productionalize it, and actually make a change with it is compressed so much, in terms of time frames, that you cannot do things independent of the end goal, which is to serve the customer and provide the right product at the right time, in the right framework, with the right offer, to whoever is looking for it.
Roger: I think how it manifests itself is in how much you can stay in the flow of your analysis. We had designed our data structures to do something, and when we went distributed the first time, our queries went from ten hours, which meant one a day, to six minutes, which meant you could do a bunch an hour. That meant I could try a lot of stuff out. And that's a big thing: I'm able to iterate, stay in the flow of it. What if I throw this in? What if I throw that in?
Whereas sometimes with those SAS models, even on small things, you tweak them in this very careful way. We used to joke that when we went distributed and had all this great performance, we could be really stupid. You know? Because we could put weird, inefficient expressions in that would blow up on a big data set if you ran them single-threaded.
And so, I think, from my perspective, and see if anyone disagrees, the fact that you can stay in the analysis more and not be stuck with the sample, we often start with samples but then we go to the whole thing, means you're really getting the real results. You can focus on the problem rather than the infrastructure around the problem.
Ron: I think the biggest obstacle to being in the flow is that so many organizations rely on a queue of IT work to structure data in a way that it can be analyzed. Right? So, it can take weeks in organizations to say, "Let's parse it this way. Let's process it this way," and get it into a form you can even take a look at. And then, of course, the first time you take a look, you say, "Well, that's not really what I wanted. Let me put it back in the queue. Now I know what I want." Right?
So that's devastating to a process of creating value out of data. And to your point about doing regexes, approximate ways to start looking at the data, you can get something that gives you insight into whether there's value here very quickly, and then you can hone in and ask, "Is this exactly right?"
Roger: Yeah, that's how we start out. We have this program called [AutoFreq]. It's like a big concordance. We do a lot with words, so we just run frequencies on everything. What's the frequency distribution? Just that simple profile of the data makes a huge difference in what you might do next. And then we start getting into it. Which I think might be a good segue to talk about data as a service, because what I think happens with data as a service is you start getting to that point a little quicker. Right?
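As a rough illustration of that first-pass profiling (AutoFreq is an internal O'Reilly tool, so this is only a sketch of the idea, not its actual code), a word-frequency distribution takes just a few lines:

```python
from collections import Counter
import re

def freq_profile(text, top=5):
    """Tokenize and return the most common terms: a quick first look at text data."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top)

sample = "big data trumps algorithms when the data is big enough"
print(freq_profile(sample, top=3))  # [('big', 2), ('data', 2), ('trumps', 1)]
```

The point of the panel's remark is that even this trivial summary, run before any modeling, tells you what the data looks like and what to try next.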
Questioner: [inaudible 00:19:57] iterative.
Roger: That's right. Let's say you're in a marketing group. Kerem is in a marketing group, and, because we've talked before this, I know he's had the privilege of a pretty nice-sized team and so forth. But if you're trying to build it yourself, and you've got the IT bottlenecks you're used to, what are you going to do? Well, one of the things you can do is, in a way, outsource the things you're not all that good at, which might be things around sysadmin. And there are lots of flavors of data as a service we can define. So, why don't we just go through and talk about that? We'll go round robin on where to use it, when to use it, and who should be using it.
Dhruv: I'm happy to start on that. The history of my company, Infochimps: we actually started out with a slightly different model than we have right now. Right now, we are very focused on making these new big data technologies as easy as possible to use, providing a kind of packaged solution that lets folks who aren't experts get started.
Where we began, though, was much more on the data as a service side. Our background is as scientists, and one of the things that frustrated us was that if I wrote a piece of code, or a song, or had a funny video, it was really easy to share online; but if I had a large, terabyte-scale data set, there was no place I could put it to share it, or even sell it, or get value from it.
So, we decided we would create a data as a service marketplace where anybody could go. Kerem could go buy a bunch of cool retail data that he didn't have access to, so he didn't have to have internal IT go and produce it. And it was an interesting model. There were a lot of folks who loved the idea and said, "This is wonderful. I love being able to just go onto a website, sign up, drop a credit card, grab a few hundred million rows of data, and get them into my database. This is awesome."
Of course, what we really realized after being in this model for a while is that data as a service wasn't the end of the story. And I don't mean to color the conversation for everybody else on the panel, but really we got a lot of complaints. People would say, "This is cool. I love that you have all this data on your website. I really want to get started with it, but I don't know what Hadoop is. I don't really have the expertise to get started with this data. Can you help me?
Do you offer professional services? Is there technology that you can install or recommend that's going to help me get started with this data?" We began to realize that we were still thinking like scientists: oh, everybody knows how to do a simple regression on a few billion objects by spinning up a Hadoop cluster. That's just not the case, it turns out.
And it was surprising to me, because I'm foolish. What we realized is that beyond data as a service, there's another piece of it, which is the technology and the expertise, and that doesn't live in the data. It lives in the people that work with that data. It goes back to the unreasonable effectiveness, right? Just the data alone is not going to solve the problem. You've really got to bring technology. You've got to bring the expertise along with it. So, that's kind of where we've wound up on the data as a service side.
Now, that doesn't mean we don't still think data as a service is important. We still have a lot of data as a service components in the solutions we sell to our customers. We're always working with vendors that provide data to us in a really nice, clean, easy way so we don't have to go and source it ourselves, and it's still a huge part of the value we provide. But it's not the entire idea.
And I'm curious, actually, especially for your comments, Ron, and Kerem's as well: when you bring this data in house, that's not the end of the story. There's so much more that you need to do with it. How do you guys tackle those problems?
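For what it's worth, the "simple regression on a few billion objects" Dhruv mentioned really is map-reduce friendly: each node only accumulates a handful of sufficient statistics, and those statistics combine by simple addition. Here is a minimal local sketch of that pattern, with plain Python lists standing in for the chunks of data that actual Hadoop mappers and reducers would handle:

```python
def partial_stats(chunk):
    """'Map' step: per-chunk sufficient statistics for a least-squares fit y = a*x + b."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(stats):
    """'Reduce' step: the statistics add, so chunks can be merged in any order."""
    n, sx, sy, sxx, sxy = map(sum, zip(*stats))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Points on y = 2x + 1, split across two "nodes"
chunks = [[(1, 3), (2, 5)], [(3, 7), (4, 9)]]
print(combine([partial_stats(c) for c in chunks]))  # (2.0, 1.0)
```

Because the per-chunk tuples merge associatively, the same five numbers per node work whether the data sits on two machines or two thousand; that additivity is what makes this kind of fit "easy" at Hadoop scale.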
Kerem: So, yeah, I'm kind of blessed, in a way, that I have the size of team that I have. I have the systems support I need, so I don't have to open those large enterprise IT tickets for things to get done. I'll look at data as a service from the other side of the fence, because that's where I am: I'm not internal enterprise IT; I'm a dedicated analytics team that supports most of the functions in dot com as well as, now, in the stores.
So, I'll twist it around and say you can think about an analytics function within an enterprise as if it's a start-up entity, and that start-up entity provides data as a service for the rest of the company. This is not an IT function you're talking about. You're talking about a business function that exists within a business hierarchy, that has all the tools, technologies, and systems support it needs to react fast to the changes in the business, to the changes in the needs of the rest of the functions across the business, while IT is still there.
I'm not by any means saying you don't need IT anymore. You need IT for larger-scale enterprise connectivity, maybe real time, to serve the needs of the large organization. But as a small analytics entity that is becoming more big data driven, more machine learning driven, a self-learning and self-growing entity within an enterprise that drives the business, that drives top-line and bottom-line revenue growth for the company, we need to start thinking about analytics functions as entities that have to stay agile and stay focused on delivery for the business. That is where the value is.
And that's where data as a service should find its place: supporting that vision. This team within an enterprise needs to stay fast, agile, and responsive to change in the data internally, and if there's a need for data from outside that we don't have internally, or don't have the means to collect, that's where the additional value is, and the value is clear for us.
And then, obviously, the hooks and all the checks and balances need to be in place for us to be able to use data that comes from outside; no need to mention all the legal clearances we have to go through, and so on and so forth. As long as data as a service providers approach us and say, "Here's the value add. Here's how you're going to be able to bring this inside, turn it around, feed it into the business, and the business will make X amount of bottom-line impact," then by all means, that's a true value proposition for us.
Ron: So we certainly see that notion of having a shared service in an enterprise that creates value, and at this point we're not even talking about the technology implementation. We often see in our customers a leading group, like Kerem said, that runs ahead and starts to create value, often in online and marketing, because that is such a prominent place to get started. But then you start seeing maybe the risk modeling group, or a physical-presence group, or a different part of the organization start to realize that they also need value.
And that's where you start to get to interesting questions around centralization and decentralization, and where a service orientation can become interesting. Right? What are the things that your common infrastructure provides? What are the things that are best delegated out? We believe that all business units need the ability to take advantage of data and to create innovative ideas, but you want to have at least a community of practice sharing. And often data sourcing is an important strategic question in the enterprise, and that's new. Right?
It's typically been handled as an extremely tactical function, but when you look at, say, social media data, there's a lot of value there, but there's absolutely no simple "just go out and buy a ton of it." Right? You have to deal with the relationships with the different vendors. You have to work it. You have to have a strategy for what can be done and what the consequences are to your enterprise, which again are reputational and can be significant depending on how you choose to act in that space. Right?
So there's a whole sourcing strategy, and interestingly, there are cases we see where part and parcel of sourcing the data starts to impact the technology, in the sense that big data is big, and you just don't move a hundred terabytes of data over the wire on the fly. Right? You need to think about where the data lives. And so we see that there are places where you have centers of gravity.
If you've got a hundred terabytes of transactional data in your existing data center, it's difficult to move that into the cloud, and that's a big decision most enterprises don't want to make. But if you're tapping into a hundred terabytes of social media data that's already on a cloud provider like Amazon, then it's a lot easier. The center of gravity says: let's put the analysis there; let's put the smaller data sets up and join them in the cloud.
So, we think your data service provisioning really does depend a lot on where the weight of the data is. And that becomes important. Stepping back, we think you need the data provisioning and all the services, the data, the analytics, the infrastructure, the compute, aligned around this model of what you decentralize and centralize, and how you continually drive innovation with experimentation.
And so, we think the best practice is to have an enterprise process for tying these together, to share among groups, but also to have a deliberate strategy of measuring and pushing things out to get value from the data. The different pieces ought to be packaged as services and made as reusable as possible, whether it be data or compute or otherwise.
Derek: So, maybe a cloud orientation to begin with. When we're talking data as a service, I don't really look at it from the enterprise side, I guess; I think about cloud services targeting business users, targeting app developers, targeting whomever. The assumption is that you have the data, or data you can work with. The data isn't the key. The analytics are always the key. Right?
So, it's about making a service that just takes away all the guesswork. Right? Insert data, and maybe you have to do some work with your data to make it workable, but insert data, and the service part is having Joe business user run some fairly complex analytics on it in the cloud, in a hurry, with all that goes along with "as a service." Right? So, a kind of on-demand, utility sort of model.
So, my kind of pipe dream, and it is kind of a pipe dream at this point, right, is democratization to the point where the secretary of whomever, whomever has the insight, or whoever has the best business sense around a particular data set, can actually run the analytics they need to achieve something. It's not always going to be a risk analyst or a computer scientist who has the answers to that question, or who knows the best questions to ask. You need to get those tools into the hands of the right people. And I think the as-a-service model makes that possible.
Roger: Yeah. I'm going to bring up two points. One is the transmission and network stuff. I just sent a disk, shipped it FedEx, because it was a lot faster than shipping the data over the wire. And that's something that, when you start getting these big data sets, becomes a pretty big consideration. It will be interesting to see if over time there will be a kind of consolidation that is really around network latency rather than other factors like cost. Does Amazon have an advantage because a lot of people are on there, and you can get ten gig between things rather than having to ship it over the regular internet?
But then we get to the point everyone was really making: data as a service brings this layer of tools with it so that you can focus on the business part, on why you're doing this. And when I think of cloud — at O'Reilly we look at all sorts of technology — to me, the problem being solved was a procurement problem. How do I get something so that I can try something out without waiting six months?
And I look for shakes of the head. Is there anything as prone to that as data analysis? "I just need 15 terabytes right away because I want to go look at this thing." That's a really common situation. You need a sandbox, and you're not even sure what you're going to do with it. You're just going to go in and do it. So I think this notion of a service works really well in the data space.
Dhruv: I like the connection you're making between big data and, almost, a use case for the cloud. I would be a little more abstract and say it's flexibility, or unpredictability, that drives the cloud. Right? Web serving is the classic example of a cloud-empowered business: on Thursday I'll have this many users, on Friday afternoon I'll have ten times that many, and it goes back down on Saturday. How do I deal with that from a procurement perspective? Right?
I mean, that was exactly Amazon's problem back in the day. Right? Over Christmas we have way more traffic than we ever have during the rest of the year, so why don't we sell that excess capacity? So web serving was always a great driver for folks moving to the cloud. I completely agree, though, that big data, or data analysis in general, is another one. That partly goes back to your earlier comments about being iterative.
Sometimes you just decide, and we've done this at Infochimps: this is working really well, I know we weren't planning on this, but hell, let's just spin up 50 machines and go at it. See what happens with this new approach. Right? And to be able to do that in 20 minutes on the cloud is incredible; it really drives that iterative approach. Especially if you've guessed wrongly. Because if you've guessed wrongly and it turns out that was a complete waste of time, a bad idea — fine, you wasted a couple of hundred dollars and a couple of hours. Right? You haven't gone through a six-month procurement cycle and bought a bunch of hardware only to realize your entire approach was flawed. Right?
Derek: I agree: big data as a use case for cloud computing, for just the reasons you said. Being able to start toying around with all this stuff as an idea pops up — you say, "I want to do this," and it only costs me $20 or whatever, which is crazy. But then I also think the cloud turns out to be a great place to centralize knowledge, right? Or centralize skill. We don't need a whole universe of data scientists or distributed-systems engineers if some of them are starting businesses and building services that make it so that not every company needs that team internally. Right? I think that's really valuable.
Ron: I definitely think the cloud environment lifts the bar in terms of productivity, making it easier to deploy capabilities and reducing the need for specialized skills. You see a lot of enterprises adopting cloud for big data, especially around proofs of concept, and you certainly see the majority of start-ups embracing it. Just as with web serving, it's a radically better trade-off for a start-up's resources to get going and scale in that environment.
That being said, I think the enterprise is still getting its head around privacy and security and some of the regulatory compliance issues. In most cases, what you'll find is that the cloud providers are actually far ahead in their ability to operate and comply with standards. So it's really perception. From a PR standpoint, if something did go wrong and it came out that a regulated company was using cloud infrastructure to work with sensitive data, it would be extremely exposed from a publicity standpoint. That's the burden.
Like many of these things, it just takes time for norms to shift and for people to realize this is a reasonable way of doing things. Look at Salesforce.com, one of the first prominent software-as-a-service providers. At the time, running a core business application in the cloud — putting your CRM data, your customer data, up in the cloud — was also considered pretty risky.
But you may have noticed some signs around here indicating that a few well-known companies are running CRM in the cloud with Salesforce. So, over time, attitudes change. And I think the long-term trend is that not only do you get better skills around cloud, you're going to get far better economics: maintaining an internal data center makes it very difficult to compete with the economies of scale and the efficiency a purpose-built service can provide. So the long-term trend is pretty clear about how it's going to gain share; it's just a question of timing and what's right for your needs right now.
Dhruv: Just to follow up on that: do you think it's the public cloud that gains share there? Or is it folks re-implementing that idea of cloud flexibility on internal hardware?
Ron: I think there's value in both. Having private cloud infrastructure helps a lot with provisioning and capacity needs, and as you start seeing multiple groups working with it, you get that same kind of load averaging. But looking further out, I still think the public cloud is going to have better economics than everyone building their own little cloud, provided everyone becomes confident the security and compliance issues are addressed. With capacity pooled in larger pools, you can have capabilities located where it makes sense — lots of data centers, lots of capabilities near the right place, and exchange points for data. So I think the public cloud becomes dominant in the long term, but that could be a ways out, and there's definitely a useful intermediate step around hybrid cloud.
Derek: And at this point — "cloud economics," I think, is the term that's been coined — if you have static data workflows, jobs running regularly, you might economically be better off doing something internally. So aside from even in the . . .
Roger: In fact, it's a data problem.
Derek: I mean, yeah. It's just a matter of — yeah, if it's too expensive to ship the data in a timely way, the latency, everything. There are many reasons, economically and time-wise, to do the work internally and then just deliver it as a service, and that's the whole public-versus-private cloud debate. But over time, as the economics of the public cloud change, I think that's where the shift is happening as well.
Kerem: We have an internal cloud, but that doesn't prevent us from looking into how we can benefit from public cloud services. One example is what Ron mentioned earlier: social data. There is so much social, unstructured data out there that it makes more economic sense for an enterprise to work with a third-party provider to sift that data, and then the internal teams pick up the already filtered or processed social data and make it useful and actionable in a shorter time period. From my perspective, it just doesn't make sense to pull down every piece of unstructured data you can think of and try to do it all internally.
Roger: So, we're out of time. Oh.
Questioner: So is that an ETL and data management layer?
Kerem: ETL, data management, filtering, and actually creating the data the business needs to take action on, by embedding it into the internal processes and making it a complete service internally.
Roger: So here's what I'd like to know, and I'll even start with my own opinion since I'm in this space. What do you see coming next? And what do you want to come next? Right? Those are slightly different shades of the same question.
What I see coming next are ARM-based processors, because of power. There was an article on the front page of the New York Times yesterday about just how much power these systems use. I think ARM and that kind of architecture, where power is as important as anything else, is going to be really different. I also think ZeroVM. Anyone not familiar with ZeroVM? It's a teeny, tiny, fast-startup VM. That means if you've got an army of ARM processors, you can turn them on and off. Right? Computers are terrible at doing nothing. So I think that's something we're going to see.
I'm not going to get into what Spark is, but Spark is a distributed analytic environment out of Cal that I think is really cool. People talk about real time and streaming a lot, and I think streaming is going to become the new ETL. Instead of batching, which is a really inefficient way to deploy your resources, you're going to take data in as it arrives and always be working it into your systems. Now, what do I want to see?
As someone who provides analytics, I would love a unified thingy where I can bring everything together and communicate it out to someone — something that captures how I thought about an analysis as it went. It's almost like a Mathematica notebook, where you can scroll through everything: something that covers all my analytic steps. So that's my vision of the future.
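Roger's "streaming is going to become the new ETL" point can be sketched in a few lines. This is a minimal, hypothetical illustration (not from the panel): the record format and the transform are invented, and the contrast is simply that a batch job stages everything before transforming, while a streaming pipeline transforms each record as it arrives.

```python
# Hypothetical sketch: batch ETL vs. streaming ETL over the same records.

def batch_etl(records):
    # Classic batch: stage everything first, then transform in one pass.
    # Resources sit idle between batch runs, then spike.
    staged = list(records)
    return [r.strip().lower() for r in staged]

def streaming_etl(records):
    # Streaming: transform each record as it arrives, so downstream
    # consumers see results immediately and the load spreads over time.
    for r in records:
        yield r.strip().lower()

incoming = ["  Alpha", "BETA ", " Gamma  "]
# Both paths produce the same cleaned output; only the timing differs.
assert batch_etl(incoming) == list(streaming_etl(incoming)) == ["alpha", "beta", "gamma"]
```

In a real system the generator would be replaced by a framework consumer (the panel later mentions Flume and Storm in this role), but the shape of the idea is the same: the transform runs continuously instead of in scheduled batches.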
Dhruv: I have a couple of comments, I suppose — some technical, some projects I wish existed. At this point we've got great streaming ETL frameworks like Flume and Storm. Those exist. We've got Hadoop for batch. We've got awesome databases that give you the core ideas of BigTable and Dynamo. They've been open sourced. They exist, they're robust, they're powerful, they scale. They're awesome.
One major piece that's missing, I think, is a really good framework for network-based calculations. Google's got a great one. There are some nascent open source projects in that area, but none with the level of sophistication or robustness of Hadoop. And even Hadoop still has a reputation for not being ready for the enterprise sometimes. Right? So the software has a ways to go. I'm looking forward to that.
But I hugely agree with you. One of the things I'm desperate for is a really, really good visualization and analytics framework. Right now we basically have two approaches. One is to use something from five years ago that was designed for a smaller scale: grab last week's data and put it in there, or grab one in a hundred records, or grab data for just this one customer — so you can get beautiful graphs and do really cool analytics on it.
And then there's the other approach: do it all at scale in the big data world, where the graphs kind of look ugly and aren't super nice, but we feel okay about it because there are a billion data points in this graph — so just the fact that you have it, you should be thankful for.
But I'm not thankful. It's been too long. I want the quality of the small-node tools, the [Tableau]-class, really high-end analytics environments, but at scale, and that doesn't exist yet. We're still in a realm of people building purpose-built things that aren't good enough, or taking those smaller tools and trying to connect them into the big data space without truly making them native members of that space. So I'm waiting for some smart San Francisco start-up to invent that product.
Ron: I think they're working on it.
Dhruv: They're working on it. I think a lot of them are working on it.
Ron: I was going to say, I think what I keep hearing is coming — and what I would love — is a user experience that is just that seamless.
Dhruv: That a secretary could use?
Ron: Well yeah, but.
Dhruv: I mean billions.
Ron: There are companies working on that now, on top of Hadoop. I mean, the point of the platform for us is that a history major could do this — that's the fun of it. But yeah, the idea of being able to visualize and explore these things in a very elegant, visually stimulating, intuitive way, and display it that way.
Dhruv: My own guess is that Hadoop is not the back end that powers those things. I think something either not built yet, or a mixture of things, is probably going to power it. Hadoop is way too batch-oriented and slow.
Ron: It depends on your use case.
Dhruv: But that doesn't stop people from trying to build it on top of Hadoop. Right?
Ron: Well, that's the technology you have to build with. Right?
Kerem: Yeah, from my perspective — I don't know if I'll live to see this — but I want more people who can actually build and understand visualizations, while backing those visuals with real insights from the models or the data. Seeing more and more people doing this, actually knowing what they're doing, and really impacting the business with the tools and data they have at hand. More people, more power to us.
Derek: Talking about the future is such a broad topic because we're in the middle of this revolution that's bringing a whole new mindset to the enterprise. You've got big changes happening as organizations think about governance, how they organize, how they transfer knowledge, how they get into a cycle of being more nimble and experimental with their data. I actually think the biggest single thing happening is this transformation in mindset that the technologies have enabled.
Now, there's tons of innovation. The government's pouring massive amounts of money into R&D and funding research. There's tons of venture capital going into all kinds of companies building innovative tools and products. But I see some challenges in that space, as well as opportunities.
One: it's much easier to build a point solution, but it's actually much more valuable to have something integrated. Right? I think there are way too many start-ups saying, "We're building this one thing in isolation to try to solve the whole problem," and not recognizing how they fit into the community. Our customers don't want to hear about a proprietary stack that doesn't play well with anything else. They want to hear about how it fits into what they've already invested in and committed to. Right?
So, Hadoop today — does it have the right characteristics to do all the things we talked about? Well, no, but there is a path forward. Right? So we're much more excited about technologies that integrate with and add value to the standard than about building our own proprietary cloud environment with magic tools that work in isolation. That's not what we see our customers needing. Right?
So there's a balance. You've got to solve a problem, but you have to do it on some kind of bedrock, and I think a key pivot point is how effective the open source community is at letting others in and letting them make contributions. Right? We're in unprecedented territory: most of the innovation around big data is happening in an open source context — not just Hadoop, but also NoSQL and R. A lot of the key technologies are being developed in an open source way. But you still have this key question of governance and community: how do direct competitors and interested parties come together to make something move well?
Questioner: The comment you made about not wanting proprietary tools that only work in an isolated environment — a lot of businesses have wanted proprietary tools, where everything is normalized and works within that context. But that's not the view now?
Ron: So you're saying that in the past there's been a lot of proprietary innovation. We think big data is fundamentally being driven by open source, and it's moving fast because there is really quick standardization. And businesses want standards. Proprietary standards take forever to fight out in the marketplace, and no one buys anything until there's a winner. Well, Oracle won. Or DB2. So we'll go with that as a database, because now it's safe.
Companies are able to move a lot faster because, say, Hadoop as an API set is clearly succeeding. There are a number of implementations, but it doesn't seem nearly as risky to pick any one of them, versus having to pick between three completely incompatible implementations of the same concept. The enterprise would simply be paralyzed about what to do.
Questioner: So, besides being a system integrator, how can a company make any money when these tools aren't proprietary and aren't sold?
Roger: Money. Who cares about money? We got analysis. Analysts.
Ron: Companies do make money in an open source ecosystem. That's been a big topic for the last ten years, and there are a lot of business models: selling support, open core, selling commercial pieces on top of the core standard. There are a lot of ways organizations have made money on open source, and I don't see that stopping.
Roger: I think we have to stop. But I'll just say, if you haven't read Tim O'Reilly on the clothesline paradox and open source, you should. The argument is basically this: when you put your clothes out to dry, no economic value is traded, but your clothes get dry. If you put them in the dryer, you bought gas, you had the cost of the dryer, you used electricity to run the motor — you've created some measurable economic activity. So don't ask whether there's an "R" vendor. Look at what people are using R to do. That's where the value is created.
People are applying this, and that's why data as a service probably makes sense going forward: what matters is how this all comes together. Why did Yahoo build Hadoop? They couldn't hire a thousand engineers to build infrastructure as good as Google's. So they open sourced it. They got something they wanted, and so did everyone else. When you think about the equation that way, it looks different.
So, on that note: what a great panel. I want to give them a big hand. Thank you. And thank you, everyone, for your attention.