Transcript
Joe Kelly:
Hey guys, thanks for joining. Good morning. I'm Joe Kelly. I'm a Co-Founder and CEO at Infochimps. I've got a great panel here of folks from the media industry and some technical folks, who are going to talk about getting started with big data platforms. In our experience, there is definitely a lot of interest among enterprises in bringing in data platforms and things like that, but there is often not the budget or the talent. So we want to kind of present a road map, or just scenarios that we have seen in real life, for how people can do it successfully when there is kind of a low barrier to entry. So with that, I'll let each of the panelists introduce themselves. I will start on the far right, I suppose, with Daniel.
Daniel Eklund:
Hello everyone. My name is Daniel Eklund. I'm a principal consultant with a company called ThinkBig Analytics, based out of Mountain View. We have been doing consultancy in the big data space, SQL, Hadoop, etc., for about two and a half years. I architect solutions. I'm on the ground with clients. Sometimes they are large Fortune 10 companies, sometimes they are cool start-ups in San Francisco, and I've seen a lot of stuff.
Jonathan Harris:
I'm Jonathan Harris. I don't like using microphones; I prefer to use my own natural voice. I work at Postmedia. Postmedia is based in Toronto, Canada. It's one of the largest media companies in Canada, so we have newspapers all across the country, from the East to the West Coast, and I also run the media monitoring division. It's called Infomart. We serve about 1,000 clients, and these 1,000 clients are governments and corporations. We basically listen to what's going on in the news for PR companies and comms specialists and, increasingly, marketers who want to hear what's going on, good news and bad news, about their brand, about their message.
Dhruv Bansal:
Hi, everyone. I'm Dhruv. I'm another Co-Founder and I'm the Chief Science Officer at Infochimps. I have a lot of experience with implementing big data solutions. I'm very eager to talk to you all about the kinds of use cases and what I've seen folks actually doing with big data in the world. Thanks.
Derrick Harris:
I'm Derrick Harris. I'm a writer for a slightly smaller media company called GigaOm. We have a news site, and we do conferences and a research service. I write a lot about everything these guys do for a living, and I also occasionally take a look at our analytics and try to base my work around data analytics. I'm in the rare position of answering questions instead of asking them.
Joe Kelly:
Thanks, guys. I guess we'll start. I was hoping we could dive in a little bit on the media story, the media use case. Jonathan, you're with a many-decades-old media company. They've got these legacy newspapers. What are you doing at a big data conference? Like, what's going on with you at Infomart?
Jonathan Harris:
Yeah, there is a lot of change going on in the media industry; I think probably everyone in this room is well aware of it. I don't even think it's worth repeating some of the challenges that newspapers are encountering. So in terms of data and why we're here, there are two channels. I work for Postmedia, a large media company, a traditional media company looking for ways to serve its clients better, the agencies and media buyers. That's part of the equation; the other is the media monitoring equation. That’s the business that I'm directly responsible for.
So, let's start off with the newspaper side of the business. What we have seen over the years, of course, is media buyers, agencies, marketers, the people who are in charge of the budgets, moving away from newspapers. We all know that. One of the big reasons, of course, is: well, if I put an ad in the newspaper, can I show ROI, can I show value? It's difficult to do with a piece of paper. The movement into digital that all newspapers are making started to change that. You can start to measure, and advertisers start buying leaderboard ads and big boxes. Fantastic.
And there are some metrics around it, but there also start to be some questions when the click-through rates start to drop. There is a lot of inventory there, so we start to become more and more challenged. What solutions can we offer our clients that really make us a viable solution for them? At the same time, we have heard all this chatter about content marketing. Content marketing could be essentially anything, from posting a video to a tweet. For us in newspapers, what we're really good at is telling stories.
Telling stories written in long form, sometimes short form. What we are seeing a lot of now is newspapers and media outlets crafting messages, positioning brands as stories, and posting that on their own websites. So you start doing that, and you start putting a client's message in a story that isn't just an ad but is contextual, and you have something that can be measured. When you have something that can be measured, it also gets amplified across social networks, which can also be measured.
All of a sudden, you can say to a brand, "Hey, I can tell you who is reading your stories, I can tell you where it's going." Then I can start thinking about, well, what's the influence? What's the authority? So I can create a story, I can measure a story, I can analyze the story, and then I can refine and optimize. So it becomes this virtuous circle on behalf of the brand, and a lot of people are moving into this pretty fast in our landscape. So that's one piece of the puzzle.
The other is the media monitoring side, the side that I'm directly responsible for. My business has been around for 25 years, and because it was naturally part of a media company, we are really good at print. We have relationships with a lot of different, what we would consider, competitors, and we suck in their feeds, traditional print, and we suck in some broadcast, and that was sort of it.
We're almost like a LexisNexis, a clipping service. Over time we have migrated and become more of a public relations necessity, where you should use us if you want to hear what's going on. But a piece of the puzzle is missing, and that's social. We have had a social solution, but we also want to offer a bigger, holy grail, integrated solution for our clients. They are demanding more metrics; they want to know what's going on all the time. And that's why I'm here.
Joe Kelly:
Awesome. Good story. I find that, structurally, it's pretty much the case that in big data, web analytics and social media have been kind of the primary leading use cases. Daniel, I guess from the view of ThinkBig Analytics, do you feel that's pretty consistent with the kind of use cases you see as generally the leading use cases? Why is that?
Jonathan Harris:
Measurement is key for all of these things. I find the web, the ad tech space, fascinating. To me personally, it's a varied ecosystem. You hear the term big data thrown around a lot, and I think the varied data is really the thing that's more compelling. Don't get me wrong, big data is still big and there is still a lot of big data coming in, but the varied data from the different vendors, the different players in this gigantic ecosystem from advertiser to publisher, necessitates different technologies. You can't just say, "Hey, here is your traditional database which curates it for you." You need to come up with technologies, you need to come up with the techniques to handle these types of things. Obviously, you hear about the Vs; velocity is one of those too, definitely in this space. This is much more important.
Dhruv Bansal:
In the ad tech space, I mean, that's where I see some of the most interesting companies and start-ups forming, because velocity is the other thing you just kind of touched on. The data is coming so fast, and they are talking about on-demand. We're talking about predictive modeling for placing ads and everything; these are split-second decisions. And the technology, I mean, it's not just "we're going to do Hadoop"; we have to do something lightning fast, because we have to make the decisions in real time, or as close to real time as possible. That space is just huge right now.
Jonathan Harris:
Yeah, the real-time bidders are driving some amazing technology on the NoSQL front that is forcing people to think very, very hard about the traditional data models that are out there.
Joe Kelly:
It's definitely an ROI story. So I guess, when someone at a company decides, "I'm the CMO, I need to get better measurement, I need to reach out and bring in these additional data sources," what are usually some of the first steps, Daniel, that you guys walk folks through at ThinkBig? Just on the ground, where do people start?
Daniel Eklund:
We look for big wins early, because every organization needs to be given a win in order to have someone advocate; we look for an influencer to advocate it. The traditional technologies have always been about vertical scale, and it's so easy nowadays to pick a particular scenario and say, "Hey, just put this on Hadoop, or try Cassandra or Infochimps," and you can see an amazing turnaround.
It doesn't have to be the absolute best utilization of the technology, just a proof of concept to get the organization's buy-in and to engage and move. Because a lot of times people need their eyes opened. There is hype around it; everyone is here because everyone has heard about big data. But when you go into an organization there is also the natural anti-hype, the "yeah, show me," and so almost inevitably we have to start with a very big win, and they tend to be fairly easy.
The infrastructure is almost always there. People have been ingesting data from multiple sources, but it has always been in a fairly slow fashion, so the velocity is the thing that's changing. There are some technologies out there regarding streaming and ingestion that are slightly different. But more often than not, these are very simple organizational structures. So you tend to work with those organizational structures that work across the enterprise, that are dealing more with integration across the enterprise. You work with them to tweak their expectations about how fast things are coming in and where you should put it, but it's not too different, it's not too far.
Joe Kelly:
Dhruv, any experience you want to share?
Dhruv Bansal:
Yeah. I actually had a question a little bit around that. I don't know if it's my role to be asking questions here. OK. I'm curious, because Jonathan, I know that you have a very clear understanding of, I think, where it is that your company needs to go. It seems like you really get what you want to be doing. Whereas in my experience, and Daniel, my question to you is whether you also see this, there are a lot of folks out there who want to get into big data but don't know exactly what they want to do with it. It's not that they are saying, "Here is the problem I'm having, it's very specific. If you can solve this for me, then I will be your champion and I will help advocate this technology in my organization." A lot of times there is that element of needing to...
Joe Kelly:
I want that Hadoop thing or something.
Dhruv Bansal:
"Yeah, I want this Hadoop thing, I think it does everything," but it doesn't. You've got to be quite clear about what you want it to do. Sometimes it's a challenge to help folks understand that it's not a panacea; it's not going to come in and fix everything. The better they understand what it is that they want to have fixed, the more quickly and easily firms like ours can come in and solve those problems.
Daniel Eklund:
Yeah. Education is one of the first things we do. I mean, I don't want to dampen people's advocacy for big data in their organization; it's almost always a necessity. I remember back in '95 people were asking me to compare and contrast Java with HTML. They're not even in the same bucket. Education is a very important part here, and Hadoop is not the panacea for everything. Cassandra is not the panacea for everything. So there is always an education component that goes in, and you have to do it lightly, without puncturing people's bubbles.
But I think, in general, everyone is becoming familiar with it. People now know Hadoop is for batch analytics. They are not thinking, "Hey, this thing is going to replace my Teradata or my Greenplum." They're beginning to understand it's a complementary technology to an existing infrastructure. I would say the way to bring in your big data is not as a solution to replace everything but as a complementary solution. Sometimes it's an ROI play to save you some cost, but sometimes it's to enable different types of capabilities that you didn't even realize you could have, like natural language processing, for instance.
Joe Kelly:
So, right, Jonathan, you were at the standpoint of wanting to suck in the social media data. What were the steps you took? You did have a specific goal; where did you go from there? What did things look like on your side?
Jonathan Harris:
Well, we had some experience with social media, in that we were working with another partner, another vendor, and we still do, and they are very good at what they do. It came to a point where, as we started to look at our old platform, which has been around for quite a while, we needed to make a lot of changes and really go for that holy grail solution. At that point, we thought that we had to have our own solution to really offer that to our clients. One of the first things that went through our heads, and remember it's a traditional media company, although we have technology resources at our company: if I mentioned Hadoop, I think it would cause a lot of blank stares. I mean no offense to the company; I know this is being recorded. So the first thing I asked was, well, how am I going to solve this issue? I'm not a techie, I'm a media guy; but I knew enough to know that I needed to approach someone who could help me.
I also knew that if I went internally and asked for resources, yeah, I'd probably get them. But how many internal stakeholder meetings would I have? How much work would that be? And there'd be a lot of "how many fairies can dance on the head of a pin?" I just didn't want to go there. So, I just happened to be at a conference, the [inaudible 00:15:35] conference, and I was looking for big data solutions. I happened to come across Infochimps, and Dhruv here was presenting, and I said, "Well, I'll keep talking to these guys," and over time we developed a really good relationship. They understood what we wanted, and I thought, you know what, they can deliver it. So in some ways it was like outsourcing the technology and the expertise to someone I trust. And I think it has allowed us to move a lot faster than I would have thought, and has helped our team learn super fast too.
Joe Kelly:
Who within your organization is the end user, I guess, of Infomart?
Jonathan Harris:
That's a great question too, because we do have internal users, and some of them may be journalists, but they are using it for different reasons. The interesting thing about our platform is that it started off as essentially, more or less, a clipping service, a librarian's tool, long before you had search on the web. Over time it morphed into a PR, public relations tool, in terms of monitoring what's going on; and now it's moving into more of a marketing database.
So there are all sorts of different user bases or user types, and that's made it even more complex in terms of how we build our new platform. But in terms of the emphasis and where we're going, it really is PR and marketing, though we still have to serve some of those existing users.
Joe Kelly:
So you have your data platform, I guess. I mean, who's consuming it? Are your employees looking at dashboards? Is that kind of the consumption model? I'm just kind of curious.
Jonathan Harris:
That's where it's going to go and there is some dashboard function but that's totally where it's going to move. Absolutely.
Joe Kelly:
Because I've a team of data analysts back there.
Jonathan Harris:
And that's great, and we do not have a team of data analysts, and that's a big part of why we had to look for a solution out there.
Joe Kelly:
So one question I have, well, for Daniel: we have been talking about social media and analytics, but how many organizations do you see implementing these solutions for internal data already? I feel like there is a big use case for big data in just helping me see what I already have, helping me understand what's already within my organization. So what do you see for solutions within that, Daniel? Or what would some of that even look like for you? Anyone, take it.
Daniel Eklund:
I mean, there are tons of use cases. I think one of the areas that's interesting is what's going on in the whole newspaper environment: being able to really track and measure where those messages are going on behalf of brands, and measuring that ROI for them. I know anyone who is a marketer in the room is going to grill me on "what is ROI, and can you prove it?" But at least having those deep metrics, being able to tell them in real time what's going on, measuring on behalf of those clients, and producing those reports that they are really demanding, I think that's a key use case. And in that use case you have to pull from so many different resources, and that's where big data really is helpful.
Joe Kelly:
I will chime in. Coming from the perspective of a journalist, I always think of looking at data like this: when you're in journalism school you kind of learn how to design a newspaper, at least I did. So when we were working, it was like, well, here is the story that goes on the front page, right? Everything was just kind of based on your gut. This is the way it works, you know what I mean? It's years of tradition: this story goes here, this one goes here, here is how long the headline should be, all these things. So I always look at data as understanding what you have. I mean, if you can go back and look at historical readership data, and data about topics, and data about authors, you can really put together a picture of what actually works, right?
So you can see these are the things we actually should be featuring, these are topics we should write about more. This is data that I don't think a lot of media organizations have, or they certainly didn't organize it in any structured way. So to think now that you can take natural language tools and all these things, so that even if you don't have metadata on every story you can use tools and start extracting some of that, and actually figure out what it is that really is successful; using data to drive publishing decisions versus just intuition, I think, is going to be a big deal.
Daniel Eklund:
You know, I think that's an interesting point, and one that's happening in some newsrooms, or at least you see the beginning of it. It's totally true in terms of the way we used to put together newspapers. I used to be a business editor, and it was kind of "what do I feel like today?" And I would slap it together and put a few headlines on. Was it based on what I thought our readers or audience would like? No. It was how I felt that day, and it was my own decision. It's completely qualitative. What's changing is the digital realm, and companies like [inaudible 00:20:51], for example, with whom I was speaking earlier. They have tools out there that are allowing us to monitor what stories are being read in real time. That has a huge impact on the newsroom, where editors are actually looking at that and deciding where they should place stories in the digital environment to get as much juice as possible, in other words, unique visitors and page views. But it's also starting to guide what we put in the paper and where it goes, which is kind of cool.
Joe Kelly:
So one thing I think about is that these technologies and all these practices come from the LinkedIns and Googles of the world. In the LinkedIn presentation yesterday, she asked the question, "Well, what do data products look like at LinkedIn?" And she showed the homepage, and everything was circled: these are all data products. And we are talking here about a more reactive media organization, and a media organization that blends some of those things in. So I would pose a question to you guys: outside of the LinkedIns and Googles of the world, who else do you see doing that well, integrating that same science, that mode, that method, that reverence for data and what it can build and bring to an organization? Where do you see that happening, Daniel, either specifically or within industries or folks that you guys work with? Or just wherever you have seen that done excellently, even today, even on the panels you're seeing, things like that.
Daniel Eklund:
I think the entire web. You come here and see every website, every start-up; it's amazing to see.
Dhruv Bansal:
It sounds like a cop-out, but it's true.
Joe Kelly:
But outside of those at the enterprise level even, who would you say?
Dhruv Bansal:
I mean, I guess I haven't heard. You hear stories starting, if you're talking about these kind of traditional industries of manufacturing or products or whatever, of using data to inform things that are already built, you know, products. But honestly, and this is the cop-out, I'm talking about the web. When you come here and you see, I mean, yesterday's talks by Sparked and StumbleUpon and those companies, and the ways everyone uses data to inform everything, from the user experience to the product on the site to what customers... I mean, everything. It's amazing, and I don't know that too many traditional companies are at that point yet, where data, aside from the traditional BI stuff, is actually informing the business.
Joe Kelly:
I've got a question for you, to follow up on that. Derrick, which publisher would you guys say? Maybe this is for Jonathan and Derrick both. Which publisher do you think is, I don't know, the Google of the publishing world, where you can look at their homepage and say, "Yes, everything here is actually totally backed by data and not an editor's gut decision somewhere"? Who does the best job? Name names, guys.
Dhruv Bansal:
Why? But even to the previous question, why doesn't it exist outside of those companies yet? Measurement is one; we're just now able to measure social media and things like that. But what is it, as well, internally that's lacking? Technology, people? I mean, those are some of the obvious ones, but is there anything you can share as a story, Daniel, any particular arc or an "aha" that happened in an organization that set them on this path? Things like that, I'm curious.
Daniel Eklund:
Well, we have been speaking a lot about the publisher, and I have worked a lot more with the people sort of in the middle and towards the advertiser end, who are very interested in maximizing their spend. Now, for those who haven't adopted it, I hate to say it, it's the same issue that you're bringing up. There are these massive holding companies that have agencies, creative agencies, trading desks, and trying to get all the stakeholders to agree upon technology, or to allow their individual IT organizations to not lose face and adopt a big data strategy, is a very difficult problem.
So these are the people who tend to have problems with it, and it tends to be because of the issues with politics. Most of the people who are adopting these things tend to be the pure technology plays who are in the middle: the DMPs, the DSPs, the retargeters. They are the guys who are saying, "You know, these advertisers and some of these publishers have too much, whether it be politics or inertia or whatever, or are lacking technological investment. We're going to build this infrastructure for them and sell it. We're going to build cookie stores, and do segmentation and clustering. We're going to build excellent tools for managing creative and managing campaigns, so that the ROI is clear and you're buying audiences." So I see most of the technological innovation, unfortunately, in the people... well, not unfortunately, they are making money. There are start-ups with $1 billion potential market caps, but they are the ones in the middle, and I think, yeah, I will just leave it at that.
Joe Kelly:
But what would prevent, say, Procter & Gamble from being that? Or what would keep a non-technology-based company from achieving this, in your mind?
Dhruv Bansal:
Don't get me wrong, the Procter & Gambles of the world are definitely doing it. We're in with those types of companies for sure. A lot of these companies have a sufficiently good and well-thinking IT organization. Some of them were way ahead of even us in being able to deploy Hadoop. So what would prevent it, really, it's not a technical answer. It really is a political issue; it's really an organizational issue.
Joe Kelly:
It's finding that high-value use case you brought up, apparently...
Dhruv Bansal:
It has to do with, also, the product you're building. I mean, on the web you're building something that people consume. Most of, I would say, the social and Web 2.0 era is driven by data. The product itself is essentially digital. You can iterate, you can work on the fly. I was trying to make the comparison, you know, when you talk about privacy regulation, or is it just kind of a [inaudible 00:27:09] thing. If you say, "Well, we want to regulate privacy, online privacy," it's so easy when you talk about regulation: those heavily regulated industries are large, multi-billion dollar industries, and things move slowly, so regulation sometimes isn't such an impediment. But when you talk about Facebook or a company like that, well, it can make a privacy mistake and then roll it back. It's a very fluid way of doing things.
I think that data makes that so much easier and so much more natural than if you're making cereal or doing whatever it is you're doing. You have to be creative in how data actually starts to inform every part of that business. It's not so intuitive, and I don't think it's inherent.
Jonathan Harris:
Yeah, there are few companies that specialize in owning data as an asset, or there were, whether they would be the traditional marketers; but now people are beginning to understand this. And companies for whom that is not what they do, like the Procter & Gambles, are starting to understand that their data is an asset, that their cookie collections, joined with their CRM data as far as privacy allows, are an asset, and figuring out ways to utilize that is interesting. So those challenges are not so much technological at this point; they are organizational.
Dhruv Bansal:
Well, it should be beyond marketing or advertising, beyond the traditional, completely money-based way.
Joe Kelly:
Personalized, personalization.
Dhruv Bansal:
Yeah, I mean, to actually inform the product development or everything, that is like a whole other...
Jonathan Harris:
Absolutely. There are a lot of people interested in sentiment analysis about a recent release of a product. They want to be able to quickly get a sense of how the public at large is perceiving their product. These are some of the use cases we have seen.
Joe Kelly:
So I want to touch on this: you talked about the competency, this idea of a P&G not having baked into their DNA this use of digital assets as something that they do as a business. I want to bring up a build vs. buy approach to getting at that competency: when it's right for a company to build this kind of infrastructure or practice in-house, and when they should be buying, when they should be working with folks like ThinkBig or Infochimps or a [Cloud Editor] or anyone in the ecosystem. What does the portfolio start to look like or shape up as, if someone wants to build that competency for themselves?
Dhruv Bansal:
Well, coming from a consultancy, we tend to work with people who are going for the buy. But we oftentimes find ourselves saying, you know, you also need a blended solution; it's always a blended solution. Sometimes you might need to reach out to an infrastructure-as-a-service provider, or someone who has layered capabilities onto it, because sometimes you need spike capacity. Sometimes it's just the wait for getting a brand new rack for your Hadoop infrastructure: you have to take it through the networking guys, and with these guys you're going to see about a two-month overhead. Meanwhile, budget for allocating space on an Infochimps or an AWS is fairly easy to get.
So it's almost always a blended solution. For the ROI of just getting things started, go with the buy, but sometimes the more you use it, the more it makes sense to do a build. But that definitely requires a lot more upfront capital cost. So it's almost always a combination of build and buy. Buy to start, build if... and when I'm talking about people who buy to start, I'm talking about maybe small to mid-size companies. Big companies almost always have the impetus to want to build, and they are using a buy as a way to do a proof of concept, in order to get that advocacy within their organization.
Joe Kelly:
You can afford to hire that director of analytics.
Dhruv Bansal:
I look forward to coming up with brand new job titles for a data engineer. "What's a data engineer?" "Well, it's kind of like a programmer, but he is somewhere between a programmer and a SQL guy. He is probably less conservative than a SQL guy, and he is not so much a cowboy as a programmer. He is somewhere in between, or she."
Joe Kelly:
Anything else to share in the build vs. buy?
Jonathan Harris:
Yeah. I mean, I can totally build on what you're saying there, Daniel. I think, obviously, since we're data technology vendors, we're always excited about the buy side of things. But sometimes I think, like, Hadoop, cloud share, it's also no different than a wristwatch, in that you can have a lot of complexity, but most people just use it to tell the time, and most people will never dig into the super advanced things that you can do with Hadoop. So that's sort of where, especially with our product, we try to make it as simple as possible. You can argue it's not a less powerful version of Hadoop, but it's certainly a simpler version of Hadoop. Then there are the folks that want to do the crazy in-house build, that want to go all the way out and need all that advanced functionality. They need the very particularly tweaked configurations and stuff like that.
They are always going to build eventually, and I think you're right. They may start out by buying something small, so that they can turn that buying decision into some hiring, and then the hiring turns into a road map. Eventually they are always going to build. But I think for the hugest part of the middle market, certainly our thesis is that we have got to make it simpler, and we have got to create a scenario in which there is no requirement, for example, to understand what Hadoop really is in a deep way. There is no requirement to understand what NoSQL really is; there is just a requirement to be able to get business value out of using these services. If there are wrappers, or if there are tools to make it simpler, that's a win for everybody.
Dhruv Bansal:
I think the big data space is unique, because NoSQL and Hadoop, those types of things, they're kind of springing from open source. It wasn't a product that people then tried to imitate, right? It was the whole open source movement. So even when you talk about buying into an ecosystem, even if you decide to build, or even if you decide to buy something else, those skills transfer, right? And maybe your data structure, and maybe most of what you're doing, transfers over. And so it's not just like I'm buying from vendor A and now, all of a sudden, if I want to switch, I have to learn and buy and do this whole thing from vendor B. So I'm switching my Hadoop strategy, but I'm still using Hadoop, right? I'm still using Mongo, I'm just switching it up. And I think maybe that makes the decision...
Joe Kelly:
I'm taking on more responsibility for it.
Dhruv Bansal:
Yeah, and maybe that makes the decision just a little easier, because you can build, you can buy; you've got flexibility at the very least.
Joe Kelly:
It's almost blurred a little bit, because of its open source nature.
Dhruv Bansal:
I think so. I'm just, like, Hadoop, I mean, everyone.
Jonathan Harris:
But at the same time there is also a Wild West mentality here. I mean, people are given NoSQL and they say, "Let's do NoSQL." What the heck does that mean? I mean, you get things on one end from, like, a Berkeley embedded DB, to Mongo, which the programmers love, all the way to these massive scale-out solutions like Cassandra, which can work across a data center, and there is education needed there. So some skills don't quite transfer. You know, NoSQL is a space that is named by being in contrast to something. You know, a rabbit is not SQL, so how do I define what non-SQL is? So inasmuch as it's getting people to think, to deconstruct the database, to deconstruct the typical data scenarios and say, "Hey, there are many different ways to do it," then these skills transfer. Now we have an entirely different set of people out there. It's no longer curators who handle the database that's bought. It's people who actually think, "I can do relational things or non-relational things depending upon my needs." So yes, it's forced the industry and engineers to think very differently.
Dhruv Bansal:
Well, I guess my point is, once you get educated, once you get informed, you can say, "OK, I'm going to buy, I'm going to use Mongo," let's say. At that point, whatever you do with Mongo, if you want to build a cluster, if you want to buy a different hosted module, whatever, you are always still using that. It's an open source project, so the core functionality is always going to be the same. I think that's where the flexibility comes in. You're not just switching to a completely new product.
Jonathan Harris:
Yeah, sure. I mean, there is a blend. Couch is very close to Mongo, and Aerospike has something very similar to that. Yeah, once you buy into it, again, it's the mentality: let's just say, hey, there are many different ways of doing this, and just be open to data storage as something wild and amazing. Well, maybe not wild.
Joe Kelly:
It's crazy, schemas and some databases. Cool. Well, that's kind of a lot of info, and that's kind of my own agenda of questions and things to ask. Unless you guys have any other things you want to speak about or touch on, I was going to open the floor for about the next 15 minutes for folks in the audience who might have some questions. We are going to have George coming out with the mic. Looks like there is a question over here.
Questioner 1:
Thanks. You guys talked about your experiences with your clients. What is the first question people are trying to solve? Where do they typically start? We talked about a POC and bringing value very quickly. So what is one of those quick-hit answers that we can get, from a legacy platform perspective, to show the value and show something we haven't seen before in our legacy platforms?
Dhruv Bansal:
I have a couple of answers for that. Daniel alluded to this earlier: folks really understand now that Hadoop isn't a solution to everything. It's one part of a larger stack. From that perspective there are many parts: there is batch, there is real time, there is data management, there are interfaces, there are dashboards, there is a whole bunch of things. Especially at Infochimps, we think about vendoring all those things together and, again, trying to make it as easy as possible. But what we tend to see is that a lot of times folks are excited about taking one path towards that fuller solution. A great example might be, you know, one of our customers says there is some system that we run internally at this company. It's a real-time system. It's producing data all the time, but I also have this huge amount of historical data that it's already been producing over the last few years.
I don't exactly know what would happen if I spent a bunch of money and built a big data system to go analyze this stuff. I know there is insight to be had there. I don't know exactly what it is, and I don't know exactly what the ROI would be. So I'm a little nervous about going gung-ho on the whole thing.
So why don't I take a simpler route? For example, why don't I take a historical dump of a bunch of data from a three-month period last year when I know that interesting things happened? Why don't I get Hadoop going on that? Why don't I see if my guys can understand Hadoop applied to this data? Why don't I see what I can discover? If I like what I see, well, then maybe it becomes time to go ahead and start moving into a much larger solution that takes that same technology and maybe moves it into real time. There are other examples of that too: sampling data and starting out smaller, asking one very, very specific question instead of trying to ask all the possible questions you could be interested in. I think the challenge here goes back to what you were saying, Daniel, about education.
Too often folks will come in and they won't be well informed, and they will expect you as the vendor to be able to understand their business as well as they do, plus all the technology, plus build everything and answer all their questions, and that's unrealistic, right? So it's better when a customer of ours, at least, can come and say, "Look, I've got this one really specific question. I know it's going to lead to 20 or 30 more, but this is the one that I want to start with. Can you give me a recommendation for how I might go about doing that?"
That's perfect, because it puts us in a great position as technology vendors to build something specific that works, that does what it's supposed to do, and it gives them an answer. It allows them to go back to their bosses and say, "Hey, look at this cool thing I just discovered for a modest sum as a proof of concept. Don't you think we should roll this out at a much larger scale?"
I think where the process can go really wrong is when folks get those unrealistic expectations, thanks to, I don't know, inflated articles about the merit or the power of these tools. I think my short answer is that it's best to start scoped, either with one component of the stack, just historical or just real time, or by having a very, very specific question. I think that's the right approach.
Jonathan Harris:
I have seen that the very, very specific question is oftentimes "What went wrong?" We have a lot of this old data that we had archived. A lot of times it's "What are we going to do with this?" and no one has the resources. No one has a particular Oracle instance that they can use, and it's taking five days to actually load it in and import it, and we can run some SQL statements, but we need an answer right now. Sometimes, I mean, I have worked with a company that had 13 terabytes worth of DFA data and...
Joe Kelly:
What's that?
Jonathan Harris:
DoubleClick for Advertisers. They wanted to say, "Why is my campaign not working, and how do you put that into Oracle?" They spent probably two months trying to write all the processes to import it, and then it's a resource that costs money, and it's being used by other people, and it just doesn't scale horizontally. So, man, we put in a simple Hive solution and we were able to get more results within a day. Man, those are the types of things that can really show the value.
Dhruv Bansal:
It sounds like one answer is having a historical part, but I couldn't help noticing, from seeing where the discussion went at least, that social media and web analytics end up being great early starts as well. They're simple use cases that demonstrate some of that marketing ROI, that measurement, and they generally require this kind of technology to do it. It's also something that's trackable. It's in digital format, so you don't have challenges measuring things. So I think those make some good answers. Any questions? We've got two.
Questioner 2:
Thank you. I thought I would just jump in before you moved away from the DFA stuff, because I actually work for a digital marketing agency, and we brought in ThinkBig to help sort it all out.
Dhruv Bansal:
You brought in ThinkBig.
Joe Kelly:
That's what I actually suggested at one point. But maybe you guys could talk about how you have seen people who are experienced in predictive analytics and know what they need, but aren't quite sure how to take that next step and just jump in.
Dhruv Bansal:
One quick budget, how much of a concern that is would be I think one also helping trying buy and then go and fight about it later so.
Joe Kelly:
I love, Daniel, the story you just told about the customer with Hive, right? To be able to take two months of frustration and turn it into one day, "here is your answer." I mean, can you take that to your boss and, like, impress? And it's sort of like it's up to you to come up with that question, right?