Amanda: Welcome to our webcast. We're going to get started here in just a few minutes, so bear with us just a little bit longer. We're going to wait for a few more people to join.
Good morning. We'll go ahead and get started. Welcome to our webcast. I'm the director of marketing here at Infochimps. I'm excited about our webcast today. Here at Infochimps, we're seeing a lot of organizations wanting to leverage big data for their analytics, and I'm excited today to share Infomart's story with you. Before we jump into that, let's do a little housekeeping.
Jim: It's 12:00.
Amanda: Just so you know, we are recording this webcast. We'll send a link to the video and the slides via email this week, so be sure to watch your inbox. Secondly, this webinar is here for you. We encourage you to ask questions. We'd love your participation. There's a chat function in your GoToWebinar control panel. Go ahead and ask your questions through there; we'll be monitoring that.
I'll be answering some of the questions, if I can, throughout the webcast, and we will hold some of the questions for the Q&A at the end. And to note, we will not have a Twitter hashtag today, so no worries about that. Without further ado, let me introduce to you Infochimps CEO, Jim Kaskade.
Jim: Thank you, Amanda. Thank you, everybody, for participating. It's really exciting to be able to talk about real use cases, and Infomart, a group with a strong analytics focus, is part of Postmedia, one of the largest media companies in Canada. They're such an excellent use case to talk about, and I think it will resonate with a lot of the audience.
On today's agenda: we'll go through the business problems Infomart faced before selecting how they would launch their new SaaS application to their customers, and the process they went through to understand what solution they needed to support delivering that SaaS application.

Then we'll turn to the value proposition around Infochimps, and what the group has been able to accomplish for their customer base, which again I think represents very well what a lot of companies can accomplish by leveraging big data technologies.
What I'd like to do, before I jump into the introduction, is a poll, and then I'll introduce Jennifer Stein, the general manager of platform experience at Infomart and the key person behind the transformative solution they offered to their customers. Amanda, shall we do the poll?
Amanda: Yeah, let's go ahead and launch this first poll. You should be able to see it there on your screen. Go ahead and select one of the answers. We'll show the results in just a few minutes.
The question is, "Has the proliferation of big data changed your application development strategy?" Let's take a minute, give everybody a chance to vote, and we'll show the results. A number of you have voted. There's still room for the rest of you to finish selecting your answer.
I see the votes coming in now. One second. If you haven't already voted, place your vote. I'm going to close the poll here. So, Jim, Jennifer, the results. "Has the proliferation of big data changed your application development strategy?" Thirty-three percent say yes, 67% say no.
Jim: Thirty-three percent say yes, 67% say no? I don't see it in front of me, but that makes sense in terms of the number of people who have been able to leverage big data. To me, that speaks to a significant part of the audience that has had experience, and the rest being interested in, embarking on, or in the middle of, their first project.
Amanda: Yeah. Very true, Jim. I'm going to go ahead and hide the results, and we will head off to introduce Jennifer.
Jim: Great. It's my honor to introduce Jennifer Stein, who is the GM of Platform Experience, and I think those are two very important terms for any big data project. Clearly, we all have to be focused on customer experience, yet we have to understand that there's a lot of technology sitting behind that customer experience. What a great person to have on. Thanks for joining us, Jennifer.
Jennifer: My pleasure. If you could just advance the slide for me. You can be my forward button today. Hi, everybody. Welcome to the webinar, and thanks for joining us. My role here today is to give a real-world example of how using the Infochimps services has allowed us to go to market more quickly with a transformation project for our business.
Just a few words about the problem we started out trying to solve: Infomart is a media monitoring platform that allows our customers, who are subscribers to our B2B service, to monitor the media for news and information that's critical to their business. We've been doing this since 1986, so we've been in the market for a long time, and we went live on the web in about 1998.
We migrated fully from a command-line Boolean search interface to a web-exclusive product in 2001. The most recent version of our legacy platform was implemented in 2004. This doesn't mean we've stayed stagnant since then; we've continued to develop features and functionality, but on what is now an almost ten-year-old platform, which has led to what I call "Frankenstein syndrome": whenever we wanted to add something, we've had to bolt it onto the neck of the existing functionality and platform.
In addition to legacy infrastructure issues, the media landscape has changed: the proliferation of social media and the accelerating news cycle have really changed the volume, quality, and quantity of information, and the customers' demand for that information.
The technology we have used for the past 25 years to store and search what's known in the industry as traditional media monitoring content (newspapers, newswires, magazines, trade publications, and more recently, TV and radio) is just not suitable for social media monitoring.
In addition to the data itself, the way we work with this data, and the way our clients work with it, has changed dramatically. Merely being aware of news about a customer's business is no longer enough; people want metrics, they want insights, they want analysis. They don't just want to know what happened; they want to know what it means and what they should do about it. That's a huge shift in the media monitoring business. You can go ahead and advance.
Jim: I just wanted to make one comment, Jennifer. It's interesting that, given that you're dealing with so much different data, and the metrics and insights analysis have become more dynamic and more demanding, it seems like you have to be able to operate quickly. With data changing so fast, it's not like you're finished with this most recent transformation. You're going to be continuously iterating on new data as you come across it.
As your clients see the need for new information, you'll need to be able to respond nimbly with your platform, and I'm assuming that's the reason why the standard SaaS applications provided to customers for media monitoring won't work: they can't change as fast as your business pace changes.
Jennifer: That's right. There are two aspects to this. The first is the rate at which information is created. There was a time when the newspaper was published once a day, so by 9 AM you'd done your media monitoring for the day, and you couldn't possibly have more information until 9 AM the next day. That's obviously no longer true.
The second is the constant addition of new streams of content: Twitter, and Facebook, and YouTube, and Pinterest, and Flickr. Every day, there are new types of content. You never know when one of those, or many of those, are going to become important to businesses.
We need a platform that's flexible and scalable enough that we can start to ingest any new stream of content without having to completely reinvent our infrastructure every time a new stream of information becomes valuable to our customers. So, we're talking about big data, which makes it a big problem, and the biggest part of that problem is social media data.
We've got a pretty good handle on what I call, or what the industry calls, traditional media monitoring: the newspapers, and TV and radio, even. Social media has its own inherent problems, some of which we just discussed: the constant addition of new streams, the volume of data, and the structure, or lack thereof. Every newspaper article has a headline, a byline, a lead paragraph; we know what section of the paper it ran in.
Those pieces of information are sometimes absent from social media data, and every channel has a different structure. A tweet doesn't have a headline; the name of the person who wrote it isn't necessarily the handle they use, so how do you correlate who is creating that information with what they call themselves?
The speed of that data creation, the hundreds of millions of tweets that are created every day, the disparate sources of data, all this boils down to the fact that social media monitoring is a big data problem.
Jim: Yeah, it's amazing. Just the variety of data sources alone: it's not just about Twitter. Obviously, that's the first that comes to mind, but there are hundreds, sometimes thousands, even millions of different sources of information that you can draw from. It's not like traditional media.
It reminds me of the days of traditional infrastructure: if you created a relational database and wanted to pour that data into it, you'd have to define the data model first, and that whole process of incorporating new data would take months, if not years.
Jennifer: And we can't afford to rebuild the database every time a new piece of information is ingested. It has to be flexible enough to add new streams of data, with completely different structures, and still treat every piece of data similarly, and that's one of the major principles of our new platform, which is that every piece of data can be acted on within our platform, regardless of its origin. We want to provide the end user with a seamless experience.
They need to be able to use the tools that we built on our platform against any piece of data that comes in to the system. They shouldn't need to care whether it's a tweet, whether it's a YouTube video, whether it's a newspaper article, or a TV clip. We've built a set of tools, and we needed an infrastructure that could support use of those tools regardless of the source or structure of that data.
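To make that concrete, here is a minimal sketch of the kind of normalization Jennifer is describing: channel-specific records mapped into one common shape so the same downstream tools work on any piece of data, whether it's a tweet or a newspaper article. The channel and field names here are illustrative assumptions, not Infomart's actual schema.

```python
def normalize(record: dict) -> dict:
    """Map a channel-specific record into one common document schema."""
    channel = record.get("channel")
    if channel == "tweet":
        return {
            "channel": "tweet",
            "title": None,                       # tweets have no headline
            "author": record.get("user_handle"),
            "text": record.get("text", ""),
            "published": record.get("created_at"),
        }
    if channel == "newspaper":
        return {
            "channel": "newspaper",
            "title": record.get("headline"),
            "author": record.get("byline"),
            "text": record.get("body", ""),
            "published": record.get("date"),
        }
    # Unknown channels still get the common shape; missing fields stay None.
    return {
        "channel": channel,
        "title": record.get("title"),
        "author": record.get("author"),
        "text": record.get("text", ""),
        "published": record.get("published"),
    }
```

The point of the design is that everything downstream (search, curation, distribution, analysis) only ever sees the common shape, so adding a seventh social channel means writing one more mapping, not rebuilding the tools.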
So, how did we get there? The process was fairly standard: consultation with stakeholders, both internal, the folks who are experts here at Infomart about our business and our customers, and the clients themselves, former users, and prospects.
We had very frank conversations about how our legacy platform was used. They were very interesting conversations, because we'd chat about how they used Infomart, what they liked about it, the tools and the information, and they'd tell us about the information they got from the legacy Infomart platform.
Then our conversations would take this very strange, 90-degree turn. They would say, okay, now we're finished using Infomart, and we take all the information that we've gotten from Infomart, and we use these other tools to do what we actually need to do with it, whether that was exporting to Excel, to Word, or to some other tool they'd built.
It turned out that Infomart's platform was a great tool for finding information, but that wasn't the end product the customer was looking for. They needed analysis, insight, distribution tools, collection and curation tools. They were having to take the data out of the legacy platform and use other tools to get to their endpoint.
Our major goal, then, was to bring all of that back into the platform and provide end users with the tools they needed to reach their desired endpoints. The most frequent endpoint our users are looking for is content curation: they want to go through a list of results they may have created using a keyword search and pick and choose the most relevant or best content. They want to distribute that content, whether by email, by RSS, or by some other means, and they need the tools to do that within the platform.
They want to analyze that content, whether that's slicing and dicing it across user-defined dimensions through tagging, or creating reports right within the platform. We built all those tools so the data no longer has to leave the platform in order for the user to get to the endpoint they're looking for.
Jim: You guys are basically the ideal example of where the entire industry is going in terms of making applications not only more data-driven, but more seamlessly analytics-focused.
Allowing people to analyze and gain insights with an incredibly simple user experience is where every application out there, whether it's mobile, web, or otherwise, is heading. What's novel is the process you've gone through: identifying the customer's use cases and problems first, and then building the solution behind that by leveraging big data technology.
So many customers out there that we work with think, maybe we'll just put together a big data platform, and then we'll glean the applications from it afterwards, instead of doing what you've done, which was identifying the application and making sure you have the right infrastructure to support it.
Jennifer: That's right, and our goal was to keep the user in the platform for as much of the business day as possible. We want this to be a workspace that the user lives in for their media monitoring role. We're talking about gaining revenue through big data.
The fact is, companies only have a certain amount of dollars available for tools. If your tool doesn't do everything that that user needs, then they're going to have to purchase additional tools in order to accomplish that. That means less of the revenue pie for you.
If your tool can accomplish everything end to end (in our case, from discovery to curation to dissemination and reporting), then more of the available budget for that type of toolset can come to you, and that's how you can increase revenue.
Jim: Yeah, and no one else has been able to do this, because they couldn't get access to all the technologies that allow you to integrate all of those experiences. In the past, that required different people, different competencies. That's what we love to see these big data technologies enable: end-to-end, unified solutions. It's great.
Jennifer: Of course, we had identified that we had a big data problem, so the big question was: do we build it, or do we buy it? Our infrastructure and our management strategy had always been in-house until now. We have our own hardware that manages the traditional database. We have our own developers. But this was new territory for us.
We simply didn't have the expertise we needed to collect, ingest, process, store, and search big data. We had to decide: were we going to acquire that in-house, or were we better off outsourcing that role? I think our presence here in this webinar speaks to the answer to that question.
Jim: Yeah, but I do remember speaking early on with your staff, and there was already a plan in place, pretty much a scope of what it would take to bring your application to market. That was a great exercise for you to have gone through, because we could simply do a total cost of ownership or total business value assessment and say, "Here's what it is, do-it-yourself versus working with a partner like us." It was an easy discussion because you knew what was in front of you.
Jennifer: Absolutely. It was, when it came down to it, a fairly easy decision. The fact is, we're not in the hardware, infrastructure, and storage business. We're in the media monitoring business. So it simply didn't make sense to acquire the physical infrastructure, the software, and the expertise required to do this project when a much more scalable and economical solution was available through Infochimps.
This slide here says it all. It allowed us to get up and running quickly. Our strength was the development of our platform. That's the business that we know, and that's the business we were able to focus on, because Infochimps was taking care of the infrastructure and the big data problem, and gave us the flexibility and scalability to build our platform on top of a best-in-class back-end solution.
Jim: We see so many customers that clearly have the domain expertise for their business, and they have very savvy developers, but it doesn't mean they all understand Java MapReduce, or NoSQL data stores, or stream processing technologies. It's not that they're not capable of coming up to speed on that; it's a matter of time to market, a matter of risk, and a matter of total cost of ownership.
I know a lot of our customers are trying to recruit talent. It's not easy to pull the kids out of Google or Yahoo to come work for you. The inventors of this technology are few and far between, and until that has been corrected, you're really going to have to leverage full solutions that can speed your time to market. We're happy to be working with you guys. Definitely.
Jennifer: Where has this gotten us? We're now Canada's leading media consultancy, and not only are we providing integrated media monitoring services through our platform, but we're also now able to provide research and insight solutions on corporate data. Again, drawing from our core expertise: not just a robust platform, but the people here who have backgrounds in research, media, and journalism can now use our tools and our data and pull those insights out. It's opened up a whole new line of business for us.
On the platform side, we're scalable, we're expandable and we have a modern platform with rich, searchable, real time data. This is used both by our platform customers who have the self service ability to use that platform for their own media monitoring, and our own insights consultants.
And this is our newer line of business, again, because we were able to focus our energy and our resources on what we do best, and we were not taking resources away from our core business to build our back end platform. We now have a consulting business, where our in house experts can help our clients better understand the media data that's important to their business, and provide insight about what those businesses should do about that data.
Jim: If one of your customers comes to you, Jennifer, and says, "I'd like to add this new data source," whether it's social media or more traditional media, a source you haven't deployed, is that something you feel you can respond to very quickly? Is that the idea, being able to…
Jennifer: Not quite, but we certainly could, with this infrastructure, if there was enough market demand to open up a new social channel, for example. Right now we've launched six social channels: Twitter, Facebook, YouTube, blogs, online news, and forums. But if there was high demand from our customer base to start ingesting another channel, then we would do so, and it would benefit our entire customer base.
Jennifer: Where we're not ingesting a new stream of content, the consulting side allows customers to use our in-house expertise in data analysis if they don't have that resource and expertise in their own organization.
They may just be doing straight media monitoring for mentions of their brand; they can bring that data to us, or let us get that data for them, and we'll turn it into insights about what it means to their business, and even actionable recommendations on how they should shape their media strategy as a result of the listening we're able to do using the platform.
Jim: Yeah, that makes sense. I didn't mean to imply that anybody could ask for something and you would just respond to it; obviously, it's when there's large demand for it. I know from our experience with you that from the time we opened up our APIs to your development team to the time the application was up and running was about thirty days, which, given the complexity of the entire solution, shows you can operate that fast. Clearly, you want to be able to do things that you know the mass-
Jennifer: Absolutely. There have been multiple examples along the way, even after implementation, where, for example, with an existing data stream, we needed another data point, and we were able to come to you guys and say, "Can we add these decorators to the information so that we can start providing this additional analysis point?" That was very easy to do, again without having to rebuild an entire infrastructure.
Jim: Yeah, that's great.
Jennifer: Some of the data points that we're looking at: content impact; buzz analysis; tone, so we do sentiment analysis on the data; authority ranking of the sources of information; and other essential metrics. Some of the ways customers are using our platform, and our insight consultants are providing value to their business: things like how real-time chatter affects brands.
The prevailing wisdom right now is that companies no longer own their brands; customers own those brands, so companies have to have their ear to the ground to find out what is being said about them and how it affects their image within the industry. Then there's the impact of a brand's media coverage: if a brand is doing some sort of media campaign, whether in traditional or social media, what's the impact, what's the ROI? We help customers measure that.
Message-related strengths and weaknesses: if you've got a message you're trying to get out there, how is it being received? What parts of that message are offering positive returns, and what parts need to be tweaked to get the return you're looking for? Campaign performance, again back to ROI; and risks and opportunities: watching competitors, looking for holes in the market that you might be able to fill. This can all be done through listening to media, traditional and social.
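As a rough illustration of two of the metrics just mentioned, buzz (mention volume per day) and tone (average sentiment), here is a toy computation. It assumes each mention already carries a precomputed sentiment score in [-1.0, 1.0], which in practice would come from an upstream sentiment analysis step; the field names are invented for the example.

```python
from collections import Counter

def buzz_and_tone(mentions):
    """mentions: list of dicts with a 'day' label and a 'sentiment' score.

    Returns (buzz, tone): buzz is a Counter of mentions per day,
    tone is the mean sentiment across all mentions.
    """
    buzz = Counter(m["day"] for m in mentions)
    tone = sum(m["sentiment"] for m in mentions) / len(mentions) if mentions else 0.0
    return buzz, tone
```

A spike in buzz with a falling tone is exactly the kind of signal the insight consultants would turn into a recommendation.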
Jim: This is amazing. Compared to some of the off-the-shelf solutions, there are a lot of social media SaaS platforms out there, like Radian6, et cetera, but they're just not rich to this extent. Being able to combine your own internal and traditional media with social media is such an incredibly powerful combination.
Jennifer: Right. The integration of content is one of the platform's pillars. When we say integration, it's a twofold pillar. The first is integration of content, that two-sided coin of traditional and social media monitoring, and even within that: on the traditional side, we're aggregating together different types of news content, from newspapers on their day of publication (you can get content from most newspapers before you can buy a physical copy at your corner newsstand) to real-time, 24/7 broadcast, monitoring 100 Canadian TV and 100 Canadian radio stations, with the ability to watch those video clips and listen to the audio clips right within the platform; and on the social side, the six social streams that we talked about.
The other side of the integration pillar is the toolset: having the tools you need to work with the information, and not just listen to it, right within the platform. The platform itself is expandable and scalable, so as our customers' needs for information, and the information itself, change, we have a solid foundation on which to add and expand functionality.
Jim: It's such a perfect big data use case, because so many people have their own definition of big data, either the high volume or the high velocity, or the variety of data sources and data types. You brought all of that together in a way that actually creates impact and return on investment for customers.
I can't think of any company on this planet that doesn't need this type of solution for managing their brand, and not just bringing together all the data sources you've aggregated, but applying the analytics to it: it's just a fabulous use case.
Jennifer: We're pretty excited about it. Here's a screenshot of our corporate website, and the tagline says it all. Pull the news from the noise. There's so much data out there right now, and our true goal when you distill it right down, is to help the customer understand what's meaningful and what's just noise.
Jim: Jennifer, how many disparate applications do you think your customers, in the worst case scenario, have to manage in order to get the same experience that you're providing in one?
Jennifer: I would imagine that there are customers out there who have three applications right now. Probably one to monitor traditional media, sometimes two: one for print, one for broadcast, and then a third solution for social media monitoring. Then, possibly, depending on what solutions they've chosen, they may need a fourth application for analysis.
Jim: I would imagine they pull that data out into an analytics package, whether it's a BI tool or a statistical analytics package like SPSS or SAS. All of that has to come into play.
Jennifer: That brings us to where we are today: a full-service media consultancy. We've gone from a platform-based business that was ten years old to an end-to-end media monitoring solution where our customers can choose from a range of services, which include the platform we've been discussing today, as well as layering on our consultancy services for deeper analysis. That's it.
Jim: My big question would be, when are you coming to the US? Your Canadian enterprise customers must feel like kids in a candy store. What about the rest of the world?
Jennifer: It's a fair question. It's a big market, isn't it?
Jim: This has been great, Jennifer. Thank you for taking the time to walk us through that. Before I take some of the things Jennifer has spoken to and relate them to what we're seeing across our customer base, Amanda, take control here. I think we've got a poll coming up. Amanda.
Amanda: Hi again. We do have a poll I'm going to go ahead and launch. The question is: regarding your data sources, have you outsourced the technology and expertise, or has it been built internally?
Go ahead and place your votes. We'll give it a few minutes. I see a number of votes coming in now. There are still a few more seconds for those who haven't yet voted. There's a radio button on your screen; go ahead and choose your answer.
While we're waiting on everyone to vote, I just wanted to let you know we've had some great questions come in through this webcast, so I hope you can stay on for the Q&A. We're going to have a lot of fun addressing a number of things that have come up.
Jim: That will be exciting.
Amanda: Yeah. It's going to be a great Q&A session. We'll jump into that, Jim, right after your presentation. It looks like most everyone has voted, so we will go ahead and close the poll and share these results.
This is really interesting. No one voted to outsource the expertise. Thirty-three percent of the audience voted to build up the expertise internally. The majority, 67% of the audience, voted both: both outsource and build up expertise in-house. Nobody voted "don't know".
Jim: That makes sense, because that majority knows that it's not either/or; it's both. You just cannot completely outsource your data infrastructure needs. If you look at the Fortune 1000 or Global 2000, and you talk to the CIOs of those companies who are supporting folks like Jennifer, most of the strategically minded CIOs are thinking of getting to a steady state where 30% of their mission-critical infrastructure is managed by them, and 70% is outsourced or managed by others.
That's the poll I've taken over a couple hundred engagements with CIOs, and it gives me the general view that it's always going to be a hybrid; it's never going to be one or the other. You have web-scale companies that have grown up on Amazon, for example, that will be 100% outsourced, but that's not the case for any larger enterprise, which is where we're focused as a company. That's a good outcome. That makes sense.
Again, my name's Jim Kaskade; I'm the CEO of Infochimps. That was such a great use case from Jennifer at Infomart, part of Postmedia. To give a little background for those in the audience who are not familiar with Infochimps: we offer a number of cloud services. We are a cloud service provider, in its most basic definition. We provide resources that are elastically available through a RESTful interface and that you pay for as you consume.
The difference between us and other cloud service providers is that we're very focused on big data. We're a purpose-built big data cloud service, and we have three big data analytic services, each a separate cloud service offering.
The first is called Cloud Streams, which defines the type of data this cloud service processes: streaming data. It's data in motion, data that's not only being created in real time, but that you typically want to act on in real time. So it requires a cloud service that can operate at very high speeds.
And what does real time mean to us? It means less than a second. It means we're operating in a sub-second time frame to source, analyze, and act on the data. That's not a lot of time, and it requires a service that can literally operate at those speeds at scale, in memory, to provide those response times.
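A toy, single-machine sketch of that in-memory pattern: keep recent events around and answer "how many mentions of this keyword in the last N seconds" without touching disk. A production stream processor distributes this work across machines; the class and method names here are invented for illustration, and events are assumed to arrive in time order.

```python
import time
from collections import defaultdict, deque

class SlidingWindowCounter:
    """Count events per keyword over a sliding time window, in memory."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = defaultdict(deque)   # keyword -> deque of timestamps

    def add(self, keyword, ts=None):
        # Record one mention of `keyword`; events must arrive in time order.
        self.events[keyword].append(time.time() if ts is None else ts)

    def count(self, keyword, now=None):
        # Evict events older than the window, then report what's left.
        now = time.time() if now is None else now
        q = self.events[keyword]
        while q and q[0] < now - self.window:
            q.popleft()
        return len(q)
```

Because both `add` and `count` are O(1) amortized and never leave memory, sub-second response stays feasible as volume grows; sharding keywords across machines is the usual next step.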
Cloud Streams is a key leading product for us. It's something you get with any cloud offering from us, because it's the only way we attach to our customers' data, and it's the only way we populate the other data stores, the other cloud services within our offering. It's a required component.
The second of our three analytic services is called Cloud Queries. That term represents being able to ask questions of your data in an ad hoc way. This means having folks like Jennifer, or her clients, dynamically engaged with the data they've stored. In many of our customer accounts, it's business analysts internally coming up with new insights, finding new ways to evolve the product, and then launching that en masse to their customers.
In some cases, it's direct access by the customer, being able to query the data in an ad hoc way. These two services together address a pretty significant portion of our customers' problems in terms of providing a solution: the Cloud Streams product provides stream processing and analytics on the fly that are really actionable, and Cloud Queries provides structured data storage against which people get predictable outcomes.
The last of our three cloud services is called Cloud Hadoop. I think everybody refers to big data as a Hadoop solution, and Hadoop is important; it's what created this new generation of big data, big data 2.0. But it's really only 20 percent of our customer solutions. In some cases, our customers don't need Hadoop at all, because Hadoop is ultimately a decision support solution.
It's a solution where you're asking complex questions against a lot of historical data, in batch. You ask a question, and you come back after a cup of coffee to get the answer. In the case of a real-time dashboard or a social media monitoring application, it may be the part of the application that's summarizing trends over the last month, the last quarter, or the last year and beyond. You're constantly getting refreshed statistics that are a function of launching queries against volumes and volumes of data.
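That batch pattern can be sketched as a classic map/reduce over historical records: a map phase emits (brand, month) keys, and a reduce phase sums them into the trend statistics a dashboard would display. In a real Hadoop job both phases run distributed over the cluster; this serial Python version, with made-up record fields, only shows the shape of the logic.

```python
from collections import defaultdict

def map_phase(records):
    """Emit ((brand, month), 1) for each historical mention record."""
    for r in records:
        yield (r["brand"], r["date"][:7]), 1   # 'YYYY-MM-DD' -> 'YYYY-MM'

def reduce_phase(pairs):
    """Sum the emitted counts per (brand, month) key."""
    totals = defaultdict(int)
    for key, n in pairs:
        totals[key] += n
    return dict(totals)
```

The division of labor is the point: each mapper only needs its own slice of the data, and the framework handles grouping by key before the reducers run, which is why the same logic scales to volumes a single machine could not scan.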
Whereas Cloud Queries might cover months, weeks, days, or minutes, more near-term questions, Cloud Streams, of course, is all about what's happening right now. These three cloud services are separately deployable, but all three are also well integrated.
For us, as a cloud service provider, the key thing is to provide the services in a way that meets your SLAs. Your company may have, as in the case of InfoMart, a SaaS application, and it just can't go down. It has to be up, it has to be reliable, and it has to work. For us to do that, we have a lot of investment in what we call command and control. This is what our virtual NOC, our operations personnel, use to make sure that Jennifer and team never see a disruption in their service.
This is a very important part of our offering because it is a management service, it is something we take seriously because, as you know, any sort of cloud service outage affects not just one customer, but many customers, and really does make it hard to create a brand of reliability.
We are focused purely on the Fortune 1000 as a company at InfoChimps. We're not looking to work with the long tail, so SLAs, high availability, and reliability are extremely important to us. The last part of our solution is clearly enabling the developer community. There are a lot of ways to make sure that developers have a rapid development platform.
To us, that translates as making sure they have the right tools, or the right interfaces. That doesn't always mean one single interface, or one snazzy GUI drag-and-drop. It's not about building applications with a visual interface; it's about enabling people who love their PHP, Python, Ruby, the various newer languages as well as Java and other established languages, and the domains they operate in.
So, in many cases, that means that we have to provide both an advanced way, a simple and attractive way, of getting to our platform, as well as native interfaces to a lot of the technologies that power us. We do that. This is a key part of our focus. We're very open-standards based.
As a matter of fact, nothing about our cloud service is proprietary. As we advance our solution, we open source it. We're very much involved in contributing back to the open source community and, therefore, very involved with the developers within those communities. We feel we know what it means to be a developer, and how to make developer-friendly solutions.
This is InfoChimps in its simplest form: three analytic services and enabling tools and support which allows us to manage, as well as our customers to consume. Let's talk a bit about why we have become a major thought leader in big data and cloud computing as an intersection.
Our history started back in the University of Texas with our founders, our rocket scientists, who originally did research around distributed systems and analytics. They were basically consumers of big data technology long before people knew about it. People were using Hadoop before it was called Hadoop, when it was called Nutch.
Most people don't even know the history of Hadoop, and how it was born out of Google and Yahoo, and at the time when people were just starting to hear about it, we already had services leveraging it. The company began its history as a data marketplace.
Since then, it has evolved to take the learning of amassing many data markets and data sets into a platform that allows anybody to manage their own data marketplaces of data sets and apply sophisticated analytics to them. I think the neat thing to think about, and the reason why we are in a great position to take advantage of these big data technologies, is the fact that we amassed 15,000 different data sets and data sources at the beginning of our company history.
We've evolved that into an enterprise-class solution with these three data analytic services. When you look at it again, in summary, what we're doing as a company is powering data-driven applications like the InfoMart application with real time analytics, ad hoc analytics, and batch analytics in a way that's easy to consume, sitting on top of an elastic infrastructure in multiple different domains.
That's another thing that's very different between us and our peers. We're not just a layer of data analytics on top of Amazon. As a matter of fact, our Fortune 1000 companies require us to service them in a network of tier four data centers, where their data is more secure, and where our cloud service is co-located. In some cases, behind the firewall of our own customers' data centers themselves.
So, when we look at how we deliver InfoChimps solutions, we work with customers like Infomart to develop a detailed understanding of their use case and a design of a standard reference platform configuration that's common across all of our customers. We can do that within weeks instead of months, and then we develop a development and staging platform for our customers to begin developing their applications.
We'll create an end-to-end process of sourcing their data all the way to the insights of that data in their application. We can do that in less than four weeks from the time the customer gives us clear specifications. After that, we scale up and out, and get ready to roll you into production. That means attaching to more data sources, ingesting historic data, getting you to a point where you're ready to go live.
Once you've gone live, you just iterate. Our iteration process is very tight. It requires a very thoughtful continuous integration process. It allows you to make changes knowing that when you make those changes, they won't break your application.
Just to close, I want to give two comments, and then I know I'm going over the time we have here with Amanda. Most of our customers see the value of time to market. This is one example of a customer where they thought it was going to take north of 15 months to deploy, to get into production. It only took them four months with us. Clearly, time to market, time to launch your application, is important, because that means time to revenue.
When we look at this same example customer, the revenue that they were able to capture over a three-year period was significantly higher than what they would have captured doing it themselves, because doing it themselves delays the time to market.
When you look at the total cost of ownership over three years, as well as the time to value, your time to revenue, and combine those for the total business value, that is why you want to leverage big data in a cloud, and in a trusted cloud for any sort of secure data asset that you might want to aggregate and analyze. That's it. That's all I have. Thank you very much. Now I can turn it over to you, Amanda, for questions.
Amanda: Thanks, Jim. We actually have one more poll, I believe. If we want to take it really fast. The question is, "Are you building an application that involves social media data?" “Yes”, “no” or “don't know” are your three answers. If you want to go ahead and vote on that, I will queue up all of the questions that we have for Q&A. We have at least ten, maybe more questions from the audience that have come in.
Go ahead and give everybody another second or two to vote. There's a radio button on your screen. A lot of votes coming in right now. Are you building an application that involves social media data? “Yes”, “no” or “I'm not sure” are your answers. Okay, here we go.
I'm going to close the poll. Thank you all for voting. And the results are 57% of the audience have said yes, they are building an application that involves social media data. Twenty-nine percent of the audience: no, not building an application that involves social media data. Fourteen percent of the audience just aren't sure. Okay.
Jim: Yes, that makes sense. At the end of the day, probably every customer out there will incorporate social media in some way, shape or form, but I think it is important to know that a lot of businesses are struggling just to get their own data sets aggregated in a single view of their customer based on the data they generate themselves.
The dirty little secret out there is that only about 15% of companies' potential internal, traditional data assets are being aggregated and analyzed. There's a huge opportunity in that percentage of just working on their own data. Then adding in social makes it even more complex. It makes it a great big data market out there.
Amanda: Yeah. Without further ado, I'll just jump into some questions. Russ on the line asked, I believe this question would be for you, Jennifer. "Can you pull content from LinkedIn and Facebook?"
Jennifer: In our current platform, Facebook, yes. We are aggregating content through a social media content aggregator. Facebook is one of the channels that are available. LinkedIn has not yet made their content available to social media monitoring services and the companies that aggregate that data. Obviously, there's a demand for it. Especially given that we're a B2B service. LinkedIn has not yet made that data available.
Amanda: Thanks, Jennifer. I think this question might be for you as well. David on the line asked, "How do you deal with data governance and MDM, so that people can find, define, and make appropriate use of data with known quality?"
Jennifer: I'm not familiar with the term MDM.
Jim: Metadata management. Data. So when you pull all these data sources, how do you keep track of all the-
Jennifer: Sure. Yeah. It's a good question, the answer comes a little bit from the fact that we have been doing media monitoring from the traditional side for as long as we have, and we place a lot of value on really well and rigorously structured data. So, coming from traditional media monitoring, as I said earlier, if you think about a newspaper article, it always has a headline. It almost always has a byline. It's been in a printed paper, it has a page number.
So, we came from a place where we had a very well structured database with specific fields. Specific pieces of information went into a field called headline, and certain pieces of information went into a field called byline, and this allowed us to build the application on top of that data so that it is as clean as possible.
We tried to extend that principle as much as possible into social, and it's a double challenge. First of all, because social data, by its very nature, is not always clean. It's either not clean or not available.
A good example is location data for Twitter information. The holy grail is to be able to say, "I only want to hear tweets about my brand that come from near me," or from, in our case, Canada.
When I sign up for Twitter, whether or not I specify where I am in the world is actually optional. Twitter doesn't make that a required field, so sometimes that data is absent, and it's also not verified. So I could say I'm a 70-year-old man from Botswana, and nobody would tell me not to. We do the best we can with what we've got.
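The optional, unverified location field Jennifer describes can be handled defensively. Here is a minimal sketch of the idea; the field and function names are hypothetical, not Infomart's actual pipeline:

```python
def filter_by_location(tweets, wanted_country):
    """Keep tweets whose self-reported location mentions the wanted
    country. Tweets with no location at all are set aside rather than
    silently dropped, since the field is optional and unverified."""
    matched, unknown = [], []
    for tweet in tweets:
        location = (tweet.get("user_location") or "").strip().lower()
        if not location:
            unknown.append(tweet)        # field absent: can't decide
        elif wanted_country.lower() in location:
            matched.append(tweet)        # self-reported match only

    return matched, unknown

tweets = [
    {"text": "love this brand", "user_location": "Toronto, Canada"},
    {"text": "meh", "user_location": ""},
    {"text": "great service", "user_location": "Botswana"},
]
matched, unknown = filter_by_location(tweets, "Canada")
```

The point of keeping the `unknown` bucket separate is exactly Jennifer's caveat: location is self-reported, so absence of a match is not evidence the author is elsewhere.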
To wrap up, to say that when we created the requirements for InfoChimps to start storing our data, we said, look. Here are the pieces of information that we're going to want to slice and dice against, and we need you to set up the data collection infrastructure in such a way that the same type of information is stored cleanly wherever possible. We tried to map the various different data types against each other, so that like was stored similarly to like. That's how we can get metadata.
Jim: I'd like to say that we invented all of this, but to be honest, Cloud Streams, which is the cloud service that attaches to traditional as well as new media sources, is powered by the same technology that powers Twitter and LinkedIn.
We're just talking about LinkedIn as a source. We're talking about Twitter as a source. These are two companies that have created technologies that leverage cloud computing in a way that allows them to operate against this unstructured data, and allow us to all create structure out of it.
We're using Twitter's Storm technology, we're using Kafka from LinkedIn, and that combination, amongst the various other technologies powering Cloud Streams, equates to a framework that allows us to take any data source and manipulate it on the fly in a linearly scalable way.
In the traditional domain, you might do ETL: Extract, Transform, and Load. That ETL process is what you use to clean your data, to get it into a form that can be poured into a relational database.
Our Cloud Streams technology is not much different from that, but it's much easier to use, it's all in memory, it's linearly scalable, and it's based on big data technology that allows our folks to work with InfoMart to really quickly create the metadata files, and to cleanly stream structure out of unstructured data in a very short period of time.
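The on-the-fly cleaning Jim contrasts with traditional ETL can be pictured as a chain of small transform steps applied to each record as it streams through. The following is a toy, in-memory stand-in for a Storm/Kafka-style pipeline, for illustration only; it is not the Cloud Streams API:

```python
def parse(record):
    # "extract": split a raw "source | headline" line into fields
    source, _, headline = record.partition("|")
    return {"source": source.strip(), "headline": headline.strip()}

def clean(record):
    # "transform": normalize the headline in place of a heavy ETL job
    record["headline"] = record["headline"].title()
    return record

def run_pipeline(stream, steps):
    """Apply each step to every record as it arrives. In a real
    deployment the steps would be distributed processing stages
    (e.g. Storm bolts) fed by a queue such as Kafka."""
    for record in stream:
        for step in steps:
            record = step(record)
        yield record

raw = ["globe and mail | markets rally", "national post | rates hold"]
structured = list(run_pipeline(raw, [parse, clean]))
```

The contrast with batch ETL is that each record is structured and cleaned the moment it arrives, rather than being landed first and transformed later.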
It's why big data technologies are going to transform every industry. It's not about creating a star schema and loading it with Informatica or Ab Initio ETL tools. It's not that at all. It's a very nimble process. It's a very easy, agile process.
Amanda: Awesome, thank you both. Yeah, that was a great question. I'm going to keep rolling with the questions. We have a lot to answer here. So, Prahbakar is asking, "There's a lot of media monitoring platforms out there. What makes InfoMart the leading platform?”
Jennifer: That's a fair question. To me, it's the three pillars of the platform, a couple of which we covered in the session. The first is that of integration and the two sides of integration. The integration of both social and traditional content into a single platform, which is fairly unique.
Our traditional content, I should mention, is actually licensed directly from the information providers, which means we have relationships with the specific sources, those newspapers, magazines, TV stations, which means they feed us that content directly, in a structured form. So there are a few advantages there. One, of course, is that we get the data in a timely fashion. We get the data in a clean fashion, and we get the data with all copyright attached.
In today's media landscape, with the advent of paywalls, relying on a solution that gets its content from crawling the web is just not going to be tenable for much longer. Most media content is getting locked up behind paywalls, which is not an obstacle for us, simply because we're getting the data with permission directly from the content provider.
The integration of a solution like that with social media monitoring into a single product is an advantage, as is the tool set that's integrated in there. As I was saying earlier, the idea is keeping the user inside that platform all day. Our other pillar is collaboration: the ability to work with your colleagues within the platform. There are tools within the platform to try to cut down on email overload. Tasks and workflows, and things like that, are integrated directly into the platform.
The third pillar is something we call portfolios, which keeps work contextual. And again, we've designed the platform from the ground up to mirror the way that companies today are working with media information. We know that they might be doing media monitoring for a number of departments, crises, products, or issues, et cetera. We've built the platform in such a way that the content sets, and the tools to work with those content sets, are grouped together.
You can have one account and work on multiple things at the same time. All these things together make our platform, although it has similarities to some others in the marketplace, somewhat unique, and we'd like to think, best in class within the market.
Jim: I would just encourage the audience to get a demo. Potential clients of InfoMart should just reach out. Nothing's better than seeing how powerful it is in action.
Jennifer: Sure. And I think, Amanda, at the end, you're going to put up contact information. There's an email address there, and if anybody is interested in a demo, just send a note to that address, and we'd be happy to set it up with a demo.
Amanda: That sounds great. Lawrence has a couple of questions. The first question piggybacks on the last question I just asked from Prahbakar. Lawrence asked, "Does InfoMart monitor US brands and companies as well?"
Jennifer: Yes, we do. Social knows no boundaries, of course. If you're monitoring for a keyword on Twitter, you could restrict it by geography, but you don't have to. We're not bringing in only Canadian tweets, for example. We have access to the full fire hose there. That's one way of doing it.
On our licensed content side, on our traditional media monitoring side, we have over 1,800 media sources. Only about half of those are Canadian. We have a few hundred major US newspapers as well. We do have, not all of them, but a significant number of US sources. Depending on the market and the industry that you're in, that could be sufficient for an American company's needs. It's definitely worth taking a look.
Jim: Wow. 1,800 traditional media sources.
Jennifer: That's correct.
Jim: Plus the scope of social: millions of individuals talking about your brand. An incredible combination of content.
Amanda: Wonderful. Lawrence has a couple other questions, more technical. This one is, “How do you easily adapt to the changing data structures?”
Jennifer: I'll leave that one with you, Jim.
Jim: Okay. Fair enough. One of the flexible aspects of our cloud services is gleaning the data structure from the data, as opposed to developing the data structure first and then trying to fit the data to it, although you can do both. It really comes down to the fact that Hadoop, as a technology, has been built to allow people to bring together all the data they need to analyze without understanding the structure first.
Step one is aggregating. From there, your business users and your data scientists within any customer account will look at the various data elements and say, “I want this. I want that.” You'll glean the structure from it. You'll create user data models from it. It can be represented very easily within our three different cloud services.
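The "glean the structure from it" step Jim describes is essentially schema-on-read: load everything first, discover the structure afterwards. A simplified illustration of schema inference (hypothetical helper, not InfoChimps' actual tooling):

```python
def glean_schema(records):
    """Infer a field -> type-name mapping from raw records,
    schema-on-read style: aggregate first, discover structure after."""
    observed = {}
    for record in records:
        for field, value in record.items():
            observed.setdefault(field, set()).add(type(value).__name__)
    # a field seen with one consistent type gets that type; mixed
    # types fall back to "str" for the user data model
    return {field: (types.pop() if len(types) == 1 else "str")
            for field, types in observed.items()}

records = [
    {"headline": "Rates hold", "page": 3},
    {"headline": "Markets rally", "page": 1, "byline": "J. Smith"},
]
schema = glean_schema(records)
```

Note that `byline` still lands in the inferred schema even though only one record carries it, which is the point of gleaning structure from the data rather than declaring it up front.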
It's not that we have one data store, like a SQL or a NoSQL data store, and that's where all your structure lies. Structures are created across all three of our cloud services, so we can actually apply aggregations in stream, and we can apply sophisticated authority-and-influence types of algorithms on the streams that are in motion, in real time.
Each of our cloud services, you can apply a data model to, and we manage that data model, for ourselves, and we encourage our customers to do the same, using a domain-specific language, a high-level abstraction that defines the data flows and the data models across all three of those domains at once.
Think of it as a sentence: the nouns are your data sources, the verbs are the processing steps that you want to apply to those nouns, and that's where some of the structure starts to be gleaned. Then the sentence itself, how those are strung together, is finally your data model. Your data flow applies to all three of our cloud services. We use very simple, declarative languages to help our customers quickly define their data flows and data architecture.
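The sentence analogy, sources as nouns and processing steps as verbs, maps naturally onto a small declarative flow definition. A hypothetical sketch of the idea, not the actual Wukong DSL:

```python
class Flow:
    """A data flow 'sentence': a named source (the noun) piped
    through a chain of processing steps (the verbs)."""

    def __init__(self, source_name, records):
        self.source_name = source_name
        self.records = records
        self.steps = []

    def then(self, step):
        # chainable: flow.then(verb_a).then(verb_b)
        self.steps.append(step)
        return self

    def run(self):
        out = list(self.records)
        for step in self.steps:
            out = [step(r) for r in out]
        return out

flow = (Flow("tweets", [{"text": "Big Data!"}, {"text": "hello"}])
        .then(lambda r: {**r, "text": r["text"].lower()})
        .then(lambda r: {**r, "mentions_big_data": "big data" in r["text"]}))
result = flow.run()
```

The declarative chain is the "sentence": reading `Flow("tweets").then(normalize).then(tag)` aloud gives you the data model and the data flow at once, which is the appeal Jim is describing.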
We're seeing folks without any domain expertise, literally no understanding of Hadoop, or MapReduce, or NoSQL, or stream processing, and within a day, they're ready to go. That's how you get from day zero to day three with an application up and running. It's phenomenally fast compared to the traditional infrastructure, and that's why it's so disruptive. Hopefully that answers the question.
Amanda: Thank you, Jim. That actually leads to another question that Paul asked. Paul's question is, "I heard you mention it only took three months to build the data infrastructure for the InfoMart application. This question is two-part. Is that an average amount of time? And was the front-end development, then, done concurrently?"
Jim: All of our customers work concurrently. It's the best use of time, and the agile method. The earlier we can work with a customer, the better. If they're in the process of defining requirements and beginning wireframes, that is a great time for us, because we can actually do use-case business discovery and information discovery with our customers.
With every use of our cloud services, we offer up a set of expert services to give our customers the process they need to build their first application. Once they've done so, they can iterate on their own. For our typical customers, we like to see people get from start to value, generating revenue, within 90 days.
We can support people in shorter periods of time, but obviously the internal politics and internal development cycles can extend that. We like to think we can get you up and running in 30 days for something as complex as Jennifer has described. I would say our average is 90.
Amanda: Okay, great. This is the last question from Lawrence, and then Carminga has at least four questions. Here we go with Lawrence: "What are the application stack and database technologies?"
Jim: The application stack. What powers InfoMart's application? What sits underneath the hood? When you think of our analytic cloud services, you could think of us as a platform as a service and an infrastructure as a service combined. At the very bottom of the stack, we're using elastic infrastructure.
That comes down to three options, predominantly. There's Amazon Web Services for those that are willing to use the public cloud; we can support Rackspace and others, but Amazon Web Services is the premier provider there. When you move into a virtual private or private cloud setting, the infrastructure turns into one of two choices: it's either VMware, or it's OpenStack.
Those three infrastructure-as-a-service options give us the elastic resources we need: compute, storage, networking. Above that in the stack is a product that we call IronFan, which is an orchestration layer that we use to deploy across those various infrastructure cloud solutions. It is also used to orchestrate our various data stores. IronFan is an open source project.
You can find it on GitHub. It's got about five years of scar tissue, and it's used by VMware themselves to virtualize Hadoop. It's being used by very large service providers to provide competition to Amazon's EMR. It's got a lot of background and capability.
Above IronFan are the three cloud services, powered by Hadoop as a data store. Cloud Queries is powered predominantly by HBase and Elasticsearch, although there are other NoSQL stores. You can think of this as a common set of the best-in-class NoSQL stores that we have put together and abstracted.
Last, but not least, is the store associated with our stream processing engine, if you can call it a store, because data never really hits disk; it's processed in memory until it's either archived or put directly into our NoSQL or Hadoop stores.
That's being powered by Twitter's Storm, a stream processing framework, and also a queuing solution invented at LinkedIn, called Kafka, which is one of the major components of our stack. Above that system are development tools consisting of many different languages and frameworks like Trident and Pig, and our own, called Wukong, which is universal across all our data stores, and a number of others. Hopefully that helps.
Amanda: Thank you Jim. Carminga, thank you for being patient with us. I'm going to jump into your questions now. We'll start with the first one that you asked, which was, I believe this question would be for you, Jim. How big does the data set have to be to be cost effective to develop in this way with InfoChimps? That's a really good question.
Jim: Great question. I think, based on the nature of cloud computing and elastic infrastructure, we could potentially address any sort of data size for customers. I think big data is defined as data that's complex enough that your traditional data management systems don't work. When you know you can't do what you need to do with what you have, SQL Server or DB2, Oracle, or Teradata, or you've tried NoSQL stores but it's just so difficult, you are a big data candidate.
That all being said, I would ask, "What is a big data problem from a batch analytics perspective?" We don't consider things challenging until they're at the petabyte scale. That doesn't mean that our customers don't find it challenging in the gigabytes and terabytes. Typically, a terabyte is the batch processing volume size where things get interesting.
When we talk about ad hoc interactive, we're typically talking about the number of rows in the structured tables of the traditional domain. How big does a table get? It had better be more than a billion entries for it to get interesting. That becomes an ad hoc big data problem.
It's the billions of rows that I want to query against, the structure that I want to delineate some understanding around. The ad hoc interaction, the 360-degree view of my business, typically will equate to billions of rows. Again, you could have a much smaller number of rows and still be challenged. On the stream processing side, not too many people have experience with it.
I would say it's a big data problem when you're talking about at least 1,000 events per second. A lot of people are looking at a lot less than that. We have single clients that have more events than Twitter, as a business, does in the number of tweets. If you get into the tens or hundreds of thousands of events per second, you are a really interesting big data customer prospect for us.
Amanda: Thank you, Jim. Carminga's next question is, "What industry category would you say is the most frequent user of Infochimps’ services?"
Jim: Would you repeat that question again, Amanda?
Amanda: Oh, yeah, I'm sorry. The question was, "What industry category, would you say, is the most frequent user of Infochimps’ services?"
Jim: Great question. The biggest opportunities, and the places that we like to focus, are the following: financial services, the big banks and insurance companies out there that you would think would have all of this figured out themselves. There's definitely high demand, with the top leaders as well as the mid-market players, regional banks, etc. It's a great segment, because they have so much of the variety, velocity, and volume.
Manufacturing is such a great opportunity, because they typically have many different facilities, with separate data management platforms per facility, and the data is just not brought together. They're forced to chase the data because they can't get to what they need, and they spend most of their time trying to find it. Manufacturing's huge.
We love government, because federal and local governments want to, or have to, leverage cloud as a delivery model, and big data in the cloud is our focus. And I would say health. Health is an area that we're just beginning to get into, and we're seeing incredible opportunity. We'll be announcing a few big opportunities, a few customers on the platform in the health space, that I think will leverage a lot of clinical data to bring big analytics to improving our health system.
So, those are some big categories. I'll say this, generally speaking: if you're on this webinar and you're in any industry, you think about your top five use cases, what would drive more revenue for you. We like to focus on revenue as opposed to cost reducers or differentiators or optimization types of problems.
We like to help businesses grow the top line, no matter what it is. If you take your five most pressing, CEO-level, visible use cases, I'll guarantee you that you will be able to take advantage of big data technology to advance where you are today. I guarantee you. If I can't, I'll give you your money back.
Amanda: Okay. Thank you, Jim. We're ten minutes over. There's a question here from Russ that I'd like to address. Russ, if you're on the line, I'm not really sure if you're asking this question of InfoChimps or of Infomart, but I'm going to ask Jim and Jennifer both your response to this question. The question is, “Can you crawl forums?” Jennifer, we'll start with you.
Jennifer: Yes, we do have information from forums. Half of our social media comes from a social data aggregator called Gnip, and the other half comes from another party. We do get forums from that other party.
It's aggregated by various providers. Some sets of information come directly from the information provider, and some come from third-party aggregators, and forums is one of those that is available through one of our third-party aggregators.
Jim: I would have to say, there's probably not one data source we have not consumed. Just because of the history, and kind of founding of the company, literally any and all types of data sources, and that even goes across purchased data sources, whether it be in medical or other industries.
There's really not any data source we can't consume and help you glean insights from. Yeah, forums. Whether you're working with Gnip, DataSift, or Moreover Technologies, there are so many different data aggregator sources, let alone individual, high-value data sources.
Amanda: Great. Thank you all for attending. Thank you all for joining us, Jennifer, Jim. Thank you for your presentations. At this time, we have no further questions, and we're almost 15 minutes past. We'll go ahead and end it.
Just a reminder, this webinar was recorded, and we will send out an email early next week that has a link to the video, and we'll put the slide deck in there as well. Jennifer, Jim, thank you so much.
Jennifer: Thank you so much.
Jim: Thank you everybody.
Amanda: Take care.