MapR Technologies is pursuing a unique strategy in the world of Big Data technology. Instead of offering a pure open source distribution like Hortonworks, or adding proprietary capabilities to the core project like Cloudera, MapR has chosen to reach under the hood and make fundamental innovations to the underlying data platform that make Hadoop friendlier and more resilient when used in conjunction with existing business technology.
In a recent research paper, “Putting Hadoop to Work the Right Way”, I worked with MapR to examine the barriers to closing the gap between full enterprise functionality and the current state of the Hadoop ecosystem. The paper identified three gaps that needed to be overcome: the missing capabilities for installing, configuring, and operating Hadoop (the productization gap); the vision for a complete solution (the product management gap); and the need to find engineering talent to implement the vision (the engineering skills gap).
MapR is betting that its approach of combining architectural advances with open source innovations provides a lasting advantage over others. While the vast majority of MapR’s Hadoop distribution consists of the Apache Hadoop code, in a few key areas MapR felt that the gap was too wide and needed to be closed to make Hadoop suitable for enterprise use. A key part of MapR’s strategy is to continue to build on the advantages that its underlying platform provides, especially with respect to the Hadoop Distributed File System (HDFS) and HBase projects. If MapR succeeds in keeping its lead and grabbing a big chunk of the broad wave of enterprise adoption of big data technology, its success will challenge several aspects of open source and Hadoop orthodoxy.
As part of my series of interviews with the technical visionaries driving the Hadoop market forward, I have had a series of conversations with M.C. Srivas, the CTO and co-founder of MapR.
M.C. Srivas, CTO and Co-founder, MapR Technologies
Based on these conversations, in this article I will look at the prospects for MapR and what a victory would say about the strengths and weaknesses of the open source model and the expectations for how Hadoop will be adopted by the business world. This analysis should be of great interest to the many companies who are now figuring out what big data will mean to their businesses and choosing the technology to process it.
Here’s the misunderstood challenge. The oft-asked question is: “Can MapR stay ahead of the Apache Hadoop open source ecosystem?” Srivas responded to this question by stating: “MapR is not a separate proprietary development paralleling Hadoop. MapR is an active participant in the Hadoop community and provides a Hadoop distribution that combines the Apache Hadoop code on top of a unique platform. This unique platform supports not only the standard Hadoop HDFS and HBase APIs but also expands the platform to support POSIX NFS which enables easier integration and access.”
MapR’s re-architecture is unique in the following areas:
•A file system, MapR-FS, that can scale limitlessly.
•A database engine, MapR-DB, that can capture and return upwards of 100 million data points per second.
•An ANSI SQL query engine, Apache Drill, that can handle data while it is changing, with support for nested and schema-less storage.
•An ultra-strong security system that encrypts everything on the disk and the wire.
Srivas focused on extending these capabilities because he felt they closed the gap between what the Apache Hadoop core product and the surrounding ecosystem offered and what enterprises require.
“It was clear to me that for Hadoop to become the foundation of running a business, it had to meet and even exceed the standards of quality and reliability set by other enterprise software,” said Srivas. “Our approach has been to combine the best of open source with some of our top-notch innovations, and bring to the market something that is very necessary. MapR has done some really outstanding work, but the important measure we focus on is our customers’ success.”
In broad strokes, MapR is arguing that it is the responsibility of Hadoop vendors to aggressively integrate Hadoop into the existing data management ecosystem. To be sure, the leaders of the Hadoop ecosystem are also seeking to improve integration. For many reasons, however, open source projects move quickly in some respects but quite slowly when it comes to architecture changes, productization, and integration with existing enterprise technology.
“In our view, the APIs must be a community managed resource. If there’s one company controlling the API, then it causes lock-in. It doesn’t even matter whether it’s open source or not. For example, Ethernet is not open-source, but there’s no single entity dictating what it is and so everyone feels safe about it. This prevents lock-in,” said Srivas. “Having a well-defined API allows everyone to innovate, both above and below the API, and that’s exactly what MapR has done. We have taken the opportunity to improve Hadoop by innovating both above and below the API. When we innovate above the API it’s always open source. We also don’t shy away from the hard problems involved in addressing the low level architecture. That’s the foundation of our durable competitive advantage.”
Srivas said that, because of these capabilities, customers who buy from MapR are able to make better use of their data with Hadoop. For example, there’s no limit to how many files can be stored in a cluster. Data sources can be mounted directly to a cluster and updated continuously with no fear of running out of files, so there’s no need to limit oneself to batch processing. Similarly, with MapR-DB, the in-Hadoop database included in the distribution, there’s no limit to the number of tables one can have or the number of rows that a table can hold. MapR-DB allows a customer to combine real-time operational workloads with continuous analytics, without the need to forklift data back and forth between OLTP and OLAP databases.
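To make the integration point concrete, here is a minimal sketch of what continuous, non-batch updates look like when a cluster is exposed as an ordinary mounted directory. It is an illustration under stated assumptions, not MapR's API: the helper names `append_record` and `read_records` are hypothetical, and a temporary directory stands in for the NFS mount point. The only thing it relies on is standard file I/O, which is the kind of access a POSIX-style NFS interface is meant to allow.

```python
import os
import tempfile
from typing import List


def append_record(mount_root: str, relative_path: str, record: str) -> str:
    """Append one line to a file under a mounted directory with plain file I/O.

    `mount_root` is assumed to be wherever the cluster is mounted; nothing
    here is specific to any vendor or to Hadoop client libraries.
    """
    target = os.path.join(mount_root, relative_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Append mode: the file is updated in place rather than rewritten,
    # which is what a write-once file system would not permit.
    with open(target, "a", encoding="utf-8") as handle:
        handle.write(record.rstrip("\n") + "\n")
    return target


def read_records(mount_root: str, relative_path: str) -> List[str]:
    """Read back every line of the file, without trailing newlines."""
    target = os.path.join(mount_root, relative_path)
    with open(target, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    # A temporary directory stands in for the (hypothetical) mount point.
    with tempfile.TemporaryDirectory() as mount_root:
        append_record(mount_root, "logs/events.log", "sensor=1 value=20")
        append_record(mount_root, "logs/events.log", "sensor=1 value=21")
        print(read_records(mount_root, "logs/events.log"))
```

The design point is that the writer needs no special client: any existing tool that can append to a file could feed data in continuously.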
In other words, because of MapR’s innovations Hadoop can become more tightly integrated into the existing enterprise data ecosystem. The goal of MapR is to avoid the unpleasant surprises that often occur with emerging technology.
Of course, the advocates of the core project at Hortonworks and Cloudera argue that MapR doesn’t really have much of a lead and that the open source project will someday catch up in the many areas where MapR has pulled ahead.
But M.C. Srivas says he is not concerned. To catch up, MapR’s competitors will need to:
•Dramatically re-architect the underlying Hadoop Distributed File System. HDFS is a batch-oriented, write-once file system with several problems that have been festering for over six years. Srivas believes it is clear that incremental fixes by the community haven’t been sufficient, even over such a long time. Improving HDFS to be enterprise-grade, with support for the most common use cases, requires a significant rewrite and is an advanced, multi-year project.
•Significantly improve NoSQL technologies to be able to compete with the likes of Oracle, so that there’s zero-data-loss and zero-downtime while maintaining the ability to hold more and more data.
•Significantly improve security.
He doesn’t see this happening. Community processes, especially those involved with Hadoop, are populated by engineers who work for competing companies. The investment required is significant, and these companies are not about to share their work.
“On the other hand, Hadoop wouldn’t be as popular as it is today if it weren’t open-source. If something works well, whether it is open-source or not, we will adopt it. If something doesn’t work, we will fix it. We have very innovative projects both in open and closed source. Apache Drill for example is a fantastic open-source query engine which the Hadoop project sorely needed. MapR-DB is another amazing NoSQL project that significantly raises the bar on what NoSQL can do.”
Of course, Hortonworks and Cloudera don’t seem too concerned either. As I’ve said, this market could be big enough for all three companies to do just fine. (For an overview of all three companies see: “Why Google Capital Placed its Bet on MapR”.)
How MapR’s Success Would Challenge Open Source Orthodoxy
As I’ve pointed out in the past, the Hadoop open source community is not like other projects. (See “Can Hadoop Survive its Weird Beginning?”.) The community is populated by people from competing venture-funded startups, unlike Linux, which has been funded by companies getting value from using the technology. Even so, the rules of the Apache ecosystem keep things running pretty well. The fact is, though, that open source most often succeeds in providing a commodity alternative to an existing product, not in breaking new ground. When open source does break new ground, it is often because there is a central figure, often called a benevolent dictator for life, who guides the design and provides product management discipline. Hadoop does not have such a figure. There is no doubt that Hadoop is an example of successful community-driven innovation. The question for the Hadoop vendors is not so much whether that innovation will continue, but whether it will result in a product that allows value to be captured.
Remember also that Hadoop was designed to be a batch system and has gradually been morphed into a general purpose operating system for big data workloads, both batch and transactional. Hadoop is not yet finished. There is a lot of engineering work to do. In some cases the Hadoop ecosystem has found the right talent. In other cases, progress has been slow because experienced engineers are not working on the projects.
These challenges don’t stop open source enthusiasts from confidently making statements like “open source always wins” or “the community always wins.”
A closer look at other markets should give pause to these enthusiasts. For example, Salesforce.com, a proprietary product delivered through the cloud, is closed source and competes quite well against SugarCRM, a product based on open source. Splunk is another successful proprietary product that has lots of open source competition. Has SAP been destroyed by open source ERP? Nope. Two of the most successful companies in the data discovery space, Tableau and QlikView, are both proprietary. And even though Oracle is not growing as fast as it used to, it is still raking in huge amounts of maintenance on its databases despite an abundance of open source competition.
The fact of the matter is that engineering, product design, and a delivery ecosystem can all create advantages that overcome the benefits of the open source model. MapR’s bet is that it can be the company that creates such advantages by combining the Hadoop project with additional innovations. This is not really different from other open core business models. The difference is that with other open source projects the innovations were focused at higher levels in the stack, not on such core underlying capabilities. Of course, Hadoop is at a much earlier stage of product maturity than open source projects such as Linux, which was created more than 20 years after the first appearance of Unix operating systems. To be sure, MapR hasn’t won that bet yet, but if it does, people should reconsider their blind faith in the power of the open source model to solve every problem.
How MapR’s Success Would Change Expectations for Hadoop
Right now, Cloudera, with its first mover advantage, has the most cash and the most momentum in the market. As I’ve pointed out in previous articles (“Cloudera’s Strategy for Conquering Big Data in the Enterprise”, “Cloudera’s Cash is not an Adequate Business Strategy”), the question for Cloudera is: what is its long-term differentiation? Hortonworks, on the other hand, has complete strategic clarity. Its business is making the Apache Hadoop open source as useful as possible without modification.
But it is important to remember that we are still in the early stages of adoption for Hadoop. A recent study by Barclays pointed out that many CIOs are still not sure how to incorporate Hadoop into their environments (“CIOs Uncertain About Hadoop’s Value”).
Right now, much of the market is driven by proofs of concept. In such sales, Hadoop is an island on which a big data experiment is performed. But once the experiments are successful, Hadoop must become part of a production-quality infrastructure.
Here the weakness of Cloudera’s undifferentiated open source strategy is beginning to show, based on intelligence I’ve gathered from a few client engagements. When a firm starts moving Cloudera into production, the number of subscriptions or licenses starts to grow. Cloudera’s clients then face the prospect of paying a large bill, which raises questions such as “What am I paying for?” and “Why does an open source solution cost so much?”
Hortonworks is happy to provide a cheaper alternative in such situations.
MapR’s strategy is to provide a better alternative with respect to integration, high availability, and all the other aspects mentioned. MapR also has clear answers to the two questions just mentioned.
In the end, MapR’s success will only come if enterprise buyers really see value in its additional functionality. Evidence of this would include large sales to big names as well as wins to replace Cloudera and Hortonworks.
“We have customers across industries including at least one million dollar customer in seven of the top 10 verticals,” said Srivas. “We also have many customers that have had experience with other distributions before using MapR and the majority of those experience a greater than 5X return with MapR.”
To succeed, MapR must find a way to become the safe choice: the way to use Hadoop that is not a pain in the rear and that allows more to be done than the other distributions.
Does MapR need a second act?
The problem MapR faces is that the open source community is working on many of the problems that it has addressed with its underlying data platform. MapR argues these efforts don’t even come close. Cloudera and Hortonworks say that the gap is all but closed. What matters is what buyers think and do.
While I don’t buy the argument that open source always wins or that the community can create software to meet any need, it is clear that Hadoop is making progress. When I asked M. C. Srivas how MapR will keep its lead, he responded this way:
“Well, MapR’s advantages have been out there for the last 5 years for everyone to see. These advantages are core to how customers are significantly impacting their bottom-line, and it’s the evidence that the MapR advantages remain unchallenged. It’s also evident that our competitors have not been able to guide the open-source process to address festering enterprise-readiness concerns with HDFS and HBase.”
The other avenue for MapR would be to create other types of extensions, such as its deal with Vertica that allows the same cluster to run both Hadoop and Vertica. Finding new ways to use a Hadoop cluster is a great idea, one that could mark the second coming of generic grid computing. The challenge for MapR is that the YARN capabilities that were introduced as part of Hadoop 2.0 go a long way toward making Hadoop a general purpose environment for grid computing.
In my view, MapR must keep ahead, but it also needs a second act. Srivas disagrees.
“MapR has already delivered a significant second act with MapR-DB that provides the best enterprise data platform for real-time operations and analytics, and a third act with the Apache Drill project, which we have driven completely in the open with the community,” said Srivas. “Although we already have a substantial lead, our platform also gives us ability to rapidly innovate with 4th and 5th acts.”
The battle for supremacy in the Hadoop market is one of the most interesting parts of the current tech scene. In this next phase, the enterprise buyers will pick the winner.
Source:
http://www.forbes.com/sites/danwoods/2014/09/29/can-mapr-keep-ahead-of-hadoop-competitors/print/ Dan Woods, Contributor