Big Data Platform Distributions week – Wrap up

Wrapping up a week of Big Data Platform comparisons. A closer look @ #Cloudera, #MapR and #Hortonworks.

This last week I have been taking a slightly closer look at 3 of the most well known Big Data Platform Distributions; Cloudera, MapR and Hortonworks. It’s interesting to see how different de various distributions look at the same data challenge.

Which Big Data Platform Distributions is the best?

The three different Big Data Platform Distributions have a different focus. Here are a few things that make each of the top three vendors stand out from each other:

  • Cloudera – Proven, user-friendly technology.
    • Use Case; Enterprise Data Hub. Let the Hadoop platform serve as a central data repository.
  • MapR – Stable platform with a generic file-system and fast processing.
    • Use Case; Integrated platform with a focus on streaming.
  • Hortonworks – 100% Open source with minimal investment.
    • Use Case; Modernising your traditional EDW.

There is no easy answer to the question; “Which Big Data Platform Distributions is the best?”. My answer would be; “It depends”. It depends on a various different factors:

  • Performance – MapR has extra focus on speed and performance and therefor developed its own file system (MapR-FS) as well as its own NoSQL database, MapR-DB
  • Scalability – Hadoop is known to scale very well. All three offer software to mange this effectively. Cloudera & MapR go for proprietary.
  • Reliability – Before Hadoop 2.0 the NameNode was the single point of failure (SOPF) in a HDFS-cluster. MapR has a different approach (more distributed) approach with its file system known as MapR File System (MapR-FS)
  • Manageability – Cloudera & MapR add (proprietary) management software to their distribution. Hortonworks chooses for their open-source equivalents.
  • Licenses – All three offer downloadable free versions of their software. Both Cloudera & MapR add additional features for their paying customers.
  • Support – All three are part of the Hadoop community as contributors & committers. They contribute and commit (updated) code back to the open source repository.
  • Upgrades – Cloudera & Hortonworks both are known for their quick adoption of new technologies. Hortonworks seems to be the quickest to get things production ready.
  • OS Support – Hortonworks supports the Microsoft Windows OS. Microsoft included Hortonworks and packaged it into its own HDInsight (both on-premise or in the Azure cloud).
  • Training – It looks like Cloudera offers the most complete and professional training program. This also reflected in the price.
  • Tutorials – All three offer various tutorials and sandboxes to get started

Back to the question; “Which Big Data Platform Distributions is the best?”. Go ahead and find out for yourself. Determine which of the points above are important to your situation and try it out for your self.

If you have anything to contribute, please let me know. I haven’t performed a thorough comparison, yet. Maybe Gartner can help out a bit as well.

Thanks for reading.

Author: Daan Bakboord

I am an Oracle Big Data Analytics Consultant with great interest in anything closely related to the Oracle Big Data Analytics (OBIEE, BICS, OAC, Big Data, Data Integration, Data Visualization, Data Management, Data Architecture).

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s