The Cloudera Enterprise Data Hub

As part of the Big Data Platform Distributions week, I will have a closer look at the Cloudera distribution.

Cloudera was founded in 2008 by a few people out of the Silicon Valley atmosphere:

Doug Cutting, co-creator of Hadoop, joined the company in 2009 as Chief Architect. He is still active in that role.

Cloudera offers an architecture which serves as an Enterprise Data Hub (EDH), with the Hadoop platform acting as a central data repository. Unlike in traditional data management systems, data is not moved from A to B via ETL or E-LT. Instead, the data is ingested and stored on the EDH once, and is then processed, analysed and served where it resides on the data platform.
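To make the "process the data where it resides" idea a bit more concrete, here is a minimal toy sketch in Python. It is not Cloudera code and does not use any Hadoop API; the `DataHub` class, the `orders` dataset and both workload functions are made up purely for illustration. The point is only that the data is ingested once and different workloads are sent to it, instead of each workload getting its own copy.

```python
from collections import defaultdict


class DataHub:
    """Toy stand-in for a central repository (hypothetical, not a Cloudera API)."""

    def __init__(self):
        self._datasets = {}

    def ingest(self, name, records):
        # Data lands on the hub once and stays there.
        self._datasets[name] = list(records)

    def run(self, name, workload):
        # The workload travels to the data; only the result leaves the hub.
        return workload(self._datasets[name])


def revenue_by_region(records):
    """Analytical workload: aggregate order amounts per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


def large_orders(records, threshold=100.0):
    """Serving workload: look up the ids of orders at or above a threshold."""
    return [r["order_id"] for r in records if r["amount"] >= threshold]


hub = DataHub()
hub.ingest("orders", [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 40.0},
])

# Two different workloads, one stored copy of the data.
print(hub.run("orders", revenue_by_region))
print(hub.run("orders", large_orders))
```

In a traditional setup each of these workloads would typically read from its own extract of the orders data. Here both run against the single dataset held by the hub.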

Cloudera Enterprise Data Hub

The core of the Cloudera Distribution is based on the Apache™ Hadoop® open-source project and related open-source projects. These include projects like Impala, Kudu and Sentry, which were created inside Cloudera and contributed back to the open-source community.

One product which sets Cloudera apart from the other distributions is Cloudera Manager (CM). According to Cloudera, Cloudera Manager is the best way to install, configure, manage, and monitor the Apache Hadoop stack. People will argue about whether CM really is the best option, because there is also Apache Ambari, which is open-source. I won’t go into detail about which one is better. As with a lot of things, the answer would be: it depends. From what I hear and read, Cloudera Manager is a must-have for administrators because of its rich functionality. A downside of CM is that it is a proprietary product and therefore cannot benefit from innovations from the community. That doesn’t necessarily mean that open-source projects are more open than proprietary software, because some of them are only used for a specific distribution.

The Enterprise Data Hub is the flagship product within Cloudera. Other products include:

As the leader in Apache Hadoop-based data platforms, Cloudera has the enterprise quality and expertise that make them the right choice to work with on Oracle Big Data Appliance.
— Andy Mendelson, Senior Vice President, Oracle Server Technologies

Oracle has taken the “don’t DIY” philosophy one step further. They have created the Oracle Big Data Appliance (BDA) and recently announced its sixth hardware generation, which is now generally available.

Check out more about the collaboration between Cloudera and Oracle here.

“Oracle, Intel, and Cloudera partner to co-engineer integrated hardware and software into the Oracle Big Data Appliance, an engineered system designed to provide high performance and scalable data processing environment for Big Data.” See the video for the interview.

How to get started?

The best way to get to know the Cloudera products is by getting your hands dirty. Cloudera offers QuickStart virtual machine (VM) images. These VMs come in different flavours, such as VMware, VirtualBox and Docker. Go and download a copy of the current 5.12 version here. For questions and other interactions, go to the Cloudera community.

Thanks for reading.