The Rittman Mead BI Forum started off with a one-day Hadoop Masterclass, given by Lars George. As he had announced in a message the day before, we learned what Hadoop is all about, what its major components are, and how to acquire, process and provide data as part of a production data processing pipeline. To that end, Lars advised us to follow along with the examples in the course using an environment of our own, so that we could experiment at our convenience during and after the class. He directed us to the Cloudera Quickstart VM.
Lars recommends the following: “Select the CDH5 version of the VM. Please select a virtual machine image matching your VM platform of choice. If you do not have a VM host application installed yet, you can choose from a few available ones. VirtualBox is provided by Oracle and is a great choice. It can be downloaded here. Set up the VM application, then download and start the Cloudera Quickstart VM to run on top of it. It is as easy as that.”
Find below a few notes I took during the Masterclass.
Lars divided the Masterclass into four parts.
I – Introduction to Hadoop
What is Big Data? It is not just about volume, but also about format and speed: the three V’s of Volume, Variety and Velocity.
Hadoop is not a single system but a set of tools and projects that work together. For each part of the architecture, you have to decide which tool to use and how to use it.
I think Lars could have talked about Hadoop for two more days (with or without slides). Hadoop is all about making choices: there are many similar tools, projects and concepts, and which one fits depends on what you want to achieve.
Although this Masterclass was very informative, I still struggle to see the use case at this moment. Many of my customers are still struggling with their ‘normal’ data...
I am a self-employed Oracle Data & Analytics Consultant with a great interest in anything closely related to Oracle Big Data Analytics (OBIEE, BICS, OAC, Big Data, Data Integration, Data Visualization, Data Management, Data Architecture).
View all posts by Daan Bakboord
3 thoughts on “RM BI Forum 2014 Notes – Cloudera Hadoop Masterclass”