Hadoop and Oracle Technologies on BI Projects

Last night I attended an event organized by oGH and OBUG. Mark Rittman was invited to talk about ‘Hadoop and Oracle Technologies on BI Projects’. The event was organized to inform us about Hadoop combined with Oracle Technologies. It was also meant as the kick-off of a BI / Warehousing SIG.

Last May I visited the Rittman Mead BI Forum in Brighton, where I attended a Cloudera Hadoop Masterclass. Mark built on this Masterclass by integrating Hadoop into the (traditional) Oracle BI world. It was a very interesting session. Check a blog post from Jan Karremans for a summary.

Although a lot of my current clients are still struggling with their (Operational) BI, it’s very interesting to see how Oracle responds to the Digitization & Datafication of the world.

Oracle Data Warehousing Platform - 2014

The Oracle BI Community is a very lively (online) community. There are a lot of offline events around the globe that are (partly) related to Oracle BI. Sadly there is not much offline activity (yet) in the Benelux. It’s good to hear that oGH and OBUG are joining forces to set up a BI / Warehousing SIG. Mark proved with his session that there is a lot to talk about, as the worlds of traditional Data Warehousing and Hadoop & NoSQL (Data Reservoir) are colliding.

Check out Mark’s Slideshare for his presentation. For those interested in the BI / Warehousing SIG, please watch the OBUG site. There is a new session planned for 3 March 2015.

RM BI Forum 2014 Notes – Cloudera Hadoop Masterclass

The Rittman Mead BI Forum started off with a one-day Hadoop Masterclass, provided by Lars George. As he had messaged us the day before, we would learn what Hadoop is all about, what its major components are, and how to acquire, process and provide data as part of a production data processing pipeline. To that effect, Lars advised that it would be useful to follow along with the examples in the course and have an environment handy. That would allow us to experiment at our convenience during and after the class. He directed us to the following link: the Cloudera Quickstart VM.

Lars recommends the following: “Select the CDH5 version of the VM. Please select a virtual machine image matching your VM platform of choice. If you do not have a VM host application installed yet, you can choose from a few available ones. VirtualBox is provided by Oracle and a great choice to use. It can be downloaded here. Set up the VM application, then download and start the Cloudera Quickstart VM to run on top of it. It is as easy as that.”

Find below a few notes I took during the Masterclass.

Lars divided the Masterclass into four parts.

I – Introduction to Hadoop

  • What is Big Data? – It’s not only about volume but also about format and speed. The three V’s: Volume, Variety and Velocity
  • Introduction to Hadoop
  • HDFS
  • MapReduce
  • YARN
  • Cluster Planning

Hadoop is Open Source and Apache licensed: http://hadoop.apache.org
Many developers: Cloudera, Apple
Contributors
Many related projects, applications, tools
Hadoop is not a single system but a set of tools and projects that work together. For each part of the architecture, you decide which tool to use and how to use it.

Hadoop: where to get it?
Load, Process and Analyze data
Hadoop Concept – distribute data in the system.
Process the data where it resides
No network processing
High-level code (Java)
No communication between nodes
Data stored on different machines in advance

Map Reduce Data Flow
  • Map
  • Sort and Shuffle
  • Reduce
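
The three steps above can be illustrated with a minimal word-count sketch in plain Python. This is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process rather than being distributed over a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Sort and Shuffle: group all values by key and return the keys sorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what makes the map step
    # easy to spread over many machines in a real cluster.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run on different nodes and the shuffle moves the grouped data between them; the sketch only shows the logical flow.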

II – Ingress and Egress

Ingress – moving data into Hadoop (HDFS)
Flume (Near Real-Time Pipeline)
  • Source
  • (File) Channel
  • Sink -> poll, collect and write to e.g. HDFS
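
To make the source, channel and sink roles concrete, here is a toy sketch in plain Python. It is not Flume's API (a real Flume agent is configured through a properties file and runs on the JVM); the functions `source` and `sink`, the `batch_size` parameter and the in-memory list standing in for HDFS are all assumptions made for illustration.

```python
import queue


def source(events, channel):
    """Source: accept incoming events and put them on the channel."""
    for event in events:
        channel.put(event)


def sink(channel, batch_size=2):
    """Sink: poll the channel, collect events into batches and 'write' them.

    The returned list of batches stands in for a real target such as HDFS.
    """
    written = []
    batch = []
    while True:
        try:
            batch.append(channel.get_nowait())
        except queue.Empty:
            break  # channel drained, stop polling
        if len(batch) == batch_size:
            written.append(batch)
            batch = []
    if batch:
        written.append(batch)  # flush the last, partially filled batch
    return written


# The channel decouples the source from the sink; here it is an in-memory queue.
channel = queue.Queue()
source(["log line 1", "log line 2", "log line 3"], channel)
batches = sink(channel)
print(batches)
```

The point of the channel is buffering: the source can keep accepting events while the sink writes at its own pace.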
Sqoop: transfer data between relational databases (Oracle, Teradata, SQL Server, etc.) and HDFS
Oracle Database Driver for Sqoop – OraOop by Quest
File formats are important to keep in mind when you want to get the data out again.
Simple File versus Container (Structured) File
Parquet vs Google Dremel
BI Integration
  • Sqoop
  • HDFS Connector
  • ODBC/JDBC

III – NoSQL and Hadoop

ACID (atomicity, consistency, isolation, durability) 

IV – Analyzing Big Data

  • Pig
  • Hive (HiveServer2 instead of HiveServer1)
  • Impala
  • Search – Lucene
  • Data Pipelines (micro – macro)
  • Oozie (Workflow Server)
  • Information Architecture – Where / How to store data and how to secure this structure
  • Spark (Java, Python, Scala compile into code)
I think Lars could have talked about Hadoop for two more days (with or without slides). Hadoop is all about making choices. There are many similar tools, projects and concepts, and it all depends on what you want to achieve.

Although this Masterclass was very informative, I still struggle to see the use case at this moment. A lot of my customers are still struggling with their ‘normal’ data.

Big Data and Analytics Top Ten Trends for 2014

Oracle recently published its view on the top ten trends in “Big Data” and “Analytics” for 2014. Find the list below:

1. Business Users Get Hooked on Mobile Analytics –> Oracle Business Intelligence Mobile App Designer

2. Analytics Take to the Cloud –> Oracle Applications Cloud

3. Hadoop-Based Data Reservoirs Unite with Data Warehouses –> Your Data Warehouse and Hadoop – Better Together Featuring Cloudera Webcast

4. New Skills Bolster Big Data Investments –> In search of Insight and Foresight – Getting more out of Big Data

5. Big Data Discovery is the Secret to Workforce Success for HCM –> Harnessing the Power of Employee Sentiment

6. Predictive Analytics Lend Fresh Insight into Big Data Strategies –> Oracle Advanced Analytics Option

7. Predictive Analytics Bring New Insight to Old Business Processes –> Big Data @ work – Real-Time Fraud Detection and Prevention

  • White Paper Big Data Analytics: Advanced Analytics in Oracle Database
  • White Paper Bringing R to the Enterprise
  • White Paper Oracle Engineered Systems – Engineered for Extreme Performance

8. Decision Optimization Technologies Enhance Human Intuition –> Oracle Real-Time Decisions Review by James Taylor

9. Business Leaders Embrace Packaged Analytics –> Packaged Analytic Applications: Accelerating Time and Value

10. New Skills Launch New Horizons of Analysis –> Oracle Training

If you are interested in succeeding with Enterprise Performance Management in 2014, please check here.

Check it out yourself

Check here for the Pre-Built Developer VMs (for Oracle VM VirtualBox). Pay special attention to the ‘SampleApp V309’ (Oracle BI Foundation) and the ‘BigDataLite 2.5’. If you are interested in Endeca, you might want to download an Endeca Virtual Machine here.

There are also some hosted environments online for Oracle BI (prodney / Admin123) and Endeca (publicuser@oracle.com / Admin123)

If you would like some guidance, you might want to check out the Oracle Learning Library (OLL).