Big Data Platform Distributions week

There is a lot to do when it comes to Big Data. There are all kinds of new or improved techniques to use data. Have a look at things like Machine Learning, Deep Learning or Artificial Intelligence. All these techniques use (Big) Data. I will not go into the discussion of what Big Data exactly means. In the end it’s all about data, whether it is structured (e.g. relational, spreadsheet, etc.), semi-structured (e.g. log files) or unstructured (e.g. pictures, videos).

This blog post is the start of a series of blog posts that take a closer look at technical implementations of Big Data. I am aware that there is a whole world around Big Data. Things like the (full) data architecture or the actual request for information are often forgotten. The tension between the Business and IT also deserves special attention.

What is Hadoop?

If we look at data from a technical perspective, we see one term popping up every time: “Hadoop”. What is Hadoop and why would I need it?

“The Apache™ Hadoop® project is a project that develops open-source software for reliable, scalable, distributed computing.”

Hadoop (/həˈduːp/) is based on work done at Google in the late 1990s and early 2000s. According to the co-founders of Hadoop, Doug Cutting and Mike Cafarella, Hadoop originated from the Google File System paper that was published in October 2003. Doug Cutting named the project after his son’s toy elephant.

At the time, Google had a challenge. They wanted to index the entire web, which required massive amounts of storage and a new approach to processing these large amounts of data. Google found a solution in the Google File System (GFS) and MapReduce, its distributed processing model (described in a paper released in 2004).

Hadoop was first built as part of the Nutch project, which was meant to serve as an infrastructure to crawl the web and store a search engine index for the crawled pages. HDFS is a distributed filesystem that can store data across thousands of servers, while MapReduce runs jobs across those machines, executing the work close to the data.

According to the project page, Hadoop is built around three core components:


  • Hadoop Distributed File System (HDFS) – Stores data
  • Hadoop MapReduce – Processes data
  • Hadoop YARN – Schedules work
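To make the “Processes data” part a bit more concrete, here is a minimal, single-process Python sketch of the classic word-count example. It only mimics the map, shuffle and reduce phases in memory; it is not Hadoop’s (Java) MapReduce API, and the function names are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values for one key; for word count that means summing the ones."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # In a real cluster each map and reduce call could run on a different machine;
    # in this sketch everything runs sequentially in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The point of the split is that map and reduce functions only look at their own input, so the framework is free to spread them over many machines.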

These core Hadoop components are surrounded by a whole ecosystem of Hadoop projects. This open-source ecosystem provides all kinds of projects that solve real data problems. There are projects to support the different challenges within a data-driven environment:

This list just gives an impression of the possible Hadoop ecosystem projects. A more up-to-date list is available here, which provides “…a summary to keep the track of Hadoop related projects…”.

Why would I need Hadoop?

There are a few reasons why one would need Hadoop. The most important one is that the amount of data is growing faster than the ability of traditional systems, such as relational databases (RDBMS), to store and process it. In addition, the traditional data storage alternatives are no longer cost-effective. Hadoop takes an approach based on low-cost commodity hardware, which makes it easy to scale by adding or removing machines when necessary. Data is distributed over this hardware when it is stored, and the processing of this data takes place where it is stored.
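The idea of “store it distributed, process it where it is stored” can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a block size of 128 MB and three replicas (the usual HDFS defaults, configurable via dfs.blocksize and dfs.replication), a hypothetical four-node cluster, and a naive round-robin placement. Real HDFS uses a rack-aware placement policy and YARN does the actual scheduling, so treat this only as an illustration of data locality.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: common HDFS default block size
REPLICATION = 3      # assumption: common HDFS default replication factor


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size of each block; the last block may be smaller than the rest."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    if n_blocks and file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb
    return sizes


def place_replicas(n_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to distinct nodes using naive round-robin (NOT HDFS's rack-aware policy)."""
    copies = min(replication, len(nodes))
    return {
        block_id: [nodes[(block_id + i) % len(nodes)] for i in range(copies)]
        for block_id in range(n_blocks)
    }


def schedule_tasks(placement: dict[int, list[str]], nodes: list[str]) -> dict[int, str]:
    """Run each block's task on a node that already holds a replica (data locality),
    picking the least-loaded of those nodes."""
    load = {node: 0 for node in nodes}
    assignment = {}
    for block_id, replicas in placement.items():
        node = min(replicas, key=lambda n: load[n])
        assignment[block_id] = node
        load[node] += 1
    return assignment


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    blocks = split_into_blocks(300)
    print(blocks)  # [128, 128, 44]
    placement = place_replicas(len(blocks), cluster)
    print(schedule_tasks(placement, cluster))  # {0: 'node1', 1: 'node2', 2: 'node3'}
```

Because every task is sent to a node that already has its block, only the (small) program travels over the network instead of the (large) data.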

One of the big challenges when setting up a Hadoop environment is: “Where do I start?” Starting a single-node Hadoop cluster could be a first step, but that is only the start. What to do next? Which projects (and which versions) should be included? When should which project be upgraded? Is a project already production-ready? And what about support (issues, bugs, technical assistance), service level agreements (SLAs), compliance, etc.?

There are several distributions that help answer the above questions. An additional benefit is that the organisations behind these distributions are part of the Hadoop community. They contribute and commit (updated) code back to the open-source repository.

For this series I will focus on three of the largest distributions within the community: Cloudera, MapR and Hortonworks. Please check out my findings in the following blog posts:

Thanks for reading.

 

Handouts – Introducing Oracle’s Information Management Reference Architecture

One of the great things about working in the Oracle Business Analytics industry is that there is a very active community, both online and offline. Oracle supports these activities where possible. Last week I attended an offline session at Oracle HQ in the Netherlands. This session was an event organized by the oGH and the OBUG (the SIG BIWA). Marti Koppelmans provided an introduction to Oracle’s Information Management Reference Architecture. As promised, he has posted some of his material on the oGH site. If you are interested, have a look at the following:

Although the titles are in Dutch, the contents are in English.

I will try to post my notes here within a few days.

Oracle Business Analytics Update

Last week I visited the Oracle Business Analytics Partner Update at Oracle NL in Utrecht, the Netherlands. The NL (Pre-)Sales team gave us an update on the Oracle Analytics Roadmap. There is a lot of movement in the world of Oracle Analytics. I have written about the announcements made during Oracle OpenWorld 2014. When we look back at 2014 (“Year in Review – Oracle Business Analytics in 2014”), we can see a few key subjects: Big Data and Cloud.

Oracle Business Analytics Strategy

The Oracle Business Analytics Strategy can be captured in one picture.

Oracle Business Analytics Strategy - 2014

The Oracle Business Analytics Strategy consists of Six Design Goals.

Oracle Business Analytics Strategy - Six Design Goals

1. Solutions – “To see more Patterns”

  • OBIEE
    • 12c
    • R Integration
    • Interactive Dashboards
    • Voice & Word Recognition (BIAsk)
    • Data Mashups
  • Endeca Information Discovery

2. Mobile – “A Secure and Simple Experience”

Oracle has two options for a Mobile Experience.

  • Oracle BI Mobile HD
  • Oracle BI Mobile App Designer

3. More Answers – “In Context”

  • Oracle BI Applications (On Premise)
  • OTBI Saas (Embedded in Oracle Cloud Applications)
  • OTBI Enterprise (HCM, CRM, ERP, etc.)

4. Most Complete – “Cloud Capabilities”

Oracle has introduced new Cloud Services. The Oracle Analytics Cloud delivers Business Intelligence & Analytics for Traditional Data and Big Data: Oracle Business Intelligence Cloud Service (BICS).

There is already a lot of information available about the Oracle BICS offering:

If you like, you can also Try it Live.

5. Engineered – “Systems that Scale Operations”

Larry Ellison announced the Next Generation of Oracle Engineered Systems. See the Press Release here.

 

Check here for more videos about Oracle’s X5 Engineered Systems Launch Event.

6. Faster – “Innovation Cycles”

Oracle introduces Oracle Big Data Discovery, “The Visual Face of Hadoop”. With this new tool you should be able to spend more time analyzing your data and less time preparing it. Check here what Oracle has to say about it.

Sounds like enough to try to keep up with!

Oracle OpenWorld 2014 is over – What’s next?

Last week Oracle OpenWorld 2014 took place in San Francisco. I did not have the pleasure of attending this event, but thanks to social media and the World Wide Web you could still follow the highlights. If we check out the keynote by Thomas Kurian, we learn that there are three major trends:

  • Big Data
  • Mobility
  • Cloud

I have looked at these Trends earlier.

Big Data

Just before Oracle OpenWorld 2014, the updated Oracle Big Data Information Management Reference Architecture was released. This updated Reference Architecture should place all the (new) technologies in context and, in addition, provide insight into a real, implementable architecture.

Oracle offers new Big Data capabilities via:

Mobility

One of the announcements during Oracle OpenWorld was Oracle Mobile Cloud, an end-to-end mobile development framework:

On top of that, Oracle has enhanced the integration between Oracle Mobile Application Framework and Oracle Mobile Security Suite. Oracle says the deeper integration makes it easier for mobile front-end developers to secure their applications.

For more details, please check:

Cloud

Oracle will introduce new Cloud Services. Check the official press release for more details. The Oracle Analytics Cloud delivers Business Intelligence & Analytics for Traditional Data and Big Data.

There is already a lot of information available about the Oracle BICS offering:

Apart from the above, there are a few more announcements and news items:

  • OBI 12c
  • Oracle BI Applications
    • There will be no more upgrades for Oracle BIA 7.9.x (Informatica release). A few months ago there was a rumor that an Oracle BIA 11g (Informatica release) would come. This is not going to happen. Existing customers have to move over to Oracle BIA 11g (ODI release).
  • Introducing Oracle Alta UI (Renewed Cross-platform User Experience)
  • Oracle Data Integrator will play a Key Role in binding the Applications, moving Data and simplifying access to Big Data (check these recaps – day 1, 2, 3 & day 4 – from the Oracle Data Integration team).

Oracle Big Data Information Management Reference Architecture

A few months ago I wrote a blog post about the Oracle Reference Architecture for Information Management. There is a new Oracle Big Data Information Management Reference Architecture online now.

If you want to find out more about Oracle’s Big Data Information Management Reference Architecture, please check the links below: