Oracle Data Visualization - Hands-on Lab


Triggered by both Mike Durran and the AMIS Oracle OpenWorld & CodeOne review, I went through the OOW18 Session Catalog. The Oracle Data Visualization Hands-on Lab is an easy way to get acquainted with the product, so I decided to give it a quick try.

Besides this, I am also trying out the Oracle Cloud. I have already provisioned an Autonomous Data Warehouse Cloud (ADWC) database, so it seemed nice to combine part of the Hands-on Lab with the Oracle Cloud trial.

Impact of Social Media Campaigns on Product Sales

According to Mike, the Hands-on Lab material can be downloaded here.

The Hands-on Lab material consists of a PDF describing the steps to be taken, plus two .xlsx files with the data.

I decided to try and use Oracle SQL Developer to load the .xlsx files into my Autonomous Data Warehouse Cloud database. This process is relatively straightforward.

Make sure you have a SQL Developer connection to ADWC. How you should achieve that is described in the Oracle Help Center.

Via a right-mouse-click, you can choose Import Data.

Import Data

From here it is just a matter of following the wizard to load the .xlsx files into ADWC.

Data Preview
Import Method
Choose Columns
Column Definition

Make sure all column names are valid. In the above example, ‘DATE’ is a reserved word. Special characters, like ‘#’, also need special attention.

Column re-Definition
Finish

If everything goes well, the data is loaded into the table specified in the ‘Import Method’ step; in this case it is the table KOOLKART_SALES_DATA.

Table imported

Repeat the same steps for the other .xlsx file.
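As an alternative to clicking through the wizard for each file, the same load can also be scripted. Below is a minimal sketch, not part of the Hands-on Lab itself, that assumes the python-oracledb and pandas packages, an ADWC wallet unzipped to ./wallet, the adwc_high service from that wallet, and that the target table already exists (for example, created by the wizard). The file and column names are hypothetical and need to be adjusted to the actual .xlsx contents.

```python
# Minimal sketch (assumptions: python-oracledb, pandas + openpyxl installed,
# wallet unzipped to ./wallet, target table already exists). Adjust names to
# your own environment -- this is an illustration, not the lab's own method.
import oracledb
import pandas as pd

df = pd.read_excel("koolkart_sales_data.xlsx")   # hypothetical file name

# Sanitize column names: avoid reserved words like DATE and characters like '#'.
df.columns = [c.upper().replace("#", "NO").replace(" ", "_") for c in df.columns]
df = df.rename(columns={"DATE": "SALES_DATE"})   # hypothetical rename

conn = oracledb.connect(
    user="ADMIN",
    password="<password>",
    dsn="adwc_high",                 # service name from tnsnames.ora in the wallet
    config_dir="./wallet",
    wallet_location="./wallet",
    wallet_password="<wallet password>",
)

cols = ", ".join(df.columns)
binds = ", ".join(f":{i + 1}" for i in range(len(df.columns)))
with conn.cursor() as cur:
    cur.executemany(
        f"INSERT INTO KOOLKART_SALES_DATA ({cols}) VALUES ({binds})",
        [tuple(row) for row in df.itertuples(index=False, name=None)],
    )
conn.commit()
conn.close()
```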

If you want to proceed with the Hands-on Lab, you need to connect Oracle Data Visualization Desktop (DVD) to ADWC. The Oracle A-Team has dedicated a blog post to this process. Once DVD and ADWC are connected, a Data Set can be created.
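Before building the Data Set in DVD, it can be useful to verify from a small script that the imported rows are actually visible in ADWC. Here is a quick sanity check, reusing the hypothetical wallet and connection settings from the import sketch above:

```python
# Quick sanity check against ADWC (same hypothetical connection settings as above).
import oracledb

with oracledb.connect(
    user="ADMIN",
    password="<password>",
    dsn="adwc_high",
    config_dir="./wallet",
    wallet_location="./wallet",
    wallet_password="<wallet password>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM KOOLKART_SALES_DATA")
        print("Rows in KOOLKART_SALES_DATA:", cur.fetchone()[0])
```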

Create Data Set

From this point onwards, it’s easy to follow the Hands-on Lab.

It’s nice to see how easily the various components within the Oracle stack can communicate with each other. This blog only covers the first 58 or so pages; I expect the remaining steps can be followed just as easily. If there is anything worth mentioning, or when I take a sidestep again, I will post again.

Good Luck.

The Forrester Wave™: Big Data Fabric, Q2 2018

I have written about analyst reports in the past. Last week I read about The Forrester Wave™: Big Data Fabric, Q2 2018. A few months ago I wrote a 5-part series about Big Data Platforms. Apparently, platforms like Cloudera and Hortonworks do not run the show according to this analyst report.

“Talend, Denodo Technologies, Oracle, IBM, And Paxata Lead The Pack”

According to Forrester, a Big Data Fabric is an emerging platform that accelerates business insights “by automating ingestion, curation, discovery, preparation and integration from data silos”. It can support many types of use cases. The top big data fabric use cases Forrester has seen are:

  • A 360-degree view of the customer
  • Internet-of-things (IoT) analytics
  • Real-time & Advanced analytics

Forrester has included the following companies in the report (Cambridge Semantics, Cloudera, Denodo Technologies, Hitachi Vantara, Hortonworks, IBM, Informatica, Oracle, Paxata, Podium Data, SAP, Syncsort, Talend, TIBCO Software, and Trifacta), based on a few inclusion criteria. The most important ones are the functionality of the offering and (of course) Cloud. The offering must provide the following functionality:

  • data access
  • data discovery
  • data transformation
  • data integration
  • data preparation
  • data security
  • data governance
  • orchestration of data sources

Of course, the solution:

  • “must be able to ingest, process, and curate large amounts of structured, semistructured, and unstructured data stored in Big Data Platforms;”
  • “should be able to store metadata/catalogs for data modeling and data access purposes to support a globally distributed data fabric;”
  • “must be able to run on cloud or on-premises platforms.”

“Oracle continues to broaden its big data fabric solution”

Oracle’s key strengths lie in its security & governance capabilities, highly scalable data movement, and transformations that can be done in real-time streaming environments. Forrester says that Oracle’s customers use the Big Data Fabric to support various use cases, like real-time analytics, customer intelligence, IoT applications, and other Big Data applications and insights.

In a presentation I did at the UKOUG and the nlOUG last year, I covered some parts of the Oracle Big Data (Cloud) offering.

The following Oracle products are included in the report:

On Premise

Cloud

Find out what some of the other vendors have to say about the report:

Of course this report is only a view from one analyst. You should check what the Gartners of this world have to say to get a more complete picture.

Originally published on LinkedIn.

 

Oracle and the MQ for BI & Analytics 2018

Two years ago I wrote an article about the 2016 release of the Gartner Magic Quadrant for Business Intelligence and Analytics. It was very clear that Oracle and Gartner were not on the same track when it comes to the Analytics market. At that time there were differing opinions among the various analysts. Jen Underwood wrote a nice article last year in which she explored the results of the 2017 MQ.

Just recently Gartner released the 2018 version of the Magic Quadrant for Business Intelligence and Analytics. Still, Oracle and Gartner do not completely agree.

Oracle recognizes self-service data visualization as a critical component of any BI & Analytics regimen, but only as part of a broader Cloud-based Data & Analytics strategy.

Check out why Oracle thinks that the Cloud is a disruptive force for innovation and change with Analytics here. To get a more complete picture of the Analytics market, please check out Forrester Research and BARC, who named Oracle a Market Leader for Analytics. This Ovum publication might also be interesting, explaining why they think the Cloud is a model for better business.

I still think that there is more to Analytics than self-service, especially these days with important subjects like the GDPR.

In the end I think you should have a closer look at the findings of the different analysts. Sometimes a picture says more than a thousand words, but in this case it is important to familiarise yourself with the background.

I would be happy to assist you in this process.

Originally published on LinkedIn.

Big Data Platform Distributions week – Wrap up

This last week I have been taking a slightly closer look at three of the best-known Big Data Platform Distributions: Cloudera, MapR and Hortonworks. It’s interesting to see how differently the various distributions look at the same data challenge.

Which Big Data Platform Distribution is the best?

The three Big Data Platform Distributions each have a different focus. Here are a few things that make each of the top three vendors stand out:

  • Cloudera – Proven, user-friendly technology.
    • Use Case: Enterprise Data Hub. Let the Hadoop platform serve as a central data repository.
  • MapR – Stable platform with a generic file system and fast processing.
    • Use Case: Integrated platform with a focus on streaming.
  • Hortonworks – 100% open source with minimal investment.
    • Use Case: Modernising your traditional EDW.

There is no easy answer to the question “Which Big Data Platform Distribution is the best?”. My answer would be: “It depends”. It depends on various factors:

  • Performance – MapR has an extra focus on speed and performance and therefore developed its own file system (MapR-FS) as well as its own NoSQL database, MapR-DB.
  • Scalability – Hadoop is known to scale very well. All three offer software to manage this effectively; Cloudera & MapR go for proprietary solutions.
  • Reliability – Before Hadoop 2.0 the NameNode was the single point of failure (SPOF) in an HDFS cluster. MapR takes a different (more distributed) approach with its own file system, MapR-FS.
  • Manageability – Cloudera & MapR add (proprietary) management software to their distributions. Hortonworks opts for the open-source equivalents.
  • Licenses – All three offer downloadable free versions of their software. Both Cloudera & MapR add additional features for their paying customers.
  • Support – All three are part of the Hadoop community as contributors & committers; they contribute and commit (updated) code back to the open-source repositories.
  • Upgrades – Cloudera & Hortonworks are both known for their quick adoption of new technologies. Hortonworks seems to be the quickest to get things production-ready.
  • OS Support – Hortonworks supports the Microsoft Windows OS. Microsoft has included Hortonworks and packaged it into its own HDInsight (both on-premises and in the Azure cloud).
  • Training – It looks like Cloudera offers the most complete and professional training program. This is also reflected in the price.
  • Tutorials – All three offer various tutorials and sandboxes to get started (see the sketch after this list for a quick way to poke at a sandbox).
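One thing that makes the “try it out for yourself” advice practical: the core Hadoop APIs are largely the same across the distributions, so a small script written against one sandbox usually runs against the others as well. As a hedged illustration (the host, port and user below are assumptions; WebHDFS must be enabled, and the default NameNode HTTP port is 50070 on Hadoop 2.x or 9870 on Hadoop 3.x), here is a minimal WebHDFS directory listing in Python:

```python
# Minimal WebHDFS sketch: list /tmp on a sandbox cluster.
# Assumptions: WebHDFS enabled, hypothetical host name, simple (pseudo) auth.
import requests

NAMENODE = "http://sandbox.example.com:50070"   # 50070 = Hadoop 2.x default, 9870 on Hadoop 3.x

resp = requests.get(
    f"{NAMENODE}/webhdfs/v1/tmp",
    params={"op": "LISTSTATUS", "user.name": "hdfs"},
    timeout=10,
)
resp.raise_for_status()

# Each FileStatus entry describes one file or directory under /tmp.
for status in resp.json()["FileStatuses"]["FileStatus"]:
    print(status["type"], status["pathSuffix"], status["length"])
```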

Back to the question: “Which Big Data Platform Distribution is the best?”. Go ahead and find out for yourself. Determine which of the points above are important in your situation and try them out for yourself.

If you have anything to contribute, please let me know. I haven’t performed a thorough comparison, yet. Maybe Gartner can help out a bit as well.

Thanks for reading.

The Hortonworks Connected Data Platforms

As part of the Big Data Platform Distributions week, I will have a closer look at the Hortonworks distribution.

Hortonworks was founded in 2011, when 24 engineers from the original Hadoop team at Yahoo! formed the company. This included the founders Rob Bearden, Alan Gates, Arun Murthy, Devaraj Das, Mahadev Konar, Owen O’Malley, Sanjay Radia, and Suresh Srinivas. The name Hortonworks refers to Horton the Elephant, which relates to the naming of Hadoop.

“The only way to deliver infrastructure platform technology is completely in open source.”

The Hortonworks solution aims to offer a platform to process and store data-in-motion as well as data-at-rest. This platform is a combination of Hortonworks Data Flow (HDF) and the Hortonworks Data Platform (HDP®). This way Hortonworks is not only about doing Hadoop (HDP), but also about connecting data platforms via HDF.

Since the birth of Hortonworks they have had a fundamental belief: “The only way to deliver infrastructure platform technology is completely in open source.” Hortonworks is also a member of the Open Data Platform Initiative: “A nonprofit organization committed to simplification & standardization of the Big Data ecosystem with common reference specifications and test suites”.

Hortonworks Data Flow

The Hortonworks Data Flow solution for data-in-motion includes 3 key components:

  • Data Flow Management Systems – a drag-&-drop visual interface based on Apache NiFi / MiNiFi. Apache NiFi is a robust and secure framework for routing, transforming, and delivering data across a multitude of systems. Apache MiNiFi (a lightweight agent) was created as a subproject of Apache NiFi and focuses on collecting data at the source.
  • Stream Processing – HDF supports Apache Storm and Kafka. The added value is in the GUI of Streaming Analytics Manager (SAM), which eliminates the need to code streaming data flows (a minimal Kafka producer sketch follows this list).
  • Enterprise Services – Making sure that everything works together in an enterprise environment. HDF supports Apache Ranger (security) and Ambari (provisioning, management and monitoring). The Schema Registry builds a catalog so data streams can be reused.
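To make the stream-processing layer a bit more concrete, here is a minimal Kafka producer sketch. It is not how you would normally build flows in HDF (that is what NiFi and SAM are for); it only illustrates the underlying data-in-motion idea. The kafka-python package, the broker address and the topic name are assumptions.

```python
# Minimal Kafka producer sketch (assumptions: kafka-python installed, a broker
# from an HDF/HDP install reachable at the hypothetical address below, and a
# 'sensor-readings' topic that exists or can be auto-created).
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="hdf-broker.example.com:6667",   # 6667 is the usual Ambari-managed Kafka port; adjust as needed
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Send a few sample sensor readings as JSON events.
for i in range(5):
    event = {"sensor_id": i, "temperature": 20.0 + i, "ts": time.time()}
    producer.send("sensor-readings", value=event)

producer.flush()
producer.close()
```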

HDF Data-in-Motion Platform

Streaming Analytics Manager and Schema Registry are both open-source projects, but at the moment they are not part of the Apache Software Foundation.

Hortonworks Data Platforms

The Hortonworks solution for data-at-rest is the Hortonworks Data Platform (HDP). HDP consists of the following components:

Hortonworks Data Platform

Hortonworks is also available in the cloud with two specific products:

  • Azure HDInsight – a collaboration between Microsoft and Hortonworks to offer a Big Data Analytics platform on the Azure Cloud.
  • Hortonworks Data Cloud for AWS – deploy Hortonworks Data Cloud Hadoop clusters on AWS infrastructure.

How to get started?

The best way to get to know the Hortonworks products is by getting your hands dirty. Hortonworks offers sandboxes on a VM for both HDP and HDF. These VMs come in different flavours, like VMware, VirtualBox and Docker. Go and download a copy here. For questions and other interactions, go to the Hortonworks community.

Thanks for reading.