Last week I attended the third Full Scale Data Architect meetup. The Full Scale Data Architect group was created by Ronald Damhof and Martijn Evers.
Ronald and Martijn have a mission: “…to make Data Architects come to grips with the new reality of data, and how to get control back. For this, we started a movement for Full Scale Data Architects to help us combat the ever-increasing data-tsunami. For raising awareness we postulate 10 commandments for the (aspiring) Full Scale Data Architect. Join us on our mission to combat data entropy.”
The topic of last week’s meetup was “The Future of Data/Information modeling”. For me, as an Oracle Data and Analytics consultant with a focus on the technical implementation of data, these meetups are very interesting. I am infected with the red virus and approach most data challenges from an Oracle perspective. It’s good (and in fact a necessity) to look further than the technical implementation of data as well.
The key takeaway from the last meetup is that there is no standard solution for any data challenge. You always have to focus on the customer’s concerns. Familiarize yourself with those concerns and base your solution (both functional and technical) on them. A passionate argument by Martijn Evers, and a lively discussion with the audience.
“Collegetour; an executive talking data”
I really liked this meetup and I am looking forward to the next one, which will be really interesting. On Thursday, February 15, 2018, the first edition of the Full Scale Data Architecture College Tour will be organized. The Full Scale Data Architects have secured a C-level executive of a high-ranking financial institution who is willing to be interviewed by the host, Ronald Damhof, and to answer questions from the audience.
I have already secured my seat; I hope to see you there as well. Have a look here at other planned events.
If you are interested in the Full Scale Data Architect movement, check out these online profiles on Twitter, LinkedIn, and Facebook.
Originally published on LinkedIn.
First of all, I would like to wish all of you a very happy, healthy and prosperous year 2018. In my last post of the year 2017, I announced that there would be some new challenges lying ahead for the year 2018. The first challenge is a very exciting one.
Back in 1998 I was first introduced to the world of Oracle. I started at Yacht as an Oracle Application Developer (Oracle Developer & Oracle Designer). After 4 years I moved to Van Oord, where I worked as a Business Intelligence consultant. In 2004 I joined Scamander Solutions for the first time; I have worked at Scamander for almost 10 years, in two separate periods. In those periods I developed into an Oracle Business Analytics consultant, starting with Oracle Discoverer and, after the Siebel takeover, working with Oracle BI EE. In a short period at Ebicus I got the chance to further explore Oracle BI Applications. At my last employer, Quistor, I had the privilege of making the move to the Oracle Cloud (BICS and OAC).
I am very grateful to my previous employers mentioned above. They gave me the chance to develop into the professional I am today. Now I am confident enough to take the next step.
Oracle Data & Analytics @ DaAnalytics
After several years of employment, the time is right to move on and do things my own way. As of this week, I will start commercializing my DaAnalytics (Daan & Analytics) label. I will continue to provide Oracle Data & Analytics services as a self-employed professional. With DaAnalytics, I’ll keep focusing on Data & Analytics in Oracle environments. This will be broader than Oracle tooling alone: Data & Analytics is a discipline that goes beyond tooling. Before you can get started with tooling, there is a whole spectrum of organizing, preparing, and managing data, which is even more important.
For those who would like to stay connected, I am online:
Website – https://www.daanalytics.nl
Twitter – Daan Bakboord & DaAnalytics
Another challenge lies ahead but I cannot go into too much detail, yet. We are in the course of finalizing a partnership agreement. More news to come in a few days/weeks.
“Data visualization: the pitfalls and challenges.”
Apart from my consulting activities, I will still be active for the nlOUG as SIG Lead for BIWA. On January 25, the BIWA SIG will organise a Meetup at Oracle in Utrecht. It will be in Dutch and the theme will be “Data visualisatie: de valkuilen en uitdagingen” (data visualization: the pitfalls and challenges). More details here.
This year’s nlOUG Tech Experience 2018 “The Cloud is Next” will take place on the 7th and the 8th of June 2018. The Call for Papers is still open until this weekend. Hope you will submit a paper and join us.
Originally written on LinkedIn.
The year is almost over. Time to look back and to look forward.
Lots of things have happened over the last year. Next to my daily activities at Quistor, there were two highlights. In June the nlOUG organized the first Tech Experience, and early December I had the privilege to present at the UKOUG Tech Event. These two events were very nice to be a part of. Next year’s Tech Experience will take place on the 7th and 8th of June 2018; the Call for Papers is still open until this weekend. I also hope to be able to join the UKOUG next December in Liverpool.
Next year will be an interesting one for me. New challenges are lying ahead. I am really excited and will tell you more next year.
All that remains is to wish you a very happy, healthy, and prosperous 2018. Enjoy the holiday season.
Originally written on LinkedIn.
Last week I attended the UKOUG Tech Conference at the ICC in Birmingham. This is a combined conference with APPS-, TECH- and JDE-related content. Quistor was present with a stand at the JDE conference. For me, these kinds of events are a way of exchanging knowledge and meeting old and new friends.
I was given the privilege to speak. My first presentation was about ‘Becoming Insight Driven With Big Data’, basically an overview of Oracle’s answer to the changing needs in the Data and Analytics space. The second presentation tried to answer the question ‘Is Data Warehousing Dead?’. During this presentation, I discovered that preparing two presentations for one conference (which I had never done before) is more work than I had expected. It took me too much effort to get the story across. This is a lesson for next time, so I will be better prepared.
For those interested, I uploaded my slides to Speaker Deck.
Becoming Insight Driven with Big Data – https://speakerdeck.com/daanalytics/becoming-insight-driven-with-big-data
Is Data Warehousing dead? – https://speakerdeck.com/daanalytics/is-data-warehousing-dead
In the Christmas edition of Quistor’s QPulse there are two articles I wrote with some more insights from my visit to Birmingham. I attended several presentations. If I may highlight a few, I would first like to mention Robin Moffatt, because he is so passionate about his topic, Kafka. Rob Cowell showed excellent skills explaining Oracle Big Data SQL. Last but not least, Mike Vickers managed to get his story about Oracle BI Publisher across in a very structured manner. If I compare their performances to mine, there is some room for improvement 😊.
Still, for me personally, it was great to meet ‘old’ and new friends. There was also an offline presence of the #obihackers IRC channel. Finally, a chance to meet people in real life instead of from behind the keyboard.
Thanks, UKOUG, for the organization and for giving me the opportunity to be part of your conference. I had a great time and I hope to be back next year in Liverpool.
Don’t forget to submit your paper for Tech Experience on the 7th & 8th of June 2018! Submit your paper on http://www.tech18.nl. More info here.
Originally written for LinkedIn.
Wrapping up a week of Big Data Platform comparisons. A closer look @ #Cloudera, #MapR and #Hortonworks.
This last week I have been taking a slightly closer look at three of the most well-known Big Data Platform distributions: Cloudera, MapR, and Hortonworks. It’s interesting to see how differently the various distributions look at the same data challenge.
Which Big Data Platform Distribution is the best?
The three Big Data Platform distributions each have a different focus. Here are a few things that make each of the top three vendors stand out from the others:
- Cloudera – Proven, user-friendly technology.
- Use Case; Enterprise Data Hub. Let the Hadoop platform serve as a central data repository.
- MapR – Stable platform with a generic file-system and fast processing.
- Use Case; Integrated platform with a focus on streaming.
- Hortonworks – 100% Open source with minimal investment.
- Use Case; Modernising your traditional EDW.
There is no easy answer to the question “Which Big Data Platform Distribution is the best?”. My answer would be: “It depends”. It depends on various factors:
- Performance – MapR has an extra focus on speed and performance, and therefore developed its own file system (MapR-FS) as well as its own NoSQL database, MapR-DB.
- Scalability – Hadoop is known to scale very well. All three offer software to manage this effectively; Cloudera & MapR go for proprietary solutions.
- Reliability – Before Hadoop 2.0, the NameNode was the single point of failure (SPOF) in an HDFS cluster. MapR takes a different, more distributed approach with its own file system, MapR-FS.
- Manageability – Cloudera & MapR add (proprietary) management software to their distributions. Hortonworks opts for the open-source equivalents.
- Licenses – All three offer downloadable free versions of their software. Both Cloudera & MapR add additional features for their paying customers.
- Support – All three are part of the Hadoop community as contributors & committers. They contribute and commit (updated) code back to the open source repository.
- Upgrades – Cloudera & Hortonworks are both known for their quick adoption of new technologies. Hortonworks seems to be the quickest to get things production-ready.
- OS Support – Hortonworks supports the Microsoft Windows OS. Microsoft included Hortonworks and packaged it into its own HDInsight (both on-premises and in the Azure cloud).
- Training – It looks like Cloudera offers the most complete and professional training program. This is also reflected in the price.
- Tutorials – All three offer various tutorials and sandboxes to get started.
Back to the question “Which Big Data Platform Distribution is the best?”: go ahead and find out for yourself. Determine which of the points above are important to your situation and try it out for yourself.
If you have anything to contribute, please let me know. I haven’t performed a thorough comparison, yet. Maybe Gartner can help out a bit as well.
Thanks for reading.
As part of the Big Data Platform Distributions week, I will have a closer look at the Hortonworks distribution.
Hortonworks was founded in 2011 by 24 engineers from the original Hadoop team at Yahoo!, including the founders Rob Bearden, Alan Gates, Arun Murthy, Devaraj Das, Mahadev Konar, Owen O’Malley, Sanjay Radia, and Suresh Srinivas. The name Hortonworks refers to Horton the Elephant, which relates to the naming of Hadoop.
“The only way to deliver infrastructure platform technology is completely in open source.”
The Hortonworks solution aims to offer a platform to process and store data-in-motion as well as data-at-rest. This platform is a combination of Hortonworks Data Flow (HDF™) and the Hortonworks Data Platform (HDP®). This way Hortonworks is not only about doing Hadoop (HDP); it is also connecting data platforms via HDF.
Since its birth, Hortonworks has had a fundamental belief: “The only way to deliver infrastructure platform technology is completely in open source.” Hortonworks is also a member of the Open Data Platform Initiative, “a nonprofit organization committed to simplification & standardization of the Big Data ecosystem with common reference specifications and test suites”.
Hortonworks Data Flow
The Hortonworks Data Flow solution for data-in-motion includes 3 key components:
- Data Flow Management Systems – a drag-&-drop visual interface based on Apache NiFi / MiNiFi. Apache NiFi is a robust and secure framework for routing, transforming, and delivering data across a multitude of systems. Apache MiNiFi (a lightweight agent) was created as a subproject of Apache NiFi and focuses on the collection of data at the source.
- Stream Processing – HDF supports Apache Storm and Kafka. The added value is in the GUI of Streaming Analytics Manager (SAM), which eliminates the need to code streaming data flows.
- Enterprise Services – Making sure that everything works together in an enterprise environment. HDF supports Apache Ranger (security) and Ambari (provisioning, management, and monitoring). The Schema Registry builds a catalog so data streams can be reused.
Streaming Analytics Manager and Schema Registry are both open source. As of this writing, they are not yet Apache Software Foundation projects.
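The route-transform-deliver pattern that NiFi implements visually can be sketched in a few lines of plain Python. This is purely conceptual — the processor names and records below are invented for illustration and involve no Hortonworks software:

```python
# Conceptual sketch of a NiFi-style flow: each "processor" either
# transforms a record or routes it to a destination queue.

def route_on_attribute(record):
    """Route a record based on an attribute, like NiFi's RouteOnAttribute."""
    return "alerts" if record["level"] == "ERROR" else "archive"

def transform(record):
    """Normalize a record, like a NiFi transform processor would."""
    return {**record, "message": record["message"].strip().lower()}

def run_flow(records):
    """Push each record through transform, then route it to a destination."""
    destinations = {"alerts": [], "archive": []}
    for record in records:
        record = transform(record)
        destinations[route_on_attribute(record)].append(record)
    return destinations

if __name__ == "__main__":
    incoming = [
        {"level": "ERROR", "message": "  Disk full  "},
        {"level": "INFO", "message": "Job finished"},
    ]
    print(run_flow(incoming))
```

In a real HDF deployment you would wire equivalent processors together in NiFi's drag-&-drop canvas rather than in code; the sketch only shows the dataflow idea.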
Hortonworks Data Platform
The Hortonworks solution for data-at-rest is the Hortonworks Data Platform (HDP). HDP consists of the following components:
- Data Management – YARN and HDFS (scalable, fault tolerant and cost effective storage) are the two key components in the Hortonworks Data Platform solution.
- Data Access – Interact with the data in any way from batch to streaming, based on Apache™ Hadoop® open-source projects like; Apache Pig, Apache Hive, Apache HBase, Apache Storm and Apache Spark.
- Data Governance and Integration – Quickly and easily load data, and manage it according to policy, using Apache Knox & Apache Ranger.
- Security – Authenticate, authorise, account and protect data.
- Operations – Provision, manage, monitor, and operate Hadoop clusters at scale using Apache Ambari, Apache Oozie, and Apache ZooKeeper.
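As a concrete example of the Data Access layer: Hadoop Streaming (shipped with core Hadoop, and thus with HDP) lets you write MapReduce jobs in any language that reads stdin and writes stdout. Below is a minimal Python word count in that style, with the distributed shuffle-and-sort phase simulated locally by `sorted()` so the sketch runs without a cluster:

```python
# Word count in the Hadoop Streaming style: the mapper emits (word, 1)
# pairs, the framework sorts them by key, and the reducer sums per word.
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: pairs arrive sorted by key, so groupby can sum them."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

def run_job(lines):
    # sorted() locally stands in for Hadoop's shuffle-and-sort between phases.
    return dict(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    print(run_job(["Hadoop is big", "big data is big"]))
```

On a real cluster the same mapper and reducer logic would read stdin and be submitted via the hadoop-streaming jar; only the shuffle is simulated here.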
Hortonworks is also available in the cloud with two specific products:
- Azure HDInsight – a collaboration between Microsoft and Hortonworks to offer a Big Data Analytics platform on the Azure Cloud.
- Hortonworks Data Cloud for AWS – deploy Hortonworks Data Cloud Hadoop clusters on AWS infrastructure.
How to get started?
The best way to get to know the Hortonworks products is by getting your hands dirty. Hortonworks offers Sandboxes on a VM for both HDP and HDF. These VMs come in different flavours, like VMware, VirtualBox, and Docker. Go and download a copy here. For questions and other interactions, go to the Hortonworks community.
Thanks for reading.
As part of the Big Data Platform Distributions week, I will have a closer look at the MapR distribution.
John Schroeder founded MapR in 2009 and served as the company’s CEO until 2016.
MapR offers the Converged Data Platform (CDP). The vision behind this platform is to offer one integrated platform for Big Data, which enables batch (e.g. ETL offload, log file analytics), interactive (e.g. BI / Analytics), and streaming (e.g. sensor analytics) capabilities. MapR offers one integrated platform to prevent building data silos and point solutions. It would be incorrect, however, to think of the MapR CDP as purely proprietary, although it is powered by three proprietary platform services:
- MapR-FS – It’s good to understand why MapR decided to introduce its own file system instead of HDFS. Check out the explanation in one of MapR’s Whiteboard videos.
- MapR-DB – also explained via a Whiteboard video.
- MapR Streams – how MapR differentiates itself from similar products in the market; also explained via a Whiteboard video.
Apart from these, MapR supports several projects of the Apache™ Hadoop® ecosystem (Apache Storm, Apache Pig, Apache Hive, Apache Mahout, YARN, Apache Sqoop, Apache Flume, etc.). Apache Drill™ is MapR’s SQL query engine. So the MapR Converged Data Platform is a mix of proprietary and open-source components.
Central to the MapR philosophy is ‘Convergence’. I do not know the exact definition of ‘Convergence’, but in the MapR context it’s all about “integrating data-in-motion & data-at-rest to support real-time applications”. Since the end of 2016, MapR supports this philosophy by using event-driven microservices. The idea behind these microservices (combined with specific APIs) is that they unify all kinds of data (structured, semi-structured, and unstructured) as well as streaming & event data. These microservices are designed in such a way that they remove complexity and enhance several tasks. To get the most out of the CDP and to speed up the process, MapR delivers a Converged Application Blueprint to get things started. This blueprint includes:
- Sample apps (incl. source)
- Architecture guides
- Community-supported best practices (use these wisely)
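The event-driven microservices idea can be illustrated with a minimal pure-Python sketch: two small services communicate only through a shared stream of events and never call each other directly. The in-memory stream below stands in for a MapR Streams topic (which exposes a Kafka-compatible API); all names and values are invented for illustration:

```python
from collections import deque

class Stream:
    """A trivial in-memory stand-in for a MapR Streams / Kafka topic."""
    def __init__(self):
        self._events = deque()

    def publish(self, event):
        self._events.append(event)

    def consume(self):
        while self._events:
            yield self._events.popleft()

def sensor_service(stream, readings):
    """Producer microservice: publishes raw sensor events to the stream."""
    for sensor, value in readings:
        stream.publish({"sensor": sensor, "value": value})

def alert_service(stream, threshold):
    """Consumer microservice: reacts to events without knowing the producer."""
    return [event for event in stream.consume() if event["value"] > threshold]

if __name__ == "__main__":
    topic = Stream()
    sensor_service(topic, [("t1", 20), ("t1", 95), ("t2", 40)])
    print(alert_service(topic, threshold=80))
```

The point of the design is the decoupling: either service can be replaced or scaled independently, because the stream is the only contract between them.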
The Converged Data Platform is the flagship product within MapR. Other products include:
MapR Converged Data Platform Now Available in Oracle Cloud Marketplace
There are a few areas where MapR and Oracle connect. One is the Oracle Cloud Marketplace, which enables Oracle Cloud customers to use MapR in the Oracle Cloud. Oracle Data Integrator, being open and heterogeneous, seems to integrate with MapR very well. Check out Issam Hijazi’s findings.
How to get started?
The best way to get to know the MapR product(s) is by getting your hands dirty. Try MapR and download a Sandbox. For questions and other interactions go to the MapR community.
Thanks for reading.