Full Scale Data Architecture College Tour

Yesterday I attended the Full Scale Data Architecture College Tour: a unique opportunity to step into the boardroom and find out what an executive thinks about IT, architects, architecture and data in particular. The host of the event, Ronald Damhof, and Sylvia Butzke (COO at PGGM Investments) did a great job of keeping the conversation going. Sylvia was willing to answer questions from both Ronald and the audience.

I will not give a complete transcription of the 1.5-hour event. Here are a few of my highlights of the evening.

Sylvia seems to be a leader who keeps an overview of the situation and the organisation and tries to understand thoroughly what is happening and why it is happening. Are these things necessary, or is there a possible alternative?

“Doing the right things instead of doing things right”

You have to understand why things are the way they are and then ask: “So what?” Why is it relevant?

For Sylvia, the role of an Architect is to ask the right questions about the existence of an organisation. This should be done from a business perspective. It doesn’t make sense to start with the technology, although that seems to be the easiest way, often because the Architecture capability is considered an IT role. The Architect supports the business in asking the relevant questions and helps the business answer these questions.

The outcome of an Architecture exercise must give an organisation something to hold on to. This is not without obligation. An Architecture should have authority within the organisation. There is no general Architecture. Each Architecture is specific and in the context of that organisation.

Together with the board, an Architect (it can be discussed which specific role, I guess it will be the Enterprise Architect) should act as a trusted advisor: someone partly responsible for the course of the organisation. Both roles should reinforce each other. They must have the same goal. If they collide, they do not have the same goals. It’s a skill in which they must experience each other’s added value.

Sylvia warns not to rush to technology too quickly. That looks very tempting because of digitalisation and all the technological opportunities. If you do not see the bigger picture, you might only build a point solution for a current challenge.

Keep on thinking. Are we asking the right / relevant questions? Are we using the right datasets to answer certain questions?

So what’s the role of data in the boardroom? The boardroom is responsible for the strategy of an organisation. What are we going to do? How are we going to direct the organisation? The way this is organised determines the maturity of an organisation. Technology is supportive. Data can be collected by effective use of IT (digitalisation). The boardroom needs to be able to understand how digitalisation works.

Normally organisations are managed in different layers. To get a better understanding, it makes sense to manage the organisation structured as a T. The top layer is still responsible, but the expertise is top-down in the organisation. Decisions are made together in the deepest fibres of the organisation. The board needs to gather the right expertise and understand what is happening. This means that the board needs to determine how much they are willing to understand. The experts need to determine how much information and expertise they need to aggregate to make it consumable by the board. The challenge for top management is to find a balance between innovation & experimenting versus discipline & data quality. It’s not a matter of choosing but of trying to do both.

The form in which an Architecture is communicated (e.g. ArchiMate) is not necessarily important. What matters is to understand its relevance and its dependency within the total Architecture. Certain dimensions of the Architecture are important. The Architect is the conversation partner.

Data is a means and not an end in itself. One can look at data as a means to reach a certain goal. Data can (and should) be used to answer questions. Business must learn how to ask questions. If done right, data can be seen as an asset. The value of data can be compared with the value of people, material and money within an organisation. To manage people correctly, organisations have an HR policy in place. A data policy should be in place as well: a policy specific to the organisation.

According to Sylvia, being Agile means being able to learn and to adapt to change. These days things are changing very fast, so you cannot direct a company on the what. You need to focus on the why and on the principles to follow.

The actual challenge within organisations is that we want to do ‘something’ with data. The important question to answer is: Why? What are we doing? Which processes are important? Which data is involved? What is the dependency between the different processes and the related data? Determine the business context first before you start to do ‘something’ with data.

Final takeaway: “Stay curious and immerse yourself in the why.” So what?

The next meetup, on the 22nd of March, will be the Full Scale Data BOK Kickoff session.

If you are interested in the Full Scale Data Architect movement, check out these online profiles on Twitter, LinkedIn, and Facebook.

Originally published on LinkedIn.

Collegetour; an executive talking data

On Thursday, February 15, 2018, the first edition of the Full Scale Data Architecture College Tour will be held. The Full Scale Data Architects have secured a C-level executive of a high-ranking financial institution who is willing to be interviewed by the host, Ronald Damhof, and to answer questions from the audience.

Last week I was present at the third Full Scale Data Architect meetup. The Full Scale Data Architect group was created by Ronald Damhof and Martijn Evers.

Ronald and Martijn have a mission: “…to make Data Architects come to grips with the new reality of data, and how to get control back. For this, we started a movement for Full Scale Data Architects to help us combat the ever-increasing data-tsunami. For raising awareness we postulate 10 commandments for the (aspiring) Full Scale Data Architect. Join us on our mission to combat data entropy.”

The topic of last week’s meetup was “The Future of Data/Information modeling”. For me, as an Oracle Data and Analytics consultant with a focus on the technical implementation of data, these meetups are very interesting. I am infected with the red virus and approach most data challenges from an Oracle perspective. It’s good (and in fact a necessity) to look further than the technical implementation of data as well.

The key takeaway from the last meetup is that there is no standard solution for any data challenge. You always have to focus on the customers’ concerns. Familiarize yourself with those concerns and base your solution (both functional and technical) on them. It was a passionate argument by Martijn Evers, followed by a lively discussion with the audience.

I really liked this meetup and I am looking forward to the next one, which will be really interesting. On Thursday, February 15, 2018, the first edition of the Full Scale Data Architecture College Tour will be held. The Full Scale Data Architects have secured a C-level executive of a high-ranking financial institution who is willing to be interviewed by the host, Ronald Damhof, and to answer questions from the audience.

I have already secured my seat, and I hope to see you there as well. Have a look here at other planned events.

If you are interested in the Full Scale Data Architect movement, check out these online profiles on Twitter, LinkedIn, and Facebook.

Originally published on LinkedIn.

Happy New Year for a new challenge!

After several years of employment, the time is right to move on and do things my own way. As of this week, I will start commercializing my DaAnalytics (Daan & Analytics) label. I will continue to provide Oracle Data & Analytics services as a self-employed professional.

First of all, I would like to wish all of you a very happy, healthy and prosperous year 2018. In my last post of the year 2017, I announced that there would be some new challenges lying ahead for the year 2018. The first challenge is a very exciting one.

Back in 1998 I was first introduced to the world of Oracle. I started at Yacht as an Oracle Application Developer (Oracle Developer & Oracle Designer). After 4 years I moved to Van Oord, where I worked as a Business Intelligence consultant. In 2004 I joined Scamander Solutions for the first time. I worked at Scamander for almost 10 years in two separate periods. In these periods I developed myself as an Oracle Business Analytics consultant, starting with Oracle Discoverer and, after the Siebel takeover, working with Oracle BI EE. In a short period at Ebicus I got the chance to further explore Oracle BI Applications. At my last employer, Quistor, I had the privilege of making the move to the Oracle Cloud (BICS and OAC).

I am very grateful to my previous employers mentioned above. They gave me the chance to develop into the professional I am today. Now I am confident enough to take the next step.

Oracle Data & Analytics @ DaAnalytics

After several years of employment, the time is right to move on and do things my own way. As of this week, I will start commercializing my DaAnalytics (Daan & Analytics) label. I will continue to provide Oracle Data & Analytics services as a self-employed professional. With DaAnalytics, I’ll keep focussing on Data & Analytics in Oracle environments. This will be broader than Oracle tooling alone. Data & Analytics is a discipline that goes beyond tooling. Before you can go ahead with tooling, there is a whole spectrum of organizing, preparing and managing data, which is even more important.

For those who would like to stay connected, I am online:

Website – https://www.daanalytics.nl

Twitter – Daan Bakboord & DaAnalytics

Another challenge lies ahead, but I cannot go into too much detail yet. We are in the process of finalizing a partnership agreement. More news to come in a few days or weeks.

“Data visualisatie de valkuilen en uitdagingen.”

Apart from my consulting activities, I will still be active for the nlOUG as SIG Lead for BIWA. On January 25, the BIWA SIG will organise a Meetup at Oracle in Utrecht. It will be in Dutch and the theme will be “Data visualisatie de valkuilen en uitdagingen” (in English: “Data visualisation: the pitfalls and challenges”). More details here.

This year’s nlOUG Tech Experience 2018 “The Cloud is Next” will take place on the 7th and the 8th of June 2018. The Call for Papers is still open until this weekend. Hope you will submit a paper and join us.

Originally written on LinkedIn.

Seasons Greetings

The year is almost over. Time to look back and to look forward.

Lots of things have happened over the last year. Next to my daily activities at Quistor, I had two highlights. In June the nlOUG organized the first Tech Experience. In early December I had the privilege of presenting at the UKOUG Tech Event. These two events were very nice to be a part of. Next year’s Tech Experience will take place on the 7th and the 8th of June 2018. The Call for Papers is still open until this weekend. I also hope to be able to join the UKOUG next December in Liverpool.

Next year will be an interesting one for me. New challenges are lying ahead. I am really excited and will tell you more next year.

All that remains is to wish you a very happy, healthy and prosperous 2018. Enjoy the holiday season.

Originally written on LinkedIn.

UKOUG Tech17 is a wrap

Last week I attended the UKOUG Tech Conference at the ICC in Birmingham. This is a combined conference with APPS-, TECH- and JDE-related content. Quistor was present with a stand at the JDE conference. For me, these kinds of events are a way of exchanging knowledge and meeting old and new friends.

I was given the privilege of speaking. My first presentation was about ‘Becoming Insight Driven With Big Data’. This was basically an overview of Oracle’s answer to the changing needs in the Data and Analytics landscape. The second presentation tried to answer the question ‘Is Data Warehousing Dead?’. During this presentation, I discovered that preparing for two presentations at one conference (which I had never done before) is more work than I had expected. It took me too much effort to get the story across. This is a lesson for next time, so I will be better prepared.

For those interested, I uploaded my slides to Speaker Deck.

Becoming Insight Driven with Big Data – https://speakerdeck.com/daanalytics/becoming-insight-driven-with-big-data …

Is Data Warehousing dead? – https://speakerdeck.com/daanalytics/is-data-warehousing-dead …

In the Christmas edition of Quistor’s QPulse there are two articles from my hand with some more insight into my visit to Birmingham. I attended several presentations. If I may highlight a few, I would first like to mention Robin Moffatt, because he is so passionate about his topic: Kafka. Rob Cowell showed excellent skills explaining Oracle Big Data SQL. Last but not least, Mike Vickers managed to get his story about Oracle BI Publisher across in a very structured manner. If I compare their performance to mine, there is some room for improvement 😊.

Still, for me personally, it was great to meet ‘old’ and new friends. There was also an offline presence of the #obihackers IRC channel: finally, a chance to meet people in real life instead of from behind the keyboard.

Thanks, UKOUG, for the organization and for giving me the opportunity to be part of your conference. I had a great time and I hope to be back next year in Liverpool.

Don’t forget to submit your paper for Tech Experience on the 7th & 8th of June 2018! Submit your paper on http://www.tech18.nl. More info here.

 

Originally written for LinkedIn.

Big Data Platform Distributions week – Wrap up

Wrapping up a week of Big Data Platform comparisons. A closer look @ #Cloudera, #MapR and #Hortonworks.

This last week I have been taking a slightly closer look at 3 of the most well-known Big Data Platform Distributions: Cloudera, MapR and Hortonworks. It’s interesting to see how differently the various distributions look at the same data challenge.

Which Big Data Platform Distribution is the best?

The three Big Data Platform Distributions each have a different focus. Here are a few things that make each of the top three vendors stand out from the others:

  • Cloudera – Proven, user-friendly technology.
    • Use Case; Enterprise Data Hub. Let the Hadoop platform serve as a central data repository.
  • MapR – Stable platform with a generic file-system and fast processing.
    • Use Case; Integrated platform with a focus on streaming.
  • Hortonworks – 100% Open source with minimal investment.
    • Use Case; Modernising your traditional EDW.

There is no easy answer to the question “Which Big Data Platform Distribution is the best?”. My answer would be: “It depends”. It depends on various factors:

  • Performance – MapR has extra focus on speed and performance and therefore developed its own file system (MapR-FS) as well as its own NoSQL database, MapR-DB.
  • Scalability – Hadoop is known to scale very well. All three offer software to manage this effectively. Cloudera & MapR go for proprietary software.
  • Reliability – Before Hadoop 2.0 the NameNode was the single point of failure (SPOF) in an HDFS cluster. MapR has a different (more distributed) approach with its file system, known as MapR File System (MapR-FS).
  • Manageability – Cloudera & MapR add (proprietary) management software to their distribution. Hortonworks opts for the open-source equivalents.
  • Licenses – All three offer downloadable free versions of their software. Both Cloudera & MapR add additional features for their paying customers.
  • Support – All three are part of the Hadoop community as contributors & committers. They contribute and commit (updated) code back to the open source repository.
  • Upgrades – Cloudera & Hortonworks both are known for their quick adoption of new technologies. Hortonworks seems to be the quickest to get things production ready.
  • OS Support – Hortonworks supports the Microsoft Windows OS. Microsoft included Hortonworks and packaged it into its own HDInsight (both on-premise and in the Azure cloud).
  • Training – It looks like Cloudera offers the most complete and professional training program. This is also reflected in the price.
  • Tutorials – All three offer various tutorials and sandboxes to get started.

Back to the question: “Which Big Data Platform Distribution is the best?” Go ahead and find out for yourself. Determine which of the points above are important to your situation and try it out for yourself.
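One way to make “it depends” a bit more concrete is a small weighted scoring sheet. The sketch below is only an illustration of that idea: the names `Candidate`, `weighted_score` and `rank` are my own, and the weights and scores are placeholders that you would replace with your own assessment. They are deliberately attached to anonymous candidates and say nothing about how Cloudera, MapR or Hortonworks actually perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One distribution under evaluation (hypothetical helper type)."""
    name: str
    scores: dict  # criterion -> your own score, e.g. on a 1-5 scale


def weighted_score(candidate, weights):
    """Return the weight-normalised average score for one candidate."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    # A criterion the candidate has no score for counts as 0.
    weighted_sum = sum(w * candidate.scores.get(criterion, 0.0)
                       for criterion, w in weights.items())
    return weighted_sum / total_weight


def rank(candidates, weights):
    """Sort candidates from highest to lowest weighted score."""
    return sorted(candidates,
                  key=lambda c: weighted_score(c, weights),
                  reverse=True)


# Placeholder values only: fill in what matters in your own situation.
weights = {"performance": 3, "manageability": 2, "licenses": 1}
candidates = [
    Candidate("Distribution A",
              {"performance": 5, "manageability": 3, "licenses": 2}),
    Candidate("Distribution B",
              {"performance": 3, "manageability": 4, "licenses": 5}),
]

for candidate in rank(candidates, weights):
    print(f"{candidate.name}: {weighted_score(candidate, weights):.2f}")
```

The outcome is only as good as the weights you choose, so a sheet like this is a conversation starter and not a replacement for trying the sandboxes yourself.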

If you have anything to contribute, please let me know. I haven’t performed a thorough comparison, yet. Maybe Gartner can help out a bit as well.

Thanks for reading.

The Hortonworks Connected Data Platforms

As part of the Big Data Platform Distributions week, I will have a closer look at the Hortonworks distribution.

Hortonworks was founded in 2011 by 24 engineers from the original Hadoop team at Yahoo!. This included the founders Rob Bearden, Alan Gates, Arun Murthy, Devaraj Das, Mahadev Konar, Owen O’Malley, Sanjay Radia, and Suresh Srinivas. The name Hortonworks refers to Horton the Elephant, which relates to the naming of Hadoop.

“The only way to deliver infrastructure platform technology is completely in open source.”

The Hortonworks solution aims to offer a platform that can process and store data-in-motion as well as data-at-rest. This platform is a combination of Hortonworks Data Flow (HDF) and Hortonworks Data Platform (HDP®). This way Hortonworks is not only about doing Hadoop (HDP), but also about connecting data platforms via HDF.

Since the birth of Hortonworks they have had a fundamental belief: “The only way to deliver infrastructure platform technology is completely in open source.” Hortonworks is also a member of the Open Data Platform Initiative: “A nonprofit organization committed to simplification & standardization of the Big Data ecosystem with common reference specifications and test suites”.

Hortonworks Data Flow

The Hortonworks Data Flow solution for data-in-motion includes 3 key components:

  • Data Flow Management Systems – a drag-&-drop visual interface based on Apache NiFi / MiNiFi. Apache NiFi is a robust and secure framework for routing, transforming, and delivering data across a multitude of systems. Apache MiNiFi (a light-weight agent) was created as a subproject of Apache NiFi and focuses on the collection of the data at the source.
  • Stream Processing – HDF supports Apache Storm and Kafka. The added value is in the GUI of Streaming Analytics Manager (SAM), which eliminates the need to code streaming data flows.
  • Enterprise Services – Making sure that everything works together in an enterprise environment. HDF supports Apache Ranger (security) and Ambari (provisioning, management and monitoring). The Schema Registry builds a catalog so data streams can be reused.

[Image: the HDF data-in-motion platform]

Streaming Analytics Manager and Schema Registry are both open source projects. So far, neither of them is an Apache Software Foundation project.
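To get a feeling for why a schema catalog makes data streams reusable, here is a toy, in-memory sketch of the concept. It is emphatically not the Hortonworks Schema Registry API: the class `SchemaRegistry`, its methods and the stream name `sensor-readings` are all made up for illustration. The idea is simply that producers register the shape of a stream once, and every consumer can look it up and check records against it.

```python
class SchemaRegistry:
    """Toy catalog: stream name -> list of schema versions (illustrative only)."""

    def __init__(self):
        self._schemas = {}

    def register(self, stream, fields):
        """Store a schema (field name -> Python type); return its version number."""
        versions = self._schemas.setdefault(stream, [])
        # Registering an identical schema again keeps the current version.
        if versions and versions[-1] == fields:
            return len(versions)
        versions.append(dict(fields))
        return len(versions)

    def latest(self, stream):
        """Return the most recent schema registered for a stream."""
        return self._schemas[stream][-1]

    def validate(self, stream, record):
        """Check that a record has exactly the registered fields and types."""
        schema = self.latest(stream)
        return set(record) == set(schema) and all(
            isinstance(record[name], expected) for name, expected in schema.items()
        )


registry = SchemaRegistry()
version = registry.register("sensor-readings",
                            {"device_id": str, "temperature": float})

good = {"device_id": "a1", "temperature": 21.5}
bad = {"device_id": "a1"}  # temperature is missing

print(version, registry.validate("sensor-readings", good),
      registry.validate("sensor-readings", bad))
```

A real registry adds things this sketch leaves out, such as persistence, compatibility rules between versions and serialization formats.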

Hortonworks Data Platforms

Hortonworks’ solution for data-at-rest is Hortonworks Data Platform (HDP). HDP consists of the following components.

[Image: Hortonworks Data Platform components]

Hortonworks is also available in the cloud with two specific products:

  • Azure HDInsight – a collaboration between Microsoft and Hortonworks to offer a Big Data Analytics platform on the Azure Cloud.
  • Hortonworks Data Cloud for AWS – deploy Hortonworks Data Cloud Hadoop clusters on AWS infrastructure.

How to get started?

The best way to get to know the Hortonworks product(s) is by getting your hands dirty. Hortonworks offers Sandboxes on a VM for both HDP and HDF. These VMs come in different flavours, like VMware, VirtualBox and Docker. Go and download a copy here. For questions and other interactions, go to the Hortonworks community.

Thanks for reading.