The MapR Converged Data Platform

As part of the Big Data Platform Distributions week, I will have a closer look at the MapR distribution.

John Schroeder founded MapR in 2009 and served as the company’s CEO until 2016.

MapR offers its Converged Data Platform (CDP). The vision behind this platform is to offer one integrated platform for Big Data, which enables batch (e.g. ETL offload, log file analytics), interactive (e.g. BI / Analytics) and streaming (e.g. Sensor Analytics) capabilities. By offering one integrated platform, MapR wants to prevent customers from building data silos and point solutions.

[Image: MapR Converged Data Platform]

It would be incorrect to think of the MapR CDP as purely proprietary. It is true that the MapR CDP is powered by three platform services of MapR's own:

  • MapR-FS –> It’s good to understand why MapR decided to introduce its own file system instead of using HDFS. Check out the explanation in one of MapR’s Whiteboard videos.
  • MapR-DB –> Explained via a Whiteboard video.
  • MapR Streams –> How it differs from similar products in the market is also explained via a Whiteboard video.

But apart from that, MapR supports several projects from the Apache™ Hadoop® ecosystem (Apache Storm, Apache Pig, Apache Hive, Apache Mahout, YARN, Apache Sqoop, Apache Flume, etc.). Apache Drill™ is MapR’s SQL query engine. So the MapR Converged Data Platform is a mix of proprietary and open-source components.

Central to the MapR philosophy is ‘Convergence’. I do not know the exact definition of ‘Convergence’, but in the MapR context it’s all about “integrating Data-in-Motion & Data-at-Rest to support real-time applications”. Since the end of 2016, MapR has supported this philosophy with Event-Driven Microservices. The idea behind these Microservices (combined with specific APIs) is that they unify all kinds of data (structured, semi-structured and unstructured) as well as streaming & event data. These Microservices are designed in such a way that they remove complexity and make several tasks easier.

[Image: MapR API]

To get the most out of the CDP and to speed up the process, MapR delivers a Converged Application Blueprint to get things started. This blueprint includes:

  • Sample apps (incl. source)
  • Architecture guides
  • Community-supported best practices (use these wisely)
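
To make the idea of combining Data-in-Motion and Data-at-Rest a bit more concrete, here is a minimal sketch in plain Python. It does not use any MapR API: the `InMemoryStream` class, the `sensor-readings` topic and the `temperature_alert_service` function are hypothetical names I made up for illustration. The sketch only shows the general pattern of an event-driven microservice that reacts to events as they arrive, while the same events stay available for later analysis.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = dict  # an event is just a small dictionary in this sketch


class InMemoryStream:
    """Toy stand-in for a stream (hypothetical, not a MapR API).

    Every published event is stored (data at rest) and also pushed
    to the subscribed handlers right away (data in motion).
    """

    def __init__(self) -> None:
        self._log: Dict[str, List[Event]] = defaultdict(list)
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        self._log[topic].append(event)           # keep the event for later analysis
        for handler in self._subscribers[topic]:
            handler(event)                       # react to the event immediately

    def replay(self, topic: str) -> List[Event]:
        return list(self._log[topic])            # read the stored history


stream = InMemoryStream()
alerts: List[str] = []


def temperature_alert_service(event: Event) -> None:
    """A tiny 'microservice': flags sensors that run too hot."""
    if event["celsius"] > 80:
        alerts.append(event["sensor"])


stream.subscribe("sensor-readings", temperature_alert_service)
stream.publish("sensor-readings", {"sensor": "pump-1", "celsius": 72})
stream.publish("sensor-readings", {"sensor": "pump-2", "celsius": 95})

# Batch-style analysis over the very same events, after the fact.
history = stream.replay("sensor-readings")
average = sum(e["celsius"] for e in history) / len(history)
```

After running this, `alerts` holds the sensors flagged in real time and `average` is computed over the stored history. A real platform would of course add persistence, partitioning and fault tolerance, which this sketch leaves out on purpose.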

The Converged Data Platform is the flagship product within MapR. Other products include:

MapR Converged Data Platform Now Available in Oracle Cloud Marketplace

There are a few areas where MapR and Oracle have a connection. One is via the Oracle Cloud Marketplace. This enables Oracle Cloud customers to use MapR in the Oracle Cloud. Oracle Data Integrator, being open and heterogeneous, seems to integrate with MapR very well. Check out Issam Hijazi’s findings.

How to get started?

The best way to get to know the MapR product(s) is by getting your hands dirty. Try MapR and download a Sandbox. For questions and other interactions go to the MapR community.

Thanks for reading