Data Vault for Dummies

One of the pillars on which the company I work for (Scamander Solutions) is based on is (sharing) knowledge.

Last week I attended a session at the office about Data Vault. This is a subject which isn’t discussed very often within the Oracle BI community. Nevertheless it is an interesting subject. During the meeting we discussed the possibilities of designing a Data Vault for one of our clients. For me it has been few years ago that I first got into contact with this subject. Our Certified Data Vault Data Modeler (CDVDM) : Denny de Jonge, was kind enough to refresh my memory.

Data Vault for Dummies:

The Data Vault is based on the following Object Types:

http://gerardnico.com/wiki/data_modeling/data_vault

  1. Hubs
    • Identification of an entity
      • Customer
      • Invoice
      • Order
  2. Links
    • Link between two hubs
      • a Customer places an Order
  3. Satellites
    • Details  / Description of a Hub
      • Context
    • Specification of a Link
      • Alternative payment

How to Build the Data Vault?

You build a Data Vault by defining the Hubs first. When the Hubs ar there, it’s time to build the Links between the Hubs. Last but not least you can build the Satelites to describe the Hubs. 

 http://danlinstedt.com/about/data-vault-basics/

Ad 1. Hub

A Hub consists of the following columns:

  • Unique Identification – CustomerId, OrderId (Sequence)
  • Identification – eg. CustomerNumber, OrderNumber
  • RecordSource -Which source does this record come from
  • LoadDate – Date the record has been loaded

Ad 2. Link

A Link consists of the following columns:

  • Unique Identification of the Hub – Customer (CustomerId)
  • Unique Identification of the linking Hub – Order (OrderId)
  • RecordSource -Which source does this record come from
  • LoadDate – Date the record has been loaded

Ad 3.  Satellites

A Satellite could consist of the following columns:

  • CustomerName
  • Address
  • Fax
  • Email
  • RecordSource -Which source does this record come from
  • LoadDate – Date the record has been loaded
  • Unique Identification of the Hub – Customer (CustomerId)
  • EndDate

The first columns are describing the concerning Hub. The LoadDate and the CustomerId uniquely identify a Satellite-record.

Additional Comments

  • Unit of Work

In Data Warehouse terms a Unit of Work is the definition of a load operation. Is eg. in the case of a mistake the whole batch rejected or is only the erroneous record disapproved? In a Data Vault a Unit of Work is a combination of a Hub and a Link

  • Everything fits

Whatever you load or wherever you load from, it is always possible to load the data. Everything that has been loaded can be monitored via eg. Errormarts, ‘corrected’ afterwards via updates and managed via Version Control.  

  • Data Integration vs. Data Interpretation

Data is integrated close at the source via the Data Vault. The meaning / interpretation of the data is situated in the Datamart.

Of course there is a lot that could be added to the subject of Data Vault, but for me it’s enough for now.

Author: Daan Bakboord

I am an Oracle Big Data Analytics Consultant with great interest in anything closely related to the Oracle Big Data Analytics (OBIEE, BICS, OAC, Big Data, Data Integration, Data Visualization, Data Management, Data Architecture).

2 thoughts on “Data Vault for Dummies”

  1. I need to improve my knowledge on this one.

    What I know is that Dan Lindsted has worked a lot with Teradata and then that this model is (may be) more adapted with MPP database for the performance side.

    Thanks for the information and the links.

    Be careful with the external links (such as Denny De Jong and Scamander), they point to your own website and then we get an error 404.

    Cheers
    Nico

    Like

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s