One of the pillars on which the company I work for (Scamander Solutions) is based on is (sharing) knowledge.
Last week I attended a session at the office about Data Vault. This is a subject which isn’t discussed very often within the Oracle BI community. Nevertheless it is an interesting subject. During the meeting we discussed the possibilities of designing a Data Vault for one of our clients. For me it has been few years ago that I first got into contact with this subject. Our Certified Data Vault Data Modeler (CDVDM) : Denny de Jonge, was kind enough to refresh my memory.
Data Vault for Dummies:
The Data Vault is based on the following Object Types:
http://gerardnico.com/wiki/data_modeling/data_vault
- Hubs
- Identification of an entity
- Customer
- Invoice
- Order
- Identification of an entity
- Links
- Link between two hubs
- a Customer places an Order
- Link between two hubs
- Satellites
- Details / Description of a Hub
- Context
- Specification of a Link
- Alternative payment
- Details / Description of a Hub
How to Build the Data Vault?
You build a Data Vault by defining the Hubs first. When the Hubs ar there, it’s time to build the Links between the Hubs. Last but not least you can build the Satelites to describe the Hubs.
http://danlinstedt.com/about/data-vault-basics/
Ad 1. Hub
A Hub consists of the following columns:
- Unique Identification – CustomerId, OrderId (Sequence)
- Identification – eg. CustomerNumber, OrderNumber
- RecordSource -Which source does this record come from
- LoadDate – Date the record has been loaded
Ad 2. Link
A Link consists of the following columns:
- Unique Identification of the Hub – Customer (CustomerId)
- Unique Identification of the linking Hub – Order (OrderId)
- RecordSource -Which source does this record come from
- LoadDate – Date the record has been loaded
Ad 3. Satellites
A Satellite could consist of the following columns:
- CustomerName
- Address
- Fax
- RecordSource -Which source does this record come from
- LoadDate – Date the record has been loaded
- Unique Identification of the Hub – Customer (CustomerId)
- EndDate
The first columns are describing the concerning Hub. The LoadDate and the CustomerId uniquely identify a Satellite-record.
Additional Comments
- Unit of Work
In Data Warehouse terms a Unit of Work is the definition of a load operation. Is eg. in the case of a mistake the whole batch rejected or is only the erroneous record disapproved? In a Data Vault a Unit of Work is a combination of a Hub and a Link
- Everything fits
Whatever you load or wherever you load from, it is always possible to load the data. Everything that has been loaded can be monitored via eg. Errormarts, ‘corrected’ afterwards via updates and managed via Version Control.
- Data Integration vs. Data Interpretation
Data is integrated close at the source via the Data Vault. The meaning / interpretation of the data is situated in the Datamart.
Of course there is a lot that could be added to the subject of Data Vault, but for me it’s enough for now.
Thanks Nicolas, I have updated the links.
Cheers,
Daan
LikeLike
I need to improve my knowledge on this one.
What I know is that Dan Lindsted has worked a lot with Teradata and then that this model is (may be) more adapted with MPP database for the performance side.
Thanks for the information and the links.
Be careful with the external links (such as Denny De Jong and Scamander), they point to your own website and then we get an error 404.
Cheers
Nico
LikeLike