Everyone is talking about Big Data these days, but there is little agreement on the type of data structure to use to organise all our ‘Big’ data. In fact, some still assume that little in the way of structure is required at all. I don’t intend to get into the whole argument about whether a data structure should (or shouldn’t) be used within Hadoop, but rather to explore the value of a dimensional approach to structuring our Big Data.
A common technique used to deliver traditional data warehousing and business intelligence solutions is the use of dimensional tables. In a paper back in 2013, a team from SAS (the link is shown at the bottom of this post) found that using a dimensional structure can make query access 40, 50 or in some cases even 70 times faster.
One particular dimensional data structure concept that is often used, albeit rather poorly, is the conformed dimension. The idea behind conformed dimensions is to design one instance of a dimension and reuse it wherever that dimension is required. For example, a single person dimension would be used every time we need an employee dimension, a customer dimension, a contact dimension and so on. This gives us consistency of design, so we gain benefit from the user interpretation perspective as well as the physical implementation angle.
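To make the reuse idea concrete, here is a minimal Python sketch of one conformed person dimension serving two roles (customer and salesperson) against the same fact records. All field names and sample rows are illustrative assumptions, not taken from the post.

```python
# One shared design for "person": a single conformed dimension,
# keyed by a surrogate id. (Illustrative data only.)
person_dim = {
    1: {"name": "Alice Smith", "city": "Leeds"},
    2: {"name": "Bob Jones", "city": "York"},
}

# Fact records reference the same dimension in two different roles.
sales_facts = [
    {"customer_id": 1, "salesperson_id": 2, "amount": 250.0},
    {"customer_id": 2, "salesperson_id": 2, "amount": 120.0},
]

def enrich(fact):
    """Resolve each role by looking up the single person dimension."""
    return {
        "customer": person_dim[fact["customer_id"]]["name"],
        "salesperson": person_dim[fact["salesperson_id"]]["name"],
        "amount": fact["amount"],
    }

enriched = [enrich(f) for f in sales_facts]
```

Because both roles resolve against the same dimension, a change to a person's attributes is made once and is seen consistently wherever that person appears, which is the consistency benefit described above.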
So it’s not the biggest mental leap to conclude that Big Data environments would gain significant benefits by adopting a conformed dimension approach for data delivery. This also extends to a dual technology approach, where a traditional data warehousing architecture is supported by a Big Data architecture: we can share our conformed dimensions across the two architectures.
Anyway, to read more on this, follow the link at the bottom of this post.