Monday, August 22, 2016

DW Patterns: The good - type 2

I'm afraid this is going to be another short one - buying a house apparently takes time, and so does watching Netflix. It's a tough job, but someone's gotta do it.

Anyway, the topic today is Type 2 - probably the most misunderstood of dimensions. Since this is The Good, I'm going to be outlining just what it is good for, but just a word of warning - Type 2 SCD only solves one problem in terms of change tracking - it's not a magic "now we have all history" bullet, and under no circumstances should you apply it as a default pattern. If you do so, you risk doing some fairly hefty damage to your ability to perform meaningful analysis.

Type 2: Add new row

See definition on Wikipedia

This is the first kind of SCD where we actually do tracking of changes to dimension entities over time. An important thing to get your head around is the word "entity" - this is where you have a business object which forms the grain of your dimension table. If you have a dimension based around such an entity, you will probably have at least some attributes that you want to be type 2. As an example, you might have Dim_Student; the student may have names (which you probably don't want to make type 2 as I mentioned previously), an ID, probably demographics like ethnicity and age.
Type 2 will work for some of these, and not others. Some candidates are obvious - we know that age changes over time, and we want to be able to relate our facts to the age of the student at the time that the event that our fact measures occurred (e.g. enrolment). This works well because we have an attribute (age) of our entity (student) which changes over time, and most importantly the value changing over time represents a change in the entity over time.

Let's take another example - ethnicity. Students' ethnicity can change over time for a variety of reasons - maybe they didn't know their ancestry before, maybe they've reconnected with it, or in some cases they're claiming it because they realize it allows them to apply for certain scholarships. Whatever the reason, it doesn't represent a change in the student, it represents an update to what we know about the student. This is a Type 1 attribute.

"Hold on, we need to match what we report to the government, so it has to be Type 2."
Absolutely - in this case, your Dim_Student is not modelling a student - it's modelling a student record. It's very important that you are completely honest with yourself about what it is that you're modelling - student records have very different needs from actual students. Fear not, though, I'm not suggesting that you have a Dim_Student and a Dim_Student_Record dimension - you can have the best of both worlds and model both entities in one dimension, though you may wind up making some compromises in order to get that simplicity. Stay tuned to find out how! (Just kidding - it's type 6. I really will get there though...)

Continue your journey through my mind with DW Patterns: The good - type 3

No comments:

Post a Comment