Bangalore – Bengaluru Slowly Changing Dimensions

I would like to introduce Type 6 EBS Modified SCD and get your feedback. I couldn’t find any documentation explaining this approach so I guess this is a first, please prove me wrong.

So it’s finally happened Bangalore is now officially Bengaluru. Naturally this means a lot of the changes on the road and Highways of India. Changes to Google maps and other cartography software and most importantly the postal system.

A minor impact is going to be the database systems which store addresses. So I figured now would be a good time to revisit slowly changing dimensions. Most people are familiar with the Slowly Changing Dimension task available in SSIS DFT. But people don’t realize there are more than 3 SCD types. So here are the types once more

Type 0

No Change at all, Ignore any change to the state columns

Id

City

State

1

Bangalore

KA

 

This is probably not what most people will do since they would want to reflect the new name on their site.

Type 1

Overwrite existing record

Before

Id

City

State

1

Bangalore

KA

 

After

Id

City

State

1

Bengaluru

KA

 

This solution seems to be the best or easiest approach. Historical representation of the data doesn’t add value in most cases.

Type 2

Store Historical data by adding new row.

Before

Id

City

State

1

Bangalore

KA

 

After

Id

City

State

EndDate

1

Bangalore

KA

31 Oct 2014

2

Bengaluru

KA

NULL

 

Type 3

Add new column for change data

Before

Id

City

State

1

Bangalore

KA

 

After

Id

City

State

New City Name

1

Bangalore

KA

Bengaluru

 

Most likely not going to be used since it doesn’t provide any effective date for the new change and doesn’t not support. Also complicates the application logic.

Type 4

Store historical data in history table

Now(Current table)

Id

City

State

1

Bengaluru

KA

 

Now (History)

Id

City

State

Create Date

1

Bangalore

KA

1 November 2014

 

A popular approach that is used frequently to capture change but it makes the historical data virtually inaccessible for real time queries without complicated application logic.

Type 6

A Combination of Type (1 +2 +3). It’s a slight complicated model. I can’t see any real world scenario where this is actually implemented I think it’s more of an intellectual exercise.

Before

Id

City

State

1

Bangalore

KA

 

After

Id

City

State

New City Name

Effective Date

Current

1

Bangalore

KA

Bangalore

31 Oct 2014

N

2

Bengaluru

KA

Bengaluru

NULL

Y

 

Type 6 EBS Modified

This is a modified version of the type 6 Dimension that I came up with while writing this blog. I haven’t got the details ironed out but would love to hear any thoughts on this approach.

Before

Id

City

State

1

Bangalore

KA

 

After

Id

City

State

Parent

Effective Date

1

Bangalore

KA

1

31 Oct 2014

2

Bengaluru

KA

1

1 Dec 2050

3

Bengaluru

Union territory

2

NULL

 

The difference between this one and the Type 6 is the capture of additional hierarchy of the change information. The best way to understand this is to compare the result of the “after” case in both types. In the Type 6 EBS Modified version we get rid of the Current flag and use NULL effective date to identify current rows. Additionally we replace the New City Name with ParentID, why? The Type 6 Dimension will fail if there are 2 cities called Bangalore in different state of India. It loses the ability to correctly identify the parent record of the change. The idea being that by recursively looping this table on id and parent id we get the complete and accurate change log for the City column.