Normalize Excel tables

Business data is quite often expressed across many dimensions. The profitability equation in a company is very simple in concept, but in practice those revenues come across regions, product lines, products etc., making them in fact multidimensional data.

Users of OLAP systems are very aware of multidimensional data. However, many spreadsheet users are not, so they manage to flatten the data the best they can, using pages or subtotals for dimensions beyond the second one.

Modern spreadsheets have “PivotTable” capabilities, which makes easier to deal with multidimensional data. To feed these pivot tables, the information has to be normalized. This post explains how to normalize data from non-normal representation, using a Visual Basic macro

If you are familiar with OLAP and normalized data, skip to the code. I’ll show samples of non-normalized data, how the same data would look normalized, and why this recipe is useful.

Very often people provide me with data on Excel sheets that I need to analyze across different “breakdown dimensions”. The specific workbooks I receive usually have several worksheets with similar structures (for instance, one per country). Within the worksheet there are usually several tables with similar structure, too, which add together. Easier to show with an example:

The numbers show a quantity (let’s say production) for products in a given platform, for different years. When a new year is needed, the tables are extended to the right, and new products are added in new lines as needed. Each table shows products belonging to a platform (or produced in a plant, or whatever). Now, let’s assume there are several countries. Typically, the Excel user just replicates the structure in another sheet and has one sheet per country. Quick and easy.

When analysis is required, things become harder. Usually formulas are added at the end of rows and columns to sum up. Sometimes, a “dimension” is worked out having subtotal rows. SUM ranges now become of different lengths depending on the number of rows in the subrange. Spreadsheets become error prone and hard to maintain because the data and the formulas are not separated. If 10 new products are added across several categories, the spreadsheet user will have to go inserting rows, expended formulas, etc. Some clever users add formulas in the corners of the table that compare the sum of rows with the sum of columns to make sure no formula was left improperly expended. Sounds familiar?

All the grief in maintaining those sheets can be avoided using PivotTables. There are many tutorials on PivotTables out there, so I won’t go into repeating it. Believe me… if the scenario depicted above is familiar to you and you are not using pivottables, make yourself a favor and read the tutorials. The two images below give an idea of the reports that the PivotTable tool will make for you. Need a report looking somhow different or with a different “cut” of the data? You can have hundreds of them linked to the same data and they will update automagically when you add data. New years? No problem… as soon as some data element refers say, to 2009 in the example, each PivotTable you had will extend in the appropriate dimension to include this year.

The only problem, is that to use PivotTables, the information

```Sub normalize()
' Takes a matrix with names in rows and columns and turns it into a
' normalized version.
' Names for rows and columns are expected to be outside the selection
' Cell {-1,-1} is an attribute to be repeated in all entries
' Spreadsheet name is an attribute to be repeated in all entries
' Will modify the row and column structure of the spreadsheet
' Will overwrite anyting in the destination area
' Will not check if there is enough space on the spreadsheet

Dim ss As Range
Dim rr As Integer, cc As Integer
Dim startRow As Integer, startCol As Integer
Set ss = Selection

rr = ss.Rows.Count
cc = ss.Columns.Count

If rr < 2 Then GoTo normalize_done
If cc < 2 Then GoTo normalize_done

startRow = ss.Row
startCol = ss.Column

If startRow < 2 Then GoTo normalize_done
If startCol < 2 Then GoTo normalize_done

' Copy all the data transposed into rows
For ci = 2 To cc
Application.Intersect(ss, Columns(startCol + ci - 1)).Copy
Cells(startRow + rr * (ci - 1), startCol).PasteSpecial xlPasteValues
Next ci

' Copy the row names in all the transposed rows
Range(Cells(startRow, startCol - 1), Cells(startRow + rr - 1, startCol - 1)).Copy
Range(Cells(startRow + rr, startCol - 1), Cells(startRow + rr * cc - 1, startCol - 1)).PasteSpecial xlPasteValues

Dim v As String
v = ActiveSheet.Name
' Copy the column names, matrix name and sheet name in each of the rows
For ci = 1 To cc
Cells(startRow - 1, startCol + ci - 1).Copy
Range(Cells(startRow + rr * (ci - 1), startCol + cc), _
Cells(startRow + rr * ci - 1, startCol + cc)).PasteSpecial xlPasteValues
Cells(startRow - 1, startCol - 1).Copy
Range(Cells(startRow + rr * (ci - 1), startCol + cc + 1), _
Cells(startRow + rr * ci - 1, startCol + cc + 1)).PasteSpecial xlPasteValues
Range(Cells(startRow + rr * (ci - 1), startCol + cc + 2), _
Cells(startRow + rr * ci - 1, startCol + cc + 2)).Value = v
Next ci

' Move the last columns back to be closer to the data and clean up
Range(Cells(startRow, startCol + cc), _
Cells(startRow + rr * cc - 1, startCol + cc + 2)).Cut _
Destination:=Range(Cells(startRow, startCol + 1), _
Cells(startRow + rr * cc - 1, startCol + 3))
Range(Cells(startRow - 1, startCol + 4), _
Cells(startRow + rr * cc - 1, startCol + cc + 2)).ClearContents
Range(Cells(startRow, startCol - 1), _
Cells(startRow + rr * cc - 1, startCol + 3)).Select
Selection.Style = "Normal"
Application.CutCopyMode = False
normalize_done:

End Sub
```