Normalize Excel tables

Business data is quite often expressed across many dimensions. The profitability equation in a company is very simple in concept, but in practice those revenues come across regions, product lines, products etc., making them in fact multidimensional data.

Users of OLAP systems are very aware of multidimensional data. However, many spreadsheet users are not, so they manage to flatten the data the best they can, using pages or subtotals for dimensions beyond the second one.

Modern spreadsheets have “PivotTable” capabilities, which makes easier to deal with multidimensional data. To feed these pivot tables, the information has to be normalized. This post explains how to normalize data from non-normal representation, using a Visual Basic macro

If you are familiar with OLAP and normalized data, skip to the code. I’ll show samples of non-normalized data, how the same data would look normalized, and why this recipe is useful.

Very often people provide me with data on Excel sheets that I need to analyze across different “breakdown dimensions”. The specific workbooks I receive usually have several worksheets with similar structures (for instance, one per country). Within the worksheet there are usually several tables with similar structure, too, which add together. Easier to show with an example:

Pivot Example - Table 1

The numbers show a quantity (let’s say production) for products in a given platform, for different years. When a new year is needed, the tables are extended to the right, and new products are added in new lines as needed. Each table shows products belonging to a platform (or produced in a plant, or whatever). Now, let’s assume there are several countries. Typically, the Excel user just replicates the structure in another sheet and has one sheet per country. Quick and easy.

When analysis is required, things become harder. Usually formulas are added at the end of rows and columns to sum up. Sometimes, a “dimension” is worked out having subtotal rows. SUM ranges now become of different lengths depending on the number of rows in the subrange. Spreadsheets become error prone and hard to maintain because the data and the formulas are not separated. If 10 new products are added across several categories, the spreadsheet user will have to go inserting rows, expended formulas, etc. Some clever users add formulas in the corners of the table that compare the sum of rows with the sum of columns to make sure no formula was left improperly expended. Sounds familiar?

All the grief in maintaining those sheets can be avoided using PivotTables. There are many tutorials on PivotTables out there, so I won’t go into repeating it. Believe me… if the scenario depicted above is familiar to you and you are not using pivottables, make yourself a favor and read the tutorials. The two images below give an idea of the reports that the PivotTable tool will make for you. Need a report looking somhow different or with a different “cut” of the data? You can have hundreds of them linked to the same data and they will update automagically when you add data. New years? No problem… as soon as some data element refers say, to 2009 in the example, each PivotTable you had will extend in the appropriate dimension to include this year.

Pivot Example - Table 3

Pivot Example - Table 4

The only problem, is that to use PivotTables, the information

Pivot Example - Table 2

Sub normalize()
' Takes a matrix with names in rows and columns and turns it into a
' normalized version.
' Names for rows and columns are expected to be outside the selection
' Cell {-1,-1} is an attribute to be repeated in all entries
' Spreadsheet name is an attribute to be repeated in all entries
' Will modify the row and column structure of the spreadsheet
' Will overwrite anyting in the destination area
' Will not check if there is enough space on the spreadsheet

    Dim ss As Range
    Dim rr As Integer, cc As Integer
    Dim startRow As Integer, startCol As Integer
    Set ss = Selection
    rr = ss.Rows.Count
    cc = ss.Columns.Count
    If rr < 2 Then GoTo normalize_done
    If cc < 2 Then GoTo normalize_done
    startRow = ss.Row
    startCol = ss.Column
    If startRow < 2 Then GoTo normalize_done
    If startCol < 2 Then GoTo normalize_done
    ' Copy all the data transposed into rows
    For ci = 2 To cc
        Application.Intersect(ss, Columns(startCol + ci - 1)).Copy
        Cells(startRow + rr * (ci - 1), startCol).PasteSpecial xlPasteValues
    Next ci
    ' Copy the row names in all the transposed rows
    Range(Cells(startRow, startCol - 1), Cells(startRow + rr - 1, startCol - 1)).Copy
    Range(Cells(startRow + rr, startCol - 1), Cells(startRow + rr * cc - 1, startCol - 1)).PasteSpecial xlPasteValues
    Dim v As String
    v = ActiveSheet.Name
    ' Copy the column names, matrix name and sheet name in each of the rows
    For ci = 1 To cc
        Cells(startRow - 1, startCol + ci - 1).Copy
        Range(Cells(startRow + rr * (ci - 1), startCol + cc), _
            Cells(startRow + rr * ci - 1, startCol + cc)).PasteSpecial xlPasteValues
        Cells(startRow - 1, startCol - 1).Copy
        Range(Cells(startRow + rr * (ci - 1), startCol + cc + 1), _
            Cells(startRow + rr * ci - 1, startCol + cc + 1)).PasteSpecial xlPasteValues
        Range(Cells(startRow + rr * (ci - 1), startCol + cc + 2), _
            Cells(startRow + rr * ci - 1, startCol + cc + 2)).Value = v
    Next ci
    ' Move the last columns back to be closer to the data and clean up
    Range(Cells(startRow, startCol + cc), _
            Cells(startRow + rr * cc - 1, startCol + cc + 2)).Cut _
            Destination:=Range(Cells(startRow, startCol + 1), _
            Cells(startRow + rr * cc - 1, startCol + 3))
    Range(Cells(startRow - 1, startCol + 4), _
            Cells(startRow + rr * cc - 1, startCol + cc + 2)).ClearContents
    Range(Cells(startRow, startCol - 1), _
            Cells(startRow + rr * cc - 1, startCol + 3)).Select
    Selection.Style = "Normal"
    Application.CutCopyMode = False
End Sub

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>