Nov30

Normalize Excel tables

Tagged with: .
2 Responses

Business data is quite often expressed across many dimensions. The profitability equation in a company is very simple in concept, but in practice those revenues come across regions, product lines, products etc., making them in fact multidimensional data.

Users of OLAP systems are very aware of multidimensional data. However, many spreadsheet users are not, so they manage to flatten the data the best they can, using pages or subtotals for dimensions beyond the second one.

Modern spreadsheets have “PivotTable” capabilities, which makes easier to deal with multidimensional data. To feed these pivot tables, the information has to be normalized. This post explains how to normalize data from non-normal representation, using a Visual Basic macro

If you are familiar with OLAP and normalized data, skip to the code. I’ll show samples of non-normalized data, how the same data would look normalized, and why this recipe is useful.

Very often people provide me with data on Excel sheets that I need to analyze across different “breakdown dimensions”. The specific workbooks I receive usually have several worksheets with similar structures (for instance, one per country). Within the worksheet there are usually several tables with similar structure, too, which add together. Easier to show with an example:

Pivot Example - Table 1

The numbers show a quantity (let’s say production) for products in a given platform, for different years. When a new year is needed, the tables are extended to the right, and new products are added in new lines as needed. Each table shows products belonging to a platform (or produced in a plant, or whatever). Now, let’s assume there are several countries. Typically, the Excel user just replicates the structure in another sheet and has one sheet per country. Quick and easy.

When analysis is required, things become harder. Usually formulas are added at the end of rows and columns to sum up. Sometimes, a “dimension” is worked out having subtotal rows. SUM ranges now become of different lengths depending on the number of rows in the subrange. Spreadsheets become error prone and hard to maintain because the data and the formulas are not separated. If 10 new products are added across several categories, the spreadsheet user will have to go inserting rows, expended formulas, etc. Some clever users add formulas in the corners of the table that compare the sum of rows with the sum of columns to make sure no formula was left improperly expended. Sounds familiar?

All the grief in maintaining those sheets can be avoided using PivotTables. There are many tutorials on PivotTables out there, so I won’t go into repeating it. Believe me… if the scenario depicted above is familiar to you and you are not using pivottables, make yourself a favor and read the tutorials. The two images below give an idea of the reports that the PivotTable tool will make for you. Need a report looking somhow different or with a different “cut” of the data? You can have hundreds of them linked to the same data and they will update automagically when you add data. New years? No problem… as soon as some data element refers say, to 2009 in the example, each PivotTable you had will extend in the appropriate dimension to include this year.

Pivot Example - Table 3

Pivot Example - Table 4

The only problem, is that to use PivotTables, the information

Pivot Example - Table 2

Sub normalize()
' Takes a matrix with names in rows and columns and turns it into a
' normalized version.
' Names for rows and columns are expected to be outside the selection
' Cell {-1,-1} is an attribute to be repeated in all entries
' Spreadsheet name is an attribute to be repeated in all entries
' Will modify the row and column structure of the spreadsheet
' Will overwrite anyting in the destination area
' Will not check if there is enough space on the spreadsheet

    Dim ss As Range
    Dim rr As Integer, cc As Integer
    Dim startRow As Integer, startCol As Integer
    Set ss = Selection

    rr = ss.Rows.Count
    cc = ss.Columns.Count

    If rr < 2 Then GoTo normalize_done
    If cc < 2 Then GoTo normalize_done

    startRow = ss.Row
    startCol = ss.Column

    If startRow < 2 Then GoTo normalize_done
    If startCol < 2 Then GoTo normalize_done

    ' Copy all the data transposed into rows
    For ci = 2 To cc
        Application.Intersect(ss, Columns(startCol + ci - 1)).Copy
        Cells(startRow + rr * (ci - 1), startCol).PasteSpecial xlPasteValues
    Next ci

    ' Copy the row names in all the transposed rows
    Range(Cells(startRow, startCol - 1), Cells(startRow + rr - 1, startCol - 1)).Copy
    Range(Cells(startRow + rr, startCol - 1), Cells(startRow + rr * cc - 1, startCol - 1)).PasteSpecial xlPasteValues

    Dim v As String
    v = ActiveSheet.Name
    ' Copy the column names, matrix name and sheet name in each of the rows
    For ci = 1 To cc
        Cells(startRow - 1, startCol + ci - 1).Copy
        Range(Cells(startRow + rr * (ci - 1), startCol + cc), _
            Cells(startRow + rr * ci - 1, startCol + cc)).PasteSpecial xlPasteValues
        Cells(startRow - 1, startCol - 1).Copy
        Range(Cells(startRow + rr * (ci - 1), startCol + cc + 1), _
            Cells(startRow + rr * ci - 1, startCol + cc + 1)).PasteSpecial xlPasteValues
        Range(Cells(startRow + rr * (ci - 1), startCol + cc + 2), _
            Cells(startRow + rr * ci - 1, startCol + cc + 2)).Value = v
    Next ci

    ' Move the last columns back to be closer to the data and clean up
    Range(Cells(startRow, startCol + cc), _
            Cells(startRow + rr * cc - 1, startCol + cc + 2)).Cut _
            Destination:=Range(Cells(startRow, startCol + 1), _
            Cells(startRow + rr * cc - 1, startCol + 3))
    Range(Cells(startRow - 1, startCol + 4), _
            Cells(startRow + rr * cc - 1, startCol + cc + 2)).ClearContents
    Range(Cells(startRow, startCol - 1), _
            Cells(startRow + rr * cc - 1, startCol + 3)).Select
    Selection.Style = "Normal"
    Application.CutCopyMode = False
normalize_done:

End Sub

No related posts.

This website uses IntenseDebate comments, but they are not currently loaded because either your browser doesn't support JavaScript, or they didn't load fast enough.

2 Comments to “Normalize Excel tables”
  1. Sach says:

    HI Juan Carlos,

    This could be very useful, but your images are not showing so I don't know if it's what I'm looking for. Would be great if you could resolve this.

    Thanks!

  2. HeathWalker says:

    Hi Juan Carlos

    I have a lot of data normalising to do and I do it manually using the PivotTable PivotChart wizard. This involves seven or eight mouse clicks and range selection using the arrow keys. I would really like to automate the process.

    I am not at all experienced in VBA but I have tried to build a macro by modelling it on the macro recorder and trying to modify the hard coding but I have run into problems so I was very interested to see your code doing it in a completely different way. However I can't get it to work. Can you help please?

    The data I was trying to get your code to work on is a sheet called "MultipleConsolidationMacroDevel". The data is in the range A1:P194. It has months across the top, products down the side with sales in the row/column intersections.

    When I select/highlight the data area and start your macro running nothing seems to happen at all.
    I am probably being very stupid but your macro looks really useful – how can I get it to work?

    Thank you in advance

    With kind regards

    Heath Walker

Sorry, the comment form is closed at this time.