posted 4/17/2011 by MarkGStacey - Views: [7133]
(Download excel data sheet: http://www.bidn.com/Assets/Uploaded-CMS-Files/6e42b004-cc2c-4e57-8ca1-8bc1a0af872aExcelDataTranspose.xlsx )
(Download entire post : http://www.bidn.com/Assets/Uploaded-CMS-Files/b0b5ef30-ab25-497b-aa78-a40bc54ad2dfPPTraining.zip )
Many times, data is provided in an excel, delimited, or fixed width file format, and the data within this file has multiple measures in columns. Ideally, you would like to be able to slice by the dimension provided as column headers, and often, this requires manual manipulation of the file which would need to be done every time a new file is provided.
The technique we will demonstrate today will handle adding new data simply by appending new rows, or replacing the data in the worksheet - or ideally, simply adding a new worksheet and changing the worksheet reference.
The excel functions we will be using to do this are, in order of use:ROW([reference]): Returns the row number of either the current row, or if an argument is used, the referenced row.ADDRESS(row_num, column_num, [abs_num], [a1], [sheet_text]): obtain the address of a cell in a worksheet, given specified row and column numbers.ROUNDUP(number, num_digits): Rounds a number up, away from 0 (zero).INDIRECT(ref_text, [a1]): Returns the reference specified by a text string. In combination with ADDRESS, this lets you programmatically address cells to enable reuseTRANSPOSE(array): The TRANSPOSE function returns a vertical range of cells as a horizontal range, or vice versa. The TRANSPOSE function must be entered as an array formula (array formula: A formula that performs multiple calculations on one or more sets of values, and then returns either a single result or multiple results. Array formulas are enclosed between braces { } and are entered by pressing CTRL+SHIFT+ENTER.) in a range that has the same number of rows and columns, respectively, as the source range (range: Two or more cells on a sheet. The cells in a range can be adjacent or nonadjacent.) has columns and rows.This is really the heart of the technique, and will require you to be comfortable with ARRAY FUNCTIONS, which act on a range of cells rather than a single cell.MOD(number, divisor): Returns the remainder after number is divided by divisor.
See the attached workbook for a fully filled out example. This data was obtained from Statistics SA , and is publicly available.
Total value of mineral sales according to mining divisions, mineral
groups and minerals - Actual indices by year, month and minerals
Total, gold included
Total, gold excluded
Gold
Iron ore
Chromium
1980
January
1397.3
433.5
963.9
23.9
7.4
February
1531
403.7
1127.2
23.1
10.5
March
1309.2
472.3
836.9
34.6
12.2
April
1157.8
454.8
703
26.9
7.5
May
947.5
387.1
560.4
34.1
9.5
June
1093.2
359.6
733.6
20.2
9
July
1592.4
490.8
1101.6
27.6
7.6
August
1316.7
437.6
879
22.9
10.8
September
1404.8
435.6
969.1
24.9
8.4
October
1303.2
450.3
852.9
22.7
6.1
November
1312.3
389.8
922.5
19.2
7.8
December
1138.1
393.3
744.8
16.2
7.3
1981
1274
483.8
790.2
26.5
9.1
1124.3
442.7
681.5
24.4
This data has additional columns and rows of course ~ but the issue is immediately apparent – “Gold”, “Iron Ore” etc. should be a dimension. In addition, the blank rows between 1980 and 1981 will present a problem – that solution can be done using the ROWS/Transpose technique we’lluse later, but has a simpler solution working in this worksheet.Add a new column between “A” and “B” and in the new “B6” enter this formula:=IF(A6="", B5, A6)Copy this formula down for the entire row – if the original date column has a value, use it, else fill in from above.
Of course, if you are importing data from a new spreadsheet, you would do this on a new worksheet, rather than adding a column to the existing one. We are going to use this technique, so delete the new column you created, and create a new worksheet called Table11DateColumn.In the cell A6, put the following formula: =IF(Table11!A6="",A5,Table11!A6)
Copy this formula out for the same number of rows as you have data in the first sheet, and you now have a filled out date sheet.
So we have filled in the data blanks in the columns (and repeat this for as many columns as you need to). In order to do the transposition, we need to know how many columns are being changed to rows. In this case, the columns to be converted to a dimension are in C5:R5, 16 items. Note this down.Create a new worksheet to do the transposition. In our first column, add a heading “Row”, and in the second row put the formula “=ROW()-1”. The-1 is to compensate for the heading. In column “B”, put the heading “Start Ref”, and the formula :=ADDRESS(ROUNDUP(A2/16,0)+5,3,,,"Table11")Starting in the middle of this formula : ROUNDUP(A2/16) will take the row within this sheet, and convert it to a row in the original worksheet – because we are converting 16 columns to rows, there will be 16 times as many rows in this sheet as the previous one.Then, we add 5 to compensate for the headers in the first sheet. This gives us the first argument to ADDRESS, the row number. The second argument is the column number, which is 3 as we are starting with column C.The final argument is the worksheet to use.In column C add the heading “EndRef”, and the formula :=ADDRESS(ROUNDUP(A2/16,0)+5,18,,,"Table11")The only difference here is that we are referencing the 18th column, column “R”
Take the row you’ve created, and paste it 16 times ~ ie the number of columns we will convert to rows.
Row
Start Ref
End Ref
1
Table11!$C$6
Table11!$R$6
2
3
4
5
6
7
8
10
11
12
13
14
15
16
Now we get to the magic: extracting the values. In column “D”, put the heading “Values”, then in D2 put the formula “=TRANSPOSE(INDIRECT(B2):INDIRECT(B3)”
INDIRECT uses a text value, the Start Ref and End Ref you calculated previously to get a value from a cell. TRANSPOSE takes a range, and transposes it ~ currently, it’ll only return a single value.
In order to return a range, we need to convert this cell formula to a range formula. Click on the formula you just created, and select it and the next 15 rows, for a total of 16, to match the column count) – this should be rows D2 to D17. Now press F2, and then Ctrl-Shift-Enter. This creates an array formula and transposes our values. You should see the below:
Value
26
17.5
79.9
7.1
36.3
102.3
17.1
1.7
8.7
6.7
116
Our next step is to get the dimension values into this table: we will start with the dimension that was a measure previously. In column E, add a heading “MeasureDimRef”, and put a formula:=ADDRESS(5,MOD(A2-1,16)+3,,,"Table11")A little bit different here. For the row, we know the values are in row 5. For the column, we first adjust the row number to be from 0 to 15 instead of 1 to 16 – then MOD divides that number 1 and returns the remainder, i.e. row number from the transposed rows will give us the column number. We then add 3 to get to row C for the first row, as the MOD function is returning a 0 based result. NB: This zero based result approach to getting the correct result, else the final row will return a value for the first row.Add a new heading “MeasureDim” to column F, and put in the formula =INDIRECT(E2) to reference the cell found by the previous formulaCopy these formulae out for all 16 rows.
This will give you the following results:
MeasureDim Ref
MeasureDim
Table11!$C$5
Table11!$D$5
Table11!$E$5
Table11!$F$5
Table11!$G$5
Table11!$H$5
Copper
Table11!$I$5
Manganese ore
Table11!$J$5
PGMs
Table11!$K$5
Nickel
Table11!$L$5
Other metallic minerals
Table11!$M$5
Coal
Table11!$N$5
Building materials
Table11!$O$5
Building materials: granite or norite
Table11!$P$5
Building materials: lime and limestone
Table11!$Q$5
Building materials: other
Table11!$R$5
Other non-metallic minerals
Our final steps are to get the existing dimensions. Let’s do Month first :Add “Month” to the heading, then use the following formula:=INDIRECT(ADDRESS(ROUNDUP(A2/16,0)+5,2,,,"Table11"))We’re combining the INDIRECT and ADDRESS functions: ROUNDUP(A2/16) + 5 gives us the original row number, and we use 2 for the column. Finally, we’ll do the same for the year. Use the column header “Year”, and enter the formula=INDIRECT(ADDRESS(ROUNDUP(A2/16,0)+5,1,,,"Table11DateColumn"))
The only difference between this and the month is that we are referencing our filled in years on our derived sheet.And you’re now done ~ the next steps are to use this data in PowerPivot, or even in traditional pivot charts.
NB: PowerPivot can't use data from a table valued function. One more step is needed to pull this data into powerpivot
Thanks for posting this very useful post.