Thursday, January 15, 2009

How to model a histogram/Gaussian Distribution in Microsoft Excel 2007

I had tried quite hard to model a histogram/Normal distribution given a series of data. Never quite got the hang of it..

So-once, when I finally got the leads in the right direction thanks to my Project Management faculty, I decided to put it up here :)

You need to use the Analysis ToolPak in Microsoft Excel to use the Statistics Toolpacks for modeling a histogram. The Analysis ToolPak is a Microsoft Office Excel add-in program that is available when you install Microsoft Office or Excel. To use it in Excel, however, you need to load it first.

Here's how to go about the whole thing-
  1. Click the Microsoft Office Button, and then click Excel Options.
  2. Click Add-Ins, and then in the Manage box, select Excel Add-ins.
  3. Click Go.
  4. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.

    Note- If you get prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

  5. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
Once the Data Analysis command is showing in the Data tab, you need to use the following steps to get the histogram-
  1. Create the bins(buckets) to put along the X-axis. Preferably, use equally-spaced intervals, so you would be able to identify the skewness in a better manner.
  2. On pressing the Data Analysis command on the Analysis section in the Data tab, you will be prompted with a set of tools to use. This list ranges from 'Anova-Single factor' to 'z-test: Two Samples for Means'. Choose Histogram in this list and press OK.
  3. You will get a prompt asking you to enter the input range and the Bin range. Enter these 2 values in the prompt and you are all set.
  4. Excel returns you a frequency chart with the intervals(Bins) and frequency in a separate tab. You can change the options here, if you want the frequency chart to be created in the same tab or on a new worksheet.
  5. Once you have the frequency chart, use the 'Insert Charts' option and add a 'Clustered Column chart' in the 'Column charts' to obtain a histogram chart. You can thus get a far bit of an idea if the pattern approximates the normal distribution.
  6. If you are still not convinced and would like to get a nice bell-curve rather, that's possible too. Use the 'Scatter' chart and select the 'Scatter with smooth lines' to get the same.
Voila- there you are matching it up with Mr. Gauss on his own distribution ;)

-Sameer

1 comment:

Milan said...

i've been trying to get right aligned balanced scatter pie for heuristic modelled data for time series generated from pre-guassian histogram? is there any way i could do that in excel 2003. i need that urgently for my final year project..