Histograms
Main Concept
A histogram is a graphical representation of a frequency distribution of a sample of data. In a histogram, tabulated frequencies are shown as adjacent rectangles over discrete intervals, known as bins. The area of each rectangle is proportional to the frequency of observations in the interval and the height of a rectangle is equal to the frequency divided by the width of the interval.
The total area of the histogram corresponds to the number of observations in the data sample. If the data has been normalized, the resulting graph displays a relative frequency histogram where each rectangle shows the proportion of observations in its particular interval and the total area of the histogram is equal to 1.
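The relationship between frequency, bin width, height, and area can be checked with a short sketch. The function name `histogram_heights`, the sample data, and the bin edges below are illustrative assumptions, not part of the original page; the last bin is assumed to include its right edge.

```python
from bisect import bisect_right

def histogram_heights(data, edges, normalize=False):
    """Return (counts, heights) for bins defined by sorted edges.

    Bin i covers [edges[i], edges[i+1]); the last bin also includes
    its right edge. Assumes data is non-empty.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            continue  # ignore values outside the binned range
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    # For a relative frequency histogram, divide by the number of observations.
    total = sum(counts) if normalize else 1
    # Height = (relative) frequency divided by bin width.
    heights = [c / total / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]
    return counts, heights

data = [1, 2, 2, 3, 5, 7, 8, 9]
edges = [0, 2, 4, 10]  # bins of unequal width
counts, heights = histogram_heights(data, edges)
# Total area = sum of height * width, which recovers the number of observations.
area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
```

With `normalize=True` the same area computation gives 1, matching the relative frequency case.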
The choice of the number of bins is important. However, there is no single best number of bins, since different bin sizes can reveal different features of a data set.
Number of Bins
There are various guidelines for picking the number of bins. In what follows, k is the number of bins and n is the number of data points. The number of bins, k, can be calculated from a suggested bin width h as follows:
$k = \left\lceil \frac{\text{maximum} - \text{minimum}}{h} \right\rceil,$
where the brackets $\lceil \cdot \rceil$ indicate the ceiling function.
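As a minimal sketch of this conversion, the helper below (the name `bins_from_width` is hypothetical) divides the data range by a suggested width and rounds up:

```python
import math

def bins_from_width(data, h):
    """Number of bins k for a suggested bin width h (ceiling of range / h)."""
    return math.ceil((max(data) - min(data)) / h)

# Range is 9 - 1 = 8, so a width of 2.5 needs ceil(3.2) = 4 bins.
k = bins_from_width([1, 2, 2, 3, 5, 7, 8, 9], 2.5)
```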
Square-root choice: The simplest method of deciding on the number of bins is to take the square root of the number of data points.
$k = \sqrt{n}$
Sturges' formula: Sturges' formula is derived from a binomial distribution and assumes that the data is approximately normally distributed. It is known to perform poorly in some cases when n is less than 30 or when the data is not normally distributed.
$k = \log_2 n + 1$
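The square-root choice and Sturges' formula depend only on n, so they are easy to compare directly. The sketch below rounds each result up to a whole number of bins; that rounding, and the function names, are assumptions made here for illustration.

```python
import math

def sqrt_bins(n):
    """Square-root choice: k = sqrt(n), rounded up (rounding is an assumption)."""
    return math.ceil(math.sqrt(n))

def sturges_bins(n):
    """Sturges' formula: k = log2(n) + 1, rounded up (rounding is an assumption)."""
    return math.ceil(math.log2(n)) + 1

n = 100
k_sqrt = sqrt_bins(n)        # sqrt(100) = 10
k_sturges = sturges_bins(n)  # log2(100) is about 6.64, so ceil gives 7, plus 1
```

For larger samples the two rules diverge quickly, since the square root grows much faster than the logarithm.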
Scott's normal reference rule: Scott's normal reference rule minimizes the integrated mean squared error of the density estimate and is well suited for random samples of normally distributed data.
$h = \frac{3.5\,\hat{\sigma}}{n^{1/3}},$
where $\hat{\sigma}$ is the sample standard deviation.
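Scott's rule can be sketched with the standard library alone. The function name `scott_width` and the sample data are illustrative assumptions; `statistics.stdev` computes the sample standard deviation (n - 1 in the denominator).

```python
import statistics

def scott_width(data):
    """Scott's normal reference rule: h = 3.5 * sigma_hat / n^(1/3)."""
    n = len(data)
    sigma_hat = statistics.stdev(data)  # sample standard deviation
    return 3.5 * sigma_hat / n ** (1 / 3)

sample = [2, 4, 4, 4, 5, 5, 7, 9]
h_scott = scott_width(sample)
```

The resulting width can then be converted into a number of bins by dividing the data range by h and rounding up.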
Freedman-Diaconis' choice: The Freedman-Diaconis choice is based on the interquartile range (IQR). Because the IQR ignores the extreme values of the sample, this rule is less sensitive to outliers than Scott's normal reference rule, which uses the standard deviation.
$h = \frac{2\,\operatorname{IQR}(\text{Sample})}{n^{1/3}}$
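A minimal sketch of the Freedman-Diaconis width follows. Quartile conventions vary between tools; the use of `statistics.quantiles` with `method="inclusive"` is an assumption made here, and the function name is hypothetical.

```python
import statistics

def freedman_diaconis_width(data):
    """Freedman-Diaconis choice: h = 2 * IQR / n^(1/3)."""
    # Quartiles via linear interpolation; other conventions give slightly
    # different IQR values.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(data) ** (1 / 3)

clean = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 800]  # largest value replaced by an outlier

h_clean = freedman_diaconis_width(clean)
h_outlier = freedman_diaconis_width(with_outlier)
```

In this example the outlier leaves the quartiles, and therefore the bin width, unchanged, whereas a width based on the standard deviation would grow substantially.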