HISTOGRAM

A histogram is a graphical representation of the distribution of numerical data. It is usually built from bins, where every bin has a minimum and a maximum value.

It was first introduced by Karl Pearson.

Histograms are used to represent data that has been grouped into ranges. The X-axis shows the bin ranges, while the Y-axis shows the frequency (how many values fall into each bin). A histogram is a kind of bar graph.

What is a bin? To build a histogram, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. Each interval is a bin. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins must be adjacent, and are often (but are not required to be) of equal size.
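The counting step described above can be sketched in a few lines of plain Python. This is only an illustration: the function name count_into_bins, its parameters, and the sample values are made up for this sketch, and it assumes equal-width bins that include their lower bound and exclude their upper bound.

```python
def count_into_bins(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bin i covers the half-open interval
    [start + i * width, start + (i + 1) * width).
    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical sample data, not taken from the text above.
sample = [3, 7, 12, 15, 18, 21, 29, 30, 34]
counts = count_into_bins(sample, start=0, width=10, n_bins=4)

for i, c in enumerate(counts):
    print(f"{i * 10} to {(i + 1) * 10}: {c}")
```

Because the bins are adjacent and non-overlapping, every value inside the overall range lands in exactly one bin.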

For example, if you want to show an age-wise population as a graph, a histogram suits well, because it tells you how many people fall into each age range (each bin, in histogram terms).

This is the data for the histogram, using 500 items:

Bin            Count
0 to 100       10
100 to 200     15
200 to 300     21
300 to 400     45
400 to 500     35
500 to 600     14
2.5 to 3.49    23

BASIS: HISTOGRAM vs. BAR GRAPH

What is ...?
  Histogram: It refers to a graphical representation that displays data by way of bars to show the ...
  Bar graph: A pictorial representation of data that uses bars to compare different ...

Indicates
  Histogram: Distribution of non-discrete variables
  Bar graph: Comparison of discrete variables

Presents
  Histogram: Quantitative data
  Bar graph: Categorical data

Spaces
  Histogram: Bars are close to each other (no space in between).
  Bar graph: Bars do not touch each other (there is a space between bars).

Data
  Histogram: Data (values) are grouped together, so that they are considered as ranges.
  Bar graph: Data (values) are taken as individual entities.

Reordering of bars possible?
  Histogram: No
  Bar graph: Yes

Width of bars
  Histogram: May differ
  Bar graph: Same

PLOTTING HISTOGRAM USING MATPLOTLIB

Step 1: Collect the data for the histogram

For example, let's say that you have the following data about the number of books written by 100 individual authors:

Number of books written:
1, 1, 2, 3, 3, 5, 7, 8, 9,
10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29,
30, 30, 31, 33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39,
40, 41, 41, 42, 43, 44, 45, 45, 46, 47, 48, 48, 49,
50, 51, 52, 53, 54, 55, 55, 56, 57, 58,
60, 61, 63, 64, 65, 66, 68,
70, 71, 72, 74, 75, 77,
81, 83, 84, 87, 89,
90, 90, 91

Step 2: Determine the number of bins

For simplicity, let's set the number of bins to 10.

Step 3: Plot the histogram in Python using matplotlib
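A minimal sketch of this step, assuming matplotlib is installed. The variable name books, the axis labels, the title, and the output file name are choices made for this sketch, not part of the original material. The data is the list from Step 1, and bins=10 follows Step 2.

```python
import matplotlib.pyplot as plt

# Number of books written by each of the 100 authors (data from Step 1).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9,
    10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29,
    30, 30, 31, 33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39,
    40, 41, 41, 42, 43, 44, 45, 45, 46, 47, 48, 48, 49,
    50, 51, 52, 53, 54, 55, 55, 56, 57, 58,
    60, 61, 63, 64, 65, 66, 68,
    70, 71, 72, 74, 75, 77,
    81, 83, 84, 87, 89,
    90, 90, 91,
]

# bins=10 asks for 10 equal-width bins between min(books) and max(books).
# plt.hist returns the per-bin counts, the bin edges, and the drawn bars.
counts, edges, _ = plt.hist(books, bins=10, edgecolor="black")

plt.xlabel("Number of books written")  # label text is an assumption
plt.ylabel("Number of authors")        # label text is an assumption
plt.title("Books written per author")  # title text is an assumption

# Save to a file; in an interactive session plt.show() can be used instead.
plt.savefig("books_histogram.png")
```

With an integer bins argument, matplotlib chooses the bin edges itself, so here they run from the smallest value (1) to the largest (91) rather than from 0.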

OUTPUT

Another Method to Calculate Bins:

The number and width of the bins can also be calculated using formulas:

n = number of observations = 100
Range = max value - min value = 91 - 1 = 90
No. of intervals = √n = √100 = 10
Width of intervals = Range / (No. of intervals) = 90 / 10 = 9
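The same arithmetic can be written as a short Python sketch. The function name square_root_rule and its parameters are hypothetical. Rounding the square root to the nearest whole number is an assumption added for cases where n is not a perfect square; the text only covers n = 100.

```python
import math


def square_root_rule(n_obs, min_value, max_value):
    """Return (range, number of intervals, interval width).

    The number of intervals is the square root of the number of
    observations, rounded to the nearest whole number (the rounding
    is an assumption of this sketch).
    """
    value_range = max_value - min_value
    n_intervals = round(math.sqrt(n_obs))
    width = value_range / n_intervals
    return value_range, n_intervals, width


# Values from the example: 100 observations, minimum 1, maximum 91.
value_range, n_intervals, width = square_root_rule(100, 1, 91)
print(value_range, n_intervals, width)  # 90 10 9.0
```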

Based on this information, the frequency table would look like this:

Intervals (bins)    Frequency
0-9                 9
10-19               13
20-29               19
30-39               15
40-49               13
50-59               10
60-69               7
70-79               6
80-89               5
90-99               3
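The frequencies in this table can be recomputed from the Step 1 data with plain Python. This sketch assumes the bins exactly as listed in the table (0-9, 10-19, ..., 90-99), so integer division by 10 gives the bin index. The variable names are made up for this sketch.

```python
# Number of books written by each of the 100 authors (data from Step 1).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9,
    10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29,
    30, 30, 31, 33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39,
    40, 41, 41, 42, 43, 44, 45, 45, 46, 47, 48, 48, 49,
    50, 51, 52, 53, 54, 55, 55, 56, 57, 58,
    60, 61, 63, 64, 65, 66, 68,
    70, 71, 72, 74, 75, 77,
    81, 83, 84, 87, 89,
    90, 90, 91,
]

# One counter per bin: index 0 is 0-9, index 1 is 10-19, ..., index 9 is 90-99.
frequencies = [0] * 10
for value in books:
    frequencies[value // 10] += 1

for i, count in enumerate(frequencies):
    print(f"{i * 10}-{i * 10 + 9}: {count}")
```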

Note that the starting point for the first interval is 0, which is very close to the minimum observation of 1 in our dataset. (If, for example, the minimum observation were 10 in another dataset, then the starting point for the first interval should be 10, rather than 0.)

Note: For the bins in the Python code below, you'll need to specify the bin boundaries (the values highlighted in red), rather than a single number (such as 10, which we used before). The last value, 99, must be included.
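A sketch of what such a call might look like, assuming matplotlib is installed. The bin boundaries used here (0, 10, ..., 90, 99) are an assumption inferred from the intervals in the frequency table and from the note that the last value, 99, must be included. The variable names, labels, and output file name are also made up for this sketch.

```python
import matplotlib.pyplot as plt

# Number of books written by each of the 100 authors (data from Step 1).
books = [
    1, 1, 2, 3, 3, 5, 7, 8, 9,
    10, 10, 11, 11, 13, 13, 15, 16, 17, 18, 18, 18, 19,
    20, 21, 21, 23, 24, 24, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 29,
    30, 30, 31, 33, 34, 34, 34, 35, 36, 36, 37, 37, 38, 38, 39,
    40, 41, 41, 42, 43, 44, 45, 45, 46, 47, 48, 48, 49,
    50, 51, 52, 53, 54, 55, 55, 56, 57, 58,
    60, 61, 63, 64, 65, 66, 68,
    70, 71, 72, 74, 75, 77,
    81, 83, 84, 87, 89,
    90, 90, 91,
]

# Assumed bin boundaries: a list of edges instead of a single bin count.
# Each bin includes its left edge and excludes its right edge, except the
# last bin (90 to 99), which includes both edges.
bin_edges = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 99]

counts, edges, _ = plt.hist(books, bins=bin_edges, edgecolor="black")

plt.xlabel("Number of books written")  # label text is an assumption
plt.ylabel("Number of authors")        # label text is an assumption

# Save to a file; in an interactive session plt.show() can be used instead.
plt.savefig("books_histogram_custom_bins.png")
```

Passing a list to bins makes matplotlib use exactly those boundaries, so the bars line up with the intervals in the frequency table.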
