PROC BOXPLOT - Using the CLIPFACTOR Option to Produce More
Readable Figures
Heidi Nasizadeh, Sangart Inc., San Diego, CA
Michelle Wang, Sangart Inc., San Diego, CA
ABSTRACT
The BOXPLOT procedure in SAS/GRAPH® provides the ability to visualize summary statistics such as maximum, third
quartile, median, mean, first quartile, and minimum. The SCHEMATIC option allows the outlier values within a group
to be shown as separate points. For exploratory analysis using "dirty data" or data prone to extreme outliers, the box
and whisker elements can become too compressed for the figures to be easily interpreted. PROC BOXPLOT will
show all the values in all groups and will not allow us to limit the y-axis to a value that is within the data range. Using
the CLIPFACTOR option to clip the extreme values produces a more readable and useful plot to visualize the data.
INTRODUCTION
A box-and-whisker plot is a useful way to display the distribution of groups of numeric data. The most common box plot
includes a box that ends at the first and third quartiles, draws the median as a horizontal line in the box, extends the
"whiskers" to the farthest points that are not outliers, and draws a dot for each outlier. An outlier is any point that lies
more than 1.5 times the interquartile range (third quartile minus first quartile) beyond either end of the box.
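As a small illustration of the outlier rule, the following sketch computes the two cutoffs beyond which a point would be drawn as an outlier. The quartile values (20 and 40) are hypothetical and are not taken from the data used in this paper.

```sas
/* Minimal sketch: outlier cutoffs for one box. Q1 and Q3 are hypothetical. */
data _null_;
   q1  = 20;                        /* assumed first quartile         */
   q3  = 40;                        /* assumed third quartile         */
   iqr = q3 - q1;                   /* interquartile range            */
   lower_fence = q1 - 1.5 * iqr;    /* points below this are outliers */
   upper_fence = q3 + 1.5 * iqr;    /* points above this are outliers */
   put lower_fence= upper_fence=;   /* writes the two cutoffs to the log */
run;
```

With these assumed quartiles, a value of 75 would be drawn as an outlier, while a value of 65 would fall within the upper whisker.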
This paper describes how to use the clipping options of PROC BOXPLOT when a few extreme outliers cause the
box and whisker elements to become so compressed that the figure is no longer useful in interpreting the data.
The data used in this paper is not real patient data and was programmatically generated to illustrate various options. It
is assumed to be the result of a lab test of 2 groups over time. These results contain 2 extreme values, which in turn
produce compressed boxes and whiskers.
COMMON BOX PLOT VS. A CLIPPED BOX PLOT
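A general PROC BOXPLOT call such as:
proc boxplot data=final;
   plot lbstresn*visitnum=trt01pn /
      boxstyle     = schematic   /* show outliers as separate points    */
      boxwidth     = 3
      waxis        = 2           /* thickness of the axis lines         */
      symbollegend = legend1     /* LEGEND1 is defined elsewhere        */
      cboxfill     = white
      haxis        = axis1       /* AXIS1, AXIS2 are defined elsewhere  */
      vaxis        = axis2
      vminor       = 5;
run;
quit;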
creates Figure 1, which demonstrates the effect of 2 extreme outliers. The boxes and whiskers are compressed and the
figure does not help us understand the data. PROC BOXPLOT is designed to show all the values in all groups and
will ignore code that limits the y-axis to a specific value.
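The listings in this paper refer to AXIS1, AXIS2, and LEGEND1, whose definitions are not shown. A minimal sketch of what such definitions might look like follows. The label text is taken from the figures, while every other setting is an assumption and not the authors' actual code. It also assumes traditional (SAS/GRAPH) graphics output rather than ODS Graphics.

```sas
/* Hypothetical AXIS and LEGEND definitions for the PLOT statement options */
/* haxis=axis1, vaxis=axis2 and symbollegend=legend1. Settings are assumed. */
axis1 label=none;                                /* horizontal (visit) axis   */
axis2 label=(angle=90 'Standard Lab Result');    /* vertical axis label       */
legend1 label=('Planned Treatment (N)') frame;   /* legend for the treatments */
```

These global statements would need to be submitted before the PROC BOXPLOT step so that the procedure can pick them up.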
Figure 2 shows how adding the CLIPFACTOR option improves this visual display. The 2 extreme values
are clipped, a legend on the bottom right makes note of it, and it is easier to understand how the summary statistics
of the 2 groups change over time because the plot is now zoomed into the section with the most relevant data.
An example of the benefits of this process is visible when analyzing the data from Treatment A and B on Day 1.
Figure 1's compressed boxes and whiskers give us very little understanding of what the maximum, minimum, Q1, Q3,
median, and mean are, while Figure 2 clearly displays these values and the position of the next outliers.
Figure 1. General Box and Whisker Plot
[Figure: Standard Lab Result (y-axis, 0 to 500) by visit (Screen/Baseline, Day 1, Day 2, Day 3). Legend: Planned Treatment (N), TRT A and TRT B.]
Figure 2. Box and Whisker Plot using the CLIPFACTOR option
[Figure: Standard Lab Result (y-axis, 0 to 80) by visit (Screen/Baseline, Day 1, Day 2, Day 3). Legend: Planned Treatment (N), TRT A and TRT B. Clipping legend: Boxes clipped=2.]
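The SAS code generating Figure 2 is:
proc boxplot data=final;
   plot lbstresn*visitnum=trt01pn /
      clipfactor   = 20                  /* clipping factor, must be > 1      */
      clipsymbol   = dot                 /* marker for clipped points         */
      cliplegpos   = bottom              /* position of the clipping legend   */
      cliplegend   = 'Boxes clipped=#'   /* text of the clipping legend       */
      clipsubchar  = '#'                 /* replaced by the number of boxes   */
      boxstyle     = schematic
      boxwidth     = 3
      waxis        = 2
      symbollegend = legend1
      cboxfill     = white
      haxis        = axis1
      vaxis        = axis2
      vminor       = 5;
run;
quit;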
The CLIPFACTOR should be a value greater than one.
Clipping is applied as follows:
1- The mean of the first quartile (Q1mean) and the mean of the third quartile (Q3mean) across all groups are calculated.
2- Values outside the range of Q3mean + (Q1mean - Q3mean)*factor and Q1mean + (Q3mean - Q1mean)*factor will be clipped.
3- Any statistics outside Q3mean + (Q1mean - Q3mean)*factor and Q1mean + (Q3mean - Q1mean)*factor will be clipped.
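The clipping limits can be approximated by hand to see which values a given factor would clip. The sketch below assumes that the limits are the mean first quartile plus, and the mean third quartile minus, the factor times the difference between the two means. It also assumes that the box groups are defined by VISITNUM, as on the PLOT statement, and that the default quartile definition in PROC MEANS matches the one PROC BOXPLOT uses. The data set names QUARTILES, QMEANS, and CLIPBOUNDS are hypothetical.

```sas
/* Sketch: approximate the CLIPFACTOR limits for factor = 20.            */
/* Step 1: first and third quartile of the lab result within each group. */
proc means data=final noprint nway;
   class visitnum;
   var lbstresn;
   output out=quartiles q1=q1 q3=q3;
run;

/* Step 2: mean of the group quartiles across all groups. */
proc means data=quartiles noprint;
   var q1 q3;
   output out=qmeans mean=q1mean q3mean;
run;

/* Step 3: limits beyond which values would be clipped (assumed formula). */
data clipbounds;
   set qmeans;
   factor = 20;
   upper  = q1mean + (q3mean - q1mean) * factor;
   lower  = q3mean + (q1mean - q3mean) * factor;
run;

proc print data=clipbounds noobs;
   var q1mean q3mean factor lower upper;
run;
```

Comparing LOWER and UPPER with the extreme values in the data gives a rough idea of whether a smaller or larger factor is needed before rerunning PROC BOXPLOT.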
Clipping is applied only to the plot, not to the actual statistical values, and a legend in the chart indicates the number
of boxes clipped. CLIPSYMBOL allows us to specify how the clipped points are to be marked. CLIPLEGPOS
positions the clipping legend, and the CLIPLEGEND and CLIPSUBCHAR options specify the legend content: the
CLIPSUBCHAR character in the CLIPLEGEND text is replaced with the number of boxes clipped.
CONCLUSION
Box plots are very helpful in visualizing the distribution of data and the variations in mean, range, quartiles, and outliers
over time and/or between groups. However, for exploratory analysis using "dirty data" or data prone to extreme outliers,
the elements can become compressed and the figures less useful. This paper demonstrates how using the
CLIPFACTOR option can help overcome this problem by clipping extreme values so that the y-axis is rescaled. Since this is
not a widely known or used option, we recommend producing both sets of box plots (original and clipped) for the
reviewers, with a footnote stating what has been clipped and, if applicable, an explanation.
ACKNOWLEDGMENTS
We would like to thank Mohamed Darif, Sangart's Director of Biostatistics & Programming, for his review and
suggestions.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Heidi Nasizadeh
Sangart Inc.
6175 Lusk Blvd
San Diego, CA 92121
858.458.2379
hnasizadeh@
Michelle Wang
Sangart Inc.
6175 Lusk Blvd
San Diego, CA 92121
858.458.2303
mwang@
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.