R: Box Plot

A box plot is an effective way to visualize the distribution of your data. It only takes a few lines of code in R to produce a basic one.

If you are new to box plots, I would recommend watching this video to get an idea of the range, the median and the quartiles.
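To make those terms concrete, here is a small sketch on made-up numbers. The vector `payments` is hypothetical and not taken from the dataset used below; `fivenum()` and `boxplot.stats()` are base R functions. It prints the summary values that a box plot is built from. Note that the line inside the box is the median, not the mean.

```r
# Hypothetical values for illustration only; not from the Social Security dataset
payments <- c(12, 15, 18, 21, 22, 25, 30, 34, 41, 95)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(payments)
print(five)

# The statistics boxplot() draws: whisker ends, hinges and median,
# plus any points beyond the whiskers, which are reported as outliers
stats <- boxplot.stats(payments)
print(stats$stats)
print(stats$out)
```

By default the whiskers extend at most 1.5 times the box height from the box, so a value far from the rest is drawn as a separate point instead of stretching the whisker.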

For this example I am using the Social Security Payments dataset, which can be downloaded from data.gov.au.

ssp<-read.csv("C:/data/dsselectoratedatamarch2014flat.csv",header=TRUE)
boxplot(disability_support_pension~state_of_commonwealth_electoral_
,data=ssp
,main="Disability Support Stats")
mtext("State",side=1,line=3)
mtext("Number of Disability Support Payments",side=2,line=3)

Here I am using boxplot() to plot the number of disability support pension payments made in each state over a certain period. The axes are labelled using mtext(). The result is shown in Screen Capture 1.

Screen Capture 1 – Basic Box Plot
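As an alternative to mtext(), axis labels can also be passed to boxplot() directly through the standard xlab and ylab graphics arguments. The sketch below uses a small made-up data frame; the column names `state` and `payments` are hypothetical stand-ins for the real dataset columns.

```r
# Made-up data for illustration; `state` and `payments` are hypothetical column names
demo <- data.frame(
  state    = rep(c("New South Wales", "Queensland"), each = 5),
  payments = c(3100, 4200, 3900, 5100, 4700, 2800, 3500, 3300, 4100, 3000)
)

# xlab and ylab label the axes in the same call, so no separate mtext() is needed
bp <- boxplot(payments ~ state, data = demo,
              main = "Disability Support Stats",
              xlab = "State",
              ylab = "Number of Disability Support Payments")

# boxplot() invisibly returns the statistics it plotted
print(bp$names)
print(bp$n)
```

Keeping the returned list is handy when you want to check which groups were drawn and how many observations each box is based on.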

As you can see from Screen Capture 1, the x-axis is labelled with the state names in alphabetical order. However, some state names are missing, typically where a neighbouring label is long and R skips a label that would overlap it. This can be corrected by drawing the x-axis labels perpendicular to the axis, which is done by setting las=2. We can also give each box a different colour using the col argument.

boxplot(disability_support_pension~state_of_commonwealth_electoral_,data=ssp
,main="Disability Support Stats",las=2
,col=c("violet","turquoise","blue","green","yellow","orange","red","cyan"))
Screen Capture 2 – Box Plot Version 2

Screen Capture 2 looks better, however the labels on the x-axis are now truncated. A simple fix is to replace the state names with abbreviations using the names argument.

boxplot(disability_support_pension~state_of_commonwealth_electoral_,data=ssp
,main="Disability Support Stats",las=2
,col=c("violet","turquoise","blue","green","yellow","orange","red","cyan")
,names =c("ACT","NSW","NT","QLD","SA","TAS","VIC","WA"))

It’s important to note that the groups on the axis are drawn in sorted order, which for these state names means alphabetical. The abbreviations passed to names must therefore be listed in that same order, otherwise a box ends up under the wrong label.

Screen Capture 3 – Box Plot Version 3
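One way to avoid getting the order of the abbreviations wrong is to look them up by name instead of typing them in position. The following is only a sketch on made-up rows; `state`, `payments`, `abbrev` and `aliases` are hypothetical names, and only three states are included to keep it short.

```r
# Made-up rows for illustration; `state` and `payments` are hypothetical column names
demo <- data.frame(
  state    = c("Queensland", "New South Wales", "Victoria",
               "Queensland", "New South Wales", "Victoria"),
  payments = c(3500, 4200, 3900, 3300, 5100, 4100)
)

# Lookup table from full state name to abbreviation; its order does not matter
abbrev <- c("Victoria" = "VIC", "Queensland" = "QLD", "New South Wales" = "NSW")

# boxplot() draws groups in factor-level order, which is sorted by default
group_order <- levels(factor(demo$state))
print(group_order)

# Index the lookup by the level order so each alias lands under the right box
aliases <- unname(abbrev[group_order])
print(aliases)

boxplot(payments ~ state, data = demo, names = aliases, las = 2)
```

Because the aliases are derived from the level order, adding or removing a state in the data does not silently shift the labels.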

The box plots can be drawn horizontally by setting horizontal=TRUE. Just remember to swap the mtext() axis labels as well, since the states now run along the y-axis. Setting las=1 keeps the tick labels horizontal on both axes.

ssp<-read.csv("C:/data/dsselectoratedatamarch2014flat.csv",header=TRUE)
boxplot(disability_support_pension~state_of_commonwealth_electoral_
,data=ssp
,main="Disability Support Stats",las=1,horizontal=TRUE
,col=c("violet","turquoise","blue","green","yellow","orange","red","cyan")
,names =c("ACT","NSW","NT","QLD","SA","TAS","VIC","WA"))
mtext("Number of Disability Support Payments",side=1,line=3)
mtext("State",side=2,line=3)
Screen Capture 4 – Horizontal Box Plot

You can also restrict the box plot to certain groups. For example, if I am only interested in two of the eastern states, I can use subset() to filter the data before passing it on to boxplot(). Note the use of %in% to match against several values, and that col and names now list only the two states being plotted.

ssp<-read.csv("C:/data/dsselectoratedatamarch2014flat.csv",header=TRUE)
# %in% keeps every row whose state is one of the listed values;
# == against a vector would recycle it and silently drop rows
sspEast<-subset(ssp
,state_of_commonwealth_electoral_ %in% c("New South Wales","Queensland"))
# drop unused state levels so only the two selected groups are plotted
sspEast<-droplevels(sspEast)
boxplot(disability_support_pension~state_of_commonwealth_electoral_
,data=sspEast
,main="Disability Support Stats",las=1,horizontal=TRUE
,col=c("turquoise","green")
,names =c("NSW","QLD"))
mtext("Number of Disability Support Payments",side=1,line=3)
mtext("State",side=2,line=3)
Screen Capture 5 – Box Plot Subset
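When filtering on more than one value, it is worth knowing how == and %in% behave against a vector of candidates, because the two can keep different rows. The sketch below uses a made-up vector (`state` and `east` are hypothetical names) to show the difference.

```r
# Made-up vector for illustration
state <- c("New South Wales", "New South Wales", "Queensland", "Queensland", "Victoria")
east  <- c("New South Wales", "Queensland")

# == compares position by position, recycling `east` along `state`
# (R also warns here because the lengths are not multiples of each other)
by_equals <- suppressWarnings(state == east)
print(by_equals)

# %in% asks, for every element of `state`, whether it appears anywhere in `east`
by_in <- state %in% east
print(by_in)

print(sum(by_equals))  # rows kept by ==
print(sum(by_in))      # rows kept by %in%
```

With ==, a row is kept only if it happens to line up with the matching position in the recycled vector, so some New South Wales and Queensland rows are lost. %in% keeps all of them.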

That’s it. As you can see it only takes a few lines of code to visualize your data distribution using R.
