One of the first steps in data analysis is descriptive analysis, which helps you understand how the data are distributed and provides important information for later steps. This set of exercises covers functions useful for descriptive analysis of a single variable, including graphs. Before proceeding, it might be helpful to look over the help pages for the length, range, median, IQR, hist, quantile, boxplot, and stem functions.
For this set of exercises you will use a dataset called islands, an R dataset that contains the areas of the world’s major landmasses expressed in thousands of square miles. To load the dataset, run the following instruction: data(islands).
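Before starting, it can help to look at how the dataset is stored. The lines below are a minimal sketch that only loads and inspects the data; str and head are base R functions not mentioned in the exercises and are used here purely for a quick look.

```r
# Load the built-in dataset (it ships with R's datasets package)
data(islands)

# islands is a named numeric vector:
# the names are landmasses, the values are their areas
str(islands)

# Show the first few entries together with their names
head(islands)
```

Because islands is a plain vector rather than a data frame, the functions in this set can be applied to it directly.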
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Exercise 1
Load the islands dataset and obtain the total number of observations.
Exercise 2
Measures of central tendency. Obtain the following statistics of islands:
a) Mean
b) Median
Exercise 3
Using the function range, obtain the following values:
a) Size of the biggest island
b) Size of the smallest island
Exercise 4
Measures of dispersion. Find the following values for islands:
a) Standard deviation
b) The range of the island sizes, using the function range.
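Exercises 3 and 4 both rely on what range returns. As a rough sketch, here is its behaviour on a different built-in dataset (rivers, picked here only for illustration so the exercises stay open); diff is a base R function not named in the exercises.

```r
# Illustration on another built-in dataset, not on islands
data(rivers)

# range() returns a length-2 vector: the minimum first, then the maximum
r <- range(rivers)
r[1]   # smallest value
r[2]   # largest value

# The spread between the two ends is the difference of that vector
diff(r)
```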
Exercise 5
Quantiles. Using the function quantile, obtain a vector including the following quantiles:
a) 0%, 25%, 50%, 75%, 100%
b) .5%, 95%
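One detail worth knowing is that quantile takes its cut points through the probs argument as probabilities between 0 and 1, not as percentages. A minimal sketch on a different built-in dataset (rivers, used here only as an illustration) with two arbitrary cut points:

```r
# Illustration on another built-in dataset, not on islands
data(rivers)

# probs expects probabilities in [0, 1]; 0.1 and 0.9 are arbitrary examples
q <- quantile(rivers, probs = c(0.1, 0.9))

# The result is a named numeric vector, one element per requested probability
q
names(q)
```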
Exercise 6
Interquartile range. Find the interquartile range of islands.
Exercise 7
Create a histogram of islands with the following properties:
a) Showing the frequency of each group
b) Showing the proportion of each group
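Frequencies and proportions are two views of the same bins. The sketch below uses a different built-in dataset (rivers, an assumption made here so the exercise stays open) and asks hist for its computed bins without drawing anything.

```r
# Illustration on another built-in dataset, not on islands
data(rivers)

# plot = FALSE returns the binning instead of drawing the histogram
h <- hist(rivers, plot = FALSE)

h$breaks   # bin boundaries
h$counts   # number of observations in each bin

# Proportion of observations per bin: counts divided by the total
props <- h$counts / sum(h$counts)
props

# Note: h$density is counts / (n * bin width), so it equals the
# proportions only when every bin has width 1.
h$density
```

When drawing, the freq argument of hist switches the vertical axis between counts (freq = TRUE) and densities (freq = FALSE).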
Exercise 8
Create box-plots with the following conditions:
a) Including outliers
b) Without outliers
Exercise 9
Using the function boxplot, find the outliers of islands. Hint: Notice that the boxplot function not only creates a plot but also returns some useful information about the data.
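To make the hint concrete, here is a minimal sketch of what boxplot hands back, shown on a different built-in dataset (rivers, chosen here only for illustration). The plot = FALSE argument skips the drawing so that only the returned list is produced.

```r
# Illustration on another built-in dataset, not on islands
data(rivers)

# boxplot() returns a list; plot = FALSE suppresses the drawing
bp <- boxplot(rivers, plot = FALSE)

names(bp)   # components of the returned list
bp$stats    # the five values used to draw the whiskers, box edges, and median
bp$out      # observations lying beyond the whiskers
```

Assigning the result of an ordinary boxplot call (without plot = FALSE) gives the same list while still drawing the plot.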
Exercise 10
Create a stem-and-leaf plot of islands.
I am afraid these exercises are very rudimentary and not suitable for intermediate level visitors to the site.
Hi Dr Pankaj Kumar Agarwal,
This is a fair comment. I do think the website gets an above-average number of visitors who are looking for more advanced topics. At the same time, we want to publish content that fits visitors of all levels.
We will discuss whether there is a way to clearly classify the level of each exercise set as we upload it.
Kind regards,
So what are newbies and semi newbies supposed to do? I was not born knowing this language and learning it is one of the harder things I have endured. The US tax code was a breeze to learn compared to R.
Hi Carl,
Don't worry, we are committed to trying to help every visitor to this website and will continue to do so. We would never allow the addition of more advanced topics to suppress our other content.
I do agree the US tax code is certainly a lot more difficult than R!
Kind regards,
Exercise 9 is a great example of the worth of these exercises. R’s so-called help page on box plots does not show an argument for prob. Typical unhelpful R page. Most of my learning of this language does not come from its “help pages”. Over a two-year time span I have found them to usually be nothing more or less than a waste of time. Without external sources of information, the R user community would be very small.