Data science enhances people’s decision making. Doctors and researchers make critical decisions every day, so it is important for them to have some basic knowledge of data science. This series aims to help people who work in or around the medical field enhance their data science skills.
We will work with a health-related data set, the famous “Pima Indians Diabetes Database”. It was generously donated by Vincent Sigillito from Johns Hopkins University. Please find further information regarding the dataset here.
This is the first part of the series, and it covers data display.
Before proceeding, it might be helpful to look over the help pages for the
You also may need to load the
Please run the code below in order to load the data set and transform it into a proper data frame format:
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
data <- read.table(url, fileEncoding="UTF-8", sep=",")
# Use a name that does not shadow the base function names()
col_names <- c('preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class')
colnames(data) <- col_names
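Once the data frame is loaded, it can be worth checking its shape and column types before plotting anything. The following is a minimal sketch that runs offline: `sample_text` and `toy` are hypothetical names, and the three rows are only illustrative stand-ins in the same nine-column, comma-separated layout as the file above.

```r
# Three illustrative rows in the same nine-column, comma-separated layout
# (stand-ins for the downloaded file, so this sketch runs without a network).
sample_text <- "6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1"

toy <- read.table(text = sample_text, sep = ",")
colnames(toy) <- c('preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class')

dim(toy)   # number of rows and columns
str(toy)   # column names and types
head(toy)  # first few rows
```

The same three calls can be applied to the full `data` object created above.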
Answers to the exercises are available here.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
Create a frequency table of the
data$class.fac <- factor(data[['class']], levels = c(0, 1), labels = c("Negative", "Positive"))
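To see what the `factor()` call above does, here is a small sketch on a made-up 0/1 vector. The names `outcome`, `outcome_fac` and `counts` are hypothetical and the values are not taken from the data set.

```r
# Hypothetical 0/1 outcome vector standing in for a coded class column.
outcome <- c(0, 1, 1, 0, 0, 1, 0)

# Map the numeric codes to readable labels, in the order given by `levels`.
outcome_fac <- factor(outcome, levels = c(0, 1), labels = c("Negative", "Positive"))

levels(outcome_fac)           # the labels now used in place of 0 and 1

counts <- table(outcome_fac)  # counts per label
counts
prop.table(counts)            # the same counts as proportions
```

Labelled factors like this make the legends and axis labels of later plots easier to read than raw 0/1 codes.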
Create a pie chart of the
Create a bar plot for the
Create a strip chart for the
Create a density plot for the
Create a histogram for the
Create a boxplot for the
Create a normal Q-Q plot and add a line that passes through the first and third quartiles.
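As a rough illustration of the idea on simulated values (not the data set itself), base R provides `qqnorm()` for the plot and `qqline()` for the reference line. The vector `x` below is a hypothetical stand-in for whichever variable you choose.

```r
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)  # simulated values, not from the data set

# Sample quantiles against theoretical normal quantiles.
qq <- qqnorm(x, main = "Normal Q-Q plot (simulated data)")

# With its default probs = c(0.25, 0.75), qqline() draws the line
# through the first and third quartiles.
qqline(x, col = "red")
```

Points that stay close to the line suggest the variable is roughly normally distributed.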
Create a scatter plot of the age variable against the mass variable.
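A minimal sketch of a base R scatter plot, using simulated vectors rather than the real columns. `age_toy` and `mass_toy` are hypothetical names. Plotting "age against mass" is read here as age on the vertical axis and mass on the horizontal axis, which is an assumption about the intended orientation.

```r
set.seed(1)
age_toy  <- round(runif(50, min = 21, max = 80))  # simulated ages
mass_toy <- rnorm(50, mean = 32, sd = 6)          # simulated body mass index values

# plot(x, y): first argument goes on the horizontal axis, second on the vertical.
plot(mass_toy, age_toy,
     xlab = "mass", ylab = "age",
     main = "age against mass (simulated data)")
```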
Create scatter plots of every variable in the data set against every other variable, all in a single window.
Hint: it is quite simple, so don’t overthink it.
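One possible approach, sketched on a small simulated data frame, is a scatter plot matrix with base R's `pairs()`. The data frame `toy_df` and its columns are hypothetical, and the solutions page may take a different route.

```r
set.seed(7)
# Small simulated data frame standing in for the real one.
toy_df <- data.frame(x1 = rnorm(30),
                     x2 = runif(30),
                     x3 = rpois(30, lambda = 4))

# One panel per pair of columns, all drawn in a single window.
pairs(toy_df, main = "Scatter plot matrix (simulated data)")
```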