In this set of exercises, you will explore how to handle big data with the RevoScaleR package from Microsoft R (previously Revolution Analytics). It comes with Microsoft R Client. You can get it from here. Get the credit card fraud data set from Revolution Analytics and let's get started.
Answers to the exercises are available here. Please check the documentation before starting this exercise set.
If you obtained a different (correct) answer than those listed on the solutions page, please feel free to post your answer as a comment on that page.
The heart of RevoScaleR is the xdf file format. Convert the credit card fraud data set into xdf format.
Use the newly created xdf file to get information about the variables, and print 10 rows to check the data.
Use rxSummary to get the summary for the variables gender, balance, and cardholder where numTrans is greater than 10.
Use rxDataStep to create a variable avgbalpertran, defined as balance / (numTrans + numIntlTrans). Use rxGetInfo to check that your changes are reflected in the xdf data.
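A note on operator precedence for the formula in this exercise: reading avgbalpertran as "average balance per transaction", the denominator would be the total number of transactions, which needs parentheses in R. The sketch below uses plain base R on made-up toy vectors (the values are an assumption for illustration, not taken from the data set) to show how the two readings differ.

```r
# Made-up toy values, not taken from the credit card fraud data.
balance      <- c(3000, 0, 6000)
numTrans     <- c(4, 10, 25)
numIntlTrans <- c(1, 0, 5)

# Without parentheses, the division is evaluated first, then the addition.
no_parens <- balance / numTrans + numIntlTrans

# With parentheses, balance is divided by the total number of transactions.
with_parens <- balance / (numTrans + numIntlTrans)

print(no_parens)    # 751   0 245
print(with_parens)  # 600   0 200
```

The same expression, parentheses included, should be usable inside the transformation you pass to rxDataStep.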
Use rxCor to find the correlation between the newly created variable and fraudRisk.
Use rxLinMod to fit a linear regression of fraudRisk on gender, balance, and cardholder. Don't forget to check the summary of the model.
Find the contingency table of fraudRisk and gender using rxCrossTabs. Hint: figure out how to include factors in the formula.
Use rxCube to find the mean balance for each of the two genders.
Create a relative frequency histogram of balance from the xdf file.
Create a two-panel histogram with gender and fraudRisk as explanatory variables to show the relative frequency of fraudRisk in the two genders.
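As a reminder of what "relative frequency" means here, the following base R sketch bins a small made-up vector and converts the bin counts to percentages. The values and bin edges are assumptions chosen for illustration only; the exercise itself should still be done on the xdf file with RevoScaleR.

```r
# Made-up toy balances, not taken from the credit card fraud data.
balance <- c(0, 500, 1200, 2500, 2600, 4000, 4100, 4200, 7000, 9500)

# Bin the values without drawing a plot.
h <- hist(balance, breaks = seq(0, 10000, by = 2500), plot = FALSE)

# Relative frequency = bin count / total count, expressed as a percentage.
rel_freq <- 100 * h$counts / sum(h$counts)

print(h$counts)   # 4 4 1 1
print(rel_freq)   # 40 40 10 10
```

A relative frequency histogram plots these percentages on the vertical axis instead of the raw counts, so the bar heights always sum to 100.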