

Assignment 5 - Analyzing Patient Data

In this assignment, you will use R to analyze anonymized patient data and look for links between various descriptors and outcomes. The dataset is in CSV format and contains a random sample of patient medical records. It will be provided in Canvas, and you will need to read it into R. It includes the following columns:

YEAR – Year for which the data is aggregated
STATE – State abbreviation
AGE_CATEGORY – Age bracket: 1 = 18 to 44 years, 2 = 45 to 64 years, 3 = 65 to 79 years, 4 = 80+ years
GENDER – Gender of patient
DISEASE_CATEGORY – Diagnosis category: 1 = diabetes, 2 = hypertension
PATIENTS – Number of patients included in the stratum (rows with fewer than 100 patients removed)
OFFICE_VISITS – Number of office visits in the stratum
A1C_MEAN – Mean HgbA1c value (%)
A1C_MEDIAN – Median HgbA1c value (%)
A1C_STDDEV – Standard deviation of HgbA1c values (%)
WEIGHT_MEAN – Mean weight value (lb)
WEIGHT_MEDIAN – Median weight value (lb)
WEIGHT_STDDEV – Standard deviation of weight values (lb)
BMI_MEAN – Mean BMI
BMI_MEDIAN – Median BMI
BMI_STDDEV – Standard deviation of BMI values
FBG_MEAN – Mean fasting blood glucose
FBG_MEDIAN – Median fasting blood glucose
FBG_STDDEV – Standard deviation of fasting blood glucose values
SBP_MEAN – Mean systolic blood pressure
SBP_MEDIAN – Median systolic blood pressure
SBP_STDDEV – Standard deviation of systolic blood pressure values
DBP_MEAN – Mean diastolic blood pressure
DBP_MEDIAN – Median diastolic blood pressure
DBP_STDDEV – Standard deviation of diastolic blood pressure values

Answer the following questions. You may use R or any other statistical package you prefer. In R, you can use ggplot2 or other plotting libraries to create graphs, and the corrplot package to visualize correlations.
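In R, `cor()` computes a correlation matrix and `corrplot()` draws it. As a sketch of what that matrix contains, the Python below computes pairwise Pearson correlations over rows stored as dicts keyed by column name (for example, rows read with `csv.DictReader` after numeric conversion). Missing values are dropped pair by pair, and the 0.7 cutoff in `strong_pairs` is an arbitrary assumption for what counts as "highly correlated", not a threshold from the assignment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; pairs containing None are dropped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(rows, columns):
    """Return {(a, b): r} for every ordered pair of the requested columns."""
    return {
        (a, b): pearson([r.get(a) for r in rows], [r.get(b) for r in rows])
        for a in columns
        for b in columns
    }


def strong_pairs(matrix, threshold=0.7):
    """Unique off-diagonal pairs with |r| >= threshold, strongest first.

    The default threshold is an assumed cutoff, not one given in the assignment.
    """
    seen = set()
    result = []
    for (a, b), r in matrix.items():
        key = frozenset((a, b))
        if a == b or key in seen or math.isnan(r):
            continue
        seen.add(key)
        if abs(r) >= threshold:
            result.append((a, b, r))
    return sorted(result, key=lambda t: -abs(t[2]))
```

Because each row is a stratum rather than a single patient, these are correlations between stratum-level summaries, which is worth keeping in mind when interpreting them.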

1. Identify the 5 U.S. states in the dataset that have the largest number of patients.
2. Identify the 5 U.S. states with the highest number of diabetes patients.
3. Create a single pie chart that shows the number of diabetes patients in each state.
4. Which states have the highest and lowest mean BMIs?
5. Plot mean BMI against mean systolic blood pressure. Show your plot, and discuss whether you think they are correlated or not.
6. Plot mean BMI against mean A1C. Show your plot, and discuss whether you think they are correlated or not.
7. Which age bracket has the highest frequency of hypertension?
8. Make a box plot (showing means, deviations) of all mean descriptors (A1C, weight, BMI, FBG, SBP, DBP), separately for patients with hypertension and patients with diabetes. What do you learn from these?
9. Identify which of the following variables correlate: A1C_MEAN, A1C_STDDEV, WEIGHT_MEAN, WEIGHT_STDDEV, BMI_MEAN, BMI_STDDEV, FBG_MEAN, FBG_STDDEV, SBP_MEAN, SBP_STDDEV, DBP_MEAN, and DBP_STDDEV. Describe your results, identifying which variables are highly correlated.
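Questions 1, 2, 4, and 7 reduce to grouping strata and then summing or averaging. A minimal Python sketch, assuming each row is a dict keyed by the column names above with numeric values already converted: `total_patients_by` sums `PATIENTS` per group (optionally for a single disease code), `top_n` ranks the totals, and `weighted_mean_by` averages a per-stratum mean column weighted by `PATIENTS`. The weighting assumes each stratum's mean was computed over all of that stratum's patients; an unweighted average of `BMI_MEAN` values would count a 100-patient stratum the same as a 10,000-patient one.

```python
from collections import defaultdict


def total_patients_by(rows, key, disease=None):
    """Sum PATIENTS per value of `key` (e.g. "STATE" or "AGE_CATEGORY").

    If `disease` is given (1 = diabetes, 2 = hypertension), only rows with
    that DISEASE_CATEGORY are counted.
    """
    totals = defaultdict(int)
    for row in rows:
        if disease is not None and row.get("DISEASE_CATEGORY") != disease:
            continue
        if row.get("PATIENTS") is not None:
            totals[row[key]] += row["PATIENTS"]
    return dict(totals)


def top_n(totals, n=5):
    """Return the n (group, total) pairs with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def weighted_mean_by(rows, key, column):
    """Patient-weighted mean of a per-stratum mean column, grouped by `key`.

    Rows missing either the value or the PATIENTS count are skipped.
    """
    weighted_sum = defaultdict(float)
    weight = defaultdict(float)
    for row in rows:
        value, count = row.get(column), row.get("PATIENTS")
        if value is None or count is None:
            continue
        weighted_sum[row[key]] += value * count
        weight[row[key]] += count
    return {k: weighted_sum[k] / weight[k] for k in weight if weight[k] > 0}
```

For example, `top_n(total_patients_by(rows, "STATE", disease=1))` would rank states by diabetes patients, and applying `max(means, key=means.get)` and `min(means, key=means.get)` to `means = weighted_mean_by(rows, "STATE", "BMI_MEAN")` gives the highest- and lowest-BMI states. If the file covers more than one `YEAR`, summing across years may count the same patients more than once, so filtering to a single year first may be appropriate.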

Submit your answers and plots in PDF format on the Oncourse site.