Exploratory Data Analysis Checklist

This week I read the second chapter of Peng’s book. For some reason I was unable to do the data analysis along with the text. I’m not sure if I’m the only one who experienced this issue, but when I clicked on the link in the chapter for the EPA’s Air Quality System there was a connection error. The site itself may have been down for maintenance or the like, because I wasn’t having any internet issues with other sites at the time. So, I’m going to come back to this chapter in a few days to see if it’ll work.

However, I was able to follow along with the text by using one of the data sets for coral reef pollution. The data set I ended up using was Coral observations for physical damage. Working through the checklist, I found that it has 17,875 observations and 35 variables, and that all of the data was collected in 2013. Beyond this, it was hard to follow the example in the text with a completely different data set. Regardless, I was able to understand what was being explained and how to apply it in a different context.
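The kind of checks described above (counting observations and variables, then confirming the collection year) can be sketched in base R. This is only an illustration under stated assumptions: the small data frame `coral` and its column names (`obs_date`, `site`, `damage`) are made up as a stand-in, not the actual coral data set, which would normally be loaded with something like `read.csv()`.

```r
# Hypothetical stand-in for a downloaded data set.
# The column names and values are invented for illustration only.
coral <- data.frame(
  obs_date = as.Date(c("2013-05-01", "2013-06-15", "2013-09-30")),
  site     = c("A", "B", "C"),
  damage   = c("none", "breakage", "none"),
  stringsAsFactors = FALSE
)

# How many observations (rows) and variables (columns) are there?
n_obs  <- nrow(coral)
n_vars <- ncol(coral)

# Look at the structure: variable names, types, and the first few values
str(coral)

# Look at the top and the bottom of the data
head(coral, 2)
tail(coral, 2)

# Which years do the observations cover?
years <- unique(format(coral$obs_date, "%Y"))
```

On a real file, `n_obs`, `n_vars`, and `years` are the values to compare against whatever the data source says the data set should contain.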

Over the weekend, I plan on playing around with the data sets I’ve downloaded a bit more to practice what I’ve learned in the second chapter.

Working with RStudio and the dplyr Package

Over the past two weeks I have read through Getting Started with R and Managing Data Frames with the dplyr package in Peng’s textbook. After downloading R and RStudio, I followed along in RStudio with Peng’s example on air pollution and temperature data for Chicago, to help memorize the key verbs used in data analysis. Following along with this example, the R language seemed simple enough.
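For reference, here is a minimal sketch of the main dplyr verbs (`select`, `filter`, `rename`, `mutate`, `arrange`, `group_by`, `summarize`) chained together. It assumes the dplyr package is installed. The data frame `toy` and its values are invented for illustration; it is not the Chicago data set from the book, although its column names loosely imitate it.

```r
library(dplyr)

# Hypothetical toy data; the values are made up for illustration
toy <- data.frame(
  city = c("chic", "chic", "chic", "chic"),
  date = as.Date(c("2005-01-01", "2005-01-02", "2005-07-01", "2005-07-02")),
  tmpd = c(30, 28, 80, 84),
  pm25 = c(12.5, NA, 20.1, 18.3)
)

result <- toy %>%
  select(date, tmpd, pm25) %>%                       # keep only these columns
  filter(!is.na(pm25)) %>%                           # drop rows with missing pm25
  rename(temp_f = tmpd) %>%                          # new_name = old_name
  mutate(month = as.integer(format(date, "%m"))) %>% # add a derived column
  arrange(desc(pm25))                                # sort, highest pm25 first

# Summarize by group: mean pm25 for each month
monthly <- result %>%
  group_by(month) %>%
  summarize(mean_pm25 = mean(pm25))
```

Each verb takes a data frame and returns a new one, which is why they can be chained with `%>%`. The same pattern should carry over to other data once the column names are swapped for the ones shown by `names()` on that data.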

However, after a week of practicing, I have been getting confused when trying to apply the language to other data. For example, I’ve been looking at some of the sample data posted on the course site, specifically for Reefs of the Future: Resilience of Coral Reefs in the Main Hawaiian Islands. I chose this data because I’ve always been interested in ocean life and sustainability, and because I would like to use it for my project, but I’ve been having trouble using it in RStudio. I have a feeling it may be because I’m not using the programming language properly or I’m mistyping the verbs. I plan on practicing more with it this weekend and re-reading the text. If I get really stuck, I’ll look up some examples and tutorials online.