Starting My Analytical Report

Now that I have finished collecting my data and assessing which aspects of it are significant, I started writing my report this week. So far, I have finished the methods section of my analysis and have moved on to the results section.

I usually wait to write my introduction and abstract until I have finished the results and conclusion; I find it easier to write that way.

Other than focusing on writing, there’s really nothing else significant for me to write about this week.

Z-Scores

A few days ago I finished calculating all the z-scores, skewness levels, and kurtosis levels for physical damage and disease prevalence within the Hawaiian Coral Reefs. I ultimately decided not to look into deaths, since I already had a lot of data to look at. Instead, I’m going to compare damage and disease in terms of how much harm has come to the colonies in the coral reefs.

So far, what I’ve found by looking at my results is that none of the distributions are normal. All of my distributions are skewed to the right, and even after taking out the larger outliers that fall more than 3 standard deviations from the mean, the distributions still appear to be skewed.
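To make the skewness check concrete, here is a minimal Python sketch of the idea. The counts are made up for illustration (they are not the coral data), and the helper names `skewness`, `excess_kurtosis`, and `trim_outliers` are hypothetical. It uses the simple moment-based definitions with the population standard deviation, which may differ slightly from what a given statistics package reports.

```python
from statistics import fmean, pstdev

# Made-up, right-skewed counts (NOT the real coral data).
counts = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 60]

def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]

def skewness(values):
    """Moment-based skewness: the average cubed z-score (positive = right skew)."""
    return fmean([z ** 3 for z in z_scores(values)])

def excess_kurtosis(values):
    """Moment-based excess kurtosis: the average z-score**4 minus 3."""
    return fmean([z ** 4 for z in z_scores(values)]) - 3

def trim_outliers(values, limit=3.0):
    """Drop values whose z-score is more than `limit` SDs from the mean."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= limit]

trimmed = trim_outliers(counts)

print("skewness before trimming:", round(skewness(counts), 2))
print("excess kurtosis before trimming:", round(excess_kurtosis(counts), 2))
print("values removed:", len(counts) - len(trimmed))
print("skewness after trimming:", round(skewness(trimmed), 2))
```

In this toy example the one extreme value is removed, but the skewness stays positive afterwards, which is the same pattern described above: trimming the largest outliers does not by itself make a right-skewed distribution symmetric.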

I did find some interesting points in the results, though. Most of the z-scores were below the mean, and only a handful were above it. Of the z-scores that were above the mean, all but one were greater than 0.05. Considering that there are 69 organisms and about 19 are significant, I believe it is safe to say that those 19 are what I will be focusing on in my analysis.

Finding My Research Question

This week I finalized the question I want to focus on for my final project. What I want to find out is which of the organisms in the Hawaiian Coral Reefs, out of the 69 that were studied, have been most negatively affected. The factors I want to focus on most are the levels of disease, physical damage, and deaths among organisms.

After discussing this in the previous class, it was decided that in order to do this research I would need to find the z-scores for each of the organisms in the three categories and look at their standard deviations.
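As a rough illustration of what that calculation looks like, here is a minimal Python sketch. The organism names and counts are hypothetical placeholders, not values from the coral data, and the function name `z_scores_by_organism` is my own. It assumes the z-score is taken relative to the mean and population standard deviation across all organisms in one category.

```python
from statistics import fmean, pstdev

# Hypothetical disease counts per organism (placeholder names and numbers).
disease_counts = {
    "organism_a": 12,
    "organism_b": 15,
    "organism_c": 9,
    "organism_d": 80,
    "organism_e": 14,
}

def z_scores_by_organism(counts):
    """Return {organism: z-score}, where z = (count - mean) / SD across organisms."""
    values = list(counts.values())
    mean = fmean(values)
    sd = pstdev(values)
    return {name: (count - mean) / sd for name, count in counts.items()}

scores = z_scores_by_organism(disease_counts)

# List organisms from most above the mean to most below it.
for name, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: z = {z:+.2f}")
```

An organism with a large positive z-score has far more cases than is typical for the group, so repeating this for each category gives a common scale for comparing organisms.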

So far, I have done this for the data on disease, and I plan to look at both physical damage and deaths by the end of next week.

Disease Prevalence, Physical Damage, and Recruitment

This week I looked at three data sets: Coral Observations for Disease Prevalence, Coral Observations for Physical Damage, and Coral Observations for Recruitment. For Disease Prevalence, data had been collected in 2008, 2010, and 2013 for a total of 90147 observations distributed among 68 different species. My analysis showed that the largest share of the disease in the Hawaiian Coral Reefs occurred in 2010, with a total of 35609 cases. In 2008 there were 22984 cases and in 2013 there were 31581 cases. But while 2010 had the most disease cases, 2013 had the most deaths resulting from disease. In 2013 there were a total of 9004 deaths, in 2010 there were 4330 deaths, and in 2008 there were 3050 deaths. This means that 2013 accounted for approximately 55% of all the disease-related deaths recorded in the Hawaiian Coral Reefs across the three years.
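The 55% figure can be checked with a few lines of arithmetic. This is a minimal Python sketch that uses only the yearly counts quoted above; the variable names are my own.

```python
# Yearly counts as quoted in the paragraph above.
cases = {2008: 22984, 2010: 35609, 2013: 31581}
deaths = {2008: 3050, 2010: 4330, 2013: 9004}

# Year with the most disease cases.
peak_case_year = max(cases, key=cases.get)

# Each year's share of all deaths recorded across the three years.
total_deaths = sum(deaths.values())
death_share = {year: count / total_deaths for year, count in deaths.items()}

print("year with most cases:", peak_case_year)
print("total deaths:", total_deaths)
for year, share in death_share.items():
    print(f"{year}: {share:.1%} of deaths")
```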

When it comes to damage, this data set covered only 2013, rather than 2013 along with 2010 and 2008. Among the 68 species, there were 17731 instances of no damage and 144 instances of damage in 2013. Additionally, the 144 instances of damage were spread across only seven different species, and even those species had far more undamaged observations than damaged ones. These species are Montipora capitata (18 damaged, 3167 not damaged), Montipora patula (2 damaged, 2424 not damaged), Porites compressa (19 damaged, 858 not damaged), Porites evermanni (2 damaged, 163 not damaged), Pocillopora eydouxi (1 damaged, 26 not damaged), Porites lobata (53 damaged, 5885 not damaged), and Pocillopora meandrina (41 damaged, 2464 not damaged). Because damage was so rare, it makes up only about 1% of the 2013 data on the Hawaiian Coral Reefs, whereas the other 99% is made up of undamaged observations.
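Raw counts favor the most commonly observed species, so it can help to look at the damage rate within each species instead. This is a minimal Python sketch using the counts quoted above; the `damage_rate` helper is a hypothetical name of my own.

```python
# (damaged, not damaged) counts per species, as quoted in the paragraph above.
damage_counts = {
    "Montipora capitata": (18, 3167),
    "Montipora patula": (2, 2424),
    "Porites compressa": (19, 858),
    "Porites evermanni": (2, 163),
    "Pocillopora eydouxi": (1, 26),
    "Porites lobata": (53, 5885),
    "Pocillopora meandrina": (41, 2464),
}

def damage_rate(damaged, not_damaged):
    """Fraction of a species' observations that were recorded as damaged."""
    return damaged / (damaged + not_damaged)

rates = {name: damage_rate(*pair) for name, pair in damage_counts.items()}

# Overall rate from the data set totals quoted above (144 damaged, 17731 not).
overall_rate = damage_rate(144, 17731)

for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.2%} damaged")
print(f"overall: {overall_rate:.2%} damaged")
```

Looked at this way, the species with the most damaged observations is not necessarily the one with the highest damage rate, because some species were observed far more often than others.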

The last data set I looked at was Coral Observations for Recruitment. From what I could tell, there was nothing really notable to take away from it. I say this because the variables in this data set were the same ones represented in the previous three data sets I’ve looked at, with nothing new added. For that reason, I don’t think I’ll be using this data set in my final analysis.

I’ve made descriptive charts ranging from grouped bar charts to pie charts for the first two data sets, and I plan on expanding on these, along with my analysis of the data, once I finish looking through all of the data sets. I’m also still thinking that I might change my research question for this data set if I get inspired to do so.

Pie Charts and Bar Charts for Benthic Image Analysis in Hawaiian Coral Reefs

This week I spent a lot of time getting to know R Studio better in terms of making graphs to help represent what the data shows. For this, I used the data set titled “Benthic Image Analysis.” This data set describes what is found at the very bottom of the Hawaiian Coral Reefs. The categories included in this data were coral, soft coral, coralline alga, macroalga, mobile fauna, sediment, sessile invertebrate, tape, wand, shadow, turf alga, and a few others that are unclassified.

Since there were ten categories of species, and data had been collected on them in 2010 and 2013, I wanted to see how the populations differed between the two years. So first I looked at each species and found out what subcategories it was made up of. For example, in 2010, the Sessile Invertebrates were made up of 8% sponge, 6% Bryozoan, 66% Zoanthid, and 20% unclassified invertebrates. I did the same for the data collected in 2013 and for the other species documented, and created pie charts for each of them.
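The percentages behind a pie chart are just each subcategory's count divided by the category total. This is a minimal Python sketch of that step. The raw counts are hypothetical, chosen only so that the result matches the 2010 Sessile Invertebrate percentages quoted above, and `percent_breakdown` is a name of my own.

```python
# Hypothetical raw counts, picked so the percentages match the 2010 figures above.
sessile_invertebrates_2010 = {
    "sponge": 4,
    "bryozoan": 3,
    "zoanthid": 33,
    "unclassified": 10,
}

def percent_breakdown(counts):
    """Convert subcategory counts into percentages of the category total."""
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}

shares = percent_breakdown(sessile_invertebrates_2010)

for name, pct in shares.items():
    print(f"{name}: {pct:.0f}%")
```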

After doing this and getting the total for each category in 2010 and 2013, I created a grouped bar chart to compare the two years side by side and see how the populations of the species increased and decreased. I ended up having to create my own data set within R Studio from the total populations because these totals weren’t listed in the original data set. Making a bar chart was honestly extremely difficult. It took me a whole two days to figure out, I think because there was something wrong either with the data set I made or with the way I was writing it into the code.
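The step of building a totals table can be sketched in a few lines. This is a minimal Python illustration rather than the R code I actually used. The rows, category labels, and counts are hypothetical, and the helper names are my own. It only prepares the one-row-per-category table that a grouped bar chart needs and does not draw the chart.

```python
from collections import defaultdict

# Hypothetical rows: (year, category, subcategory, count).
observations = [
    (2010, "coral", "subtype_1", 120),
    (2010, "coral", "subtype_2", 80),
    (2013, "coral", "subtype_1", 150),
    (2013, "coral", "subtype_2", 90),
    (2010, "macroalga", "subtype_1", 60),
    (2013, "macroalga", "subtype_1", 45),
]

def totals_by_category_and_year(rows):
    """Sum the subcategory counts into one total per (category, year)."""
    totals = defaultdict(int)
    for year, category, _subcategory, count in rows:
        totals[(category, year)] += count
    return dict(totals)

def side_by_side(totals, years=(2010, 2013)):
    """One row per category with the yearly totals next to each other."""
    categories = sorted({category for category, _year in totals})
    return [
        (category, *[totals.get((category, year), 0) for year in years])
        for category in categories
    ]

totals = totals_by_category_and_year(observations)
table = side_by_side(totals)

for category, n_2010, n_2013 in table:
    print(f"{category:<10} 2010={n_2010:>4}  2013={n_2013:>4}  change={n_2013 - n_2010:+d}")
```

Having exactly one total per category and year is usually what makes a grouped bar chart straightforward to draw, since each bar then maps to a single number.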

Regardless, I understand how to make grouped bar charts (and regular bar charts) now and may use them again in the future. Since I have 13 more data sets to look through, I plan on getting through at least two a week, at most three, in order to have everything ready to analyze by the end of the semester.

Principles of Analytic Graphics and Exploratory Graphs

This week I read through and watched the video presentations for the chapters on the principles of analytic graphics and on exploratory graphs. After going through these, I had a better understanding of how I could use R Studio to analyze data. Because of this, I decided it was time to start thinking about what question I want to ask about the data I’ve downloaded on Coral Reefs in the Hawaiian Islands. Right now, I’m thinking about trying to see which factor documented in the data negatively affects the coral reefs the most. I’m interested in this because the data covers a lot of factors that could potentially negatively affect the coral reefs, such as disease prevalence, the diversity within the coral reefs, the physical damage the coral reefs have sustained, the effects of bleaching in the ocean, biomass of the fish in the area, fishing pressure, and the amount of Scleractinia present. There are more subsets of data present in this study, and so far I have listed the number of observations, variables, and the summary statistics for all 12 of them.

Since I now have a question I want to explore in the data, I plan on looking more closely at the data over the next week by creating graphs to show any correlation between the data points and using other exploratory graphs to get a better understanding of what each data set shows.

Exploratory Data Analysis Checklist

This week I read the second chapter of Peng’s book. For some reason, I was unable to do the data analysis along with the text. I’m not sure if I’m the only one who experienced this issue, but when I clicked on the link in the chapter for the EPA’s Air Quality System there was a connection error. It’s possible that the site itself was down for maintenance or the like, because I wasn’t having any internet issues with other sites at the time. So, I’m going to come back to this chapter in a few days to see if it’ll work.

However, I was able to follow along with the text by using one of the data sets for coral reef pollution. The data set I ended up using was Coral Observations for Physical Damage. What I found out about this data is that there are 17875 observations and 35 variables, and that the entirety of the data was collected in 2013. Other than this, I found that it was hard to follow along with the example in the text using a completely different data set. Regardless, I was able to understand what was being explained and how to apply it in a different context.
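The first checks described here (how many observations, how many variables, which years) can be sketched as follows. This is a minimal Python illustration that uses a tiny inline sample instead of the real file. The column names and rows are hypothetical and do not reflect the actual layout of the physical damage data set.

```python
import csv
import io

# Tiny stand-in for the real file; column names and rows are made up.
sample_csv = io.StringIO(
    "obs_year,species,damaged,site\n"
    "2013,Porites lobata,no,site_1\n"
    "2013,Montipora capitata,no,site_2\n"
    "2013,Porites compressa,yes,site_3\n"
)

rows = list(csv.DictReader(sample_csv))

n_observations = len(rows)                    # one row per observation
n_variables = len(rows[0])                    # one column per variable
years = {row["obs_year"] for row in rows}     # which years are covered

print("observations:", n_observations)
print("variables:", n_variables)
print("years covered:", sorted(years))

# Look at the top and bottom of the data to confirm it loaded as expected.
print("first row:", rows[0])
print("last row:", rows[-1])
```

Checking the dimensions and the first and last rows before doing anything else is a quick way to confirm the whole file was read and that the year column really contains only one year.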

Over the weekend, I plan on playing around with the data sets I’ve downloaded a bit more to practice what I’ve learned in the second chapter.

Working with R Studio and the dplyr Package

Over the past two weeks I have read through “Getting Started with R” and “Managing Data Frames with the dplyr package” in Peng’s textbook. After downloading R and R Studio, I followed along in R Studio with Peng’s example on air pollution and temperature data for Chicago to work on memorizing the key verbs that will help in data analysis. While following along with this example, the language used in R Studio seemed simple enough.

However, after a week of practicing, I have been getting confused when trying to figure out how to apply the language to other data. For example, I’ve been looking at some of the sample data posted on the course site, specifically for Reefs of the Future: Resilience of Coral Reefs in the Main Hawaiian Islands. I chose to look at this data because I’ve always been interested in ocean life and sustainability and because I would like to use this data for my project, but I’ve been having trouble using it in R Studio. I have a feeling it may be because I’m not using the programming language properly or I’m mistyping the verbs. I plan on practicing more with it this weekend and re-reading the text. If I get really stuck, I’ll look up some examples and tutorials online.