I was searching for a way to analyse radar image data using boxplots so that I could visualise the distribution of backscatter values of different forest cover types for a given polarisation. The answer was provided by R, perhaps the most powerful free software tool there is for statistical computing and graphics, which enabled me to generate the boxplots for my purpose. The plots took a bit of time to produce as I had to learn R from scratch, which was possible thanks to the online resources available. R also gave me the flexibility to manipulate and visualise the data the way I wanted to, which would not have been possible in Excel, and it generated more visually appealing graphics.
The boxplot shown above was generated in R via RStudio, an integrated development environment that provides a powerful interface for R, and the ggplot2 package, a plotting system for R developed by Dr Hadley Wickham and based on the grammar of graphics. From this boxplot, for example, it can be said that closed mixed forests (CFM) and mangrove forests (MF) can be distinguished from each other using just ALOS/PALSAR's HV polarisation. On the other hand, the radar backscatter values of open forests (i.e., OFB, OFC, OFM), regardless of whether these are broadleaved, coniferous, or mixed, overlap significantly; hence, these types could hardly be distinguished from each other using the same image.
After the boxplot, the next step was to see the relationship between the backscatter values in the two polarisations (HH and HV) by displaying them in a scatterplot, which would also tell me whether the backscatter values of these forest types could be separated from one another. I finally figured it out after much trial and error and several mugs of coffee. I documented the process for creating the scatterplot using R below:
At the R console in the RStudio IDE, I loaded the ggplot2 package:
> library(ggplot2)
Next, I loaded my data which was contained in a CSV file using the following command:
> SPHHHV <- read.csv(file="/filepath/SPHHHV.csv", header=TRUE, sep=",")
This told R that the values in the file were separated by commas (sep=",") and that a header row was present (header=TRUE). The data was read into a data frame stored in the variable SPHHHV. Then I used the head command to display the first six rows of the data frame SPHHHV as follows:
> head(SPHHHV)
  ForestType   Mean.HH    SD.HH   Mean.HV    SD.HV
1        CFB -7.477564 1.525228 -12.81651 1.673345
2        CFC -7.031030 1.938132 -12.09642 1.563122
3        CFM -6.595219 0.827661 -11.16381 0.686749
4        FPB -7.007717 1.937744 -13.31794 2.057441
5         MF -8.146819 1.136936 -14.70111 1.624990
6        OFB -7.414856 1.523413 -13.03388 1.754264
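For readers without the CSV file, the six rows printed above can be typed in directly to get a small stand-in data frame. This is only a sketch: the name SPHHHV_demo is hypothetical, and it holds just the rows shown by head(), not the full data set.

```r
# Stand-in for the CSV: only the six rows shown by head() above.
# SPHHHV_demo is a hypothetical name used for illustration.
SPHHHV_demo <- data.frame(
  ForestType = c("CFB", "CFC", "CFM", "FPB", "MF", "OFB"),
  Mean.HH = c(-7.477564, -7.031030, -6.595219, -7.007717, -8.146819, -7.414856),
  SD.HH   = c(1.525228, 1.938132, 0.827661, 1.937744, 1.136936, 1.523413),
  Mean.HV = c(-12.81651, -12.09642, -11.16381, -13.31794, -14.70111, -13.03388),
  SD.HV   = c(1.673345, 1.563122, 0.686749, 2.057441, 1.624990, 1.754264)
)

# Inspect the column types and the first rows
str(SPHHHV_demo)
head(SPHHHV_demo)
```

Any of the commands in this post should work on SPHHHV_demo in place of SPHHHV, although the resulting plot will only contain these six forest types.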
Next, I created an object q to hold the ggplot. Its aesthetic mapping specified that ggplot would read the data frame SPHHHV, that the mean HH and mean HV values were to be plotted along the x- and y-axes, respectively, and that the points would be coloured according to forest type. I then defined two further aesthetic mappings to set the limits of the error bars, one each for HH and HV (the mean plus and minus the SD column).
> q <- ggplot(SPHHHV, aes(colour=ForestType, y=Mean.HV, x=Mean.HH))
> limitsHH <- aes(xmax = Mean.HH + SD.HH, xmin = Mean.HH - SD.HH)
> limitsHV <- aes(ymax = Mean.HV + SD.HV, ymin = Mean.HV - SD.HV)
It was time to test whether the plot would work. I executed the following command, which generated the subsequent scatterplot. Just to note: geom_point() plots the points from the data attached to the plot object q, while geom_errorbarh() and geom_errorbar() draw the error bars along the x-axis and y-axis, respectively, using the limits defined in limitsHH and limitsHV:
> q + geom_point() + geom_errorbarh(limitsHH, height=0.2) + geom_errorbar(limitsHV, width=0.2)
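The same test plot can also be written as a script, with the error-bar mappings placed directly inside each geom instead of in separate limitsHH and limitsHV objects. The sketch below assumes the ggplot2 package is installed and uses a hypothetical stand-in data frame (demo) holding only the six rows shown by head() earlier. Newer ggplot2 releases may print a deprecation warning for geom_errorbarh().

```r
library(ggplot2)

# Hypothetical stand-in data: the six rows printed by head(SPHHHV)
demo <- data.frame(
  ForestType = c("CFB", "CFC", "CFM", "FPB", "MF", "OFB"),
  Mean.HH = c(-7.477564, -7.031030, -6.595219, -7.007717, -8.146819, -7.414856),
  SD.HH   = c(1.525228, 1.938132, 0.827661, 1.937744, 1.136936, 1.523413),
  Mean.HV = c(-12.81651, -12.09642, -11.16381, -13.31794, -14.70111, -13.03388),
  SD.HV   = c(1.673345, 1.563122, 0.686749, 2.057441, 1.624990, 1.754264)
)

# Points at the mean HH/HV values, coloured by forest type,
# with horizontal (HH) and vertical (HV) bars of mean +/- SD
p <- ggplot(demo, aes(x = Mean.HH, y = Mean.HV, colour = ForestType)) +
  geom_point() +
  geom_errorbarh(aes(xmin = Mean.HH - SD.HH, xmax = Mean.HH + SD.HH), height = 0.2) +
  geom_errorbar(aes(ymin = Mean.HV - SD.HV, ymax = Mean.HV + SD.HV), width = 0.2)

print(p)
```

Keeping the mappings inside the geoms makes each layer readable on its own; storing them in named objects, as done above, is handy when the same limits are reused across several plots.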
Looks like it worked. Finally, the finishing touches were made by adding the following commands to include the axis labels, plot title, and some plot enhancements:
Increasing the point size to enhance the visibility of the mean values:
+ geom_point(size=5)
Adding axis labels and the plot title:
+ xlab(label='HH (dB)') + ylab(label='HV (dB)') + ggtitle(label='Mean Radar Backscatter of Forest Types for ALOS/PALSAR HH & HV Polarization')
Expanding the limits of the plot area to add a bit more space:
+ coord_cartesian(xlim = c(-4.50,-9.50),ylim = c(-10,-17))
The complete command to generate the final scatterplot is as follows:
> q + geom_point(size=5) + xlab(label='HH (dB)') + ylab(label='HV (dB)') + ggtitle(label='Mean Radar Backscatter of Forest Types for ALOS/PALSAR HH & HV Polarization') + coord_cartesian(xlim = c(-4.50,-9.50),ylim = c(-10,-17)) + geom_errorbarh(limitsHH, height=0.2) + geom_errorbar(limitsHV, width=0.2)
The scatterplot shows that the mean backscatter values of most forest types are clustered and their error bars overlap, which would make it difficult, if not impossible, to distinguish them from each other. A possible exception is mangrove forests (MF), which can be discriminated from closed mixed forests (CFM); the two are situated at the lower-left and upper-right areas of the plot.
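As a rough numeric companion to this visual reading, one can check whether the mean ± SD ranges of two forest types overlap in each polarisation. The sketch below uses the MF and CFM rows from the head(SPHHHV) output; the helper ranges_overlap is hypothetical, and this kind of overlap check is only a heuristic, not a formal separability test.

```r
# Hypothetical helper: TRUE if [m1 - s1, m1 + s1] and [m2 - s2, m2 + s2] overlap
ranges_overlap <- function(m1, s1, m2, s2) {
  (m1 - s1) <= (m2 + s2) && (m2 - s2) <= (m1 + s1)
}

# Values copied from the head(SPHHHV) output
mf  <- list(HH = -8.146819, SD.HH = 1.136936, HV = -14.70111, SD.HV = 1.624990)
cfm <- list(HH = -6.595219, SD.HH = 0.827661, HV = -11.16381, SD.HV = 0.686749)

overlap_HH <- ranges_overlap(mf$HH, mf$SD.HH, cfm$HH, cfm$SD.HH)
overlap_HV <- ranges_overlap(mf$HV, mf$SD.HV, cfm$HV, cfm$SD.HV)

print(c(HH = overlap_HH, HV = overlap_HV))
```

With these values the HH ranges overlap while the HV ranges do not, which is consistent with the earlier observation that HV alone separates these two types.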
Now, time to move on to the next item on the list of analyses to be completed.