Functions, loops, and if/else statements
Getting started
Remember everything in a line after the # sign is comments that will not be run. Look in these comments for practice exercises for the tutorial, which will be marked #Practice
Always begin by setting the working directory (the folder on your computer where R will read and write data files) I like to start by asking R for the current working directory to remind me of the correct formatting
getwd()
Now set the desired working directory. Your code will look different from mine here.
setwd("/Users/leedietterich/Dropbox/STRI R course/Lecture 20200506 LD")
Practice: set your own working directory
Read in a data table. This is an abbreviated dataset from Dietterich et al. 2017 Ecological Applications. In brief, I collected leaves, roots, and soils from many individuals of five plant species growing throughout a site contaminated with zinc, lead, and cadmium. I sought to understand the relationships between soil metal concentrations, root colonization by arbuscular mycorrhizal fungi (AMF), and leaf metal concentrations. The provided dataset gives AMF colonization (percent of root length colonized), and pollutant concentrations of leaves and soils in mg/kg.
mydata <- read.csv("Dietterich et al 2017 data for R course.csv")
#Use the structure command to make sure R is interpreting the data frame correctly
str(mydata)
#Convert the Species column to a factor
mydata$Species <- as.factor(mydata$Species)
#Convert the AMF column from proportion to percent
mydata$AMF <- mydata$AMF*100
Let’s make a table of the averages ± standard errors of AMF colonization and all of the metal concentrations for each species.
Recall how the mean function works, and experiment with how it handles NA values (we’ll see why soon). Make two vectors which are identical except that one has a couple NA values, and test the mean function on both of them.
x.test <- 1:10
x.test.with.nas <- c(1:3, NA, NA, 4:8, NA, 9, 10)
mean(x.test)
## [1] 5.5
mean(x.test.with.nas)
## [1] NA
Here are two ways of removing NA values before calculating the mean. First, using the na.rm argument within the mean function
mean(x.test.with.nas, na.rm=T)
## [1] 5.5
Second, using the na.omit function
na.omit(x.test.with.nas)
## [1] 1 2 3 4 5 6 7 8 9 10
## attr(,"na.action")
## [1] 4 5 11
## attr(,"class")
## [1] "omit"
mean(na.omit(x.test.with.nas))
## [1] 5.5
Writing functions
The aggregate function is very useful for doing calculations on multiple treatment groups at once. As an example, we’ll calculate the average AMF colonization of each species in the experiment.
agg.amf <- with(mydata, aggregate(AMF, by=list(Species), FUN=mean))
agg.amf
## Group.1 x
## 1 AA NA
## 2 AP NA
## 3 DF NA
## 4 ES NA
## 5 MP NA
This returns all NA values because the mean function’s default does not handle NAs the way we want. We’ll need to write a function that calculates averages but omits NA values.
First, an introduction to function notation.
xplustwo <- function(x) x+2
For simple functions like this, the curly brackets are optional, but you will need them for more complicated functions so I will use them consistently.
xcubed <- function(x) {
x^3
}
Let’s test our new functions.
xplustwo(5)
## [1] 7
xcubed(8)
## [1] 512
x.test
## [1] 1 2 3 4 5 6 7 8 9 10
xplustwo(x.test)
## [1] 3 4 5 6 7 8 9 10 11 12
xcubed(x.test)
## [1] 1 8 27 64 125 216 343 512 729 1000
Now, the function we want to calculate the average of all NA values in a vector x.
avg <- function(x) {
mean(na.omit(x))
}
While we’re at it, let’s write a similar function to calculate standard error. Recall that the standard error of a vector is the standard deviation divided by the square root of the sample size.
se <- function(x) {
sd(na.omit(x))/sqrt(length(na.omit(x)))
}
For more complicated functions, it can be helpful to define variables inside the function. Here’s another way of writing the se function above.
se2 <- function(x) {
y <- na.omit(x)
return(sd(y)/sqrt(length(y)))
}
Let’s test these functions on the same vector, with and without NA values, and make sure they give the same answer.
x.test <- 1:10
x.test.with.nas <- c(1:3, NA, NA, 4:8, NA, 9, 10)
mean(x.test)
## [1] 5.5
avg(x.test)
## [1] 5.5
mean(x.test.with.nas)
## [1] NA
avg(x.test.with.nas)
## [1] 5.5
sd(x.test)/sqrt(length(x.test))
## [1] 0.9574271
se(x.test)
## [1] 0.9574271
se2(x.test)
## [1] 0.9574271
sd(x.test.with.nas)/sqrt(length(x.test.with.nas))
## [1] NA
se(x.test.with.nas)
## [1] 0.9574271
se2(x.test.with.nas)
## [1] 0.9574271
Practice: write a function takes the average of a vector of numbers without using the mean function. Check that it works. Bonus if it works on vectors with NA values too!
Practice: write a function that takes the log+1 transform of a vector of numbers. That is, for each number in the vector, add one and then take the logarithm, base 10. Check that it works. Bonus if it works on vectors with NA values too!
Let’s get back to the table we started in line 31. We’ll start with just AMF colonization as an example.
agg.amf2 <- with(mydata, aggregate(AMF, by=list(Species), FUN=avg))
agg.amf2
## Group.1 x
## 1 AA 9.166572
## 2 AP 2.428819
## 3 DF 5.964999
## 4 ES 13.400351
## 5 MP 1.080950
#Notice that aggregate doesn't preserve column names. Let's correct that.
names(agg.amf2) <- c("Species", "Average AMF colonization")
agg.amf2
## Species Average AMF colonization
## 1 AA 9.166572
## 2 AP 2.428819
## 3 DF 5.964999
## 4 ES 13.400351
## 5 MP 1.080950
We can make our table much more efficiently by using aggregate on many columns at once.
#Look at the column names of mydata to see which column numbers we want to aggregate
names(mydata)
## [1] "Sample.id" "Species" "AMF" "Cd.leaf" "Pb.leaf" "Zn.leaf"
## [7] "Cd.soil" "Pb.soil" "Zn.soil"
#Run the aggregate
agg.avg <- with(mydata, aggregate(mydata[,3:9], by=list(Species), FUN=avg))
#Look at the results
agg.avg
## Group.1 AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil Pb.soil Zn.soil
## 1 AA 9.166572 16.550551 62.883543 1217.2924 51.28371 650.0625 5140.286
## 2 AP 2.428819 9.552666 23.731515 1195.8129 90.49262 1027.4266 6923.761
## 3 DF 5.964999 4.116137 10.794596 376.2118 43.73006 1186.7936 2942.489
## 4 ES 13.400351 34.452875 41.123450 1533.0363 70.74253 475.6594 6650.177
## 5 MP 1.080950 52.731918 7.612877 6450.6295 441.23633 2380.3515 35435.358
#Here only the first column needs to be renamed
names(agg.avg)[1] <- "Species"
agg.avg
## Species AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil Pb.soil Zn.soil
## 1 AA 9.166572 16.550551 62.883543 1217.2924 51.28371 650.0625 5140.286
## 2 AP 2.428819 9.552666 23.731515 1195.8129 90.49262 1027.4266 6923.761
## 3 DF 5.964999 4.116137 10.794596 376.2118 43.73006 1186.7936 2942.489
## 4 ES 13.400351 34.452875 41.123450 1533.0363 70.74253 475.6594 6650.177
## 5 MP 1.080950 52.731918 7.612877 6450.6295 441.23633 2380.3515 35435.358
Generate standard errors by the same technique
agg.se <- with(mydata, aggregate(mydata[,3:9], by=list(Species), FUN=se))
names(agg.se)[1] <- "Species"
agg.se
Let’s round all of the entries in both of these dataframes to three significant figures to make the table look pretty.
?round
Test the signif command on one column
round.test1 <- signif(agg.avg$AMF, 3)
round.test1
## [1] 9.17 2.43 5.96 13.40 1.08
#Can signif handle multiple columns at once?
round.test2 <- signif(agg.avg[,2:8], 3)
round.test2
## AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil Pb.soil Zn.soil
## 1 9.17 16.60 62.90 1220 51.3 650 5140
## 2 2.43 9.55 23.70 1200 90.5 1030 6920
## 3 5.96 4.12 10.80 376 43.7 1190 2940
## 4 13.40 34.50 41.10 1530 70.7 476 6650
## 5 1.08 52.70 7.61 6450 441.0 2380 35400
#Yes it can!
Make a table of all of our pretty averages
round.avg <- with(agg.avg, cbind(Species, signif(agg.avg[,2:8], 3)))
round.avg
## Species AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil Pb.soil Zn.soil
## 1 AA 9.17 16.60 62.90 1220 51.3 650 5140
## 2 AP 2.43 9.55 23.70 1200 90.5 1030 6920
## 3 DF 5.96 4.12 10.80 376 43.7 1190 2940
## 4 ES 13.40 34.50 41.10 1530 70.7 476 6650
## 5 MP 1.08 52.70 7.61 6450 441.0 2380 35400
Make a table of pretty SEs by the same technique
round.se <- with(agg.avg, cbind(Species, signif(agg.se[,2:8], 3)))
round.se
## Species AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil Pb.soil Zn.soil
## 1 AA 1.460 2.130 11.900 208.0 13.10 162.0 1090
## 2 AP 0.717 0.770 2.500 133.0 12.20 102.0 1080
## 3 DF 2.070 0.315 1.800 51.1 9.96 162.0 414
## 4 ES 2.550 4.250 5.950 137.0 13.20 87.7 1260
## 5 MP 0.534 5.760 0.564 591.0 71.50 359.0 6020
To finish our table, we want each cell of columns 2 through 8 to display our rounded average, the character ±, and then our rounded SE. We will use a technique called a “for loop” to do this one cell at a time.
Loops
Here is a simple for loop to demonstrate the technique.
for(ii in 1:5){
print(ii^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
It can be convenient to embed loops inside functions. As an example, we’ll calculate the sum of squared deviations from a sample average. That is, for a vector x, we will find the average, calculate the difference of each element of x from that average, square that difference, and sum those differences for all elements of x.
sum.of.squares <- function(x){
#Make an empty vector of the correct length where we will fill in our values
sq.diffs <- rep(NA, length(x))
for(ii in 1:length(x)){
#Define each element of sq.diffs as the squared difference we want
sq.diffs[ii] <- (x[ii]-mean(x))^2
}
#Return the sum of the entries in sq.diffs
return(sum(sq.diffs))
}
Test this function
x.test
## [1] 1 2 3 4 5 6 7 8 9 10
sum.of.squares(x.test)
## [1] 82.5
Note that this function can also be done without a for loop.
sum((x.test-mean(x.test))^2)
## [1] 82.5
Let’s get back to our table. We’ll start with an empty destination table again as we did with the sum of squares function, but we’ll leave the Species column as it is because it doesn’t need anything new done to it.
The following code defines variables ii and jj and uses them to cycle through the rows and columns, respectively, of mytable. Then it defines entry [ii,jj] of mytable by concatenating the corresponding values of round.avg and round.se, separated by the character string " ± ".
mytable <- round.avg
for(ii in 1:5){
for(jj in 2:8){
mytable[ii,jj] <- paste(round.avg[ii,jj], round.se[ii,jj], sep=" ± ")
}
}
mytable
## Species AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil
## 1 AA 9.17 ± 1.46 16.6 ± 2.13 62.9 ± 11.9 1220 ± 208 51.3 ± 13.1
## 2 AP 2.43 ± 0.717 9.55 ± 0.77 23.7 ± 2.5 1200 ± 133 90.5 ± 12.2
## 3 DF 5.96 ± 2.07 4.12 ± 0.315 10.8 ± 1.8 376 ± 51.1 43.7 ± 9.96
## 4 ES 13.4 ± 2.55 34.5 ± 4.25 41.1 ± 5.95 1530 ± 137 70.7 ± 13.2
## 5 MP 1.08 ± 0.534 52.7 ± 5.76 7.61 ± 0.564 6450 ± 591 441 ± 71.5
## Pb.soil Zn.soil
## 1 650 ± 162 5140 ± 1090
## 2 1030 ± 102 6920 ± 1080
## 3 1190 ± 162 2940 ± 414
## 4 476 ± 87.7 6650 ± 1260
## 5 2380 ± 359 35400 ± 6020
Save mytable as a csv.
write.csv(mytable, file="My pretty table.csv")
For some reason when I open this resulting file in Excel, I get the symbol  before each ±, which seems to be a problem of text encoding between R and Excel but easy to fix in find/change. Interestingly when I read it back into R it displays correctly again:
mypretty <- read.csv(file="My pretty table.csv")
mypretty
## X Species AMF Cd.leaf Pb.leaf Zn.leaf Cd.soil
## 1 1 AA 9.17 ± 1.46 16.6 ± 2.13 62.9 ± 11.9 1220 ± 208 51.3 ± 13.1
## 2 2 AP 2.43 ± 0.717 9.55 ± 0.77 23.7 ± 2.5 1200 ± 133 90.5 ± 12.2
## 3 3 DF 5.96 ± 2.07 4.12 ± 0.315 10.8 ± 1.8 376 ± 51.1 43.7 ± 9.96
## 4 4 ES 13.4 ± 2.55 34.5 ± 4.25 41.1 ± 5.95 1530 ± 137 70.7 ± 13.2
## 5 5 MP 1.08 ± 0.534 52.7 ± 5.76 7.61 ± 0.564 6450 ± 591 441 ± 71.5
## Pb.soil Zn.soil
## 1 650 ± 162 5140 ± 1090
## 2 1030 ± 102 6920 ± 1080
## 3 1190 ± 162 2940 ± 414
## 4 476 ± 87.7 6650 ± 1260
## 5 2380 ± 359 35400 ± 6020
Practice with indexing: recreate mytable but with the columns in the following order: Cd.soil, Cd.leaf, Pb.soil, Pb.leaf, Zn.soil, Zn.leaf, AMF
Practice (more complicated): make a table in the style of mytable, but instead of presenting mean ± SE, present the lower and upper bounds of an approximate 95% confidence interval: mean ± 2*SD.
Hint: there are many ways to organize this. I suggest using aggregate with perhaps one or more homemade functions to calculate the numbers you want to put in the table, and then using a loop as above to arrange the numbers correctly into the table.
Note: if you’re actually trying to publish a 95% confidence interval you will want to use a more statistically rigorous version!
Moving toward if/else statements
Suppose we are interested in the relationships between plant and soil metal concentrations for each species: for instance, is the amount of Zn in the soil under species AA related to the amount of Zn in the leaves of AA? Let’s make a table where, for each plant species and each metal studied, we report whether the relationship between plant and soil metal concentrations is positive, negative, or not significant.
Statistical background: linear regression
This technique asks whether there is a significant linear relationship between two continuous variables. For example, let’s look at Zn in species AP
For the rows of mydata in which Species is AP, run a linear model of Zn.leaf as predicted by Zn.soil
AP.Zn.model <- with(mydata[mydata$Species=="AP",], lm(Zn.leaf~Zn.soil))
#View model summary
summary(AP.Zn.model)
##
## Call:
## lm(formula = Zn.leaf ~ Zn.soil)
##
## Residuals:
## Min 1Q Median 3Q Max
## -950.6 -199.1 -104.3 187.1 1182.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 443.86803 116.61543 3.806 0.000705 ***
## Zn.soil 0.08760 0.01264 6.931 1.55e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 417 on 28 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.6318, Adjusted R-squared: 0.6186
## F-statistic: 48.04 on 1 and 28 DF, p-value: 1.555e-07
Regression assumes residuals are normally distributed - here are two ways of checking that. A histogram of the residuals should approximate a bell curve
hist(AP.Zn.model$residuals)
A normal quantile-quantile plot should approximate a 1:1 line.
qqnorm(AP.Zn.model$residuals)
Does log-transformation help?
apznlog <- with(mydata[mydata$Species=="AP",], lm(log10(Zn.leaf)~log10(Zn.soil)))
summary(apznlog)
##
## Call:
## lm(formula = log10(Zn.leaf) ~ log10(Zn.soil))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45167 -0.07118 0.00576 0.08498 0.45617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.47971 0.34666 1.384 0.177
## log10(Zn.soil) 0.66314 0.09322 7.114 9.69e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1749 on 28 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.6438, Adjusted R-squared: 0.6311
## F-statistic: 50.61 on 1 and 28 DF, p-value: 9.689e-08
hist(apznlog$residuals)
qqnorm(apznlog$residuals)
#Yes, somewhat.
Moving back toward our table… From the model summary, we want to pull the parameter estimate (slope of the relationship) to determine whether the relationship is positive or negative, and the P-value to determine whether it is statistically significant. Let’s figure out how to call them.
#Give the summary a shorter, easier name
sz <- summary(apznlog)
#Display the summary
sz
##
## Call:
## lm(formula = log10(Zn.leaf) ~ log10(Zn.soil))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45167 -0.07118 0.00576 0.08498 0.45617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.47971 0.34666 1.384 0.177
## log10(Zn.soil) 0.66314 0.09322 7.114 9.69e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1749 on 28 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.6438, Adjusted R-squared: 0.6311
## F-statistic: 50.61 on 1 and 28 DF, p-value: 9.689e-08
#Find the parameter estimate
sz$coefficients[2,1]
## [1] 0.6631386
#Find the P-value
sz$coefficients[2,4]
## [1] 9.688803e-08
When we make our table, we’ll make the models for each species and metal, extract parameter estimate and P-value from each, and fill in the table based on the results. Since we are running many models, we will need to correct for multiple comparisons. I’ll use the Dunn-Sidak correction here but there are many possibilities.
Find the corrected P-value according to the Dunn-Sidak correction. We’ll make a function to calculate the corrected p-value for n tests at significance level alpha.
dunn.sidak <- function(alpha, n) 1-(1-alpha)^(1/n)
p.corr <- dunn.sidak(0.05, 15)
p.corr
## [1] 0.003413713
Practice: Write a function and use it to find the corrected P-value according to a different correction of your choice, such as Bonferroni.
If/else statements
Now that we know how to pull these values out of a summary.lm object, let’s write a function that produces the entry we want in a given cell in our table: “NS” if the relationship is not statistically significant, and “Positive” or “Negative” for statistically significant relationships based on the parameter estimate.
Example of if/else syntax
x <- 5
if(x > 0){
print("Positive number")
} else {
print("Not a positive number")
}
## [1] "Positive number"
If/else statements can also be useful in functions
myfun1 <- function(p){
if(p>=p.corr){
return("NS")
} else {
return("SIG")
}
}
myfun1(0.001)
## [1] "SIG"
myfun1(0.003)
## [1] "SIG"
myfun1(0.004)
## [1] "NS"
myfun1(0.05)
## [1] "NS"
myfun1(0.8)
## [1] "NS"
myfun1(-10)
## [1] "SIG"
myfun1(200)
## [1] "NS"
Here’s the function we want for the table. It takes input values p (assumed to be a p-value) and q (assumed to be a parameter estimate). If p is greater than or equal to the corrected P-value we calculated above (p.corr), it returns “NS” for not significant. If p is less than p.corr, then: if q is positive, it returns “Positive” to indicate a positive relationship, otherwise it returns “Negative” to indicate a negative relationship.
myfun2 <- function(p, q){
if(p>=p.corr){
return("NS")
} else {
if(q>0){
return("Positive")
} else {return("Negative")}
}
}
Let’s test our function in several scenarios.
myfun2(1,-1)
## [1] "NS"
myfun2(1,0.04)
## [1] "NS"
myfun2(1, 100)
## [1] "NS"
myfun2(0,-1)
## [1] "Negative"
myfun2(0,0.05)
## [1] "Positive"
myfun2(0,100)
## [1] "Positive"
myfun2(0.04, 0.04)
## [1] "NS"
Practice: what does myfun2 do if p<p.corr and q=0? How would you handle this situation more sensibly? Would you ever expect this situation to happen in real life?
We’re getting to the table, I promise!
Make an empty destination table to store our values
Sp <- c("AA", "AP", "DF", "ES", "MP")
Cd <- rep(NA, 5)
Pb <- rep(NA, 5)
Zn <- rep(NA, 5)
out <- as.data.frame(cbind(Sp, Cd, Pb, Zn))
out
## Sp Cd Pb Zn
## 1 AA <NA> <NA> <NA>
## 2 AP <NA> <NA> <NA>
## 3 DF <NA> <NA> <NA>
## 4 ES <NA> <NA> <NA>
## 5 MP <NA> <NA> <NA>
Get the names of mydata to remind us which numbered columns we need names(mydata) The actual table
out
## Sp Cd Pb Zn
## 1 AA <NA> <NA> <NA>
## 2 AP <NA> <NA> <NA>
## 3 DF <NA> <NA> <NA>
## 4 ES <NA> <NA> <NA>
## 5 MP <NA> <NA> <NA>
for(ii in 1:5){ #Loop through the five species
for(jj in 1:3){ #Loop through the three metals
model.data <- mydata[mydata$Species==Sp[ii],] #Define dataframe model.data as the rows of mydata where Species is equal to the iith entry of our Sp vector.
model <- with(model.data, lm(log10(model.data[,jj+3])~log10(model.data[,jj+6]))) #Define the regression model, including log-transformation
param <- summary(model)$coefficients[2,1] #Extract the parameter estimate from the model summary
p.val <- summary(model)$coefficients[2,4] #Extract the P-value from the model summary
out[ii,jj+1] <- myfun2(p.val, param) #Use our function myfun2 to determine the correct entry for cell [ii, jj+1] of out.
}
}
out
## Sp Cd Pb Zn
## 1 AA NS Positive Positive
## 2 AP NS NS Positive
## 3 DF NS NS NS
## 4 ES Positive NS Positive
## 5 MP NS NS NS
The same code without annotation
out
for(ii in 1:5){
for(jj in 1:3){
model.data <- mydata[mydata$Species==Sp[ii],]
model <- with(model.data, lm(log10(model.data[,jj+3])~log10(model.data[,jj+6])))
param <- summary(model)$coefficients[2,1]
p.val <- summary(model)$coefficients[2,4]
out[ii,jj+1] <- myfun2(p.val, param)
}
}
out
Spot-check: do individual model outputs match their entries in the table?
aacd <- with(mydata[mydata$Species=="AA",], lm(log10(Cd.leaf)~log10(Cd.soil)))
summary(aacd)
##
## Call:
## lm(formula = log10(Cd.leaf) ~ log10(Cd.soil))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45880 -0.21812 0.06296 0.20826 0.31249
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9322 0.1878 4.965 0.000141 ***
## log10(Cd.soil) 0.1654 0.1183 1.398 0.181114
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2494 on 16 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 0.1089, Adjusted R-squared: 0.0532
## F-statistic: 1.955 on 1 and 16 DF, p-value: 0.1811
dfpb <- with(mydata[mydata$Species=="DF",], lm(log10(Pb.leaf)~log10(Pb.soil)))
summary(dfpb)
##
## Call:
## lm(formula = log10(Pb.leaf) ~ log10(Pb.soil))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.39174 -0.19209 0.09309 0.18684 0.32930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.2839 0.8272 -0.343 0.739
## log10(Pb.soil) 0.3940 0.2738 1.439 0.181
##
## Residual standard error: 0.2512 on 10 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.1716, Adjusted R-squared: 0.08871
## F-statistic: 2.071 on 1 and 10 DF, p-value: 0.1807
eszn <- with(mydata[mydata$Species=="ES",], lm(log10(Zn.leaf)~log10(Zn.soil)))
summary(eszn)
##
## Call:
## lm(formula = log10(Zn.leaf) ~ log10(Zn.soil))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29230 -0.04965 0.05243 0.09096 0.19372
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.67121 0.31221 5.353 4.35e-05 ***
## log10(Zn.soil) 0.40196 0.08462 4.750 0.00016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1474 on 18 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5562, Adjusted R-squared: 0.5316
## F-statistic: 22.56 on 1 and 18 DF, p-value: 0.00016
One way of testing the loop: add a line to print one or more values of interest with each iteration of the loop. This is also useful to check the progress of loops that may be slow to run.
for(ii in 1:5){
for(jj in 1:3){
model.data <- mydata[mydata$Species==Sp[ii],]
model <- with(model.data, lm(log10(model.data[,jj+3])~log10(model.data[,jj+6])))
param <- summary(model)$coefficients[2,1]
p.val <- summary(model)$coefficients[2,4]
out[ii,jj+1] <- myfun2(p.val, param)
print(c(ii, jj, p.val, myfun2(p.val, param)))
}
}
## [1] "1" "1" "0.181114382568599"
## [4] "NS"
## [1] "1" "2" "0.00148698582414258"
## [4] "Positive"
## [1] "1" "3" "0.00324182977349798"
## [4] "Positive"
## [1] "2" "1" "0.0172017389841376"
## [4] "NS"
## [1] "2" "2" "0.421834656757258"
## [4] "NS"
## [1] "2" "3" "9.68880334416056e-08"
## [4] "Positive"
## [1] "3" "1" "0.311749392929456"
## [4] "NS"
## [1] "3" "2" "0.180700457147639"
## [4] "NS"
## [1] "3" "3" "0.645117518799968"
## [4] "NS"
## [1] "4" "1" "0.00305551203149827"
## [4] "Positive"
## [1] "4" "2" "0.229989700967829"
## [4] "NS"
## [1] "4" "3" "0.000159957854141786"
## [4] "Positive"
## [1] "5" "1" "0.797476793109707"
## [4] "NS"
## [1] "5" "2" "0.981483007204014"
## [4] "NS"
## [1] "5" "3" "0.371930357679382"
## [4] "NS"
out
## Sp Cd Pb Zn
## 1 AA NS Positive Positive
## 2 AP NS NS Positive
## 3 DF NS NS NS
## 4 ES Positive NS Positive
## 5 MP NS NS NS
Practice: follow the workflow above to make a table reporting, for each species and each metal, the significance and sign of the relationship between leaf metal concentration and root AMF colonization.