R functions and the Apply family
By Vera Pfeiffer
Data for today
1BadFungi<-read.csv("BadFun.csv", row.names=1)
2head(BadFungi)
3
4## Botrytis Cladosporium Monilinia
5## X10orgedge 0.02131367 0.15160858 0.04504021
6## X10orgint 0.02654155 0.09718499 0.03900804
7## X11convedge 0.04959786 0.24664879 0.02560322
8## X11convint 0.03391421 0.32600536 0.04571046
9## X15softedge 0.10442359 0.17694370 0.05201072
10## X15softint 0.09195710 0.21796247 0.01300268
11## Mycosphaerella Penicillium
12## X10orgedge 0.3230563 0
13## X10orgint 0.1786863 0
14## X11convedge 0.6823056 0
15## X11convint 0.5143432 0
16## X15softedge 0.3900804 0
17## X15softint 0.4044236 0
18
19Envi<-read.csv("FunEnvi.csv", row.names=1)
20head(Envi)
21
22## PCNM1 PCNM2 Precip MinT VPDmin
23## X10orgedge -0.11090592 -0.01442449 2.31 37.0 1.15
24## X10orgint -0.11092923 -0.01445247 2.31 37.0 1.15
25## X11convedge -0.11065756 -0.01294471 2.31 37.0 1.15
26## X11convint -0.08651768 0.69306694 2.31 37.0 1.15
27## X15softedge -0.11087064 -0.03984493 2.18 37.9 1.27
28## X15softint -0.11088600 -0.04012614 2.18 37.9 1.27
29## VPDmax PropPear PropFruit PropFor soft org
30## X10orgedge 10.20 7.219268 0.1145916 5.7295780 0 1
31## X10orgint 10.20 7.219268 0.1145916 5.7295780 0 1
32## X11convedge 10.20 66.119330 1.6042818 0.3437747 0 0
33## X11convint 10.20 66.119330 1.6042818 0.3437747 0 0
34## X15softedge 11.28 14.782311 1.4896903 0.0000000 1 0
35## X15softint 11.28 14.782311 1.4896903 0.0000000 1 0
R packages for today
install.packages("CCA"), install.packages("vegan")
R functions
Everything that exists in R is an object, and everything that is done (data transformations, plotting, operations) in R is a function.
There are many functions supplied by the base R software package, as well as libraries of code accessible through open access packages available at R Cran repositories.
You can also write your own functions, and activate them by running them in your script. Or alternatively, you can to use source() to load them from a separate file in your working directory.
Writing functions
Abstracting your code into many small functions is a great way to improve your organization, comfort, and efficiency in R. You can store your functions in .R documents and source them from a script if they are long and obscure the flow of your working version of code. Try to avoid making very long or complex functions that do too much at once.
FUNCTION.NAME = function(PARAMETER.LIST) {
BODY
}
A function call proceeds as follows:
- Execution jumps to the first line of the function upon seeing the call FUNCTION.NAME(ARGUMENT.LIST)
- Function's PARAMETER.LIST is copied from caller's ARGUMENT.LIST by name or position, and from defaults specified as PARAMETER.NAME=DEFAULT in PARAMETER.LIST
- Assignment to function parameters and local variables doesn't affect caller's variables
- Code in function is executed until return(EXPRESSION), or until function's closing }
- Execution returns to caller; if caller assigned a variable to function it gets EXPRESSION from function's return() or last expression
1a=4
2b=5
3square.a <- function(a = 1, b = 2) {
4 b=100
5 c=a*a
6 return(c)
7}
8square.a
9
10## function(a = 1, b = 2) {
11## b=100
12## c=a*a
13## return(c)
14## }
15
16square.a()
17
18## [1] 1
19
20square.a(a=3)
21
22## [1] 9
23
24square.a(3)
25
26## [1] 9
27
28square.a(3,7)
29
30## [1] 9
31
32a
33
34## [1] 4
35
36b
37
38## [1] 5
You don't have to use a return statement...
1sum.of.squares <- function(x,y) {
2 x^2 +y^2
3}
4
5sum.of.squares(5,6)
6
7## [1] 61
8
9x<-c(2:11)
10y<-c(3:12)
11sum.of.squares(x,y)
12
13## [1] 13 25 41 61 85 113 145 181 221 265
Function goals:
- short
- performs one opperation
- intuitive name
Functions should primarily help you improve the readability of your code, and secondarily improve your efficiency and organization. It also helps you maintain your code by storing it in separate easily accessible files, rather than in super, super, long whole-project document.
Bigger function example
1badfungiCC<-CCorA(Envi,BadFungi, stand.X=TRUE,stand.Y=TRUE)
2summary(badfungiCC)
3
4## Length Class Mode
5## Pillai 1 -none- numeric
6## Eigenvalues 5 -none- numeric
7## CanCorr 5 -none- numeric
8## Mat.ranks 2 -none- numeric
9## RDA.Rsquares 2 -none- numeric
10## RDA.adj.Rsq 2 -none- numeric
11## nperm 1 -none- numeric
12## p.Pillai 1 -none- numeric
13## p.perm 1 -none- logical
14## Cy 150 -none- numeric
15## Cx 150 -none- numeric
16## corr.Y.Cy 55 -none- numeric
17## corr.X.Cx 25 -none- numeric
18## corr.Y.Cx 55 -none- numeric
19## corr.X.Cy 25 -none- numeric
20## control 12 how list
21## call 5 -none- call
22
23badfungiCC$CanCorr
24
25## [1] 0.8717640 0.8035587 0.6797292 0.5107511 0.4393814
26
27#0.87 0.80
28#Eigenvalues are canonical correlations squared
29#Pillai's trace is the sum of squared Canonical correlations
30print(sum(badfungiCC$CanCorr^2))
31
32## [1] 2.321634
33
34print(sum(badfungiCC$Eigenvalues))
35
36## [1] 2.321634
37
38print(badfungiCC$Pillai)
39
40## [1] 2.321634
41
42#RDA R squares
43print(badfungiCC$RDA.Rsquares)
44
45## [1] 0.1611034 0.4759253
46
47# ...tell you how much variance in the *dependent* variables is explained by the independent *canonical variate* (and vice versa)
48# i.e. X|Y = variance in X variables explained by the Y canonical variate
49# i.e. Y|X = variance in Y variables explained by the X canonical variate
50
51# IMPORTANT
52# corr.Y.Cy = Correlation of each variable in Y with its Canonical variate (similarly, corr.X.Cx)
53# Also called Canonical loadings or canonical structure correlations, interpreted as factor loadings
54# ...tells you how much do individual variables contribute to their own canonical variates
55badfungiCC$corr.Y.Cy
56
57## CanAxis1 CanAxis2 CanAxis3 CanAxis4
58## PCNM1 -0.32081197 -0.33933054 -0.21230081 -0.27697487
59## PCNM2 0.20369029 0.14892281 -0.22857132 0.48254574
60## Precip -0.18490434 -0.22563958 -0.08315862 -0.23894850
61## MinT 0.03450369 0.15228199 0.23966211 0.26635597
62## VPDmin -0.13952698 -0.21987239 0.03910030 0.14645347
63## VPDmax -0.12043970 0.25484033 -0.05257090 -0.13428438
64## PropPear 0.45021330 0.69746308 -0.10002690 0.25950088
65## PropFruit 0.20965421 0.37011541 0.14175772 0.33301858
66## PropFor -0.20070175 -0.34619064 -0.19023253 -0.33176122
67## soft 0.32626004 0.05835548 0.07913943 -0.59234743
68## org -0.21437285 -0.34322738 -0.06779914 -0.01384469
69## CanAxis5
70## PCNM1 0.16866057
71## PCNM2 0.12143280
72## Precip 0.26614803
73## MinT -0.23270298
74## VPDmin 0.13155815
75## VPDmax -0.41006701
76## PropPear -0.05107942
77## PropFruit -0.07686641
78## PropFor 0.26656009
79## soft -0.28857927
80## org 0.15500380
81
82badfungiCC$corr.X.Cx
83
84## CanAxis1 CanAxis2 CanAxis3
85## Botrytis -0.58500580 0.49284541 -0.40984921
86## Cladosporium -0.04194937 0.70546350 -0.46226576
87## Monilinia 0.66393123 0.18551419 0.44195394
88## Mycosphaerella -0.28847351 0.81947263 0.09200482
89## Penicillium 0.26637032 0.09997318 -0.14498754
90## CanAxis4 CanAxis5
91## Botrytis -0.4107330 0.27963134
92## Cladosporium 0.4506787 0.28941422
93## Monilinia -0.4621564 0.34039386
94## Mycosphaerella 0.4837810 0.05233234
95## Penicillium -0.2441335 -0.91565805
96
97# So the Canonical axis 1 correlation is driven by the high influence of the first spatial distance vector and the proportion of pear in the surrounding area - on Botrytis and Monilinia
98
99# The second Canonical axis 2 correlation shows an even stronger influence of the proportion of pear, as well as more distributed effects of the proportion of fruit, the first spatial vector, organic management and precipitation on Cladosporium and Mycosphaerella
100
101# ...NOT THAT INFORMATIVE
102# corr.Y.Cx = Correlation of each variable in Y with each Canonical variate in X (similarly corr.X.Cy)
103# Also called canonical cross-loading. Correlation of each variabel with the opposite canonical variate
104# ...tells you how strongly are individual variables correlated with the canonical variates of the dependent/independent set
105
106badfungiCC$corr.Y.Cx
107
108## CanAxis1 CanAxis2 CanAxis3 CanAxis4
109## PCNM1 -0.27967234 -0.27267200 -0.14430705 -0.141465227
110## PCNM2 0.17756987 0.11966822 -0.15536659 0.246460778
111## Precip -0.16119295 -0.18131464 -0.05652534 -0.122043214
112## MinT 0.03007908 0.12236752 0.16290533 0.136041613
113## VPDmin -0.12163460 -0.17668037 0.02657761 0.074801276
114## VPDmax -0.10499500 0.20477916 -0.03573398 -0.068585896
115## PropPear 0.39247977 0.56045251 -0.06799120 0.132540364
116## PropFruit 0.18276900 0.29740945 0.09635686 0.170089613
117## PropFor -0.17496457 -0.27818449 -0.12930660 -0.169447413
118## soft 0.28442177 0.04689206 0.05379338 -0.302542112
119## org -0.18688254 -0.27580334 -0.04608505 -0.007071191
120## CanAxis5
121## PCNM1 0.07410631
122## PCNM2 0.05335531
123## Precip 0.11694049
124## MinT -0.10224536
125## VPDmin 0.05780420
126## VPDmax -0.18017580
127## PropPear -0.02244335
128## PropFruit -0.03377367
129## PropFor 0.11712154
130## soft -0.12679636
131## org 0.06810578
132
133badfungiCC$corr.X.Cy
134
135## CanAxis1 CanAxis2 CanAxis3
136## Botrytis -0.50998702 0.39603021 -0.27858646
137## Cladosporium -0.03656995 0.56688132 -0.31421552
138## Monilinia 0.57879137 0.14907154 0.30040898
139## Mycosphaerella -0.25148083 0.65849435 0.06253836
140## Penicillium 0.23221207 0.08033432 -0.09855226
141## CanAxis4 CanAxis5
142## Botrytis -0.2097823 0.12286480
143## Cladosporium 0.2301846 0.12716321
144## Monilinia -0.2360469 0.14956272
145## Mycosphaerella 0.2470917 0.02299386
146## Penicillium -0.1246915 -0.40232309
147
148biplot(badfungiCC)
Effect sizes and significant canonical variates for Bad Fungi
1badfungiCCA<-cc(Envi,BadFungi)
2#Effect sizes
3badfungiCCA$ycoef
4
5## [,1] [,2] [,3] [,4]
6## Botrytis -22.438664 11.750585 -8.474913 32.178310
7## Cladosporium 12.718896 2.526191 -17.735366 -6.120606
8## Monilinia 2.291911 1.647968 1.267286 1.787275
9## Mycosphaerella -3.954103 4.631983 7.749042 -1.847856
10## Penicillium 1.862247 2.207945 -1.680171 1.763883
11## [,5]
12## Botrytis 4.676756
13## Cladosporium 5.152196
14## Monilinia 1.122518
15## Mycosphaerella -2.364223
16## Penicillium -5.048976
17
18badfungiCCA$xcoef
19
20## [,1] [,2] [,3]
21## PCNM1 20.20318698 -30.87925582 -3.54177300
22## PCNM2 1.22514085 -0.68045395 -2.74271617
23## Precip -30.26201821 40.71110573 24.93730815
24## MinT 1.26711128 -2.85800379 4.65692142
25## VPDmin -25.44970665 30.85066149 -11.54735182
26## VPDmax -5.64031688 6.12719196 -1.63367262
27## PropPear -0.01383938 0.07172806 -0.02568558
28## PropFruit 0.87891684 -0.85420311 0.48111233
29## PropFor 0.02406463 -0.04232783 -0.05020187
30## soft 2.06630668 -0.45726184 0.57663036
31## org 0.56344937 0.21743181 0.63884373
32## [,4] [,5]
33## PCNM1 -16.952541694 -53.60023442
34## PCNM2 -1.837648383 2.73584459
35## Precip 19.380736253 38.64802445
36## MinT -3.515535279 -9.39025411
37## VPDmin 18.076110329 38.23013274
38## VPDmax 3.280948570 6.08024432
39## PropPear 0.001137222 -0.06726919
40## PropFruit 0.006018099 0.43610758
41## PropFor 0.005374205 0.12673231
42## soft 1.806271076 -1.39622883
43## org 0.126172135 -2.55539272
FIND WHICH CANONICAL DIMENSIONS ARE SIGNIFICANT
1ev=(1 - badfungiCCA$cor^2)
2n=dim(Envi)[1]
3p=length(Envi)
4q=length(BadFungi)
5k=min(p, q)
6m=n - 3/2 - (p + q)/2
7w=rev(cumprod(rev(ev)))
8# initialize
9d1=d2=f=vector("numeric", k)
10for (i in 1:k) {
11 s=sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
12 si=1/s
13 d1[i]=p * q
14 d2[i]=m * s - p * q/2 + 1
15 r=(1 - w[i]^si)/w[i]^si
16 f[i]=r * d2[i]/d1[i]
17 p=p - 1
18 q=q - 1
19}
20pv=pf(f, d1, d2, lower.tail = FALSE)
21dmat=cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
22
23print(dmat)
24
25## WilksL F df1 df2 p
26## [1,] 0.02728644 1.4637894 55 68.38983 0.06724978
27## [2,] 0.11368049 1.1369873 40 58.73361 0.32254775
28## [3,] 0.32086535 0.8348354 27 47.37057 0.68786933
29## [4,] 0.59643919 0.6265401 16 34.00000 0.83991561
30## [5,] 0.80694401 0.6151972 7 18.00000 0.73655072
As a function...
1TestSignificance <- function(veganCCA,x,y) {
2
3ev=(1 - veganCCA$cor^2)
4n=dim(x)[1]
5p=length(x)
6q=length(y)
7k=min(p, q)
8m=n - 3/2 - (p + q)/2
9w=rev(cumprod(rev(ev)))
10# initialize
11d1=d2=f=vector("numeric", k)
12for (i in 1:k) {
13 s=sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
14 si=1/s
15 d1[i]=p * q
16 d2[i]=m * s - p * q/2 + 1
17 r=(1 - w[i]^si)/w[i]^si
18 f[i]=r * d2[i]/d1[i]
19 p=p - 1
20 q=q - 1
21}
22pv=pf(f, d1, d2, lower.tail = FALSE)
23dmat=cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
24
25print(dmat)
26
27}
28
29TestSignificance(badfungiCCA, Envi, BadFungi)
30
31## WilksL F df1 df2 p
32## [1,] 0.02728644 1.4637894 55 68.38983 0.06724978
33## [2,] 0.11368049 1.1369873 40 58.73361 0.32254775
34## [3,] 0.32086535 0.8348354 27 47.37057 0.68786933
35## [4,] 0.59643919 0.6265401 16 34.00000 0.83991561
36## [5,] 0.80694401 0.6151972 7 18.00000 0.73655072
Apply function family
Apply functions are implicit loops that iterate over the elements in an object like a dataframe, list, matrix, or array (3-d matrix) and execute some set of functions.
Apply functions are a more efficient way to write a loop - which can help to keep your code organized and efficient. They are also more computationally efficient.
Depending on your inputs and outputs - there is a whole family of apply functions. The first four apply a function over all the elements in a single object.
lapply(X, FUN, ...)
list apply; returns a list of values obtatined by applying a function to a vector, list, or data frame.
1lapply(Envi$Precip, sqrt)
2
3## [[1]]
4## [1] 1.519868
5##
6## [[2]]
7## [1] 1.519868
8##
9## [[3]]
10## [1] 1.519868
11##
12## [[4]]
13## [1] 1.519868
14##
15## [[5]]
16## [1] 1.476482
17##
18## [[6]]
19## [1] 1.476482
20##
21## [[7]]
22## [1] 1.476482
23##
24## [[8]]
25## [1] 1.476482
26##
27## [[9]]
28## [1] 1.486607
29##
30## [[10]]
31## [1] 1.486607
32##
33## [[11]]
34## [1] 1.486607
35##
36## [[12]]
37## [1] 1.486607
38##
39## [[13]]
40## [1] 1.593738
41##
42## [[14]]
43## [1] 1.593738
44##
45## [[15]]
46## [1] 1.593738
47##
48## [[16]]
49## [1] 1.593738
50##
51## [[17]]
52## [1] 1.593738
53##
54## [[18]]
55## [1] 1.593738
56##
57## [[19]]
58## [1] 1.479865
59##
60## [[20]]
61## [1] 1.479865
62##
63## [[21]]
64## [1] 1.479865
65##
66## [[22]]
67## [1] 1.479865
68##
69## [[23]]
70## [1] 1.519868
71##
72## [[24]]
73## [1] 1.519868
74##
75## [[25]]
76## [1] 1.584298
77##
78## [[26]]
79## [1] 1.584298
80##
81## [[27]]
82## [1] 1.486607
83##
84## [[28]]
85## [1] 1.486607
86##
87## [[29]]
88## [1] 1.486607
89##
90## [[30]]
91## [1] 1.486607
sapply(X, FUN, ...)
Like lapply, but simplifies the output to the more simple data type... may return a vector instead of a list.
1sapply(Envi$Precip, sqrt)
2
3## [1] 1.519868 1.519868 1.519868 1.519868 1.476482 1.476482
4## [7] 1.476482 1.476482 1.486607 1.486607 1.486607 1.486607
5## [13] 1.593738 1.593738 1.593738 1.593738 1.593738 1.593738
6## [19] 1.479865 1.479865 1.479865 1.479865 1.519868 1.519868
7## [25] 1.584298 1.584298 1.486607 1.486607 1.486607 1.486607
apply(X, MARGIN, FUN, ...)
returns a vector or array or list of values obtatined by applying a function to margins (1 = rows, 2 = columns) of a data frame, matrix or array.
1apply(Envi, MARGIN=2, mean)
2
3## PCNM1 PCNM2 Precip MinT
4## -5.086263e-18 -6.604124e-17 2.309333e+00 3.710000e+01
5## VPDmin VPDmax PropPear PropFruit
6## 1.216000e+00 1.051000e+01 2.711236e+01 1.443854e+00
7## PropFor soft org
8## 9.862514e+00 3.333333e-01 3.333333e-01
tapply(X, INDEX, FUN = NULL)
Table apply, this apply statement requires an index values (object of the same length as X), and allows you to apply a function to a subset of your data.
1tapply(Envi$MinT, Envi$Precip, mean)
2
3## 2.18 2.19 2.21 2.31 2.51 2.54
4## 37.9 37.9 37.3 37.0 35.8 36.3
... and...
The multiple arguments apply function allows you to apply a function across multiple objects... takes several vectors, and applies the function to the first object of each, then the second object of each... can be different lengths
mapply(FUN, ...)
returns a vector or array or list of values obtatined by applying a function to margins of a matrix or array.
1x= 1:4
2y= 5:8
3z= 9:12
4mapply(sum, x, y, z)
5
6## [1] 15 18 21 24
MAP(FUN, ...)
is apparently similar and often preferable version, that does not simplify data structures or break lazy argument loading norms.