# R tutorial for Statistical Methods in Ecology and Evolution- ZOL851, CSE845 -MSU
# And for Bio720 (Bioinformatics) - McMaster
# Written by Ian Dworkin
# Last updated Sept 22nd, 2015
# TOC (a bit out of date now.)
# Section 1: What is R; R at the console; quitting R
# Section 2: R basics; R as a calculator; assigning variables; vectorized computation in R
# Section 3: pre-built functions in R
# Section 4: Objects, classes, modes - Note: should I add attributes?
# Section 5: The R workspace; listing objects, removing objects (should I add attach and detach?)
# Section 6: Getting Help in R
# Section 7: Using a script editor for R
# Section 8: Writing simple functions in R
# Section 8b: Using source() to call a set of functions
# Section 9: Regular sequences in R
# Section 10: Extracting (and replacing), indexing & subsetting (using the index). Can also be used for sorting.
#### Advanced stuff to learn on your own...
# ..... setting attributes of objects.... (names, class, dim )
# ..... environments (see ?environments)
#### Installing R
# go to
# http://www.r-project.org/
# click download and follow the instructions for your OS (see the PowerPoint slides).
# you may also wish to consider RStudio as an alternative interface to R
#################
#What is R, really.... (demonstration from the console)
#Where to find stuff in GUI R (navigating GUI-R)
#pull-down menus
##############
# How to close R
# to quit R
q() ##NOTE to MAC (OS X) users... this may no longer work (R V2.11.+ ) from the Mac R GUI... See below in OS X specific notes if you have any issues.
### NOTE
# for the moment when it asks you to save workspace image, say no.
#############
########### R Basics ############
# anything following the number sign (#) on a line is ignored by R
#R as a calculator
2+2
#creating variables in R
y = 2
# When you create a variable like this, it does not provide any immediate output.
# but when you type y and press return
y
#Note that the "[1]" is just an index for keeping track where the answer was put. It actually means that it is the first element in a vector.
# Note that R is case SENSITIVE that is y & Y are not the same
y
Y # this gives an error (object 'Y' not found), since only the lower case y has been created
#####
x = 3
x + y
z <- x+y
########
# You will notice that sometimes I am using "=" and sometimes "<-".
# These are called assignment operators. In most instances they are equivalent, but the "<-" is preferred in R, and can be used anywhere.
# You can look at the help file (more on this in a second) to try to parse the difference
?"="
# In The R GUI in Mac OS X if you hold down the alt/option key and then press "-" it will give you <-
#### We may want to ask whether a variable that we have computed equals something in particular
# for this we need to use "==" not "=" (one equals is an assignment, two means "equal to")
x == 3
x == 4
x == y
# what happens if we write
x = y
# we have now assigned the current value of y (2) to x. This also shows you that you can overwrite a variable assignment.
########
# standard mathematical operators apply
# * for multiplication
2*3
# / for division
6/3
# ^ for exponents. Can also use **
3^2
3**2 # same as above
# you can use "^0.5" or sqrt() function for square root
9^0.5
sqrt(9)
# to raise e to some exponent
exp(2) # this computes e^2
# natural log (base e)
log(2.7)
# To take the log with an arbitrary base, give the base as the second argument
log(2.7, 10) # base 10
# can also use log10() or log2() for base 10 or base 2.
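# A small sketch of how the log functions relate to each other (the numbers are chosen only because the answers are easy to check).
# log(x, base) gives the same answer as the base-specific shortcuts, and log() undoes exp().
log(100, base = 10) # 2
log10(100) # same as above
log2(8) # 3
log(exp(2)) # 2, since the natural log is the inverse of exp()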
########
#R is vectorized.. it can do its operations on vectors
a <- c(2, 6, 4, 5)
b <- c(2, 2, 2, 1)
#The c is short for combine (it concatenates its arguments into one vector)
#you can add the elements of the vectors together
a + b
#Or multiply the elements of the vector together (note this is NOT vector multiplication.. Those have special operators, i.e. %*%)
a * b
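# To make the difference concrete, here is a minimal sketch (vec_a and vec_b are names made up for this example, with the same values as a and b).
# "*" multiplies element by element, while "%*%" on two vectors of equal length gives their inner product as a 1 x 1 matrix.
vec_a <- c(2, 6, 4, 5)
vec_b <- c(2, 2, 2, 1)
vec_a * vec_b # 4 12 8 5
vec_a %*% vec_b # 29 (that is, 4 + 12 + 8 + 5), returned as a 1 x 1 matrix
sum(vec_a * vec_b) # the same number, as a plain vector of length 1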
#if you want to make one big vector out of them
ab <- c(a, b)
# how might you make a vector that repeats vector "a" 3 times ?
## MANY MANY operations can be vectorized, and R is really good at it!!! So play!
############## Simple functions in base R
#you can find out the length of the new vector
length(ab) # This uses one of R's pre-built functions
#length() is an example of a pre-built function in R. Most things in R revolve around using functions to do something, or extract something. We will write our own simple functions soon.
#here are some more common ones that you may use
mean(ab)
sum(ab)
sd(ab) # standard deviation
var(ab) # variance
cor(a, b) # Pearson correlation (there are options to change this to other types of correlations, among the arguments for this function.....)
# Say we want to keep the mean of ab for later computation; we can assign it to a variable
mean_ab <- mean(ab)
# We can look at the underlying code of a function by typing its name without the parentheses (although sometimes the code is buried, as it is for many built-in functions).
# so we can clearly add up all of the elements of the vector.
#we can also join the two vectors together to make a matrix
d <- cbind(a,b)
d
# double check that we really made a matrix
is.matrix(d) # returns a logical ("Boolean") value. In other words when we ask "is d a matrix" it answers TRUE or FALSE
mode(d)
class(d)
# while the mode of d is still numeric, the class is now matrix (from R 4.0.0 onwards class(d) reports both "matrix" and "array").
#### Make a new vector q that goes a,b,ab
##########Objects in R, classes of objects, mode of objects #########
#R is an object-oriented language. Everything in R is considered an object. Each object has one or more attributes (which we do not generally need to worry about, but useful for programming.)
# Most objects in R have an attribute which is the "class" of the object, which is what we will usually care about. R has a bunch of useful classes for statistical programming.
mode(ab) # mode of the object. The most basic (atomic?) feature. NOTE this does not mean the "mode" of a distribution
class(ab) # class of the object
# typeof(ab) # internal representation of type
mode(mean_ab) # type of object
class(mean_ab) #
# as we will see soon, mode and class are not always going to report back the same thing.
# mode and class are not the same thing. Modes are the basic "structures" for the objects (numeric, character, logical, list, function...). Class is a bit more complicated, and includes things like matrix and data.frame, but we will not get into it here.
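# A short sketch comparing mode(), class() and typeof() for a few objects (all of the object names here are made up for this example).
dbl_vec <- c(1.5, 2, 3)
int_vec <- 1:3
chr_vec <- c("a", "b")
mode(dbl_vec); class(dbl_vec); typeof(dbl_vec) # "numeric" "numeric" "double"
mode(int_vec); class(int_vec); typeof(int_vec) # "numeric" "integer" "integer"
mode(chr_vec); class(chr_vec); typeof(chr_vec) # "character" for all three
mode_demo_mat <- cbind(dbl_vec, dbl_vec)
mode(mode_demo_mat) # still "numeric"
inherits(mode_demo_mat, "matrix") # TRUE. This is a safer test than comparing class() to a single string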
########### Objects, classes
# There are many different classes of objects each with their own features.
# The basic ones that we will see are numeric, character, matrix, data.frame, formula, array, list & factor.
# There are many more that we will come across, as we start using some functions
# We can also make vectors that are not numeric
cities <- c("Okemos", "E.Lansing", "Toronto", "Montreal")
class(cities)
mode(cities)
# note this tells us how many strings we have in the object "cities" not the length of the string
length(cities)
nchar(cities) # This tells us how many characters we have for each string.
# So if we just do this
q = "okemos"
length(q)
nchar(q)
rivers <- c("Red Cedar", "Red Cedar", "Don Valley", "Sainte-Laurent")
cities_rivers <- cbind(cities, rivers)
cities_rivers
class(cities_rivers)
mode(cities_rivers)
# In this above example we have made a matrix, but filled with characters, not numerical values.
### Another type of object we will need for this class (eventually) is called formula
# Not surprisingly this is used generally to generate a formula for a statistical model we want to fit.
model_1 <- y ~ x1 + x2 + x1:x2 # note this is just the model formula, and we HAVE NOT FIT ANY MODEL YET!!!!!! It just tells us the model we want to fit. That is, the object model_1 has not yet been "evaluated"
model_1
#typeof(model_1)
class(model_1)
terms(model_1) # also see all.names() and all.vars()
# It will often be useful (and we will do this later on), to create a separate formula object.
#When we use lm(), glm(), mle2() or other model FITTING functions, the model formula is used during the fitting procedure.
# Usually we do not need to worry about this, or specify the model outside of the context of fitting.
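# As a rough sketch of how a stored formula gets used at fitting time (the data below are invented purely for illustration, and the object names are made up):
pretend_data <- data.frame(x1 = 1:6, x2 = c(2, 1, 4, 3, 6, 5), y = c(1.1, 2.3, 2.9, 4.2, 5.1, 5.8))
pretend_formula <- y ~ x1 + x2
all.vars(pretend_formula) # "y" "x1" "x2"
pretend_fit <- lm(pretend_formula, data = pretend_data) # only now is a model actually fit
class(pretend_fit) # "lm"
coef(pretend_fit) # one intercept and two slopes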
##################
############## Workspaces, and objects in them ###################
#R stores variables, datafiles, functions, vectors, etc in what is called the Workspace. This contains all of the items that you can access directly within your R session. You can list all of the objects in your workspace using:
ls()
# If you want to remove a particular variable (say x) use the rm() function
rm(x)
# you could remove multiple objects
rm(x,y,z)
# If you want to remove all of the objects in your workspace
rm(list=ls()) # We will learn what this means later, but basically we are making a list that contains all of the objects found by performing ls()
# Saving the workspace.
# Some people like to save their workspaces, not only because it contains all of the commands they have written, but also all of the objects they have created during that session.
# I personally do not do this unless I have created objects that have taken a long time to compute. Instead I just save the scripts I write (which I will show you in the next part).
# However if you write your commands directly at the console (like we have been doing) without a script editor, you should save your workspaces
save.image("file_name.RData")
# if you want to load it again
load("file_name.RData")
# you will need to have the correct working directory set, which I will show you how to do shortly.
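# Here is a minimal round trip you can try safely. It assumes nothing about your directories, because it writes to a temporary file.
# Note that it uses save() (which stores only the objects you name) rather than save.image() (which stores the whole workspace). The object names are made up for this example.
tmp_workspace <- tempfile(fileext = ".RData")
keep_me <- c(1, 2, 3)
save(keep_me, file = tmp_workspace)
rm(keep_me) # the object is now gone from the workspace
load(tmp_workspace) # and now it is back
keep_me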
########### GETTING HELP in R ################
#There are a number of places where you can get help with R directly from the console.
?lm
#This brings up a description of the function "lm"
?"*" # for help for operators use quotes
# sometimes you will need to use
help.search("lm")
#This brings up all references to the lm function in packages and commands in R. We will talk about packages later.
RSiteSearch("lm")
#This is quite a comprehensive search that covers R functions, contributed packages and R-help postings. It is very useful but uses the web.
# You can also use the html version of the help
help.start()
# or just go to the help menu
# Using the various help functions answer the following questions
#1
#2
#3
###########################
#### Writing everything at the console can be a bit annoying, so we will use a script editor. ########
# In Mac OS X I personally find the built-in script editor useful
# You can highlight the text in the script editor and press command (apple) + return to send it to the R console. Or place the cursor at the end of the line that you want to submit to R with command+ return.
# It also provides syntax highlighting, and shows the syntax & options for functions
# Note about using q() on the Mac R GUI in v2.11.+
# The programming team decided the default behaviour was potentially "dangerous", and people may lose their files, so they have changed it to command + q to quit instead. If you are an old-fogey like me and like to use q(), you have a couple of options.
base::q() # This will work, but it is annoying.
# you can set your .Rprofile to have the following line.
options(RGUI.base.quit=T)
# and the next time you run R the old q() will work.
# If you do not know how to create or edit .Rprofile, come speak with me...
# However, most of you are under the spell of Bill Gates....... While the basic script editor does not have much functionality, many people have written excellent script editors (show webpage)
# The base script editor in Windows will submit a line with ctrl+R
# There are many windows script editors with syntax highlighting (such as Tinn-R)
# For a list of some
# http://www.sciviews.org/_rgui/
# In general we will save R scripts with the extension .R
# now let's type something into our new script
x <- c(3,6,6,7)
# now highlight that line and press
#ctrl+r (windows)
#apple key + return (mac)
# This should send the highlighted portion to R
# This does not seem to work on all versions of Tinn-R anymore (alas)
# go to options - > shortcut customizations - > r_sending
# double click on it and bind it to something like alt+f1 (or whatever you want)
# input the following
x <- c(2, 2, 2, 2)
y <- c(3, 3, 3, 3)
z <- cbind(x, y)
z
# highlight it all and it should send it to R.
a <- c(x, y)
b <- c(x, x, y)
#################
############### Writing our own functions in R ###########
# we have now used a few built in functions in R (there are many).
# anything where you use "()" is a function.
# We will often want to compute something for which there is no pre-built function.
# Thankfully it is very easy to write our own functions in R. You should definitely get in the habit of doing so.
# functions have the following format
# aFunction <- function(input_variable_1, input_variable_2, argument, ...) { expressions to calculate }
# this is abstract so let me give you a real example
# We want to compute the standard error of the mean which is ~equal to the sd/sqrt(sample size). How might we do it?
# We want to compute it for the numeric vector a
# we could do it by hand
a <- c(1,2,3,5,7,3,2,5,2)
sd_a <- sd(a)
sample_a <- length(a)
sd_a/sqrt(sample_a)
# or we could do it in one line
sd(a)/sqrt(length(a)) # notice the function within a function
# but we can also do it so that we can use any vector input we wanted by writing a function
StdErr <- function(vector) {
  sd(vector)/sqrt(length(vector))
}
# now type StdErr
StdErr
# prints the function definition
# If you want to edit the function just type
edit(StdErr)
StdErr(a)
# gives the result
# we can now use this function for as long as we have this workspace open (and do not remove it)
StdErr(b)
# Exercise
# 1) Write your own function to do something simple, like calculate the coefficient of variation (CV) which is the sd/mean.
# 2) Write a function (or set of functions, with one nested within another) to....
# it takes some practice but learning "functional programming" can be extremely helpful for R.
# One thing to keep in mind, is that it is very easy to call one function from within another. It is generally considered good practice to write functions that do one thing, and one thing only. It is way easier to find problems (debug).
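# As one possible illustration of small functions calling each other (the function and object names are made up for this example):
SumSquares <- function(v) {
  sum((v - mean(v))^2) # does one thing: the sum of squared deviations from the mean
}
SampleVar <- function(v) {
  SumSquares(v)/(length(v) - 1) # calls SumSquares() rather than repeating its code
}
ss_check <- c(1, 2, 3, 5, 7, 3, 2, 5, 2)
SampleVar(ss_check)
var(ss_check) # the built-in function should give the same answer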
#####################
#################### 8b: Using source() to load your functions
# One of the great things about writing simple functions, is that once you have them working, you can keep using them over and over.
# However, it is generally a pain to have to include the text of the function in every script you write.
# instead, R has a function source() which allows you to "load" a script that contains functions you have written (and other options you may want), so that you can use them.
# For instance, I have written a little script that contains just two useful functions, for estimating the CV and SE (as discussed above). It is in a file called "useful_R_function_ID_2011.R".
# to use it I just call
source('/Users/ian/BEACON_COURSE_2011/BEACON R tutorial/Source_useful_R_function_ID_Feb24_2011.R')
# Once you have written functions that you "trust", you can make your own collections so that you can utilize them at your leisure....
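# If you want to see source() in action without worrying about paths, here is a self-contained sketch. The file is a temporary one and DoubleIt() is a made-up function.
tmp_script <- tempfile(fileext = ".R")
writeLines("DoubleIt <- function(v) { 2 * v }", tmp_script) # write a one-line script to disk
source(tmp_script) # runs the script, which defines DoubleIt() in the workspace
DoubleIt(c(1, 2, 3)) # 2 4 6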
#exercises
### Open a new editor window (gedit, notepad, notepad++, or the R text editor)
# write a few functions (or copy the ones you have written)
# Save it as Your_name_Source_identifier.R
# now load it
source('copy your directory here')
######### Regular Sequences #############
# Sometimes we want regular sequences or to create objects of repeated numbers or characters. R makes this easy.
# If you want to create regular sequences of integers by units of 1
one_to_20 <- 1:20
one_to_20
twenty_to_1 <- 20:1
twenty_to_1
# for other more complicated sequences, use the seq() function
seq1 <- seq(from=1,to=20,by=0.5)
seq1
#or
seq1 <- seq(1,20,0.5)
seq1
# this shows that when you supply the arguments in their default order you do not need to name them (things like "from" or "by")
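# A quick sketch of the same idea: named arguments can come in any order, while unnamed ones are matched by position.
# length.out is another seq() argument, which can be used in place of "by". (The object names are made up for this example.)
seq_named <- seq(to = 20, from = 1, by = 0.5)
seq_positional <- seq(1, 20, 0.5)
identical(seq_named, seq_positional) # TRUE
seq(0, 1, length.out = 5) # 0.00 0.25 0.50 0.75 1.00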
##### Exercise: Make a sequence from -10 to 10 by units of 2
#### What if you want to repeat a number or character a set number of times?
many_2 <- rep(2, times=20)
# works for characters as well
many_a <- rep("a", times = 10)
#We can even use this to repeat whole vectors
seq_rep <- rep(20:1, times = 2)
seq_rep
seq_rep <- rep(one_to_20, 3)
# What if you wanted to repeat a sequence of numbers (1,2,3) 3 times?
rep_3_times <- rep(c(1,2,3), times=3)
# or
rep(1:3, times=3)
#let's say you wanted to create a grouping variable for habitat: the first 5 observations are from lakes, the second set of 5 observations are from rivers. Here is one way (another way to generate levels of a factor is the function gl() )
# Here we will use the "each" option
lakes_rivers <- rep(c("lake", "river"), each=5)
lakes_rivers
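# To see what "each" changes, compare it with "times". Note also that rep() on strings returns a character vector; wrapping it in factor() is one way to turn it into a factor.
# (habitat_chr and habitat_fac are names made up for this example.)
rep(c("lake", "river"), times = 2) # "lake" "river" "lake" "river"
rep(c("lake", "river"), each = 2) # "lake" "lake" "river" "river"
habitat_chr <- rep(c("lake", "river"), each = 5)
habitat_fac <- factor(habitat_chr)
class(habitat_chr) # "character"
class(habitat_fac) # "factor"
levels(habitat_fac) # "lake" "river"
table(habitat_fac) # 5 observations of each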
# what if we wanted to perform this to create a matrix
matrix(rep(20:1,4),20,4)
### Exercise...
# now we have all the tools we need to build some data sets from scratch (which is helpful for simulations)
######### Indexing, extracting values and subsetting from the objects we have created ##########
# often we will want to extract certain elements from a vector, list or matrix. Sometimes this will be a single number, sometimes a whole row or column.
# We index in R using [ ] (square brackets)
a <- 1:20
b <- 5*a
a
b
length(a)
length(b)
#If we want to extract the 5th element from "a"
a[5]
# if we want to extract the 5th and 7th element from "b"
b[c(5,7)]
# if we want to extract the fifth through 10th element from "b"
b[5:10]
# how about if we want all but the 20th element of "a"?
a[-20]
# indexing can also be used when we want all elements greater than (less than etc...) a certain value
b[b > 20]
# or between certain numbers
b[b > 20 & b < 80]
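# What is going on here is that the comparison itself produces a logical (TRUE/FALSE) vector, and that vector does the indexing.
# A small sketch (idx_demo and keep are names made up for this example):
idx_demo <- 5 * (1:20)
keep <- idx_demo > 20 & idx_demo < 80
keep # FALSE for the first four elements, TRUE for elements 5 through 15, FALSE afterwards
idx_demo[keep] # 25 30 35 ... 75
which(keep) # the positions of the TRUEs: 5 through 15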
# Exercise
# generate a vector with 20 elements
# create a "sub" vector that has elements 1:5, 16:20
# create a "sub" vector with odd elements 1,3,5,...,19
# Indexing for matrices
d <- a+b
q_matrix <- cbind(a,b,d) #cbind "binds" column vectors together into a matrix (also see rbind)
q_matrix
# what happens if we ask for the length of q_matrix?
# we can instead ask for number of rows or columns
nrow(q_matrix)
ncol(q_matrix)
# it is more useful perhaps to find out the dimensions of the matrix
dim(q_matrix) # R always specifies in row by column format.
# now say we want to extract the element from the 3rd row of the second column (b)
q_matrix[3, 2]
# how about if we want to extract the entire third row?
q_matrix[ 3, ]
# how about the second column?
# we can also pull things out by name
q_matrix[ ,"d"] # This is an example of indexing via "key" instead of numerical order
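# A few more matrix indexing patterns, sketched on a small made-up matrix (small_mat is not one of the tutorial's objects):
small_mat <- cbind(a = 1:4, b = 5 * (1:4))
small_mat[2:3, ] # rows 2 and 3, all columns
small_mat[small_mat[, "b"] > 10, ] # only the rows where column b exceeds 10 (rows 3 and 4)
small_mat[1, ] # a single row is simplified to a plain vector
small_mat[1, , drop = FALSE] # drop = FALSE keeps it as a 1 x 2 matrix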
######################## A few thoughts about indexing for those used to Python (if you are not, ignore this)
# R indexing begins at 1 (not 0 like Python)
# Negative values of indexes in R mean something very different. for instance
a[-1] # this removes the first element of a, and prints out all of the remaining elements.
# As far as I know all classes of objects are mutable, which means you can write over the name of the objects, values within the objects, and slots....
# Indexing on a character string does not work in R
string_1 <- "hello world"
string_1[1] # this returns the whole string, since string_1 is a vector with a single element
# instead you need to use the substr() function
substr(x=string_1, start=1,stop=1)
# similarly
length(string_1) # this gives an output of 1
nchar(string_1) # this gives the 11 characters
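# A short sketch of two ways to get at individual characters (greeting and letters_in_greeting are made-up objects):
greeting <- "hello world"
substr(greeting, start = 1, stop = 5) # "hello"
letters_in_greeting <- strsplit(greeting, split = "")[[1]] # a character vector with one element per character
letters_in_greeting[1] # "h"
length(letters_in_greeting) # 11, the same as nchar(greeting)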
#################################
################ subsetting data sets.....
#Subsetting by indexing
#Subsetting by using subset()
# For more advanced data manipulations (including data sorting), see the reshape library, and the book by Phil Spector (Data manipulation in R. See resources on ANGEL for a link)
#### Accessing values in objects
# The at sign "@" is used to extract the contents of a slot in an object. We will not use it much for this class, but it is essential for object oriented programming in R.
# objectName@slotName
# The dollar sign "$" is used to extract elements of an object. We will use this a lot to extract information from objects (such as information from our models, like coefficients)
# object.name$element.name
# For more information ?"$"
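# A minimal sketch of both operators. The list, the class "DemoPoint" and its slot px are made up for this example; cars is a data set that ships with R.
demo_list <- list(slope = 2, label = "pretend")
demo_list$slope # 2
demo_fit <- lm(dist ~ speed, data = cars)
demo_fit$coefficients # the same values that coef(demo_fit) returns
methods::setClass("DemoPoint", slots = c(px = "numeric")) # a tiny S4 class with a single slot
demo_point <- methods::new("DemoPoint", px = 1.5)
demo_point@px # 1.5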
######## Ok let's take a break
#### A few advanced topics... For your own amusement (not necessary for this class, but helps for more advanced R programming)
### Setting attributes of objects.
# Objects have attributes. The one we have thought about most is the class of the object, which tells us (and R) how to think about the object, and how it can be used or manipulated (methods).
# We have also looked at dim() which is another attribute
# Here is a list of common ones:
# class, comment, dim, dimnames, names, row.names and tsp
#
# We can set attributes of objects in easy ways like
x <- 4:6
names(x) <- c("observation_1", "observation_2", "observation_3")
x
# you can see the attributes in a bunch of ways
str(x)
attributes(x)
attr(x, "names") # Same as above, but we will be able to use this to set attributes of the object x as well
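# For example, attr() on the left-hand side of an assignment sets an attribute. A sketch (obs is a made-up object, and "units" is an arbitrary attribute name chosen for illustration, not something R treats specially):
obs <- 4:6
attr(obs, "names") <- c("o1", "o2", "o3") # same effect as names(obs) <- ...
obs["o2"] # 5, indexing by name
attr(obs, "units") <- "mm"
attributes(obs) # now lists both names and units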
y <- cbind(1:5, 11:15)
attributes(y)
colnames(y) <- c("vec1", "vec2")
comment(y) <- c("the first column is pretend data", "the second column is yet more pretend data ")
str(y)
attributes(y)
# Generic functions and methods
# calling a function like summary() will do very different things for different object classes. We will use this call a lot for data frames and output from statistical models
summary(x) # numeric vector
summary(string_1) # character string
# the call to summary() is generic, which first looks at the class of the object, and then uses a class specific method
# for x
summary(x)
summary.default(x)
# but..
try(summary.lm(x)) # this gives an error, since summary.lm() is looking for an object of class lm
methods(summary) # to see all of the methods used when you call the generic summary() for S3 classes.
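# To see the dispatch in action, here is a sketch that defines a summary method for a made-up class ("pretend_class" and everything inside it are invented for this example).
pretend_obj <- structure(list(values = c(1, 2, 3, 10)), class = "pretend_class")
summary.pretend_class <- function(object, ...) {
  cat("pretend_class object with", length(object$values), "values\n")
  invisible(mean(object$values)) # return the mean without printing it
}
summary(pretend_obj) # summary() finds summary.pretend_class() because of the class attribute
pretend_mean <- summary(pretend_obj)
pretend_mean # 4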
### Need to in the future add material on S3 vs S4 classes.... But for now read on your own.
# Style guide for my class
# http://www.msu.edu/~idworkin/ZOL851_style_guide.html
# R style guide (from Google)
# http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html