-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path02-objects.Rmd
More file actions
571 lines (409 loc) · 22.8 KB
/
02-objects.Rmd
File metadata and controls
571 lines (409 loc) · 22.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
# Managing objects {#objects}
After completing the introductory chapter you now know how to
* initialize an R session;
* save your workspace;
* open an existing project;
* execute simple tasks in R to obtain numerical, text or graphical results;
* obtain help.
You know also that everything in R can be considered as some kind of an object. In this chapter the focus is on what properties the different objects have and how to manage objects in the workspace.
## Instructions and objects in R
### General
Recall that
* instructions are separated by a semi-colon or start on new lines;
* the `#` symbol marks the rest of the line as comments;
* the default R (primary) prompt is `>`; the secondary default prompt is `+`;
* use of `<-` to create objects. (The equality sign (`=`) will also be accepted. However, avoid this practice and use
+ `=` only for function arguments;
+ `<-` for assignment;
+ `==` for comparison / control structures);
* the use of `->` for assigning left-hand side to the name on right-hand side.
* the use of function `assign()` for assigning names to objects. (to be discussed in detail in Chapter \@ref(operators))
##### Examples {-}
```{r}
aa <- 1:10
```
Assigning numeric vector to name "aa". Assignment takes place in global environment.
```{r}
aa <- seq(from = 1,to = 10,by = 0.01); yy <- c("a","b","c")
c("a","b","c") -> bb
```
Assigning character vector to name "bb".
```{r}
assign("aa", rnorm(10), pos = 1)
```
Note the use of the argument `pos`, " " or ' ' are used for characters. Be careful when mixing single quotes and double quotes. See below.
```{r, quotes1, error = 0}
c("u",'v',"'w'",""x"",'"y"',''z'') -> cc
```
```{r, quotes2, error = 0}
c("u",'v',"'w'",'"x"','"y"',''z'') -> cc
```
```{r, quotes3, error = 0}
c("u",'v',"'w'",'"x"','"y"','z') -> cc
cc
```
* Explain error message above.
* Explain backslash above.
```{r, objects, error = 0}
objects()
aa
bb
objects()[3]
parse(text=objects()[3])
eval(parse(text=objects()[3]))
rm(a,b)
rm(aa,bb)
objects()
rm("cc")
objects()
```
### Objects in R
(a) Everything is an object but there are many different types of objects.
(b) Study and also take note of the following *<span style="color:#FF9966">naming conventions</span>*:
* Allowed are upper or lower case letters, numbers 0 – 9, full stop(s) and underscore(s).
* Must not begin with a number.
* R is case sensitive i.e. `John` and `john` refer to different objects.
* Use full stops (periods) or underscores to break up a name into meaningful words.
* Avoid `c`, `s`, `t`, `C`, `F`, `T`, `diff` as well as other reserved words for naming an object.
(c) The use of the functions `conflicts()` and `find()` when naming objects. The instruction `conflicts (detail = TRUE)` outputs details on whether and where objects with identical names exist on the search path e.g.
```{r, conflicts}
conflicts(detail=TRUE)
```
The instruction `find ("object")` outputs details on whether and where objects with the name object exist on the search path e.g.
```{r, find}
find("kronecker")
```
(d) Objects can possess several attributes e.g.
* mode (The way an object is internally stored)
* length
* names
* dim
* class
#### Examples {-}
```{r, objectTypes}
a <- 1:10
class(a)
b <- factor(c("a","b","c"))
class(b)
b
mode(a)
mode(b)
length(a)
length(b)
dim(a)
mat <- matrix(1:12,nrow=4)
mat
dim(mat)
mode(mat)
logic <- c(TRUE,TRUE,FALSE,TRUE)
mode(logic)
class(logic)
```
<div style="margin-left: 30px; margin-right: 20px;">
Levels show that it is a categorical variable (object).
Mode `numeric` tells us that the categorical variable (object) `b` is internally stored as a set of numeric codes.
</div>
(e) Special attention is given to the class and mode of integers. An object of type integer is stored internally more effectively than an integer represented in double format.
```{r, integer, collapse = TRUE}
x <- 5
y <- 5L
typeof(x)
typeof(y)
class(x)
class(y)
mode(x)
mode(y)
```
(f) Objects in R are *<span style="color:#FF9966">vectors</span>*, *<span style="color:#FF9966">functions</span>* or *<span style="color:#FF9966">lists</span>*. There are no scalars - instead vectors of length one are used. In addition to the above three types, there are several other types of objects.
(g) Objects that are created during a session are permanently stored in the *<span style="color:#FF9696">.RData</span>* file in the folder containing the *<span style="color:#3399FF">workspace</span>* (unless not saved at termination).
(h) Objects that are created within a function exist only for as long as the function is being executed.
(i) Use of `rm()` and `rm(list = ListOfNames)` to remove objects from the *<span style="color:#3399FF">workspace</span>*.
(j) Use of `objects()` or equivalently `ls()` to obtain a list of object names in a data base (by default the *<span style="color:#3399FF">workspace</span>*). Note the optional arguments `pos`, `all.names` and `pattern` to specify which database to be considered and what object names to include.
(k) How can an object be printed to the screen?
(l) *<span style="color:#FF9966">Warning:</span>* If a new object is assigned to a name that already exists in the working directory the old object is overwritten without warning and it cannot be retrieved again.
### Data in R
(a) R has several built-in data sets. Use `?datasets` and/or `library(help= "datasets")` for details. Note that the two instructions return different information.
(b) Study the help file of `c()`.
(c) Study the help file of `scan()`.
(d) Study the help files of `read.table()` and `read.csv()`. Care must be taken with data containing characters (text) and categorical variables. Reading data into R will be discussed in detail in Chapter \@ref(data).
### Generation of data
Study the operators and functions `:`, `seq()`, `rep()`, `rev()`, `rnorm()`, `runif()` with the following instructions:
```{r, eval = FALSE}
1:10
8:3
seq(from=1, to=10, length=10)
seq(from=2, to=10, length=5)
rev(10:1)
rnorm (20, mean=50, sd=5)
runif (10, min=1, max=3)
```
The function `rmvnorm()` for generating multivariate normal samples is in the `mvtnorm` R package. This package must first be loaded by using the instruction
```{r, eval = FALSE}
library(mvtnorm)
```
Alternatively, for generating multivariate normally data there is also a function `mvrnorm()` in R package `MASS`.
## Introduction to functions in R
We introduced R functions in section \@ref(FunctionIntro). The basic structure of an R function is as follows:
```{r, eval = FALSE}
func.name <- function(list of arguments)
{
# R code
}
```
When the function `func.name()` is called, the code in `{ }` is executed.
The arguments of a function can be inspected by using the command
```{r, eval = FALSE}
args(name of function)
```
The function `str(x)` provides information on the object `x`. If `x` is a function its output is similar to that of `args()`. Default values are given to function arguments using the construction (`argument name = value`). It is good programming practice to make extensively use of comments to describe arguments and / or what a particular chunk of code does.
What is the usage of the following function:
```{r, cubeFunc}
cube <- function(a) a^3
```
In the above function the argument a is called a *<span style="color:#FF9966">dummy argument</span>*. What will happen to an object `a` in the working directory?
Functions are called by replacing the *<span style="color:#FF9966">formal arguments</span>* by the *<span style="color:#FF9966">actual arguments</span>*. This can be done *<span style="color:#FF9966">by position</span>* or *<span style="color:#FF9966">by name</span>*. *Hint*: It is less error prone to call functions using named arguments. Create the following function
```{r, Demofunc}
Demofunc <- function(vec = 1:10, m,k)
{ # Function to subtract a specified constant from
# each element of a given vector and after subtraction
# divide each element by a second specified constant.
# The result of the above transformation is returned.
(vec - m)/ k
}
```
Execute the following function calls and explain the output
```{r, DemofuncCall, error = TRUE}
Demofunc(3, 2, 5)
Demofunc(2,5)
Demofunc(m = 2, k = 5)
Demofunc(m = 2, k = 5, vec = 1:100)
```
Note the use of `prompt()` and `package.skeleton()` to provide a new function with a help-file.
The final expression in an R function is automatically returned when the function completes execution.
```{r, funcReturn}
my.func <- function(a=5)
{ a+2
}
my.func()
```
When a function consists of a single line, it can be written more succinctly
```{r, funcReturn2}
my.func <- function(a=5) { a+2 }
my.func()
```
or even without the `{ }`:
```{r, funcReturn3}
my.func <- function(a=5) a+2
my.func()
```
In general, functions will consist of more lines of code and often multiple outputs are returned. If only a single output object needs to be returned, the object can be created in the last line of the code
```{r, funcReturn4}
my.func <- function(a=5)
{ number <- (a+3)^2
number/a
}
my.func()
```
or with a `return()` statement:
```{r, funcReturnReturn}
my.func <- function(a=5)
{ number <- (a+3)^2
return(number/a)
}
my.func()
```
In general, all the outputs are combined and returned as a `list`. The final expression in the function creates the list object:
```{r, funcReturnList}
my.func <- function(a=5)
{ number <- (a+3)^2
list(number/a)
}
my.func()
```
To return multiple outputs, the list is simply extended as shown below:
```{r, funcReturnMultiple}
my.func <- function(a=5)
{ number <- (a+3)^2
list(number, number/a)
}
my.func()
```
It is good practice to name the output objects in the list, such as:
```{r, funcReturnNamed}
my.func <- function(a=5)
{ number <- (a+3)^2
list(number = number, ratio = number/a)
}
my.func()
```
Finally, to place the output into an object for further processing, the function is assigned to an object name:
```{r, funcReturnObject}
my.func <- function(a=5)
{ number <- (a+3)^2
list(number = number, ratio = number/a)
}
out <- my.func()
out
```
## How R finds data { #findData }
In order to understand how objects are found by R it is necessary to have some understanding of the concepts
* Environment
* Frame
* Search path
* Parent environment
* Inheritance.
The mechanism that R uses to organize objects is based on frames and environments. A *<span style="color:#FF9966">frame</span>* is a collection of named objects and an *<span style="color:#FF9966">environment</span>* consists of a frame together with a pointer or reference to another environment called the *<span style="color:#FF9966">parent environment</span>*. Environments are nested so that the *<span style="color:#3399FF">parent environment</span>* is the environment that directly contains the current *<span style="color:#3399FF">environment</span>*. At the start of an R session a *<span style="color:#3399FF">workspace</span>* is created which always has an associated environment, the *<span style="color:#3399FF">global environment</span>*. The global environment occupies the first position on the *<span style="color:#FF9966">search path</span>* and is accessed by a call to `globalenv()`. Packages and databases can be added to the search path by a call to `attach()` and removed from the search path by a call to `detach()`.
* What is an R *<span style="color:#FF9966">package</span>*? What is the difference between *<span style="color:#FF9966">installing</span>* and *<span style="color:#FF9966">loading</span>* a package?
* Work through the following example:
```{r, search}
search()
```
To attach the package `MASS`
```{r, attach}
library (MASS)
```
By default `MASS` is attached in the second position in the search path.
```{r, attachSearch}
search()
```
We use `detach` to remove `MASS` from the search path.
```{r, detach}
detach("package:MASS")
search()
```
To obtain the parent of the global environment
```{r, parent}
parent.env(.GlobalEnv)
parent.env(parent.env(.GlobalEnv))
parent.env(parent.env(parent.env(.GlobalEnv)))
environmentName(parent.env(parent.env(parent.env(.GlobalEnv))))
```
When the R evaluator looks for an object and it cannot find the name in the global environment it will search the parent of the global environment. It will carry on the search along the search path until the first occurrence of the name. If the name is not found it will return the message `Error: object 'xx' not found`. The usage of the double colon `::` and the triple colon `:::` is to access the intended object when more than one object with the same name exist on the search path. These two operators use the *<span style="color:#3399FF">namespace</span>* facility of R packages. The *<span style="color:#FF9966">namespace</span>* of a package allow the creator of a package to hide functions and data that are meant only for internal use; it provides a way through the operators `::` and `:::` to an object within a particular package. Thus a namespace prevent functions from breaking down when a user selects a name that clashes with one in the package. The double-colon operator `::` selects objects from a particular namespace. Only functions that are exported from the package can be retrieved in this way. The triple-colon operator `:::` acts like the double-colon operator but also allows access to hidden objects. Packages are often inter-dependent, and loading one may cause others to be automatically loaded. Such automatically loaded packages are not added to the search list.
We note that the *<span style="color:#FF9966">function</span>* call `getAnywhere()`, which searches multiple packages can be used for finding hidden objects. When a function is called, R creates a new (temporary) environment which is enclosed in the current (calling) environment. Objects created in the new environment are not available in the parent environment and dies with it when the function terminates. Objects in the calling environment are available for use in the new environment created when a function is called.
Similarly, when an *<span style="color:#FF9966">expression</span>* is evaluated a hierarchy of environments is created. Search for objects continue up this hierarchy and if necessary to the global environment and from there up onto the search path.
* Study the use of the arguments `pos`, `all.names`, and `pattern` of the function `objects()`.
* Study the behaviour of the functions `conflicts()` and `exists()` in the examples below:
```{r, conflictsExists}
conflicts()
conflicts(detail=TRUE)
exists("kronecker")
exists("kronecker", where = 1)
exists("kronecker", where = 1, inherits = FALSE)
exists("kronecker", where = 2)
exists("kronecker", where = 2, inherits = FALSE)
exists("kronecker", where = 7, inherits = FALSE)
exists("kronecker", where = 8, inherits = FALSE)
exists("kronecker", where = 9, inherits = FALSE)
```
* Study the above code carefully and then explain what inheritance does.
* The example below leads to the same conclusion as above but is more complicated at this stage. Its behaviour will become clear as we work through the coming chapters.
```{r, inherits}
sapply(search(), function(x) exists("kronecker", where = x, inherits=FALSE))
```
* Direct access to objects down the search path can be achieved with the function `get()`.
The function `get()` takes as its first argument the name of an object as a character string. The optional argument `pos` can be used to specify where on the search list to look for the object. As an illustration explain the outcomes of the following function calls:
```{r, getExample, error = TRUE}
get ("%o%")
mean <- mean (rnorm (1000))
get (mean)
get ("mean")
get ("mean", pos = 1)
get ("mean", pos = 2)
rm (mean)
```
* Instead of attaching databases the function `with()` is often to be preferred. Discuss the usage of `with()` by referring to the instructions:
```{r, with}
with (beaver1, mean(time))
with (beaver2, mean(time))
```
## The organisation of data (data structures)
Study the help files of `list()`, `matrix()`, `data.frame()` and `c()` carefully.
A *<span style="color:#FF9966">list</span>* is created with the function `list()`. A list is the basic means of storing a collection of data objects in R when the modes and/or lengths of the objects are different. List elements are accessed using `[[ ]]` or `$` when the objects are named. List objects are named using the construction
```{r, namedList}
my.list <- list(name1 = 1:10, name2 = mean)
my.list
```
and elements are retrieved using the instruction
```{r, listElement}
my.list[[2]]
my.list$name2
```
A *<span style="color:#FF9966">matrix</span>* in R is a rectangular collection of data, all of the same mode (e.g. numeric, character/text or logical). It is formed with the construction
```{r, createMatrix}
my.matrix <- matrix(1:12, ncol=3, nrow=4, byrow=FALSE)
my.matrix
```
Matrix elements are accessed using `my.matrix[i,j]`. The functions `nrow()`, `ncol()`, `dim()`, `dimnames()`, `colnames()` and `rownames()` are useful when working with matrices.
A *<span style="color:#FF9966">dataframe</span>* is also a rectangular collection of data but the columns can be of different modes. It can be regarded as a cross between a list and a matrix. Dataframes are constructed with the function `data.frame()`.
Study the help files of the above functions.
## Time series
Study the usage of the function `ts()`.
## The functions `as.xxx()` and `is.xxx()`
The function `as.xxx()` transforms an object as best as possible to a specified type e.g. `as.matrix(mydata)` transforms the numerical dataframe to a numerical matrix. `is.xxx()` tests if the argument is of a certain type e.g. `is.matrix(mydata)` evaluates to false if `mydata` does not satisfy all the conditions of a matrix.
## Simple manipulations; numbers and vectors
* Explain vector calculations and the recycling principle by referring to the example below.
```{r, recycling}
c(1,3,5,9) + c(1,2,3)
```
* Logical vectors. Explain the behaviour of the instruction below
```{r, logicalArith}
sum (c (TRUE, FALSE, TRUE, TRUE, FALSE))
```
* Missing values: `NA` (indicate a missing value in the data), `NaN` (not a number)
```{r, missing, error = TRUE}
10/0
0/0
```
* Character vectors: see section \@ref(character)
* Subscripting vectors: see section \@ref(vectorSubscripting)
## Objects, their modes and attributes
* Vector elements must be of same mode: logical, numeric, complex, character
* Empty object; once created (e.g. `xx <- numeric()`) components may be added (e.g. `xx[5] <- 22`)
* Getting and setting attributes: The functions `attr()` and `attributes()`
* Class of an object and the function `unclass()` for removing class.
## Representation of objects
We have already seen that a representation of an object can be obtained by calling (entering) its name:
```{r, showObject}
cars
```
It is often not convenient to have a full representation returned of an object as above. The functions `head()`, `str()` and `summary()` are available for extracting a partial representation of an object:
```{r, showShort}
head(cars)
summary(cars)
str(cars)
```
There are many more R functions provided for getting information of what an R object represents. Some of these functions like `mode()`, `class()`, `length()`, `levels()`, `is.xxx()` and `as.xxx()` have already been encountered and others will be given in the chapters to come.
```{r, showDetail}
length(cars)
length(as.matrix(cars))
dim(cars)
is.matrix(cars)
is.data.frame(cars)
is.list(cars)
mode(cars)
class(cars)
levels(cars)
```
## Exercise
::: {style="color: #80CC99;"}
### Exercise
According to the central limit theorem (CLT) the distribution of the sum (or mean) of independently, identically distributed stochastic variables converges to a normal distribution with an increase in the number variables. The binomial distribution can be expressed as the sum of independently, identically distributed Bernoulli stochastic variables and therefore converges in distribution to the normal distribution. The lognormal distribution in contrast cannot be expressed as a sum.
Make use of the function `rbinom()` to generate a sample of size 10 from a binomial distribution modelling 20 coin flips with a probability of $0.4$ for returning “heads”. Use the function `hist( , breaks=(0:21)-0.5)` to graph the results. Repeat with sample sizes $50$, $100$, $1000$, $10000$ and $500000$.
Repeat the whole study with a success probability of $0.5$, $0.25$ and $0.1$. Discuss your findings.
Now repeat the same exercise using (a) the lognormal distribution with the function `rlnorm()` and (b) the uniform distribution over the interval $[10; 25]$ with the function `runif(min = 10, max = 25)`. Comment on your findings.
### Exercise
Assume that a random sample of size $n$ is available from a certain distribution. A bootstrap sample is obtained by sampling, with replacement, a sample of size $n$ from the given sample. One of the uses of the bootstrap is to obtain an estimate of the standard error of a statistic. For example, a bootstrap estimate of the standard error of $\bar{X}$ can be obtained as follows:
* Generate $B$ bootstrap samples independent of each other.
* Calculate the mean of the B bootstrap samples, i.e. calculate $\bar{x}_1^*, \bar{x}_2^*, \dots, \bar{x}_B^*$.
* Calculate $\bar{\bar{x}} = \frac{1}{B} \sum_{i=1}^{B}{\bar{x}_i^*}$.
* Calculate $\hat{se}(b) = \sqrt{\frac{1}{B-1} \sum_{i=1}^{B}{(\bar{x}_i^*-\bar{\bar{x}})^2}}$.
(a) Generate a random sample of size $25$ from a $normal (100; 255)$ distribution.
(b) Use R to obtain graphical representations and statistics of the characteristics of the sample.
(c) Program the necessary instructions in R to obtain bootstrap estimates of the standard error of the sample mean as well as the sample median. Use $50$, $100$, $500$ and $1000$ for $B$ (the number of bootstrap repetitions). How do your answers compare with what is theoretically expected?
(d) Program the necessary R instructions to obtain graphical representations of the bootstrap distribution in (c).
### Exercise
Generate a random sample of size $50$ from a multivariate normal distribution with mean vector $(118, 396, 118, 400)$ and a covariance matrix so that the variances of the variables are given by $778$, $1810$, $580$ and $2535$ respectively. Variables 1 and 2 have a covariance of $-642.5$ and variables 3 and 4 have a covariance of $-670$. The other variables are uncorrelated. Store the sample as a matrix object and then program the necessary R instructions to calculate the sample covariance matrix and sample mean vector.
### Exercise
Execute the instruction `set.seed(101023)`.
Next, obtain $400$ random $normal (0; 1)$ values and arrange them in a matrix with $20$ rows and $20$ columns. Finally, write an R function to calculate and return (i) the sum of all the elements in the matrix, (ii) the eigenvalues of the matrix, (iii) the inverse of the matrix ( Only provide the first three rows and first three columns, e.g. `my.inv [1:3, 1:3]`) as well as (iv) the rank of the matrix <span style="text-decoration:underline">making use of the eigenvalues</span>.. *Hint*: Read the help of the functions `eigen()` and `solve()`.)
:::