More R Key Objects
This page has the following sections:
What is an object?
Attributes
Other Characteristics
Atomic Types
Matrices and Data Frames
Factors
Missing values
The best tool
Functions are objects too
The word “vector”
What is an object?
A characterization of “object” that is at least close to true is:
If you can assign something a name, then it is an object. If you cannot assign something a name, then it is not an object.
In normal speech we think of an object as something that we can hold, turn upside down, and look at. R objects are very much like that.
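Here is a small sketch of that idea (the names nums, words, nothing and avg are arbitrary, made up for this example). Numbers, character strings, NULL and even a function can each be given a name with <-, so each of them is an object.

```r
# Each right-hand side can be assigned a name, so each is an object.
# The names used here are arbitrary examples.
nums    <- c(2.5, 7, 11)
words   <- c("rice", "beans")
nothing <- NULL
avg     <- mean          # a function can be given a new name too

avg(nums)                # the same as mean(nums)
```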
Attributes
The main reason R has such a wide variety of data types is attributes. An object has a main part, and it can also have one or more attributes that modify how R or the user thinks about the object.
You can have a plain bowl of rice and beans. If you add spice, it is something new. Different spice, different dish.
Attributes are spice.
A very common attribute is names. The elements of an atomic vector can each have a name. The components of a list can each have a name.
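A quick sketch with made-up names and values. Names can go on the elements of an atomic vector or on the components of a list, and attributes shows that they are stored as an attribute.

```r
# An atomic vector whose elements have names
prices <- c(rice = 1.2, beans = 0.8)
names(prices)            # "rice" "beans"
prices["beans"]          # elements can be selected by name

# A list whose components have names
dish <- list(base = "rice", extra = c("beans", "chili"))
names(dish)              # "base" "extra"

attributes(prices)       # names is stored as an attribute
```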
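Another very important attribute is class. It tells R what kind of thing the object is, and functions such as print and summary use it to decide how to treat the object.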
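Dates make a handy illustration of class as spice (chosen here just for illustration). The main part of a Date is only a number of days since 1970-01-01; the class attribute is what makes R print and treat it as a date.

```r
d <- as.Date("2024-01-15")     # an arbitrary example date
class(d)                       # "Date"
unclass(d)                     # the main part: 19737 days since 1970-01-01

# Take the same plain number, add the class attribute, and it is a date again
plain <- 19737
class(plain) <- "Date"
format(plain)                  # "2024-01-15"
```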
Other Characteristics
Some characteristics are inherent in every object.
Objects have a “length”. The length of an atomic vector is the number of elements it has. The length of a list is the number of components that it has. The length of NULL is zero.
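A sketch of those three cases; the values are arbitrary.

```r
length(c(10, 20, 30))            # 3 elements
length(list(1:5, "a", NULL))     # 3 components, whatever their own sizes
length(NULL)                     # 0
```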
Objects have a “mode”. This says what kind of object they are. The mode function tells you the mode of an object. The typeof function is slightly more specific: for example, mode says that 1:3 is "numeric" while typeof says it is "integer".
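A sketch comparing the two functions on a few arbitrary objects (the list names int, dbl, fun and lst are just labels for this example).

```r
objs <- list(int = 1:3, dbl = 2.5, fun = sum, lst = list(1))

sapply(objs, mode)     # numeric, numeric, function, list
sapply(objs, typeof)   # integer, double, builtin, list
```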
Atomic Types
There are three atomic types that you are likely to care about.
Numeric objects hold numbers.
See More R Numbers.
Logical objects have values that are TRUE, FALSE and NA.
Character objects have a string as each element.
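A minimal sketch with one object of each atomic type (the values are made up). NA fits into all three.

```r
num <- c(1.5, 2, NA)            # numeric
lgl <- c(TRUE, FALSE, NA)       # logical
chr <- c("rice", "beans", NA)   # character

c(is.numeric(num), is.logical(lgl), is.character(chr))   # all TRUE
```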
Matrices and Data Frames
To us humans, matrices and data frames are rectangular objects with rows and columns. They are both poseurs: both are linear structures pretending to be rectangular. They take very different approaches, though.
A matrix is a vector that has a dim attribute. The dim is a vector of two integers saying how many rows and columns there are. The length of the matrix is the number of rows times the number of columns. You can see the order of the elements within the matrix by doing a command like:
> matrix(1:15, 5)
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15
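A matrix really is a vector plus a dim attribute. In this sketch (v is an arbitrary name), giving an ordinary vector a dim produces the same object that matrix would build, filled column by column as in the output above.

```r
v <- 1:6
dim(v) <- c(2, 3)     # add the dim attribute: 2 rows, 3 columns
is.matrix(v)          # TRUE
attributes(v)         # only the dim attribute: 2 3
length(v)             # 6, that is 2 rows times 3 columns
v[2, 1]               # 2, because the elements go down the columns first
```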
A data frame has a class attribute that is "data.frame". It is really a list with as many components as there are columns. Each component has to have the same number of elements (the number of rows).
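A sketch of the list underneath a data frame. The data are invented; stringsAsFactors = FALSE is there so that the character column stays character whatever version of R you run.

```r
df <- data.frame(grain = c("rice", "wheat", "corn"),
                 kg = c(2, 1, 3),
                 stringsAsFactors = FALSE)
class(df)            # "data.frame"
is.list(df)          # TRUE: underneath, it is a list
length(df)           # 2: one component per column
sapply(df, length)   # grain 3, kg 3: every column has nrow(df) elements
unclass(df)$grain    # a component is an ordinary vector
```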
Both matrices and data frames can have names for the rows and the columns. (This is mandatory for data frames.) These are implemented differently in the two types of object, but you can get them from either type with rownames and colnames.
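A sketch with a tiny matrix and a tiny data frame carrying the same row and column names (the values are arbitrary). rownames and colnames give the same answers for both, even though the matrix keeps its names in a dimnames attribute while the data frame uses names and row.names attributes. A data frame built without row names still gets them: "1", "2", and so on.

```r
m  <- matrix(1:4, 2, dimnames = list(c("a", "b"), c("x", "y")))
df <- data.frame(x = 1:2, y = 3:4, row.names = c("a", "b"))

rownames(m)      # "a" "b"
rownames(df)     # "a" "b"
colnames(m)      # "x" "y"
colnames(df)     # "x" "y"

names(attributes(m))    # includes "dimnames"
names(attributes(df))   # includes "names" and "row.names"

rownames(data.frame(x = 1:2))   # "1" "2", supplied automatically
```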
You can test if an object is a data frame with:
> is.data.frame(x)
Circle 8 of The R Inferno discusses a number of possible problems you might have with matrices and data frames.
Factors
Factors have two key attributes: a class attribute and a levels attribute. The levels attribute is a character vector that gives the possible categories for the object. The basic part of the object is a vector of integers, each giving the position of its category in the levels vector.
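A sketch of a factor taken apart (the categories are made up).

```r
f <- factor(c("beans", "rice", "beans"))
levels(f)        # "beans" "rice": the possible categories
class(f)         # "factor"
typeof(f)        # "integer": the basic part
as.integer(f)    # 1 2 1: positions in levels(f)
unclass(f)       # the integers, with the levels attribute still attached

levels(f)[as.integer(f)]   # looking the positions up recovers the categories
```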
Circle 8.2 of The R Inferno begins with several ways of going wrong with factors.
Missing values
All of the atomic modes have a missing value. This is printed as NA.
You test for missing values with the is.na function. For example:
> is.na(x)
will return a logical vector as long as x that is TRUE for the missing values in x and FALSE for the other values.
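A sketch with invented values. is.na works on all the atomic types, while comparing with == NA does not work, since any comparison with a missing value is itself missing.

```r
x <- c(3, NA, 7)
is.na(x)                  # FALSE TRUE FALSE
is.na(c("rice", NA))      # FALSE TRUE: a character NA
is.na(c(TRUE, NA))        # FALSE TRUE: a logical NA
x == NA                   # NA NA NA: use is.na(), not ==
```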
If you feel compelled to replace missing values by something else (like zero), you are almost surely making life harder for yourself rather than easier.
The best tool
The str function is one of your best friends. It tells you how an object is structured. Its output may seem cryptic to you at first, but you will soon learn to appreciate the crypticness.
> examp <- list(A=1:10, B=letters, C=list(NULL, TRUE))
> str(examp)
List of 3
 $ A: int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ B: chr [1:26] "a" "b" "c" "d" ...
 $ C:List of 2
  ..$ : NULL
  ..$ : logi TRUE
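str is just as helpful for the other objects on this page. A sketch on a factor and a small data frame (both invented here); the exact layout of the output can vary a little between versions of R, so it is not reproduced.

```r
f  <- factor(c("beans", "rice", "beans"))
df <- data.frame(grain = c("rice", "corn"), kg = c(2, 3),
                 stringsAsFactors = FALSE)

str(f)    # shows the levels and the integer codes
str(df)   # a header line, then one line per column (component)
```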
Functions are objects too
Functions in R are objects just as numeric vectors are objects. You think that is a good idea. It may take you some time before you realize that you think it is a good idea. But I guarantee you that you think it is a good idea.
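A sketch of what being an object buys you (square and summaries are names made up for this example): a function can be assigned a name, passed to another function, and stored in a list like any other object.

```r
square <- function(x) x^2          # a function assigned to a name
sapply(1:3, square)                # passed to another function: 1 4 9

# Functions stored in a list, then each applied to the same data
summaries <- list(mean = mean, median = median)
sapply(summaries, function(f) f(c(1, 2, 10)))   # mean 4.333..., median 2
```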
The word “vector”
The word “vector” is quite unfortunate in R. There are three distinct meanings:
- an atomic vector
- an object without attributes (except perhaps names)
- an object that has length
If we always said “atomic vector” for the first meaning, that meaning would cause no trouble. But all of us get bored saying “atomic vector” and shorten it to “vector”.
The second meaning comes from mathematics. It distinguishes a linear structure from a matrix: the latter has a dim attribute and the former does not.
The third meaning is the literal sense of the word. This includes lists, which the first meaning excludes.
Be careful.
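A sketch that tests one object against each meaning (x, m and l are arbitrary names). is.atomic matches the first meaning and is.vector roughly matches the second, which is why is.vector says TRUE for a list but FALSE for a matrix. length covers the third.

```r
x <- c(a = 1, b = 2)   # atomic, with only a names attribute
m <- matrix(1:4, 2)    # atomic, but with a dim attribute
l <- list(1, "a")      # a list with no attributes

# Meaning 1: atomic vector
c(is.atomic(x), is.atomic(m), is.atomic(l))   # TRUE TRUE FALSE
# Meaning 2: no attributes except perhaps names
c(is.vector(x), is.vector(m), is.vector(l))   # TRUE FALSE TRUE
# Meaning 3: has a length
c(length(x), length(m), length(l))            # 2 4 2
```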
rice photo by michaelaw via stock.xchng