More R Random
This page has the following sections:
Generation of normals
Two types of uniform
Random permutations
Seed setting
Probability distributions
Pseudorandomness
Resources
Generation of normals
If you want to generate 200 standard normals, then do:
> xn <- rnorm(200)
You will get different numbers in xn if you do the command again.
The mean and sd arguments control the mean and standard deviation.
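As a small sketch of those extra arguments, the mean and standard deviation can be given by name. The values 10 and 2 and the name xn2 are arbitrary choices made here for illustration:

```r
# 200 normals with mean 10 and standard deviation 2 (arbitrary example values)
xn2 <- rnorm(200, mean = 10, sd = 2)

# the sample mean and standard deviation should be near 10 and 2
mean(xn2)
sd(xn2)
```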
Two types of uniform
There are two kinds of uniform distribution. In a continuous uniform, all numbers in some range are equally likely. In a discrete uniform, each member of some finite set of objects, such as a range of integers, is equally likely.
Continuous uniform
You can generate 100 numbers that are continuously uniform between 0 and 1 with:
> xcontu <- runif(100)
You will get different numbers in xcontu if you do the command again.
The min and max arguments change the range.
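For instance, a minimal sketch of changing the range with the min and max arguments of runif. The range from -5 to 5 and the name xcontu2 are assumptions made only for this example:

```r
# 100 continuous uniforms between -5 and 5 (arbitrary example range)
xcontu2 <- runif(100, min = -5, max = 5)

# smallest and largest values generated; both lie inside the range
range(xcontu2)
```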
Discrete uniform
Use the sample function to generate uniformly from some set of integers (or other types of objects). For example:
> xdiscu <- sample(1:100, 4, replace=TRUE)
selects 4 numbers between 1 and 100, inclusive, with replacement.
You will get different numbers in xdiscu if you do the command again.
You can get a random color from among the named colors with the command:
> sample(colors(), 1)
The prob argument to sample allows you to give different probabilities to the elements of the vector that is being selected from. Thus sample will perform non-uniform sampling as well.
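A rough sketch of non-uniform sampling with prob might look like the following. The biased coin, its 70/30 probabilities and the name flips are hypothetical, chosen only to show the argument in use:

```r
# a hypothetical biased coin: heads 70% of the time, tails 30%
flips <- sample(c("heads", "tails"), 1000, replace = TRUE,
                prob = c(0.7, 0.3))

# observed proportions should be roughly 0.7 and 0.3
table(flips) / length(flips)
```

The probabilities are matched by position to the elements of the vector being sampled.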
Random permutations
The sample function also does random permutations. In fact, that is its default behavior:
> xpermute <- sample(x)
> sample(1:9)
[1] 1 9 3 8 5 2 6 4 7
You will get a different order in xpermute if you do the command again. (We are assuming here that x is a vector with more than one element.)
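The reason for that assumption can be seen in a short sketch. The vectors here are made up for illustration. When sample is given a single number n that is at least 1, it samples from 1:n rather than from that one value:

```r
x <- c(10, 20, 30, 40)
xpermute <- sample(x)   # same elements as x, in random order
sort(xpermute)          # sorting recovers 10 20 30 40

# caution: a single number is treated as the range 1:n
y <- sample(5)          # a permutation of 1:5, not just the value 5
```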
Seed setting
In all of the commands above, you get different answers as you repeat them. That is pretty much the point of them. However, it can be useful to know that you will get the same answers again even though you are generating random numbers. You can do that by setting the random seed.
In R there is an object called .Random.seed that controls random generation. Once you have generated something random, there will be a .Random.seed object in your global environment. (It doesn't show up in ls() because the name starts with a dot. You can see such objects by saying: ls(all.names=TRUE).)
Calls to random functions change the value of .Random.seed. That is, these calls not only return a value, they also have the side effect of changing .Random.seed.
But if the random seed is the same at the start of a call, then the results will be the same. There are two ways of setting the seed: you can save the seed and then assign it, or you can use set.seed. The preferred method is to use set.seed. You can just give a number as the first argument:
> set.seed(123)
> rnorm(4)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839
> rnorm(4)
[1]  0.1292877  1.7150650  0.4609162 -1.2650612
> set.seed(123)
> rnorm(4)
[1] -0.56047565 -0.23017749  1.55870831  0.07050839
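The other method, saving the seed and assigning it back, could be sketched roughly as follows. The names saved_seed, a and b are made up for this example, and the call to set.seed at the start is only there to guarantee that .Random.seed already exists:

```r
set.seed(42)                   # ensures .Random.seed exists (42 is arbitrary)
saved_seed <- .Random.seed     # save the current state of the generator

a <- runif(3)                  # this call changes .Random.seed

# put the saved state back into the global environment
assign(".Random.seed", saved_seed, envir = globalenv())

b <- runif(3)                  # starts from the same state as before
identical(a, b)                # TRUE
```

The assignment is made explicitly in the global environment because that is where R looks for .Random.seed.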
Probability distributions
R has functions for a number of probability distributions. In general, there are four functions for each distribution as shown in Table 1.
Function name | Description |
---|---|
rxxx | random generation |
dxxx | density function |
pxxx | cumulative probability function |
qxxx | quantile function |
For example, rnorm is the random generation function for the normal distribution. dnorm is the density for the normal. pnorm is the cumulative probability function for the normal: it gives the probability of being less than or equal to a given quantile. qnorm is the quantile function, the inverse of the probability function (that is, it returns a quantile given a probability).
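A brief sketch of how these functions relate for the standard normal is given here. The value 1.96 and the names p and q are just illustrative choices:

```r
dnorm(0)            # density at 0, which equals 1/sqrt(2*pi)

p <- pnorm(1.96)    # probability of being <= 1.96, roughly 0.975
q <- qnorm(p)       # the quantile function undoes pnorm, giving back 1.96
```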
Table 2 shows a few of the distributions that are available in R.
Distribution | Functions |
---|---|
Uniform | runif dunif punif qunif |
Normal | rnorm dnorm pnorm qnorm |
Student’s t | rt dt pt qt |
F | rf df pf qf |
Exponential | rexp dexp pexp qexp |
Log normal | rlnorm dlnorm plnorm qlnorm |
Beta | rbeta dbeta pbeta qbeta |
Binomial | rbinom dbinom pbinom qbinom |
Poisson | rpois dpois ppois qpois |
You can see a more complete list with the command:
> ??distribution
The ecdf function takes a data vector as an argument and returns a function that is the cumulative probability function of the data.
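As a minimal sketch, assuming some made-up data in a vector called x, the function returned by ecdf can be called like any other function. The names x and Fn and the seed value are hypothetical:

```r
set.seed(1)          # arbitrary seed so the example is repeatable
x <- rnorm(50)       # made-up data for illustration
Fn <- ecdf(x)        # Fn is itself a function

Fn(0)                # proportion of the data that is <= 0
mean(x <= 0)         # the same proportion computed directly
```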
Many contributed packages contain functions for additional distributions.
Pseudorandomness
In a certain sense most of what is said on this page is a lie. When you use a function like rnorm or sample, you are not generating randomness at all. These are pseudorandom functions. Technically you are generating chaos when you use them, not randomness. There are two main reasons to use pseudorandomness rather than randomness.
The first is convenience. In the early days of computing there was no way to actually get true random values, so they had to invent pseudorandom methods. Now there is the possibility of using truly random values, but it is generally harder to do and seldom offers an advantage.
The second reason to prefer pseudorandomness is reproducibility. Random numbers (by definition) are not reproducible. A program without reproducible results is a program that cannot be debugged.
It is largely accidental that we have pseudorandom functions and not truly random functions. It’s a happy accident.
Resources
This includes a discussion of probability distributions.