
### The Basics of R

Assignments, vectors, matrices and functions are some of the basics to understand clearly if you want to go further with this scripting language. Here you will find a step-by-step guide to start scripting in R and to understand the purpose of the language.

#### The assignment operator

R uses a two-character operator, "<-", for assignments. It may look unusual if you have prior programming experience.

`> a <- 2^2       # will store number 4 in variable a. `

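The operator "=" is also accepted for assignment at the top level, but its main use is to name arguments in function calls and to supply default values in function definitions.

`# b defaults to 10 when f is called without a second argument`
`> f <- function(a, b=10) a + b`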
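As a minimal sketch of how default arguments behave, the following defines a small function and calls it in three ways. The function body (`a + b`) and the names `r1`, `r2` and `r3` are assumptions made only for this illustration.

```r
# Hypothetical function: b falls back to 10 when it is not supplied
f <- function(a, b = 10) {
  a + b
}

r1 <- f(1)             # uses the default, b = 10
r2 <- f(1, b = 5)      # overrides the default
r3 <- f(b = 2, a = 3)  # named arguments can be given in any order

print(c(r1, r2, r3))
```

Inside a function call, "=" only binds a value to an argument name; it does not create a variable in the calling environment.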

#### Vector

In R, the vector is the primary data type. It should be understood as an ordered collection of values of the same data type, rather than as a geometric point in space.
A vector can be of type numeric, character, complex or logical.
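The "same data type" rule can be checked directly. The sketch below builds one vector of each type named above and then mixes types in a single call to `c()`; all variable names are hypothetical.

```r
num <- c(1.5, 2, 3)
chr <- c("a", "b")
lgl <- c(TRUE, FALSE)
cpx <- c(1+2i, 3-1i)

# One class per vector
types <- c(class(num), class(chr), class(lgl), class(cpx))
print(types)

# Mixing types does not fail: every element is coerced to one common type
mixed <- c(1, "a", TRUE)
print(class(mixed))
print(mixed)
```

Here the mixed vector ends up as character, because character is the only type that can represent all three values.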

`# Individual values 5, 7, 8, 10 can be combined into a vector by using function c()`
`> a <- c(5,7,8,10)`

`# You can also use the operator ":" to generate sequences`
`> c(1:10)`
`  1 2 3 4 5 6 7 8 9 10`

`# More advanced sequences can be generated with function 'seq'`
`> seq(from=10, by=0.2, length.out=5)`
`  10.0 10.2 10.4 10.6 10.8`

`# Repeat entries from 1 to 5, and do it 2 times`
`> rep(1:5, times=2)`
`  1 2 3 4 5 1 2 3 4 5`

`# Eliminate duplicate entries`
`> unique(rep(1:5, times=2))`
`  1 2 3 4 5`

You can run the above example in your R console. Just download file RBasics_Vector.txt to your default R folder and type in your console the following code

`> source("RBasics_Vector.txt")`

#### Sub-setting Vectors

The following are some examples on how to deal with vector ranges.

`# Select positions 2:6 from vector x`

`> x <- 1:20; x[2:6] `
`  2 3 4 5 6 `

`# Select all positions but range 2:6 from vector x `

`> x[-(2:6)] `
`  1 7 8 9 10 11 12 13 14 15 16 17 18 19 20 `

`# Produce a logical (TRUE/FALSE) vector from a comparison `

`> x[2:6] == 2 `
`  TRUE FALSE FALSE FALSE FALSE `

`# Function "which" returns the index positions where 3 occurs in vector x[2:6] `

`> which(x[2:6] == 3) `
`  2 `

`# Function "match" returns the positions of the first occurrences of 3 and 4 in vector x `

`> match(c(3,4),x) `
` 3 4`

`# Sort function for vectors`
`> sort(x, decreasing = TRUE)`
`  20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1`

`# Here we would get the same result with function rev, which reverses a vector`
`> rev(x)`
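Comparisons and sub-setting are often combined in a single step: a logical vector placed inside the brackets keeps only the positions that are TRUE. A short sketch, with hypothetical variable names:

```r
x <- 1:20

# Keep the even values
evens <- x[x %% 2 == 0]

# Combine two conditions with & (and)
big_evens <- x[x %% 2 == 0 & x > 10]

# which() turns the logical vector into index positions
idx <- which(x > 17)

print(evens)
print(big_evens)
print(idx)
```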

#### The recycling behavior in R Vectors

Vectors in R have nothing to do with geometric points; they are just ordered collections of atomic values. This leads to some surprising effects in basic operations. Let's see:

`# The sum of vectors of different length`
`> x<- 1:2; y <- 1:6; x+y`
`  2 4 4 6 6 8`

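| Position | 1 | 2 | 3 | 4 | 5 | 6 |
|----------|-----|-----|-----|-----|-----|-----|
| x | 1 | 2 | | | | |
| y | 1 | 2 | 3 | 4 | 5 | 6 |
| x+y | 1+1 | 2+2 | 3+1 | 4+2 | 5+1 | 6+2 |
| x+y | 2 | 4 | 4 | 6 | 6 | 8 |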

For this operation only, the shorter vector of length 2 is extended to length 6 by repeating its values; x itself is not modified.
Note that the same effect appears when dealing with matrices, since in R a matrix is just a vector of a single data type with a dimension attribute.
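The recycling rule can be made explicit with `rep()`. The sketch below also shows what happens when the longer length is not a multiple of the shorter one, and how recycling runs down the columns of a matrix. The names `z`, `m` and `msg` are hypothetical.

```r
x <- 1:2
y <- 1:6

# Recycling is equivalent to repeating x until it matches length(y)
x_recycled <- rep(x, length.out = length(y))
same <- identical(x + y, x_recycled + y)
print(same)

# Length 6 is not a multiple of 4: R still recycles, but raises a warning
z <- 1:4
res <- suppressWarnings(y + z)
msg <- tryCatch(y + z, warning = function(w) conditionMessage(w))
print(res)
print(msg)

# With a matrix, the shorter vector is recycled down the columns
m <- matrix(1:6, nrow = 2)
m2 <- m + c(10, 20)
print(m2)
```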

#### Comparing entries between vectors

`# Let's take the UEFA Champions League semi-finalists of 2011 and 2012`
`> c2011 <- c("Real Madrid","Barcelona","Schalke 04", "Manchester United")`
`> c2012 <- c("Bayern Munich","Real Madrid","Chelsea", "Barcelona")`

`# Join two vectors`
`> champions <- c(c2011, c2012)`

`# Find identical entries of two vectors. Two ways of doing so`
`> intersect(c2011,c2012)`
`> champions[champions %in% champions[5:8]]`

`# Returns the duplicated entries`
`> champions[duplicated(champions)]`
`  "Real Madrid" "Barcelona" `

`# Getting the unique entries occurring only in the first vector`
`> setdiff(champions[1:4], champions[5:8])`
`  "Schalke 04"        "Manchester United"`
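The set functions used above belong to a small family in base R. As a brief sketch, here they are applied to the two original vectors directly, without joining them first; the result names are hypothetical.

```r
c2011 <- c("Real Madrid", "Barcelona", "Schalke 04", "Manchester United")
c2012 <- c("Bayern Munich", "Real Madrid", "Chelsea", "Barcelona")

both      <- intersect(c2011, c2012)  # teams present in both years
all_teams <- union(c2011, c2012)      # every team, without duplicates
only2012  <- setdiff(c2012, c2011)    # teams present only in 2012
in2012    <- c2011 %in% c2012         # one TRUE/FALSE per 2011 team

print(both)
print(all_teams)
print(only2012)
print(in2012)
```

Note that `setdiff()` is not symmetric: swapping its arguments returns the teams that appear only in the other year.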

#### Factors

Factors are vector objects that store grouping information about their components: each distinct entry becomes a level, and frequencies per level can be tabulated easily. Let's see.

`# Following the above example, we will build factor championf`
`> championf <- factor(champions)`

`# Let's output the frequencies`
`> championff <- table(championf)`
`> championff`
`championf`
`        Barcelona     Bayern Munich           Chelsea Manchester United `
`                2                 1                 1                 1 `
`      Real Madrid        Schalke 04 `
`                2                 1 `

`# Let's output the levels or unique entries`
`> levels(championf)`
` "Barcelona" "Bayern Munich" "Chelsea" "Manchester United" "Real Madrid" "Schalke 04"`
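By default the levels are sorted alphabetically, as the output above shows. When another order is wanted, it can be given explicitly. The following sketch uses a hypothetical vector `sizes` that is not part of the original example.

```r
# Hypothetical data
sizes <- c("medium", "small", "large", "small")

# Default: levels in alphabetical order
f_default <- factor(sizes)
print(levels(f_default))

# Explicit level order
f_ordered <- factor(sizes, levels = c("small", "medium", "large"))
print(levels(f_ordered))

# Internally a factor stores integer codes that point at its levels
codes <- as.integer(f_ordered)
print(codes)

# Frequencies follow the level order
counts <- table(f_ordered)
print(counts)
```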

#### Matrices & Arrays

Vectors and factors are one-dimensional objects; matrices are two-dimensional data objects consisting of rows and columns.

`# We fill a matrix of 4 rows and 10 columns with the numbers 1 to 40, column by column (byrow = FALSE)`
`> x <- matrix(1:40, 4, 10, byrow = FALSE)`
`> x`

` [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]`
`[1,] 1 5 9 13 17 21 25 29 33 37`
`[2,] 2 6 10 14 18 22 26 30 34 38`
`[3,] 3 7 11 15 19 23 27 31 35 39`
`[4,] 4 8 12 16 20 24 28 32 36 40`

`# We verify that x is of class matrix (R 4.0.0 and later also report "array")`
`> class(x)`
` "matrix" "array"`

`# Matrix x has 2 dimensions (rows 4, columns 10)`

`> dim(x)`
` 4 10`

At this point we can introduce arrays, which generalize matrices to more than two dimensions. A three-dimensional array can be understood as layers of matrices, that is, layers of rows and columns.
Perhaps an example will help.

`# We will proceed to convert the above x matrix into a 2 layer array`
`> dim(x) <- c(4,5,2)`
`> x`

`, , 1`

` [,1] [,2] [,3] [,4] [,5]`
`[1,] 1 5 9 13 17`
`[2,] 2 6 10 14 18`
`[3,] 3 7 11 15 19`
`[4,] 4 8 12 16 20`

`, , 2`

` [,1] [,2] [,3] [,4] [,5]`
`[1,] 21 25 29 33 37`
`[2,] 22 26 30 34 38`
`[3,] 23 27 31 35 39`
`[4,] 24 28 32 36 40`

`# Now x is no longer a matrix, but an array instead`
`> class(x)`
` "array"`

`# Array x is a 3 dimensional object (rows 4, columns 5, layers 2)`

`> dim(x)`
` 4 5 2`

`# We could create the same array in just one step (stored here as y)`
`> y <- array(1:40, c(4,5,2))`

Unlike data frames, matrices and arrays must hold a single data type, i.e. we can have matrices or arrays of numbers or of characters, but not a mixture of both.
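The difference can be seen in a small experiment. In this sketch the names `m` and `df` are hypothetical, and `stringsAsFactors = FALSE` is set explicitly because the default for that argument changed in R 4.0.0.

```r
# Mixing numbers and text in a matrix coerces everything to character
m <- matrix(c(1, 2, "a", "b"), nrow = 2)
print(typeof(m))
print(m)

# A data frame keeps one type per column
df <- data.frame(n = c(1, 2), s = c("a", "b"), stringsAsFactors = FALSE)
print(sapply(df, class))
```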

#### Sub-setting Arrays

The following commands extract data from different slices of an array.

`# Just play and guess the results of the following 5 commands`
`> x[1,,] ; x[,1,] ; x[,,1] ; x[-1,,] ; x[-1:-3,,]`
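As a hint for the exercise, the sketch below prints only the dimensions of each slice rather than the values. It rebuilds the 4 x 5 x 2 array used earlier; the names `d1` to `d6` are hypothetical.

```r
x <- array(1:40, c(4, 5, 2))

d1 <- dim(x[1, , ])      # first row of every layer
d2 <- dim(x[, 1, ])      # first column of every layer
d3 <- dim(x[, , 1])      # first layer
d4 <- dim(x[-1, , ])     # everything except row 1
d5 <- dim(x[-1:-3, , ])  # rows 1 to 3 removed, only row 4 is left

# drop = FALSE keeps dimensions of length 1 instead of dropping them
d6 <- dim(x[-1:-3, , , drop = FALSE])

print(list(d1, d2, d3, d4, d5, d6))
```

Whenever a selection leaves a dimension of length 1, R drops that dimension unless `drop = FALSE` is given, which is why the fifth result has two dimensions and the sixth has three.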

#### Data Frames

A data frame in R is a two-dimensional data object made of rows and columns, whose columns may have different data types. In other words, a data frame can be thought of as a matrix that is able to mix data types.

Let's develop an example

`# Start by building a numeric data frame with 2 columns and 7 rows`
`> dm_frame <- data.frame(v1=rnorm(7,0,1), v2=rnorm(7,10,2))`

`# Once the data frame is created we can add one column`
`> temperature <- c('freezing', 'cold', 'chilly', 'mild', 'warm', 'hot', 'burning')`
`> dm_frame <- data.frame(temp = temperature, dm_frame)`
`# At this step we have a data frame of 1 character column and 2 numeric columns`

`# We could also delete this column with dm_frame$temp <- NULL (not run here, because the column is used below)`
`# Let's change the names of rows and columns`
`> names(dm_frame) <- c('temp', 'elasticity', 'kilograms')`
`> row.names(dm_frame) <- c('dough1', 'dough2', 'dough3', 'dough4',`
`                           'dough5', 'dough6', 'dough7')`

`# The numeric values come from rnorm, so they will differ on every run`
`> dm_frame`
`           temp elasticity kilograms`
`dough1 freezing  0.2610284  6.820507`
`dough2     cold -0.0603621 11.399211`
`dough3   chilly  0.9028467  9.509476`
`dough4     mild -0.5860686  8.602553`
`dough5     warm -0.2733258  7.460272`
`dough6      hot  1.6622075  5.561564`
`dough7  burning -1.9425868  5.821402`

The above example could be, for instance, a set of 7 samples of bread dough with their measurements of temperature, elasticity and weight.
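Because the numeric columns are drawn with `rnorm`, the values change on every run. A minimal sketch of a reproducible variant follows, assuming a fixed seed (the value 42 is arbitrary) and building the frame in a single call; `classes` is a hypothetical name.

```r
set.seed(42)  # assumption: any fixed seed makes the random draws repeatable

dm_frame <- data.frame(
  temp = c("freezing", "cold", "chilly", "mild", "warm", "hot", "burning"),
  elasticity = rnorm(7, 0, 1),
  kilograms = rnorm(7, 10, 2),
  stringsAsFactors = FALSE
)
row.names(dm_frame) <- paste0("dough", 1:7)

# Inspect the structure: one line per column with its type
str(dm_frame)
print(dim(dm_frame))

classes <- sapply(dm_frame, class)
print(classes)
```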

#### Slicing Frames

`# Print data from column 'elasticity'`
`> dm_frame$elasticity`

`# Sort data frame by column 'temp', in decreasing alphabetical order`
`> dm_frame[order(dm_frame$temp, decreasing=TRUE),]`
`           temp elasticity kilograms`
`dough5     warm -0.2733258  7.460272`
`dough4     mild -0.5860686  8.602553`
`dough6      hot  1.6622075  5.561564`
`dough1 freezing  0.2610284  6.820507`
`dough2     cold -0.0603621 11.399211`
`dough3   chilly  0.9028467  9.509476`
`dough7  burning -1.9425868  5.821402`

`# Print rows with elasticity greater than 0`
`# (this output and the next come from a different random draw than the table above)`
`> dm_frame[dm_frame$elasticity>0,]`
`          temp elasticity kilograms`
`dough5    warm 0.72261931  8.066522`
`dough7 burning 0.04527281 11.564052`

`# Select rows whose 'temp' value appears in a query vector built with c()`
`> dm_frame[dm_frame$temp %in% c("hot", "warm"),]`
`       temp elasticity kilograms`
`dough5 warm  0.7226193  8.066522`
`dough6  hot -0.2536747  8.756252`

`# Other operators are ...`
`# Logical: & (and), | (or), ! (not)`
`# Comparison: == (equal), != (not equal), >= (greater than or equal)`
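The logical operators listed above can be combined inside the row selector. The sketch below uses a small hand-written data frame `dm` (hypothetical values, chosen so that the result is predictable) and also shows the base function `subset()` as an alternative way to write the same kind of query.

```r
# Hypothetical, fixed values
dm <- data.frame(
  temp = c("cold", "mild", "hot"),
  elasticity = c(-0.5, 0.3, 1.2),
  kilograms = c(11, 8, 6),
  stringsAsFactors = FALSE
)

# Two conditions joined with & (and)
sel <- dm[dm$elasticity > 0 & dm$kilograms < 7, ]
print(sel)

# subset() evaluates the condition inside the data frame,
# so the columns can be named without the dm$ prefix
sel2 <- subset(dm, temp %in% c("cold", "hot"), select = c(temp, kilograms))
print(sel2)
```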

#### Special functions

`#`
`# rowSums adds up the values of each row across the selected columns`
`#`
`> rowSums(dm_frame[,2:3])`
`dough1   dough2   dough3   dough4   dough5  dough6  dough7 `
`5.11479 10.80463  6.98309  7.50637  8.78914 8.50257 11.609325 `
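`rowSums` has close relatives in base R for columns and for means, and `apply` covers functions that have no dedicated shortcut. A short sketch on a hypothetical numeric data frame `m` with fixed values:

```r
# Hypothetical, fixed values
m <- data.frame(
  elasticity = c(1, 2, 3),
  kilograms = c(10, 20, 30),
  row.names = c("dough1", "dough2", "dough3")
)

rs <- rowSums(m)        # one total per row
cs <- colSums(m)        # one total per column
cm <- colMeans(m)       # one mean per column
ap <- apply(m, 1, max)  # any function per row (1 = rows, 2 = columns)

print(rs)
print(cs)
print(cm)
print(ap)
```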

The best practice is a good theory. (Vladimir Vapnik)