Assignments, vectors, matrices and functions are some of the basics that must be clearly understood if one wants to go further with this scripting language. Here you will find a step-by-step guide to start scripting in R and to understand the purpose of the language.

The assignment operator
R uses a two-character operator, "<-", for assignments. It may look unusual if you have prior programming experience.
> a <- 2^2 # will store number 4 in variable a
The operator "=" can also be used, for example when supplying default function arguments.
# will assign value 10 to variable b when function f is called without a second argument
# (the body a + b is only illustrative)
> f <- function(a, b=10) a + b

Vector
In R, the vector is the primary data type. It should be understood as an ordered collection of elements of the same data type, rather than as a geometric point in space.
A vector can be of data type numeric, character, complex and logical.
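As a small illustration of the four types just listed, the following sketch builds one vector of each type and checks it with class(). The variable names are made up for this example. It also shows that mixing types inside c() does not fail: R coerces all values to a single type, which is what "same data type" means in practice.

```r
# One vector of each type (variable names are illustrative only)
num <- c(1.5, 2, 3.25)
chr <- c("a", "b", "c")
cpl <- c(1+2i, 3-1i)
lgl <- c(TRUE, FALSE, TRUE)

class(num)   # "numeric"
class(chr)   # "character"
class(cpl)   # "complex"
class(lgl)   # "logical"

# Mixing types: every element is coerced to the most general type
mixed <- c(1, "two", TRUE)
class(mixed) # "character"
```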
# Individual values 5, 7, 8, 10 can be combined into a vector by using function c()
> a <- c(5,7,8,10)
# You can also use the operator ":" to generate sequences
> c(1:10)
 [1]  1  2  3  4  5  6  7  8  9 10
# More advanced sequences can be achieved with the 'seq' function
> seq(length=5, from=10, by=0.2)
[1] 10.0 10.2 10.4 10.6 10.8
# Repeat entries from 1 to 5, and do it 2 times
> rep(1:5, times=2)
 [1] 1 2 3 4 5 1 2 3 4 5
# Eliminate duplicate entries
> unique(rep(1:5, times=2))
[1] 1 2 3 4 5
> source("RBasics_Vector.txt")

Sub-setting Vectors
The following are some examples of how to deal with vector ranges.
# Select positions 2:6 from vector x
> x <- 1:20; x[2:6]
[1] 2 3 4 5 6
# Select all positions but range 2:6 from vector x
> x[-(2:6)]
 [1]  1  7  8  9 10 11 12 13 14 15 16 17 18 19 20
# Produce a logical (TRUE/FALSE) vector by comparing each element with 2
> x[2:6]==2
[1]  TRUE FALSE FALSE FALSE FALSE
# Function "which" returns the index positions where 3 occurs in vector x[2:6]
> which(x[2:6]==3)
[1] 2
# Function "match" returns the index positions where 3 and 4 first occur in vector x
> match(c(3,4),x)
[1] 3 4
# Sort function for vectors
> sort(x, decreasing = TRUE)
 [1] 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
# Because x is already in increasing order, function rev() gives the same result
> rev(x)

The recycling behavior in R Vectors
Since vectors in R have nothing to do with geometric points and are just ordered collections of atoms, some basic operations produce surprising effects. Let's see.
# The sum of vectors of different length
> x <- 1:2; y <- 1:6; x+y
[1] 2 4 4 6 6 8
For this operation only, the shorter vector of length 2 is extended to length 6 by repeating its values; x itself is not changed. Note that we will find the same effect when dealing with matrices, since in R a matrix is stored as a single vector of one data type with dimensions attached.

Comparing entries between vectors
# Let's take the UEFA Champions League semi-finalists of 2011 and 2012
> c2011 <- c("Real Madrid","Barcelona","Schalke 04", "Manchester United")
> c2012 <- c("Bayern Munich","Real Madrid","Chelsea", "Barcelona")
# Join two vectors
> champions <- c(c2011, c2012)
# Find identical entries of two vectors. Two ways of doing so
> intersect(c2011,c2012)
> champions[1:4][champions[1:4] %in% champions[5:8]]
# Returns the duplicated entries
> champions[duplicated(champions)]
[1] "Real Madrid" "Barcelona"
# Getting the unique entries occurring only in the first vector
> setdiff(champions[1:4], champions[5:8])
[1] "Schalke 04"        "Manchester United"

Factors
Factors are vector objects that store grouping information about their components: the set of unique entries (the levels) and, for each element, the level it belongs to. Frequencies per level can then be obtained with table(). Let's see.
# Following the above example, we will build factor championf
> championf <- factor(champions)
# Let's output the frequencies
> championff <- table(championf)
> championff
championf
        Barcelona     Bayern Munich           Chelsea Manchester United
                2                 1                 1                 1
      Real Madrid        Schalke 04
                2                 1
# Let's output the levels or unique entries
> levels(championf)
[1] "Barcelona"         "Bayern Munich"     "Chelsea"           "Manchester United"
[5] "Real Madrid"       "Schalke 04"

Matrices & Arrays
Vectors and factors are one-dimensional objects. Matrices, as we will see next, are two-dimensional data objects consisting of rows and columns.
# We put the numbers from 1 to 40 into a matrix of 4 rows and 10 columns, filled column by column (byrow = FALSE)
> x <- matrix(1:40, 4, 10, byrow = FALSE)
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    5    9   13   17   21   25   29   33    37
[2,]    2    6   10   14   18   22   26   30   34    38
[3,]    3    7   11   15   19   23   27   31   35    39
[4,]    4    8   12   16   20   24   28   32   36    40
# We verify that x is of class matrix (R 4.0.0 and later also report "array"; older versions print only "matrix")
> class(x)
[1] "matrix" "array"
# Matrix x has 2 dimensions (rows 4, columns 10)
> dim(x)
[1]  4 10

At this point we can introduce arrays, which are like matrices but can hold more than two dimensions. A three-dimensional array can be understood as layers of matrices, that is, layers of rows and columns. Perhaps an example will help.
# We will proceed to convert the above x matrix into a 2 layer array
> dim(x) <- c(4,5,2)
> x
, , 1

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

, , 2

     [,1] [,2] [,3] [,4] [,5]
[1,]   21   25   29   33   37
[2,]   22   26   30   34   38
[3,]   23   27   31   35   39
[4,]   24   28   32   36   40

# Now x is no longer a matrix, but an array instead
> class(x)
[1] "array"
# Array x is a 3 dimensional object (rows 4, columns 5, layers 2)
> dim(x)
[1] 4 5 2
# We could create the same array in just one step
> y <- array(1:40, c(4,5,2))

Unlike data frames, matrices and arrays must hold elements of a single data type, i.e. we can have matrices or arrays of numbers or of characters, but not a mixture of both.

Sub-setting Arrays
Commands for extracting data from different slices of an array.
# Just play and guess the results of the following 5 commands
> x[1,,] ; x[,1,] ; x[,,1] ; x[-1,,] ; x[-1:-3,,]

Data Frames
A data frame in R is a two-dimensional data object made of rows and columns, where each column can have a different data type. In other words, data frames are like matrices with the ability to mix data types.
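A minimal sketch of this difference, using made-up values (the names ids, label, m and df are invented for the illustration): binding a numeric and a character vector into a matrix coerces everything to character, while data.frame() keeps each column's own type. The argument stringsAsFactors = FALSE is set explicitly so the result is the same on older and newer R versions.

```r
ids   <- 1:3
label <- c("low", "mid", "high")

# Matrix: all elements are coerced to one type (character here)
m <- cbind(ids, label)
typeof(m)          # "character"

# Data frame: each column keeps its own type
df <- data.frame(ids, label, stringsAsFactors = FALSE)
class(df$ids)      # "integer"
class(df$label)    # "character"
```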
Let's develop an example.
# Start by building a numeric data frame with 2 columns and 7 rows
> dm_frame <- data.frame(v1=rnorm(7,0,1), v2=rnorm(7,10,2))
# Once the data frame is created we can add one column
> temperature <- c('freezing', 'cold', 'chilly', 'mild', 'warm', 'hot', 'burning')
> dm_frame <- data.frame(temp = temperature, dm_frame)
# At this step we have a data frame of 2 numeric columns and 1 column of text labels
# Let's change the names of rows and columns
> names(dm_frame) <- c('temp', 'elasticity', 'kilograms')
> row.names(dm_frame) <- c('dough1', 'dough2', 'dough3', 'dough4', 'dough5', 'dough6', 'dough7')
The above example could be, for instance, a set of 7 samples of bread dough with their measures of temperature, elasticity and weight.

Slicing Frames
# Print data from column 'elasticity'
> dm_frame$elasticity
# Sort data frame by column 'temp' in decreasing order
> dm_frame[order(dm_frame$temp, decreasing=TRUE),]
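Note that order() on a column of text labels sorts alphabetically, not by how warm each label is. One possible way to sort along the natural temperature scale is to convert the labels to an ordered factor first. The sketch below uses a small stand-alone data frame; the names toy, scale_levels and sorted, and the values, are invented for this illustration.

```r
# The temperature labels in their natural order, coldest first
scale_levels <- c("freezing", "cold", "chilly", "mild", "warm", "hot", "burning")

# A small made-up data frame, independent of dm_frame
toy <- data.frame(temp = c("warm", "freezing", "hot", "cold"),
                  kilograms = c(1.2, 0.9, 1.1, 1.0),
                  stringsAsFactors = FALSE)

# Alphabetical order of the labels
toy[order(toy$temp), ]

# Order along the temperature scale instead
toy$temp <- factor(toy$temp, levels = scale_levels, ordered = TRUE)
sorted <- toy[order(toy$temp), ]
as.character(sorted$temp)  # "freezing" "cold" "warm" "hot"
```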
Special functions
# rowSums adds up, for each row, the values of the selected columns
# (the values below differ on every run because the data are random)
> rowSums(dm_frame[,2:3])
   dough1    dough2    dough3    dough4    dough5    dough6    dough7
  5.11479  10.80463   6.98309   7.50637   8.78914   8.50257  11.609325

"The best practice is a good theory." (Vladimir Vapnik)