Tuesday, June 17, 2014

Introduction to R: Sub-setting Part I

In our previous discussion we talked about factors, missing values, data frames and naming. In this session we will talk about sub-setting elements in R. Sub-setting is useful if you have a manageable set of data; the operation is helpful when you want to know which element sits at a particular position in your lists, vectors or matrices.

There are three operators for sub-setting objects in R:
[ ] is used to extract an object of the same class as the original. By this we mean that if you extract from a list, the output is a list, and if you extract from a character vector, the output is a character vector.

[[ ]] is used to extract a single element from a list or a data frame. The class of the output is not necessarily the same as the original. It means that if you extract an element from a list, the output is that element itself rather than a list: it can be a numeric vector, an integer or a character string, depending on what was stored.

$ is used to extract elements from a list or data frame by name. Remember, in our previous discussion, names are useful to reference an object. Again, the class of the output is not necessarily the same as the original: extracting a named element from a list returns the element itself, which may be a numeric vector, an integer or a character string.
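
As a quick illustration of the three operators, here is a minimal sketch. The list lst and its element names num and chr are made up for this example and are not part of the exercises.

```r
# A small list with two named elements (names are illustrative assumptions)
lst <- list(num = 1:3, chr = "a")

s1 <- lst[1]     # single bracket: still a list, holding the first element
s2 <- lst[[1]]   # double bracket: the element itself, an integer vector
s3 <- lst$chr    # dollar sign: the element named "chr"

class(s1)   # "list"
class(s2)   # "integer"
class(s3)   # "character"
```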

Let's go to some exercises:
1. Create a character vector containing the elements a to d.
2. Extract the 1st, 2nd, 3rd and 4th elements.
3. Extract all the elements other than a.

Solution:
We can do this in two ways. First, we can use the numeric index method. In this method R identifies each element of the character vector by its position: "a" is 1, "b" is 2, "c" is 3 and "d" is 4. If we type x[1], x is the vector of interest and [1] selects the 1st element within it; if we hit enter it gives us an output of "a".
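
The numeric index method for exercises 1 and 2 could look like the sketch below, assuming the vector is stored in x as in the text.

```r
# Exercise 1: a character vector with the elements a to d
x <- c("a", "b", "c", "d")

# Exercise 2: extract each element by its position
x[1]   # "a"
x[2]   # "b"
x[3]   # "c"
x[4]   # "d"

# Several positions can be requested at once
x[1:4]
```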

To solve the next problem, we can use the logical index method. R compares character strings by lexical ordering, that is a<b<c<d<e and so forth, and we can use this ordering to build a logical index. Since b, c and d are greater than a, we can let y mark everything that is greater than a. The expression is y<-x>"a". Here x>"a" compares each element of the vector of interest x with "a" and returns TRUE or FALSE for each one, and y<- stores that result in a new vector y.

The output for y is a series of logical values: FALSE TRUE TRUE TRUE. The first one is FALSE because "a" is not greater than "a". To get the elements that are greater than "a", the expression x[y] is used, and the output is the series of letters greater than "a".
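
The logical index method for exercise 3 can be sketched as follows. The last line, which uses a negative index, is an alternative not covered in the text and is shown only for comparison.

```r
x <- c("a", "b", "c", "d")

# Compare every element of x with "a"; the result is a logical vector
y <- x > "a"
y      # FALSE TRUE TRUE TRUE

# Use the logical vector as an index: only the TRUE positions are kept
x[y]   # "b" "c" "d"

# The same thing in one step
x[x > "a"]

# Alternative (not from the text): a negative index drops the 1st element
x[-1]
```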

Next, we will learn how to sub-set a matrix. Matrices can be sub-setted with a (row, column) index. Let's jump to the exercises.
Ex:
a. Construct a 2 by 3 matrix with the number sequence 1 to 6.
b. Extract the element of row 1, column 2.
c. Extract the element of row 2, column 1.
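
One way to work through exercises a to c is sketched below, assuming the matrix is stored in x. Note that matrix() fills the values column by column by default.

```r
# a. A 2 by 3 matrix holding the sequence 1 to 6 (filled column by column)
x <- matrix(1:6, nrow = 2, ncol = 3)
x

# b. The element in row 1, column 2
x[1, 2]   # 3

# c. The element in row 2, column 1
x[2, 1]   # 2
```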

The solution is simple: the expression x[row, column] is used to extract the element of interest. Let's go further, say we want to:
d. Extract all the elements of the first row only.
e. Extract all the elements of the second column only.

Remember from our previous discussion on data frames that the number before the comma represents a row and the number after the comma represents a column, so [1,] is row 1, [2,] is row 2 and so on, while [,1] is column 1, [,2] is column 2 and so on. The same applies to extracting all the elements of a row or a column in a matrix, using the expression x[row number,] or x[, column number]. So we have x[1,] to extract all the elements of row 1 and x[,2] to extract all the elements of column 2.
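
Exercises d and e can be sketched like this, again assuming the 2 by 3 matrix from exercise a is stored in x.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)

# d. All the elements of the first row: leave the column index empty
row1 <- x[1, ]
row1   # 1 3 5

# e. All the elements of the second column: leave the row index empty
col2 <- x[, 2]
col2   # 3 4
```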

By default, when an element is extracted from a matrix, R drops the matrix dimensions and gives a vector output. But what if we want the output to be a 1x1 matrix? The expression x[row number, column number, drop=FALSE] can be used. So let's go on to some more exercises. Let's take the previous problem, and say we want to:
a. Extract the element of row 1, column 2.
b. Extract the element of row 1, column 2 in a matrix form.
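
A minimal sketch of exercises a and b, assuming the same 2 by 3 matrix x as before. The names v and m are only used here to inspect the results.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)

# a. Default behaviour: the result is a plain vector of length 1
v <- x[1, 2]
v        # 3
dim(v)   # NULL, because a vector has no dimensions

# b. drop = FALSE keeps the matrix structure, giving a 1x1 matrix
m <- x[1, 2, drop = FALSE]
m
dim(m)   # 1 1
```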

c. Extract the elements of row 1.
d. Extract the elements of row 1 in matrix form.
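
Exercises c and d follow the same pattern. This is a sketch under the same assumption that x is the 2 by 3 matrix; r_vec and r_mat are illustrative names.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)

# c. Row 1 as a vector (the default)
r_vec <- x[1, ]
r_vec        # 1 3 5

# d. Row 1 kept as a 1x3 matrix
r_mat <- x[1, , drop = FALSE]
r_mat
dim(r_mat)   # 1 3
```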


