Friday, June 13, 2014

Introduction to R: Vectors, Matrices and Lists

Our previous post gave a short introduction to R. In this post we will cover the basic commands for creating vectors, matrices, and lists. To begin with, anything typed into the R console is called an expression, and the "<-" symbol is the assignment operator, which assigns a value to a variable.
Example:
Type the expression x <- 5 in your R console and press Enter, then type x and press Enter again. R will show you the value of x. What you just did was enter an expression in R.





















In our example, the expression x <- 5 creates a numeric vector whose first (and only) element is the number 5; the [1] printed in front of the output is the index of that first element. Our second example is a character vector containing the character string "My name is Gerard".
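The two examples described above can be sketched as follows. This is a minimal sketch: the variable name y for the second example is an assumption made here so that both values can be kept side by side.

```r
# Assign the value 5 to x, then auto-print it by typing its name
x <- 5
x            # prints: [1] 5

# Second example: a character string (the name y is an assumption)
y <- "My name is Gerard"
y

# class() reports the atomic class of each vector
class(x)     # "numeric"
class(y)     # "character"
```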

Now that you have written your first expression, let's make a sequence of numbers. A number sequence can be created with the colon operator (:). Assign the value 1:30 to the variable x in your R console, then auto-print it by entering x. Your expression should look like this:





















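The command creates a sequence of numbers from 1 to 30. The output is an integer vector with 30 elements. The number in square brackets at the start of each output line is the index of the first element shown on that line: the first line begins with [1] because it starts at the 1st element, and the second line begins with [26] because it starts at the 26th element (where the line breaks depends on the width of your console).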
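As a quick sketch of the sequence described above, the following lines create the vector and check its class, its length, and one element by index:

```r
# Create the integers 1 through 30 with the colon operator
x <- 1:30
x            # auto-print; the bracketed numbers are element indices

class(x)     # "integer"
length(x)    # 30
x[26]        # the 26th element, which here is also the value 26
```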

Now we will create vectors with the c() function. The c stands for combine (or concatenate): it joins a series of objects into a single vector.





















The first example, x <- c(1-2i, 1+1i), is a complex vector because its elements have an imaginary part, written with the suffix i (note that the imaginary unit itself must be written 1i, not i).
The second example, x <- c("a", "b", "c"), is a character vector because its elements are quoted character strings.
The third example, x <- c(TRUE, FALSE), is a logical vector because it contains the logical values TRUE and FALSE. These must be written without quotes: quoted values such as "TRUE", or the fourth example x <- c("YES", "NO"), are character strings and therefore produce character vectors, not logical ones. We will discuss logical functions in our advanced topics.
The fifth example, x <- c(0.099, 1), is a numeric vector because it contains real numbers.
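A minimal sketch of these five cases is given below. The variable names are assumptions chosen here so that every vector can be kept and inspected at once; class() shows what R actually stores.

```r
z   <- c(1-2i, 1+1i)        # complex: the imaginary unit is written 1i
ch  <- c("a", "b", "c")     # character
lg  <- c(TRUE, FALSE)       # logical: unquoted TRUE / FALSE
yn  <- c("YES", "NO")       # quoted strings, so this is character, not logical
num <- c(0.099, 1)          # numeric

# Report the class of each vector in one call
classes <- sapply(list(z, ch, lg, yn, num), class)
classes
# "complex" "character" "logical" "character" "numeric"
```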

We can also create a vector with the vector() function, which is the longhand way of creating a vector in R.





















Here we have the expression x <- vector("numeric", length = 5), which creates a numeric vector of length 5. The output is all zeros because a numeric vector created this way is initialized to 0 by default.
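The sketch below repeats that expression and, for comparison, shows the default values that vector() uses for two other modes (the names s and b are assumptions for illustration):

```r
# A numeric vector of length 5 is initialized to zeros
x <- vector("numeric", length = 5)
x            # [1] 0 0 0 0 0

# Other modes have their own defaults
s <- vector("character", length = 2)   # two empty strings ""
b <- vector("logical", length = 3)     # FALSE FALSE FALSE
s
b
```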

Sometimes we mix objects of different classes in one vector. Because all elements of a vector must share the same atomic class, R implicitly coerces them to the most general class present. The order of priority is as follows:
1st Priority = character
2nd Priority = numeric
3rd Priority = logical
Here is an example:





















In the first example, x <- c(2, "b"), the result is a character vector: the number 2 is coerced to the character string "2" because the element "b" belongs to the atomic class "character". Thus, this expression is a character vector with the elements "2" and "b". The second example, x <- c(FALSE, 10), is a numeric vector with the elements 0 and 10, because the logical value FALSE is coerced to the number 0. The third example, x <- c(FALSE, "b"), is a character vector with the elements "FALSE" and "b".
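A small sketch of implicit coercion follows, with one vector per case (the names a, b and d are assumptions used only to keep the three results apart):

```r
a <- c(2, "b")        # numeric + character -> character
b <- c(FALSE, 10)     # logical + numeric   -> numeric (FALSE becomes 0)
d <- c(FALSE, "b")    # logical + character -> character

a; class(a)           # "2" "b"      / "character"
b; class(b)           # 0 10         / "numeric"
d; class(d)           # "FALSE" "b"  / "character"
```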

In R you can also convert a vector from one atomic class to another by explicit coercion using the as.* functions, such as as.numeric(), as.logical(), as.character() and as.complex().





















In our example we created a sequence of numbers from 0 to 3 with the expression x <- 0:3. To determine its atomic class we used the class() function and typed class(x). The output showed that x is an integer vector. We then explicitly coerced the integer vector to numeric, logical, character and complex using the as.* functions. Take note that when coercing to logical, the value 0 becomes FALSE and any non-zero value (including negative numbers) becomes TRUE.
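The coercions described above can be sketched as follows; the last line is an extra check, added here as an illustration, that negative numbers also become TRUE.

```r
x <- 0:3
class(x)          # "integer"

as.numeric(x)     # 0 1 2 3
as.logical(x)     # FALSE TRUE TRUE TRUE
as.character(x)   # "0" "1" "2" "3"
as.complex(x)     # 0+0i 1+0i 2+0i 3+0i

as.logical(-1)    # TRUE: any non-zero value is TRUE
```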

Take care when explicitly coercing one atomic class to another, because some coercions make no sense and will result in NA (usually with a warning that NAs were introduced by coercion). In the next example we have a character vector x <- c("a", "b", "c") and we try to coerce it to numeric, logical, integer and complex. The outputs are all NA because there is no sensible way of converting "a", "b" and "c" to any of those atomic classes.
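A minimal sketch of such a failed coercion is shown below. The result names are assumptions; suppressWarnings() is used only to keep the output quiet, and without it R prints a warning for the numeric, integer and complex cases.

```r
x <- c("a", "b", "c")

# Each coercion yields NA for every element
n <- suppressWarnings(as.numeric(x))
i <- suppressWarnings(as.integer(x))
z <- suppressWarnings(as.complex(x))
l <- as.logical(x)   # "a", "b", "c" are not recognised logical strings

n; i; z; l
```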





















Now that you have made vectors, our next step is to create matrices. A matrix is a special kind of vector because it has a dimension attribute. The dimension attribute of a matrix is defined by its number of rows and columns (nrow, ncol).

Try typing x <- matrix(nrow=3, ncol=5) in your R console and hit Enter, then type x. The output is a matrix with 3 rows and 5 columns, filled with NA because no data were supplied. Type dim(x) and hit Enter, and the output will show 3 and 5: 3 for rows and 5 for columns. Now type attributes(x) and hit Enter, and the output will give you the dimension attribute ($dim) of your matrix, which is 3 x 5.
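The steps above can be sketched as:

```r
# An empty 3 x 5 matrix; with no data supplied, every cell is NA
x <- matrix(nrow = 3, ncol = 5)
x

dim(x)          # 3 5  (rows, columns)
attributes(x)   # a list whose $dim element is 3 5
```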





















Take note that matrices are filled column-wise. This means that the first column is filled first, and when the last row is reached the next column is filled. For example, say we type the expression x <- matrix(1:10, nrow=2, ncol=5).





















The number of rows in our expression is two. As you can observe, the first column [,1] is filled first, and once both rows, [1,] and [2,], are filled, the next column [,2] is filled, and so on.
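The column-wise fill can be sketched as follows. The second matrix uses the byrow argument of matrix(), which is not covered in the text above and is shown here only for contrast.

```r
# Default: values fill down each column in turn
x <- matrix(1:10, nrow = 2, ncol = 5)
x
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    3    5    7    9
# [2,]    2    4    6    8   10

# For contrast: byrow = TRUE fills across each row instead
y <- matrix(1:10, nrow = 2, ncol = 5, byrow = TRUE)
y
```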

Matrices can also be created by assigning to the dimension function, dim(). Let's create a series of numbers from 1 to 10, then turn it into a 2x5 matrix (a matrix with 2 rows and 5 columns) using the dim() function.
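A short sketch of this approach, assuming the vector is named m:

```r
m <- 1:10            # a plain integer vector
dim(m) <- c(2, 5)    # give it 2 rows and 5 columns
m                    # now prints as a matrix, filled column-wise

is.matrix(m)         # TRUE
```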





















Matrices can also be made with cbind() or rbind(), which create a matrix by binding vectors together as columns or as rows. If you want your vectors, say x and y, to become the columns of the matrix, use the column-bind function cbind(); if you want your vectors to become the rows, use the row-bind function rbind().
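As a sketch, with two hypothetical vectors x and y of equal length (the values are assumptions chosen for illustration):

```r
x <- 1:3
y <- 10:12

cb <- cbind(x, y)   # x and y become columns: 3 rows, 2 columns
rb <- rbind(x, y)   # x and y become rows:    2 rows, 3 columns

cb
rb
```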





















Aside from matrices, a list is also a special kind of vector that can be used in R. Lists are special because they can contain elements of different atomic classes. This special kind of vector is created with the list function, list(). For example, say you want to create the list x <- list(1+1i, 3, FALSE, "a"). This is a list containing a complex value, a numeric value, a logical value and a character string. The output looks different from that of an ordinary vector because each element is printed separately and keeps its own atomic class.
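A minimal sketch of such a list is shown below, with sapply() used to confirm that each element keeps its own class:

```r
x <- list(1+1i, 3, FALSE, "a")
x                    # each element is printed under its own [[k]] index

cls <- sapply(x, class)
cls                  # "complex" "numeric" "logical" "character"

x[[4]]               # double brackets extract a single element: "a"
```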

Wednesday, June 11, 2014

Introduction to R

Now that we have integrated Google Analytics into our Facebook page, let's start to learn R. The R language is probably the most powerful and most versatile free software for data analytics. It has a comprehensive collection of packages supported by a very large community of programmers working in the academe and in research around the world. Commercial analytics products such as "Oracle R" have recognized the value of this free software and have adopted the R language and its environment to help statisticians, data analysts, and data scientists perform advanced analytics. Oracle R uses the R language to generate sophisticated graphics in its programs. R has also been integrated with other commercial analytics software such as Adobe Analytics, Google Analytics, and SQL. Even IBM, the maker of SPSS, has used R to extend its functionality. SAP and TIBCO Spotfire have joined the bandwagon in integrating R. The newest commercial software to integrate R into its system is Tableau 8.1.


Because R is free and is supported by such a large community in the academe and in research, its functionality is probably more powerful and flexible than that of SAS; an R vs. SAS debate has been ongoing in the data analytics community for years. As for me, an avid SPSS user who moved to Stata, then to SAS, and then to R, I find R more appealing because of its readily available packages and more flexible environment.

Now let's move on to the more serious discussion. The basic 'atomic classes of objects' in R are:
- Character ("a", "b", "c")
- Complex (1+1i, 1-2i)
- Logical (TRUE/FALSE)
- Integer (1L, 2L, 3L, ...)
- Numeric (real numbers: 1, 2, 3.5, ...)

A vector is the most basic object in R. It contains one or more elements that all belong to the same atomic class. For example, x <- c(1,2,3,4,5) or x <- 1 is a numeric vector, y <- c("a","b","c","d","e","f") or y <- "a" is a character vector, z <- c(TRUE) is a logical vector, and w <- c(1L, 2L, 3L) or w <- 1L is an integer vector. An empty vector, on the other hand, can be made with the vector function:

vector()

A list is a special kind of vector because it can hold elements of different atomic classes. For example, x <- list("a", 1, 2, 1i, 2L, TRUE) is a list. (Using c() here instead of list() would coerce every element to character.)

Numeric is the most commonly used atomic class in R. To express a number as an integer, simply add the suffix L: for example, 1 is a numeric value but 1L is an integer in R.

Inf, or infinity, is also treated as a number: 1/0 gives Inf and 1/Inf gives 0. NaN ("not a number"), on the other hand, represents an undefined value; for example, 0/0 gives NaN. R also treats NaN as a kind of missing value (is.na(NaN) is TRUE), although the general marker for a missing value is NA.
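These special values can be checked directly in the console. The sketch below also shows the L suffix for integers:

```r
class(1)       # "numeric"
class(1L)      # "integer"

1 / 0          # Inf
1 / Inf        # 0
0 / 0          # NaN

is.nan(0 / 0)  # TRUE
is.na(NaN)     # TRUE:  NaN counts as missing
is.nan(NA)     # FALSE: NA is missing but is not NaN
```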

This ends our introduction to R. Our next topic will be the basics of R.

Monday, June 9, 2014

Integrating Google Analytics to Facebook Fan Page

We have successfully linked our Google Analytics account to our Blogspot blog. Here we will discuss how to integrate Google Analytics with your Facebook Fan Page. It is assumed that you already have a Google Analytics account and a Facebook account. I suggest you also create a Facebook Page. In your FB search box type "Static HTML: iframe tabs" and click the star symbol.



It will take you to the app's page. Click "Visit Website" (encircled in black).



Click the "Add Static HTML to a Page" button.



It will take you to the Add Page Tab screen, where you will be presented with a drop-down menu. Here you can choose which of your FB fan pages should get the Static HTML tab. In my case, I created one fan page named Infection and Immunity Research.


After choosing a FB fan page, visit it as an admin. You will see that a star symbol has been added. Click it.

Click the "Edit Tab" button.



Then click the "Login with Facebook" button.



In the right-hand corner you will see the star symbol. Click the "Edit Tab" button.



You will see the "Content" option being active.



Go to your Google Analytics account, click "Admin" at the top centre of the page, click "Account", and then click "Create New Account".



Fill in the necessary information required.



You can read the "Terms of Service Agreement" at your leisure, then click "I Accept".



You will be directed to your Tracking ID and your Tracking Code. 



Copy your Tracking Code.



Go to your StaticHTML account, delete the previous script and replace it with your Google Analytics Tracking Code. 



In your StaticHTML account, click "FanGate" and then click the "Enable FanGate" button.



Click the "See More Apps" button at the right side of the Instant FanGate.



It will redirect you to another page, click the "Use Static HTML App" button.



A Static HTML tab will be integrated with your FanGate. Copy the same Google Analytics Tracking Code into your FanGate.


If you go to your Google Analytics account you will notice that it says "Tracking Not Installed". This is usually the case because you do not own the Facebook domain, but you can still track your Facebook Fan Page.


View your Facebook fan page as a visitor, then go to your Google Analytics account, click "Reporting", click "Real Time" and click "Overview". If you have successfully tracked your FB fan page, there should be 1 active user on the site: this is you, as a visitor looking at the fan page.



Our next topic will be an introduction to R programming. The exercises in our R programming posts will be based on the quizzes given by the Johns Hopkins University School of Public Health and Biostatistics, so some exercises might be quite challenging.

Sunday, June 8, 2014

Integrating Google Analytics into Blogger.com

Our previous discussion was about the basics of Git commands. One major topic that we will discuss in R programming is Business Intelligence and Web Analytics using R. It is better to use the data generated from your own social media accounts for our sample exercises, as this gives you an opportunity to use your own "BIG" data. For those who have a Facebook fan page and a Twitter account, integrating Google Analytics into those social media sites will be discussed in a separate topic. So before we engage with R, let's generate our own data first.

Let's begin. Go to http://www.google.com/analytics and sign in to access Google Analytics. This requires a Gmail account (not Yahoo Mail, not Hotmail, but Gmail).


Once you have logged in using your Gmail account, you can go to the sign-up page.


After clicking "Sign up", fill in the required information:

Accept the Terms of Agreement.

After this step, you will be taken to your Tracking ID and your Tracking Code. Although it is optional, I usually copy my Tracking ID and Tracking Code into a text or Word document.


Once you have copied your Tracking ID (highlight it and press Ctrl+C), go to your blogger.com account. Click "Settings" and then "Other". At the bottom of the page you will see "Google Web Property ID"; paste your Tracking ID there (press Ctrl+V).


You have integrated your Google Analytics tracking ID to your blogger account. To check if your Google Analytics is working, try viewing your blog.


Now go to your Google Analytics account. At the top you will see a tab named "Reporting"; click it, then click the "Real Time" tab and then "Overview". It should show 1 active user.


If you scroll down, you can also see which country the active user is located in.


Keep this as it is; we will come back to your Google Analytics account later, during our discussion of Business Intelligence and Web Analytics using R. There is a package available on CRAN for web analytics. By integrating Google Analytics with R, we can perform advanced statistical analysis, predictive modelling and attribution analysis.

Saturday, June 7, 2014

Data Analytics: Introduction to Basic Git

Now that we have learned the importance of data analytics and its applications in our work, businesses and daily lives, we will discuss how to store our programs and data. Git is a program that lets you store your programs in a local repository and in a remote one. It is a type of version control system (VCS), and is therefore very helpful if you plan to improve or revise your programs frequently. You can download Git, which includes the Git Bash console, from this site: http://git-scm.com/downloads.



Once you have installed Git Bash on your computer, you have to set it up so that your name and email address are attached to the changes you record. To set it up, type:

git config --global user.name "your name here"
git config --global user.email "your email here"

To check that the set-up is correct, type:

git config --list

To exit Git Bash, type:

exit



Now that you have finished setting up Git Bash, let's run through some basic commands. Note that these are general Unix shell commands available in the Git Bash console, not Git commands as such:

pwd prints the current (present) working directory.

clear clears the console.

ls lists the folders and files in the current working directory.

ls -a lists all files and folders in the current working directory, including hidden ones.

ls -al lists all files and folders, including hidden ones, together with their details (permissions, owner, size, and modification date).

cd (followed by the name of a directory) changes the current working directory to that directory.

cd (not followed by the name of a directory) takes you back to the home directory.

cd .. goes up one level.

cp copies files.

cp -r copies a folder and all the files it contains.

rm removes (deletes) a file. USE WITH CAUTION: THERE IS NO WAY TO RETRIEVE A FILE ONCE IT IS DELETED.

rm -r removes a folder and everything in it. USE WITH CAUTION: THERE IS NO WAY TO RETRIEVE A FOLDER ONCE IT IS DELETED.

mv moves or renames files and folders.

Let's do some simple exercises.
First click the Start menu, then click Computer, Local Disk (C:), click Users and then click User.



Our first exercise is to create a directory named Gerard (or, if you prefer, your own first name) at the Computer > Local Disk (C:) > Users > User pathway.
Open Git Bash.
In the Git Bash console type:

mkdir Gerard

(I suggest you type your own name instead.)





If you look at the Computer > Local Disk (C:) > Users > User pathway, you can see that a new Gerard folder has been created. Now let's find out the current working directory by typing:

pwd




As you can see, the current directory is still /c/Users/User. This is not yet the working directory that we want, so we need to navigate to the Gerard folder by typing:

cd Gerard



Now we are in the Gerard folder. You can check by looking at the prompt in the Git Bash console, which should say: User@User-PC ~/Gerard

You can also type

pwd

to determine the current working directory.

After typing pwd you can see that the current working directory is /c/Users/User/Gerard.

We will create new folders under this working directory (the Gerard folder). To create new directories type:

mkdir Folder 1
mkdir Folder 2
mkdir Note
mkdir Trash



You may have noticed a message in the Git Bash console that says mkdir: cannot create directory 'Folder': File exists. This is because in the first command, mkdir Folder 1, the shell treats Folder and 1 as two separate names, so it creates two directories, one named Folder and the other named 1. When we typed the second command, mkdir Folder 2, the shell created a new directory named 2 but found that a directory named Folder already existed in the current working directory. To create a single directory whose name contains a space, put the name in quotes, for example mkdir "Folder 1".

If you look in Windows and open the Gerard folder, you can see the directories 1, 2, Folder, Note and Trash.
For the next exercise, we will copy the directory named Folder into the directory named Note by typing

cp -r Folder Note

In this case, the shell treats the first directory name as the directory to be copied and the second directory name as the destination where the copy will be placed.

You can see that the Computer > Local Disk (C:) > Users > User > Gerard > Note pathway now has the directory Folder in it.
Now we will try to remove the directory Folder from the Computer > Local Disk (C:) > Users > User > Gerard > Note pathway. First you need to check your current working directory by typing

pwd

If your current working directory is not

/c/Users/User/Gerard/Note

move to the Note directory by typing

cd Note

and check your working directory again by typing

pwd

The console should say:

/c/Users/User/Gerard/Note

Check the folders in this directory by typing:

ls

(Note that this is not a capital i but a small letter L.)

The output should be:

Folder

To remove the directory Folder from the Note directory, type

rm -r Folder

As you can see, in the Windows Computer>Local Disk (C:)> Users>User>Gerard>Note pathway, the directory named "Folder" has been deleted. Now we will try to go back to the home directory by typing

cd

check the current working directory by typing

pwd

As you can see, the command leads you back to /c/Users/User, or Computer > Local Disk (C:) > Users > User.
Now exit the program simply by typing

exit

You are now ready to create a GitHub account and make a remote repository for your programs. Please visit www.github.com to create your account. There are directions there on how to push your local repository to your remote (online) repository; just follow them and you are ready to go.