Tuesday, June 3, 2014

The "Data" in Data Analytics

Now that we have been introduced to the topic of data analytics, we will try to understand what data is. Data are recorded measurements of information. They can be quantitative measures (objective) or qualitative facts (subjective), collected on a set of variables. A variable is the characteristic being measured, such as height or country of origin, while the data are the values recorded for that variable. Therefore variables, like data, can be qualitative or quantitative.
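
To make the distinction concrete, here is a minimal Python sketch. The records are made-up illustrative values, not real data. Each dictionary key plays the role of a variable, and each stored value is a single datum. The helper `classify_variable` is hypothetical. It labels a variable quantitative only when every recorded value is a number.

```python
# Made-up example records: each key is a variable, each value is a datum.
records = [
    {"gender": "female", "country": "Canada", "height_cm": 158.0, "weight_kg": 52.5},
    {"gender": "male", "country": "Japan", "height_cm": 171.2, "weight_kg": 68.0},
    {"gender": "female", "country": "Kenya", "height_cm": 165.4, "weight_kg": 60.1},
]

def classify_variable(name, rows):
    """Hypothetical helper: quantitative if every value is numeric, else qualitative."""
    values = [row[name] for row in rows]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "qualitative"

variables = list(records[0].keys())
types = {name: classify_variable(name, records) for name in variables}

for name, kind in types.items():
    print(f"{name}: {kind}")
```

This is only a rough heuristic. A nominal variable stored as numeric codes (for example 1 = male, 2 = female) would be misclassified, so the meaning of a variable matters more than how it is stored.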

Examples of qualitative variables include nominal (categorical) variables such as gender, country, race, religion and other demographic indicators. These are usually summarized as counts or frequencies of each category. Statements from interviews are also treated as qualitative data, and most variables encountered in the social sciences are qualitative. Quantitative variables, on the other hand, are numeric: they can be measured on a scale. Examples include height, weight, temperature, pressure and volume, and most parameters encountered in the physical and engineering sciences are quantitative.

Variables can also be dependent or independent. Let's take, for example, a linear equation:

$$Y = a + bX$$

where a is the intercept and b is the slope.

In the equation, the dependent variable is "Y" because its value depends on the independent variable "X". A unit change in "X" produces a change in "Y", but in this model a change in "Y" does not exert any effect on "X". The dependent variable "Y" is also called the output, response, measured or effect variable. The independent variable "X" is sometimes called the input variable, cause, regressor or predictor.
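
The following Python sketch assumes the linear equation takes the form Y = a + bX, with illustrative values a = 2 and b = 3; the post itself gives no specific numbers. It shows that increasing X by one unit changes Y by exactly b. It then uses ordinary least squares, a standard fitting technique swapped in here for illustration, to recover a and b from (X, Y) pairs, which is how X serves as a predictor of Y. The functions `predict` and `fit_line` are hypothetical helpers.

```python
def predict(x, a=2.0, b=3.0):
    """Dependent variable Y computed from independent variable X: Y = a + b*X.
    The values a = 2 and b = 3 are illustrative assumptions."""
    return a + b * x

# A one-unit increase in X changes Y by exactly b.
change = predict(5.0) - predict(4.0)
print("Change in Y per unit change in X:", change)

def fit_line(xs, ys):
    """Ordinary least-squares estimates of the intercept a and slope b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Generate Y from X, then treat X as the predictor and Y as the response.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [predict(x) for x in xs]
a_hat, b_hat = fit_line(xs, ys)
print("Estimated intercept and slope:", a_hat, b_hat)
```

Because the example points lie exactly on the line, the fit returns a = 2.0 and b = 3.0. Real measurements contain noise, so the estimates would only approximate the true values.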

In practice, data come in many different forms. Some data sets are large and messy. Some data must first be collected and cleaned. Some exist in such surplus that the sheer volume needs filtering. Others are so scarce that retrieving and sorting them takes real effort. Whatever challenges these pose, data analytics has become an innovative way of dealing with many kinds of data sets and information. Nowadays data are much cheaper and easier to collect, and new tools have made data analytics easier to perform.
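
Here is a minimal sketch of what cleaning and filtering can look like, using only the Python standard library. The raw temperature readings and the plausible range of -50 to 60 degrees are hypothetical values chosen for illustration. The helper `clean` drops missing or unparseable entries, then filters out values that fall outside the range.

```python
# Hypothetical raw temperature readings: some missing, some text, one obvious error.
raw = ["21.5", "", "22.0", "n/a", " 23.1 ", "999", "20.8", None]

def clean(values, low=-50.0, high=60.0):
    """Convert entries to floats, dropping missing, unparseable or out-of-range values."""
    cleaned = []
    for v in values:
        if v is None:
            continue  # missing entry
        try:
            number = float(v)  # float() tolerates surrounding whitespace
        except ValueError:
            continue  # empty string or text such as "n/a"
        if low <= number <= high:
            cleaned.append(number)  # keep only plausible readings
    return cleaned

readings = clean(raw)
print(readings)
```

Which values count as "plausible" is a judgment about the data, not something the code can decide, so the range limits should come from knowledge of how the data were collected.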

With the advent of faster and more powerful computers, big data analytics has become more accessible to the public. Big data here refers to data volumes that can reach the scale of zettabytes, and analyzing them often produces high-profile results.
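
For a sense of scale: in SI (decimal) units, one zettabyte is 10^21 bytes, or one trillion gigabytes. The hypothetical helper `to_bytes` below performs this conversion, with each unit step being a factor of 1000.

```python
# SI (decimal) storage units: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(amount, unit):
    """Hypothetical helper: convert an amount in the given SI unit to bytes."""
    return amount * 1000 ** UNITS.index(unit)

zettabyte = to_bytes(1, "ZB")
gigabytes_per_zb = zettabyte // to_bytes(1, "GB")
print(zettabyte)         # 10**21 bytes
print(gigabytes_per_zb)  # 10**12, i.e. one trillion gigabytes
```

Storage sizes are sometimes quoted with binary prefixes instead. The binary counterpart, the zebibyte (ZiB), is 2^70 bytes, which is slightly larger.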

(Please note that our upcoming discussions will use Git Bash and GitHub, followed by R programming and then SPSS. You may want to set up these tools if you are following along.)
