Wednesday, 15 May 2013

My Non-Linear Career Life

by Albert Anthony D. Gavino, MBA

Boring as it may seem, I am fascinated by how we jump from one career to another.

My Non-Linear Career
If you plot your career as a straight line on a graph, it is not a good picture, because a linear career will not give you many options. It is better to have a non-linear career, entering all sorts of fields such as management, statistics, web analytics, behavioral psychology, and law. These fields will widen your horizons and open up new types of careers. Five years ago, I did not imagine myself going back into statistics, but if you list all your domains, you will see that they are somewhat related to each other: web analytics to statistics to web development to research.

What I am saying is that graduates should not limit themselves to traditional careers; sometimes we have to cross-sell ourselves to other industries. Cross over from the manufacturing industry to the banking industry to an academic environment. This will broaden your horizons, your contacts, and your opportunities. Career life is ever changing with the demands of industry. Reading new kinds of books gets you ahead, and so will a master's degree or a doctoral degree, for that matter. So do not be afraid to get into new things like cloud computing, data mining, and predictive modeling. Do not fear the unknown; get interested in reading more about new things. As long as we keep learning, we improve ourselves and become more open-minded to new ideas, new cultures, and new learnings.


My Business Intelligence Framework (4 simple steps on how to get there)

by Albert Anthony D. Gavino

These days, everyone seems to be an expert at business intelligence, but what is business intelligence all about? It is not just about data analytics, some graphs, and a few more pie charts. Business intelligence involves a meticulous process that starts with crafting a business strategy.

Business Strategy
The business strategy comes from the company's vision and mission statements, aligned with its objectives. There are several ways to approach business strategy: you can use Michael Porter's model, or analyze your competition through market niches or market segmentation.



B.I. framework by Albert Anthony D. Gavino

Performance Management
Performance management is the field that deals with the nitty-gritty, including the dreaded Balanced Scorecard, where every organizational unit is measured by a performance metric such as sales performance, sales quota, or number of calls made per day. These metrics are defined by upper management and are meant to drive customer value and shareholder value in turn. The approach is somewhat related to the quality methodologies that we don't want to read about, like Six Sigma, whose belt holders pursue utmost quality by reducing the number of defects to almost zero.

Data Warehousing
Of course, you can't analyze your data if you don't have a good data warehouse. A good data warehouse involves flat files and cubes that are interconnected by a snowflake schema or a starflake schema. These are queried through SQL Server, or MySQL if you're on a budget, and in some cases you will need stored procedures with scripts and subscripts nested within each other. Think five pages of code with your database administrator (good luck with that).

And lastly...

Advanced Statistics
Your company statistician is best consulted about what kind of data you will be handling. No, he doesn't care where you put your string fields; he only has three kinds of variables: nominal, ordinal, and scale. For each there are specific tests in elementary statistics, such as the t-test, z-test, and chi-square test. Then there are the advanced techniques, such as linear regression, logistic regression, time series analysis, cluster analysis, CHAID, C5 trees, and neural networks, which involve complex statistical models computed through business intelligence software such as IBM SPSS Modeler or whichever competing product you prefer.




Wednesday, 01 May 2013

Basic Stat Tools to use for Research


Basic Statistical Chart by Albert Anthony D. Gavino
Basic Decision Tree Chart for Statistical Analysis

How to use:

Step 1: Consider your Independent Variable
Step 2: Indicate the number of experimental conditions
Step 3: Indicate if the groups are related (dependent) or unrelated (independent)
Step 4: Identify whether the variable is Nominal, Ordinal or Interval
Step 5: Use the appropriate Statistical Tool for your Research
Step 6: Get ready to analyze your data with IBM SPSS software


Thursday, 25 April 2013

Starting a Career in Business Intelligence and Data Mining

by Albert Anthony D. Gavino

New grads like to start a new career that's cool and techie, but what do college students need to learn to get a career in this new industry?

  • a background in statistics, which involves choosing the right statistical tool from a research perspective (for example, choosing a parametric test instead of a non-parametric test, or using logistic regression to predict certain outcomes).
  • a background in IT: a basic knowledge of SQL scripts and statements would be helpful, and some knowledge of data warehousing would be an advantage, as data ranges from flat files to cubes, sometimes organized in a snowflake schema.
  • a background in marketing, as you need to present reports in an infographic manner to your specific stakeholders. Creating a story out of your models and your theories makes it a successful story to tell for your company.
More to come

One and Two Tailed Tests

Going in what direction?

Suppose we have a null hypothesis H0 and an alternative hypothesis H1. We consider the distribution given by the null hypothesis and perform a test to determine whether or not the null hypothesis should be rejected in favor of the alternative hypothesis.

There are two different types of tests that can be performed. A one-tailed test looks for a change in one specified direction (either an increase or a decrease in the parameter), whereas a two-tailed test looks for any change in the parameter, whether an increase or a decrease.

We can perform the test at any level (usually 1%, 5% or 10%). For example, performing the test at the 5% level means that, if H0 is true, there is a 5% chance of wrongly rejecting it.

If we perform the test at the 5% level and decide to reject the null hypothesis, we say "there is significant evidence at the 5% level to suggest the null hypothesis is false".


One-Tailed Test

We choose a critical region. In a one-tailed test, the critical region has just one part, lying in a single tail of the distribution. If our sample value lies in this region, we reject the null hypothesis in favor of the alternative.

Suppose we are looking for a definite decrease. Then the critical region will be to the left. Note, however, that in this one-tailed test an observed value can be as high as you like without leading us to reject the null hypothesis.

Example


Suppose we are given that X has a Poisson distribution and we want to carry out a hypothesis test on the mean, λ, based upon a sample observation of 3.

Suppose the hypotheses are:

H0: λ = 9
H1: λ < 9

We want to test if it is "reasonable" for the observed value of 3 to have come from a Poisson distribution with parameter 9. So what is the probability that a value as low as 3 has come from a Po(9)?

P(X ≤ 3) = 0.0212 (this has come from a Poisson table)

The probability is less than 0.05, so if the null hypothesis were true, there would be less than a 5% chance of observing a value as low as 3 from a Poisson(9) distribution. We therefore reject the null hypothesis in favour of the alternative at the 5% level.
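The table lookup in this example can also be reproduced in a few lines of Python. This is only a minimal sketch using the standard library; the helper name `poisson_cdf` is my own choice for illustration and does not come from any statistics package.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Lower-tail probability for the example: observed value 3, H0: mean = 9
p_value = poisson_cdf(3, 9)
print(round(p_value, 4))    # 0.0212, matching the Poisson table

# One-tailed test at the 5% level: reject H0 if the tail probability is below 0.05
reject_h0 = p_value < 0.05
print(reject_h0)            # True
```

Trying `poisson_cdf(4, 9)` gives a value just above 0.05, so under these assumptions an observation of 4 would not have been enough to reject H0 at the 5% level.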

Two-Tailed Test

In a two-tailed test, we are looking for either an increase or a decrease. So, for example, H0 might be that the mean is equal to 9 (as before).

This time, however, H1 would be that the mean is not equal to 9. In this case, therefore, the critical region has two parts, one in each tail of the distribution.

Example


Let's test the parameter p of a Binomial distribution at the 10% level.

Suppose a coin is tossed 10 times and we get 7 heads. We want to test whether or not the coin is fair. If the coin is fair, p = 0.5.


Put this as the null hypothesis:
H0: p = 0.5
H1: p ≠ 0.5

Now, because the test is two-tailed, the critical region has two parts. Half of the critical region is to the right and half is to the left. So the critical region contains both the top 5% of the distribution and the bottom 5% of the distribution (since we are testing at the 10% level).

If H0 is true, X ~ Bin(10, 0.5).

If the null hypothesis is true, what is the probability that X is 7 or above?


P(X ≥ 7) = 1 - P(X < 7) = 1 - P(X ≤ 6) = 1 - 0.8281 = 0.1719

Is this in the critical region? No, because the probability that X is at least 7 is not less than 0.05 (5%), which is what we need it to be.

So there is not significant evidence at the 10% level to reject the null hypothesis.
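The coin example can be checked the same way. The following is a rough sketch using only the Python standard library; the function names `binom_pmf`, `upper_tail` and `lower_tail` are hypothetical helpers written for this illustration.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """P(X >= k)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def lower_tail(k, n, p):
    """P(X <= k)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n, heads, p0 = 10, 7, 0.5   # 10 tosses, 7 heads, H0: p = 0.5
alpha = 0.10                # two-tailed test at the 10% level

tail = upper_tail(heads, n, p0)
print(round(tail, 4))       # 0.1719

# Each tail gets half of the significance level (5% here)
reject_h0 = tail < alpha / 2
print(reject_h0)            # False

# Values of X that fall in either part of the critical region
critical = [k for k in range(n + 1)
            if lower_tail(k, n, p0) < alpha / 2 or upper_tail(k, n, p0) < alpha / 2]
print(critical)             # [0, 1, 9, 10]
```

Under these assumptions, only 0, 1, 9 or 10 heads would lead to rejecting the hypothesis that the coin is fair, which is why 7 heads is not significant.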

Reference:

Selecting Statistical Tests

by Albert Anthony D. Gavino

Parametric and Non-parametric tests

My office mate uses technical statistical terms like parametric and non-parametric tests, but what are they actually? Parametric tests assume that the data follow a known distribution (usually the normal distribution) and are used with interval-like data such as weight and height, on which numerical calculations like the mean are meaningful. Non-parametric tests make no such assumption and are used for variables we cannot compute values on, such as categories like male and female, or for ranked (ordinal) data.

These distinctions matter to researchers as they plan which statistical tool suits their data.

Experimental Conditions

Any research design has experimental conditions. A researcher may want to have one condition, two conditions, or three or more conditions. The more conditions there are, the more complex the design becomes.

Related or Unrelated designs

This simply means that if you use one group of participants and test them again, the design is regarded as related; if you use independent groups, it is called an unrelated design.

Decision Charts

Decision charts are useful for working out which statistical test fits the research problem you want to solve. Here is an example of a statistical decision chart.



A Decision Tree Guide on what Statistical Tool to Use

Statistical Decision Tree based on type of data

Determine the data types
Data types are nominal, ordinal, or interval/scale.

If the variable is ordinal, that is, if it has an order, you may opt to test the relationships between variables or the differences among rankings. If the groups are independent of each other, you can use the Mann-Whitney test for two groups or the Kruskal-Wallis ANOVA for three or more groups. If the groups are dependent on each other, use the Wilcoxon test for two groups and Friedman's two-way ANOVA for three or more groups.
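The ordinal branch of the decision tree described above can be written down as a small lookup function. This is only an illustrative sketch: the function name `ordinal_difference_test` and its parameters are made up for this example, and it covers only the four tests named in the text, not the nominal or interval branches.

```python
def ordinal_difference_test(groups_related: bool, n_groups: int) -> str:
    """Suggest a test for differences between groups measured on an ordinal scale.

    groups_related: True if the same participants are measured in every group
                    (dependent groups), False for independent groups.
    n_groups:       number of groups or experimental conditions being compared.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if groups_related:
        return "Wilcoxon" if n_groups == 2 else "Friedman"
    return "Mann-Whitney" if n_groups == 2 else "Kruskal-Wallis"

# Two independent groups, e.g. rankings given by two separate sets of customers
print(ordinal_difference_test(groups_related=False, n_groups=2))  # Mann-Whitney

# The same participants ranked under three conditions
print(ordinal_difference_test(groups_related=True, n_groups=3))   # Friedman
```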