Kamis, 25 April 2013

Selecting Statistical Tests

by Albert Anthony D. Gavino

Parametric and Non-parametric tests

My office mate uses technical terms in statistics like parametric and non parametric tests, but what are they actually? parametric tests are those that involve interval like data such as weights, height because they can be computed for numerical calculations unlike those of non-parametric tests like variables we cannot compute values on like male and female.

These tests are important in the field of researchers as they begin to plan their data for a specific statistical tool.

Experimental Conditions

any research design has an experimental condition, as such researcher would want to have one condition, two conditions or two or more conditions. The more conditions there are, the more complex they become.

Related or Unrelated designs

this only means if you use a group or participants and be using them again, they are regarded as related, unlike for independent groups, they are called unrelated designs.

Decision Charts

Decision charts are useful for analyzing which statistical test fits your research problem that you would want to solve with a correct statistical test. Here is an example of a statistical decision chart



A Decision Tree Guide on what Statistical Tool to Use

Statistical Decision Tree based on type of data

Determine the data types
Data types range from Nominals, Ordinal or Interval/Scale

If its Ordinal or if the variable has an order, you may opt to test the relationships between them or the differences among rankings. If its Independent of each other you can use Mann Whitney to test two groups or Kruskal Wallis ANOVA for three or more groups. If the groups are Dependent of each other, use Wilcoxon for two groups and use Friedman's two way ANOVA for three or more groups.



Kamis, 18 April 2013

E Learning Edge and Data Analytics

by Albert Anthony D. Gavino, MBA

I just attended a free session on E-Learning Edge: Data Analytics 102, (with the help of our good Marketing Expert Adolfo Aran III) we looked around and there were around 15 to 20 participants. Our speaker had a background in Statistics and MBA and discussed the basics of Data Analytics which involves the following:

  • Information
  • Statistics
  • Technology
  • Strategy
  • Communication
First of all Data analytics doesn't come with a push of a button, it involves proper problem statement of cases, it involves cleaning up of databases, it involves using Statistical Software and lastly involves a proper reporting to its stakeholders such as Infographics. Models such as CHAID, Decision Trees and Logisitic Regression Models are all scientific, but at the end of the day its our stakeholders who we need to communicate with. 

What are the resources that you need?
  1. Statistical software such as SPSS or SAS
  2. a Data Warehouse that is up and running
  3. Research Oriented people who have a background in Statistics
  4. Marketing people who know how to shape the info into Palatable content
  5. and Lastly shareholders who know how to use Technology 
The Data Mining Industry or Business Intelligence, data analytics are shaping the Information Technology Industry with concepts such as Big Data, use of hardware such as Terabytes, Cloud computing with the entry of Google Drive and even Aggressive marketing with Data ads targeted at customers by which Facebook, LinkedIn have now been using.

Always keep in mind that the Innovation with Data Mining is exponentially growing at a fast rate and we now have new careers right in front of us.

For more information about E-Learning Edge


Minggu, 27 Januari 2013

SEM software in the market today


  • LISREL (one of the leaders in SEM and longtime advocate of SEM)
  • IBM-SPSS AMOS (a bit expensive from IBM, but tech support is worth it)
  • SMART PLS (freeware, uses partial least squares method and can run on a small sample size)
Most of these software are used by Statisticians who want to look deeper into their models by using moderation and mediation, Most of these are present in research journals and higher education.

Kamis, 10 Januari 2013

BULK insert statement in MS SQL SERVER 2008

by: ALBERT ANTHONY D. GAVINO

Using BULK INSERT statement in MS SQL Server 2008 helps in importing data files from other sources such as excel, txt into your TABLES in MS SQL Server.

Create a new table in your MS SQL Server
Open SQL Server Management Studio
Select an Instance
Right click Databases>Create a Database
and name the Database as NBA_PLAYERS
with a table name: Contacts

Create the fields as follows:

IDNO (int, not null)
Lastname (varchar(50), not null)
Firstname (varchar(50), not null)

Create a text file on your C: folder with the following records

101, juan, delacruz
102, michael, jordan
103, kobe, bryant

and save it as a txt file

use the code as follows:

BULK INSERT Contacts FROM 'c:\sample\sample.txt'
WITH 
   (
 FIELDTERMINATOR = ',',
 ROWTERMINATOR = '\n"
 )
GO

To view your Appended table

SELECT * FROM Contacts

Kamis, 03 Januari 2013

Using Adventure Works 2008 for Sample

BY: ALBERT ANTHONY D. GAVINO

RECOMMENDED DOWNLOAD

Example AdventureWorks2012_Database.zip
example, 37158K, uploaded Aug 29, 2012 - 36665 downloads

OTHER AVAILABLE DOWNLOADS

Example AdventureWorksLT2012_Database.zip
example, 1131K, uploaded Aug 29, 2012 - 5988 downloads
Example AdventureWorks2008R2_Database.zip
example, 36098K, uploaded Aug 29, 2012 - 48729 downloads
Example AdventureWorksLT2008R2_Database.zip
example, 1028K, uploaded Aug 29, 2012 - 12976 downloads
Example AdventureWorks2008_Database.zip
example, 58410K, uploaded Aug 29, 2012 - 29436 downloads
Example AdventureWorksLT2008_Database.zip
example, 1353K, uploaded Aug 29, 2012 - 7854 downloads

RELEASE NOTES

About this release
This release consolidates AdventureWorks databases for SQL Server 2012,
2008R2 and 2008 versions to one page. Each zip file contains an
mdf database file and ldf log file.

This should make it easier to find and download AdventureWorks databases
since all OLTP versions are on one page.

There are no database schema changes.

For each release of the product, there is a light-weight and full version of the
AdventureWorks sample database.
The light-weight version is denoted by the LT abbreviation in the filename.

An AdventureWorks database can be installed by attaching the database, or using the
 CREATE DATABASE statement.

To attach an AdventureWorks database
  1. Download a database for your SQL Server version.
  2. Unzip the database (mdf) file and log (ldf) file.
  3. From Microsoft SQL Server Management Studio,
    connect to a SQL Server instance.
  4. Right click Databases.
  5. Click Attach.
  6. Click the Add button.
  7. Locate the AdventureWorks database mdf file. For instance,
    AdventureWorks2012_Data.mdf.
  8. Click the OK button on the Locate Database Files dialog window.
  9. Click the OK button on the Attach Databases dialog window to
     attach the database.
CREATE DATABASE statement
From SQL Server Management Studio, execute the following T-SQL:

Example T-SQL
CREATE DATABASE AdventureWorks2012
ON (FILENAME = 'M:\Data\AdventureWorks2012_Data.mdf'),
(FILENAME = 'L:\Tlogs\AdventureWorks2012_Log.ldf')
FOR ATTACH;

MERGE Statement With Table-Valued Parameters


Using Table-Valued Parameter (User-Defined Table Type) as Source Table in the MERGE Statement
(Available in MS SQL Server 2008 r2)
This article continues the discussion of the Table-Valued Parameters article and will combine this new feature with the MERGE statement, yet another new feature of SQL Server 2008.  The stored procedure at the end of that article is as follows:

CREATE PROCEDURE [dbo].[usp_ProcessContact]
@Contact ContactTemplate READONLY
AS

-- Update First Name and Last Name for Existing Emails
UPDATE A
SET [FirstName] = B.[FirstName],
[LastName] = B.[LastName]
FROM [dbo].[Contact] A INNER JOIN @Contact B
ON A.[Email] = B.[Email]

-- Add New Email Addresses
INSERT INTO [dbo].[Contact] ( [Email], [FirstName], [LastName] )
SELECT [Email], [FirstName], [LastName]
FROM @Contact A
WHERE NOT EXISTS (SELECT 'X' FROM [dbo].[Contact] B
WHERE A.[Email] = B.[Email])
GO

The definition of the ContactTemplate user-defined table type is as follows:
CREATE TYPE [ContactTemplate] AS TABLE (
[Email] VARCHAR(100),
[FirstName] VARCHAR(50),
[LastName] VARCHAR(50)
)
GO

As can be seen from this stored procedure, what it does is first update the FirstName and LastName in the [dbo].[Contact] table for all those contacts that exist in the @Contact table-valued parameter based on the Email address.  Then it adds those contact information that are in the @Contact table-valued parameter but are not yet in the [dbo].[Contact] table.
Converting to the MERGE Statement
The first step in converting the UPDATE and INSERT statements into the MERGE statement is to identify the source table and the destination or target table.  In this case the source table is the @Contact user-defined table variable and the destination or target table is the [dbo].[Contact] table.  Once the tables have been identified, we then need to determine the columns that will be used to "merge" the two tables.  In our case, this will be the [Email] column of the [dbo].[Contact] table joined with the [Email] column of the @Contact user-defined table variable.
We then use these tables in the MERGE statement and join them on the columns identified, which will look as follows:
MERGE [dbo].[Contact] AS [Target]
USING @Contact AS [Source]
ON [Target].[Email] = [Source].[Email]

The next step after identifying the tables to "merge" and the columns to join these tables with, we now need to determine what to do for those records that exist on both tables based on the joined column.  In our case, we would want to update the [FirstName] and [LastName] in the [dbo].[Contact] target table with the values in the @Contact source table.
WHEN MATCHED THEN
UPDATE SET [FirstName] = [Source].[FirstName],
[LastName] = [Source].[LastName]

Since the target table has already been identified in the MERGE statement, there's no need to include the target table in the UPDATE statement.  If the target table is included in the UPDATE statement, you will get the following error message:
WHEN MATCHED THEN
UPDATE [dbo].[Contact]
SET [FirstName] = [Source].[FirstName],
[LastName] = [Source].[LastName]

Msg 102, Level 15, State 1, Procedure usp_ProcessContact, Line 11
Incorrect syntax near 'dbo'.

After specifying the statement to perform for those records that exist on both the source table and the target table, the next step is to determine what to do for those records that exists in the source table but are not in the target table.  These records are the new contacts that are not yet in the [dbo].[Contact] table.  In our case, we would want to add those new contacts to the [dbo].[Contact] destination table.  The INSERT statement that will be included in the MERGE statement will look like this:
WHEN NOT MATCHED THEN
INSERT ( [Email], [FirstName], [LastName] )
VALUES ( [Source].[Email], [Source].[FirstName], [Source].[LastName] );

Just like in the UPDATE statement above, since the target table has already been specified in the MERGE statement, there's no need to include the target table in the INSERT statement.  If the target table is included in the INSERT statement, you will get the following error message:
WHEN NOT MATCHED THEN
INSERT INTO [dbo].[Contact] ( [Email], [FirstName], [LastName] )
VALUES ( [Source].[Email], [Source].[FirstName], [Source].[LastName] );

Msg 156, Level 15, State 1, Procedure usp_ProcessContact, Line 14
Incorrect syntax near the keyword 'INTO'.


Putting It All Together


Converting the UPDATE and INSERT statements into a single MERGE sstatement, the stored procedure will now look like this:

CREATE PROCEDURE [dbo].[usp_ProcessContact]
@Contact ContactTemplate READONLY
AS

MERGE [dbo].[Contact] AS [Target]
USING @Contact AS [Source]
ON [Target].[Email] = [Source].[Email]
WHEN MATCHED THEN
UPDATE SET [FirstName] = [Source].[FirstName],
[LastName] = [Source].[LastName]
WHEN NOT MATCHED THEN
INSERT ( [Email], [FirstName], [LastName] )
VALUES ( [Source].[Email], [Source].[FirstName], [Source].[LastName] );
GO
To use this stored procedure, you simply declare a local variable with a type of ContactTemplate, fill up the variable with data and pass it as a table-valued parameter to the stored procedure.
DECLARE @Contacts ContactTemplate

INSERT INTO @Contacts
VALUES ( 'mickey@mouse.com', 'Mickey', 'Mouse' ),
( 'minnie@mouse.com', 'Minnie', 'Mouse' )

EXECUTE [dbo].[usp_ProcessContact] @Contacts
SELECT * FROM [dbo].[Contact]

Email FirstName LastName
------------------ ----------- -----------
mickey@mouse.com Mickey Mouse
minnie@mouse.com Minnie Mouse