Selasa, 30 Oktober 2012

Creating an Instance of SQL Server

BY ALBERT ANTHONY D. GAVINO

Security noteSecurity Note
By default, only members of the local administrators group can start, stop, pause, r
resume or restart a service.
To grant non-administrators the ability to manage services,
 see How to grant users rights to manage services in Windows Server 2003.
 (The process is similar on other versions of Windows.)

To start the default instance of SQL Server

  1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2008 R2, point to Configuration Tools, and then click SQL Server Configuration Manager.
  2. In SQL Server Configuration Manager, in the left pane, click SQL Server Services.
  3. In the details pane, right-click SQL Server (MSSQLServer), and then click Start.
    A green arrow on the icon next to the server name and on the toolbar indicates that the server started successfully.
  4. Click OK to close SQL Server Configuration Manager.

To start a named instance of SQL Server

  1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2008 R2, point to Configuration Tools, and then click SQL Server Configuration Manager.
  2. In SQL Server Configuration Manager, in the left pane, click SQL Server.
  3. In the details pane, right-click the named instance of SQL Server, and then click Start.
    A green arrow on the icon next to the server name and on the toolbar indicates that the server started successfully.
  4. Click OK to close SQL Server Configuration Manager.

To start an instance of SQL Server with startup options

  1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2008 R2, point to Configuration Tools, and then click SQL Server Configuration Manager.
  2. In SQL Server Configuration Manager, in the left pane, click SQL Server.
  3. In the details pane, right-click the instance of SQL Server, and then click Properties.
  4. In the SQL Server (<instancename>) Properties dialog box, click the Advanced tab, and then click Startup Parameters.
  5. At the end of the original text, in the Value column, type the startup parameters you want, and then click OK. Separate parameters with a semi-colon, for example, –c;-m.
  6. Stop and restart SQL Server for the parameters to take effect.
    NoteNote
    SQL Server running on a cluster is best managed by using Cluster Administrator.
    For more information, see Using SQL Server Tools with Failover Clustering.

Rabu, 24 Oktober 2012

Benefits of SQL Server


Microsoft SQL Server is a robust and scalable solution all of your business application needs. Whether you're using it for data centralization for an Access front-end, a primary data source for your web-based software system, or for managing content on your business website; SQL Server is ideal for your application.

Learn more about the benefits of using SQL Server for your data storage needs:
  • Server Roles
  • Database Roles
  • Scalability
  • Centralized Data
  • Data Recovery
  • Security Benefits

Server Roles

Server roles provide levels of access to server operations and tasks.
Server roles are predefined and grant the permission for the whole server. These roles are not database specific and cannot be customized. The table below provides a listing and explanation for each type of server role.
sysadminAble to do anything in SQL Server
serveradminAble to modify SQL Server settings and shut down SQL Server
setupadminAble to install replication and control extended stored procedures
securityadminAble to control server logins and create database permissions
processadminAble to control SQL Server processes
dbcreatorAble to create and modify databases
diskadminAble to manage disk files
bulkadminAble to execute bulk insert statements

Database Roles

Database roles provide the assignment of a set of database-specific permissions to an individual or a group of users. Database roles can be assigned to Windows Authenticated logins or SQL Server Authenticated logins. Roles that are assigned to Windows Authenticated logins can be assigned to NT users and NT groups. Roles can also be built with a hierarchical model.
Database roles are database specific. SQL Server provides three types of roles:
  1. Predefined database roles
  2. User-defined database roles
  3. Implicit roles

Predefined Database Roles

Predefined database roles are standard SQL Server database roles. Each database in SQL Server has these roles. Predefined database roles make it easy to delegate responsibility.
Predefined database roles are database specific and cannot be customized. The table below provides a listing and explanation for each type of database role.
db_ownerHas complete access to all objects within the database, can drop and re-create objects, and has the capability to assign object permissions to other users. It can modify database settings and perform database maintenance tasks. This role encompasses all functionality listed in the other predefined database roles.
db_accessadminControls access to the database by adding or removing Windows Authentication users and SQL Server users.
db_datareaderHas complete access to SELECT data from any table in the database. This role does not grant INSERT, DELETE, or UPDATE permissions on any table in the database.
db_datawriterCan perform INSERT, DELETE, or UDPATE statements on any table in the database. This role does not grant SELECT permission on any table in the database.
db_ddladminHas the capability to create, modify, and drop objects in the database.
db_securityadminPerforms security management within the database. This role manages statement and object permissions and roles within the database.
db_backupoperatorHas the capability to back up the database.
db_denydatareaderDenies SELECT permission on all tables in the database. However, this role does allow users to modify existing table schemas. It does not allow them to create or drop existing tables.
db_denydatawriterDenies data modification statements (INSERT, DELETE, or UPDATE) from being performed against any tables in the database.
publicEvery database user is a member of the public role. A user automatically becomes part of the public role when she is permitted access to the database.

Scalability

Computer systems typically grow over time. Maybe you have a need for 30 concurrent users in your application now, but you have plans to grow and expand. SQL Server is built to be scalable and is able take your application from 30 users to thousands of users, quickly and easily.

Table size can be an issue with other database servers, but not SQL Server. The table size is only limited by the amount of space that you have on your physical server. This means that 90% of SQL Server users can store all of their data within the same instance.

This database server gives you so much flexibility that you can have multiple applications using the same SQL Server. Imagine your web applications and desktop applications using the same data, so that people in the office and employees remotely could simultaneously update data just as if they were sitting next to each other. When you choose SQL Server, you choose flexibility for your data-driven applications.

Centralized Data

Having a centralized place to store data is a great benefit to SQL Server users. Centralization is a primary SQL Server benefit and means that everyone is using the same data source. As a result there is no need to merge information together in order to receive an accurate version of a record.
With centralized data, every time that you retrieve a record, you will be confident that you have the latest information.

Data Recovery

One of the SQL Server benefits is the presence of built-in tools to manage data and database backups.  Imagine if you had a Microsoft Access database and it was deleted.  There is no good way to get your information back.  With SQL Server, you have the ability to schedule regular backups of the database.  These back-ups could be stored in a separate location for disaster recovery.
There are two kinds of backups available in Microsoft SQL Server:
  • Full Backup
  • Differential Backup

Full Backup

A full backup stores a complete copy of the database on any supported backup media.  You do not have to take the database offline to perform a full backup, however these backups consume a large amount of resources and users may notice that response times are a little slower.  Full backups can be scheduled for off peek times to minimize the impact that performance degradation would have on the end users.

Differential Backup

There are situations where you may not want to use as many resources as a full backup requires, but feel that it is important to have a backup at a certain time.  This is when you may consider a differential backup.  This backup uses less resources since it backs up information that has changed or been added since the last full backup.  **It is very important to note that a differential backup alone will not be enough to restore the database, since it only contains data that was modified since the last full backup.

Security Benefits

There is an old saying that states that knowledge is power.  The ability to securely store and retrieve data is one of the pillars of the IT world.  With SQL Server there are several security options to ensure that your data is stored safely.
SQL Server supports two security modes:
  1. Windows Authentication
    • Uses the credentials of the person logged into the Windows machine.
  2. SQL Server and Windows Authentication (Mixed Authentication)
    • Uses both Windows Authentication and SQL Server Authentication.
If you are using Mixed Authentication, please note that sa (which means system administrator) by default, does not contain passwords.  The sa username is very powerful and gives the user the right to manipulate all databases on SQL Server.  After the installation of SQL Server, please make sure that changing the sa password is at the top of your list of things to do.
If you are using Mixed Authentication it is good practice to create a username and password for applications that would restrict access to only the databases that the application uses.
There are two types of permissions in SQL Server:
  • Server Roles
  • Database Roles

Disadvantages of using MS Access, why choose MS SQL


While there are many advantages to choosing Microsoft Access as your database management system, there are also a few disadvantages that you might want to consider.

Concurrent Users

Even though Microsoft has stated that Access is able to support 255 concurrent users, it is a more practical choice to select Access when the database will be used by only 15 to 20 simultaneous users. If more than 20 users try to gain access to the database, performance and response time may begin to suffer.

Multiple Windows Operating Systems

Whenever various Windows operating systems like Windows 98, Windows ME, Windows 2000, Windows XP, or Windows Vista are used to access the same database, the database may experience corruption issues. It is recommended that all Access users are operating with the identical operating system.

Novice User – Design and Development Limitations

A user who is new to software development may find that they are a bit overwhelmed when using Microsoft Access. In order to maximize the potential of Access, a thorough understanding of relational database design along with VBA coding is highly recommended. As with any software development tool, the ease of screen use, database performance, automation of data input, and report quality relies on the skill level and experience of the developer.

Sensitive Data Needs

If you have sensitive data needs like storing health and medical records, financial information, academic records or employment data, for example, you will require more extensive database security than Access can offer. One suggestion would be to use SQL Server as the database engine with an Access front-end. We can help. Please contact us to discuss your needs. We have a solution that will work for you.

Internet Limitations

Microsoft Access does have a few limitations. Access was not designed to be used with applications that are web based since its forms and reports are only functional in a Windows environment and are not compatible with an internet browser like Internet Explorer. The Access Jet database that is used by Microsoft Access is a file based system and does not contain the server features that are available in SQL Server, Oracle or other ODBC compliant databases. Access databases are more suited for web based solutions when the number of users is small or when the users are just viewing data rather than editing or deleting.

Remote Access

In the past, record locking activities in Access caused an increase in response time which resulted in slow performance on a VPN (Virtual Private Network) or WAN (Wide Area Network) when accessing the Access Jet database. However, this problem has been eliminated as a result of quicker broadband connections. It is important to note that performance can also be enhanced if a continuous connection is maintained to the back end database throughout the session rather than opening and closing the sessions every time a table is requested. If you find that Access database performance over a VPN or WAN connection continues to suffer, then perhaps Microsoft Terminal Services is an effective alternative. We have found that Access databases linked to a SQL Server or Access Data Projects work quite well over VPN and WAN connection. Please contact us and let us know how we can help.

Access on a Mac

Unfortunately, Microsoft Access is not designed to operate on Apple Macintosh computers. Since Access is commonly used in many businesses that use PC’s, Microsoft does not have any future plans to expand the database to be functional on a Mac.

Reference:
http://www.sql-programmers.com/home.aspx

Selasa, 23 Oktober 2012

Export your tables into SQL Server Express


Export Microsoft Access (v2003) tables into
SQL Server Express Edition (v2005)

People want to migrate their tables from the measly MS Access to a much powerful SQL Server Express Edition.
  • Open Microsoft SQL Server Management Studio Express
  • Create a database for the import, if necessary
  • Open the Microsoft Access database that will be imported into SQL
  • Switch to “Tables” in Microsoft Access
  • Right click on the Table to be exported
  • Select “Export”
  • In the ‘Save Type As…’ field select “ODBC Databases()”
  • Name the Table and Press “OK”
  • Press the “New…” button
  • Select “SQL Server” and Press “Next>”
  • Enter a [name] for the Data Source then Press “Next>”
  • Press “Finish”
  • Enter a Description if necessary
  • Select the SQL Server to connect to and Press “Next>”
  • Select the correct Authentication settings and Press “Next>”
  • Check “Change the default database to:” and Select the correct Database then Press “Next>”
  • Press “Finish”
  • Press “Test Data Source…” to very the information entered was correct
  • If test completed successfully then Press “OK”
  • In “SQL Server Login” dialog box, enter correct authentication information and Press “OK”
  • Export should occur without any further dialog boxes or prompts
  • Switch back to the Microsoft SQL Server Management Studio Express
  • Navigate to the database, refresh the console and verify the table was created/imported
Sample Video of creating a table in MS SQL server


Data Warehousing, first step in Data Mining.

Data Warehousing is one of the important aspects in data mining, without a data warehouse, you will get lost with all your data, Imagine thousands of excel files that you will be merging without any primary key, this will really make you feel lost, Always structure your data in such a way that they can be linked together like a star schema or snow flake schema, make sure that your data lies in one platform, it can be in MS SQL Server, Oracle or a large scale database server.

Once all of these are in place, then that is the time you will be able to do data mining, up to the last minute detail. This type of process is also helpful in creating your Digital Dashboard.


Kamis, 18 Oktober 2012

Creating Metrics for your Digital Dashboard

by Albert Anthony D. Gavino



Most businesses think or Educational Institutions think its pretty easy to make a digital dashboard, when in fact its not, the first thing you may want to consider is your available data, now if your data is in MS SQL server or in one intact database and assuming most of its records are clean and quantitative in nature, which means they consist of dates, binary codes (1,0), income levels in one rate then they are good to go, However if your database is in rotten or dirty condition, doing a data mining is not a good idea.

Furthermore, you may want to consult with your Board of Directors what kind of metrics they want, is it a metric of INCOME, what is the target Income for this year, these things are important in establishing your dashboard, which in turn will become valuable reports real-time. Its the same scenario when your buying or selling a certain stock, the data is in real time.

for example:
Number of Actual Students for SY 2011-2012: 5,000 (is your actual data)Target Metric of Students for SY 2011-2012: 6,000 (is the metric set by your BOT)
Also consider the power of your hardware in your company or institution, this way, when users query reports on your dashboard it will be speedy.

Digital Dashboard/ Business Intelligence Software

  • Qlikview
  • Cognos Insight
  • Oracle Business Intelligence
  • Dundas
  • Microsoft

Online Applications:

One of the famous free apps you can find on the web for your dashboard solutions for a website is your friendly Google Analytics apps, Google analytics is free and tags your customer base globally and regionally, most of its stats are based on IP addresses and even be used for marketing such as Google Ad Words.

to get a google analytics account you will need a gmail account and just sign up to this URL, http://www.google.com/analytics/ Voila! you know have a digital dashboard for your web site.

to set your own metrics, Google Analytics allows you to set them on your goals and objectives.

Who uses Business Intelligence and Predictive Analytics Software?

  • Banks who approve house loans, credit loans
  • Credit card companies who approve credit card applications
  • Education institutions who would want to screen for scholarship applications



Using Crosstabs to validate or quality check data


One of the useful things in SPSS is to use Crosstabs to validate data, you can also put category and sub-category as what we do in pivot tables in excel, to do this just click on Analyze>Descriptive Statistics>CrossTabs (SPSS version 15.0)




You can put variables on row vs columns to compare the dataset(s) for quality assurance.

Categories and Sub-Categories, Solution to cleaning "string" and "text" fields

by Albert Anthony D. Gavino

Region or Geographic Location Categories and Sub-Categories


Database Administrators have problems cleaning "strings" and "texts", to prevent this from happening you could always resort to putting categories and sub-categories, for example think of a solution for ZIP-CODES, it can be broken further into COUNTRY, REGION, CITY.

In my case I live in the Philippines, so its good to have LUZVISMIN which are 3 categories namely Luzon, Visayas and Mindanao and then Region has 17 regions including NCR, and lastly the difficult part would be CITY such as Metro Manila, Cavite and Quezon City for example. This gives you a clear idea that your variable ZIP code needs to be connected to at least three other fields, so when creating your ZIP code table, include these 3 additional variables to make life easier, hence no need to clean your "STRINGS" or "TEXT FILES" for other variables that you can't categorize, make sure you have a category such as "OTHERS", rule of thumb of what I do is limit your "OTHERS" category to at least 10 percent. 

for Geographic Region
here, your category is structured as: 
Category Design: COUNTRY > LUZVISMIN > Region > City
ex. Philippines > Luzon > NCR > Manila

Geographic Tip: 
Add Country: ABROAD, to make way for Foreign Students/ Applicants

University or College Categories or Sub-Categories

the same thing can be used for a School or University say for example:
Category Design: University > College > Degree Program 

Ex. De La Salle University > College of Engineering > B.S. in Mechanical Engineering

Personal Tip:
Don't use text boxes in your web app forms, if you want your Database to be clean, instead USE Drop Down Boxes and Radio buttons, this also follows for Survey Forms. As much as possible do not use text fields, Use them only for qualitative purposes such as Comments and Feedback, but do not use them for categories as this will be a big problem to your database cleaning.

Philippine Regions

Luzon
  1. NCR
  2. CAR
  3. Ilocos
  4. Cagayan Valley
  5. Central Luzon
  6. CALABARZON
  7. MIMAROPA
  8. Bicol
Visayas
  1. Western
  2. Central
  3. Eastern
Mindanao 
  1. Zamboanga
  2. Northern Mindanao
  3. Davao
  4. SOCCSKSARGEN
  5. Caraga
  6. ARMM


One Platform, less headaches

by Albert Anthony D. Gavino

When doing process flow in data mining its best to stick to one platform or one database, if its MS-SQL Server 2008, MS-Access 2005 or any other platform just use one, hence you won't have problems connecting your tables and your queries will be much more simple.

Lets hope your not using MS-Excel or other low spreadsheet files since their rows and columns are limited, trust me excel file was not built for tables and small database files. Excel files are for financial statements and balance sheets and that's probably it.

TIP: DO NOT USE EXCEL FOR DATA MINING

In SPSS data modeler, they have tools where you can have queries straight from your SQL server, this makes life much more easier.

Rabu, 17 Oktober 2012

STAR schema and Snowflake Schema

Item
SNOWFLAKE SCHEMA
STAR SCHEMA
Ease of use
More complex queries and hence less easy to understand
Less complex queries and easy to understand
Query performance
More foreign keys-and hence more query execution time
Less no. of foreign keys and hence lesser query execution time
Normalization
Has normalized tables
Has De-normalized tables
Type of datawarehouse
Good to use for small datawarehouses/datamarts
Good for large datawarehouses
Joins
Higher number of Joins
Fewer Joins
Dimension table
It may have more than one dimension table for each dimension
Contains only single dimension table for each dimension
When to use
When dimension table is relatively big in size, snow-flaking is better as it reduces space.
When dimension table contains less number of rows, we can go for Star schema.
Ease of maintenance/change
No redundancy and hence more easy to maintain and change
Has redundant data and hence less easy to maintain/change

Unstructured tables and Foreign Keys

by Albert Anthony D. Gavino

To be able to data mine correctly, one must have organized tables and structures, usually designed by the Database Administrator (DBA).

One of my worst nightmares is dealing with Manual tables and files coming from excel that is, not only did this bring me a problem of efficiency, the design of its foreign key was also faulty, imagine a foreign key that resets every term and every school year, hence the problem of creating Schemas for the data structure.

let me lecture on Important aspects of tables

if your foreign key repeats itself and its status, create another foreign key that will not repeat, but this key should also be reflected in your other tables so that you will be able to connect them altogether.

say for example:

SCHOOLYEAR+TERM+STATUS+CASENO

  • 2011-2012-1-FR-1234 (freshman)
  • 2011-2012-1-TR-1234 (transferee)
  • 2011-2012-1-UG-1234 (2nd UG degree)

SQL Server 2008
  1. Primary Key Constraint: Primary Keys constraints prevents duplicate values for columns and provides unique identifier to each column, as well it creates clustered index on the columns.
  2. Foreign Key Constraint: When a FOREIGN KEY constraint is added to an existing column or columns in the table SQL Server, by default checks the existing data in the columns to ensure that all values, except NULL, exist in the column(s) of the referenced PRIMARY KEY or UNIQUE constraint.
  3. Default Constraint: Default constraint when created on some column will have the default data which is given in the constraint when no records or data is inserted in that column.


Rabu, 10 Oktober 2012

Data Mining: Missing Data and Imputation Solutions

Missing Data has been one of my constant problems especially when it comes to data mining, it will have a great effect on my data sets on its accuracy and the correct model used when using IBM data modeler, we have been discussing whether an imputation solution would be good to use, but my colleague still has his hesitations upon using that method. It is then that I move to my second option to generate more data sets to satisfy the appropriate number of records that we can use. Our director is very demanding on our outputs also.



so here are certain Imputations that one can use, ask first your colleagues or boss if they would allow it. IBM data modeler has also an option when your lacking in records just choose (Reduce or Boost) nodes in the model when using C & R Decision Tree.

I have always wanted to use default values to replace the missing values, like for age for example, if it ranged from 18 to 21, I would probably be putting a value of 17 for these fields so it would make my data more robust.


MULTIPLE IMPUTATION in IBM-SPSS Command Additional Features

The command syntax language also allows you to:
• Specify a subset of variables for which descriptive statistics are shown (IMPUTATIONSUMMARIES subcommand).
• Specify both an analysis of missing patterns and imputation in a single run of the procedure.
• Specify the maximum number of model parameters allowed when imputing any variable (MAXMODELPARAM keyword)


Download IBM DATA MODELER 12.0 

Selasa, 09 Oktober 2012

BayesiaLab, a new competitor of IBM-Data Modeler

I came across this add from my Facebook Account, a BayesiaLab seminar at singapore, from November 6 to 8, 2012. 


Bayesian Network Application


The course covers the basics of probabilistic graphical models and introduces BayesiaLab as the software platform for manually modeling and machine-learning Bayesian networks. Participants will learn how to generate Bayesian networks for a wide range of analytics tasks, including:
  • prediction/forecasting 
  • diagnostics
  • classification
  • clustering
  • missing values imputation
  • what-if scenario simulation
  • target optimization

Pricing Information

As a general reference point, prices for renting a commercial single-user license of BayesiaLab 5.0 start at approx. USD3,500/year (Standard Edition)

Trial Version:

Download BayesiaLab 5.0.7 Trial

A free 30-day evaluation version of the latest release of BayesiaLab 5.0.7 Professional Edition is available for immediate download from the Bayesia S.A.S. server:
Click here to register and download BayesiaLab (Windows, OS X, Linux/Unix, 32/64-bit)
This will allow you to experiment with new the features of BayesiaLab 5.0.7, plus you can try out all the Bayesian network examples explained in our series of white papers.


Personal Take:

at the rate of 41 pesos to 1 usd the software only costs 143,500 pesos compared to IBM data modeler which amounts to more than 450,000 or more per license for a Data Mining Software. However the software was built from S.A.S., a competitor of the SPSS line of products. I will have to say IBM will have to be more aggressive in marketing their Data Mining software and Business Intelligence Products.