Monday 12 May 2014

R mapped with Microsoft SQL in Detail with an Example



                     Mapping of R Code with Microsoft SQL with an Example

INPUT SOFTWARE VERSIONS

1. R Programming :- Version 3.0.3 (64 bit)
2. Microsoft SQL :- SQL Server 2014 (CTP2)
3. Microsoft Windows 7 (Professional)
SQL server can be connected using different platforms such as ODBC and JDBC.

ODBC stands for Open Database Connectivity which has an independent interface platform.
JDBC stands for Jave Database Connectivity in which Java application is used to access databases.

Currently, we will be using ODBC.

Details of Data set:-
1. Name:-  Bank Authentication
2. Source:- UCI repository   https://archive.ics.uci.edu/ml/datasets/banknote+authentication
3. Data set Characteristics:- Multivariate
4. Task Performed:- Classification
5. Number of Independent Variables or Attributed:- 5
6. Number of Instances:- 1072
  
Prior connecting SQL to R, few basic steps need to covered.
(Screenshots are provided for better coherence)

                                                  1. Create a Data Source using ODBC

a) Open Control Panel  >   System and Security  >  Administrative Tools
 Double click Data Sources (ODBC)



b) In User DSN tab, click ADD


c) Select ODBC Driver 11 for SQL server and Click Finish



d) Select the Server you want to connect to and provide a Name to the data source with a short description. Click Next.




e) Select the option for SQL server to verify the Authentication of Login ID. Here, I have used the option of "With Integrated Windows Authentication". Click Next.





f) Select the Default Database where your data is stored using the dropdown list under Change the default database to:. You can also change the Application intent to Readwrite or Readonly mode. Here I have selected ReadWrite mode. Click Next and Finish.





 


You can Test the Data Source and Click OK.
 Finally a new Data Source is created.



                                                  2. Install the Package RODBC in R



a) The following code installs the package RODBC in R
> install.packages("RODBC")
> library(RODBC)


b) Open the ODBC Connection in R
> odbcBank<-odbcConnect("Bank_Authentication")

c) We should close the ODBC connection after running the requisite operations like queries etc. on the Open ODBC connection.
>odbcClose(odbcBank)

d) List of All ODBC Data Sources in your system

> odbcDataSources(type=c("all","user","system"))


e) List of all SQL Type Informations
>getSqlTypeInfo()
>getSqlTypeInfo("Microsoft SQL Server")

f)Reading Tables from ODBC Databases
>odbcBank<-odbcConnect("Bank_Authentication")
>sqlFetch(odbcBank,"dbo.bank_authentication")  ## Read Complete Tables
>sqlFetch(odbcBank,"dbo.bank_authentication",max=100)   ## First 100 rows
>sqlFetchMore(odbcBank,max=100)     ## Shows the next 100 rows
>sqlFetchMore(odbcBank)                  ## Fetches all the remaining rows
>odbcClose(odbcBank)

g) Enquire about column structures in ODBC tables
>sqlColumns(odbcBank,"dbo.bank_authentication")
## To Know about unique columns which can be assigned as primary key in the database.
## Eg. Customer ID, Serial NO., Transcation ID etc.
>sqlColumns(odbcBank,"dbo.bank_authentication",special=TRUE) 

h) To know about the database table catalog, schema, name and type.
>sqlTables(odbcBank)

i) Add a new table to the dataframe in SQL.
## In this scenario, lets add a Serial No. table in the odbcBank database
>odbcBank<-odbcConnect("Bank_Authentication")
> SerialNo<-data.frame(1:1072)
> sqlSave(odbcBank, SerialNo, tablename="SerialNo", rownames=F)  
## Hence, A New Table named SerialNo is added to the list of tables under bank_authentication dataframe


EXAMPLE


## Get the data from Microsoft SQL Server
install.packages("RODBC")
library(RODBC)
bank<-odbcConnect("Bank_Authentication")


## Reading Tables from SQL

bank<-sqlFetch(bank,"dbo.bank_authentication")
str(bank)
mode(bank)
bank1<-bank[,-1]


## Apply logistic regression for Classification

library(caret)
x<-as.matrix(bank1[,1:4])
y<-as.matrix(bank1[,5])
mode(x)
mode(y)
bank_class_knn<-train(bank1[,5]~.,bank1,method="knn")
predict_class<-predict(bank_class_knn)
str(predict_class)


##Confusion Matrix
xtab<-table(predict_class,y)
xtab


## Insert the prediction as a new table in SQL
bank<-odbcConnect("Bank_Authentication")
a<-as.data.frame(predict_class)
sqlSave(bank,a,tablename="predictclass",rownames=F)