Wednesday, 7 May 2014

SUPPORT VECTOR MACHINE (SVM) - Detailed Example on Classification in R



R Code - Bank Subscription Marketing - Classification {SUPPORT VECTOR MACHINE}

 R Code  for Support Vector Machine (SVM)
Data Set:- Bank Marketing https://www.blogger.com/blogger.g?blogID=4359514443959080595#editor/target=post;postID=5426750358330739357
Source of Data Set:-  UCI Repository (http://archive.ics.uci.edu/ml/datasets/Bank+Marketing)
The Code includes the following:-
1. Data Exploration - Missing Values, Outliers
2. Data Visualisation
3.  Correlation Matrix
4. Data Partitioning
5. Package used "e1071"
6. Model Tuning and Plots
7. Confusion Matrix


## The data has been imported using Import Dataset option in R Environment
## The data set can be obtained from http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
## DATASET UNDERSTANDING
head(bank_full) ## Displays first 6 rows for each variable
str(bank_full) ## Describes each variables
summary(bank_full) ## Provides basic statistical information of each variable

## DATA EXPLORATION - Check for Missing Data
## Option 1
is.na(bank_full) ## Displays True for a missing value
## Since it is a large dataset, graphical display of missing values will prove to be easier
##Option 2
require(Amelia)
missmap(bank_full,main="Missing Data - Bank Subscription", col=c("red","grey"),legend=FALSE)

## No red colour stripes are visible. hence no missing values.
##Option 3
summary(bank_full) ## displays missing values if any under every variable

## DATA VISUALISATION
## Use Box plots (Only for continuous variables)- To Check Ouliers
boxplot(bank_full$age~bank_full$subscribed, main=" AGE",ylab="age of customers",xlab="Subscribed")
boxplot(bank_full$balance~bank_full$subscribed, main=" BALANCE",ylab="Balance of customers",xlab="Subscribed")
boxplot(bank_full$lastday~bank_full$subscribed, main=" LAST DAY",ylab="Last day of contact",xlab="Subscribed")
boxplot(bank_full$lastduration~bank_full$subscribed, main="LAST DURATION",ylab="Last duration of contact",xlab="Subscribed")
boxplot(bank_full$numcontacts~bank_full$subscribed, main="NUM CONTACTS",ylab="number of contacts",xlab="Subscribed")
boxplot(bank_full$pdays~bank_full$subscribed, main=" Previous DAYS",ylab="Previous days of contact",xlab="Subscribed")
boxplot(bank_full$pcontacts~bank_full$subscribed, main=" Previous Contacts",ylab="Previous Contacts with customers",xlab="Subscribed")

## Though some outliers are observed in Previous contacts, NumContacts and LastDuration, they have not bee removed keeping their significance into consideration
## Use Histograms (For both continuous and categorical variables)
## These histograms provide details abpout Skewness, Normal Distribution etc
## Function to create histograms for continuous variables with normal curve
bank_Conthist<-function(VarName,NumBreaks,xlab,main,lengthxfit) ## xlab and main should be mentioned under quotes as they are characters
{
  hist(VarName,breaks=NumBreaks,col="yellow",xlab=xlab,main=main)
  xfit<-seq(min(VarName),max(VarName),length=lengthxfit)
  yfit<-dnorm(xfit,mean=mean(VarName),sd=sd(VarName))
  yfit<-yfit*diff(h$mids[1:2])*length(VarName)
  lines(xfit,yfit,col="red",lwd=3)
}
bank_Conthist(bank_full$age,10,"age of customers","AGE",30)
bank_Conthist(bank_full$balance,50,"Balance of customers","Balance",100)

## Balance is more skewed towards to Negative or Zero
bank_Conthist(bank_full$lastday,5,"Last Day of contact","LAst Day",10)
bank_Conthist(bank_full$lastduration,100,"LastDuration of COntact","Last Duration",10)

## Last Duration is more skewed towards 0 to 100 secs.
bank_Conthist(bank_full$numcontacts,30,"Number of Contacts","NUmContacts",20)
## NUmContacts are more skewed towards 1
bank_Conthist(bank_full$pdays,30,"Previous Days of contacts","PDays",20)
## Many were not contacted previously
bank_Conthist(bank_full$pcontacts,20,"Previous Contacts","PContacts",10)
## Since many were not contacted previously, therefore Pcontacts is 0

## Barplots for Categorical Variables
barplot(table(bank_full$job),col="red",main="JOB")
barplot(table(bank_full$marital),col="green",main="Marital")
barplot(table(bank_full$education),col="red",main="Education")
barplot(table(bank_full$creditdefault),col="red",main="Credit Default")

## Since Credit Default is highly skewed towards NO, this shall be removed from further analysis
bank_full[5]<-NULL
str(bank_full)
barplot(table(bank_full$housingloan),col="red",main="Housing Loan")
barplot(table(bank_full$personalloan),col="blue",main="Personal Loan")
barplot(table(bank_full$lastcommtype),col="red",main="Last communication type")
barplot(table(bank_full$lastmonth),col="violet",main="Last Month")
barplot(table(bank_full$poutcome),col="magenta",main="Previous Outcome")

## Correlation Matrix among input (or independent) continuous variables
bank_full.cont<-data.frame(bank_full$age,bank_full$balance,bank_full$lastday,bank_full$lastduration,bank_full$numcontacts,bank_full$pdays,bank_full$pcontacts)
str(bank_full.cont)
cor(bank_full.cont)

## It can be observed that No two variables are highly correlated


 ## Partitioning Data into Train and Test datasets in 70:30
library(caret)
set.seed(1234567)
train2<-createDataPartition(bank_full$subscribed,p=0.7,list=FALSE)
train<-bank_full[train2,]
test<-bank_full[-train2,]
write.table(bank_full,"bank1.csv",sep=",")
str(train)
str(test)



## CLASSIFICATION USING SUPPORT VECTOR MACHINE(SVM) Model
 


install.packages("e1071")
library(e1071)

## TUNING to perform cross-validation to know about optimum cost parameter for SVM modelling. Tune() performs only ten-fold cross validation here.
bank_full.tune<-tune(svm,subscribed~.,data=bank_full,kernel="linear",ranges=list(cost=c(0.01,0.1,1,5,10,100)))
## Diffferent kernels such as Polynomial, Radial Basis and Sigmoid can also be used for tuning.
summary(bank_full.tune)
bestmodal.tune<-bank_full.tune$best.model
summary(bestmodal.tune)

##We get cost=0.01 as optimum one with kernel = linear.
##In the following example, Polynomial function is being used
train.svm<-svm(subscribed~.,train,kernel="polynomial",cost=0.01,scale=TRUE,degree=3,gamma=1)
summary(train.svm)

plot(train.svm,train)  ## Not possible for large data-sets
## Testing data using the trained model
test.svm<-predict(train.svm,test)
## Misclassification Table
table(predict=test.svm,truth=test$subscribed)

2 comments:

  1. 1) There is no variable called subscribed in bank-fill data
    2) In Function to create histograms for continuous variables with normal curve there you have not defined h ?

    ReplyDelete
  2. telasmosquiteira-sp.com.br

    telas mosquiteiras sp
    telas mosquiteira sp

    As telas mosquiteiras sp , telas mosquiteiro sp garantem ar puro por toda casa livrando-a completamente dos mosquitos e insetos indesejáveis. As telas mosquiteira garantem um sono tranquilo a toda família, livrando e protegendo-nas dos mais diversos insetos. Muitos destes insetos são transmissores de doenças e a tela mosquiteira é indispensável no combate a mosquitos transmissores de doenças.
    s
    A dengue, por exemplo, já matou centenas de pessoas só na capital de São Paulo e um pequeno investimento em nossas telas mosquiteiras podem salvar vidas. As telas mosquiteiras também impedem a entrada de insetos peçonhentos como as aranhas e os escorpiões, estes insetos também oferecem risco, pois seu veneno em poucos minutos podem levar uma criança a morte.
    telas mosquiteira jundiai
    telas mosquiteiro jundiai
    telas mosquiteira São Paulo
    telas mosquiteiro São Paulo
    telas mosquiteiras sp
    telas mosquiteiras Jundiai
    telas mosquiteira sp
    telas mosquiteiro Jundiai
    telas mosquiteira sao paulo
    telas mosquiteiro sao paulo

    A chegada da temporada Primavera/Verão traz consigo a elevação da temperatura e a maior ocorrência de chuvas. Mas não é só isso. As estações mais quentes do ano causam muita dor de cabeça e muitos zumbidos indesejáveis em função das pragas urbanas – pernilongos, baratas, cupins e outros insetos -, que afetam todas as regiões brasileiras.

    Nossa missão é oferecer telas mosquiteiras de qualidade a um preço acessível, fazendo com que as telas mosquiteiras sejam uma opção viável para muitas pessoas.

    telas mosquiteiras Jundiaí
    telas mosquiteiro Jundiai
    telas mosquiteiras jundiai
    telas mosquiteiro industria
    telas mosquiteira restaurante
    telas mosquiteiro restaurante
    telas mosquiteira empresa
    telas mosquiteiro empresa

    ReplyDelete