RADIX101 has been designed to ease the search for suitable packages for
specific data analysis tasks in R. Just as "radix" denotes the source or origin
of something, here RADIX refers to a systematic beginning of Robust Data
Analysis towards Extreme Statistical Computing in R.
Purpose of RADIX101:
• A systematic approach to any data analytics task using R.
• A comprehensive categorization of numerous CRAN packages by data-specific desired outcome.
• A platform covering a wide range of specialised statistical packages across the various data analytic stages.
RADIX101 categorises CRAN packages into three main stages:
1. Pre-Modelling Stage: Various analyses performed on the raw data before
suitable models are applied. This stage comprises:
• Data Visualisation – Different kinds of visualisation plots to understand the spread of the data.
• Data Statistics – Statistical exploration of the raw data.
• Data Transformation – Suitable transformations applied to the raw data prior to modelling.
• Outlier Detection – Analysis of outliers in the raw data.
• Feature Selection – Selection of the important independent/predictor variables prior to modelling.
• Dimension Reduction – Reduction of rows (input fields) and columns (independent variables) by combining or merging them with suitable algorithms.
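As a minimal sketch of the pre-modelling stage, the snippet below applies one simple technique per bullet using only base R and the built-in `mtcars` data set. The data set, the variables and the particular techniques (Tukey's 1.5 × IQR box-plot rule for outliers, a correlation filter for feature selection, principal component analysis via `prcomp()` for dimension reduction) are illustrative assumptions, not the specialised packages that RADIX101 catalogues.

```r
# Pre-modelling sketch: base R only, using the built-in mtcars data set (illustrative choice).
data(mtcars)

# Data visualisation: histogram of the response (drawn only in an interactive session)
if (interactive()) hist(mtcars$mpg, main = "Distribution of mpg", xlab = "mpg")

# Data statistics: min, quartiles, median and mean, plus the standard deviation
mpg_summary <- summary(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)

# Data transformation: log-transform a right-skewed predictor and standardise selected columns
mtcars$log_hp <- log(mtcars$hp)
scaled <- scale(mtcars[, c("mpg", "hp", "wt")])

# Outlier detection: values beyond 1.5 * IQR from the box-plot hinges (Tukey's rule)
hp_outliers <- boxplot.stats(mtcars$hp, coef = 1.5)$out

# Feature selection (simple filter): rank predictors by absolute correlation with mpg
predictors <- setdiff(names(mtcars), c("mpg", "log_hp"))
cor_with_mpg <- sapply(predictors, function(v) cor(mtcars[[v]], mtcars$mpg))
ranked <- sort(abs(cor_with_mpg), decreasing = TRUE)

# Dimension reduction: PCA on the standardised predictors, then the variance share per component
pca <- prcomp(mtcars[, predictors], scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(mpg_summary)
print(hp_outliers)
print(head(ranked, 3))
print(round(cumsum(var_explained), 3))
```

A filter based on correlation is the simplest form of feature selection; wrapper and embedded methods (and the dedicated CRAN packages for them) are usually preferred once the number of predictors grows.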
2. Modelling Stage: Various analytical models for studying the data in order
to uncover insights and predict outcomes.
• Continuous Regression – Basic and advanced models for regression on continuous data.
• Ordinal Regression – Basic and advanced models for regression on ordinal data.
• Classification – Basic and advanced models for classification predictions.
• Clustering – Basic and advanced algorithms for suitable clustering methods.
• Time Series – Time series analysis, including AR, MA, ARMA, ARIMA and SARIMA models.
• Survival – Survival analysis, including Cox proportional hazards models, Kaplan–Meier curves, risk prediction models etc.
• Association/Conjoint – To study and analyse transaction patterns.
• Probabilistic Choice – Analysis of choices using probabilistic/discrete choice theory.
• Fraud Analytics – To study frauds using Benford's law.
• Miscellaneous Models – Models for text mining, customer relationship management, boosting, bagging etc.
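A few of the modelling categories can be illustrated with functions from R's own `stats` package, as a minimal sketch under stated assumptions: the built-in `mtcars` and `AirPassengers` data sets, an ordinary least squares regression, a logistic regression, k-means clustering, the seasonal ARIMA(0,1,1)(0,1,1)[12] "airline model", and a chi-squared goodness-of-fit test against Benford's law. The helper `first_digit()` is hypothetical, and the "amounts" are synthetic (powers of 2, which are known to follow Benford's law), not real transaction data.

```r
# Modelling-stage sketch using base R (stats package) and built-in data sets.
data(mtcars)
data(AirPassengers)
set.seed(42)  # reproducible k-means starting centres

# Continuous regression: ordinary least squares of mpg on weight and horsepower
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)

# Classification: logistic regression of transmission type (am: 0 = automatic, 1 = manual) on weight
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(fit_glm, type = "response")

# Clustering: k-means with 3 centres on standardised variables, best of 25 random starts
km <- kmeans(scale(mtcars[, c("mpg", "hp", "wt")]), centers = 3, nstart = 25)

# Time series: SARIMA(0,1,1)(0,1,1)[12] on log passenger counts, then a 12-month forecast
fit_sarima <- arima(log(AirPassengers), order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))
forecast_12 <- exp(predict(fit_sarima, n.ahead = 12)$pred)

# Fraud analytics: compare leading digits with Benford's law, P(d) = log10(1 + 1/d)
# first_digit() is a hypothetical helper; it reads the digit from scientific notation
first_digit <- function(x) as.integer(substr(sprintf("%e", abs(x)), 1, 1))
benford_p <- log10(1 + 1 / (1:9))
amounts <- 2^(1:500)  # synthetic, illustrative "amounts"
observed <- table(factor(first_digit(amounts), levels = 1:9))
benford_test <- chisq.test(as.vector(observed), p = benford_p)

print(coef(fit_lm))
print(table(km$cluster))
print(round(forecast_12))
print(benford_test$p.value)
```

A large p-value means the observed first-digit frequencies are consistent with Benford's law; in a fraud audit, a very small p-value would flag the data for closer inspection rather than prove fraud.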
3. Post-Modelling Stage: Various evaluation packages for validating models,
along with ROC analysis.
• General Model Validation – To study model accuracy for model comparison and diagnostics.
• Regression Validation – Validation of regression models.
• Classification Validation – Validation of classification models.
• Clustering Validation – Validation of clustering models.
• ROC Analysis – Receiver Operating Characteristic (ROC) curves, which give a complete empirical description of a classifier's trade-off between true positive and false positive rates across all decision thresholds.
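The validation bullets can be sketched in base R as well, assuming an OLS model of `mpg` and a logistic model of transmission type (`am`) fitted to the built-in `mtcars` data. The 22/10 hold-out split, the 0.5 cut-off and the helper `roc_points()` are hypothetical choices for illustration; dedicated CRAN packages such as ROCR and pROC offer fuller ROC tooling. The AUC is computed with the trapezoidal rule over the ROC points.

```r
# Post-modelling sketch: hold-out validation and ROC analysis in base R.
data(mtcars)
set.seed(1)
train_idx <- sample(nrow(mtcars), 22)  # about a 70/30 split (illustrative)
train_set <- mtcars[train_idx, ]
holdout   <- mtcars[-train_idx, ]

# Regression validation: root-mean-squared error on the held-out rows
fit_lm <- lm(mpg ~ wt + hp, data = train_set)
rmse <- sqrt(mean((holdout$mpg - predict(fit_lm, newdata = holdout))^2))

# Classification validation: confusion matrix at a 0.5 cut-off (in-sample, for brevity)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
score <- predict(fit_glm, type = "response")
confusion <- table(actual = mtcars$am, predicted = as.integer(score >= 0.5))

# Clustering validation: share of total variance captured between 3 k-means clusters
km <- kmeans(scale(mtcars[, c("mpg", "hp", "wt")]), centers = 3, nstart = 25)
between_ratio <- km$betweenss / km$totss

# ROC analysis: false/true positive rates at every distinct score threshold
roc_points <- function(score, label) {
  thresholds <- sort(unique(c(Inf, score)), decreasing = TRUE)
  tpr <- sapply(thresholds, function(th) mean(score[label == 1] >= th))
  fpr <- sapply(thresholds, function(th) mean(score[label == 0] >= th))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}
roc <- roc_points(score, mtcars$am)

# Area under the ROC curve by the trapezoidal rule
auc <- sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)

print(round(rmse, 2))
print(confusion)
print(round(between_ratio, 3))
print(round(auc, 3))
```

Because tied scores produce diagonal steps on the curve, this trapezoidal AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half.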
P.S. R is an open source programming language used for statistical computing,
data analysis and graphical visualisation. Numerous packages have been created
which allow specialised statistical techniques to be performed on various data
sets with different business objectives. There are more than 5,800 additional
packages (a core set of packages is already pre-installed) and 120,000
functions (as of June 2014) available at the Comprehensive R Archive Network
(CRAN). These packages provide a wide range of statistical techniques such as
regression analysis, classification analysis, clustering analysis etc., using a
variety of models such as neural networks, Bayesian techniques, nearest
neighbour etc., with some specialised in certain domains like healthcare,
bio-sciences, genetic studies etc. Graphical user interfaces for R such as
Rattle, Deducer, RStudio and RGui are also widely used, each designed for
specific purposes.
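The distinction between the pre-installed core set and the additional CRAN packages can be checked from within R itself. The sketch below is a minimal illustration: it counts the "base" and "recommended" packages in the local installation and the objects exported by the `stats` package. The helper `count_cran_packages()` is hypothetical; it needs an internet connection, so it is only defined here, and the 2014 figures above will differ from whatever it returns today.

```r
# Packages shipped with R itself ("base") and with standard builds ("recommended")
base_pkgs <- rownames(installed.packages(priority = "base"))
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))

# Number of exported objects (mostly functions) in one attached core package
n_stats_objects <- length(ls("package:stats"))

# Hypothetical helper: current number of packages on a CRAN mirror (requires internet access)
count_cran_packages <- function(repo = "https://cloud.r-project.org") {
  nrow(available.packages(repos = repo))
}

cat("Base packages:", length(base_pkgs), "\n")
cat("Recommended packages:", length(recommended_pkgs), "\n")
cat("Objects exported by stats:", n_stats_objects, "\n")
# count_cran_packages()        # run interactively when online
# install.packages("rattle")   # add a CRAN package, e.g. the Rattle GUI mentioned above
```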