Monday 21 July 2014

COMPREHENSIVE VIEW ON CRAN PACKAGES - Robust Analysis of Data In eXtreme (RADIX)





RADIX101 is designed to make it easier to find suitable packages for a specific data analysis in R. Just as the word "radix" denotes the source or origin of something, here it stands for a systematic starting point for Robust Data Analysis, leading towards Extreme Statistical Computing in R.



Purpose of RADIX101:
•  A systematic approach to any data analytics task in R.
•  A comprehensive categorisation of the numerous CRAN packages by the data-specific outcome desired.
•  A platform covering a wide range of specialised statistical packages, organised by data analytic stage.

  

RADIX101 categorises CRAN packages into 3 main stages:-
1. Pre-Modelling Stage:- Analyses performed on raw data before suitable models are applied. This stage comprises:-
   Data Visualisation - Different kinds of plots for understanding the spread of the data.
   Data Statistics - Statistical exploration of raw data.
   Data Transformation - Suitable transformations of raw data prior to modelling.
   Outlier Detection - Analysis of outliers in the raw data.
   Feature Selection - Selection of the important independent/predictor variables before a model is chosen.
   Dimension Reduction - Reduction of Rows (Input Fields) and Columns (Independent Variables) by combining or merging them using suitable algorithms.
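
The pre-modelling steps listed above can be sketched in a few lines of base R. This is only an illustrative sketch: it assumes the built-in `mtcars` data set and functions from the pre-installed `stats` and `grDevices` packages rather than any of the specialised CRAN packages RADIX101 catalogues, and the correlation filter used for feature selection is just one simple choice among many.

```r
# Pre-modelling sketch on the built-in mtcars data (base R only).
data(mtcars)

# Data statistics: min, quartiles, median, mean and max of one variable.
hp_summary <- summary(mtcars$hp)

# Data transformation: log transform to reduce right skew.
log_hp <- log(mtcars$hp)

# Outlier detection: values lying beyond 1.5 * IQR from the box hinges.
hp_outliers <- boxplot.stats(mtcars$hp)$out

# Feature selection (simple filter): rank predictors by their
# absolute correlation with the response variable mpg.
predictors <- setdiff(names(mtcars), "mpg")
abs_cor <- sapply(predictors, function(v) abs(cor(mtcars[[v]], mtcars$mpg)))
ranked <- sort(abs_cor, decreasing = TRUE)

# Dimension reduction: principal components of the standardised predictors.
pca <- prcomp(mtcars[, predictors], center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(hp_summary)
print(hp_outliers)
print(head(ranked, 3))
print(round(var_explained[1:2], 3))
```

For data visualisation, calls such as `hist(mtcars$hp)` or `boxplot(mtcars$hp)` would show the same spread graphically.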

2. Modelling Stage:- Analytical models used to study the data, draw insights from it and predict outcomes.
   Continuous Regression - Basic and advanced models for regression on continuous data.
   Ordinal Regression - Basic and advanced models for regression on ordinal data.
   Classification - Basic and advanced models for classification predictions.
   Clustering - Basic and advanced algorithms for suitable clustering methods.
   Time Series - Time series analysis, including AR, MA, ARMA, ARIMA and SARIMA models.
   Survival - Survival analysis, including Cox proportional hazards models, Kaplan-Meier curves, risk prediction models etc.
   Association/Conjoint - To study and analyse transaction patterns.
   Probabilistic Choice - Analysis of choices using probabilistic/discrete choice theory.
   Fraud Analytics - To study fraud using Benford's law.
   Miscellaneous Models - Models for text mining, customer relationship management, boosting, bagging etc.
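
A few of the modelling categories above can be illustrated with base R alone. The sketch below is an assumption-laden example rather than a recommendation: it uses the built-in `mtcars` and `AirPassengers` data sets, picks arbitrary predictors and an arbitrary number of clusters, and relies only on functions from the pre-installed `stats` package instead of the specialised CRAN packages.

```r
# Modelling sketch using only the pre-installed stats package.

# Continuous regression: fuel economy as a linear function of weight and horsepower.
reg_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Classification: logistic regression for transmission type (am is coded 0/1).
clf_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Clustering: k-means with 3 clusters on standardised variables.
set.seed(1)  # k-means starts from random centres
km_fit <- kmeans(scale(mtcars[, c("mpg", "wt", "hp")]), centers = 3, nstart = 10)

# Time series: a seasonal ARIMA (SARIMA) model on log monthly passenger counts,
# followed by a 12-month-ahead forecast on the log scale.
ts_fit <- arima(log(AirPassengers), order = c(0, 1, 1),
                seasonal = list(order = c(0, 1, 1), period = 12))
ts_forecast <- predict(ts_fit, n.ahead = 12)$pred

# Fraud analytics: expected leading-digit frequencies under Benford's law,
# P(d) = log10(1 + 1/d) for d = 1, ..., 9.
benford_expected <- log10(1 + 1 / (1:9))

print(coef(reg_fit))
print(table(km_fit$cluster))
print(round(benford_expected, 3))
```

Observed leading-digit frequencies in, say, a ledger of transaction amounts could then be compared against `benford_expected`, for instance with `chisq.test`.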


3. Post-Modelling Stage:- Evaluation packages for validating models, along with ROC analysis.
   General Model Validation - To study model accuracy for model comparisons and diagnostics.
   Regression Validation - Validation of regression models.
   Classification Validation - Validation of classification models.
   Clustering Validation - Validation of clustering models.
   ROC Analysis - Receiver Operating Characteristic (ROC) curves for an empirical description of classifier performance across decision thresholds.
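
As a rough illustration of ROC analysis, the sketch below computes ROC points and the area under the curve (AUC) by hand in base R. It assumes the built-in `mtcars` data and a simple logistic regression as the classifier being evaluated, and it scores the same data the model was fitted on, which is acceptable for a demonstration but too optimistic for real validation. Dedicated CRAN packages provide more complete tooling.

```r
# ROC sketch in base R: evaluate a logistic regression classifier.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
scores <- predict(fit, type = "response")  # predicted probability of am == 1
labels <- mtcars$am

# ROC points: sweep the decision threshold from the highest score to the lowest.
thresholds <- sort(unique(scores), decreasing = TRUE)
tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / sum(labels == 1))
fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / sum(labels == 0))

# AUC via the rank (Mann-Whitney) formulation; tied scores get average ranks.
n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)
auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(round(auc, 3))
```

Plotting `fpr` against `tpr` with `plot(fpr, tpr, type = "s")` draws the curve itself.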



P.S. 
 R is an open-source programming language and software environment used for statistical computing, data analysis and graphical visualisation. Numerous packages have been created that allow specialised statistical techniques to be applied to various data sets with different business objectives. More than 5,800 additional packages (a core set of packages comes pre-installed) and 120,000 functions (as of June 2014) are available on the Comprehensive R Archive Network (CRAN). These packages provide a wide range of statistical techniques, such as regression, classification and clustering analysis, using a variety of models such as neural networks, Bayesian techniques and nearest neighbour methods, and some are specialised for particular domains such as healthcare, biosciences and genetic studies. Graphical user interfaces for R such as Rattle, Deducer, RStudio and RGui are also widely used, each designed with a specific purpose in mind.








2 comments:

  1. RADIX is a great effort ! Summarizes much-needed info every 'R'-bie looks for ...

    1. @Mayank:- Thanks a lot for your encouragement on blogging.
