Linear Models, Variable Selection, Artificial Intelligence
By Riyadh Alrawkan, Edward Boone, Ryad Ghanam, Anton Westveld
TLDR
This paper introduces an Artificial Neural Network (ANN) method for variable selection in linear models. In the authors' simulations it is reported to be more accurate than traditional methods such as LASSO, AIC, and BIC.
Key contributions
- Introduces an Artificial Neural Network (ANN) for variable selection in linear regression models.
- Trains an ANN to assess variable significance using OLS estimates.
- Simulations demonstrate improved accuracy over traditional methods like LASSO, AIC, and BIC.
- Offers a pretrained ANN (up to 100 predictors) and code for practical application.
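To make the second contribution concrete, here is a minimal sketch of how simulated training data for such a network might be generated. It is not the authors' code. The function name `simulate_ols_features`, the choice of features (OLS coefficient and its standard error), the `p_active` parameter, and the coefficient distribution are all assumptions made for illustration; the summary only says that the ANN works from OLS estimates.

```python
import numpy as np

def simulate_ols_features(n, p, sigma, rng, p_active=0.5):
    """Simulate one linear-model dataset; return OLS-based features and labels.

    Hypothetical setup: each coefficient is nonzero with probability p_active.
    """
    X = rng.normal(size=(n, p))
    active = rng.random(p) < p_active            # ground-truth inclusion labels
    beta = np.where(active, rng.normal(0, 2, size=p), 0.0)
    y = X @ beta + rng.normal(0, sigma, size=n)

    # OLS fit without an intercept (predictors and noise are simulated with mean zero)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)                 # unbiased residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

    # One row per predictor: (estimate, standard error); assumed feature choice
    features = np.column_stack([beta_hat, se])
    return features, active.astype(int)

rng = np.random.default_rng(0)
features, labels = simulate_ols_features(n=200, p=5, sigma=1.0, rng=rng)
```

Repeating this over many sample sizes and variances would give (features, label) pairs on which a classifier could be trained to predict whether a variable belongs in the model.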
Why it matters
Variable selection is central to building robust linear models, and stepwise, information-criterion, and penalized methods each have limitations. An ANN trained on OLS estimates offers an alternative that the authors' simulations suggest is more accurate, which could simplify model building in applied work.
Original Abstract
Variable selection in linear regression models has been a problem since hypothesis testing began. Which variables to include or exclude from a model is not an easy task. Techniques such as Forward, Backward, and Stepwise Regression sequentially add or delete variables from a model. Penalized likelihood methods such as AIC, BIC, etc. seek to choose variables that have a significant contribution to the likelihood. Penalized sum of squares methods such as LASSO and Elastic Net have been used to penalize small coefficients to only allow variables with large coefficients in the model. This work introduces an Artificial Intelligence approach to model selection where an ANN is trained to determine the significance of the variables based on OLS estimates. A simulation study shows the accuracy across various sample sizes and variances. Furthermore, a simulation study is conducted to compare the performance of the approach against Forward, Backward, AIC, BIC and LASSO. The approach is illustrated using a dataset from the World Health Organization regarding Life Expectancy. A GitHub link is provided to the pretrained ANN that can handle up to 100 predictor variables, the original WHO dataset and the subset used in this work.
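For readers unfamiliar with the penalized-likelihood baselines named in the abstract, the following is a small sketch of AIC- and BIC-based selection for a Gaussian linear model. It is an illustration under stated assumptions, not the comparison procedure used in the paper: the helper names `gaussian_ic` and `best_subset` are hypothetical, the criteria are computed up to an additive constant, no intercept is fitted, and the exhaustive search is only practical for a handful of predictors.

```python
from itertools import combinations
import numpy as np

def gaussian_ic(rss, n, k):
    """AIC and BIC for a Gaussian linear model, up to an additive constant."""
    fit = n * np.log(rss / n)
    return fit + 2 * k, fit + k * np.log(n)

def best_subset(X, y, criterion="bic"):
    """Exhaustively score every predictor subset; feasible only for small p."""
    n, p = X.shape
    best_score, best_cols = np.inf, ()
    for size in range(p + 1):
        for cols in combinations(range(p), size):
            if cols:
                Xs = X[:, list(cols)]
                beta_hat, *_ = np.linalg.lstsq(Xs, y, rcond=None)
                resid = y - Xs @ beta_hat
            else:
                resid = y                        # empty model: no predictors
            rss = float(resid @ resid)
            aic, bic = gaussian_ic(rss, n, len(cols))
            score = bic if criterion == "bic" else aic
            if score < best_score:
                best_score, best_cols = score, cols
    return best_cols

# Simulated example: predictors 0 and 2 carry signal, 1 and 3 are noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, 0.0, -2.0, 0.0]) + rng.normal(size=200)
selected = best_subset(X, y)
```

BIC's penalty of k ln(n) exceeds AIC's 2k once n is at least 8, so BIC tends to choose smaller models at realistic sample sizes.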