Authors: Jiaan Han, Junxiao Chen, Yanzhe Fu
Abstract: We introduce CatNet, an algorithm that effectively controls the False Discovery
Rate (FDR) and selects significant features in LSTM models using the Gaussian Mirror
(GM) method. To measure feature importance in LSTM models for time series, we
introduce a vector of the derivative of the SHapley Additive exPlanations
(SHAP). We also propose a new kernel-based
dependence measure to avoid multicollinearity in the GM algorithm, which makes
feature selection robust while keeping the FDR controlled. We use simulated data to evaluate
CatNet’s performance in both linear models and LSTM models with different link
functions. The algorithm effectively controls the FDR while maintaining high
statistical power in all cases. We also evaluate the algorithm’s performance in
different low-dimensional and high-dimensional cases, demonstrating its
robustness in various input dimensions. To evaluate CatNet’s performance in
real-world applications, we construct a multi-factor investment portfolio to
forecast the prices of S&P 500 index components. The results demonstrate that
our model achieves superior predictive accuracy compared to traditional LSTM
models without feature selection and FDR control. Additionally, CatNet
effectively captures common market-driving features, which supports informed
decision-making in financial markets by enhancing the interpretability of
predictions. Our study integrates the Gaussian Mirror algorithm with LSTM
models for the first time, and introduces SHAP values as a new feature
importance metric for FDR control methods, marking a significant advancement in
feature selection and error control for neural networks.
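
The abstract does not spell out the selection rule, so the following is only a minimal sketch of the thresholding step commonly used with mirror statistics, not the authors' implementation. It assumes that each feature j already has a mirror statistic M_j (in CatNet these would be derived from the SHAP-derivative importance measure) that is roughly symmetric around zero for null features and large and positive for relevant ones. The function names `mirror_threshold` and `select_features` and the target level `q` are hypothetical, chosen for illustration.

```python
import numpy as np

def mirror_threshold(mirror_stats, q=0.1):
    """Smallest t > 0 whose estimated false discovery proportion is <= q.

    The estimate used here is #{j: M_j <= -t} / #{j: M_j >= t}, which
    relies on the assumed symmetry of null mirror statistics.
    """
    m = np.asarray(mirror_stats, dtype=float)
    # Candidate thresholds are the nonzero magnitudes, in increasing order.
    candidates = np.sort(np.abs(m[m != 0]))
    for t in candidates:
        false_est = np.sum(m <= -t)   # proxy for the number of false discoveries
        selected = np.sum(m >= t)     # number of features selected at t
        if selected > 0 and false_est / selected <= q:
            return t
    return np.inf  # no threshold meets the target level: select nothing

def select_features(mirror_stats, q=0.1):
    """Indices of features whose mirror statistic reaches the threshold."""
    m = np.asarray(mirror_stats, dtype=float)
    return np.flatnonzero(m >= mirror_threshold(m, q))

# Illustrative, made-up mirror statistics for eight features.
stats = [5.0, 4.0, 3.0, -0.5, 0.4, -0.3, 0.2, 6.0]
print(select_features(stats, q=0.25))
```

Lowering `q` makes the rule stricter: the threshold rises until the count of large negative statistics is small relative to the count of large positive ones.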
Source: http://arxiv.org/abs/2411.16666v1