ArXiv TLDR

Modeling Sparse and Bursty Vulnerability Sightings: Forecasting Under Data Constraints

2604.16038

Cedric Bonhomme, Alexandre Dulaunoy

cs.CR

TLDR

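This paper asks whether sparse, bursty vulnerability sightings can be forecast over time. It compares SARIMAX with count-based models such as Poisson regression, and early results favor the count-based models for more stable, interpretable forecasts in threat intelligence.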

Key contributions

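  • Investigates whether vulnerability sightings (proof-of-concept releases, detection templates, online discussions) can be forecast over short horizons.
  • Evaluates SARIMAX models, with and without log(x+1) transformations and VLAI-derived severity scores as exogenous inputs, and finds them poorly suited to sparse, short, bursty data (overly wide confidence intervals, occasional negative forecasts).
  • Explores count-based models such as Poisson regression, which in early results give more stable and interpretable forecasts, especially on weekly aggregates.
  • Discusses simpler operational alternatives, such as exponential decay functions, for short horizons when long historical series are unavailable.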

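The summary does not say which regressors the authors' Poisson model uses. As a rough illustration only, the sketch below aggregates hypothetical sighting dates into weekly counts and fits a Poisson regression with a log link and a single linear time trend by Newton-Raphson. The small ridge penalty is an assumption added here so that very sparse or single-burst series do not push the coefficients toward infinity; all function names and parameters are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def weekly_counts(sighting_dates, start, n_weeks):
    """Bucket sighting dates into n_weeks weekly counts; week 0 starts at `start`."""
    buckets = Counter((d - start).days // 7 for d in sighting_dates)
    return [buckets.get(w, 0) for w in range(n_weeks)]


def fit_poisson_trend(counts, ridge=0.1, iters=50):
    """Fit log(lambda_t) = b0 + b1 * (t - t_mean) by penalized Newton-Raphson.

    The ridge term (an assumption of this sketch) keeps the coefficients finite
    on sparse or bursty series where the plain maximum-likelihood fit diverges.
    """
    n = len(counts)
    t_mean = (n - 1) / 2
    xs = [t - t_mean for t in range(n)]  # centered week index
    b0 = math.log(sum(counts) / n + 1e-3)  # start at the (smoothed) mean rate
    b1 = 0.0

    def objective(c0, c1):
        # Poisson log-likelihood without the constant log(y!) term, minus ridge penalty.
        ll = sum(y * (c0 + c1 * x) - math.exp(c0 + c1 * x) for x, y in zip(xs, counts))
        return ll - 0.5 * ridge * (c0 * c0 + c1 * c1)

    for _ in range(iters):
        lam = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the penalized log-likelihood.
        g0 = sum(y - l for y, l in zip(counts, lam)) - ridge * b0
        g1 = sum((y - l) * x for x, y, l in zip(xs, counts, lam)) - ridge * b1
        # Negative Hessian: X^T W X + ridge * I, with W = diag(lambda).
        h00 = sum(lam) + ridge
        h01 = sum(l * x for x, l in zip(xs, lam))
        h11 = sum(l * x * x for x, l in zip(xs, lam)) + ridge
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Step halving so that no update decreases the penalized objective.
        step, current = 1.0, objective(b0, b1)
        while step > 1e-8 and objective(b0 + step * d0, b1 + step * d1) < current:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < 1e-10:
            break
    return b0, b1, t_mean


def forecast_poisson(b0, b1, t_mean, n_obs, horizon):
    """Expected weekly counts for the `horizon` weeks after the observed series."""
    return [math.exp(b0 + b1 * (t - t_mean)) for t in range(n_obs, n_obs + horizon)]


# Hypothetical sightings for one vulnerability, observed from Monday 2025-01-06.
start = date(2025, 1, 6)
sightings = [date(2025, 1, 6), date(2025, 1, 7), date(2025, 1, 8),
             date(2025, 1, 14), date(2025, 1, 15), date(2025, 1, 22), date(2025, 2, 3)]
counts = weekly_counts(sightings, start, n_weeks=6)
print(counts)  # [3, 2, 1, 0, 1, 0]
b0, b1, t_mean = fit_poisson_trend(counts)
print([round(v, 2) for v in forecast_poisson(b0, b1, t_mean, len(counts), horizon=2)])
```

Because the model predicts exp(b0 + b1·t), its forecasts can never be negative, which is one reason count models avoid the negative values the abstract reports for SARIMAX. Adding further covariates, such as a severity score, would mean extending the design beyond this two-parameter sketch.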
Why it matters

Anticipating vulnerability activity is central to cyber threat intelligence. The paper evaluates forecasting methods on sparse, bursty sighting data and offers practical guidance for integrating predictive analytics into vulnerability intelligence workflows, with the aim of supporting more proactive defense.

Original Abstract

Understanding and anticipating vulnerability-related activity is a major challenge in cyber threat intelligence. This work investigates whether vulnerability sightings, such as proof-of-concept releases, detection templates, or online discussions, can be forecast over time. Building on our earlier work on VLAI, a transformer-based model that predicts vulnerability severity from textual descriptions, we examine whether severity scores can improve time-series forecasting as exogenous variables. We evaluate several approaches for short-term forecasting of sightings per vulnerability. First, we test SARIMAX models with and without log(x+1) transformations and VLAI-derived severity inputs. Although these adjustments provide limited improvements, SARIMAX remains poorly suited to sparse, short, and bursty vulnerability data. In practice, forecasts often produce overly wide confidence intervals and sometimes unrealistic negative values. To better capture the discrete and event-driven nature of sightings, we then explore count-based methods such as Poisson regression. Early results show that these models produce more stable and interpretable forecasts, especially when sightings are aggregated weekly. We also discuss simpler operational alternatives, including exponential decay functions for short forecasting horizons, to estimate future activity without requiring long historical series. Overall, this study highlights both the potential and the limitations of forecasting rare and bursty cyber events, and provides practical guidance for integrating predictive analytics into vulnerability intelligence workflows.
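The abstract mentions exponential decay functions as a lightweight option for short horizons but gives no formula. One plausible form, assumed here, takes the most recent weekly count as the starting level and halves it every `half_life_weeks` weeks. The function name and the half-life parameterization are hypothetical; in practice the half-life would have to be chosen or calibrated from past sightings.

```python
import math


def decay_forecast(recent_count, half_life_weeks, horizon):
    """Project expected weekly sightings for weeks 1..horizon after the last observation.

    expected(h) = recent_count * exp(-k * h), where k = ln(2) / half_life_weeks,
    so projected activity halves every `half_life_weeks` weeks.
    """
    if half_life_weeks <= 0:
        raise ValueError("half_life_weeks must be positive")
    if recent_count < 0:
        raise ValueError("recent_count must be non-negative")
    k = math.log(2) / half_life_weeks
    return [recent_count * math.exp(-k * h) for h in range(1, horizon + 1)]


# Hypothetical example: 8 sightings last week, assumed half-life of one week.
forecast = decay_forecast(8, half_life_weeks=1.0, horizon=3)
print([round(v, 3) for v in forecast])  # [4.0, 2.0, 1.0]
print(round(sum(forecast), 3))          # 7.0
```

This needs only the latest count and one decay parameter, which matches the abstract's point that such estimates do not require long historical series.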