Minkey Chang

Data Scientist / AI Engineer

Could multivariate time series have their own representations?

For images and text, people mostly agree on what a “good” representation is supposed to do. For multivariate time series, it is less obvious: you might want something low-dimensional that carries time structure, helps forecast, and—if you care about causality—does not change meaning under every reparameterization. Dynamic factor models and deep forecasters both compress the panel, but they emphasize different goals, and identifiability is often where the stories diverge.

Below: where the friction is, how iVDFM (Identifiable Variational Dynamic Factor Model) is put together, what I saw in synthetic and benchmark experiments, and what I would not overread into the results.


Where the friction is

A model can learn an embedding that predicts well while the axes of that embedding are still arbitrary: rotate or warp the latent space and you can leave the forecast almost unchanged. That is fine if you only care about error on the next step; it is awkward if you want to talk about factors as stable objects across runs or datasets.

Classical DFMs are built around factors too, but in the usual Gaussian setup they are identified only up to orthogonal rotations—many rotations give the same likelihood. iVAEs showed that conditioning latents on observed auxiliary variables can pin things down in static settings. The harder part is time: how to carry that idea through stochastic dynamics without reintroducing a free rotation at every step.


How iVDFM is put together

The starting idea is to put identifiability on innovations ηt—the shocks that drive the system—rather than on a loosely defined state, and to use dynamics simple enough that whatever you identify at the innovation level can still be read off in the factors ft.

Figure: model overview. Auxiliary variables and a regime embedding feed an innovation prior; innovations drive factors through diagonal linear dynamics; factors decode to observations; training uses reconstruction and KL to the innovation prior.

Innovations use a conditional exponential-family prior that depends on auxiliary variables ut (time, covariates, etc.) and a regime embedding et. Under the usual assumption of enough variation in the conditioners, the components of ηt are identifiable up to permutation and component-wise affine maps, in line with iVAE-style results. Gaussian innovations would reintroduce a rotational ambiguity and break that argument, so implementations lean on non-Gaussian choices (e.g. Laplace).
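As a minimal sketch of what such a conditional prior looks like, here is an independent-Laplace log-density whose location and scale are produced from (ut, et). A linear map stands in for the conditioner network; `innovation_prior_params`, `W_loc`, and `W_scale` are illustrative names of mine, not the paper's API.

```python
import numpy as np

def laplace_logpdf(eta, loc, scale):
    """Log-density of independent Laplace innovation components."""
    return float(np.sum(-np.log(2.0 * scale) - np.abs(eta - loc) / scale))

def innovation_prior_params(u_t, e_t, W_loc, W_scale):
    """Map auxiliaries u_t and regime embedding e_t to per-component
    Laplace location and scale. A linear map stands in for the
    conditioner network; W_loc and W_scale are illustrative parameters."""
    cond = np.concatenate([u_t, e_t])
    loc = W_loc @ cond
    scale = np.exp(W_scale @ cond)   # exp keeps scales positive
    return loc, scale
```

The point of conditioning is that different (ut, et) values move `loc` and `scale`, which is the "enough variation" the identifiability argument needs.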

Dynamics are linear and diagonal: each factor follows its own lags and its own innovation component, without mixing across factors inside the transition. That prevents the per-component structure established by the prior from getting scrambled. AR(p) is handled in companion form if you want longer memory without breaking that layout.
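The diagonal transition can be sketched as a per-component AR(p) recursion; this is my own illustration of the structure, not the paper's code.

```python
import numpy as np

def roll_diagonal_ar(eta, phi):
    """Roll factors forward under per-component AR(p) dynamics:
    f[t, k] = sum_j phi[k, j] * f[t - 1 - j, k] + eta[t, k].
    No cross-factor mixing, so the component structure of the
    innovations survives into the factors. Initial lags are zero.
    eta: (T, K) innovations; phi: (K, p) AR coefficients."""
    T, K = eta.shape
    p = phi.shape[1]
    f = np.zeros((T + p, K))            # p leading rows of zero lags
    for t in range(T):
        lags = f[t : t + p][::-1]       # (p, K): most recent lag first
        f[t + p] = np.sum(phi.T * lags, axis=0) + eta[t]
    return f[p:]
```

Because `phi` only couples a factor to its own lags, a permutation of the innovation components induces the same permutation of the factors, which is exactly the property the identification argument relies on.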

Observations yt come from a decoder g(ft) (injective in the theory) plus noise. Training is standard variational: infer innovations, roll dynamics forward, maximize ELBO (reconstruction minus KL to the innovation prior).
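A single-sample Monte Carlo version of that training objective, written for a diagonal AR(1) special case, looks roughly like this. All function and parameter names (`elbo_estimate`, `decode`, `phi_diag`, etc.) are my own placeholders, and the Gaussian posterior / unit-variance observation noise are simplifying assumptions.

```python
import numpy as np

def roll_ar1(eta, phi_diag):
    """Diagonal AR(1) for the sketch: f_t = phi * f_{t-1} + eta_t."""
    f = np.zeros_like(eta)
    prev = np.zeros(eta.shape[1])
    for t in range(eta.shape[0]):
        prev = phi_diag * prev + eta[t]
        f[t] = prev
    return f

def elbo_estimate(y, q_loc, q_scale, p_loc, p_scale, phi_diag, decode, rng=None):
    """Single-sample Monte Carlo ELBO: sample innovations from a
    Gaussian posterior q, roll the diagonal dynamics, decode, and
    score reconstruction plus log p(eta) - log q(eta) against the
    Laplace innovation prior."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(q_loc.shape)
    eta = q_loc + q_scale * eps                       # reparameterized draw
    f = roll_ar1(eta, phi_diag)
    recon = -0.5 * np.sum((y - decode(f)) ** 2)       # unit-variance Gaussian noise
    log_q = np.sum(-0.5 * np.log(2 * np.pi * q_scale ** 2) - 0.5 * eps ** 2)
    log_p = np.sum(-np.log(2 * p_scale) - np.abs(eta - p_loc) / p_scale)
    return recon + log_p - log_q                      # MC estimate of the ELBO
```

The `log_p - log_q` term is the (negative) single-sample KL to the innovation prior, so maximizing this quantity trades reconstruction against staying close to the conditional prior.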


What I looked at

I grouped checks into three buckets—same spirit as the paper, but the labels are just how I think about them.

Synthetic factor recovery. When the data are simulated from a dynamic process with known factors, iVDFM often aligned better than classical DFM / deep DFM / plain VAE baselines on correlation-type metrics—but on a static iVAE-style DGP, plain VAE/iVAE were frequently stronger. So the win is not “always best”; it is more plausible when the truth really is dynamic in the way the model assumes.

Figure: example factor recovery on synthetic data, plotting true vs recovered factors across several components with per-factor correlation values.
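Since identification is only up to permutation and componentwise affine maps, a correlation-type recovery metric has to match components before scoring. A small sketch of such a metric (my own implementation, with a brute-force permutation search that assumes a small number of factors):

```python
import numpy as np
from itertools import permutations

def matched_abs_corr(f_true, f_hat):
    """Recovery score: mean absolute Pearson correlation between true
    and recovered factors, maximized over component permutations.
    Sign and scale are deliberately ignored, since identification is
    only up to permutation and componentwise affine maps."""
    K = f_true.shape[1]
    # corrcoef of the stacked (2K, T) rows; [:K, K:] is the cross block
    C = np.abs(np.corrcoef(f_true.T, f_hat.T)[:K, K:])
    best = max(sum(C[i, p[i]] for i in range(K))
               for p in permutations(range(K)))
    return best / K
```

A score near 1 means each true factor has a recovered counterpart that is an affine image of it, which is the most the theory promises.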

Synthetic interventions. On small structural causal models, do-operator interventions on the learned representation gave impulse responses that stayed in a reasonable band relative to ground truth across a few SCM shapes. That is a sanity check on intervenability, not evidence that the representation is causal in the wild.
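To make the impulse-response check concrete: with diagonal dynamics, a unit do()-shock to one innovation component propagates along that component alone. The sketch below assumes diagonal AR(1) dynamics and a linear decoder y = C f (a linearization of the learned decoder is an assumption on my part).

```python
import numpy as np

def impulse_response(phi_diag, C, k, horizons=10):
    """IRF of observations to a unit do()-shock in innovation
    component k, under diagonal AR(1) factor dynamics and a linear
    decoder y = C f. The factor response decays as phi[k]**h along
    component k only; diagonality keeps the other factors at zero."""
    f = np.zeros(phi_diag.shape[0])
    f[k] = 1.0                          # unit shock at horizon 0
    irf = np.zeros((horizons, C.shape[0]))
    for h in range(horizons):
        irf[h] = C @ f
        f = phi_diag * f                # no cross-factor mixing
    return irf
```

Comparing such model-implied responses to the ground-truth SCM responses is the "reasonable band" check described above.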

Forecasting. On ETT and Weather, probabilistic scores (CRPS, MSE on standardized targets) were in the same neighborhood as several strong baselines (e.g. TimeMixer, TimeXer, DDFM) in the experiments I ran—not a full sweep of every horizon and dataset. The lesson I draw is narrower: identifiability constraints did not automatically destroy forecast quality there; that does not mean you should drop dedicated forecasters.
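For reference, the CRPS numbers above come from the standard sample-based estimator, which for a scalar target is:

```python
import numpy as np

def crps_samples(samples, y):
    """Sample-based CRPS estimate for a scalar target:
    CRPS ~= mean|X - y| - 0.5 * mean|X - X'|, with both expectations
    taken over the forecast sample. Lower is better; a degenerate
    forecast exactly at y scores 0."""
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return term1 - term2
```

Unlike MSE on a point forecast, CRPS rewards a predictive distribution that is both centered and appropriately sharp, which is why it is the natural score for a probabilistic model like this one.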


Caveats and takeaway

If you only need a good number on one benchmark horizon, a forecast-first model is usually simpler to ship. iVDFM is for settings where you also care whether the latent axes are more than a rotating embedding, and where you might later connect latents to regimes or shocks in factor-model language. That only pays off when auxiliaries actually move the innovation prior enough to identify, when non-Gaussian innovations are acceptable, and when the decoder assumptions are not wildly wrong—none of which is automatic.

Bottom line: multivariate series can be modeled with shocks and states in mind, not only with next-row prediction. iVDFM is one variational approach in that direction. In my runs, synthetic checks were useful for understanding behavior, and benchmark forecasting was competitive but not consistently top. Treat the method as problem-dependent, not as a general replacement for simpler forecasters.


References

  1. iVDFM — Chang, M., & Kim, J.-Y. Conditionally Identifiable Latent Representation for Multivariate Time Series with Structural Dynamics.

  2. Dynamic factor models — Stock, J. H., & Watson, M. W. (2002). Macroeconomic Forecasting Using Diffusion Indexes. Journal of Business & Economic Statistics, 20(2), 147–162.

  3. iVAE / identifiable latents — Khemakhem, I., Kingma, D., Monti, R., & Hyvärinen, A. (2020). Variational Autoencoders and Nonlinear ICA: A Unifying Framework. AISTATS.

  4. ICA / non-Gaussianity — Hyvärinen, A., Karhunen, J., & Oja, E. (2001). Independent Component Analysis. Wiley.

  5. Deep dynamic factors — Andreini, P., Izzo, C., & Ricco, G. (2020). Deep Dynamic Factor Models. Working paper.

  6. Causal representation — Schölkopf, B., et al. (2021). Toward Causal Representation Learning. Proceedings of the IEEE, 109(5), 612–634.
