Time series forecasting is essential for decision making across industries such as retail, energy, finance, and health care. However, developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model customization.
In a paper we have just posted to arXiv, we present Chronos, a family of pretrained time series models based on language model architectures. Like large language models or vision-language models, Chronos is a foundation model, which learns from large datasets how to produce general representations useful for a wide range of tasks.
The key insight behind Chronos is treating time series data as a language to be modeled by off-the-shelf transformer architectures. To tokenize real-valued time series observations into a fixed vocabulary, we scale each time series by the mean of its absolute values and then quantize the scaled values into a fixed number of uniformly spaced bins, each of which corresponds to a token.
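To make this concrete, here is a minimal NumPy sketch of mean scaling followed by uniform binning. The bin count and value range below are illustrative placeholders rather than the exact settings from the paper, and in the real vocabulary the first few IDs would be reserved for special tokens.

```python
import numpy as np

def tokenize(series, num_bins=4094, low=-15.0, high=15.0):
    """Mean-scale a series, then map each value to a uniform bin index.

    In the actual vocabulary, PAD and EOS would occupy the first IDs and
    the bin tokens would be offset accordingly; that detail is omitted here.
    """
    scale = np.abs(series).mean()
    scaled = series / scale
    edges = np.linspace(low, high, num_bins + 1)
    # Values outside [low, high] are clipped into the outermost bins.
    tokens = np.clip(np.digitize(scaled, edges) - 1, 0, num_bins - 1)
    return tokens, scale

def detokenize(tokens, scale, num_bins=4094, low=-15.0, high=15.0):
    """Map bin indices back to real values via bin centers, then undo scaling."""
    edges = np.linspace(low, high, num_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[tokens] * scale
```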
In addition to these bin tokens, we add two special tokens, PAD and EOS, to denote padding/missing values and end-of-sequence, respectively. We can then train standard language models like T5 on such a "language of time series" using the conventional cross-entropy loss function, with no changes to the model architecture itself.
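Because the architecture itself is unchanged, training reduces to ordinary sequence-to-sequence language modeling. The sketch below, built on the Hugging Face transformers library, shows one way this wiring could look; the vocabulary size, special-token IDs, and batch shapes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Standard T5, with the text vocabulary replaced by bin tokens plus
# PAD (ID 0) and EOS (ID 1); all sizes here are illustrative.
config = T5Config(vocab_size=4096, pad_token_id=0, eos_token_id=1,
                  decoder_start_token_id=0)
model = T5ForConditionalGeneration(config)

# Dummy batch: tokenized context as encoder input, tokenized future
# values as decoder targets.
input_ids = torch.randint(2, 4096, (4, 128))  # context tokens
labels = torch.randint(2, 4096, (4, 24))      # horizon tokens to predict
loss = model(input_ids=input_ids, labels=labels).loss  # cross-entropy
loss.backward()
```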
Despite its simplicity, Chronos is remarkably accurate. In a comprehensive evaluation involving 42 datasets, Chronos significantly outperformed both classical statistical methods and specialized deep-learning models on data held out from its training sets. More important, on entirely new datasets, Chronos’s zero-shot performance was comparable to, and occasionally better than, that of models trained directly on those datasets.
A core strength of Chronos is its ability to leverage diverse time series data from different domains to improve generalization. To enhance the model’s robustness, we augmented the public data sources used for pretraining in two ways: with TSMix, which creates new training examples by taking random convex combinations of real series, and with KernelSynth, which generates synthetic series from Gaussian processes with randomly composed kernels.
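The sketch below conveys the spirit of both augmentations; the kernel bank, the mixing scheme, and every parameter are our own illustrative choices rather than the paper's exact recipe.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, DotProduct

# Illustrative kernel bank; the paper's bank and composition rules differ.
KERNEL_BANK = [RBF(length_scale=0.1), ExpSineSquared(periodicity=0.25),
               ExpSineSquared(periodicity=0.05), DotProduct()]

def kernel_synth(length=256, max_kernels=3, seed=None):
    """KernelSynth-style sample: compose random kernels, draw one GP path."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1, length)[:, None]
    kernels = rng.choice(KERNEL_BANK, size=rng.integers(1, max_kernels + 1))
    kernel = kernels[0]
    for k in kernels[1:]:  # combine by random addition or multiplication
        kernel = kernel + k if rng.random() < 0.5 else kernel * k
    cov = kernel(t) + 1e-6 * np.eye(length)  # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(length), cov)

def ts_mix(series_list, seed=None):
    """TSMix-style sample: a random convex combination of mean-scaled series."""
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(len(series_list)))
    scaled = [s / np.abs(s).mean() for s in series_list]
    return sum(w * s for w, s in zip(weights, scaled))
```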
The impressive zero-shot capabilities of Chronos position it as a viable "general-purpose" forecasting solution that simplifies deployment pipelines. Rather than training separate models for each bespoke application, practitioners can use an off-the-shelf Chronos model to make accurate forecasts immediately, reducing computation costs and making it easier to adopt advanced forecasting.
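As a sketch of what this looks like in practice, the open-source chronos-forecasting package exposes a pipeline interface roughly along these lines; the model name and call signature follow the project's README at the time of writing.

```python
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-small")

# Context window of past observations; any 1-D tensor of floats works.
context = torch.tensor([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
forecast = pipeline.predict(context, prediction_length=4)

# forecast has shape [num_series, num_samples, prediction_length];
# quantiles across the sample dimension give point and interval forecasts.
median = forecast.quantile(0.5, dim=1)
```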
Despite Chronos’s strong empirical results, our exploration only scratches the surface of what we can achieve by aligning language modeling with time series forecasting. As the paper discusses, future research can explore more-sophisticated time series tokenization schemes, architectures tailored to serial data, and explicit incorporation of auxiliary features or domain knowledge.
The use of pretrained models for time series forecasting is an exciting frontier. By reformulating the forecasting task as a kind of language modeling, Chronos demonstrates a simpler path to general and accurate prediction. Moreover, Chronos will be able to seamlessly integrate future advances in the design of LLMs. We invite researchers and practitioners to engage with Chronos, now available as open source, and to join us in developing the next generation of time series models.