What is Retail Forecasting?

Retail forecasting is a difficult problem that combines scale (store/SKU counts often run into the tens of millions) with the complexity of the forecast itself. For example, is the forecast driving in-season or pre-season allocations and buys? What are the expected size curves, seasonal patterns, and holiday lifts? Beyond that, which other attributes (options, colors, styles, pricing) need to be reconciled across company departments?

The result is a set of complex data pipelines that feed large data sets into sophisticated machine learning models, which makes the overall forecasting process challenging to orchestrate and control.

What is Kubeflow Pipelines?

The Kubeflow Pipelines platform consists of:

  • A user interface (UI) for managing and tracking experiments, jobs, and runs.
  • An engine for scheduling multi-step Machine Learning (ML) workflows, otherwise known as “pipelines”.
  • A software development kit (SDK) for defining and manipulating pipelines and components.
  • Notebooks for interacting with the system using the SDK.

The following are the goals of Kubeflow Pipelines:

  • End-to-end orchestration: enabling and simplifying the orchestration of machine learning pipelines.
  • Easy experimentation: making it easy for you to try numerous ideas and techniques and manage your various trials/experiments.
  • Easy re-use: enabling you to re-use components and pipelines to quickly create end-to-end solutions without having to rebuild each time.
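To make "orchestration" concrete, here is a minimal, framework-free Python sketch of what a pipeline engine does at its core: it runs each step only after all of its upstream steps have finished. This is not the Kubeflow Pipelines SDK. The step names and the `run_pipeline` helper are hypothetical, and in Kubeflow each step would be a containerized component scheduled on Kubernetes rather than an in-process Python function. It uses `graphlib` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# A step receives the outputs of steps that already ran and returns its own output.
Step = Callable[[Dict[str, object]], object]


def run_pipeline(steps: Dict[str, Step], deps: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run every step in dependency order (hypothetical helper, not a Kubeflow API)."""
    # Include every step in the graph, even those with no upstream dependencies.
    graph = {name: deps.get(name, set()) for name in steps}
    outputs: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        outputs[name] = steps[name](outputs)
    return outputs


# Hypothetical steps: load sales history, forecast two ways, then combine.
steps: Dict[str, Step] = {
    "load_sales": lambda out: [10, 12, 9, 11],
    "pre_season_forecast": lambda out: 40.0,
    "in_season_forecast": lambda out: float(sum(out["load_sales"])),
    "combine": lambda out: 0.5 * out["pre_season_forecast"] + 0.5 * out["in_season_forecast"],
}
deps = {
    "in_season_forecast": {"load_sales"},
    "combine": {"pre_season_forecast", "in_season_forecast"},
}

if __name__ == "__main__":
    result = run_pipeline(steps, deps)
    print(list(result))       # the order in which steps actually ran
    print(result["combine"])  # 41.0
```

The dependency map is the same information a pipeline UI draws as an execution graph; a production engine adds what this sketch omits, such as retries, caching, parallel execution of independent steps, and run tracking.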

Figure: The runtime execution graph of a pipeline, as shown in the Kubeflow UI.

Why does Kubeflow make sense for retail forecasting?

First, retail forecasting ML models and their ML pipelines are complex because they are multifaceted: there are pre-season models, in-season models, size curve decomposition, and Rate of Sale (ROS) calculations, and all of this data processing needs to feed the allocation, replenishment, and other execution systems downstream. A retail forecasting pipeline might forecast in-season and pre-season demand separately, then combine the forecasts to account for halo and cannibalization effects across the assortment. The combined forecast could then feed a seasonal and size decomposition. Kubeflow provides an easy and effective way to deploy, orchestrate, and monitor these complex pipelines as production systems.
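The flow described above (forecast pre-season and in-season separately, combine, adjust for assortment effects, then decompose) can be sketched as plain Python functions, each of which could become one pipeline component. Everything here is an illustrative assumption rather than the authors' method: the linear blending weight, the single net halo/cannibalization factor, and the definition of ROS as units sold per store per week are simplifications, and all function names are hypothetical.

```python
from typing import Dict


def rate_of_sale(units_sold: float, stores: int, weeks: float) -> float:
    """ROS taken here as units sold per store per week (one common definition)."""
    return units_sold / (stores * weeks)


def blend_forecasts(pre_season: float, in_season: float,
                    weeks_observed: int, full_trust_weeks: int = 4) -> float:
    """Shift weight linearly from the pre-season to the in-season forecast as sales weeks accrue."""
    w = min(weeks_observed / full_trust_weeks, 1.0)
    return (1.0 - w) * pre_season + w * in_season


def adjust_for_assortment(forecast: float, halo_lift: float, cannibalization: float) -> float:
    """Apply a single net assortment effect (a simplifying assumption)."""
    return forecast * (1.0 + halo_lift - cannibalization)


def allocate_by_shares(total: float, shares: Dict[str, float]) -> Dict[str, float]:
    """Split a total in proportion to shares, e.g. a size curve or a weekly seasonal index."""
    denom = sum(shares.values())
    if denom <= 0:
        raise ValueError("shares must sum to a positive number")
    return {key: total * share / denom for key, share in shares.items()}


if __name__ == "__main__":
    ros = rate_of_sale(600, stores=10, weeks=4)                   # 15.0 units/store/week
    combined = blend_forecasts(1000.0, 1200.0, weeks_observed=2)  # 1100.0
    adjusted = adjust_for_assortment(combined, halo_lift=0.05, cannibalization=0.10)
    # Seasonal decomposition first, then split one week's units by size curve.
    weekly = allocate_by_shares(adjusted, {"wk1": 1.2, "wk2": 1.0, "wk3": 0.8})
    by_size = allocate_by_shares(weekly["wk1"], {"S": 20, "M": 50, "L": 30})
    print(ros, combined, round(adjusted, 1), by_size)
```

Reusing one `allocate_by_shares` function for both the seasonal and the size split illustrates the kind of component reuse the pipeline approach encourages; a real implementation would also round to whole units while preserving totals.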

Second, forecasting experimentation needs to allow for model tuning, monitoring, and improvement at the department or class level. With Kubeflow, most components can be reused while the specifics still vary widely by class. For example, the attributes of tops and bottoms can drive different ML models, and hard goods may not require size decomposition at all, whereas soft goods are forecast at the Style/Color level and then decomposed by size curve.
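One way to picture this reuse is a single pipeline builder that assembles the same shared components for every class and varies only their parameters (the model and the product attributes used as features) plus whether a size-decomposition step is included. The sketch below is framework-agnostic Python under that assumption; the `ClassConfig` fields, component names, and model identifiers are hypothetical placeholders, and in Kubeflow each tuple would correspond to a reusable component invoked with class-specific arguments.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (shared component name, class-specific parameters)
Component = Tuple[str, Dict[str, object]]


@dataclass(frozen=True)
class ClassConfig:
    """Hypothetical per-class settings layered on top of shared components."""
    name: str
    model: str                       # hypothetical model identifier chosen per class
    attributes: Tuple[str, ...]      # product attributes used as features
    size_decomposition: bool = True  # hard goods can switch this off


def build_steps(cfg: ClassConfig) -> List[Component]:
    """Assemble one class's pipeline from the same shared components."""
    steps: List[Component] = [
        ("load_sales", {"class": cfg.name}),
        ("build_features", {"attributes": list(cfg.attributes)}),
        ("train_and_forecast", {"model": cfg.model}),
        ("seasonal_decomposition", {}),
    ]
    if cfg.size_decomposition:
        # Soft goods: forecast at Style/Color, then split by size curve.
        steps.append(("size_decomposition", {"level": "style_color"}))
    steps.append(("publish_forecast", {"class": cfg.name}))
    return steps


CLASSES = [
    ClassConfig("tops", model="model_a", attributes=("color", "sleeve_length")),
    ClassConfig("bottoms", model="model_b", attributes=("color", "inseam", "rise")),
    ClassConfig("cookware", model="model_c", attributes=("material",), size_decomposition=False),
]

if __name__ == "__main__":
    for cfg in CLASSES:
        print(cfg.name, [component for component, _ in build_steps(cfg)])
```

Keeping class differences in configuration rather than in separate pipeline code means a fix to a shared component reaches every department, while each class can still be tuned independently.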

The end result is:

  1. A well-governed production system that allows us to scale ML forecasting to retail-sized problems.
  2. The ability to leverage common components across multiple pipelines and departments while still fine-tuning specific departments as needed.
  3. When used with Google Kubernetes Engine (GKE), the ability to monitor the production system and maintain a high SLA for the downstream systems that consume the forecast.


Parth Mishra, Cloud Engineer

Chris Houck, Partner