The feature engineering problem

ML teams often re-implement the same feature transformations separately for training (batch), serving (real-time), and experimentation. These copies drift apart over time and cause training-serving skew: the model sees different feature values at serving time than it saw during training.
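As a minimal sketch of how such skew can arise, assume two teams each wrote their own "30-day spend" function. The order data, the function names, and the boundary difference are all hypothetical; the point is only that two reasonable-looking implementations can disagree on the same input.

```python
from datetime import datetime, timedelta

# Hypothetical order history for one customer: (timestamp, amount).
ORDERS = [
    (datetime(2024, 1, 1, 12, 0), 40.0),
    (datetime(2024, 1, 20, 9, 30), 25.0),
    (datetime(2024, 1, 31, 12, 0), 10.0),
]


def spend_30d_batch(orders, as_of):
    """Batch version: window is inclusive at both ends."""
    start = as_of - timedelta(days=30)
    return sum(amount for ts, amount in orders if start <= ts <= as_of)


def spend_30d_serving(orders, as_of):
    """Serving version, written separately: window excludes the start boundary."""
    start = as_of - timedelta(days=30)
    return sum(amount for ts, amount in orders if start < ts <= as_of)


as_of = datetime(2024, 1, 31, 12, 0)
batch_value = spend_30d_batch(ORDERS, as_of)      # counts the order exactly 30 days old
serving_value = spend_30d_serving(ORDERS, as_of)  # drops that same order
print(batch_value, serving_value)
```

Here the two functions differ by one comparison operator, yet the feature the model was trained on is not the feature it receives in production.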

What a feature store solves

  • Single definition of a feature used by both training pipelines and the serving path.
  • Reuse across teams — the orders team's "customer 30-day spend" feature is available to the fraud team.
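  • Point-in-time correct training data. Each training example is joined with the feature values as they existed at that example's event timestamp, so no information from after the event (including the label) leaks into the training features.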
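The point-in-time lookup in the last item can be sketched with the standard library alone. This is an illustration under assumed data, not how any particular feature store implements it: FEATURE_LOG, LABEL_EVENTS, and feature_as_of are hypothetical names, and real systems perform this as an "as-of" join over warehouse tables.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature log: customer_id -> [(valid_from, spend_30d)], sorted by time.
FEATURE_LOG = {
    "c1": [
        (datetime(2024, 1, 1), 100.0),
        (datetime(2024, 1, 10), 150.0),
        (datetime(2024, 1, 20), 90.0),
    ],
}

# Hypothetical labeled events: (customer_id, event_time, label).
LABEL_EVENTS = [
    ("c1", datetime(2024, 1, 5), 0),
    ("c1", datetime(2024, 1, 15), 1),
    ("c1", datetime(2023, 12, 31), 0),
]


def feature_as_of(customer_id, event_time):
    """Return the latest value recorded at or before event_time, or None."""
    history = FEATURE_LOG.get(customer_id, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)  # count of entries with ts <= event_time
    if idx == 0:
        return None  # no feature value existed yet at that time
    return history[idx - 1][1]


# Each training row only sees the value that was known at its own event time.
training_rows = [
    (cid, ts, feature_as_of(cid, ts), label) for cid, ts, label in LABEL_EVENTS
]
for row in training_rows:
    print(row)
```

The event on 2024-01-15 gets the value written on 2024-01-10, not the later value from 2024-01-20, and the event that predates all feature writes gets no value at all rather than a value from the future.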

Offline vs online store

  • Offline store — historical feature values, used for training. Backed by a data warehouse or S3.
  • Online store — latest feature values, used for low-latency inference. Backed by Redis or DynamoDB.
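The split between the two stores can be illustrated with a toy in-memory class. TinyFeatureStore and its methods are hypothetical and exist only for this sketch; a real deployment would write the history to a warehouse or object storage and the latest values to a key-value store such as those named above.

```python
from datetime import datetime


class TinyFeatureStore:
    """Toy in-memory stand-in for an offline/online store pair."""

    def __init__(self):
        self.offline = []  # append-only history: (entity_id, timestamp, value)
        self.online = {}   # latest value only: entity_id -> (timestamp, value)

    def write(self, entity_id, timestamp, value):
        # The offline store keeps every record for later training queries.
        self.offline.append((entity_id, timestamp, value))
        # The online store keeps the newest record, even if writes arrive out of order.
        current = self.online.get(entity_id)
        if current is None or timestamp >= current[0]:
            self.online[entity_id] = (timestamp, value)

    def get_online(self, entity_id):
        """Single-key lookup, as used on the inference path."""
        entry = self.online.get(entity_id)
        return None if entry is None else entry[1]

    def get_history(self, entity_id):
        """Full time-ordered history, as used to build training data."""
        return sorted((ts, v) for eid, ts, v in self.offline if eid == entity_id)


store = TinyFeatureStore()
store.write("c1", datetime(2024, 1, 10), 150.0)
store.write("c1", datetime(2024, 1, 20), 90.0)
store.write("c1", datetime(2024, 1, 1), 100.0)  # late-arriving older record

print(store.get_online("c1"))
print(store.get_history("c1"))
```

Both stores are fed by the same write path, which is what keeps the training view and the serving view consistent with each other.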

Options

  • Open-source: Feast, Hopsworks Community.
  • Managed: Databricks Feature Store, Amazon SageMaker Feature Store, Tecton.