Notes on: Baylor, D., Koc, L., Koo, C. Y., Lew, L., Mewald, C., Modi, A. N., Polyzotis, N., … (2017): TFX: A TensorFlow-Based Production-Scale Machine Learning Platform

Audience: an ML engineer who uses TFX for their models and data

TensorFlow Extended

The focus of TFX is the following:

  • Data analysis
  • Data transformation
  • Data validation
  • Trainer
  • Model Evaluation and Validation
  • Serving

Data analysis

  • Processes each dataset fed to the system and generates a set of descriptive statistics on the included features
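
As a rough illustration of the kind of statistics meant here, the sketch below computes per-feature presence, valency and value summaries. It assumes a simplified record format (a dict mapping feature name to a list of values) and a hypothetical helper `feature_statistics`; it is not TFX's own statistics component.

```python
from collections import Counter
from statistics import mean


def feature_statistics(examples):
    """Compute simple descriptive statistics for every feature.

    `examples` is a list of dicts mapping a feature name to a list of values,
    a simplified (hypothetical) stand-in for real training records.
    """
    names = sorted({name for ex in examples for name in ex})
    stats = {}
    for name in names:
        present = [ex[name] for ex in examples if name in ex]
        values = [v for vals in present for v in vals]
        entry = {
            "count": len(present),  # number of examples containing the feature
            "fraction_present": len(present) / len(examples),
            "min_valency": min(len(vals) for vals in present),
            "max_valency": max(len(vals) for vals in present),
        }
        numeric = [v for v in values if isinstance(v, (int, float))]
        if values and len(numeric) == len(values):
            # Numeric feature: summarise its range and average.
            entry.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        else:
            # String (or mixed) feature: report the most frequent values.
            entry["top_values"] = Counter(values).most_common(3)
        stats[name] = entry
    return stats


examples = [
    {"age": [30], "country": ["US"]},
    {"age": [40], "country": ["CA"]},
    {"country": ["US"]},
]
stats = feature_statistics(examples)
```

Statistics like these are what a schema (see Data validation) can later be inferred from and checked against.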

Data transformation

  • Allows feature wrangling for model training and serving
  • E.g. generating feature-to-integer mappings, also known as vocabularies
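
A minimal sketch of building and applying a vocabulary, assuming frequency-ordered ids with alphabetical tie-breaking and a single out-of-vocabulary bucket. The helper names `build_vocabulary` and `apply_vocabulary` are hypothetical; the point is that the same deterministic mapping must be reused at training and at serving time.

```python
from collections import Counter


def build_vocabulary(values, max_size=None, min_count=1):
    """Map each distinct string value to an integer id, most frequent first.

    Ties are broken alphabetically so the mapping is deterministic, which
    matters because training and serving must agree on the same ids.
    """
    counts = Counter(values)
    ranked = sorted(
        (v for v, c in counts.items() if c >= min_count),
        key=lambda v: (-counts[v], v),
    )
    if max_size is not None:
        ranked = ranked[:max_size]
    return {value: index for index, value in enumerate(ranked)}


def apply_vocabulary(vocab, value, oov_id=None):
    """Look up a value; unseen values map to an out-of-vocabulary id."""
    if oov_id is None:
        oov_id = len(vocab)  # one bucket past the last known id
    return vocab.get(value, oov_id)


vocab = build_vocabulary(["b", "a", "b", "c", "a", "b"])
ids = [apply_vocabulary(vocab, v) for v in ["a", "z"]]
```

Here `vocab` is `{"b": 0, "a": 1, "c": 2}` and the unseen value `"z"` falls into id 3.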

Data validation

  • Checks whether the data is healthy or contains anomalies that need to be flagged to the user
  • Uses a schema: a versioned, succinct description of the expected properties of the data

Validation schema

The following are examples of the properties that can be encoded in the schema:

  • Features present in the data
  • Expected type of each feature
  • Expected presence of each feature, in terms of minimum count and the fraction of examples that must contain it (e.g. ensuring that all possible outcomes appear in the training data)
  • Expected valency of each feature in each example, i.e. the minimum and maximum number of values
  • Expected domain of a feature, i.e. the small universe of values for a string feature, or the range for an integer feature
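
To make the listed properties concrete, here is a small sketch of a schema and a validator that reports anomalies. `FeatureSpec` and `validate` are hypothetical names, and the record format (dict of feature name to value list) is an assumption; the real TFX schema is a versioned protocol buffer, which this does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureSpec:
    """Hypothetical schema entry for one feature."""
    dtype: type                          # expected type of each value
    min_fraction: float = 1.0            # fraction of examples that must contain it
    min_valency: int = 1                 # minimum number of values per example
    max_valency: int = 1                 # maximum number of values per example
    domain: Optional[set] = None         # allowed values for a string feature
    value_range: Optional[tuple] = None  # (low, high) for an integer feature


def validate(examples, schema):
    """Return human-readable anomaly messages; an empty list means healthy."""
    anomalies = []
    seen = {name for ex in examples for name in ex}
    for name in sorted(seen - set(schema)):
        anomalies.append(f"{name}: feature not in schema")
    for name, spec in schema.items():
        present = [ex[name] for ex in examples if name in ex]
        fraction = len(present) / len(examples) if examples else 0.0
        if fraction < spec.min_fraction:
            anomalies.append(f"{name}: present in only {fraction:.0%} of examples")
        for vals in present:
            if not spec.min_valency <= len(vals) <= spec.max_valency:
                anomalies.append(f"{name}: {len(vals)} values, outside valency bounds")
            for v in vals:
                if not isinstance(v, spec.dtype):
                    anomalies.append(f"{name}: {v!r} is not {spec.dtype.__name__}")
                elif spec.domain is not None and v not in spec.domain:
                    anomalies.append(f"{name}: {v!r} outside the expected domain")
                elif spec.value_range is not None and not (
                    spec.value_range[0] <= v <= spec.value_range[1]
                ):
                    anomalies.append(f"{name}: {v!r} outside the expected range")
    return anomalies


schema = {
    "country": FeatureSpec(str, domain={"US", "CA"}),
    "age": FeatureSpec(int, min_fraction=0.5, value_range=(0, 120)),
}
anomalies = validate([{"country": ["US"], "age": [30]}, {"country": ["MX"]}], schema)
```

The second example contributes one anomaly (`"MX"` is outside the country domain), while the missing `age` is tolerated because only half of the examples are required to contain it.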

Trainer

  • Warm-starting: inspired by transfer learning, a new model's parameters are initialized from those of previously trained models instead of from scratch
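
The idea can be sketched without any ML framework: start from a fresh random initialization, then overwrite every parameter whose name and size match one in the previously trained model. Parameters are modelled as flat lists of floats, and `init_params` and `warm_start` are hypothetical helpers, not TensorFlow's warm-start mechanism.

```python
import random


def init_params(shapes, seed=0):
    """Randomly initialise parameters; `shapes` maps name -> number of weights."""
    rng = random.Random(seed)
    return {name: [rng.gauss(0.0, 0.1) for _ in range(n)] for name, n in shapes.items()}


def warm_start(new_params, previous_params):
    """Copy weights from a previously trained model where names and sizes match.

    Parameters that are new (or whose size changed) keep their fresh random
    initialisation, so only the compatible parts of the old model are reused.
    """
    started = dict(new_params)
    reused = []
    for name, values in previous_params.items():
        if name in started and len(started[name]) == len(values):
            started[name] = list(values)
            reused.append(name)
    return started, sorted(reused)


previous = {"embedding": [0.5, 0.5], "dense": [1.0]}
fresh = init_params({"embedding": 2, "dense": 3, "output": 1})
params, reused = warm_start(fresh, previous)
```

Only `embedding` is carried over here: `dense` changed size and `output` is new, so both keep their random values.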

Model specification API

  • FeatureColumns are a declarative way of defining the input layer of a model
  • Estimator handles training and evaluation
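
In TensorFlow the real declarative API lives under `tf.feature_column` and is consumed by an Estimator. The framework-free sketch below only mimics the idea: each hypothetical column object (`NumericColumn`, `CategoricalColumn`) declares how one raw feature becomes part of the input vector, and `input_layer` (also hypothetical) concatenates the results.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NumericColumn:
    """Hypothetical declarative column: pass a numeric feature through."""
    key: str
    default: float = 0.0

    def transform(self, features):
        return [float(features.get(self.key, self.default))]


@dataclass
class CategoricalColumn:
    """Hypothetical declarative column: one-hot encode a string feature."""
    key: str
    vocabulary: List[str] = field(default_factory=list)

    def transform(self, features):
        value = features.get(self.key)
        return [1.0 if value == v else 0.0 for v in self.vocabulary]


def input_layer(features: Dict[str, object], columns) -> List[float]:
    """Build the model's dense input vector from the column declarations."""
    vector = []
    for column in columns:
        vector.extend(column.transform(features))
    return vector


columns = [NumericColumn("age"), CategoricalColumn("country", ["US", "CA"])]
vector = input_layer({"age": 30, "country": "CA"}, columns)
```

The benefit of the declarative style is that the model code only states *what* each input is; the same declarations can then drive input handling in both training and evaluation.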

Model evaluation

Decides whether or not a model is good enough to be "served":

  • Safe to serve: the obvious requirements, e.g. the model should not crash or cause errors in the serving system, should not use more resources than allocated, and should not be trained with a library version that the serving system cannot load
  • Prediction quality: validates that the predictions are "good enough" by comparing the model's quality against a fixed threshold as well as against a baseline model (e.g. the current production model)
  • Models that fail these checks are not served, and the user is notified
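
The two-stage gate described above could look roughly like the sketch below. `EvalResult` and `should_serve` are hypothetical, the probe and memory measurements are assumed to come from elsewhere, and requiring an exact library-version match is a deliberate simplification of real compatibility rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    """Hypothetical evaluation summary for one trained model."""
    loads_ok: bool        # loaded and answered probe requests without errors
    memory_mb: float      # peak memory observed while serving the probes
    library_version: str  # version of the ML library the model was trained with
    metric: float         # quality metric on held-out data, higher is better


def should_serve(candidate: EvalResult, baseline: Optional[EvalResult],
                 serving_version: str, memory_budget_mb: float,
                 threshold: float, tolerance: float = 0.0):
    """Return (decision, reason), checking safety first and quality second."""
    # "Safe to serve" checks.
    if not candidate.loads_ok:
        return False, "not safe: failed to load or errored on probe inputs"
    if candidate.memory_mb > memory_budget_mb:
        return False, "not safe: exceeds the memory budget"
    if candidate.library_version != serving_version:
        return False, "not safe: trained with a different library version"
    # Prediction-quality checks: fixed threshold, then baseline comparison.
    if candidate.metric < threshold:
        return False, "quality below the fixed threshold"
    if baseline is not None and candidate.metric < baseline.metric - tolerance:
        return False, "quality regressed versus the baseline"
    return True, "ok"


production = EvalResult(True, 500.0, "1.4", 0.80)
candidate = EvalResult(True, 600.0, "1.4", 0.82)
decision, reason = should_serve(candidate, production, "1.4", 1024.0, threshold=0.75)
```

Returning a reason string alongside the decision makes it straightforward to notify the user why a model was held back.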

TensorFlow Serving

Multitenancy with Isolation

  • Enables a single server instance to serve multiple machine-learned models concurrently
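
A toy picture of multitenancy with isolation: one server object hosts several models, and each model gets its own worker pool so that a slow or failing model cannot block or crash requests to the others. `ModelServer` and the per-model thread-pool design are assumptions made for illustration; TensorFlow Serving's actual isolation mechanisms are not shown here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


class ModelServer:
    """Hypothetical single server instance hosting several models at once."""

    def __init__(self, workers_per_model=2):
        self._workers = workers_per_model
        self._models: Dict[str, Callable] = {}
        self._pools: Dict[str, ThreadPoolExecutor] = {}
        self._lock = threading.Lock()  # guards the model and pool registries

    def load(self, name, predict_fn):
        """Register (or replace) a model; each model gets a dedicated pool."""
        with self._lock:
            self._models[name] = predict_fn
            if name not in self._pools:
                self._pools[name] = ThreadPoolExecutor(max_workers=self._workers)

    def predict(self, name, request, timeout=1.0):
        """Run one request on the model's own pool and report the outcome."""
        with self._lock:
            fn, pool = self._models[name], self._pools[name]
        future = pool.submit(fn, request)
        try:
            return {"ok": True, "result": future.result(timeout=timeout)}
        except Exception as err:  # failures and timeouts stay in this response
            return {"ok": False, "error": repr(err)}

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)


server = ModelServer()
server.load("double", lambda x: 2 * x)
server.load("broken", lambda x: 1 / 0)
first = server.predict("double", 21)
failed = server.predict("broken", 1)
after = server.predict("double", 5)
server.shutdown()
```

The `broken` model's error is returned only to its own caller, and `double` keeps answering normally afterwards.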