Train, Evaluate, and Deploy

Deploying the first version of your product is when the real work begins, because only then do you start to understand your users and their real needs.

End-to-end workflow for training, evaluating, and deploying models.

First, it is crucial to understand that your data will keep growing over time, and the sooner you can feed it back into your system, the better your model will generalize to your problem domain.

When you move on to training models, keep your system simple:

  • Start with lightweight models and small architectures that are easy to debug, e.g. MobileNet for visual classification tasks.

  • Work with a small sample of your dataset that you are certain is properly curated.

  • Reuse approaches that have worked for others, e.g. ReLU activation functions for convolutional networks, or tanh activations for LSTM models.

  • Normalize your data before feeding it to the model.

  • For class-imbalance problems, start with a balanced subset so you can skip weighting your loss function.

  • Keep the number of classes fixed, so you can guarantee a fair comparison between models.
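The normalization and class-balancing tips above can be sketched in plain Python. This is a minimal illustration, assuming tabular data held as lists of numeric feature rows; the helper names `zscore_normalize` and `balanced_subset` are hypothetical, and in practice a library utility such as scikit-learn's `StandardScaler` or a Keras `Normalization` layer would usually do this work. Note that the normalization statistics are computed on the training split only.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def zscore_normalize(train_rows, other_rows=()):
    """Standardize each feature column to zero mean and unit variance.

    Mean and standard deviation are computed on the training rows only and
    then applied to any other split, so no statistics leak from
    validation or test data.
    """
    n_features = len(train_rows[0])
    means = [mean(row[j] for row in train_rows) for j in range(n_features)]
    # Constant columns have zero spread; use 1.0 to avoid dividing by zero.
    stds = [pstdev([row[j] for row in train_rows]) or 1.0 for j in range(n_features)]

    def apply(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]

    return apply(train_rows), apply(other_rows)


def balanced_subset(samples, labels, seed=0):
    """Undersample every class to the size of the smallest one.

    Returns a shuffled list of (sample, label) pairs; the fixed seed keeps
    the subset identical across runs.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(pool) for pool in by_class.values())
    rng = random.Random(seed)
    subset = [(x, y) for y in sorted(by_class) for x in rng.sample(by_class[y], n_min)]
    rng.shuffle(subset)
    return subset
```

Balancing by undersampling discards majority-class examples, which is acceptable for a first baseline but usually not for the final model.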


Before evaluating your model, it is highly recommended to debug your system: check that the model is not overfitting to the training data, make sure your experiments are reproducible (fixing random seeds helps the debugging process), compare similar architectures trained on the same data with small variations in hyperparameter values, or start from third-party models in frameworks like Keras, which are generally well tested. Including unit tests in your code pipeline is also strongly recommended.
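As a minimal sketch of seeded, reproducible training combined with unit tests, the following assumes a toy linear model trained with stochastic gradient descent; `train_linear_model` is a hypothetical stand-in for a real training routine. All randomness flows through a single seeded `random.Random` instance, and the test functions follow the pytest naming convention (pytest is assumed to be installed to collect them). With a deep learning framework you would also seed its own generators, for example with `keras.utils.set_random_seed`, and some GPU kernels can still be nondeterministic.

```python
import random


def train_linear_model(xs, ys, epochs=50, lr=0.05, seed=42):
    """Fit y ≈ w * x + b with stochastic gradient descent.

    Every random choice (initial weights, shuffling order) comes from one
    seeded generator, so the same seed always yields the same parameters.
    """
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the per-sample loss 0.5 * error**2 w.r.t. w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


XS = [0.0, 0.5, 1.0, 1.5, 2.0]
YS = [2 * x + 1 for x in XS]  # noise-free line y = 2x + 1


def test_same_seed_is_reproducible():
    # Two runs with the same seed must produce bit-identical parameters.
    assert train_linear_model(XS, YS, seed=7) == train_linear_model(XS, YS, seed=7)


def test_fits_noise_free_line():
    # Sanity check: the training loop recovers a known solution.
    w, b = train_linear_model(XS, YS, epochs=500)
    assert abs(w - 2.0) < 1e-2
    assert abs(b - 1.0) < 1e-2
```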

To ensure a fair evaluation, validation techniques such as cross-validation are commonly used. It is also good practice to benchmark your solution against others, for example through challenges or on public test sets.
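A k-fold cross-validation loop can be sketched without any ML library. This is an illustrative sketch: `k_fold_indices`, `cross_validate`, and the mean-predictor baseline are hypothetical names, and libraries such as scikit-learn provide tested equivalents (`KFold`, `cross_val_score`). Each sample appears in exactly one validation fold, and the reported score is the average over folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed; fold sizes differ by at
    most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, fit, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Hypothetical baseline model: always predict the mean training target.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def mean_abs_error(model, val_x, val_y):
    return sum(abs(y - model) for y in val_y) / len(val_y)
```

Any real model can be plugged in by supplying its own `fit` and `score` callables.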

To improve on a baseline, you can try more complex architectures to overcome underfitting, or apply data augmentation and/or regularization techniques to reduce overfitting. Exploring a range of hyperparameter values can also improve training. Some frameworks provide tooling for this; for example, Keras Tuner automates the search for the best hyperparameter values for Keras models.
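The simplest form of hyperparameter search is an exhaustive grid search. The sketch below is framework-agnostic and does not use Keras Tuner, which offers more efficient strategies such as random search, Bayesian optimization, and Hyperband. The toy `evaluate` function, which fits y = w * x by full-batch gradient descent and returns a validation loss, is a hypothetical placeholder for training and validating a real model.

```python
import itertools

# Hypothetical toy data: the true relationship is y = 3x.
TRAIN = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
VAL = [(x, 3.0 * x) for x in (0.75, 1.25)]


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the lowest loss.

    param_grid maps a hyperparameter name to a list of candidate values;
    evaluate(params) must return a validation loss (lower is better).
    """
    names = sorted(param_grid)
    best_params, best_loss = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = evaluate(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def evaluate(params):
    """Train w with gradient descent, then return the validation MSE."""
    w = 0.0
    for _ in range(params["epochs"]):
        grad = sum((w * x - y) * x for x, y in TRAIN) / len(TRAIN)
        w -= params["lr"] * grad
    return sum((w * x - y) ** 2 for x, y in VAL) / len(VAL)


best, loss = grid_search({"lr": [0.001, 0.01, 0.1], "epochs": [10, 100]}, evaluate)
```

The cost of a grid grows multiplicatively with each added hyperparameter (here 3 × 2 = 6 training runs), which is why random or adaptive search is usually preferred for larger search spaces.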


Once you are satisfied with your model's performance, the prediction system is ready to be deployed to the serving infrastructure. ML engineers commonly combine Docker, used for local development, shipping code, and deployment, with Kubernetes, which automates the orchestration of those containers. The final solution can be exposed as a web interface or as a REST API service, running inference on CPU or GPU depending on the computational demand.
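A REST prediction endpoint can be illustrated with Python's standard-library `http.server`. This is a minimal sketch, not a production server: the linear `WEIGHTS`/`BIAS` model, the `/predict` route, the `{"features": [...]}` request format, and port 8080 are all hypothetical choices, and a real deployment would load a trained model artifact and typically use a dedicated web framework or model server inside the Docker image. Request handling is kept in `handle_payload` so it can be unit-tested without opening a socket.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model parameters; a real service would load a trained artifact.
WEIGHTS = [0.4, -1.2, 0.7]
BIAS = 0.1


def predict(features):
    """Score one example with the (hypothetical) linear model."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def handle_payload(body):
    """Turn a raw JSON request body into (HTTP status, response dict)."""
    try:
        features = json.loads(body)["features"]
        if len(features) != len(WEIGHTS):
            raise ValueError(f"expected {len(WEIGHTS)} features")
        return 200, {"prediction": predict([float(x) for x in features])}
    except (ValueError, KeyError, TypeError) as exc:  # includes JSONDecodeError
        return 400, {"error": str(exc)}


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start a blocking server; 0.0.0.0 makes it reachable from outside a container."""
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

Calling `serve()` inside a container and publishing the port (for example with `docker run -p 8080:8080 <image>`) would let clients POST JSON such as `{"features": [1.0, 0.0, 0.0]}` to `/predict`.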