Hyperparameter Optimization


Hyperparameter optimization (sometimes called hyperparameter search, sweep, or tuning) is a technique to fine-tune a model to improve its final accuracy.

Common hyperparameters include the number of hidden layers, learning rate, activation function, and number of epochs. There are various methods for searching the various permutations for the best possible outcome. Examples include grid search, random search, and Bayesian methods.

What is a Hyperparameter?

A model hyperparameter is a configuration that is external to the model whose value cannot be estimated from data. Since it is not possible to know the best value for a model hyperparameter on a given problem, the hyperparameter optimization process needs to iterate through the various possible permutations. We may use rules of thumb, copy values used on other problems, or search for the best value by trial and error.

Hyperparameter Tuning with Hyperopt‚Äč

Hyperopt is a method for searching through a hyperparameter space. For example, it can use the Tree-structured Parzen Estimator (TPE) algorithm, which intelligently explores the search space while narrowing down to the best estimated parameters.

It is thus a good method for meta-optimizing a neural network. Whereas a neural network is an optimization problem that is tuned using gradient descent methods, hyperparameters cannot be tuned using gradient descent methods.

That's where Hyperopt shines -- it's useful not only for tuning hyperparameters like learning rate, but also for tuning more sophisticated parameters in a flexible way. Hyperopt can change the number of layers of different types, the number of neurons in one layer or another, or even the type of layer to use at a certain place in the network given an array of choices -- each of which may have nested, tunable hyperparameters.

It is more efficient to randomly search through values and intelligently narrow the search space, rather than looping on fixed sets of hyperparameter values. This kind of Oriented Random Search is Hyperopt's strength, as opposed to a simpler Grid Search where hyperparameters are pre-established with fixed-step increases. Random Search for Hyperparameter Optimization has proven to be such an effective search technique that it's no surprise that the paper detailing this technique is among the most cited of all deep learning papers.

If you want to learn more about Hyperopt, you'll probably want to watch the video below, made by the creator of Hyperopt:

Hyperparameter Optimization + Gradient

Gradient offers powerful hyperparameter tuning out of the box, something very difficult to implement on your own. At a bare minimum, you need a mechanism to orchestrate serial/parallel training runs, a central data repository to sync results, and have some way of measuring and exploring the output. Gradient uses TensorBoard for model comparison and Hyperopt on the backend.