[Now Reading] Confidence Estimation in Deep Neural Networks via Density Modelling

Title: Confidence Estimation in Deep Neural Networks via Density Modelling
Authors: Akshayvarun Subramanya, Suraj Srinivas, R.Venkatesh Babu
Link: https://arxiv.org/abs/1707.07013

Quick Summary:
The confidence levels produced by the traditional softmax activation are not very good estimates: given an input, if we simply scale its values up (for instance, by a factor of 1.3), the confidence of the winning class increases accordingly, even though the prediction carries no new information.
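To see this numerically, here is a quick NumPy sketch (mine, not from the paper, with made-up values): in a bias-free ReLU network, scaling the input scales the pre-softmax vector [latex]z[/latex] roughly proportionally, and the softmax of the scaled vector is more peaked.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical pre-softmax scores for a 3-class problem.
z = np.array([2.0, 1.0, 0.0])

print(softmax(z))        # ~[0.665, 0.245, 0.090]: winner confidence ~0.67
print(softmax(1.3 * z))  # ~[0.743, 0.202, 0.055]: winner confidence ~0.74
```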


The authors propose to estimate the confidence level via density modelling. Given an input [latex]X[/latex] and the pre-softmax result [latex]z[/latex], we are interested in calculating [latex]P(y_i | X)[/latex] (the probability of each class given the input [latex]X[/latex]); since there is a one-to-one mapping from [latex]X[/latex] to [latex]z[/latex], this reduces to calculating [latex]P(y_i | z)[/latex]:

[latex]P(y_i | z) = \frac{P(z | y_i) P(y_i)}{\sum_{j=1}^N P(z | y_j) P(y_j)}[/latex]

[latex]P(y_i)[/latex]: the prior probability of class [latex]y_i[/latex].
[latex]P(z | y_i) = N(z | \mu_i, \sigma_i)[/latex]: the likelihood of observing [latex]z[/latex] under the normal distribution with parameters [latex]\mu_i, \sigma_i[/latex]. The mean and variance are estimated from the training data, which yields the density function for each class.
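A minimal sketch of this idea (my own code, not the authors'), assuming we have collected the pre-softmax vectors [latex]z[/latex] of the training set: fit one Gaussian per class (here with a diagonal covariance for simplicity) plus the class priors, then apply Bayes' rule. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_density_model(Z, labels, num_classes):
    # Fit one Gaussian N(mu_i, sigma_i) per class to the pre-softmax
    # vectors Z, and estimate the priors P(y_i) from label frequencies.
    models, priors = [], []
    for i in range(num_classes):
        Zi = Z[labels == i]
        mu = Zi.mean(axis=0)
        cov = np.diag(Zi.var(axis=0) + 1e-6)  # diagonal + small ridge
        models.append(multivariate_normal(mean=mu, cov=cov))
        priors.append(len(Zi) / len(Z))
    return models, np.array(priors)

def density_confidence(z, models, priors):
    # P(y_i | z) via Bayes' rule, computed in log space for stability.
    log_joint = np.array([m.logpdf(z) for m in models]) + np.log(priors)
    log_joint -= log_joint.max()  # log-sum-exp trick
    p = np.exp(log_joint)
    return p / p.sum()
```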

A more graphical and probably easier-to-understand way to see this is to think that we have a vector [latex]z[/latex] of size N (the number of classes). To get the confidence level we would normally apply a softmax function; instead, the authors fit a density model and then calculate how likely it is that, given a specific [latex]z[/latex], we should predict [latex]y_i[/latex].
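Continuing the sketch above with made-up data (in practice [latex]Z[/latex] would come from a forward pass of the trained network over the training set):

```python
rng = np.random.default_rng(0)

# Fake pre-softmax vectors for 3 well-separated classes.
Z = np.concatenate([rng.normal(loc=mu, scale=1.0, size=(100, 3))
                    for mu in ([3, 0, 0], [0, 3, 0], [0, 0, 3])])
labels = np.repeat([0, 1, 2], 100)

models, priors = fit_density_model(Z, labels, num_classes=3)
z = np.array([2.5, 0.2, 0.1])
print(density_confidence(z, models, priors))  # roughly [0.99, 0.005, 0.005]
```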

Juan Miguel Valverde

"The only way to proof that you understand something is by programming it"
