Project Blog

Stability vs. Discriminability

14.12.2018

Certain deep learning architectures can be shown to possess desirable stability and invariance properties. Discriminability, however, is a different story entirely.

Neural networks that perform classification tasks should exhibit certain stability and invariance properties. To give an example: a neural network that recognizes cars in images should be robust with respect to transformations such as translations, rotations, or slight deformations. A car should be recognized correctly regardless of its exact location in the image and of the angle from which the image was taken. Furthermore, the characteristic shapes and structures of a car differ slightly from model to model, and a machine learning model designed to detect cars should not be sensitive to a specific brand or type of car.

A first mathematical investigation of the invariance and stability properties of deep learning architectures was conducted by Mallat and Bruna for a special class of CNNs, so-called scattering networks [1,2]. This work has been highly influential in recent years and opened a completely new path towards understanding deep learning architectures.

The first layer of a scattering network consists of convolutions with all elements from a fixed set of dilated and rotated complex-valued wavelets, where each convolution is followed by a point-wise application of the complex modulus function. At each subsequent layer, all outputs from the previous layer are again convolved with the same set of complex wavelets, and the complex modulus is again used as a non-linearity. A scattering network is thus a special case of a convolutional neural network in which each node is associated with a cascade of convolutions and point-wise applications of the complex modulus (cf. Figure 1). The complete set of features produced by a scattering transform is obtained by convolving the output of each node at each layer with a low-pass filter and concatenating all results.
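
To make this cascade concrete, the following is a minimal one-dimensional sketch in NumPy. The tree structure mirrors the description above, but the Gaussian band-pass filters, the parameter values, and the fixed depth are illustrative stand-ins for the dilated and rotated complex wavelets of [1,2], not their actual construction.

```python
import numpy as np

def gaussian_bandpass(n, center, width):
    """Frequency response of a Gaussian band-pass filter on n samples
    (an illustrative stand-in for a complex wavelet)."""
    omega = 2 * np.pi * np.fft.fftfreq(n)
    return np.exp(-((omega - center) ** 2) / (2 * width ** 2))

def scattering(x, num_scales=4, depth=2, xi0=2.5, sigma=0.4):
    """Cascade of filter-bank convolutions and modulus non-linearities;
    every node's output is low-pass filtered and collected as a feature."""
    n = len(x)
    # dilated band-pass filters: center and width shrink by 2 per scale
    psis = [gaussian_bandpass(n, xi0 / 2 ** j, sigma / 2 ** j)
            for j in range(num_scales)]
    phi = gaussian_bandpass(n, 0.0, sigma / 2 ** num_scales)  # low-pass filter
    lowpass = lambda u: np.real(np.fft.ifft(np.fft.fft(u) * phi))

    features, layer = [lowpass(x)], [x]
    for _ in range(depth):
        next_layer = []
        for u in layer:          # every node feeds the whole filter bank
            for psi in psis:
                v = np.abs(np.fft.ifft(np.fft.fft(u) * psi))  # modulus non-linearity
                next_layer.append(v)
                features.append(lowpass(v))  # averaged node output
        layer = next_layer
    return np.concatenate(features)

# Features of a signal and a translate of it remain close: the circular
# convolutions are translation-covariant and the low-pass filter averages.
x = np.sin(np.linspace(0, 20 * np.pi, 512)) * np.hanning(512)
print(np.linalg.norm(scattering(np.roll(x, 8)) - scattering(x)))
```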

It was shown that the features computed by a single layer of a scattering network are perfectly invariant to translations when an infinite number of dilated complex wavelets is considered [1]. Furthermore, it was shown that scattering networks are Lipschitz-stable with respect to diffeomorphic actions on compactly supported functions in \(L^2(\mathbb{R}^{d})\), in the sense that changes in the features are bounded from above by the size of the diffeomorphism. It should be noted, however, that these results on translation invariance and deformation stability only hold for scattering networks built from a specific class of wavelets that fulfill a certain technical admissibility condition.
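
In schematic form, the deformation stability bound has the following shape; we suppress constants, a logarithmic correction, and the precise norm on the right-hand side, so this is only a caricature of the exact statement in [1]. For a deformed signal \(f_\tau(x) = f(x - \tau(x))\) with displacement field \(\tau\),

\[
\| S_J f_\tau - S_J f \| \;\le\; C \Big( 2^{-J}\,\|\tau\|_\infty + \|\nabla\tau\|_\infty + \|H\tau\|_\infty \Big)\,\|f\|,
\]

where \(2^J\) is the coarsest scale of the wavelet system and \(\nabla\tau\), \(H\tau\) denote the gradient and Hessian of the displacement field. Letting \(J \to \infty\) removes the pure translation term, reflecting full translation invariance in the infinite-scale limit.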

Scattering architectures that are based on Gabor frames instead of systems of complex wavelets were subsequently analyzed by Czaja and Li [3]. It was shown that the features obtained by such a scattering network are also stable with respect to translations and diffeomorphic actions when considering band-limited functions. The respective proofs are largely based on the fact that the band-limited elements of the considered Gabor frames have uniform support sizes in the frequency domain, which is not the case for differently dilated wavelets. It was furthermore shown that for the considered scattering architectures, the energy of a signal decays exponentially with respect to depth [3].
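
This difference is visible directly in the frequency domain. In one dimension, and in our notation rather than that of [3], a Gabor atom is a modulated translate of a single window \(g\), while a wavelet is a dilate of a single mother wavelet \(\psi\):

\[
g_{m,n}(x) = e^{2\pi i m b x}\, g(x - n a) \;\Longrightarrow\; \widehat{g_{m,n}}(\omega) = e^{-2\pi i n a (\omega - m b)}\, \hat{g}(\omega - m b),
\]

\[
\psi_j(x) = 2^{j/2}\,\psi(2^{j}x) \;\Longrightarrow\; \widehat{\psi_j}(\omega) = 2^{-j/2}\,\hat{\psi}(2^{-j}\omega).
\]

Every \(\widehat{g_{m,n}}\) is supported on a translate of \(\operatorname{supp}\hat{g}\) and thus has the same measure for all \(m,n\), whereas \(\operatorname{supp}\widehat{\psi_j} = 2^{j}\operatorname{supp}\hat{\psi}\) grows with the scale \(j\).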

A further generalization of Mallat's results was given by Wiatowski and Bölcskei, who considered the application of a wide range of frames in \(L^2(\mathbb{R}^{d})\) and various types of non-linearities in scattering-type architectures [4]. In particular, they showed that translation invariance and deformation stability in the case of band-limited functions hold for many types of structured (e.g. Gabor frames, curvelets, shearlets, ridgelets, wavelets), random, and learned convolution kernels when combined with Lipschitz-continuous non-linearities (e.g. ReLU, sigmoidal functions, hyperbolic tangents, modulus). This indicates that stability with respect to translations and small deformations in scattering networks is not tied to a specific combination of filters and non-linearities but is rather a property of the network structure per se. In contrast to the work of Mallat, the result on translation invariance in [4] is of a vertical nature in the sense that it is obtained not by expanding the width of the scattering network but by increasing its depth. Another interesting aspect of the work of Wiatowski and Bölcskei is that the upper bound in the deformation stability statement depends on the bandwidth of the input signal: a small bandwidth, which indicates high regularity, implies more stability. Similar deformation stability bounds were later established for a specific class of functions that are not band-limited, so-called cartoon-like functions [5].
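
Schematically, and again suppressing constants as well as the frequency-modulation part of the deformations treated in [4], the bound for a signal \(f\) that is band-limited with bandwidth \(R\) takes the form

\[
\big\| \Phi(f_\tau) - \Phi(f) \big\| \;\le\; C\, R\, \|\tau\|_\infty\, \|f\|_2, \qquad f_\tau(x) = f(x - \tau(x)),
\]

so a smaller bandwidth \(R\) directly translates into a smaller deformation error.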

However, invariance properties are only one of several aspects that determine the suitability of a neural network architecture for a specific classification task. A second, equally important property is that the mapping defined by a neural network should not contract but rather expand the distance between signals that belong to distinct classes. In particular, this should also hold for signals that lie relatively close to each other in the input space.
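
As a purely illustrative toy experiment (our own, not taken from the cited works), one can track how a random, untrained ReLU network changes the distance between two nearby inputs. It makes no claim about trained classifiers, but it renders the quantity of interest, the ratio of feature-space to input-space distance, concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(depth, dim):
    """A stack of random ReLU layers with He-style scaling (untrained)."""
    weights = [rng.normal(0.0, np.sqrt(2.0 / dim), (dim, dim))
               for _ in range(depth)]
    def apply(x):
        for w in weights:
            x = np.maximum(w @ x, 0.0)  # linear map followed by ReLU
        return x
    return apply

dim = 128
x1 = rng.normal(size=dim)
x2 = x1 + 1e-2 * rng.normal(size=dim)  # a nearby, hard-to-discriminate input

for depth in (1, 4, 16):
    f = random_relu_net(depth, dim)
    ratio = np.linalg.norm(f(x1) - f(x2)) / np.linalg.norm(x1 - x2)
    print(f"depth {depth:2d}: distance ratio {ratio:.3f}")
```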

The goal of our project is to rigorously investigate how different properties of a deep learning architecture influence its discriminatory power. In particular, such an analysis should contribute to a better understanding of the significance of depth in deep learning architectures. First, considering the established results on the connection between depth and the expressivity of a neural network architecture, we suspect that similar statements hold for discriminatory properties: for most classification tasks, increasing the depth of a network architecture should be a highly efficient way of realizing a neural network that can reliably discriminate between signals from two different classes, even if they are almost indistinguishable in the input space. Second, we view such an investigation as complementary to the analysis of invariance and stability properties initiated by Mallat et al. By eventually showing that a specific deep learning architecture possesses both desirable invariance and discriminatory properties, we hope to make a substantial contribution towards a holistic understanding of deep learning approaches.


[1] Stéphane Mallat. Group invariant scattering. Communications on Pure and Applied Mathematics, 65(10):1331–1398, 2012.

[2] Joan Bruna and Stéphane Mallat. Invariant scattering convolution networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1872–1886, 2013.

[3] Wojciech Czaja and Weilin Li. Analysis of time-frequency scattering transforms. Applied and Computational Harmonic Analysis, 2017.

[4] Thomas Wiatowski and Helmut Bölcskei. A mathematical theory of deep convolutional neural networks for feature extraction. arXiv preprint arXiv:1512.06293, 2015.

[5] Philipp Grohs, Thomas Wiatowski, and Helmut Bölcskei. Deep convolutional neural networks on cartoon functions. In 2016 IEEE International Symposium on Information Theory (ISIT), pages 1163–1167. IEEE, 2016.

Figure 1. The tree structure defined by the first three layers of a scattering-type neural network.