Seyeon An June 24, 2021

<aside> 🔗 We have reposted this blog on our Medium publication. Read this on Medium.

</aside>

A heuristic, or a heuristic technique, is any approach to problem solving that uses a practical method or various shortcuts to produce solutions that may not be optimal but are sufficient given a limited time frame or deadline.

Humans normally rely on heuristics to solve cognitive tasks. For instance, to identify an image of a kangaroo, we perform a set of checks (does it have round ears? does it have a pouch? does it have four legs?) without even noticing. As a friendly reminder, the goal of deep learning is to train Deep Neural Networks (DNNs) to perform tasks like humans. In this case, the key capacity for performance is the ability to yield numerical representations of data suitable for solving a set of tasks. Representation learning aims to provide such capacities to DNNs.

Then, what can we do about irrelevant information in data, which can hugely mislead our DNNs? Our work starts from this question and provides an answer for the treatment of such irrelevant information: Drop-Bottleneck, a method that jointly learns features and feature-wise drop probabilities for discretely discarding irrelevant information via the Information Bottleneck (IB) framework.

A Quick Overview of Drop-Bottleneck: We propose a novel information bottleneck (IB) method named Drop-Bottleneck, which discretely drops features that are irrelevant to the target variable. Drop-Bottleneck jointly trains a feature extractor and performs feature selection, dropping irrelevant information and keeping the essential parts.

Information Bottleneck (IB) Framework

We have introduced a new IB method through our work. In this section, we will explain the basic concepts of the IB framework, and show what prior IB methods have lacked.

The information bottleneck framework formalizes the problem of obtaining a compressed representation $Z$ of the input $X$ that still preserves information about the target $Y$, by trading off a prediction term against a compression term:

$$ \operatorname{minimize} -\underbrace{I(Z ; Y)}_{\mathclap{\text{prediction}}} + \overbrace{\beta}^{\mathclap{\text{Lagrangian multiplier}}} \underbrace{I(Z ; X)}_{\mathclap{\text{compression}}} $$
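To make the objective concrete, here is a minimal NumPy sketch that evaluates $-I(Z;Y) + \beta \, I(Z;X)$ for small discrete distributions. The joint probability tables and the $\beta$ value are illustrative assumptions, not from the paper (in practice, the mutual information terms over continuous features must be estimated or bounded, which is exactly where IB methods differ).

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in nats from a joint probability table p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a)
    pb = joint.sum(axis=0, keepdims=True)   # marginal p(b)
    mask = joint > 0                        # skip zero cells to avoid log(0)
    return float((joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])).sum())

def ib_objective(joint_zx, joint_zy, beta):
    """IB objective to minimize: -I(Z;Y) + beta * I(Z;X)."""
    return -mutual_information(joint_zy) + beta * mutual_information(joint_zx)

# Toy joint tables (rows indexed by Z). Here Z is correlated with X
# but carries no information about Y, so the objective is purely the
# compression penalty.
joint_zx = np.array([[0.4, 0.1],
                     [0.1, 0.4]])
joint_zy = np.array([[0.25, 0.25],
                     [0.25, 0.25]])

print(ib_objective(joint_zx, joint_zy, beta=0.5))
```

A good representation pushes the objective down by keeping $I(Z;Y)$ high while shrinking $I(Z;X)$.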

Motivation

Our motivation is to develop an IB method that can be used for inference tasks (i) without stochasticity and (ii) with improved efficiency as a result of compression, two properties that prior IB methods, such as VIB, lack.

Drop-Bottleneck (DB) and Its Objective

Drop-Bottleneck (DB)

We approach the problem by proposing Drop-Bottleneck (DB), an IB method that discretely drops irrelevant features while jointly learning the features themselves.

We define $Z_i$, the $i$-th feature of $Z$, as

$$ Z_{i} = b \cdot \operatorname{Bernoulli}\left(1-p_{i}\right) \cdot X_{i}, \quad \text{for } b = \frac{d}{d-\sum_{k} p_{k}} $$
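The drop-and-rescale step above can be sketched in NumPy as follows. The function name, toy feature matrix, and drop probabilities are illustrative assumptions; this shows only the forward compression step, not how the drop probabilities $p_i$ are made learnable during training.

```python
import numpy as np

def drop_bottleneck(X, p, rng=None):
    """Drop each feature i with probability p[i], then rescale.

    X : (batch, d) feature matrix from the feature extractor.
    p : (d,) per-feature drop probabilities.
    The scale b = d / (d - sum(p)) compensates for the dropped mass,
    keeping the expected feature magnitude comparable to the input's.
    """
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    keep = rng.binomial(1, 1.0 - p, size=X.shape)  # Bernoulli(1 - p_i) mask
    b = d / (d - p.sum())
    return b * keep * X

# Toy example: d = 3 features, batch of 4 identical examples.
X = np.ones((4, 3))
p = np.array([0.0, 0.5, 1.0])  # feature 0 always kept, feature 2 always dropped
Z = drop_bottleneck(X, p)
```

With this `p`, the scale is $b = 3 / (3 - 1.5) = 2$: the surviving features are amplified to make up for the ones that were dropped.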