These are intended mainly as suggestions to help seed some ideas, not a menu that you must select from. Some are very ambitious, so you are welcome to consider a scaled-down version, or to take the suggestion in a different direction. The point is to explore an idea or question that interests you and to learn from the experience.

Dendritic nonlinearities

As we discussed in class, a single neuron is a much more powerful device than a Perceptron due to the nonlinear signal integration that occurs within dendritic trees. Demonstrate this by using a biophysically realistic model of a neuron (or a network of such neurons) to perform a pattern classification (or other) task. How many Perceptron models would be needed to obtain similar performance? One advantage of the Perceptron model is that we can derive a learning rule for changing the weights to perform a task. How would you train a realistic neuron model with dendritic nonlinearities? See the papers of Bartlett Mel, and for ideas about learning see his “Clusteron” model.
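To get a feel for the gap in capacity, here is a minimal sketch (Python/NumPy) comparing a single Perceptron with a toy "sum of nonlinear dendritic subunits" unit on an XOR-style pattern that is not linearly separable. The sigmoidal subunit nonlinearity and the hand-picked weights are illustrative stand-ins only, not a biophysical model and not Mel's actual Clusteron learning rule.

```python
import numpy as np

# Toy task: XOR of two binary inputs (not linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def perceptron(X, w, b):
    """Single linear threshold unit."""
    return (X @ w + b > 0).astype(float)

def subunit_neuron(X, W, b, v):
    """Each row of W is one dendritic 'branch'; branches saturate before the
    somatic sum (a crude stand-in for Mel-style dendritic nonlinearities)."""
    branch = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))   # sigmoidal branch response
    return (branch @ v > 0.5).astype(float)

# Hand-picked weights to illustrate capacity (no learning rule implied):
W = np.array([[ 6.0, -6.0],    # branch tuned to input pattern (1,0)
              [-6.0,  6.0]])   # branch tuned to input pattern (0,1)
b = np.array([-3.0, -3.0])
v = np.array([1.0, 1.0])

print("target:         ", y)
print("subunit neuron: ", subunit_neuron(X, W, b, v))                  # matches XOR
print("one perceptron: ", perceptron(X, np.array([1.0, 1.0]), -0.5))   # cannot
```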

Visualization of retinal images

As you walk around in starlight, the photon catch rate for rods is roughly 1/500 to 1/50 per second per rod. SL figure 8.2 attempts to show what this would look like over a rod array when viewing a baboon, including the dark light. But this isn’t a very good visualization, for a couple of reasons: 1) it should be a movie rather than a static image, and 2) the brain never gets to see the output of individual rods, but only that of retinal ganglion cells, which get their input from rod bipolar cells that integrate over pools of 100 or more rods. Create a biophysically realistic animation that shows a scene as it would appear at several different neural stages: on the rod array, on the rod bipolar cells, and in the spiking activity of retinal ganglion cells. And do this at a variety of light levels (starlight, moonlight, dusk), up to about 1 cd/m$^2$ (which is where rods begin to saturate). For the rod bipolar cells it will be important to include the effect of thresholding synapses (or try with and without to demonstrate its significance), and you can find out more about that from this paper. Such an animation would be of great value in communicating, both to the public and to neuroscientists, just how much of our perception relies upon filling in by the brain.
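As a starting point, here is a rough sketch of the first two stages, assuming Poisson photon catches (signal plus dark light) on the rod array and a simple per-rod threshold before pooling into rod bipolar cells. The rates, pool size, and threshold value are placeholders to be replaced with numbers from the paper linked above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rods = (200, 200)      # rod array (one pixel per rod, for simplicity)
rate_starlight = 0.01    # mean photoisomerizations per rod per frame (assumed)
dark_rate = 0.003        # spontaneous (dark-light) events per rod per frame (assumed)
pool = 10                # ~10x10 = 100 rods per rod bipolar pool
threshold = 1            # discard rod signals below ~1 event (assumed)

# Scene: replace with a real image scaled so its mean matches the light level.
scene = rng.uniform(0.5, 1.5, size=n_rods)

def rod_frame(scene, rate, dark_rate, rng):
    """One time step of photon catches (signal + dark light) on the rod array."""
    return rng.poisson(scene * rate + dark_rate)

def bipolar_frame(rods, pool, threshold):
    """Threshold each rod, then sum over non-overlapping pools of rods."""
    r = np.where(rods >= threshold, rods, 0)
    h, w = r.shape
    return r[:h - h % pool, :w - w % pool] \
        .reshape(h // pool, pool, w // pool, pool).sum(axis=(1, 3))

frames = [rod_frame(scene, rate_starlight, dark_rate, rng) for _ in range(100)]
bipolar = [bipolar_frame(f, pool, threshold) for f in frames]
# Stack the frames into movies and render with e.g. matplotlib.animation;
# a spiking ganglion-cell stage would be added downstream of the bipolar pools.
```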

Silicon retina

Building on the resistive network of challenge problem 10, implement a 2D network as a hexagonal grid and show how it blurs an image. Show the resulting difference image you get from subtracting this blurred image from the input image (i.e., the Mahowald/Mead silicon retina). Then, making realistic assumptions about capacitance in horizontal cells, simulate the dynamical properties of this system as a function of R and C. From your leaky integrator equation, you should find that $\tau\approx RC/2$ (assuming axial resistance is much less than membrane resistance). When the time constant is small, current injected into any one node will spread quickly to neighboring nodes; when it is large, the spread will be slow. (Recall that to make a neuron faster, you need to make it larger. Here we can see why: as you increase the diameter $D$ of the cable, R will decrease (more cytoplasm to conduct charge) and C will increase (more membrane). However, R decreases as $1/D^2$ (cross-sectional area) whereas C increases in proportion to $D$ (circumference). Thus, $\tau$ falls as $1/D$.) Show how a current injected into one (or multiple) node(s) spreads over time for a variety of assumed diameters in the horizontal cell circuit.
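Here is a minimal dynamical sketch, using a square grid as a stand-in for the hexagonal one (swap in six-neighbor coupling for the full version) and treating each node as a leaky integrator with diffusive coupling to its neighbors. The values of Rm, Ra, and C below are purely illustrative.

```python
import numpy as np

def simulate_grid(I_in, Rm=1.0, Ra=0.2, C=1.0, dt=0.01, n_steps=2000):
    """Euler-integrate C dV/dt = I_in - V/Rm + (sum of neighbors - 4V)/Ra
    on a square grid with periodic boundaries; returns the final voltages
    and a few snapshots showing how the response spreads over time."""
    V = np.zeros_like(I_in, dtype=float)
    snapshots = []
    for t in range(n_steps):
        lap = (np.roll(V, 1, 0) + np.roll(V, -1, 0) +
               np.roll(V, 1, 1) + np.roll(V, -1, 1) - 4 * V)  # discrete Laplacian
        V = V + dt * (I_in - V / Rm + lap / Ra) / C
        if t % 200 == 0:
            snapshots.append(V.copy())
    return V, snapshots

# Point injection: watch the response spread for different assumed diameters
# by scaling Ra as 1/D^2 and C as D.
I = np.zeros((64, 64))
I[32, 32] = 1.0
V_point, spread = simulate_grid(I)

# Image blur and the Mahowald/Mead difference image:
# image = ...  (load a grayscale image here)
# V_blur, _ = simulate_grid(image)
# difference = image - V_blur / Rm   # center-surround output
```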

Filtering via convolution vs. network recurrence

The silicon retina shows how a simple resistor network can implement a lowpass filter. Essentially the cable is a kind of recurrent neural network that computes the convolution with the kernel $e^{-|r|}$, where $r$ is the distance from the center of the kernel. But what if you wanted to implement convolution with a different kind of kernel, such as the oriented kernels used in convnets? Typically these are computed in an FIR (finite impulse response) fashion, as a weighted sum: $y=W\,x$. But note that if we have a recurrent network such as the one above, then we can compute $y$ via $\tau \dot{y} + y = x + M\,y$, where the matrix $M$ represents a set of linkages (conductances) to neighboring units. This has the equilibrium solution $y=(I-M)^{-1}\,x$. So if we design $M$ appropriately we should be able to implement the desired $W$. And if we are lucky, maybe we can do it with just a few local linkages to neighboring nodes. However, one downside is that such a recurrent network will have a settling time dictated both by $\tau$ and by the structure of $M$. So we can also consider a hybrid system, $\tau \dot{y} + y = W\,x + M\,y$, which has the equilibrium solution $y=(I-M)^{-1}\,W\,x$. This provides us with a bit more flexibility. The goal, then, is to find a combination of $M$ and $W$ that lets us implement a desired kernel with the fewest linkages to neighbors (sparse $M$), the fewest inputs (sparse $W$), and the minimum settling time. See if you can find such combinations for the kinds of kernels typically used in convnets, such as oriented filters (in the input layer). This strategy can also be generalized to higher layers; the solution is less easy to visualize, but one could still learn $M$ and $W$. If implemented as a physical analog circuit rather than a digital simulation, it could be dramatically more efficient.
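A small sketch of the hybrid scheme on a 1D ring of units, assuming circulant (translation-invariant) connectivity so that the effective kernel is just the impulse response of $(I-M)^{-1}\,W$. The particular coupling values are illustrative and not fit to any convnet kernel.

```python
import numpy as np

N = 64  # units on a 1D ring

def circulant(first_row):
    """Build a circulant (translation-invariant) matrix from its first row."""
    return np.stack([np.roll(first_row, k) for k in range(N)])

# Sparse local recurrence M: symmetric nearest-neighbor conductances.
m = np.zeros(N); m[1] = m[-1] = 0.45
M = circulant(m)

# Sparse feedforward W: a tiny antisymmetric (derivative-like) input stencil.
w = np.zeros(N); w[1], w[-1] = 0.5, -0.5
W = circulant(w)

# Equilibrium of tau*dy/dt + y = W x + M y  =>  y = (I - M)^{-1} W x.
K = np.linalg.solve(np.eye(N) - M, W)

# Effective kernel = response to an impulse at the center unit.
x = np.zeros(N); x[N // 2] = 1.0
impulse_response = K @ x

# Compare impulse_response against a target kernel (e.g. an oriented/derivative
# filter) and adjust the few nonzero entries of m and w, by hand or by gradient
# descent; settling times follow from tau and the eigenvalues of (I - M).
```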

Image compression in the retina

As we discussed in class, the spatial differencing operation performed in the retina is a form of signal compression, since it removes correlations in the image data due to structure in the world (the whitening theory of Atick & Redlich). JPEG performs a similar operation, but instead by applying a block transform within each 8x8 pixel image patch (a DCT, which approximates a PCA rotation for natural images) and allocating bits according to the variance in each component. Both can be justified as a decorrelation strategy, and both rest on the same pairwise correlation function of natural images ($1/f^2$ power spectrum). Evaluate and compare these compression schemes in terms of efficiency (compression ratio), and determine the conditions under which one scheme is preferred over the other.
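One way to start is with synthetic $1/f^2$ images and a crude Gaussian rate estimate for each scheme, as in the sketch below. The bit-allocation formula and the fixed per-coefficient distortion are simplifications (real JPEG uses a DCT, quantization tables, and entropy coding), and matching the actual reconstruction error between the two schemes is part of the project.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256

# Synthesize an image with a 1/f amplitude spectrum (1/f^2 power spectrum).
fx = np.fft.fftfreq(N)[:, None]; fy = np.fft.fftfreq(N)[None, :]
f = np.sqrt(fx**2 + fy**2); f[0, 0] = 1.0
img = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((N, N))) / f))
img -= img.mean()

def bits(variances, D=1e-4):
    """Gaussian rate allocation: 0.5*log2(var/D) bits per coefficient (>= 0)."""
    return np.maximum(0.5 * np.log2(variances / D), 0).sum()

# 1) Retina-style whitening: flatten the spectrum, then code each pixel.
white = np.real(np.fft.ifft2(np.fft.fft2(img) * f))
bits_whitened = bits(np.full(img.size, white.var()))

# 2) JPEG-style block transform: PCA over 8x8 patches, bits per component variance.
patches = img.reshape(N // 8, 8, N // 8, 8).transpose(0, 2, 1, 3).reshape(-1, 64)
cov = np.cov(patches.T)
eigvals = np.linalg.eigvalsh(cov)
bits_pca = bits(np.maximum(eigvals, 1e-12)) * patches.shape[0]

print(f"whitening: ~{bits_whitened / img.size:.2f} bits/pixel")
print(f"block PCA: ~{bits_pca / img.size:.2f} bits/pixel")
```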

On- and Off-coding in retina

The Karklin & Simoncelli model demonstrates the conditions under which On- and Off-cells emerge as an optimal coding strategy for retinal images. Their method relies upon a fairly complicated optimization that involves estimating mutual information. See if you can derive a simpler method for obtaining the same results utilizing an auto-encoder model, where the objective is to minimize reconstruction error under a constraint on firing rate (assuming noise in the channel). Going even further, how would you extend this to color, or time?
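Below is a minimal sketch of such an autoencoder, assuming rectified (nonnegative) firing rates, additive channel noise, and an L1 penalty standing in for the firing-rate constraint. The correlated Gaussian inputs are placeholders for whitened image data, and the prediction to check is whether the learned weights split into matched On- and Off-like pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_units, lam, noise, lr = 16, 32, 0.1, 0.1, 0.01

# Placeholder correlated inputs (swap in whitened natural-image patches).
cov = np.array([[0.9 ** abs(i - j) for j in range(n_in)] for i in range(n_in)])
L = np.linalg.cholesky(cov)

W = 0.1 * rng.standard_normal((n_units, n_in))   # encoder: rates = relu(W x)
V = 0.1 * rng.standard_normal((n_in, n_units))   # decoder

for step in range(20000):
    x = L @ rng.standard_normal(n_in)
    pre = W @ x
    r = np.maximum(pre, 0)                        # nonnegative firing rates
    r_noisy = r + noise * rng.standard_normal(n_units)
    x_hat = V @ r_noisy
    err = x_hat - x

    # Gradients of ||x_hat - x||^2 + lam * sum(r)
    dV = 2 * np.outer(err, r_noisy)
    dr = 2 * V.T @ err + lam
    dW = np.outer(dr * (pre > 0), x)

    V -= lr * dV
    W -= lr * dW

# After training, inspect the rows of W (and columns of V): do they form
# pairs with mirror-image weights of opposite sign, i.e. On- and Off-like units?
```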