All the courses
Course’s information
HxWxC → too large → too many params (e.g. a 1000x1000x3 image fully connected to just 1000 hidden units is already ~3 billion weights) → need conv
edge detection by a filter (3x3 for example) (or kernel) → convolve it with the original image using the convolution operator (*)
Sobel filter, Scharr filter (not only 0, 1, -1)
Can use backprop to learn the values of the filter.
Not only vertical/horizontal edges: by learning the filter, the network can detect edges at any angle (degree) of the image.
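A minimal NumPy sketch (my own illustration, not code from the lecture) of convolving a 6x6 image with the simple vertical edge filter:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (really cross-correlation, as used in deep learning)."""
    h, w = image.shape
    f = kernel.shape[0]
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

# 6x6 image: bright left half, dark right half -> one vertical edge in the middle
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# hand-designed vertical edge detector (the simple 1/0/-1 one)
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)

print(conv2d(image, vertical_edge))  # 4x4 output, large values (30) where the edge is
```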
Padding
→ we can pad the image (extend it with a border all around, usually zeros): 6x6 → (pad 1) → 8x8
Strided convolutions
Output size: $\lfloor \frac{n + 2p - f}{s} + 1 \rfloor$ → take the floor() because the filter must fit entirely inside the (padded) image.
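A tiny helper for that formula (the function name is my own), checked against the lecture examples:

```python
from math import floor

def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 -- one side of the output of an n x n input."""
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))            # 4 -> "valid" conv: 6x6 * 3x3 -> 4x4
print(conv_output_size(6, 3, p=1))       # 6 -> "same" conv: pad 1 keeps 6x6
print(conv_output_size(7, 3, p=0, s=2))  # 3 -> strided conv: 7x7, stride 2 -> 3x3
```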
Conv over volumes (not just on 2d images) → ex on RGB images (3 channels)
6x6x3 * 3x3x3 (the filter also has 3 channels) → 4x4x1
We element-wise multiply each channel and then sum over all 3 channels → this gives only 1 number per position in the resulting matrix → that's why the output is 4x4x1
if we want to detect a vertical edge only in the red channel → the 1st channel (of the 3x3x3 filter) can be the edge detector, and the other 2 channels are all zeros.
multiple filters at the same time? → 1st filter (vertical), 2nd (horizontal) → 4x4x2 (the 2 here is the 2 filters)
we can use 100+ filters → $n_c'=100+$ (see the sketch below)
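A rough NumPy sketch of convolving a 6x6x3 volume with several 3x3x3 filters (slow loop version, my own code, just to make the shapes concrete):

```python
import numpy as np

def conv_volume(volume, filters):
    """volume: (H, W, C); filters: (num_filters, f, f, C) -> (H-f+1, W-f+1, num_filters)."""
    H, W, C = volume.shape
    nf, f, _, _ = filters.shape
    out = np.zeros((H - f + 1, W - f + 1, nf))
    for k in range(nf):                      # one output channel per filter
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # element-wise product over all 3 channels, then sum to ONE number
                out[i, j, k] = np.sum(volume[i:i+f, j:j+f, :] * filters[k])
    return out

rgb = np.random.rand(6, 6, 3)
two_filters = np.random.rand(2, 3, 3, 3)     # e.g. one "vertical", one "horizontal"
print(conv_volume(rgb, two_filters).shape)   # (4, 4, 2)
```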
1 layer of ConvNet
if 10 filters (3x3x3) → how many parameters?
→ no matter the size of the input image, we only have 280 params with 10 filters (3x3x3): each filter has 3·3·3 = 27 weights + 1 bias = 28 params, and 10 × 28 = 280
Notations:
The number of filters used will be the number of channels in the output
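The dimension bookkeeping for layer $l$ (standard course notation, same floor formula as above):

$$n_H^{[l]} = \left\lfloor \frac{n_H^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} + 1 \right\rfloor, \qquad n_W^{[l]} \text{ likewise}$$

Each filter is $f^{[l]} \times f^{[l]} \times n_c^{[l-1]}$; the output (activations) is $n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$, where $n_c^{[l]}$ = number of filters in layer $l$; weights have shape $f^{[l]} \times f^{[l]} \times n_c^{[l-1]} \times n_c^{[l]}$ and there are $n_c^{[l]}$ biases.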
Simple example of a ConvNet
Types of layers in a ConvNet: convolution (CONV), pooling (POOL), fully connected (FC) → see the sketch below
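A minimal Keras sketch of such a small ConvNet; the layout and sizes are my own (the lecture's exact example may differ), it only shows the three layer types in order:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, (5, 5), activation='relu'),   # CONV: 32x32x3 -> 28x28x8
    layers.MaxPooling2D((2, 2)),                   # POOL: 28x28x8 -> 14x14x8
    layers.Conv2D(16, (5, 5), activation='relu'),  # CONV: -> 10x10x16
    layers.MaxPooling2D((2, 2)),                   # POOL: -> 5x5x16
    layers.Flatten(),                              # 5*5*16 = 400 units
    layers.Dense(120, activation='relu'),          # FC
    layers.Dense(10, activation='softmax'),        # FC output
])
model.summary()
```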
Pooling layer
Purpose?
Max pooling → take the max of each region
Average pooling → like max pooling, but we take the average of each region
→ max pooling is used much more than average pooling
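A quick NumPy sketch of 2x2 pooling with stride 2 (my own helper); switching between max and average is a one-line change:

```python
import numpy as np

def pool2d(x, f=2, s=2, mode='max'):
    """x: (H, W). Pooling has NO parameters to learn -- just fixed f and s."""
    H, W = x.shape
    out_h, out_w = (H - f) // s + 1, (W - f) // s + 1
    out = np.zeros((out_h, out_w))
    reduce = np.max if mode == 'max' else np.mean
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = reduce(x[i*s:i*s+f, j*s:j*s+f])
    return out

a = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)
print(pool2d(a, mode='max'))  # [[9. 2.] [6. 3.]]
print(pool2d(a, mode='avg'))  # [[3.75 1.25] [3.75 2.  ]]
```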
LeNet-5 (NN example) ← inspired by it, introduced by Yann LeCun
Why convolution?
Case studies of effective ConvNets
Classic networks
LeNet-5 → focus on sections II and III of the paper, they're interesting! → uses sigmoid/tanh
AlexNet → similar to LeNet but much bigger! → uses ReLU
Trained on multiple GPUs
Local Response Normalization (not really used in practice)
The paper is relatively easy to read (and understand)
VGG-16 → much deeper but uses much simpler filters (only 3x3 convs) → really simplifies the NN structure
16 → 16 layers that have weights → really large (~138M params)
H, W are halved at each pooling step
#filters doubles at every step
There is also VGG-19 (bigger version)
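A rough Keras sketch of the first two VGG-16 stages (my own rendering of the pattern, not code from the paper): 3x3 "same" convs, then a 2x2 max pool that halves H, W while the number of filters doubles:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # stage 1: 64 filters
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),   # 224 -> 112
    # stage 2: filters double to 128
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2), strides=(2, 2)),   # 112 -> 56
    # ... VGG-16 continues with 256 and 512 filters, then the FC layers
])
```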
ResNets (Residual Networks) → allowed researchers to build deep neural nets with a much larger number of layers
VGG suggested that the deeper the NN, the better, but is that really true? → No! (as the ResNet paper shows) with plain very deep networks, both training error and test error go up → it's not overfitting, it's the "vanishing gradient problem"
vanishing gradient problem: when training a very deep NN, during backprop the gradients get distorted and shrink toward 0 → the weights never get updated to their new, correct values → no learning happens → bad!
ResNets were created to solve this vanishing gradient problem!
If a network is good enough, at the very least it should be able to learn the identity function (f(x)=x).
If an NN can fit a function g → then that network + extra (identity) layers should also fit g ("fit" = same accuracy) → because the extra layers simply have to learn the identity function.
However, in practice a plain deep network fails to learn even the identity, why?
Instead of X → DNN → (id) → X, why don't we wire X → X directly first (a skip connection) and then add X → DNN → X on top of it (the part we actually want to learn)?
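That idea is the residual block: the skip connection carries $a^{[l]}$ straight through and adds it back before the activation, $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$, so the stacked layers only have to learn the residual on top of the identity. A Keras functional sketch (my own simplification of the block):

```python
from tensorflow.keras import layers

def residual_block(a_l, n_filters):
    """a[l+2] = g(z[l+2] + a[l]); assumes a_l already has n_filters channels so shapes match."""
    shortcut = a_l                                              # X -> X (identity path)
    z = layers.Conv2D(n_filters, (3, 3), padding='same')(a_l)   # main path the block learns
    z = layers.Activation('relu')(z)
    z = layers.Conv2D(n_filters, (3, 3), padding='same')(z)
    z = layers.Add()([z, shortcut])                             # add a[l] BEFORE the activation
    return layers.Activation('relu')(z)                         # = g(z[l+2] + a[l])
```

If the conv weights go to 0, the block outputs g(a[l]) ≈ the identity, which is why adding these blocks doesn't hurt and makes very deep nets trainable.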