No matter how large or small the input, the output always lies between 0 and 1 → ideal for representing probabilities
$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
Implement:
import numpy as np

def sigmoid(x):
    """
    Vectorized sigmoid function.
    """
    return 1 / (1 + np.exp(-np.asarray(x, dtype=float)))
Derivative:
$$ \frac{d\sigma}{dx} = \sigma(x) \cdot (1 - \sigma(x)) $$
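In code, the derivative can reuse the sigmoid value itself; a minimal sketch building on `sigmoid` above (the name `sigmoid_grad` is chosen here for illustration):

def sigmoid_grad(x):
    """Derivative of the sigmoid, computed from its output s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1 - s)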

Model:
$$ p = \sigma(Xw + b) = \frac{1}{1 + e^{-(Xw + b)}} $$
Loss Function (Binary Cross-Entropy):
$$ \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log p_i + (1 - y_i) \log(1 - p_i)] $$
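For reference, the loss can be computed directly in NumPy; a minimal sketch (the clipping constant `eps` is an assumption added to avoid log(0)):

def bce_loss(y, p, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))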
Gradient (the factor $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ cancels the denominators from the log terms of the loss, leaving a simple residual form):
$$ \frac{\partial \mathcal{L}}{\partial w} = \nabla_w = \frac{1}{N} X^\top (p - y) \\ \frac{\partial \mathcal{L}}{\partial b} = \nabla_b = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i) $$
Implement:
import numpy as np

def _sigmoid(z):
    """Numerically stable sigmoid implementation."""
    return np.where(z >= 0,
                    1 / (1 + np.exp(-z)),
                    np.exp(z) / (1 + np.exp(z)))

def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """
    Train logistic regression via gradient descent.
    Return (w, b).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(steps):
        z = X @ w + b                    # linear scores
        p = _sigmoid(z)                  # predicted probabilities
        error = p - y                    # residuals
        dw = (X.T @ error) / n_samples   # gradient w.r.t. w
        db = np.sum(error) / n_samples   # gradient w.r.t. b
        w -= lr * dw
        b -= lr * db
    return (w, b)
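A quick smoke test on synthetic data (a sketch; the toy data, seed, and 0.5 decision threshold are assumptions chosen for illustration):

# Toy data: points above the line x1 + x2 = 1 are labeled 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(float)

w, b = train_logistic_regression(X, y, lr=0.1, steps=2000)
preds = (_sigmoid(X @ w + b) >= 0.5).astype(float)
print("train accuracy:", (preds == y).mean())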
Topic: NLP, Data Processing
Sequence padding transforms variable-length sequences into fixed-length ones by adding special padding tokens. In NLP, sentences have different lengths, but models typically expect fixed-size batch tensors, so shorter sequences are padded to a common length.
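A minimal NumPy sketch of right-padding (the function name `pad_sequences`, the pad value 0, and the truncation rule are assumptions chosen for illustration):

import numpy as np

def pad_sequences(seqs, max_len=None, pad_value=0):
    """Right-pad (and truncate) variable-length token-id sequences."""
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_value, dtype=int)
    for i, s in enumerate(seqs):
        trunc = s[:max_len]          # truncate sequences that are too long
        out[i, :len(trunc)] = trunc
    return out

# pad_sequences([[5, 2, 9], [7, 1], [3, 8, 6, 4]]) → shape (3, 4),
# short rows padded on the right with 0.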