No matter how large or small the input, the output always lies between 0 and 1 → ideal for representing probabilities
$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
Implement:
import numpy as np

def sigmoid(x):
    """
    Vectorized sigmoid function.
    """
    return 1 / (1 + np.exp(-np.asarray(x, dtype=float)))
Derivative:
$$ \frac{d\sigma}{dx} = \sigma(x) \cdot (1 - \sigma(x)) $$
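In code, the derivative can reuse the sigmoid value itself; a minimal sketch building on `sigmoid` above (the name `sigmoid_grad` is chosen here for illustration):

def sigmoid_grad(x):
    """Derivative of the sigmoid, computed from its output s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1 - s)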

Model:
$$ p = \sigma(Xw + b) = \frac{1}{1 + e^{-(Xw + b)}} $$
Loss Function (Binary Cross-Entropy):
$$ \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log p_i + (1 - y_i) \log(1 - p_i)] $$
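For reference, the loss can be computed directly in NumPy; a minimal sketch (the clipping constant `eps` is an assumption added to avoid log(0)):

def bce_loss(y, p, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))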
Gradient (the factor $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ cancels the denominators from the log terms of the loss, leaving a simple residual form):
$$ \frac{\partial \mathcal{L}}{\partial w} = \nabla_w = \frac{1}{N} X^\top (p - y) \\ \frac{\partial \mathcal{L}}{\partial b} = \nabla_b = \frac{1}{N} \sum_{i=1}^{N} (p_i - y_i) $$
Implement:
import numpy as np

def _sigmoid(z):
    """Numerically stable sigmoid implementation."""
    return np.where(z >= 0,
                    1 / (1 + np.exp(-z)),
                    np.exp(z) / (1 + np.exp(z)))

def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """
    Train logistic regression via gradient descent.
    Return (w, b).
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(steps):
        z = X @ w + b                    # linear scores
        p = _sigmoid(z)                  # predicted probabilities
        error = p - y                    # residuals
        dw = (X.T @ error) / n_samples   # gradient w.r.t. w
        db = np.sum(error) / n_samples   # gradient w.r.t. b
        w -= lr * dw
        b -= lr * db
    return (w, b)
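A quick smoke test on synthetic data (a sketch; the toy data, seed, and 0.5 decision threshold are assumptions chosen for illustration):

# Toy data: points above the line x1 + x2 = 1 are labeled 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(float)

w, b = train_logistic_regression(X, y, lr=0.1, steps=2000)
preds = (_sigmoid(X @ w + b) >= 0.5).astype(float)
print("train accuracy:", (preds == y).mean())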
Topic: NLP, Data Processing
Sequence padding transforms variable-length sequences into fixed-length ones by adding special padding tokens. In NLP, sentences have different lengths, but models typically expect fixed-size batch tensors, so shorter sequences are padded to a common length.
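A minimal NumPy sketch of right-padding (the function name `pad_sequences`, the pad value 0, and the truncation rule are assumptions chosen for illustration):

import numpy as np

def pad_sequences(seqs, max_len=None, pad_value=0):
    """Right-pad (and truncate) variable-length token-id sequences."""
    if max_len is None:
        max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_value, dtype=int)
    for i, s in enumerate(seqs):
        trunc = s[:max_len]          # truncate sequences that are too long
        out[i, :len(trunc)] = trunc
    return out

# pad_sequences([[5, 2, 9], [7, 1], [3, 8, 6, 4]]) → shape (3, 4),
# short rows padded on the right with 0.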