What is the difference between Ridge Regression, the LASSO, and ElasticNet?

tl;dr: “Ridge” is a fancy name for L2-regularization, “LASSO” means L1-regularization, and “ElasticNet” is a weighted mix of the two, controlled by a ratio. If you're still confused, keep reading…

Logistic Regression

h(x|theta) = sigmoid(x dot theta + b)

loss(theta) = −∑ [ y*log(h(x|theta)) + (1−y)*log(1−h(x|theta)) ]

Intuitively, the loss just measures how far each prediction lands from its label, e.g. the squared error:

loss(theta) = ∑ (y − h(x|theta))²
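A minimal NumPy sketch of those formulas (the function names sigmoid, hypothesis, and log_loss are my own, not from any library):

import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(X, theta, b):
    # h(x|theta) = sigmoid(x dot theta + b), vectorized over the rows of X
    return sigmoid(X @ theta + b)

def log_loss(theta, b, X, y):
    # cross-entropy: -sum of y*log(h) + (1-y)*log(1-h) over all examples
    h = hypothesis(X, theta, b)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# toy check: with all-zero weights every prediction is 0.5
X = np.array([[0.5, 1.2], [1.0, -0.3]])
y = np.array([1.0, 0.0])
print(log_loss(np.zeros(2), 0.0, X, y))  # ≈ 1.386, i.e. 2 * log(2)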

Regularization

basic_loss(theta) = −∑ [ y*log(h(x|theta)) + (1−y)*log(1−h(x|theta)) ]

loss(theta) = basic_loss(theta) + k * magnitude(theta)
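Continuing the NumPy sketch from the previous section (log_loss as the basic loss), regularization is literally one extra term; magnitude stands for whichever norm is chosen below:

def regularized_loss(theta, b, X, y, k, magnitude):
    # loss(theta) = basic_loss(theta) + k * magnitude(theta)
    # k sets how strongly large weights are punished; the bias b is
    # conventionally left out of the penalty
    return log_loss(theta, b, X, y) + k * magnitude(theta)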

Norms

loss(theta) = basic_loss(theta) + k * L1(theta)

loss(theta) = basic_loss(theta) + k * L2(theta)

loss(theta) = basic_loss(theta) + k * (j*L1(theta) + (1−j)*L2(theta))
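Those three penalties are easy to write down directly; a hypothetical NumPy version (note that in practice, and in Ridge, the L2 penalty is usually the squared L2 norm):

import numpy as np

def l1(theta):
    # L1 norm: sum of absolute values; tends to push weights to exactly zero
    return np.sum(np.abs(theta))

def l2(theta):
    # squared L2 norm: sum of squares; shrinks weights smoothly toward zero
    return np.sum(theta ** 2)

def elastic_net(theta, j):
    # j * L1 + (1 - j) * L2, with the mixing ratio j in [0, 1]
    return j * l1(theta) + (1 - j) * l2(theta)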

On the Naming of Algorithms

There should be one - and preferably only one - obvious way to do it
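Despite that advice (it is from the Zen of Python), the same idea ships under several names. As an illustration, scikit-learn spells the combinations roughly like this (assuming scikit-learn is installed; alpha and C are the usual strength knobs, with C being the inverse of the regularization strength):

from sklearn.linear_model import Ridge, Lasso, ElasticNet, LogisticRegression

# Linear regression + L2 penalty is sold under the name "Ridge"
ridge = Ridge(alpha=1.0)
# Linear regression + L1 penalty is sold under the name "Lasso"
lasso = Lasso(alpha=1.0)
# Linear regression + a mix of both is sold under the name "ElasticNet"
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)

# Logistic regression drops the fancy names and just takes a penalty argument
logreg_l2 = LogisticRegression(penalty="l2", C=1.0)
logreg_l1 = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
logreg_en = LogisticRegression(penalty="elasticnet", C=1.0, l1_ratio=0.5, solver="saga")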

Comparing regularization techniques — Intuition

Comparing regularization techniques — In Practice

[Figure: L2-regularized Logistic Regression]
[Figure: L1-regularized Logistic Regression]
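A small, hypothetical experiment in the spirit of those two figures: fit both penalties on the same synthetic data and count zeroed-out coefficients. The L1 model typically switches some features off entirely, while the L2 model only shrinks them:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# synthetic data where only a few of the 20 features are truly informative
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=4, random_state=0)

# smaller C means stronger regularization in scikit-learn
l2_model = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print("L2 coefficients exactly zero:", int(np.sum(l2_model.coef_ == 0)))  # usually 0
print("L1 coefficients exactly zero:", int(np.sum(l1_model.coef_ == 0)))  # usually several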

Conclusion
