Naive Bayes

$$P(Y|X_{1}, ..., X_{n}) = \frac{P(X_{1}, ..., X_{n}|Y)P(Y)}{P(X_{1}, ..., X_{n})}$$

Likelihood: $P(X_{1}, ..., X_{n}|Y)$; Prior: $P(Y)$; Evidence: $P(X_{1}, ..., X_{n})$, a normalizing constant that does not depend on $Y$
Naïve Bayes assumption: all features are conditionally independent given the class label $Y$, so the likelihood factorizes as $$P(X_{1}, ..., X_{n}|Y) = \prod_{i=1}^{n}P(X_{i}|Y)$$
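To make the formula concrete, here is a posterior computed by hand for a hypothetical two-feature spam example. All probabilities are made-up illustration values, not estimates from any dataset. The likelihood is the product of per-feature terms (the naive assumption), and dividing by the evidence normalizes the scores so they sum to 1.

```python
from fractions import Fraction as F
from math import prod

# Hypothetical toy numbers, chosen only for illustration
prior = {"spam": F(2, 5), "ham": F(3, 5)}
# P(feature present | class) for two binary features
p_present = {
    "spam": {"free": F(7, 10), "meeting": F(1, 5)},
    "ham": {"free": F(1, 10), "meeting": F(1, 2)},
}
x = {"free": 1, "meeting": 0}  # the email contains "free" but not "meeting"


def likelihood(c):
    # Naive assumption: P(x_1, x_2 | c) is the product of per-feature terms
    return prod(p_present[c][f] if v else 1 - p_present[c][f] for f, v in x.items())


unnormalized = {c: likelihood(c) * prior[c] for c in prior}  # numerator of Bayes' rule
evidence = sum(unnormalized.values())                        # P(x_1, x_2)
posterior = {c: u / evidence for c, u in unnormalized.items()}
print(posterior)  # {'spam': Fraction(112, 127), 'ham': Fraction(15, 127)}
```

Since the evidence is shared by every class, a classifier only needs the argmax of the numerators; the division matters only when actual probabilities are wanted.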
Pros
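- Reduces the number of likelihood parameters per class from $O(a^n)$ to $O(na)$ for $n$ features that each take $a$ values, i.e. linear rather than exponential in $n$
- Requires only a small amount of training data to estimate the necessary parameters
- Alleviates problems stemming from the curse of dimensionality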
- In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering
Cons
- Although it is a decent classifier, naive Bayes is known to be a bad estimator, so its probability outputs (e.g. from predict_proba) are not to be taken too seriously
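The first bullet comes down to counting free parameters. The sketch below assumes $n$ discrete features that each take $a$ values and $k$ classes; the function names are hypothetical helpers for illustration only.

```python
def full_joint_params(n: int, a: int, k: int) -> int:
    """Free parameters for P(X_1..X_n | Y) stored as one full table per class."""
    # a**n joint outcomes per class; one is fixed by the sum-to-one constraint
    return k * (a**n - 1)


def naive_bayes_params(n: int, a: int, k: int) -> int:
    """Free parameters when P(X_1..X_n | Y) factorizes into n per-feature tables."""
    # each of the n features has a outcomes per class, minus one sum-to-one constraint
    return k * n * (a - 1)


# e.g. 30 binary features and 2 classes
print(full_joint_params(30, 2, 2))   # 2147483646
print(naive_bayes_params(30, 2, 2))  # 60
```

The gap is also why little training data suffices: each per-feature table can be estimated from the same samples independently.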

Gaussian Naive Bayes

Used when all features are continuous; within each class, every feature is modeled by its own normal distribution

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris: 150 samples, 4 continuous features, 3 classes
X, y = load_iris(return_X_y=True)
# Hold out half of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

y_pred = gnb.predict(X_test)

from sklearn.metrics import confusion_matrix
# Rows: true class, columns: predicted class
confusion_matrix(y_test, y_pred)

array([[21,  0,  0],
       [ 0, 30,  0],
       [ 0,  4, 20]])
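The diagonal of the confusion matrix counts correct predictions, so accuracy follows directly from the output above. The only errors are four class-2 (virginica) flowers predicted as class 1 (versicolor).

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[21, 0, 0],
               [0, 30, 0],
               [0, 4, 20]])
# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 71 correct out of 75, about 0.947
```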

Multinomial Naive Bayes (MNB)

Used when features are discrete counts, e.g. word counts in text classification. Movie ratings from 1 to 5 are another example, since each rating value is represented by how often it occurs
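Under the hood, MNB estimates a smoothed frequency for each feature in each class, $\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$, where $N_{yi}$ is the total count of feature $i$ in class $y$, $N_y = \sum_i N_{yi}$, and $\alpha$ is the smoothing parameter. The sketch below recomputes this on hypothetical random counts (X_toy, y_toy are illustration data) with the default alpha=1.0 and checks it against MultinomialNB.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import MultinomialNB

rng_toy = np.random.RandomState(0)
X_toy = rng_toy.randint(5, size=(8, 20))  # hypothetical counts: 8 samples, 20 features
y_toy = np.array([0, 1, 0, 1, 0, 1, 1, 1])
classes = np.unique(y_toy)
alpha = 1.0  # Laplace smoothing, MultinomialNB's default

# N_yi: total count of feature i across the samples of class y
feature_count = np.array([X_toy[y_toy == c].sum(axis=0) for c in classes])
# theta_yi = (N_yi + alpha) / (N_y + alpha * n)
smoothed = feature_count + alpha
log_theta = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))
log_prior = np.log(np.array([np.mean(y_toy == c) for c in classes]))

# log P(y) + sum_i x_i * log theta_yi; the multinomial coefficient is the same
# for every class, so it cancels when the scores are normalized
joint = X_toy @ log_theta.T + log_prior
log_proba = joint - logsumexp(joint, axis=1, keepdims=True)

model = MultinomialNB(alpha=alpha).fit(X_toy, y_toy)
print(np.allclose(log_theta, model.feature_log_prob_))
print(np.allclose(log_proba, model.predict_log_proba(X_toy)))
```

Smoothing keeps a feature that never appears in a class from forcing that class's probability to zero.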

import numpy as np
rng = np.random.RandomState(1)
# create 6 samples with 100 features each
# each value is a count from 0 to 99
X = rng.randint(100, size=(6, 100)) # m = 6 samples, n = 100 features
y = np.array([1, 0, 1, 0, 1, 0]) # m = 6 labels

from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X, y)

model.predict(X[2:3])

array([1])

Complement Naive Bayes (CNB)

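CNB is an adaptation of the standard multinomial naive Bayes (MNB) algorithm that is particularly suited for imbalanced data sets. Instead of estimating each class's feature distribution from that class's own samples, it uses statistics from the complement of each class (all other classes) to compute the model's weights
CNB regularly outperforms MNB on text classification tasks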
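The complement idea can be sketched in a few lines: for each class $c$, count how often each feature appears in every other class, smooth and normalize those counts, and use the negated log as the weight, so that features typical of other classes count against $c$. This assumes ComplementNB's defaults (alpha=1.0, norm=False); the data below is hypothetical and deliberately imbalanced.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

rng_toy = np.random.RandomState(0)
X_toy = rng_toy.randint(5, size=(8, 20))    # hypothetical counts: 8 samples, 20 features
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # imbalanced: 6 vs 2
classes = np.unique(y_toy)
alpha = 1.0

# Counts of each feature within each class, and over all classes
feature_count = np.array([X_toy[y_toy == c].sum(axis=0) for c in classes])
feature_all = feature_count.sum(axis=0)
# Complement counts: how often feature i occurs in every class except c, plus smoothing
comp_count = feature_all + alpha - feature_count
log_comp = np.log(comp_count / comp_count.sum(axis=1, keepdims=True))
# A high complement probability is evidence against class c, hence the minus sign
weights = -log_comp

pred = classes[np.argmax(X_toy @ weights.T, axis=1)]

model = ComplementNB(alpha=alpha).fit(X_toy, y_toy)
print(np.allclose(weights, model.feature_log_prob_))
print((pred == model.predict(X_toy)).all())
```

Because every class's complement pools samples from the remaining classes, the estimates for a small class are not based on its few samples alone, which is where the robustness to imbalance comes from.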

from sklearn.naive_bayes import ComplementNB

model = ComplementNB()
model.fit(X, y)

model.predict(X[2:3])

array([1])

Bernoulli Naive Bayes

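There may be multiple features, but each one is assumed to be a binary-valued (Bernoulli, boolean) variable
If handed any other kind of data, a BernoulliNB instance may binarize its input, depending on its binarize parameter (a threshold of 0.0 by default)
Compared with MultinomialNB, it might perform better on some datasets, especially those with shorter documents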
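Unlike MNB, the Bernoulli likelihood $P(x_i|y) = p_{yi}x_i + (1 - p_{yi})(1 - x_i)$ explicitly penalizes the absence of a feature that is common in a class. The sketch below binarizes hypothetical count data the way the default binarize=0.0 does (values above 0 become 1), recomputes the smoothed estimates $p_{yi} = \frac{N_{yi} + \alpha}{N_y + 2\alpha}$ with alpha=1.0, and compares the resulting log-probabilities with BernoulliNB.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import BernoulliNB

rng_toy = np.random.RandomState(0)
X_counts = rng_toy.randint(3, size=(8, 20))  # hypothetical counts 0..2
y_toy = np.array([0, 1, 0, 1, 0, 1, 0, 1])
classes = np.unique(y_toy)
alpha = 1.0

# binarize=0.0 (the default) maps every value > 0 to 1
X_bin = (X_counts > 0).astype(float)

class_count = np.array([np.sum(y_toy == c) for c in classes])
feature_count = np.array([X_bin[y_toy == c].sum(axis=0) for c in classes])
# p_yi = P(x_i = 1 | y), smoothed over the two possible outcomes
p = (feature_count + alpha) / (class_count[:, None] + 2 * alpha)
log_prior = np.log(class_count / class_count.sum())

# Present features contribute log p, absent features contribute log(1 - p)
joint = X_bin @ np.log(p).T + (1 - X_bin) @ np.log(1 - p).T + log_prior
log_proba = joint - logsumexp(joint, axis=1, keepdims=True)

model = BernoulliNB(alpha=alpha).fit(X_counts, y_toy)  # binarizes internally
print(np.allclose(log_proba, model.predict_log_proba(X_counts)))
```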

# create 6 samples with 100 features each
# BernoulliNB assumes every feature is binary, i.e. takes only two values:
# 0 can represent "word does not occur in the document" and 1 "word occurs in the document"
X = rng.randint(2, size=(6, 100)) # m = 6, n = 100
y = np.array([1, 0, 1, 0, 1, 0]) # m = 6

from sklearn.naive_bayes import BernoulliNB
model = BernoulliNB()
model.fit(X, y)

model.predict(X[2:3])

array([1])

Categorical Naive Bayes

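Assumes that each feature has its own categorical distribution. The input must be encoded (for instance with OrdinalEncoder) so that the categories of feature i are represented by the integers 0, ..., n_i - 1, where n_i is the number of categories of that feature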
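Each feature gets its own table of smoothed category frequencies per class, $P(x_i = t|y = c) = \frac{N_{tic} + \alpha}{N_c + \alpha n_i}$, where $N_{tic}$ counts how often feature $i$ takes category $t$ in class $c$. The sketch below recomputes this on hypothetical data that is already integer-encoded, assuming the defaults alpha=1.0 and that the number of categories is inferred as max + 1 per feature, and compares the result with CategoricalNB.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import CategoricalNB

rng_toy = np.random.RandomState(0)
X_toy = rng_toy.randint(3, size=(10, 4))  # hypothetical: 4 features, categories 0..2
y_toy = np.array([0, 1, 0, 1, 0, 1, 0, 1, 1, 1])
classes = np.unique(y_toy)
alpha = 1.0
n_categories = X_toy.max(axis=0) + 1  # categories per feature, inferred from the data

class_count = np.array([np.sum(y_toy == c) for c in classes])
# Start every sample's score at the log prior of each class
joint = np.tile(np.log(class_count / class_count.sum()), (len(X_toy), 1))
for i, n_cat in enumerate(n_categories):
    # N_tic: how often feature i takes category t in class c, shape (n_classes, n_cat)
    counts = np.array([np.bincount(X_toy[y_toy == c, i], minlength=n_cat) for c in classes])
    log_p = np.log((counts + alpha) / (class_count[:, None] + alpha * n_cat))
    # Look up each sample's category in feature i's table
    joint += log_p[:, X_toy[:, i]].T
log_proba = joint - logsumexp(joint, axis=1, keepdims=True)

model = CategoricalNB(alpha=alpha).fit(X_toy, y_toy)
print(np.allclose(log_proba, model.predict_log_proba(X_toy)))
```

Since each feature has its own table, the features may have different numbers of categories.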

X = rng.randint(2, size=(6, 100)) # m = 6, n = 100, each feature has two categories
y = np.array([1, 0, 1, 0, 1, 0]) # m = 6

from sklearn.naive_bayes import CategoricalNB
model = CategoricalNB()
model.fit(X, y)

model.predict(X[2:3])

array([1])