Naive Bayes

$$P(Y|X_{1}, ..., X_{n}) = \frac{P(X_{1}, ..., X_{n}|Y)P(Y)}{P(X_{1}, ..., X_{n})}$$
  • Posterior: $P(Y|X_{1}, ..., X_{n})$; likelihood: $P(X_{1}, ..., X_{n}|Y)$; prior: $P(Y)$; evidence (normalizer): $P(X_{1}, ..., X_{n})$
  • Naïve Bayes assumption: assume that all features are conditionally independent given the class label $Y$ (this leads to the decision rule shown after this list): $$P(X_{1}, ..., X_{n}|Y) = \prod_{i=1}^{n}P(X_{i}|Y)$$
  • Pros
    • Reduces complexity from $O(a^n)$ to $O(n)$, where $n$ is the number of features and $a$ is the number of values each feature can take
    • Requires only a small amount of training data to estimate the necessary parameters
    • Alleviates problems stemming from the curse of dimensionality
    • In spite of their apparently over-simplified assumptions, naive Bayes classifiers work quite well in many real-world situations, famously document classification and spam filtering
  • Cons
    • Known to be a poor probability estimator, so the predicted class probabilities should not be taken too seriously
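Since the evidence $P(X_{1}, ..., X_{n})$ is the same for every class, the resulting decision rule simply picks the class that maximizes the numerator, usually in log space for numerical stability:

$$\hat{y} = \arg\max_{y} P(y)\prod_{i=1}^{n}P(X_{i}|y) = \arg\max_{y}\left(\log P(y) + \sum_{i=1}^{n}\log P(X_{i}|y)\right)$$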

Gaussian Naive Bayes

  • Used when all features are continuous; the likelihood of each feature is assumed to be Gaussian within each class
In [3]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
# hold out half of the iris data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
In [5]:
gnb = GaussianNB()
gnb.fit(X_train, y_train)

y_pred = gnb.predict(X_test)
In [7]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)
Out[7]:
array([[21,  0,  0],
       [ 0, 30,  0],
       [ 0,  4, 20]])
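As a quick sanity check on the confusion matrix above, the accuracy can be computed directly; the fitted per-class feature means and variances are also exposed on the estimator (the attribute names below assume a recent scikit-learn release).
In [ ]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))  # fraction of correctly classified test samples
print(gnb.theta_.shape)                # per-class feature means, shape (n_classes, n_features)
print(gnb.var_.shape)                  # per-class feature variances, shape (n_classes, n_features)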

Multinomial Naive Bayes (MNB)

  • Used when the features are discrete counts, such as word counts in a document or movie ratings from 1 to 5, where each feature value represents a frequency
In [55]:
import numpy as np
rng = np.random.RandomState(1)
# create 6 samples, each with 100 features
# each element is a count drawn uniformly from 0 to 99
X = rng.randint(100, size=(6, 100)) # m = 6, n = 100
y = np.array([1, 0, 1, 0, 1, 0]) # m = 6
In [56]:
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X, y)

model.predict(X[2:3])
Out[56]:
array([1])
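Since word counts are the classic use case for MNB, a minimal text sketch might look like the following (the toy documents and spam/ham labels are made up purely for illustration):
In [ ]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# tiny made-up corpus: 1 = spam, 0 = ham
docs = ["free prize money", "meeting at noon", "win free money now", "project status meeting"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
counts = vec.fit_transform(docs)   # document-term matrix of word counts

text_model = MultinomialNB()
text_model.fit(counts, labels)
text_model.predict(vec.transform(["free money"]))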

Complement Naive Bayes (CNB)

  • CNB is an adaptation of the standard multinomial naive Bayes (MNB) algorithm that is particularly suited for imbalanced data sets
  • CNB regularly outperforms MNB on text classification tasks
In [57]:
from sklearn.naive_bayes import ComplementNB

model = ComplementNB()
model.fit(X, y)

model.predict(X[2:3])
Out[57]:
array([1])
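A minimal sketch of the imbalance point, using a synthetic count matrix in which class 0 heavily outnumbers class 1 (all data below is made up):
In [ ]:
# synthetic imbalanced counts: 9 samples of class 0, only 1 of class 1
X_imb = rng.randint(50, size=(10, 20))
y_imb = np.array([0] * 9 + [1])

cnb = ComplementNB()   # estimates feature parameters from each class's complement,
                       # which makes it less sensitive to skewed class counts
cnb.fit(X_imb, y_imb)
cnb.predict(X_imb[:2])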

Bernoulli Naive Bayes

  • There may be multiple features, but each one is assumed to be binary-valued (Bernoulli, boolean)
  • If handed any other kind of data, a BernoulliNB instance may binarize its input, depending on its binarize parameter (see the sketch after the prediction cell below)
  • Might perform better on some datasets, especially those with shorter documents
In [66]:
# create 6 samples, each with 100 features
# assumes all features are binary, i.e. each takes only two values:
# 0 can represent "word does not occur in the document" and 1 "word occurs in the document"
X = rng.randint(2, size=(6, 100)) # m = 6, n = 100
y = np.array([1, 0, 1, 0, 1, 0]) # m = 6
In [64]:
from sklearn.naive_bayes import BernoulliNB
model = BernoulliNB()
model.fit(X, y)

model.predict(X[2:3])
Out[64]:
array([1])
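To illustrate the binarization point above, BernoulliNB can also be handed count data together with its binarize threshold (default 0.0), so that values above the threshold are mapped to 1 before fitting; a small sketch reusing synthetic counts:
In [ ]:
X_counts = rng.randint(100, size=(6, 100))   # count-valued features, not binary
bnb = BernoulliNB(binarize=0.5)              # values > 0.5 are mapped to 1, the rest to 0
bnb.fit(X_counts, y)
bnb.predict(X_counts[2:3])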

Categorical Naive Bayes

  • Assumes that each feature has its own categorical distribution, conditioned on the class (a sketch with more than two categories follows the cell below)
In [67]:
X = rng.randint(2, size=(6, 100)) # m = 6, n = 100, each feature has two categories
y = np.array([1, 0, 1, 0, 1, 0]) # m = 6
In [68]:
from sklearn.naive_bayes import CategoricalNB
model = CategoricalNB()
model.fit(X, y)

model.predict(X[2:3])
Out[68]:
array([1])
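Because the binary example above does not exercise more than two categories per feature, here is a short sketch with synthetic features taking values in {0, 1, 2, 3}; CategoricalNB expects categories encoded as non-negative integers (e.g. via OrdinalEncoder on real data):
In [ ]:
# synthetic data: 6 samples, 10 features, each feature takes one of 4 category codes
X_cat = rng.randint(4, size=(6, 10))
y_cat = np.array([1, 0, 1, 0, 1, 0])

cat_model = CategoricalNB()
cat_model.fit(X_cat, y_cat)
cat_model.predict(X_cat[2:3])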