Classification - Naive Bayes classifier

Dr. Aruna Malapati
Asst. Professor
Dept. of CS & IT
BITS Pilani, Hyderabad Campus
First semester 2011-12, CS C415 DATA MINING
Today’s agenda

 Bayes classifier
 Worked example

"Bayesian classifiers are statistical classifiers based on Bayes"
theorem.

X is considered the "evidence":
 Let X = {x1, x2, . . . , xn} be a sample whose components represent the values of measurements made on a set of n attributes.
 Let H be some hypothesis, such as that the data X belongs to a
specific class C.
 For classification problems, our goal is to determine P (H | X), the
probability that the hypothesis H holds given the "evidence" (i.e., the
observed data sample X).
 P (H | X) is the a posteriori probability of H conditioned on X.
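 For reference (the slides apply it without stating it), Bayes' theorem relates these quantities:
P (H | X) = P (X | H) P (H) / P (X)
 Since P (X) is the same for every class, comparing classes only requires comparing P (X | H) P (H), which is exactly the product the worked example below maximizes.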

 Suppose that H is the hypothesis that our customer will buy a
computer.
 Then P (H | X) is the probability that customer X will buy a
computer given that we know the customer’s age and income.
 In contrast, P (H) is the a priori probability of H: the probability
that any given customer buys a computer, regardless of age or income.

Suppose we wish to classify the sample
X = (age = youth, income = medium, student = yes, credit = fair)
 We need to maximize P (X | Ci) P (Ci), for i = 1, 2. P (Ci), the a priori
probability of each class, can be estimated based on the training
samples:
 P (buy = yes) = 9/14
 P (buy = no) = 5/14
 To compute P (X | Ci), for i = 1, 2, we compute the following conditional probabilities (a code sketch after this list reproduces these counts):
 P (age = youth | buy = yes) = 2/9
 P (age = youth | buy = no) = 3/5
 P (income = medium | buy = yes) = 4/9
 P (income = medium | buy = no) = 2/5
 P (student = yes | buy = yes) = 6/9
 P (student = yes | buy = no) = 1/5
 P (credit = fair | buy = yes) = 6/9
 P (credit = fair | buy = no) = 2/5
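The counts above can be reproduced with a short Python sketch. The training table itself does not appear in these slides, so the 14 rows below are an assumption: the classic buys_computer example from Han & Kamber, whose counts match the fractions listed above.

```python
from collections import Counter

# Assumed training set (not shown in the slides): the classic 14-row
# "buys_computer" table. Each row: (age, income, student, credit, buy).
data = [
    ("youth",  "high",   "no",  "fair",      "no"),
    ("youth",  "high",   "no",  "excellent", "no"),
    ("middle", "high",   "no",  "fair",      "yes"),
    ("senior", "medium", "no",  "fair",      "yes"),
    ("senior", "low",    "yes", "fair",      "yes"),
    ("senior", "low",    "yes", "excellent", "no"),
    ("middle", "low",    "yes", "excellent", "yes"),
    ("youth",  "medium", "no",  "fair",      "no"),
    ("youth",  "low",    "yes", "fair",      "yes"),
    ("senior", "medium", "yes", "fair",      "yes"),
    ("youth",  "medium", "yes", "excellent", "yes"),
    ("middle", "medium", "no",  "excellent", "yes"),
    ("middle", "high",   "yes", "fair",      "yes"),
    ("senior", "medium", "no",  "excellent", "no"),
]
attrs = ["age", "income", "student", "credit"]

# Class priors P(Ci): fraction of training rows in each class.
class_counts = Counter(row[-1] for row in data)          # {'yes': 9, 'no': 5}
priors = {c: n / len(data) for c, n in class_counts.items()}

def cond_prob(attr_idx, value, cls):
    """P(attribute = value | buy = cls), estimated by counting."""
    in_class = [row for row in data if row[-1] == cls]
    matches = sum(1 for row in in_class if row[attr_idx] == value)
    return matches / len(in_class)

x = ("youth", "medium", "yes", "fair")  # the sample X from the slide
for cls in ("yes", "no"):
    for i, (name, value) in enumerate(zip(attrs, x)):
        print(f"P({name}={value} | buy={cls}) = {cond_prob(i, value, cls):.3f}")
```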
 Using the above probabilities, we obtain
 P (X | buy = yes) = P (age = youth | buy = yes)
× P (income = medium | buy = yes)
× P (student = yes | buy = yes)
× P (credit = fair | buy = yes)
= 2/9 × 4/9 × 6/9 × 6/9
= 0.044.
 Similarly,
 P (X | buy = no) = 3/5 × 2/5 × 1/5 × 2/5
= 0.019.
 To find the class that maximizes P (X | Ci) P (Ci), we compute
 P (X | buy = yes) P (buy = yes) = 0.044 × 9/14 = 0.028
 P (X | buy = no) P (buy = no) = 0.019 × 5/14 = 0.007
 Thus the naive Bayesian classifier predicts buy = yes for sample X (the sketch below re-checks this arithmetic).
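A standalone sketch of this final step, using the per-class factors and priors exactly as listed on the slides:

```python
# Per-class conditional probabilities for X, copied from the slide.
likelihoods = {
    "yes": [2/9, 4/9, 6/9, 6/9],
    "no":  [3/5, 2/5, 1/5, 2/5],
}
priors = {"yes": 9/14, "no": 5/14}

scores = {}
for cls, factors in likelihoods.items():
    p_x_given_c = 1.0
    for f in factors:
        p_x_given_c *= f                     # naive independence assumption
    scores[cls] = p_x_given_c * priors[cls]  # P(X | Ci) * P(Ci)

print(scores)                                # {'yes': ~0.028, 'no': ~0.007}
print("prediction:", max(scores, key=scores.get))  # -> yes
```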
