# MLT Unit 2 Part 1 Regression & Bayesian Learning

Que 2.7. Explain how the decision error for Bayesian classification can be minimized.

Answer

1. The Bayesian classifier can be made optimal by minimizing the classification error probability.
2. In Fig. 2.7.1, it is observed that when the threshold is moved away from $x_0$, the corresponding shaded area under the curves always increases.
3. Hence, we have to reduce this shaded area to minimize the error.
4. Let $R_1$ be the region of the feature space for $\omega_1$ and $R_2$ be the corresponding region for $\omega_2$.
5. An error is then made if $x \in R_1$ although it belongs to $\omega_2$, or if $x \in R_2$ although it belongs to $\omega_1$, i.e.,

   $$P_e = P(x \in R_2, \omega_1) + P(x \in R_1, \omega_2) \tag{2.7.1}$$

6. $P_e$ can be written as

   $$P_e = P(x \in R_2 \mid \omega_1) P(\omega_1) + P(x \in R_1 \mid \omega_2) P(\omega_2) = P(\omega_1) \int_{R_2} p(x \mid \omega_1)\,dx + P(\omega_2) \int_{R_1} p(x \mid \omega_2)\,dx \tag{2.7.2}$$

7. Using Bayes' rule,

   $$P_e = \int_{R_2} P(\omega_1 \mid x)\, p(x)\,dx + \int_{R_1} P(\omega_2 \mid x)\, p(x)\,dx \tag{2.7.3}$$

8. The error is minimized if the partitioning regions $R_1$ and $R_2$ of the feature space are chosen so that

   $$R_1: P(\omega_1 \mid x) > P(\omega_2 \mid x), \qquad R_2: P(\omega_2 \mid x) > P(\omega_1 \mid x) \tag{2.7.4}$$

9. Since the union of the regions $R_1$ and $R_2$ covers the whole space, we have

   $$\int_{R_1} P(\omega_1 \mid x)\, p(x)\,dx + \int_{R_2} P(\omega_1 \mid x)\, p(x)\,dx = P(\omega_1) \tag{2.7.5}$$

10. Combining equations (2.7.3) and (2.7.5), we get

    $$P_e = P(\omega_1) - \int_{R_1} \bigl( P(\omega_1 \mid x) - P(\omega_2 \mid x) \bigr)\, p(x)\,dx \tag{2.7.6}$$

11. Thus, the probability of error is minimized if $R_1$ is the region of space in which $P(\omega_1 \mid x) > P(\omega_2 \mid x)$. $R_2$ then becomes the region where the reverse is true.
12. In a classification task with $M$ classes $\omega_1, \omega_2, \ldots, \omega_M$, an unknown pattern, represented by the feature vector $x$, is assigned to class $\omega_i$ if $P(\omega_i \mid x) > P(\omega_j \mid x)$ for all $j \neq i$.

Que 2.9. Define Bayes classifier. Explain how classification is done by using Bayes classifier.

Answer

1. A Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions.
2. A naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature.
3. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting.
4. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without believing in Bayesian probability or using any Bayesian methods.
5. An advantage of the naive Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification.
6. The perceptron bears a certain relationship to a classical pattern classifier known as the Bayes classifier.
7. When the environment is Gaussian, the Bayes classifier reduces to a linear classifier. In the Bayes classifier, or Bayes hypothesis testing procedure, we minimize the average risk, denoted by $\mathcal{R}$. For a two-class problem, represented by classes $C_1$ and $C_2$, the average risk is defined as

   $$\mathcal{R} = C_{11} P_1 \int_{H_1} p_X(x \mid C_1)\,dx + C_{22} P_2 \int_{H_2} p_X(x \mid C_2)\,dx + C_{21} P_1 \int_{H_2} p_X(x \mid C_1)\,dx + C_{12} P_2 \int_{H_1} p_X(x \mid C_2)\,dx$$

   where the various terms are defined as follows:

   - $P_i$ = prior probability that the observation vector $x$ is drawn from subspace $H_i$, with $i = 1, 2$ and $P_1 + P_2 = 1$.
   - $C_{ij}$ = cost of deciding in favour of class $C_i$, represented by subspace $H_i$, when class $C_j$ is true, with $i, j = 1, 2$.
   - $p_X(x \mid C_i)$ = conditional probability density function of the random vector $X$, given class $C_i$.

8. Fig. 2.9.1(a) depicts a block diagram representation of the Bayes classifier. The important points in this block diagram are twofold:

   a. The data processing in designing the Bayes classifier is confined entirely to the computation of the likelihood ratio $\Lambda(x)$.

   b. This computation is completely invariant to the values assigned to the prior probabilities and costs involved in the decision-making process. These quantities merely affect the value of the threshold $\xi$.

   From a computational point of view, it is more convenient to work with the logarithm of the likelihood ratio rather than the likelihood ratio itself.

Que 2.10. Discuss Bayes classifier using some example in detail.

Answer

1. Let $D$ be a training set of tuples and their associated class labels. Each tuple is represented by an $n$-dimensional attribute vector $X = (x_1, x_2, \ldots, x_n)$, depicting $n$ measurements made on the tuple from $n$ attributes, respectively $A_1, A_2, \ldots, A_n$.
2. Suppose that there are $m$ classes, $C_1, C_2, \ldots, C_m$. Given a tuple $X$, the classifier predicts that $X$ belongs to the class having the highest posterior probability, conditioned on $X$. That is, the classifier predicts that $X$ belongs to class $C_i$ if and only if

   $$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \neq i$$

   Thus, we maximize $P(C_i \mid X)$. The class $C_i$ for which $P(C_i \mid X)$ is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,

   $$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$$

3. As $P(X)$ is constant for all classes, only $P(X \mid C_i)\, P(C_i)$ needs to be maximized. If the class prior probabilities are not known, it is commonly assumed that the classes are equally likely, i.e., $P(C_1) = P(C_2) = \cdots = P(C_m)$, and therefore $P(X \mid C_i)$ is maximized. Otherwise, $P(X \mid C_i)\, P(C_i)$ is maximized.
4. Computing $P(X \mid C_i)$:

   i. Given data sets with many attributes, the computation of $P(X \mid C_i)$ would be extremely expensive.

   ii. To reduce computation in evaluating $P(X \mid C_i)$, the assumption of class conditional independence is made.

   iii. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the tuple. Thus,

   $$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$

   iv. The probabilities $P(x_1 \mid C_i), P(x_2 \mid C_i), \ldots, P(x_n \mid C_i)$ are easily estimated from the training tuples. Here $x_k$ refers to the value of attribute $A_k$ for tuple $X$. For each attribute, it is checked whether the attribute is categorical or continuous-valued.

   v. For example, to compute $P(X \mid C_i)$ we consider:

      a. If $A_k$ is categorical, then $P(x_k \mid C_i)$ is the number of tuples of class $C_i$ in $D$ having the value $x_k$ for $A_k$, divided by $|C_{i,D}|$, the number of tuples of class $C_i$ in $D$.

      b. If $A_k$ is continuous-valued, then the attribute is typically assumed to have a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$, defined by

      $$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

      so that $P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$.

   vi. We need to compute $\mu_{C_i}$ and $\sigma_{C_i}$, the mean and standard deviation of the values of attribute $A_k$ for the training tuples of class $C_i$. These values are used to estimate $P(x_k \mid C_i)$.

   vii. For example, let X = (35, Rs.,000), where $A_1$ and $A_2$ are the attributes age and income, respectively. Let the class label attribute be buys-computer.

   viii. The associated class label for $X$ is yes (i.e., buys-computer = yes). Suppose that age has not been discretized and therefore exists as a continuous-valued attribute.

   ix. Suppose that from the training set, we find that customers in $D$ who buy a computer are 38 ± 12 years of age. In other words, for attribute age and this class, we have $\mu = 38$ and $\sigma = 12$.

5. To predict the class label of $X$, $P(X \mid C_i)\, P(C_i)$ is evaluated for each class $C_i$. The classifier predicts that the class label of $X$ is the class $C_i$ if and only if

   $$P(X \mid C_i)\, P(C_i) > P(X \mid C_j)\, P(C_j) \quad \text{for } 1 \le j \le m,\ j \neq i$$

   The predicted class label is the class $C_i$ for which $P(X \mid C_i)\, P(C_i)$ is the maximum.
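The steps of Que 2.10 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the training set `D`, the `student` attribute, and the function names `train` and `predict` are all hypothetical and chosen only to show one continuous-valued attribute (age, modelled with a Gaussian) next to one categorical attribute (estimated by counting).

```python
import math
from collections import defaultdict


def gaussian(x, mu, sigma):
    """Gaussian density g(x, mu, sigma) used for continuous-valued attributes."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


# Hypothetical training set D: (age, student, buys_computer). Not from the text.
D = [
    (25, "yes", "yes"), (38, "no", "yes"), (45, "yes", "yes"), (50, "no", "yes"),
    (22, "no", "no"), (60, "no", "no"), (65, "yes", "no"),
]


def train(data):
    """Estimate P(Ci), the Gaussian parameters of age, and P(student | Ci) per class."""
    by_class = defaultdict(list)
    for *attrs, label in data:
        by_class[label].append(attrs)

    model = {}
    for label, rows in by_class.items():
        ages = [r[0] for r in rows]
        mu = sum(ages) / len(ages)
        sigma = math.sqrt(sum((a - mu) ** 2 for a in ages) / len(ages))
        counts = defaultdict(int)
        for r in rows:
            counts[r[1]] += 1
        model[label] = {
            "prior": len(rows) / len(data),                               # P(Ci)
            "age": (mu, sigma),                                           # continuous attribute
            "student": {v: c / len(rows) for v, c in counts.items()},     # categorical attribute
        }
    return model


def predict(model, age, student):
    """Return the class maximizing P(X | Ci) P(Ci) under class conditional independence."""
    scores = {}
    for label, m in model.items():
        mu, sigma = m["age"]
        likelihood = gaussian(age, mu, sigma) * m["student"].get(student, 0.0)
        scores[label] = likelihood * m["prior"]
    return max(scores, key=scores.get), scores


model = train(D)
label, scores = predict(model, 35, "yes")
print(label, scores)

# Density from the worked example in the text: age 35 with mu = 38, sigma = 12.
print(gaussian(35, 38, 12))
```

With $\mu = 38$ and $\sigma = 12$ as in step ix, the last line evaluates $g(35, 38, 12)$, which comes out at roughly 0.0322. A value that never occurs for a class gets probability zero in this sketch; practical implementations usually smooth the counts, which is outside what the answer above describes.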
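Returning to the two-class rules of Que 2.7 and Que 2.9, the following sketch checks numerically that the Bayes threshold gives the smallest error probability. It assumes two hypothetical one-dimensional Gaussian classes with equal variance and equal priors, so that the threshold $x_0$ is the midpoint of the two means. The likelihood ratio $\Lambda(x) = p(x \mid C_1) / p(x \mid C_2)$ and the threshold $\xi = P_2 (C_{12} - C_{22}) / (P_1 (C_{21} - C_{11}))$ used here are the standard forms obtained by minimizing the average risk; they are not written out in the answers above, so treat them as assumptions of this example.

```python
import math

# Hypothetical two-class problem: equal-variance Gaussians, equal priors.
MU1, MU2, SIGMA = 0.0, 2.0, 1.0
P1, P2 = 0.5, 0.5


def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def decide(x, c11=0.0, c22=0.0, c21=1.0, c12=1.0):
    """Likelihood-ratio test: choose class 1 if Lambda(x) > xi, else class 2.

    The default 0/1 costs reduce the minimum-risk rule to the minimum-error rule.
    """
    lam = gaussian_pdf(x, MU1, SIGMA) / gaussian_pdf(x, MU2, SIGMA)
    xi = (P2 * (c12 - c22)) / (P1 * (c21 - c11))
    return 1 if lam > xi else 2


def error_probability(t):
    """P_e of equation (2.7.2) when R1 = (-inf, t) and R2 = [t, inf)."""
    class1_in_r2 = P1 * (1.0 - gaussian_cdf(t, MU1, SIGMA))
    class2_in_r1 = P2 * gaussian_cdf(t, MU2, SIGMA)
    return class1_in_r2 + class2_in_r1


x0 = (MU1 + MU2) / 2  # Bayes threshold for equal priors and equal variances
for t in (x0 - 0.5, x0, x0 + 0.5):
    print(f"threshold {t:.1f}: P_e = {error_probability(t):.4f}")
```

Moving the threshold to either side of `x0` raises the printed error, which is the behaviour described for the shaded area in Fig. 2.7.1. Passing unequal costs to `decide` shifts $\xi$ and hence the decision boundary, while the likelihood ratio itself is unchanged.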