Que2.14. Explain EM algorithm with steps.
Answer
1. The Expectation-Maximization (EM) algorithm is an iterative method for finding maximum-likelihood estimates of model parameters when the data is incomplete, has missing data points, or has hidden (latent) variables.
2. EM chooses arbitrary values for the missing data points and uses them to estimate a new set of parameters.
3. These new values are then used recursively to produce better estimates, filling in the missing points, until the values converge.
4. These are the two basic steps of the EM algorithm:
Estimation (E) Step:
i. Initialize the parameters μk, Σk and πk by arbitrary values, or by K-means clustering results, or by hierarchical clustering results.
ii. Then, for those given parameter values, estimate the values of the latent variables (i.e., the responsibilities γk).
Maximization (M) Step: Update the values of the parameters (i.e., μk, Σk and πk) calculated using the maximum-likelihood method:
i. Initialize the mean μk, the covariance matrix Σk and the mixing coefficients πk by arbitrary values (or other values).
ii. Compute the γk values for all k.
iii. Re-estimate all the parameters using the current γk values.
iv. Compute the log-likelihood function.
v. Check some convergence criterion.
vi. If the log-likelihood value converges to some value (or if all the parameters converge to some values), then stop; otherwise return to Step 2.

Que2.15. Describe the applications, advantages and disadvantages of EM algorithm.
Answer
Applications of EM algorithm:
1. It can be used to fill in the missing data in a sample.
2. It can be used as the basis of unsupervised learning of clusters.
3. It can be used for estimating the parameters of a Hidden Markov Model (HMM).
4. It can be used for discovering the values of latent variables.
Advantages of EM algorithm:
1. It is always guaranteed that the likelihood will increase with each iteration.
2. The E-step and M-step are often quite easy to implement for many problems.
3. Solutions to the M-step often exist in closed form.
Disadvantages of EM algorithm:
1. It has slow convergence.
2. It converges to a local optimum only.
3. It requires both the probabilities, forward and backward (numerical optimization requires only the forward probability).

Que2.16. Write a short note on Bayesian network. OR Explain Bayesian network by taking an illustration. How is the Bayesian network an important representation for uncertain knowledge?
Answer
1. A Bayesian network is a directed acyclic graph in which each node is annotated with quantitative probability information.
2. The full specification is as follows:
i. A set of random variables makes up the nodes of the network; variables may be discrete or continuous.
ii. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node x to node y, x is said to be a parent of y.
iii. Each node xi has a conditional probability distribution P(xi | Parents(xi)) that quantifies the effect of the parents on the node.
iv. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
3. A Bayesian network provides a complete description of the domain. Every entry in the full joint probability distribution can be calculated from the information in the network.
4. Bayesian networks give a concise way to represent conditional independence relationships in the domain.
5. A Bayesian network is often exponentially smaller than the full joint distribution.
For illustration:
1. Suppose we want to determine the possibility of the lawn getting wet or staying dry due to the occurrence of different weather conditions.
2. The weather has three states: Sunny, Cloudy, and Rainy. There are two possibilities for the lawn: Wet or Dry.
3. The sprinkler can be on or off. If it is rainy, the lawn gets wet; but if it is sunny, we can make the lawn wet by pouring water from a sprinkler.
4. Suppose that the lawn is wet. This could be contributed by one of two reasons: first, it is raining; second, the sprinklers are turned on.
5. Using Bayes' rule, we can conclude the most contributing factor towards the wet lawn.
Bayesian networks possess the following merits in uncertain knowledge representation:
1. A Bayesian network can conveniently handle incomplete data.
2. A Bayesian network can learn the causal relations of variables. In data analysis, causal relations are helpful for domain knowledge understanding; they can also easily lead to precise prediction even under significant interference.
3. The combination of Bayesian networks and Bayesian statistics can take full advantage of domain knowledge and information from data.
4. The combination of Bayesian networks and other models can effectively avoid the over-fitting problem.

Que2.17. Explain the role of prior probability and posterior probability in Bayesian classification.
Answer
Role of prior probability:
1. The prior probability is used to compute the probability of the event before the collection of new data.
2. It is used to capture our assumptions and domain knowledge and is independent of the data.
3. It is the unconditional probability that is assigned before any relevant evidence is taken into account.
Role of posterior probability:
1. The posterior probability is used to compute the probability of an event after the collection of data.
2. It is used to capture both the assumed domain knowledge and the pattern in the observed data.
3. It is the conditional probability that is assigned after the relevant evidence or background is taken into account.

Que2.18. Explain the methods of handling approximate inference in Bayesian networks.
Answer
1. Approximate inference methods can be used when exact inference methods lead to unacceptable computation times because the network is very large or densely connected.
2. Methods handling approximate inference:
i. Simulation methods: These methods use the network to generate samples from the conditional probability distribution and estimate the conditional probabilities of interest when the number of samples is sufficiently large.
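As an illustration of the simulation approach, the wet-lawn network from Que2.16 can be encoded with conditional probability tables and queried by rejection sampling. The numeric tables below are invented for this sketch, not taken from the text:

```python
import random

random.seed(0)

# Illustrative (made-up) probability tables for the wet-lawn example.
P_WEATHER = {"Sunny": 0.5, "Cloudy": 0.3, "Rainy": 0.2}
P_SPRINKLER_ON = {"Sunny": 0.6, "Cloudy": 0.2, "Rainy": 0.05}  # P(Sprinkler=on | Weather)
P_WET = {  # P(Lawn=wet | Weather, Sprinkler)
    ("Rainy", True): 0.99, ("Rainy", False): 0.90,
    ("Cloudy", True): 0.85, ("Cloudy", False): 0.05,
    ("Sunny", True): 0.90, ("Sunny", False): 0.01,
}

def pick(dist):
    """Sample a key from a {value: probability} table."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point round-off

def sample():
    """Forward-sample one (weather, sprinkler, wet) world from the network."""
    w = pick(P_WEATHER)
    s = random.random() < P_SPRINKLER_ON[w]
    wet = random.random() < P_WET[(w, s)]
    return w, s, wet

# Rejection sampling: keep only samples consistent with the evidence Lawn=wet,
# then read off the fraction in which it was raining.
kept = [w for w, s, wet in (sample() for _ in range(100_000)) if wet]
# Estimate of P(Rainy | wet); under these tables the exact value is ~0.35.
print(sum(w == "Rainy" for w in kept) / len(kept))
```

With enough samples the estimate approaches the exact posterior; this is exactly the sense in which simulation methods trade computation time for accuracy.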
ii. Variational methods: These methods express the inference task as a numerical optimization problem and then find upper and lower bounds on the probabilities of interest by solving a simplified version of this optimization problem.

Que2.19. Write a short note on support vector machine.
Answer
Refer Q.1.23, Page 1-23L, Unit-1.

Que2.20. What are the types of support vector machine?
Answer
Following are the types of support vector machine:
1. Linear SVM: Linear SVM is used for linearly separable data, which means that if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
2. Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.

Que2.21. What is a polynomial kernel? Explain the polynomial kernel in one dimension and two dimensions.
Answer
1. The polynomial kernel is a kernel function used with Support Vector Machines (SVMs) and other kernelized models that represents the similarity of vectors (training samples) in a feature space over polynomials of the original variables, allowing the learning of non-linear models.
2. The polynomial kernel function is given by the equation
K(a, b) = (a × b + r)^d
where a and b are two different data points that we need to classify, r determines the coefficients of the polynomial, and d determines the degree of the polynomial.
3. We perform the dot products of the data points, which gives us the high-dimensional coordinates for the data.
4. When d = 1, the polynomial kernel computes the relationship between each pair of observations in one dimension, and these relationships help to find the support vector classifier.
5. When d = 2, the polynomial kernel computes the two-dimensional relationship between each pair of observations, which helps to find the support vector classifier.

Que2.22. Describe Gaussian Kernel (Radial Basis Function).
Answer
1. The RBF kernel is a function whose value depends on the distance from the origin or from some point.
2. The Gaussian kernel is of the following form:
K(X1, X2) = exp(–||X1 – X2||² / 2σ²)
where ||X1 – X2|| is the Euclidean distance between X1 and X2. Using the distance in the original space, we calculate the dot product (similarity) of X1 and X2.
3. Following are the parameters used in the Gaussian kernel:
i. C: Inverse of the strength of regularization.
Behaviour: As the value of C increases, the model overfits. As the value of C decreases, the model underfits.
ii. Gamma (γ), used only for the RBF kernel.
Behaviour: As the value of γ increases, the model overfits. As the value of γ decreases, the model underfits.
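A minimal sketch of the kernels from Que2.21 and Que2.22, with arbitrary sample points chosen for illustration. For the polynomial kernel with d = 2, the kernel value equals an ordinary dot product of explicitly expanded features, which is the "high-dimensional coordinates" idea above; the RBF kernel needs only the Euclidean distance (here written with γ = 1/(2σ²)):

```python
import math

def poly_kernel(a, b, r=0.5, d=2):
    """Polynomial kernel K(a, b) = (a*b + r)**d for 1-D points (Que2.21)."""
    return (a * b + r) ** d

def phi(x, r=0.5):
    """Explicit feature map for d = 2: (a*b + r)**2 == phi(a) . phi(b)."""
    return (x * x, math.sqrt(2 * r) * x, r)

def rbf_kernel(x1, x2, gamma=0.1):
    """Gaussian (RBF) kernel exp(-gamma * ||x1 - x2||^2) (Que2.22),
    where gamma plays the role of 1 / (2 * sigma**2)."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

a, b = 2.0, 3.0
# Kernel trick: compute the similarity without building the expanded space...
k = poly_kernel(a, b)
# ...and verify it equals the dot product of the explicit features.
k_explicit = sum(u * v for u, v in zip(phi(a), phi(b)))
print(abs(k - k_explicit) < 1e-9)

print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))  # identical points give the maximum value 1.0
```

The agreement between `k` and `k_explicit` is why kernelized SVMs can learn non-linear boundaries without ever materializing the high-dimensional coordinates.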