MLT Unit 3 Part 1 Regression & Bayesian Learning

Que3.4. Explain various decision tree learning algorithms.

Answer
Various decision tree learning algorithms are:
1. ID3 (Iterative Dichotomiser 3):
i. ID3 is an algorithm used to induce a decision tree from a dataset.
ii. To construct a decision tree, ID3 uses a top-down, greedy search through the given sets, where each attribute at every tree node is tested to select the attribute that is best for classifying the given set.
iii. Thus, the attribute with the highest information gain is selected as the test attribute of the current node.
iv. In this algorithm, small decision trees are preferred over larger ones. It is a heuristic algorithm because it does not guarantee the smallest possible tree.
v. For building a decision tree model, ID3 only accepts categorical attributes. ID3 does not give accurate results when there is noise in the data or when it is run serially.
vi. Therefore, the data is preprocessed before constructing the decision tree.
vii. For constructing the decision tree, information gain is calculated for each attribute, and the attribute with the highest information gain becomes the root node. The remaining possible values are denoted by branches.
viii. All possible outcome instances are examined to check whether they belong to the same class or not. For instances of the same class, a single name is used to denote the class; otherwise the instances are classified on the basis of the splitting attribute.
2. C4.5:
i. C4.5 is an algorithm used to induce a decision tree. It is an extension of the ID3 algorithm.
ii. C4.5 generates decision trees which can be used for classification, and therefore C4.5 is referred to as a statistical classifier.
iii. It is better than the ID3 algorithm because it deals with both continuous and discrete attributes, handles missing values, and prunes trees after construction.
iv. C5.0 is the commercial successor of C4.5; it is faster, memory efficient and builds smaller decision trees.
v. C4.5 performs a tree pruning process by default. This leads to the formation of smaller trees and simpler rules, and produces more intuitive interpretations.
3. CART (Classification And Regression Trees):
i. The CART algorithm builds both classification and regression trees.
ii. The classification tree is constructed by CART through binary splitting of the attribute.
iii. The Gini index is used for selecting the splitting attribute (a short illustrative sketch of this computation is given after Que3.5).
iv. CART is also used for regression analysis with the help of a regression tree.
v. The regression feature of CART can be used to predict a dependent variable given a set of predictor variables over a given period of time.
vi. CART has an average speed of processing and supports both continuous and nominal attribute data.

Que3.5. What are the advantages and disadvantages of different decision tree learning algorithms?

Answer
Advantages of ID3 algorithm:
1. The training data is used to produce understandable prediction rules.
2. It builds a short and fast tree.
3. ID3 searches the whole dataset to create the whole tree.
4. It finds the leaf nodes, thus enabling the test data to be pruned and reducing the number of tests.
5. The computation time of ID3 is a linear function of the product of the number of characteristics and the number of nodes.
Disadvantages of ID3 algorithm:
1. For a small sample, data may be overfitted or overclassified.
2. For making a decision, only one attribute is tested at a time, which consumes a lot of time.
3. Classifying continuous data may prove computationally expensive, as many trees have to be generated to see where to break the continuous sequence.
4. It is overly sensitive to features when given a large number of input values.
Advantages of C4.5 algorithm:
1. C4.5 is easy to implement.
2. C4.5 builds models that can be easily interpreted.
3. It can handle both categorical and continuous values.
4. It can deal with noise and missing value attributes.
Disadvantages of C4.5 algorithm:
1. A small variation in data can lead to different decision trees when using C4.5.
2. For a small training set, C4.5 does not work very well.
Advantages of CART algorithm:
1. CART can handle missing values automatically using surrogate splits.
2. It uses a combination of continuous/discrete variables.
3. CART automatically performs variable selection.
4. CART can establish interactions among variables.
5. CART does not vary under monotonic transformations of predictor variables.
Disadvantages of CART algorithm:
1. CART produces unstable decision trees.
2. CART splits on only one variable at a time.
3. It is a non-parametric algorithm.
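As referenced in Que3.4, CART selects splits using the Gini index. The following is a minimal sketch of that computation, assuming made-up class labels for the two sides of a candidate binary split; the function names and data are illustrative and not from the original text:

```python
# Minimal sketch of the Gini index used by CART for binary splits.
# The data, labels and the candidate split below are illustrative only.

from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def gini_of_split(left_labels, right_labels):
    """Weighted Gini impurity of a candidate binary split."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) + \
           (len(right_labels) / n) * gini(right_labels)

# Example: class labels after splitting a node on some attribute test.
left = ["yes", "yes", "no"]          # tuples satisfying the test
right = ["no", "no", "no", "yes"]    # tuples failing the test
print(gini_of_split(left, right))    # CART prefers the split with the lowest value
```

CART evaluates such candidate binary splits for each attribute and keeps the one with the lowest weighted impurity.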
Que3.6. Explain attribute selection measures used in decision tree.

Answer
Attribute selection measures used in decision tree are:
1. Entropy:
i. Entropy is a measure of uncertainty associated with a random variable.
ii. The entropy increases with an increase in uncertainty or randomness and decreases with a decrease in uncertainty or randomness.
iii. The value of entropy ranges from 0 to 1. For a dataset D it is given by

$Entropy(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)$

where $p_i$ is the non-zero probability that an arbitrary tuple in D belongs to class $C_i$, estimated by $|C_{i,D}| / |D|$.
iv. A log function to base 2 is used because the entropy is encoded in bits 0 and 1.
2. Information gain:
i. ID3 uses information gain as its attribute selection measure.
ii. Information gain is the difference between the original information requirement (i.e., based on the proportion of classes) and the new requirement (i.e., obtained after partitioning on an attribute A):

$Gain(A) = Info(D) - Info_A(D)$

iii. Suppose we partition the tuples in D on some attribute A having v distinct values; D is split into v partitions or subsets {D1, D2, ..., Dv}, where Dj contains those tuples in D that have outcome aj of A. The expected information still required after the partitioning is

$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

iv. The attribute that has the highest information gain is chosen.
3. Gain ratio:
i. The information gain measure is biased towards tests with many outcomes.
ii. That is, it prefers to select attributes having a large number of values.
iii. When each partition is pure, the information still required after the partitioning is zero, so the information gained by such a partitioning is maximal; but such a partitioning cannot be used for classification.
iv. C4.5 uses gain ratio, an extension of information gain, as its attribute selection measure.
v. Gain ratio differs from information gain, which measures the information with respect to a classification acquired based on some partitioning.
vi. Gain ratio applies a kind of normalisation to information gain using a split information value defined as

$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\left(\frac{|D_j|}{|D|}\right)$

vii. The gain ratio is then $GainRatio(A) = Gain(A) / SplitInfo_A(D)$.
viii. The splitting attribute selected is the attribute having the maximum gain ratio.
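These three measures can be illustrated with a short sketch, assuming a toy dataset of attribute values and class labels; the function names and data below are illustrative assumptions, not part of the original text:

```python
# Sketch of entropy, information gain and gain ratio for one candidate attribute.
# The toy data and names below are illustrative assumptions.

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(D) = -sum(p_i * log2(p_i)) over the class distribution."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D), where values are the outcomes of attribute A."""
    total = len(labels)
    info_a = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        info_a += (len(subset) / total) * entropy(subset)
    return entropy(labels) - info_a

def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    total = len(labels)
    split_info = -sum((values.count(v) / total) * log2(values.count(v) / total)
                      for v in set(values))
    return information_gain(values, labels) / split_info if split_info else 0.0

# Toy example: attribute "outlook" versus class label "play".
outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no"]
print(information_gain(outlook, play), gain_ratio(outlook, play))
```

ID3 would rank attributes by the first value, while C4.5 would rank them by the second.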
Que3.7. Explain applications of decision tree in various areas of data mining.

Answer
The various decision tree applications in data mining are:
1. E-commerce: Decision trees are used extensively in the field of e-commerce; a decision tree helps to generate an online catalogue, which is an important factor for the success of an e-commerce website.
2. Industry: The decision tree algorithm is useful for producing quality control (faults identification) systems.
3. Intelligent vehicles: An important task in the development of intelligent vehicles is to find the lane boundaries of the road, and decision trees are used for this.
4. Medicine: The decision tree is an important technique for medical research and practice. A decision tree is used for the diagnosis of various diseases. Decision trees are also used for heart sound diagnosis.
5. Business: Decision trees find use in the field of business, where they are used for visualization of probabilistic business models, in CRM (Customer Relationship Management), for credit scoring of credit card users, and for predicting loan risks in banks.

Que3.8. Explain the procedure of the ID3 algorithm.

Answer
ID3(Examples, TargetAttribute, Attributes):
1. Create a Root node for the tree.
2. If all Examples are positive, return the single-node tree Root, with label = +.
3. If all Examples are negative, return the single-node tree Root, with label = –.
4. If Attributes is empty, return the single-node tree Root, with label = most common value of TargetAttribute in Examples.
5. Otherwise begin:
a. A ← the attribute from Attributes that best classifies Examples.
b. The decision attribute for Root ← A.
c. For each possible value vi of A:
i. Add a new tree branch below Root, corresponding to the test A = vi.
ii. Let Examples_vi be the subset of Examples that have value vi for A.
iii. If Examples_vi is empty, then below this new branch add a leaf node with label = most common value of TargetAttribute in Examples; otherwise below this new branch add the sub-tree ID3(Examples_vi, TargetAttribute, Attributes – {A}).
6. End.
7. Return Root.
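The procedure above can be expressed as a compact, runnable sketch. This is a minimal illustration under simplifying assumptions (categorical attributes, examples given as dictionaries, best attribute chosen by the information gain of Que3.6); the helper functions repeat the entropy/gain definitions so the block runs on its own, the empty-subset case of step 5c(iii) never arises with this loop, and the weather data is made up:

```python
# Minimal runnable sketch of the ID3 procedure from Que3.8 (illustrative only).

from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy of the class distribution of the examples."""
    total = len(examples)
    counts = Counter(e[target] for e in examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    """Gain(A) = Info(D) - Info_A(D) for one candidate attribute."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    # Steps 2-3: all examples have the same class -> single-node tree (leaf).
    if len(set(labels)) == 1:
        return labels[0]
    # Step 4: no attributes left -> leaf with the most common target value.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 5a: pick the attribute that best classifies the examples.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    # Step 5c: one branch per value of the chosen attribute, built recursively.
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree

# Toy usage (made-up weather data):
data = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "rain",     "windy": "false", "play": "yes"},
    {"outlook": "rain",     "windy": "true",  "play": "no"},
]
print(id3(data, "play", ["outlook", "windy"]))
```

The returned nested dictionary mirrors the tree structure: each key is a test attribute, each sub-key a branch value, and each string a leaf label.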
