Ensemble learning approach to classifying documents based on formal and informal writing styles
By: Karunarathna, K M G.
Contributor(s): Rupasingha, R A H M.
Publisher: Hyderabad: IUP Publications, 2022
Edition: Vol. 18(3), Sep.
Description: 27-49 p.
Subject(s): EXTC Engineering
In: IUP journal of information technology
Summary: With recent advances in technology, many students and scholars have been tempted to use the internet as their main educational resource, since they can obtain a variety of documents online. These documents can be classified as either formal or informal in writing style, involving different linguistic features. The paper presents a method to automatically identify the style of a particular document. First, a dataset of online documents was compiled and preprocessed. Next, features were extracted via a term frequency-inverse document frequency (TF-IDF) vectorizer. Classification models were then built using six classification algorithms. Initially, five machine learning algorithms (random forest, decision tree, support vector machine, multilayer perceptron, and naive Bayes) were used. Of these five algorithms, the random forest algorithm performed best, obtaining an accuracy of 87.44%, high values for precision, recall, and F-measure, with the lowest error rate. In the second experiment, an ensemble learning method was used, whereby a vote algorithm was applied to a combination of the five algorithms. This method obtained an accuracy of 91.96%. The method combines several algorithms.

Item type | Current location | Call number | Status | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|
Articles Abstract Database | School of Engineering & Technology Archival Section | Not for loan | 2023-0123 |
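The two technical steps named in the summary, TF-IDF feature weighting and a vote over several classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tfidf` and `majority_vote`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the sample documents are all assumptions made for illustration, and the five base classifiers are represented only by their predicted labels.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: plain lowercase whitespace tokenization and no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return vectors


def majority_vote(predictions):
    """Hard voting: return the label chosen by the most base classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical documents: one formal-sounding, one informal-sounding.
vectors = tfidf(["hello dear sir", "hey dude hello"])

# Hypothetical labels from five base classifiers for a single document.
label = majority_vote(["formal", "informal", "formal", "formal", "informal"])
```

A term that occurs in every document, such as "hello" here, receives a weight of zero, while terms specific to one document are weighted up. With five voters and two classes, a majority vote cannot tie.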