Ensemble learning approach to classifying documents based on formal and informal writing styles
Karunarathna, K M G
Ensemble learning approach to classifying documents based on formal and informal writing styles - Vol. 18(3), Sep. - Hyderabad: IUP Publications, 2022 - pp. 27-49
With recent advances in technology, many students and scholars have been tempted to use the internet as their main educational resource, since they can obtain a variety of documents online. These documents can be classified as either formal or informal in writing style, each involving different linguistic features. The paper presents a method to automatically identify the style of a particular document. First, a dataset of online documents was compiled and preprocessed. Next, features were extracted via a term frequency-inverse document frequency (TF-IDF) vectorizer. Classification models were then built using six classification algorithms. Initially, five machine learning algorithms (random forest, decision tree, support vector machine, multilayer perceptron, and naive Bayes) were used. Of these five, the random forest algorithm performed best, obtaining an accuracy of 87.44%, high values for precision, recall, and F-measure, and the lowest error rate. In the second experiment, an ensemble learning method was used, in which a voting algorithm combined the five algorithms. This method obtained an accuracy of 91.96%.
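The two technical steps named in the abstract, TF-IDF feature extraction and voting across several classifiers, can be illustrated with a minimal sketch. The abstract does not state which toolkit, TF-IDF variant, or voting rule the paper used, so everything here is an assumption: the function names `tfidf_vectors` and `majority_vote` are hypothetical, the weighting is plain term frequency times log(N/df), and the vote is a simple majority (hard) vote.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: (term count / document length) * log(N / df),
    with whitespace tokenization and lowercasing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def majority_vote(predictions):
    """Hard voting: return the label predicted by the most classifiers.

    Ties go to the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy usage: two short documents, then five hypothetical classifier outputs.
vectors = tfidf_vectors(["dear sir kindly note", "hey dude note this"])
label = majority_vote(["formal", "informal", "formal", "formal", "informal"])
```

In this toy example the word "note" occurs in both documents, so its weight is zero, while words unique to one document receive positive weights. The vote returns the label chosen by three of the five hypothetical classifiers.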
EXTC Engineering