Noise detection in semi-supervised learning with the use of data streams

Authors

  • Damaris Pascual González Eastern University
  • Fernando D. Vázquez Mesa Eastern University
  • J. Salvador Sánchez Jaume I. University https://orcid.org/0000-0003-1053-4658
  • Filiberto Pla Jaume I. University

DOI:

https://doi.org/10.17533/udea.redin.14514

Keywords:

data streams, unlabeled data, noise cleaning, concept drift

Abstract

It is often necessary to construct training sets. When only a small number of labelled objects and a large pool of unlabelled objects are available, the training set can be built by simulating a data stream of unlabelled objects from which the system learns, later incorporating those objects into the training set. To prevent the resulting training set from deteriorating, we propose a scheme that takes concept drift into account, since in many situations the class distribution may change over time. The unlabelled objects are classified with an ensemble of classifiers, and we propose a strategy for detecting noise after the classification process.
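The general scheme the abstract describes can be illustrated with a minimal sketch (this is not the authors' actual algorithm; the ensemble type, agreement threshold, and sliding-window drift handling below are illustrative assumptions): an ensemble of 1-NN classifiers built on bootstrap resamples of the labelled seed set votes on each streamed object, low vote agreement flags the object as suspected noise, and a bounded window over the training set discards old objects as a crude response to concept drift.

```python
import random

def nn_predict(train, x):
    # 1-NN: return the label of the closest training point (squared Euclidean distance)
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def process_stream(labeled, stream, n_members=5, agree_ratio=0.8, window=100, seed=0):
    """labeled: list of (features, label); stream: iterable of unlabelled feature tuples."""
    rng = random.Random(seed)
    # Bootstrap resamples of the labelled seed set give the ensemble some diversity
    members = [[rng.choice(labeled) for _ in labeled] for _ in range(n_members)]
    train, noise = list(labeled), []
    for x in stream:
        votes = [nn_predict(m, x) for m in members]
        label = max(set(votes), key=votes.count)          # majority vote
        if votes.count(label) / n_members >= agree_ratio:
            train.append((x, label))                      # confident -> incorporate
            train = train[-window:]                       # sliding window vs. concept drift
        else:
            noise.append(x)                               # disagreement -> suspected noise
    return train, noise
```

With two well-separated classes as seeds, confidently classified stream objects are appended to the training set, while objects on which the ensemble disagrees end up in the noise list rather than polluting the training data.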


Author Biographies

Damaris Pascual González, Eastern University

Faculty of Economics and Business.

Fernando D. Vázquez Mesa, Eastern University

Faculty of Economics and Business.

J. Salvador Sánchez, Jaume I. University

Department of Programming Languages and Information Systems.

Filiberto Pla, Jaume I. University

Department of Programming Languages and Information Systems.

References

R. Bose, P. van der Aalst, I. Žliobaitė, M. Pechenizkiy. Handling concept drift in process mining. Proceedings of the 23rd International Conference on Advanced Information Systems Engineering. London, UK. 2011. pp. 391-405. DOI: https://doi.org/10.1007/978-3-642-21640-4_30

G. Widmer, M. Kubat. “Learning in the presence of concept drift and hidden contexts”. Machine Learning. Vol. 23. 1996. pp. 69-101. DOI: https://doi.org/10.1007/BF00116900

R. Elwell, R. Polikar. “Incremental learning of concept drift in nonstationary environments”. IEEE Transactions on Neural Networks. Vol. 22. 2011. pp. 1517-1531. DOI: https://doi.org/10.1109/TNN.2011.2160459

G. Ross, N. Adams, D. Tasoulis, D. Hand. “Exponentially weighted moving average charts for detecting concept drift”. Pattern Recognition Letters. Vol. 33. 2012. pp. 191-198. DOI: https://doi.org/10.1016/j.patrec.2011.08.019

O. Chapelle, B. Schölkopf, A. Zien. Semi-supervised Learning. 1st ed. Ed. MIT Press. Cambridge, MA, USA. 2006. pp. 3-5. DOI: https://doi.org/10.7551/mitpress/9780262033589.001.0001

V. Castelli, T. Cover. “On the Exponential Value of Labelled Samples”. Pattern Recognition Letters. Vol. 16. 1995. pp. 105-111. DOI: https://doi.org/10.1016/0167-8655(94)00074-D

V. Castelli, T. Cover. “The Relative Value of Labeled and Unlabeled Samples in Pattern Recognition With an Unknown Mixing Parameter”. IEEE Transactions on Information Theory. Vol. 42. 1996. pp. 2101-2117. DOI: https://doi.org/10.1109/18.556600

J. Ratsaby, S. Venkatesh. Learning From a Mixture of Labelled and Unlabelled Examples With Parametric Side Information. Proceedings of the 8th Annual Conference on Computational Learning Theory. Santa Cruz, USA. 1995. pp. 412-417. DOI: https://doi.org/10.1145/225298.225348

K. Nigam, R. Ghani. Analyzing the Effectiveness and Applicability of Co-training. Proceedings of the 9th International Conference on Information and Knowledge Management. McLean, VA, USA. 2000. pp. 86-93. DOI: https://doi.org/10.1145/354756.354805

F. Cozman, I. Cohen, M. Cirelo. Semi-supervised Learning of Mixture Models. Proceedings of the 20th International Conference on Machine Learning. Washington, DC, USA. 2003. pp. 99-106.

D. Yarowsky. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA, USA. 1995. pp. 189-196. DOI: https://doi.org/10.3115/981658.981684

E. Riloff, J. Wiebe, T. Wilson. Learning Subjective Nouns Using Extraction Pattern Bootstrapping. Proceedings of the 7th Conference on Natural Language Learning. Edmonton, Canada. 2003. pp. 25-32. DOI: https://doi.org/10.3115/1119176.1119180

B. Maeireizo, D. Litman, R. Hwa. Co-training for Predicting Emotions With Spoken Dialogue Data. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. Barcelona, Spain. 2004. pp. 203-206. DOI: https://doi.org/10.3115/1219044.1219072

C. Rosenberg, M. Hebert, H. Schneiderman. Semi-Supervised Self-training of Object Detection Models. Proceedings of the 7th IEEE Workshop on Applications of Computer Vision. Breckenridge, USA. 2005. pp. 29-36. DOI: https://doi.org/10.1109/ACVMOT.2005.107

Y. Jin, Y. Ma, L. Zhao. A Modified Self-training Semi-supervised SVM Algorithm. Proceedings of the International Conference on Communication Systems and Network Technologies. Gujarat, India. 2012. pp. 224-228. DOI: https://doi.org/10.1109/CSNT.2012.56

A. Blum, T. Mitchell. Combining Labelled and Unlabelled Data With Co-training. Proceedings of the Workshop on Computational Learning Theory. New York, USA. 1998. pp. 92-100. DOI: https://doi.org/10.1145/279943.279962

T. Mitchell. The Role of Unlabeled Data in Supervised Learning. Proceeding of the 6th International Colloquium on Cognitive Science. San Sebastian, Spain. 1999. pp. 1-8.

S. Goldman, Y. Zhou. Enhancing Supervised Learning With Unlabelled Data. Proceedings of the 17th International Conference on Machine Learning. Stanford, USA. 2000. pp. 327-334.

Y. Zhou, S. Goldman. Democratic Co-learning. Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence. Boca Raton, FL, USA. 2004. pp. 594-602.

V. Vapnik. Statistical Learning Theory. 1st ed. Ed. Wiley. New York, USA. 1998. pp. 434-436.

A. Demiriz, K. Bennett. “Optimization Approaches to Semi Supervised Learning”. M. Ferris, O. Mangasarian, J. Pang (Eds.). Applications and Algorithms of Complementarity. 1st ed. Ed. Kluwer Academic Publishers. Boston, USA. 2000. pp. 121-141. DOI: https://doi.org/10.1007/978-1-4757-3279-5_6

Y. Shi, Y. Tian, G. Kou, Y. Peng, J. Li. “Unsupervised and Semi-supervised Support Vector Machines”. Optimization Based Data Mining: Theory and Applications. 1st ed. Ed. Springer. London, UK. 2011. pp. 61-79. DOI: https://doi.org/10.1007/978-0-85729-504-0_4

B. Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648. University of Wisconsin-Madison. Wisconsin, USA. 2009. pp. 1-44.

F. Gu, D. Liu, X. Wang. Semi-Supervised Weighted Distance Metric Learning for kNN Classification. Proceedings of the International Conference on Computer, Mechatronics, Control and Electronic Engineering. Changchun, China. 2010. pp. 406-409.

B. Ni, S. Yan, A. Kassim. “Learning a Propagable Graph for Semisupervised Learning: Classification and regression”. IEEE Transactions on Knowledge and Data Engineering. Vol. 24. 2012. pp. 114-126. DOI: https://doi.org/10.1109/TKDE.2010.209

C. Kalish, T. Rogers, J. Lang, X. Zhu. “Can Semi-Supervised Learning Explain Incorrect Beliefs About Categories?”. Cognition. Vol. 120. 2011. pp. 106-118. DOI: https://doi.org/10.1016/j.cognition.2011.03.002

Published

2014-02-12

How to Cite

Pascual González, D., Vázquez Mesa, F. D., Sánchez, J. S., & Pla, F. (2014). Noise detection in semi-supervised learning with the use of data streams. Revista Facultad De Ingeniería Universidad De Antioquia, 71(71), 37–47. https://doi.org/10.17533/udea.redin.14514