Noise detection in semi-supervised learning with the use of data streams
DOI: https://doi.org/10.17533/udea.redin.14514

Keywords: data streams, unlabeled data, noise cleaning, concept drift

Abstract
It is often necessary to construct training sets. When only a small number of labeled objects is available alongside a large pool of unlabeled ones, the training set can be built by simulating a data stream of unlabeled objects, learning from each object as it arrives, and later incorporating it into the training set. To prevent deterioration of the resulting training set, we propose a scheme that accounts for concept drift, since in many situations the class distribution may change over time. We classify the unlabeled objects with an ensemble of classifiers and propose a strategy to detect noise after the classification step.
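The abstract's pipeline (an ensemble labels objects from a simulated stream, and low-agreement predictions are flagged as noise before entering the training set) can be sketched as follows. This is an illustrative sketch, not the paper's exact algorithm: the nearest-centroid base learners, bootstrap ensemble construction, and agreement threshold are all assumptions made for the example.

```python
import random
from collections import Counter

def centroid_model(points, labels):
    """Train one base learner: per-class centroids in a 2-D feature space."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}

def predict(model, p):
    """Assign p to the class with the nearest centroid."""
    return min(model, key=lambda c: (model[c][0] - p[0]) ** 2 + (model[c][1] - p[1]) ** 2)

def train_ensemble(points, labels, n_members=5, seed=0):
    """Build an ensemble by training each member on a bootstrap resample."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(points)) for _ in range(len(points))]
        members.append(centroid_model([points[i] for i in idx],
                                      [labels[i] for i in idx]))
    return members

def classify_stream(members, stream, agreement=0.8):
    """Label each stream object by majority vote; objects whose vote
    agreement falls below the threshold are treated as potential noise
    and rejected instead of being added to the training set."""
    accepted, rejected = [], []
    for p in stream:
        votes = Counter(predict(m, p) for m in members)
        label, count = votes.most_common(1)[0]
        target = accepted if count / len(members) >= agreement else rejected
        target.append((p, label))
    return accepted, rejected
```

In a full scheme the accepted objects would be appended to the training set and the ensemble periodically retrained, which is also where a concept-drift check (e.g., monitoring the rejection rate over a sliding window) would be inserted.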
License
Copyright (c) 2018 Revista Facultad de Ingeniería
This work and Revista Facultad de Ingeniería, Universidad de Antioquia are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
You are free to:
Share — copy and redistribute the material in any medium or format
Adapt — remix, transform, and build upon the material
Under the following terms:
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial — You may not use the material for commercial purposes.
ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.