Noise detection in semi-supervised learning using data streams

Authors

  • Damaris Pascual González Universidad de Oriente
  • Fernando D. Vázquez Mesa Universidad de Oriente
  • J. Salvador Sánchez Universidad Jaume I. https://orcid.org/0000-0003-1053-4658
  • Filiberto Pla Universidad Jaume I.

DOI:

https://doi.org/10.17533/udea.redin.14514

Keywords:

data streams, unlabeled data, noise filtering, concept drift

Abstract

Training sets often have to be built from scratch. When only a small number of labeled objects is available alongside a large set of unlabeled ones, the training set can be built by simulating a stream of unlabeled data from which the system must learn in order to incorporate new objects into the training set. To prevent the resulting training sets from deteriorating, this work proposes a scheme that takes concept drift into account, since in many situations the class distributions may change over time. An ensemble of classifiers is used to classify the unlabeled objects, and a strategy for detecting noise is proposed.
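As a rough illustration of the kind of scheme the abstract describes (not the paper's exact algorithm), the sketch below self-trains over a simulated stream of unlabeled points. Each ensemble member is a 1-nearest-neighbour classifier trained on a different slice of the labeled seed set; a streamed point is added to the training set only when all members agree on its label, and points with split votes are treated as potential noise and discarded. The splitting strategy, the 1-NN base classifier, and the unanimity rule are all assumptions made for this example.

```python
import math

def nn_predict(train, x):
    """1-nearest-neighbour label for x, where train = [((x1, x2), label), ...]."""
    return min(train, key=lambda t: math.dist(t[0], x))[1]

def self_train(labeled, stream, n_members=2):
    # Disjoint slices of the seed set give each ensemble member a different view.
    members = [labeled[i::n_members] for i in range(n_members)]
    accepted, rejected = list(labeled), []
    for x in stream:
        votes = {nn_predict(m, x) for m in members}
        if len(votes) == 1:                  # unanimous vote: confident label
            accepted.append((x, votes.pop()))
        else:                                # disagreement: likely noise, discard
            rejected.append(x)
    return accepted, rejected

# Two well-separated classes and a short stream; the last point lies between them.
seed = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
stream = [(0.3, 0.1), (4.8, 5.1), (2.6, 2.5)]
accepted, rejected = self_train(seed, stream)
```

Here the two clear points are labeled and absorbed into the training set, while the ambiguous midpoint is filtered out as noise. Handling concept drift, as the paper proposes, would additionally require retraining or reweighting the members as the accepted set evolves.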



Author biographies

Damaris Pascual González, Universidad de Oriente

Faculty of Economics and Business.

Fernando D. Vázquez Mesa, Universidad de Oriente

Faculty of Economics and Business.

J. Salvador Sánchez, Universidad Jaume I.

Department of Programming Languages and Information Systems.

Filiberto Pla, Universidad Jaume I.

Department of Programming Languages and Information Systems.



Published

2014-02-12

How to cite

Pascual González, D., Vázquez Mesa, F. D., Sánchez, J. S., & Pla, F. (2014). Detección de ruido en aprendizaje semi-supervisado con el uso de flujos de datos. Revista Facultad De Ingeniería Universidad De Antioquia, 71(71), 37–47. https://doi.org/10.17533/udea.redin.14514