Incremental k most similar neighbor classifier for mixed data

Authors

  • Guillermo Sánchez-Díaz, Autonomous University of San Luis Potosi
  • Uriel E. Escobar-Franco, Universidad Politécnica de Tulancingo
  • Luis R. Morales-Manilla, Universidad Politécnica de Tulancingo
  • Iván Piza-Dávila, Western Institute of Technology and Higher Studies, https://orcid.org/0000-0002-4189-6208
  • Carlos Aguirre-Salado, Autonomous University of San Luis Potosi, https://orcid.org/0000-0003-3422-7193
  • Anilu Franco-Arcega, Autonomous University of the State of Hidalgo

DOI:

https://doi.org/10.17533/udea.redin.16307

Keywords:

supervised classification, incremental algorithms, artificial intelligence, pattern recognition

Abstract

This paper presents an incremental k-most similar neighbor classifier for mixed data and for similarity functions that are not necessarily distances. The algorithm is suitable for processing large data sets because it keeps in main memory only the k most similar neighbors found up to step t and traverses the training data set only once. Several experiments with synthetic and real data are presented.
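
The single-pass, bounded-memory idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names (`mixed_similarity`, `classify_incremental`), the `numeric_ranges` parameter, the particular similarity function, and the majority-vote decision rule are all assumptions made for illustration. The sketch keeps a min-heap of at most k entries, so memory use does not grow with the size of the training set, and it reads the training data exactly once.

```python
import heapq
from collections import Counter
from itertools import count


def mixed_similarity(a, b, numeric_ranges):
    """Hypothetical similarity in [0, 1] for mixed feature vectors.

    numeric_ranges maps a feature index to the (assumed known) range of that
    numeric feature; every other index is treated as categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            # Numeric feature: 1 minus the range-normalized absolute difference.
            total += 1.0 - min(abs(x - y) / numeric_ranges[i], 1.0)
        else:
            # Categorical feature: simple match / mismatch.
            total += 1.0 if x == y else 0.0
    return total / len(a)


def classify_incremental(query, training_stream, k, similarity):
    """Classify `query` in one pass, keeping at most k candidates in memory."""
    heap = []           # min-heap of (similarity, tiebreak, label)
    tiebreak = count()  # avoids comparing labels when similarities tie
    for obj, label in training_stream:
        entry = (similarity(query, obj), next(tiebreak), label)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new object is more similar than the weakest kept neighbor.
            heapq.heapreplace(heap, entry)
    if not heap:
        return None  # empty training stream: no decision possible
    # Assumed decision rule: majority vote among the k kept neighbors.
    votes = Counter(label for _, _, label in heap)
    return votes.most_common(1)[0][0]


# Toy training set: (numeric feature, categorical feature) -> class label.
training = [
    ((1.0, "red"), "A"),
    ((1.2, "red"), "A"),
    ((5.0, "blue"), "B"),
    ((5.5, "blue"), "B"),
    ((0.9, "blue"), "A"),
]


def sim(a, b):
    # Feature 0 is numeric with an assumed range of 5.0; feature 1 is categorical.
    return mixed_similarity(a, b, {0: 5.0})


# iter(...) stands in for a data source that can only be read once.
print(classify_incremental((1.1, "red"), iter(training), k=3, similarity=sim))
```

Because the classifier only needs to compare similarity values, any function that returns larger values for more similar objects can be passed as `similarity`; it does not have to satisfy the properties of a distance.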

Author Biographies

Guillermo Sánchez-Díaz, Autonomous University of San Luis Potosi

Faculty of Engineering.

Uriel E. Escobar-Franco, Universidad Politécnica de Tulancingo

Engineering Division. Engineering # 100.

Luis R. Morales-Manilla, Universidad Politécnica de Tulancingo

Engineering Division. Engineering # 100.

Iván Piza-Dávila, Western Institute of Technology and Higher Studies

Department of Electronics, Systems and Informatics.

Carlos Aguirre-Salado, Autonomous University of San Luis Potosi

Faculty of Engineering.

Anilu Franco-Arcega, Autonomous University of the State of Hidalgo

Information Technology and Systems Research Center.

Published

2013-08-16

How to Cite

Sánchez-Díaz, G., Escobar-Franco, U. E., Morales-Manilla, L. R., Piza-Dávila, I., Aguirre-Salado, C., & Franco-Arcega, A. (2013). Incremental k most similar neighbor classifier for mixed data. Revista Facultad De Ingeniería Universidad De Antioquia, (67), 19–30. https://doi.org/10.17533/udea.redin.16307