
Clustering of Hand Written Digits Using K-Means Algorithm and Self Organizing Maps


Authors
Maddimsetti Srinivas, M. Venkata Srinu, G. L. P. Ashok

Affiliation
Koneru Lakshmaiah Education Foundation, Vijayawada, Guntur (Dt.), Andhra Pradesh, India

Abstract


The present work focuses on clustering the MNIST dataset using K-means clustering and Self-Organizing Maps (SOM). Histogram of Oriented Gradients (HOG) descriptors are used to extract feature vectors, and Principal Component Analysis (PCA) is applied to the feature vectors to reduce their dimensionality; the first two principal components are used for cluster formation. Cluster purity is used as the evaluation metric, and an external criterion based on prior knowledge of the true class labels is chosen for cluster validation. SOM performs better than K-means in forming clusters: out of 10 clusters, K-means missed the clusters of three digits (0, 7 and 9), whereas SOM missed the clusters of only two digits (5 and 9).
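
The abstract does not include an implementation, but the pipeline it describes (HOG descriptors, PCA reduced to two components, K-means with 10 clusters, purity computed against the true labels) can be sketched roughly as below. The HOG cell and block sizes, the scikit-image/scikit-learn calls, and loading MNIST through fetch_openml are illustrative assumptions, not details taken from the paper.

# Sketch of the described pipeline: HOG -> PCA (2 components) -> K-means -> purity.
# HOG parameters and the MNIST loader are assumptions; the paper does not specify them.
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

# Load MNIST (70,000 images of 28x28 pixels) with their true digit labels.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
y = y.astype(int)

# Compute a HOG descriptor for every image (assumed 7x7 cells, 2x2 blocks, 9 orientations).
hog_features = np.array([
    hog(img.reshape(28, 28), orientations=9,
        pixels_per_cell=(7, 7), cells_per_block=(2, 2))
    for img in X
])

# Keep only the first two principal components of the HOG feature vectors.
Z = PCA(n_components=2).fit_transform(hog_features)

# Form 10 clusters with K-means on the 2-D points.
kmeans_labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)

# Purity: assign each cluster its majority true class and count the fraction of
# samples that fall into the majority class of their own cluster.
def purity(true_labels, cluster_labels):
    matched = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        matched += np.bincount(members).max()
    return matched / len(true_labels)

print("K-means purity:", purity(y, kmeans_labels))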

Keywords

Clustering, Histograms of Oriented Gradients (HOG), K-Means Clustering, MNIST, Principal Component Analysis, Self Organizing Maps, Unsupervised Learning.
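
For the SOM side of the comparison, a minimal sketch is given below. It assumes the third-party MiniSom package and reuses Z, y and purity() from the K-means sketch above; the 2x5 grid and the training length are illustrative choices, since the abstract does not report the SOM configuration used.

# Minimal SOM clustering sketch (MiniSom is an assumption; the paper does not
# name its SOM implementation). Each of the 10 map units acts as one cluster.
from minisom import MiniSom

som = MiniSom(x=2, y=5, input_len=2, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(Z)
som.train_random(Z, num_iteration=10000)

# Assign every sample to its best-matching unit and flatten (row, col) to 0..9.
winners = np.array([som.winner(z) for z in Z])
som_labels = winners[:, 0] * 5 + winners[:, 1]

print("SOM purity:", purity(y, som_labels))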
