The scikit-learn library provides numerous datasets that are useful for testing many data-analysis and prediction problems; the Digits dataset is one of them. A scientist claims that the digit can be predicted accurately 95% of the time. We perform data analysis to accept or reject this hypothesis.
In this project, we use the handwritten digits dataset that ships with the sklearn library. We can import the dataset:
from sklearn import datasets
digits = datasets.load_digits()
Info about Dataset:
print(digits.DESCR)
OUTPUT:
main_data = digits['data']
targets = digits['target']
len(main_data)
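As a quick sanity check (a short addition, not part of the original walkthrough), the dataset shapes confirm 1797 samples, each an 8×8 image flattened into 64 pixel features:

```python
from sklearn import datasets

digits = datasets.load_digits()

# 1797 images, each an 8x8 grid flattened into 64 pixel features
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
print(digits.target.shape)  # (1797,)
```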
%matplotlib inline
import matplotlib.pyplot as plt

plt.subplot(321)
plt.imshow(digits.images[1791], cmap=plt.cm.gray_r, interpolation='nearest')
plt.subplot(322)
plt.imshow(digits.images[1792], cmap=plt.cm.gray_r, interpolation='nearest')
plt.subplot(323)
plt.imshow(digits.images[1793], cmap=plt.cm.gray_r, interpolation='nearest')
plt.subplot(324)
plt.imshow(digits.images[1794], cmap=plt.cm.gray_r, interpolation='nearest')
plt.subplot(325)
plt.imshow(digits.images[1795], cmap=plt.cm.gray_r, interpolation='nearest')
plt.subplot(326)
plt.imshow(digits.images[1796], cmap=plt.cm.gray_r, interpolation='nearest')
OUTPUT:
Support Vector Classifier:
from sklearn import svm
svc = svm.SVC(gamma=0.001, C=100.)
svc.fit(main_data[:1790], targets[:1790])
predictions = svc.predict(main_data[1791:])
predictions, targets[1791:]
OUTPUT:
From the SVC we get 100% accuracy, but note that only 6 test samples are used here, so this estimate is not reliable on its own.
Training Data : 1790
Test Data : 6
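Because six test samples are too few to trust a 100% figure, a cross-validation sketch gives a steadier estimate (this is an addition to the original walkthrough, reusing the same SVC hyperparameters):

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()
svc = svm.SVC(gamma=0.001, C=100.)

# 5-fold cross-validation: five held-out accuracy scores instead of one
scores = cross_val_score(svc, digits.data, digits.target, cv=5)
print(scores.mean())  # typically around 0.97 on this dataset
```

Each fold holds out roughly 360 samples, so the averaged score is far less sensitive to which handful of images happens to land in the test set.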
Decision Tree Classifier:
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(criterion = 'gini')
dt.fit(main_data[:1600] , targets[:1600])
predictions2 = dt.predict(main_data[1601:])
from sklearn.metrics import accuracy_score, confusion_matrix
confusion_matrix(targets[1601:], predictions2)
OUTPUT:
array([[17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
       [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
       [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
       [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
       [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
       [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
       [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
       [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
       [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16]], dtype=int64)
accuracy_score(targets[1601:] , predictions2)
OUTPUT:
From the Decision Tree Classifier we get about 78% accuracy.
Training Data : 1600
Test Data : 196
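The 78% figure can be checked directly against the confusion matrix above: the diagonal counts the correct predictions, so accuracy is the trace divided by the total. A quick numpy recomputation using the matrix printed above:

```python
import numpy as np

# Confusion matrix reported above for the Decision Tree run
cm = np.array([
    [17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
    [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
    [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
    [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
    [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
    [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
    [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
    [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
    [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16],
])

# correct predictions (diagonal) / all predictions
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 154 / 196, about 0.786
```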
Random Forest Classifier:
from sklearn.ensemble import RandomForestClassifier
rc = RandomForestClassifier(n_estimators = 150)
rc.fit(main_data[:1500], targets[:1500])
predictions3 = rc.predict(main_data[1501:])
accuracy_score(targets[1501:], predictions3)
OUTPUT:
0.9222972972972973
From the Random Forest Classifier we get about 92% accuracy for n_estimators = 150.
Training data : 1500
Test Data : 296
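To actually accept or reject the 95% hypothesis rather than eyeball it, a binomial test can be run on the Random Forest result (273 correct out of 296 gives the 0.9223 score above). This is a sketch added to the original post, assuming scipy is available:

```python
from scipy.stats import binomtest

# Random Forest run: 273 correct predictions out of 296 test samples
n_correct, n_test = 273, 296

# H0: true accuracy is 95%; H1: it is lower
result = binomtest(n_correct, n_test, p=0.95, alternative='less')
print(result.pvalue)  # a small p-value is evidence against the 95% claim
```

A p-value below the usual 0.05 threshold would mean this classifier's accuracy is significantly below the claimed 95%; a better-tuned model (like the SVC above) could still meet it.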
Conclusion:
Data matters the most: we need a good amount of data for the model. If we have less data, we can use other machine-learning classifiers such as Random Forest, which still gives 92% accuracy on a 1500-sample training set, less training data than the Support Vector Classifier used.
As per our hypothesis, with hyperparameter tuning of different machine-learning models, or with more data, we can achieve near 95% accuracy on the handwritten digits dataset. But we should also keep a good amount of test data; otherwise we cannot reliably tell whether the model has overfit.