
Learning to recognize handwritten digits

The Scikit-learn library provides numerous data sets that are useful for testing data analysis and prediction problems; the Digits data set is one of them. A scientist claims that a model predicts the digit accurately 95% of the time. Perform a data analysis to accept or reject this hypothesis.


In this project, we use the Handwritten Digits dataset, which ships with the sklearn library. We can import the dataset as follows:


               from sklearn import datasets
               digits = datasets.load_digits()

Info about Dataset:


                print(digits.DESCR)

OUTPUT:




                main_data = digits['data']
                targets = digits['target']
                len(main_data)
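To make the layout of these arrays concrete, here is a small sketch that checks their shapes. It uses only the `data`, `images`, and `target` attributes returned by `load_digits()`; the variable `rows_match_images` is a name introduced here for illustration.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()
main_data = digits['data']
targets = digits['target']

# 1797 samples, each stored as a 64-element row (a flattened 8x8 image)
print(main_data.shape)
print(digits.images.shape)

# Check that each row of `data` is the corresponding image, flattened row by row
rows_match_images = np.array_equal(
    main_data, digits.images.reshape(len(digits.images), -1)
)
print(rows_match_images)

# The labels are the ten digits 0 to 9
print(np.unique(targets))
```

So `len(main_data)` is the number of samples, and valid image indices run from 0 to 1796.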






%matplotlib inline
import matplotlib.pyplot as plt

# Show the last six images of the dataset (indices 1791 to 1796) in a 3x2 grid
for position, index in enumerate(range(1791, 1797), start=1):
    plt.subplot(3, 2, position)
    plt.imshow(digits.images[index], cmap=plt.cm.gray_r,
               interpolation='nearest')

OUTPUT:




Support Vector Classifier:


                         from sklearn import svm
                         svc = svm.SVC(gamma=0.001 , C = 100.)
                         svc.fit(main_data[:1790] , targets[:1790])
                         predictions = svc.predict(main_data[1791:])
                         predictions , targets[1791:]
OUTPUT:

          (array([4, 9, 0, 8, 9, 8]), array([4, 9, 0, 8, 9, 8]))

From SVC we get 100% accuracy, but only on 6 test samples
Training Data : 1790
Test Data : 6
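Six test samples are too few to say much about accuracy. As a hedged sketch (not part of the original experiment), the same SVC settings can be scored on a larger hold-out set with `train_test_split`. The 25% test size, the `random_state`, and the stratified split are assumptions chosen here for illustration.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Assumption: hold out 25% of the samples, keeping the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.25, random_state=0, stratify=digits.target,
)

# Same hyperparameters as the classifier above
svc = svm.SVC(gamma=0.001, C=100.)
svc.fit(X_train, y_train)

svc_holdout_accuracy = accuracy_score(y_test, svc.predict(X_test))
print(len(y_test), svc_holdout_accuracy)
```

With several hundred test samples, a single wrong prediction moves the score by a fraction of a percent instead of by about 17 percentage points.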




Decision Tree Classifier:


                    from sklearn.tree import DecisionTreeClassifier
                    from sklearn.metrics import accuracy_score, confusion_matrix

                    dt = DecisionTreeClassifier(criterion = 'gini')
                    dt.fit(main_data[:1600] , targets[:1600])

                    predictions2 = dt.predict(main_data[1601:])
                    # rows: true digit, columns: predicted digit
                    confusion_matrix(targets[1601:] , predictions2)
OUTPUT:


       array([[17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
       [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
       [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
       [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
       [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
       [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
       [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
       [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
       [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16]], dtype=int64)

                    accuracy_score(targets[1601:] , predictions2)
OUTPUT:
           0.7857142857142857

From Decision Tree Classifier we get about 78.6% accuracy
Training Data : 1600
Test Data : 196
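The accuracy can be re-derived from the confusion matrix itself. The sketch below copies the matrix printed above and computes the total, the number of correct predictions (the diagonal), and the per-digit recall. The names `per_class_recall` and `worst_digit` are introduced here for illustration.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true digit, columns: predicted digit)
cm = np.array([
    [17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
    [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
    [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
    [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
    [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
    [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
    [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
    [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
    [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16],
])

total = cm.sum()        # number of test samples
correct = np.trace(cm)  # correct predictions lie on the diagonal
accuracy = correct / total

# Recall per digit: correct predictions divided by the row total
per_class_recall = cm.diagonal() / cm.sum(axis=1)
worst_digit = int(np.argmin(per_class_recall))

print(total, correct, accuracy)
print(worst_digit, per_class_recall[worst_digit])
```

The counts come to 154 correct out of 196, which matches the reported accuracy, and the digit 3 is the class the tree misses most often (9 of 22 recognized).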



Random Forest Classifier:



                      

from sklearn.ensemble import RandomForestClassifier


rc = RandomForestClassifier(n_estimators = 150)
rc.fit(main_data[:1500] , targets[:1500])
predictions3 = rc.predict(main_data[1501:])
accuracy_score(targets[1501:] , predictions3)
OUTPUT:

0.9222972972972973

From Random Forest Classifier we get about 92.2% accuracy with n_estimators = 150
Training data : 1500
Test Data : 296
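To get a feel for how much `n_estimators` matters, here is a rough sketch that repeats the experiment for a few forest sizes on the same split. The values 10 and 50 and the fixed `random_state` are assumptions added here for illustration; the scores will differ slightly from the run above, which did not fix a seed.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()
main_data, targets = digits.data, digits.target

# Same split as above: train on the first 1500 samples, test on index 1501 onward
X_train, y_train = main_data[:1500], targets[:1500]
X_test, y_test = main_data[1501:], targets[1501:]

forest_scores = {}
for n_trees in (10, 50, 150):  # illustrative values, only 150 is from the post
    rc = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rc.fit(X_train, y_train)
    forest_scores[n_trees] = accuracy_score(y_test, rc.predict(X_test))

for n_trees, score in forest_scores.items():
    print(n_trees, round(score, 4))
```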




Conclusion:

Data matters the most: we need a good amount of data to train a model. If we have less data, we can use another machine learning classifier such as random forest, which still gives about 92% accuracy with a training set of 1500 samples, fewer than the 1790 used for the support vector classifier.



As per our hypothesis, we can say that with hyperparameter tuning across different machine learning models, or with more data, we can achieve close to 95% accuracy on the handwritten digits dataset. But make sure we also have a good amount of test data; otherwise the measured accuracy is unreliable and overfitting can go unnoticed.
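One way to put numbers on "accept or reject" is a confidence interval around each measured accuracy. The sketch below is an addition, not part of the original analysis: it computes an approximate 95% Wilson score interval from the counts of correct predictions. The counts are derived from the reported accuracies and the slice sizes (`[1791:]`, `[1601:]`, and `[1501:]` leave 6, 196, and 296 test samples), and the helper `wilson_interval` is a hypothetical name.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    ) / denom
    return center - half_width, center + half_width

# (correct predictions, test samples), derived from the results above
results = {
    "SVC": (6, 6),
    "Decision tree": (154, 196),
    "Random forest": (273, 296),
}

claimed = 0.95
intervals = {}
for name, (correct, total) in results.items():
    low, high = wilson_interval(correct, total)
    intervals[name] = (low, high)
    consistent = low <= claimed <= high
    print(f"{name}: [{low:.3f}, {high:.3f}] consistent with 95%: {consistent}")
```

Under these assumptions, the SVC interval is very wide because it rests on only 6 samples, while the decision tree and random forest intervals both end below 0.95. That is why a larger test set is needed before the 95% claim can be judged for the SVC.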
