
Learning to recognize handwritten digits

The scikit-learn library provides numerous datasets that are useful for testing data analysis and prediction problems, and the Digits dataset is one of them. Some scientists claim that a model can predict the digit correctly 95% of the time. Perform a data analysis to accept or reject this hypothesis.


In this project, we use the handwritten Digits dataset, which ships with the sklearn library. We can import the dataset as follows:


               from sklearn import datasets
               digits = datasets.load_digits()

Info about Dataset:


                print(digits.DESCR)

OUTPUT:




                main_data = digits['data']
                targets = digits['target']
                len(main_data)
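To see what the loaded object contains, a minimal sketch like the one below can help. It only assumes the standard attributes of the object returned by load_digits (data, images, target); the variable same_pixels is a name introduced here for illustration.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each sample is an 8x8 grayscale image; `data` holds the flattened pixels.
print(digits.data.shape)    # (n_samples, 64)
print(digits.images.shape)  # (n_samples, 8, 8)
print(digits.target.shape)  # (n_samples,)

# The flattened row and the 8x8 image carry the same pixel values.
same_pixels = (digits.data[0] == digits.images[0].ravel()).all()
print(same_pixels)
```

The classifiers in the following sections are trained on the flattened rows, while the plots use the 8x8 images.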






%matplotlib inline
import matplotlib.pyplot as plt

plt.subplot(321)
plt.imshow(digits.images[1791], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(322)
plt.imshow(digits.images[1792], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(323)
plt.imshow(digits.images[1793], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(324)
plt.imshow(digits.images[1794], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(325)
plt.imshow(digits.images[1795], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(326)
plt.imshow(digits.images[1796], cmap=plt.cm.gray_r,
interpolation='nearest')

OUTPUT: a 3x2 grid showing the six digit images at indices 1791 to 1796.




Support Vector Classifier:


                         from sklearn import svm
                         svc = svm.SVC(gamma=0.001 , C = 100.)
                         svc.fit(main_data[:1790] , targets[:1790])
                         predictions = svc.predict(main_data[1791:])
                         predictions , targets[1791:]
OUTPUT:

          (array([4, 9, 0, 8, 9, 8]), array([4, 9, 0, 8, 9, 8]))

From SVC we get 100% accuracy on the 6 test samples (all 6 predictions match the targets)
Training Data : 1790
Test Data : 6
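With only 6 test samples, a perfect score says little about the true accuracy. One way to quantify this is a Wilson score interval for a proportion. The sketch below is an illustration, not part of the original analysis; the helper wilson_interval is a hypothetical name, and z = 1.96 is assumed for an approximate 95% interval.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    # Clamp the upper bound, since an accuracy cannot exceed 1.
    return center - half_width, min(1.0, center + half_width)


# 6 correct predictions out of 6 test samples, as in the SVC run above.
low, high = wilson_interval(6, 6)
print(f"6/6 correct: plausible accuracy range {low:.2f} to {high:.2f}")
```

The lower bound comes out well below 95%, so this test set is too small to accept or reject the hypothesis on its own.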




Decision Tree Classifier:


                    from sklearn.tree import DecisionTreeClassifier
                    dt = DecisionTreeClassifier(criterion = 'gini')
                    dt.fit(main_data[:1600] , targets[:1600])

                    predictions2 = dt.predict(main_data[1601:])
                    # confusion_matrix must be imported alongside accuracy_score
                    from sklearn.metrics import accuracy_score, confusion_matrix
                    confusion_matrix(targets[1601:] , predictions2)
OUTPUT:


       array([[17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
       [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
       [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
       [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
       [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
       [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
       [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
       [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
       [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16]], dtype=int64)

                    accuracy_score(targets[1601:] , predictions2)
OUTPUT:
           0.7857142857142857

From Decision Tree Classifier we get about 78.6% accuracy
Training Data : 1600
Test Data : 196
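The accuracy can also be read directly off the confusion matrix: the diagonal counts the correct predictions, and the sum of all cells is the number of test samples. The sketch below copies the matrix from the output above and uses only NumPy; the variable names are introduced here for illustration.

```python
import numpy as np

# Confusion matrix copied from the decision tree output above
# (rows: true digit, columns: predicted digit).
cm = np.array([
    [17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
    [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
    [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
    [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
    [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
    [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
    [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
    [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
    [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16],
])

correct = int(np.trace(cm))   # correct predictions sit on the diagonal
total = int(cm.sum())         # every test sample falls in exactly one cell
accuracy = correct / total

# Per-digit recall: share of each true digit that was predicted correctly.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
hardest_digit = int(np.argmin(per_class_recall))

print(correct, total, accuracy)
print(per_class_recall.round(2))
print("hardest digit:", hardest_digit)
```

This reproduces the reported accuracy (154 correct out of 196) and shows that the digit 3 is the one the tree misclassifies most often in this run.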



Random Forest Classifier:



                      

from sklearn.ensemble import RandomForestClassifier


rc = RandomForestClassifier(n_estimators = 150)
rc.fit(main_data[:1500] , targets[:1500])
predictions3 = rc.predict(main_data[1501:])
accuracy_score(targets[1501:] , predictions3)
OUTPUT:

0.9222972972972973

From Random Forest Classifier we get about 92% accuracy for n_estimators = 150
Training data : 1500
Test Data : 296
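The three classifiers above were trained and tested on different slices, so their accuracies are not directly comparable. One possible way to compare them on equal terms is k-fold cross-validation over the whole dataset. The sketch below is an assumption about how such a comparison could look, not the method used in this post; the 5 folds, the shuffling, and random_state=0 are arbitrary choices made for reproducibility.

```python
from sklearn import datasets, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Same model settings as in the sections above; random_state is added
# only so that repeated runs give the same numbers.
models = {
    "SVC": svm.SVC(gamma=0.001, C=100.0),
    "Decision tree": DecisionTreeClassifier(criterion="gini", random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=150, random_state=0),
}

# Every model is scored on the same 5 shuffled, stratified folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Each model is now evaluated on every sample exactly once, which gives a steadier estimate than a single small test slice.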




Conclusion:

Data matters the most: we need a good amount of data for the model. If we have less data, we can use another machine learning classifier such as random forest, which still gives about 92% accuracy with a training set of 1500 samples, fewer than the 1790 samples used for the support vector classifier.



As for our hypothesis, we can say that with hyperparameter tuning across different machine learning models, or with more data, we can achieve close to 95% accuracy on the handwritten digits dataset. But we must also keep a good amount of test data; otherwise the accuracy estimate is unreliable and an overfit model can go unnoticed.
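As a rough illustration of the hyperparameter tuning mentioned above, the sketch below runs a small grid search for the SVC and then checks the tuned model on a held-out test set. The grid values, the 75/25 split, and random_state=0 are assumptions chosen for this example, not settings taken from the analysis above.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

digits = datasets.load_digits()

# Hold out 25% of the samples; they are never seen during tuning.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.25,
    stratify=digits.target,
    random_state=0,
)

# Small illustrative grid around the values used earlier (C=100, gamma=0.001).
param_grid = {"C": [1, 10, 100], "gamma": [0.0001, 0.001, 0.01]}
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

# Score the best model once on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_)
print(f"held-out accuracy: {test_accuracy:.3f}")
print("meets the 95% target:", test_accuracy >= 0.95)
```

Tuning on cross-validation folds of the training split and scoring only once on the held-out split keeps the final accuracy figure independent of the tuning process.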
