Learning to recognize handwritten digits

The scikit-learn library provides numerous datasets that are useful for testing data analysis and prediction problems, and the Digits dataset is one of them. Some scientists claim that a model can predict the handwritten digit correctly 95% of the time. We perform a data analysis to accept or reject this hypothesis.


In this project we use the Handwritten Digits dataset, which ships with the sklearn library. We can import the dataset as follows:


               from sklearn import datasets
               digits = datasets.load_digits()

Info about Dataset:


                print(digits.DESCR)

OUTPUT:




                main_data = digits['data']
                targets = digits['target']
                len(main_data)
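
Before training anything, it helps to check what we just loaded. The snippet below is a minimal sketch that reloads the dataset so it runs on its own; it only uses attributes of the object returned by load_digits().

```python
from sklearn import datasets

digits = datasets.load_digits()

# Each sample is an 8x8 grayscale image; `data` stores it flattened to 64 features.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)
print(digits.target.shape)  # (1797,)

# The flattened row and the 8x8 image hold the same pixel values.
same_pixels = bool((digits.data[0] == digits.images[0].ravel()).all())
print(same_pixels)          # True
```

So len(main_data) is 1797, and the valid image indices run from 0 to 1796.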






%matplotlib inline
import matplotlib.pyplot as plt

plt.subplot(321)
plt.imshow(digits.images[1791], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(322)
plt.imshow(digits.images[1792], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(323)
plt.imshow(digits.images[1793], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(324)
plt.imshow(digits.images[1794], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(325)
plt.imshow(digits.images[1795], cmap=plt.cm.gray_r,
interpolation='nearest')

plt.subplot(326)
plt.imshow(digits.images[1796], cmap=plt.cm.gray_r,
interpolation='nearest')

OUTPUT:




Support Vector Classifier:


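                         from sklearn import svm

                         # Train on the first 1790 samples and predict the last 6 (indices 1791 to 1796).
                         # Note that sample 1790 falls in neither slice.
                         svc = svm.SVC(gamma=0.001, C=100.)
                         svc.fit(main_data[:1790], targets[:1790])
                         predictions = svc.predict(main_data[1791:])
                         predictions, targets[1791:]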
OUTPUT:

          (array([4, 9, 0, 8, 9, 8]), array([4, 9, 0, 8, 9, 8]))

From SVC we get 100% accuracy, but on a test set of only 6 samples
Training Data : 1790
Test Data : 6
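
Six test samples are too few to judge a classifier, so here is a hedged sketch of the same SVC evaluated on a larger held-out set. The stratified 75/25 split and random_state=0 are our assumptions for illustration; they are not part of the original experiment.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Assumption: hold out 25% of the samples, stratified so every digit is represented.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.25,
    stratify=digits.target,
    random_state=0,
)

# Same hyperparameters as in the post.
svc = svm.SVC(gamma=0.001, C=100.0)
svc.fit(X_train, y_train)

svc_accuracy = accuracy_score(y_test, svc.predict(X_test))
print(f"test samples: {len(y_test)}, accuracy: {svc_accuracy:.4f}")
```

With this split the score is computed on 450 samples instead of 6, so a single lucky or unlucky prediction changes it far less.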




Decision Tree Classifier:


                    from sklearn.tree import DecisionTreeClassifier
                    from sklearn.metrics import accuracy_score, confusion_matrix

                    dt = DecisionTreeClassifier(criterion='gini')
                    dt.fit(main_data[:1600], targets[:1600])

                    # Rows of the confusion matrix are true digits, columns are predicted digits.
                    predictions2 = dt.predict(main_data[1601:])
                    confusion_matrix(targets[1601:], predictions2)
OUTPUT:


       array([[17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
       [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
       [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
       [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
       [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
       [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
       [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
       [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
       [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16]], dtype=int64)

                    accuracy_score(targets[1601:] , predictions2)
OUTPUT:
           0.7857142857142857

From Decision Tree Classifier we get about 78.6% accuracy
Training Data : 1600
Test Data : 196
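
The accuracy score can be recomputed directly from the confusion matrix: correct predictions sit on the diagonal, and the sum of all entries is the number of test samples. The sketch below copies the matrix printed above into NumPy; the variable names are ours.

```python
import numpy as np

# Confusion matrix copied from the decision tree output above
# (rows: true digit, columns: predicted digit).
cm = np.array([
    [17,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 17,  0,  0,  1,  0,  0,  0,  2,  0],
    [ 0,  0, 13,  1,  0,  1,  0,  1,  1,  0],
    [ 0,  2,  2,  9,  0,  3,  2,  4,  0,  0],
    [ 0,  0,  0,  0, 18,  0,  1,  2,  0,  1],
    [ 0,  0,  0,  1,  2, 15,  0,  0,  1,  0],
    [ 0,  0,  0,  1,  2,  0, 19,  0,  0,  0],
    [ 0,  0,  0,  2,  1,  0,  0, 17,  0,  0],
    [ 0,  2,  1,  0,  0,  0,  0,  1, 13,  0],
    [ 0,  1,  0,  0,  0,  0,  0,  2,  1, 16],
])

correct = int(np.trace(cm))   # 154 correct predictions on the diagonal
total = int(cm.sum())         # 196 test samples in total
accuracy = correct / total    # 154 / 196, the accuracy_score reported above

# Per-digit recall: diagonal entry divided by the row sum.
recall = cm.diagonal() / cm.sum(axis=1)
hardest_digit = int(recall.argmin())  # 3 (only 9 of 22 threes are recognised)

print(correct, total, accuracy, hardest_digit)
```

The per-digit view shows that most of the decision tree's errors come from a few digits, which a single accuracy number hides.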



Random Forest Classifier:



                      

from sklearn.ensemble import RandomForestClassifier


rc = RandomForestClassifier(n_estimators = 150)
rc.fit(main_data[:1500] , targets[:1500])
predictions3 = rc.predict(main_data[1501:])
accuracy_score(targets[1501:] , predictions3)
OUTPUT:

0.9222972972972973

From Random Forest Classifier we get about 92.2% accuracy for n_estimators = 150
Training data : 1500
Test Data : 296
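
The three classifiers above were each trained and tested on different slices, so their scores are not directly comparable. One possible way to compare them fairly is cross-validation on identical folds. This is a sketch under our own assumptions (5 shuffled stratified folds, random_state=0), not the procedure used in the experiments above.

```python
from sklearn import datasets, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

digits = datasets.load_digits()

# Same hyperparameters as in the post; random_state is added only for repeatability.
models = {
    "svc": svm.SVC(gamma=0.001, C=100.0),
    "decision_tree": DecisionTreeClassifier(criterion="gini", random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=150, random_state=0),
}

# Assumption: 5 shuffled, stratified folds, so every model sees the same splits.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(
        model, digits.data, digits.target, cv=folds, scoring="accuracy"
    )
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Every sample is used for testing exactly once, so each model is scored on all 1797 digits rather than on the tail of the dataset.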




Conclusion:

Data matters the most: we need a good amount of data for the model. If we have less data, we can use other machine learning classifiers such as random forest, which gives about 92% accuracy with a training set of 1500 samples, fewer than the 1790 used for the Support Vector Classifier.



As per our hypothesis, we can say that with hyperparameter tuning across different machine learning models, or with more data, we can achieve close to 95% accuracy on the handwritten digits dataset. But we must also keep a good amount of test data; otherwise the measured accuracy is unreliable and overfitting can go unnoticed.
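
To put a number on how much test data matters, we can attach a confidence interval to each measured accuracy and check whether 95% falls inside it. The sketch below uses the Wilson score interval, which is our choice of method and not something from the experiments above. The counts are derived from the post's own outputs: 6 of 6 for SVC, 154 of 196 from the decision tree confusion matrix, and 273 of 296 for the random forest (main_data[1501:] holds 296 samples, and 273 / 296 gives the reported 0.9223).

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95% confidence)."""
    p_hat = correct / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# (correct predictions, test samples), derived from the outputs shown in the post.
results = {
    "svc": (6, 6),
    "decision_tree": (154, 196),
    "random_forest": (273, 296),
}

intervals = {}
for name, (correct, total) in results.items():
    low, high = wilson_interval(correct, total)
    intervals[name] = (low, high)
    contains_claim = low <= 0.95 <= high
    print(f"{name}: [{low:.3f}, {high:.3f}] contains 0.95: {contains_claim}")
```

The 6-sample SVC test gives an interval so wide that it can neither support nor refute the 95% claim, which is the practical reason to keep a larger test set.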
