In [1]:
!pwd


/media/sf_vm_shared_folder/git/uat_shl/rnd04ocr

In [2]:
!ls -l


total 13
drwxrwx--- 1 root vboxsf 4096 Dec 29 15:52 image
-rwxrwx--- 1 root vboxsf  251 Dec 29 15:48 out_v001.txt
-rwxrwx--- 1 root vboxsf 2643 Dec 29 15:48 tesseract.ipynb
-rwxrwx--- 1 root vboxsf 2643 Dec 29 15:48 tesseract-iss.ipynb

In [3]:
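# chi_sim data, -psm 4 (single column of variable-sized text); the digits config limits output to digits, '-' and '.'
!tesseract image/ocr.png out -l chi_sim -psm 4 digits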


Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Detected 18 diacritics

In [4]:
!cat out.txt


51    

      

2017 4 15                 89400 
    1219 

     52273 .          
    30 1  30         -

     0 11. 00  
    .00 11 30      -300 -200 -100

11  29  55
59400
2011 04 15 11 29 52
59100 59100

    

           51    1     17-8 4   
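
The shell command above can also be assembled from Python, which makes it easier to vary the language, page segmentation mode and config file between runs. This is a minimal sketch, not part of the original notebook: `build_tesseract_cmd` and its `legacy_flags` parameter are hypothetical names. It assumes the `-psm` spelling used by Tesseract 3.04 here, and offers `--psm`, the spelling used by Tesseract 4 and later.

```python
from typing import List, Optional


def build_tesseract_cmd(
    image: str,
    out_base: str = "out",
    lang: str = "eng",
    psm: int = 4,
    config: Optional[str] = None,
    legacy_flags: bool = True,
) -> List[str]:
    """Build an argument list equivalent to the notebook's shell commands.

    legacy_flags=True emits "-psm" (Tesseract 3.x, as in this notebook);
    legacy_flags=False emits "--psm" (Tesseract 4 and later).
    """
    psm_flag = "-psm" if legacy_flags else "--psm"
    cmd = ["tesseract", image, out_base, "-l", lang, psm_flag, str(psm)]
    if config:
        # Optional config file name, e.g. "digits", appended last.
        cmd.append(config)
    return cmd


# Reproduces the command line from cell In [3].
print(" ".join(build_tesseract_cmd("image/ocr.png", lang="chi_sim", config="digits")))
```

The returned list can be passed directly to `subprocess.run` without going through a shell.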

 


In [13]:
# bad: the digits config forces the English text into digit strings
!tesseract image/ocr-gy.jpg out -l eng -psm 4 digits


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [14]:
!cat out.txt


  

1111 118111815 611
11111 1111111231119. 0101.

1121111 131111111 1131116111113 1311195 171193111111 0173111111 1319
0150111901192 3011 1111811 1181118 17183115 11193111111
81161 19336119116111.

141.4 7 2

1 111135 110111 11816 111 1.1 .11 17111 111811 1118111210

5133

321331 1512113 10111131 101 11111 11931 2.151113111111111 11631
3 111 11.111151111131613 111611 2931311011.

15111305911 10 17131111 111191511 91111101117161 1 6311
1181121111 5136311 1.111911 31 1319591111 131111110 53110 31101

1101.1 1101 17131 1193111

1111166 10 1118812 510111

 


In [11]:
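# good: without the digits config, -psm 4 (single column of variable-sized text) recovers the English text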
!tesseract image/ocr-gy.jpg out -l eng -psm 4


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [12]:
!cat out.txt


  

My name is GU
my nickname Di Di.

My family name W; carries meaning of ’Taking care
of someone’; and given name means ’Reading
and experiencing’.

any , 2

l was born here in u s s’ij: then went to

f“?

E“§ g‘E (China) for my year 2. I spent my year
3 in «n: mg ”a? (New Zealand).

Exposed to many linguistic environment, I can
hardly speak well at present, but I do say to all of

you from my heart:

‘Nice to meet you!

 


In [33]:
# bad: -psm 6 (single uniform block of text) surrounds the text with noise characters
!tesseract image/ocr-gy.jpg out -l eng -psm 6


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [34]:
!cat out.txt


illilll‘i‘i 1‘ 1 a I %“W“““’WWW§“ _
g “ ‘ 1 "a: *3 533 .i at , .. V _ l"? M llll , i , ‘ :w
$Wl§3 a :3 3"? £333 w“? y m «W :33 rt .. twill: ii 1 “fly ‘ ‘7 ‘ , ‘ '_ L: E
3g“? [1‘6“ flwflfi“ N3 ‘3 3“ =13 lllrmW "3 ll; ”Em-i “- m :~   f ‘ i 1 H y 1’ ,4 ‘2:
NW name is GU 31793. (Hardy). Many can ' ° h » 3
my nickname Di Di. ‘ ‘ i
My family name W; carries meaning of ’Taking care 7‘__ y a
0f someone’; and given name means ’Reading ‘ :3} H" 5
a nd expe rienci ng’. ‘ ,- fig; 53;“ ABM
l was born here in iii. w :53 v’fj: then went to 3' 7 _
TE; 33 E 3;: (China) for my year 2. I spent my year > -\ x , m
3 in «a a: 333 :3 3"" {I} (New Zealand). _ _ ~ y ,.
Exposed to many lingUIstic envnronment, I can i ' f a;
hardly speak well at present, but I do say to all of :uw: -_ § ‘3; :4
you from my heart: 1 , _V
Nice to meet you! m ‘ «WW. 7* . w .
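
Since the same image gives very different results under -psm 4 and -psm 6, it can help to run several modes in one go and compare the text side by side. The following is a rough sketch under stated assumptions: `sweep_psm` is a hypothetical helper, it assumes a `tesseract` binary on the PATH that accepts the `-psm` flag (as 3.04 does; newer releases expect `--psm`), and it returns an empty string for any mode that fails.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def sweep_psm(image: str, modes: Iterable[int] = (4, 6), lang: str = "eng") -> Dict[int, str]:
    """Run tesseract once per page segmentation mode and collect the text."""
    if shutil.which("tesseract") is None:
        return {}  # tesseract is not installed: nothing to compare
    results: Dict[int, str] = {}
    with tempfile.TemporaryDirectory() as tmp:
        for mode in modes:
            out_base = Path(tmp) / f"psm{mode}"
            # tesseract appends ".txt" to the output base name.
            proc = subprocess.run(
                ["tesseract", image, str(out_base), "-l", lang, "-psm", str(mode)],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
            txt = out_base.with_suffix(".txt")
            if proc.returncode == 0 and txt.exists():
                results[mode] = txt.read_text(encoding="utf-8", errors="replace")
            else:
                results[mode] = ""  # failed run: keep the key, leave the text empty
    return results


if __name__ == "__main__":
    for mode, text in sweep_psm("image/ocr-gy.jpg").items():
        print(f"--- psm {mode} ---")
        print(text)
```

Writing each run to its own temporary file avoids the notebook's pattern of overwriting a single out.txt between cells.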


In [15]:
# good: -psm 4 (single column of variable-sized text) reads most labels in the screenshot
!tesseract image/ocr-lt.png out -l eng -psm 4


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [16]:
!cat out.txt


22:23 22.8K/s >3 'Zl' ’9‘ 45‘ AI“ Si-- x Emergency c-- @26%

E Product Q E

Overview Product Details Q&A Ratings&

 

XIAOMI Mi Notebook Air 12.5” Gold
(Export)

SSE—999790

Installment: 12 X

(4)

 


In [31]:
# good: -psm 6 (single uniform block of text) gives similar text with fewer blank lines
!tesseract image/ocr-lt.png out -l eng -psm 6


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [32]:
!cat out.txt


22:23 22.8K/s >3 'Zl' ’9‘ 4? Alll Si-- x Emergency c-- @26%
E Product Q !
Overview Product Details Q&A Ratings &
XIAOMI Mi Notebook Air 12.5” Gold

(Export)
869%
Installment: 12 X
(4)
51‘ _ V 4'
’_..‘s°x


In [35]:
# bad: -psm 4 recognizes no text in this image (out.txt is effectively empty)
!tesseract image/ocr-hr.jpg out -l eng -psm 4


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [36]:
!cat out.txt


 


In [39]:
# fair: -psm 4 (single column of variable-sized text) returns only a few garbled fragments
!tesseract image/ocr-hr1.jpg out -l eng -psm 4


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [40]:
!cat out.txt


.3x

”A

455; MW); In «WM


In [45]:
# fair: -psm 6 (single uniform block of text) returns one garbled line
!tesseract image/ocr-hr1.jpg out -l eng -psm 6


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [46]:
!cat out.txt


.Sx : 4622 may); “w “hawk


In [41]:
# fair: -psm 7 (single text line) gives the same output as -psm 6
!tesseract image/ocr-hr1.jpg out -l eng -psm 7


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [42]:
!cat out.txt


.Sx : 4622 may); “w “hawk


In [43]:
# fair: -psm 8 (single word) collapses the line into one garbled token
!tesseract image/ocr-hr1.jpg out -l eng -psm 8


Tesseract Open Source OCR Engine v3.04.01 with Leptonica

In [44]:
!cat out.txt


.3x:46$29+wl>51nwwk
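
The good / fair / bad labels in the cells above are judged by eye. As a rough aid, a simple score can flag outputs that contain few plain words. This is only a heuristic sketch, not something the notebook uses: `word_like_ratio` is a hypothetical helper, and counting tokens of two or more ASCII letters is an assumption that suits English output only.

```python
import re

# A token counts as "word-like" if it is 2+ ASCII letters,
# optionally followed by a single punctuation mark.
_WORD = re.compile(r"^[A-Za-z]{2,}[.,;:!?]?$")


def word_like_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that look like words."""
    tokens = text.split()
    if not tokens:
        return 0.0  # empty OCR output scores zero
    hits = sum(1 for token in tokens if _WORD.match(token))
    return hits / len(tokens)


# Sample lines copied from the outputs above.
for sample in ("My name is GU", ".3x:46$29+wl>51nwwk", ""):
    print(repr(sample), word_like_ratio(sample))
```

A higher ratio suggests cleaner English text. The score says nothing about whether the recognized words are the right ones.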

