Optical character recognition: teaching computers to "read" the world
Point your phone camera at a French menu, and the screen translates the dishes into Chinese in real time. Scan books in batches, and a whole library's collection becomes e-books. Send cars through the streets to photograph street scenes, and the system automatically extracts signs from the images to make map information more accurate... Behind all of these scenarios lies one shared key technology: Optical Character Recognition (OCR).
Eyes that read the world
Douglas Engelbart, the inventor of the mouse, once offered an alternative reading of "AI": not Artificial Intelligence but Augmented Intelligence, that is, intelligence enhancement. In his view, people are already smart enough; rather than trying to duplicate humans, we can take a more practical approach and use machines to extend and enhance human intelligence. Smart glasses are one such product: put on a pair, walk into a supermarket, look at the words on a product, and its details are looked up automatically, such as the manufacturer and its price on different e-commerce platforms.
What lets smart glasses "read" words is OCR technology. OCR essentially uses an optical device to capture an image; whether the device is today's phone or camera or a future smart wearable, any text it captures can be recognized. Imagine a work meeting in the future: simply point a camera-equipped device such as a phone at the whiteboard, and the system automatically recognizes what is written there, sorts out the follow-up work for each person involved, and stores the pending items in everyone's electronic calendar. With OCR technology, such a scenario is no longer just a dream.
Office Lens, the app Microsoft launched last year, takes a small step toward realizing this dream. Built on core technology from the speech group at Microsoft Research, it can already use computer vision to clean up images automatically, then apply cloud-based OCR to recognize the text in them, and finally return an editable, searchable digital file to the user.
Achievements and challenges
OCR technology has been refined over more than half a century of exploration. In the early 1950s, IBM began using OCR to digitize all kinds of documents. But early OCR equipment was large and complex, and could only handle specific printed fonts on clean backgrounds. In the 1980s, the arrival of the flatbed scanner brought OCR into its commercial stage: the equipment became more compact and could handle more fonts, but it still placed strict demands on the background behind the text, and good image quality was needed to ensure good results.
By the 1990s, flatbed scanners recognized printed text with accuracy above 99%, marking the first boom in OCR applications. One of the best-known examples is Google's digital library project, for which Google even patented a scanning method that made high-speed scanning possible at massive scale. Handwriting recognition developed in parallel during this period and was widely used for mail sorting, check digitization, handwritten forms, and similar tasks.
These achievements once led many to believe that OCR had reached its limits. But since 2004, when smartphones with 3-megapixel cameras appeared, OCR has taken on a new pursuit: more and more people pick up their phones to photograph the objects and scenes they see. Recognizing text in such natural scenes is far harder than on a flatbed scanner; even printed text cannot reach high recognition rates, let alone handwriting. Academia has therefore come to treat text recognition in natural scenes as a research topic in its own right.
Text detection in natural scenes
Recognizing text in natural scene images is much harder than recognizing text in scanned images, because scene text is highly diverse and unpredictable. The language of the text, font parameters, and the arrangement and alignment of text lines all affect recognition. And because photos are taken casually, text regions in the image may also be deformed (by perspective and affine transformations), incomplete, blurred, or broken.
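To make the difference between affine and perspective deformation concrete, the sketch below maps the corners of an upright text box through two 3x3 matrices in homogeneous coordinates. The function name `warp` and the specific matrices are illustrative choices, not taken from the article: an affine shear keeps opposite sides of the box parallel, while a perspective matrix (non-zero bottom row) turns the box into a trapezoid, the kind of distortion a scene-text detector has to tolerate.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def warp(H: Sequence[Sequence[float]], pts: List[Point]) -> List[Point]:
    """Map 2-D points through a 3x3 matrix in homogeneous coordinates."""
    out = []
    for x, y in pts:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        out.append((u / w, v / w))  # w == 1 for every affine matrix
    return out


# Corners of an upright 100 x 30 text box: top-left, top-right, bottom-right, bottom-left.
box = [(0.0, 0.0), (100.0, 0.0), (100.0, 30.0), (0.0, 30.0)]

shear = [[1.0, 0.3, 0.0],   # affine: bottom row is (0, 0, 1)
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
tilt = [[1.0, 0.0, 0.0],    # perspective: non-zero entry in the bottom row
        [0.0, 1.0, 0.0],
        [0.001, 0.0, 1.0]]

sheared = warp(shear, box)  # parallelogram: opposite sides stay parallel
tilted = warp(tilt, box)    # trapezoid: the right edge shrinks as if farther away
print([(round(x, 1), round(y, 1)) for x, y in sheared])
# [(0.0, 0.0), (100.0, 0.0), (109.0, 30.0), (9.0, 30.0)]
print([(round(x, 1), round(y, 1)) for x, y in tilted])
# [(0.0, 0.0), (90.9, 0.0), (90.9, 27.3), (0.0, 30.0)]
```

A rectangle fit to the sheared or tilted box would include background pixels, which is one reason detectors work with connected regions rather than assuming axis-aligned boxes.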
In addition, natural scene images have far more complex backgrounds than the scanned document images handled by traditional OCR. Text may be written on all kinds of surfaces; text regions may contain complex textures and noise; and non-text regions may have textures very similar to text (such as windows, leaves, fences, and brick walls). These complex backgrounds greatly increase the false detection rate.
OCR can usually be divided into two steps: first, text detection, which locates the text in the image; then, text recognition, which identifies what the text says. The Microsoft Research Asia team made targeted optimizations and innovations in the relevant techniques and algorithms, improving text detection.
Text detection first cuts out of the image the regions that may be characters, the "candidate connected regions", and then classifies each one as text or non-text. A letter or character may consist of several connected regions: the letter "o" has only one, while "i" has two. To determine the candidate connected regions, we adopted an innovative Contrasting Extremal Region (CER) approach, which selects extremal regions that have a certain contrast with the background around them. This sharply narrows the set of candidates and so improves the efficiency of the algorithm.
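The idea of contrasting extremal regions can be illustrated with a minimal pure-Python sketch. Thresholding a grayscale image at a level t and taking the connected components of the pixels at or below t yields extremal regions (every pixel inside is darker than every pixel on the region's outer boundary); the sketch then keeps only those whose surroundings are at least `min_contrast` brighter. It assumes dark text on a light background, 4-connectivity, a hand-picked threshold list, and a contrast score defined as the mean of the one-pixel ring around a region minus the mean inside it. These choices and the function names are hypothetical illustrations, not the team's published algorithm.

```python
from collections import deque
from typing import FrozenSet, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col)
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)


def connected_components(mask: List[List[bool]]) -> List[FrozenSet[Pixel]]:
    """Return the 4-connected regions of True pixels as frozensets of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), set()
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                comp.add((cy, cx))
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(frozenset(comp))
    return regions


def outer_ring(region: FrozenSet[Pixel], h: int, w: int) -> Set[Pixel]:
    """Pixels that touch the region (4-connectivity) but lie outside it."""
    ring = set()
    for y, x in region:
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                ring.add((ny, nx))
    return ring


def contrasting_extremal_regions(gray, thresholds, min_contrast):
    """Dark extremal regions whose surrounding ring is at least min_contrast brighter."""
    h, w = len(gray), len(gray[0])
    candidates = set()  # frozensets, so a region found at several thresholds is kept once
    for t in thresholds:
        mask = [[gray[y][x] <= t for x in range(w)] for y in range(h)]
        for region in connected_components(mask):
            ring = outer_ring(region, h, w)
            if not ring:  # region fills the whole image: no background to compare with
                continue
            inside = sum(gray[y][x] for y, x in region) / len(region)
            outside = sum(gray[y][x] for y, x in ring) / len(ring)
            if outside - inside >= min_contrast:
                candidates.add(region)
    return candidates


image = [
    [200, 200, 200, 200, 200],
    [200,  20, 200, 170, 200],  # dot of an "i" and a faint smudge
    [200, 200, 200, 200, 200],
    [200,  20, 200,  30, 200],  # stem of the "i" and a second dark stroke
    [200,  20, 200,  30, 200],
    [200, 200, 200, 200, 200],
]
regions = contrasting_extremal_regions(image, thresholds=[50, 180], min_contrast=100)
print(len(regions))  # 3: the faint smudge is rejected for low contrast
```

The dot and stem of the "i" come back as two separate regions, matching the connected-region count described in the text; deciding that they form one character is left to later stages.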
Example of CERs affected by noise. Image: HuoJiang
When an image is blurred because of low resolution, or is very noisy, the extracted CERs are likely to contain redundant pixels or noise, which makes the subsequent text/non-text classification harder. To improve the quality of the candidate connected regions, the Microsoft Research Asia team added a step that enhances the CERs. We make as much use of the image's color information as possible, filtering out redundant pixels and noise in the CERs within a color space that matches human visual perception. This color space is insensitive to lighting and closer to how the human eye judges color.
Example of candidate connected regions extracted by the algorithm. Image: HuoJiang
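The article does not name the color space, so the following sketch assumes CIELAB, a widely used perceptually uniform space that separates lightness (L*) from color (a*, b*). It converts each sRGB pixel of a candidate region to Lab using the standard D65 formulas, takes the per-channel median as the region's reference color, and drops pixels whose Euclidean distance (the CIE76 color difference) exceeds a threshold. The median-based rule, the threshold of 20, and the function names are hypothetical, meant only to show how color can separate stroke pixels from background leakage.

```python
import math
import statistics
from typing import Tuple

RGB = Tuple[int, int, int]
Lab = Tuple[float, float, float]

XN, YN, ZN = 0.95047, 1.0, 1.08883  # D65 reference white (assumes sRGB input)


def _srgb_to_linear(v: int) -> float:
    c = v / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(rgb: RGB) -> Lab:
    r, g, b = (_srgb_to_linear(v) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def prune_region(pixels, max_delta_e=20.0):
    """Keep region pixels whose Lab color is close to the region's median color.

    pixels: list of ((row, col), (r, g, b)). Returns the kept (row, col) coordinates.
    """
    labs = [srgb_to_lab(rgb) for _, rgb in pixels]
    # A per-channel median is less swayed by stray background pixels than the mean.
    ref = tuple(statistics.median(lab[i] for lab in labs) for i in range(3))
    return [coord for (coord, _), lab in zip(pixels, labs)
            if math.dist(lab, ref) <= max_delta_e]  # CIE76 color difference


stroke = [((0, i), (150, 20, 20)) for i in range(5)]             # dark red stroke pixels
noise = [((1, 0), (128, 128, 128)), ((1, 1), (255, 255, 255))]  # grey and white leakage
print(prune_region(stroke + noise))  # [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]
```

To be even less sensitive to lighting, one variant would compare only the a* and b* channels and ignore L*; whether the team did anything like this is not stated.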
Once the system has obtained high-quality candidate regions, it must classify them to decide whether each one is text or non-text. The Microsoft Research Asia team proposed a text/non-text classification algorithm based on a set of shallow neural networks, which is more effective than earlier methods. Based on the characteristics of the text itself, the algorithm partitions the original problem space into 5 subspaces, each corresponding to one type of text sample, and assigns every candidate connected region to one of these five types. Each subspace has its own shallow neural network serving as that subspace's text/non-text classifier. We can treat each network as a black box: after learning from a large number of samples, it can separate text from non-text fairly accurately.
Text samples in the partitioned problem space. Image: HuoJiang
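The partition-then-classify structure can be sketched as a router plus one small network per subspace. The article says only that the partition follows "the characteristics of the text itself", so routing by bounding-box aspect ratio here is a hypothetical stand-in, as are the bin edges, the feature vector, and the placeholder weights; training is omitted. Each `ShallowNet` is a single hidden layer of tanh units with a sigmoid output read as the probability of text.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical partition: bin candidates by aspect ratio (width / height).
# Four edges give five subspaces: 0 = very narrow ... 4 = very wide.
ASPECT_EDGES = [0.3, 0.6, 1.0, 1.8]


def subspace_of(width: float, height: float) -> int:
    """Index (0-4) of the subspace a candidate region is routed to."""
    return bisect.bisect_right(ASPECT_EDGES, width / height)


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


@dataclass
class ShallowNet:
    """One hidden layer of tanh units and a sigmoid output giving P(text)."""
    w_hidden: List[List[float]]  # n_hidden x n_features
    b_hidden: List[float]        # n_hidden
    w_out: List[float]           # n_hidden
    b_out: float

    def predict_proba(self, features: Sequence[float]) -> float:
        hidden = [math.tanh(sum(w * f for w, f in zip(row, features)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        return _sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)


class PartitionedTextClassifier:
    """Routes each candidate to its subspace and asks that subspace's own network."""

    def __init__(self, nets: List[ShallowNet]):
        if len(nets) != len(ASPECT_EDGES) + 1:
            raise ValueError("expected exactly one network per subspace")
        self.nets = nets

    def is_text(self, width, height, features, threshold=0.5) -> bool:
        net = self.nets[subspace_of(width, height)]
        return net.predict_proba(features) >= threshold


def constant_net(bias: float) -> ShallowNet:
    """Placeholder (untrained) net: with zero weights its output is sigmoid(bias)."""
    return ShallowNet([[0.0, 0.0]], [0.0], [0.0], bias)


clf = PartitionedTextClassifier([constant_net(2.0)] * 4 + [constant_net(-2.0)])
print(subspace_of(8, 10), clf.is_text(8, 10, [0.4, 0.7]))    # 2 True
print(subspace_of(30, 10), clf.is_text(30, 10, [0.4, 0.7]))  # 4 False
```

Splitting the problem this way lets each small network specialize in one kind of sample, which is plausibly why a set of shallow networks can beat a single classifier on such varied data.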
These improvements have greatly enhanced OCR in natural scenes. Previously, on the standard benchmark dataset for natural-scene text detection, the industry's best technique reached a precision of 88.5% but a recall of only 66.5%. In August 2014, at the International Conference on Pattern Recognition (ICPR) in Stockholm, Sweden, the Microsoft Research Asia team's method achieved a precision of 92.1% and a recall of 92.3% in natural-scene text detection tests. As research continues to break new ground, OCR will give rise to ever more exciting applications.
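Precision and recall pull in different directions, so text-detection benchmarks usually summarize them with the F-measure, their harmonic mean. The short sketch below assumes simple one-to-one match counts (real benchmarks use region-overlap matching rules, omitted here) and reads the two quoted figures of each result as (precision, recall).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(true_positives: int, detected: int, ground_truth: int):
    """Precision, recall, and F-measure from match counts."""
    precision = true_positives / detected   # share of detections that are real text
    recall = true_positives / ground_truth  # share of real text that was found
    return precision, recall, f_measure(precision, recall)


print(round(f_measure(0.885, 0.665), 3))  # 0.759, the earlier best result
print(round(f_measure(0.921, 0.923), 3))  # 0.922, the 2014 result
```

By this formula, most of the improvement comes from recall: the earlier method found only about two thirds of the text, while the 2014 method found over 92% without giving up precision.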