Digital Information Research Foundation
Faculty of Health, Engineering and Science
School of Computer and Security Science
Text in images is a very important clue for image indexing and retrieving. Unfortunately, it is a challenging work to accurately and robustly extract text from a complex background image. In this paper, a novel region-based text extraction method is proposed. In doing so, the candidate text regions are detected by 8-connected objects detection algorithm based on the edge image. Then the non-text regions are filtered out using shape, texture and stroke width rules. Finally, the remaining regions are grouped into text lines. Since stroke width is the intrinsic and particular characteristics of the text, the accuracy of the non-text filter are notably promoted. The improved Stroke Width Transform in the paper is less computing complexities and more accurate. Experimental results on sample ICDAR competition Dataset and our dataset show that the proposed method has the best performance compared with other five methods.