Re: OCR and JPG files


 

On Fri, Dec 14, 2018 at 07:27 AM, Kendra Schafer wrote:
I need to be able to convert PDFs into something that's accessible.
That can range from dirt simple to well-nigh impossible, depending on the nature of the source image PDF itself (photo text, more commonly known as an image PDF in the PDF world).

If it's something like a straight business letter, a newspaper article, or the like, OCR processing these days generally goes without a hitch. If, however, it's something like a court document, where the body text in the original is entered into a template that has line numbers down each side of the text, then it becomes very ugly indeed, because OCR software will treat those line numbers as part of the text itself. Their actual purpose is just to serve as a place-identification aid: if two people are discussing the same document over the phone, one can get the other to the same page and the same line on that page very quickly.

I posted the following on the NVDA group quite a while back, but the information is still entirely applicable: Free & Good OCR Software for Image Scanned PDFs
Even though PDF-XChange Viewer has been discontinued, it is still available for download and works splendidly for OCRing image PDFs. It can also save the resulting text layer permanently, so the text is already there if you will be revisiting the same files repeatedly.
 
--

Brian - Windows 10 Home, 64-Bit, Version 1809, Build 17763  

   Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.

         ~ H.L. Mencken, AKA The Sage of Baltimore

Join main@jfw.groups.io to automatically receive all group messages.