Re: OCR and JPG files


On Thu, Dec 13, 2018 at 09:44 PM, Flor Lynch wrote:
Many sighted people genuinely don't know the difference between photo-text and 'real' text (as we may call it), unless it is explained to them.
And many don't really get it even after it's explained to them. Sometimes a demo helps: a screen reader user opens an image-scanned PDF of a book, article, or similar document, then a PDF that is actually text-based, and the contrast gets the point across.

I like the explanation given here: an image PDF (or just a plain JPG image) of text is a photograph of written information, while an actual text file can be manipulated directly by editing software. If someone has the full Adobe Acrobat, not just Reader, you can make the point by having them open an image PDF and try to edit it. That drives the point home very quickly.

I've worked on multiple occasions with graduate students who get reading assignments as PDFs, often image-scanned PDFs. I have had to teach them how to OCR these files outside the screen reader for the sake of long-term convenience. I've also tried to teach them how to explain to their instructors why it matters to actually OCR these files and save the resulting text layer in them, so that future blind students won't hit this barrier. It's simple to do in most cases, and once it's done you just use the OCRed version of the file going forward. For documents like the legal one I described above, someone sighted can print it out, trim off or fold away the template part that isn't actual content, and then scan it with OCR. If they want to keep the original numbered version for "sighted reference" alongside the specially prepared version for screen reader users, that's simple to do.
--

Brian - Windows 10 Home, 64-Bit, Version 1809, Build 17763  

   Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.

         ~ H.L. Mencken, AKA The Sage of Baltimore

Join main@jfw.groups.io to automatically receive all group messages.