Re: textbook scanning for teaching purposes



            Thanks for the kind words.  You are in a uniquely thorny position as far as scanning goes because you're trying to work with material that is in two separate languages.  OCR has gotten extremely good for material in a single language, even on pretty sketchy copy, but I don't know whether the same can be said for material in two languages.  As you know from our private exchanges, I've tried scanning one of the books you refer to with the OCR language set to Swedish and again with it set to English, and neither setting gives satisfactory results in both languages at once.

            This is an instance where you could be the avant-garde for your own needs and those of others, too.  I know you've given Tracker Software's PDF-XChange Viewer a try, and what they provide for free is pretty remarkable, but it's not enough to get you what you need.  I do not know whether one of their paid products might work, but it would be worth getting in touch with them to ask.  If they claim that one would, I think it would be entirely reasonable to ask them to process a single file using that software and return the output to you so that you can actually evaluate it before considering purchase.  I doubt that the need for bilingual OCR comes up often, but it's certainly going to come up for someone other than yourself.  I would be shocked if someone has not developed something that supports this, but I'd have to do the same digging as you will to determine who.
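
            For what it's worth, one free option that might be worth a try is the open-source Tesseract OCR engine, whose command line accepts several languages at once when their codes are joined with a plus sign.  Below is a minimal sketch in Python, assuming Tesseract and its Swedish and English language data are installed.  The function names and the file names are just placeholders I made up for illustration, and I can't say whether the results on your books would be any better than what we've already seen.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages=("swe", "eng")):
    """Build a Tesseract command line that loads several languages at once.

    Tesseract takes multiple language codes joined with '+', e.g. 'swe+eng'.
    """
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


def run_bilingual_ocr(image_path, output_base, languages=("swe", "eng")):
    """Run Tesseract on one page image and return the path of the text file."""
    # Assumption: the 'tesseract' executable is installed and on the PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    command = build_tesseract_command(image_path, output_base, languages)
    subprocess.run(command, check=True)
    # Tesseract writes its plain-text result to '<output_base>.txt'.
    return output_base + ".txt"
```

            Calling run_bilingual_ocr("page.png", "page") on a scanned page (a hypothetical file name) would leave the recognized text in page.txt, which you could then check against the original in both languages.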

            As to finding the right assistant, no matter how much we love our loved ones and they love us in return, they're generally not the best option, for several reasons.  This is particularly so if the person is easily frustrated with technology that doesn't function perfectly all the time.  A big part of the job is dealing with the inevitable issues that arise when you stack software layer upon software layer and expect them all to communicate seamlessly with each other.  While we've come a long way in that department, you know better than most that we're not "there yet" as far as true seamlessness goes.

            These days you can pretty much configure monolingual OCR to handle most of the formatting eventualities you typically see in print.  Most packages can easily be set up to handle newspaper-style columns or tables.  There are still problems, though, when a table is laid out in columns with no visible structure, such as ruled lines or a border, around the table itself.  Then the software has to guess whether it's looking at "newspaper columnar" format or "table row and columnar" format, and that's not easy to do absent clear delimiters that suggest one versus the other.
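
            To give a feel for the kind of guess involved, here is a toy sketch in Python.  It is not how any particular OCR product works; it simply assumes that running text in a newspaper column tends to fill most of the column width, while table cells tend to be short.  The function name, the 0.8 threshold, and the sample widths are all made up for illustration.

```python
from statistics import median


def guess_layout(line_widths, column_width, fill_threshold=0.8):
    """Guess 'newspaper' or 'table' from how fully lines fill one column.

    line_widths  -- width of each text line found in the column
    column_width -- full width of the column, in the same units
    """
    if not line_widths or column_width <= 0:
        raise ValueError("need at least one line and a positive column width")
    # Fraction of the column that each line occupies, capped at 1.0.
    fill = [min(width / column_width, 1.0) for width in line_widths]
    # Assumption: wrapped prose mostly runs close to the full column width.
    return "newspaper" if median(fill) >= fill_threshold else "table"
```

            With these assumptions, guess_layout([95, 98, 100, 97, 60], 100) comes out as "newspaper" and guess_layout([20, 35, 15, 40], 100) as "table".  A borderless table whose cells hold long entries would fool it, which is exactly the ambiguity described above.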


