I have experience working with Tesseract OCR in an iPhone application. This project is doable, but I need a few clarifications:
- How is the text presented in the video (is it clear, slanted, partially covered, etc.)?
- Do the videos cover a wide variety of types? (This matters because the text would then appear in a wider range of styles.)
- What language is the text in? Tesseract's recognition rate varies by language, with English being the highest.
- Would we need to train Tesseract on additional data for this text? If so, would you provide the sample images?
As for the application itself: do you simply want a program that analyzes each video and builds a database mapping each timestamp to the text shown at that point, and then use that same program to load a video and search it for a given piece of text?
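To make the idea concrete, here is a minimal sketch of the timestamp-indexing approach I have in mind. It assumes the OpenCV (`cv2`) and `pytesseract` packages; the function names (`index_video`, `search`) and the one-frame-per-second sampling interval are my own illustrative choices, not part of any existing codebase.

```python
def index_video(path, interval_s=1.0, lang="eng"):
    """OCR one frame every `interval_s` seconds; return {timestamp_s: text}.

    cv2 and pytesseract are imported lazily here so the pure-Python
    search helper below works without these third-party packages.
    """
    import cv2
    import pytesseract

    index = {}
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    step = max(1, int(fps * interval_s))     # frames to skip between samples
    frame_no = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_no % step == 0:
            # Grayscale usually helps Tesseract; heavier preprocessing
            # (thresholding, deskewing) may be needed for slanted text.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray, lang=lang).strip()
            if text:
                index[frame_no / fps] = text
        frame_no += 1
    cap.release()
    return index

def search(index, query):
    """Return sorted timestamps whose OCR'd text contains `query` (case-insensitive)."""
    q = query.lower()
    return sorted(t for t, text in index.items() if q in text.lower())
```

The same index could later be persisted to a database so the search side does not have to re-analyze the video each time.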