I've been passively following this incredible forum for a long time and have finally signed up. Thanks for all the input and information - I have learned so much. I would like to give back a bit by showing how to quickly and painlessly extract subtitles from an HD TS file, decrypted and copied to my PC, and convert them to SRT or any other format you wish.

DVB subs are extremely limited in terms of the ability to play them back. I've been searching forever for a way to do this, but never found any concrete solutions: lots of suggestions that haven't worked, or incredibly elaborate solutions that require all kinds of software but also never seem to work. Today, I found the solution, which has been in front of me all this time.

I have used an hour-long HD TS file, decrypted and copied to my PC, which has been edited in VideoRedo so as to keep the DVB subtitles intact.

(i) Download and install 'Subtitle Edit'.
(ii) Download 'Tesseract OCR' (tesseract-ocr-setup-3.02.02.exe) and install.
(iii) Copy & paste the 'tessdata' folder and 'tesseract.exe' file from C:\Program Files (x86)\Tesseract-OCR to C:\Program Files (x86)\Subtitle Edit\Tesseract. Agree to move and replace the existing files of the same name.

Phenest wrote: I figured that any word that consists solely of capital i's is ok, e.g. Roman numerals, but any capital i not at the beginning of a word needs changing.
But it does depend on whether the OCR engine comes across a lower case L or a capital i first, as to what needs changing and what to look for. The obvious answer there is to always choose, for example, a capital i even if the first letter recognised is a lower case L.

You said that a word consisting of upper case I only is fine, even if there is more than one. The former will not spell the word "ill" under your scheme, while the latter will be littered with lower case l.
As I already indicated, the start of a sentence will break your initial-I rule, as will proper nouns. Your proposed algorithm is totally flawed. I'd be inclined to dump the output of gocr to ispell (or aspell, or ...). If you really are hell bent on not using a dictionary, then you need a better algorithm because what you have will not work.

Came here trying to find a better method, and read the post. I've been working with SRT files for about 2+ years now, and yes, the small L and the capital i are a pain and my biggest problem. Now, this does take quite a bit of extra time, but it will usually clean things up.

Using Gaupol, use spell-check first to correct most errors. Next, go to the TEXT tab and then "Find and Replace". I start by searching for small L in the Find box, then hit Find Next, look at the match and determine whether it needs changing; if not, Find Next again. If it needs to be a capital i, I put a capital i in the Replace box. Go through the whole document this way, correcting all the L/i errors.

Next, run again with small L in the Find box. Be sure to put capital i in the Replace box. Then repeat as above, deciding which ones need to be changed. An additional 3rd run using small L will usually finish off all these types of errors, and yes, it does take a lot of time.

Hope this helps someone else.
Duplicate each capital i in the Replace box with each pass above.
And yes, I am willing to learn a better method, if any, for sure.
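For what it's worth, the dictionary idea raised in the l/I discussion above can be roughed out in a few lines of Python. This is only a minimal sketch, not anything Gaupol, gocr or aspell actually does: `WORDS` is a tiny made-up word list standing in for a real dictionary, and `fix_word` / `fix_text` are hypothetical helper names. Each word has its l/I positions tried both ways and is changed only when exactly one spelling is a known word; words made purely of capital i's (Roman numerals) are left alone, as Phenest suggested.

```python
import itertools
import re

# Hypothetical stand-in for a real dictionary (for example a word list
# exported from a spell checker); a real run would need far more words.
WORDS = {"I", "I'll", "ill", "will", "still", "hello", "it", "in", "all"}

AMBIGUOUS = "lI"  # lower case L and capital i, which OCR tends to mix up


def is_known(word):
    """True if the word is listed as-is, or is a listed word capitalised."""
    return word in WORDS or (word == word.capitalize() and word.lower() in WORDS)


def fix_word(word):
    """Return the single dictionary spelling of word, or word unchanged."""
    count = sum(ch in AMBIGUOUS for ch in word)
    if count == 0 or count > 10:
        return word  # nothing to fix, or too many combinations to try
    if set(word) == {"I"}:
        return word  # only capital i's: treat as Roman numeral or pronoun
    # Try every combination of l/I at each ambiguous position.
    slots = [AMBIGUOUS if ch in AMBIGUOUS else ch for ch in word]
    known = [c for c in map("".join, itertools.product(*slots)) if is_known(c)]
    if word in known:
        return word  # already a valid spelling
    if len(known) == 1:
        return known[0]  # exactly one valid spelling: use it
    return word  # unknown or ambiguous: leave for manual review


def fix_text(text):
    """Apply fix_word to every run of letters and apostrophes in text."""
    return re.sub(r"[A-Za-z']+", lambda m: fix_word(m.group(0)), text)


print(fix_text("l think I'II be iII, but it wiII pass. Rocky III is stiII on."))
```

Anything the word list does not recognise (names, slang, words with more than one valid spelling) is left untouched, so a manual Find and Replace pass would still be needed for the leftovers.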