#slfjslf i'm going to need this again so i'm putting this here
Tesseract
you can loop through images and turn them into text files and searchable pdfs!!
excited rant + such below
look at my sexy sexy txt file!!
1. Brandt Aymar, The Young Male Figure in Paintings, Sculptures, and Drawings From Ancient Egypt to the Present (New York, 1972), 132. Among the author’s other fascinating publications listed at the beginning are Cruising is Fun and The Complete Cruiser.
2. See the exhibition catalogue Hippolyte, Auguste et Paul Flandrin: Une fraternité picturale au XIXe siécle Paris (Musée du Luxembourg, 1984), no. 14, 70-71, where a complete description of the painting’s genesis and some of its sub- sequent influence is recounted.
3. Judith Butler, “Imitation and Gender Insubordiation,” Inside/Out: Lesbian Theories, Gay Theories, ed. Diana Fuss (New York and London, 1991), 20-23.
4. Tbid. “Ses heureux débuts s’il [Flandrin] n’eut pas affecté de la resserrer, de maniére a ne representer qu’un modgle accroupi, comme certaines figures égyptiennes; il eut mieux fait de lui donner plus de développement.”
i only downloaded the english trained data and it still did so well on the french! i mean it wrote Tbid instead of Ibid and modgle instead of modèle, but it’s already so perfect! such a good ocr so very good for me the sweetest sweetest bean. free open source stuffs are the loveliest things
anyways, if you want to get in on the jpg to txt/searchable pdf action: it’s my first time using the command line for Personal Reasons, so if you’re a bit confused about it all like i am, have my horrible tutorial:
follow the install instructions here:
https://tesseract-ocr.github.io/tessdoc/Compiling.html
and then if you click on this link, you can download the eng.traineddata and drag it to your tessdata folder :
https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata
then you’ve got to go to commandline and set your TESSDATA_PREFIX to be the path to your tessdata folder. i did :
export TESSDATA_PREFIX=/Users/ghostplantss/tesseract/tessdata
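but you’ll have to change the path to suit your thingamajig. (the export only lasts for the terminal window you typed it in, so you’ll need to run it again in a new one.)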
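if you want to double check that step before making any loops, here's a rough sketch of how i'd do it. has_lang is a made-up helper name (not a tesseract thing), it just looks for the .traineddata file in whatever folder you give it. the --list-langs bit asks tesseract itself which languages it can see, and only runs if tesseract is installed.

```shell
#!/bin/sh
# has_lang is a hypothetical helper, not part of tesseract:
# succeeds if folder $1 contains the trained data file for language $2
has_lang() {
    [ -f "$1/$2.traineddata" ]
}

# only check the folder if TESSDATA_PREFIX has actually been set
if [ -n "${TESSDATA_PREFIX:-}" ]; then
    if has_lang "$TESSDATA_PREFIX" eng; then
        echo "found eng.traineddata in $TESSDATA_PREFIX"
    else
        echo "no eng.traineddata in $TESSDATA_PREFIX"
    fi
else
    echo "TESSDATA_PREFIX is not set in this terminal"
fi

# if tesseract is on the PATH, ask it which languages it can see
if command -v tesseract >/dev/null 2>&1; then
    tesseract --list-langs || echo "tesseract could not list languages"
fi
```

if eng shows up in the list, you're good to go.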
now all you’ve got to do is make a loop! here, i’m going through all the jpgs in my trial folder and turning each one into a txt file, so something that’s a.jpg would come out as a-eng.txt
but you’ve got to change the path + you can change jpg to any other image file type you’d like
for a txt file:
for f in /Users/ghostplantss/trial/*.jpg; do tesseract "$f" "${f%.*}-eng" -l eng; done
for a pdf file:
for f in /Users/ghostplantss/trial/*.jpg; do tesseract "$f" "${f%.*}-eng" -l eng pdf; done
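and if you want both at once, here's a sketch of one loop that makes the txt and the pdf in a single pass, as far as i understand how tesseract takes output formats (you list them at the end). out_base is a made-up helper that chops off the file extension and sticks -eng on the end, and the folder defaulting to the current one is just my assumption, so pass your own path as the first argument. it assumes the txt and pdf config files that come with tessdata are in place.

```shell
#!/bin/sh
# out_base is a hypothetical helper: /some/path/a.jpg -> /some/path/a-eng
# (tesseract adds the .txt / .pdf ending itself)
out_base() {
    printf '%s-eng\n' "${1%.*}"
}

# folder to scan: first argument, or the current folder if none is given
dir="${1:-.}"

if command -v tesseract >/dev/null 2>&1; then
    for f in "$dir"/*.jpg; do
        # skip the literal pattern when the folder has no jpgs
        [ -e "$f" ] || continue
        # quotes keep paths with spaces in one piece
        tesseract "$f" "$(out_base "$f")" -l eng txt pdf
    done
else
    echo "tesseract not found on PATH" >&2
fi
```

the quotes around "$f" matter if any of your folders or files have spaces in their names, otherwise the shell splits the path into pieces.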
#slfjslf i'm going to need this again so i'm putting this here#i'm! so bad at command line this is why i almost failed 240