🗓 Date

👥 Participants

🗣 Discussion topics

Time

Item

Presenter

Notes

Annual meeting update

Carmen Mitchell


Working groups restructuring

Nicole Shibata


New committee/Working group members

Carmen Mitchell

Please invite the new members to your June meetings. Even though the new terms start in July, it would be good to have a “transition” meeting where you can vote on the chair and, if the old meeting time isn’t convenient for all the new folks, set up new meeting times.

Annual report prep

Carmen Mitchell

Could the WG chairs, Dave, and Carmen plan to compile their end-of-year updates by June 10th? This will give us some time to circulate the report and send it to COLD. Previous examples are available.

Information item:

Tesseract OCR
https://tesseract-ocr.github.io/

From Mark Bilby

Just read up for the first time on Tesseract 5.0, the open-source OCR solution, after seeing it used in recent Internet Archive digitization batches. We’re about to gear up to digitize thousands of retro theses and had considered using ABBYY for this piece of the workflow, but it looks like Tesseract may be the better way to go. Made me wonder whether other CSUs have ever used it, and if not, whether it might be worth exploring and providing some training. Another approach might be for one campus or the CSUCO to spin up a virtual machine once a week or once a month and run a batch task in which all PDFs or image files from an input group of folders in cloud storage are processed and saved to an output folder.
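The batch idea above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the folder layout, the function names (`plan_jobs`, `tesseract_command`, `run_batch`), and the list of image suffixes are all hypothetical, and it assumes the `tesseract` command-line program is installed on the virtual machine with the cloud storage folders already synced or mounted locally. The Tesseract command line reads image files, so PDFs would first need to be converted to page images by a separate tool, which this sketch does not cover.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: which image formats to pick up. PDFs are not handled here,
# because the tesseract CLI expects image input.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def plan_jobs(input_root: Path, output_root: Path) -> list:
    """Pair each image under input_root with an output base path (no
    extension) that mirrors the input folder layout under output_root."""
    jobs = []
    for src in sorted(input_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            rel = src.relative_to(input_root)
            jobs.append((src, output_root / rel.with_suffix("")))
    return jobs


def tesseract_command(src: Path, out_base: Path, lang: str = "eng") -> list:
    """Build a tesseract CLI call that writes <out_base>.pdf (searchable PDF)."""
    return ["tesseract", str(src), str(out_base), "-l", lang, "pdf"]


def run_batch(input_root: Path, output_root: Path) -> int:
    """Run OCR on every planned job and return how many files were processed."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    done = 0
    for src, out_base in plan_jobs(input_root, output_root):
        out_base.parent.mkdir(parents=True, exist_ok=True)
        # Skip files that already have output from an earlier weekly/monthly run.
        if Path(str(out_base) + ".pdf").exists():
            continue
        subprocess.run(tesseract_command(src, out_base, "eng"), check=True)
        done += 1
    return done
```

A scheduled job on the virtual machine could then call something like `run_batch(Path("input"), Path("output"))`, where both folder names are placeholders for wherever the cloud storage is synced.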

Update from Mark: Internet Archive also has free OCR and multi-format services for partners:
“We offer free OCR for any texts uploaded to archive.org now - just upload the PDFs from your retro theses and our standard derive process will return OCR (via Tesseract) along with our other post-prod outputs. We can get you set up with a collection for the theses & access to our command line tool (if you don't have it already) for batch processing. Happy to connect you with the right folks if you're interested in pursuing!”
https://archive.org/services/docs/api/internetarchive/
Contact Mark if you are interested.
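For anyone curious what batch uploading through the Internet Archive route might involve, here is a minimal Python sketch using the `internetarchive` package documented at the link above. It is an assumption-laden illustration, not a workflow anyone has agreed on: the collection name, the identifier scheme, and the helper names (`build_item`, `upload_folder`) are hypothetical, and real uploads would need credentials configured and a collection set up by Internet Archive staff.

```python
from pathlib import Path


def build_item(pdf: Path, collection: str) -> tuple:
    """Return an (identifier, metadata) pair for one thesis PDF.
    The identifier scheme here is a made-up convention for illustration."""
    identifier = f"{collection}-{pdf.stem}".lower().replace(" ", "-")
    metadata = {
        "collection": collection,  # hypothetical collection name
        "mediatype": "texts",
        "title": pdf.stem,
    }
    return identifier, metadata


def upload_folder(folder: Path, collection: str) -> None:
    """Upload every PDF in a folder as its own archive.org item."""
    # Imported here so build_item works without the package installed.
    # Requires `pip install internetarchive` and configured credentials.
    from internetarchive import upload

    for pdf in sorted(folder.glob("*.pdf")):
        identifier, metadata = build_item(pdf, collection)
        upload(identifier, files=[str(pdf)], metadata=metadata)
```

Per the quoted note, OCR output would then come back from Internet Archive's standard derive process after upload, so no local Tesseract install would be needed for this route.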

✅ Action items

  •  

⤴ Decisions
