November 11, 2003
Importing Data
I had a lovely experience today with Student Stores (no sarcasm here). I went in and asked for a copy of the Spring 2004 book listing (which I think is incomplete, those bastards!). They said it would cost $0.05 per printed page, and that they couldn't give it to me in a digital format.
No problem.
I got all of it printed, then I headed over to the Undergrad Library to OCR the data at the Collaboratory (WHICH HAS FILM SCANNERS, WHY DID NO ONE TELL ME THIS?!). I OCRed all 63 pages into Word, then converted the Word docs to HTML.
I then came back to the dorm and wrote a quick PHP script that parsed all 8 MB of text (MS Word generates a LOT of crap HTML) and extracted all the book data!!!!
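The script itself isn't posted here, so what follows is only a rough sketch of the idea, written in Python rather than the original PHP. It assumes a made-up row layout of `DEPT 123 | Title | Author` (the real listing's layout isn't shown), and the names `strip_word_html` and `extract_books` are hypothetical. It throws away Word's style blocks and tags, then matches whatever lines are left against that assumed pattern.

```python
import html
import re
from collections import defaultdict

TAG_RE = re.compile(r"<[^>]+>")
# ASSUMPTION: each book row looks like "DEPT 123 | Title | Author".
# The real listing's layout is not known; adjust this pattern to fit.
ROW_RE = re.compile(r"^([A-Z]{2,4} \d{2,3}[A-Z]?)\s*\|\s*(.+?)\s*\|\s*(.+)$")


def strip_word_html(markup):
    """Reduce Word-exported HTML to plain text, one paragraph per line."""
    # Drop whole <style>, <script>, and <xml> blocks, plus HTML comments.
    markup = re.sub(r"(?is)<(style|script|xml)\b.*?</\1>", "", markup)
    markup = re.sub(r"(?s)<!--.*?-->", "", markup)
    # Turn block-level boundaries into newlines before removing tags.
    markup = re.sub(r"(?i)</p>|<br\s*/?>|</tr>|</div>", "\n", markup)
    text = TAG_RE.sub("", markup)
    # Decode entities such as &amp; and normalize non-breaking spaces.
    return html.unescape(text).replace("\xa0", " ")


def extract_books(markup):
    """Map each course code to a list of (title, author) tuples."""
    books = defaultdict(list)
    for line in strip_word_html(markup).splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            course, title, author = match.groups()
            books[course].append((title, author))
    return dict(books)
```

Calling `extract_books(open("listing.html").read())` (with `listing.html` as a placeholder file name) would return a dictionary keyed by course code. Regex-based tag stripping is crude, but it is usually enough for a one-off import like this one, where the input comes from a single known exporter.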
So now I have all the textbooks that go with each class. W00t!
Posted by roy on November 11, 2003 at 04:51 PM