Hacker News

My first job out of college was scanning books for the Internet Archive down in the basement of the Library of Congress. Their scanning machines used a foot pedal to raise and lower the glass platen, so I'd use one hand to flip the page and wiggle the cradle to get things nice and flat, and the other to snap the photo. You can get pretty fast after a while, but boy is it mindless. Older books that had already been rebound a couple of times were the hardest to work with, as they have the least margin. There were a bunch of different-sized dowels that we would put under the spine in the cradle so the glass could gain a couple of millimeters of margin, just enough to avoid cutting off text. In the worst case, the book had to be unbound in order to capture it. I did get to flip through a lot of cool old illustrated catalogues like this: https://archive.org/details/illustratedcatal00keil/page/14/m...


Later on I worked briefly for the Archive. The scanner I designed later became their "ttscribe". It was fascinating to see their process up close.


(I'm the founder of DIY Book Scanner, ran it from 2009 to 2015)


I currently work for the university I study at, in the (biomedical) library. I scan lots of old journal articles and periodicals for academics who need them for their research. A typical job might be scanning an article from 1960 on Potato Research for an agriculturalist, or a graph of human energy expenditure, or an analysis of fibres for forensic medicine. We get researchers from all around the world requesting articles on all sorts of topics from our archives. It's a sandstone university, so we have some very old collections that are definitely getting crumbly!

Other than locating the books, by far the most tedious aspect is the scanning. We only have a terrible flatbed scanner that is completely unforgiving: it has a 25-page email limit, so anything longer has to be split into separate emails. And if you mis-scan a page accidentally (some of the book margins are super tight), then you have to restart the entire scan - there's no delete page button!


> And if you mis-scan a page accidentally (some of the book margins are super tight), then you have to restart the entire scan - there's no delete page button!

Sounds like you need a license for Adobe Acrobat Pro or some other application that will let you reshuffle/insert pages.


Oh, I understand you! I would have loved to have design hints like that for a summer job I had. I was tasked with scanning medical books with thousands of pages. I had few time constraints and could do it wherever I wanted. I was paid $80 per book, which seemed like a crazy amount before I started. Even after optimizing every parameter of the hardware, software and my workspace, I couldn't do more than a few 2-hour sessions per day, and it took me several days per book. A boring job if there ever was one. I would certainly use it as a zen practice today, but as a teenager not really needing the money, I couldn't find any value in it (after optimization).


For books that aren't rare, i.e. aren't valuable as artefacts, surely you would cut off the spine and run the pages through an automated scanner?

But then medical texts probably cost way more than $80. How much was your boss making from those scans? Were they taking account of copyright law?


Wow that's awesome! I take it you're responsible for a chunk of the books available now on openlibrary.org?

When scanning books like that did you ever see anything interesting or are you so zoned out you don't really pay attention?


A very small chunk, I only lasted a couple months. Most of it was pretty boring, think volume after volume of copyright records or issues of the national stamp collector's magazine. Eventually I started working on some of the contract work they did for other agencies, e.g. declassified FBI case reports. The best was the stuff for the Smithsonian, which often included beautiful naturalist illustrations. I'm not sure how much of that stuff was public domain though.



Back when this was a more popular problem, I saw a number of projects that used a rubber-tipped stick to automate page turning.

I wonder why this never took off?


I'd really like to see this too.

Building a scanner would be interesting, but the mind-numbing idea of turning pages manually isn't very enticing


Were you using gloves or something like that?


Unsure about IA, but gloves are typically advised against[0] unless you have a suspicion that the book will be dangerous (arsenic ink in bindings[1], dust, mold or frass (sadly)[2]). Hand-washing before is typical advice but YMMV

[0] https://www.nationaltrust.org.uk/features/why-wearing-gloves...

[1] https://daily.jstor.org/some-books-can-kill/

[2] https://www.ifla.org/node/93094


I get the points against gloves in your links, but in my experience my hands will constantly leave fingerprints and oil on the books no matter how much I wash them.

How do they avoid this issue? Or do they just not bother?


The advice I was given when handling some of the books at the University of Toronto's rare book collection was to simply not directly touch the parts that matter if it can be avoided. Turn pages via their margins.


Ripping is usually more of a problem than some finger oil, and gloves make you less sensitive and more likely to rip or destroy stuff.


> Ripping is usually more of a problem...

Accidentally ripping a page while turning the pages too quickly.


Nope. I'm not sure if this is the right way to put it, but we were basically a scanning factory. I think the really sensitive documents got routed elsewhere. Books and folios that had large format foldouts got more specialized, ahem, white glove treatment. Many scanners wore these little textured rubber finger tips: https://rubber_finger_tips.jpg.so/


That sounds incredible



