archive.org: Uploading books & magazines

During the past two years I’ve added many magazine articles stored at the Internet Archive (archive.org) to the DIX resource index. The Internet Archive is a treasure trove for anyone interested in old magazines, including people like me who search for articles about Disney and the company’s history.

Although I do not own many sources like magazines myself, every now and then I want to share something with the community. The Internet Archive is the perfect place for documents that I think are important for all Disney historians, hobbyist or otherwise.

Since uploading to the Internet Archive tends to be a little tricky, and I was not able to find complete documentation for the process, I decided to write this short tutorial.

Scanning pages

Scans for the archive should have a resolution of at least 300 dpi and be cropped, deskewed, and rotated to the correct orientation. Scans containing double pages should be split into one image file per page.
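To check whether a scan meets the 300 dpi minimum, multiply the physical page size by the resolution. The following is only a small sketch of that arithmetic; the function name min_pixels and the example page size (8.5 x 11 inches) are assumptions for illustration, not values from the Internet Archive.

```python
def min_pixels(width_in, height_in, dpi=300):
    """Return the minimum (width, height) in pixels for a page.

    width_in and height_in are the physical page dimensions in inches;
    dpi is the target scan resolution (300 is the minimum suggested above).
    """
    return round(width_in * dpi), round(height_in * dpi)


# Example page size is an assumption: 8.5 x 11 inches.
letter_page = min_pixels(8.5, 11)  # (2550, 3300)
```

If a cropped single-page image is noticeably smaller than the computed size, it was probably scanned below 300 dpi.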

File naming

Name your image files numerically, starting from 0 for the first page: for instance 000.jpg, 001.jpg, 002.jpg, 003.jpg, etc. For this purpose I use XnView MP, a free media browser, viewer and converter which, among many other things, is capable of bulk renaming using patterns (like ### for 3 digits with leading zeros).
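If you prefer a script over a GUI tool, the numbering could be sketched in Python roughly as follows. This is not the XnView MP workflow, just a minimal alternative: the function name copy_numbered, the folder names and the list of image suffixes are assumptions. It copies into a separate folder instead of renaming in place, so the original scans stay untouched.

```python
import shutil
from pathlib import Path

# Assumption: adjust this set to the file types your scanner produces.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def copy_numbered(source_dir, target_dir, digits=3):
    """Copy images from source_dir to target_dir as 000.jpg, 001.jpg, ...

    Files are ordered by their original names, so those names must
    already sort in page order. Returns the list of new file names.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    images = sorted(
        p for p in source.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

    new_names = []
    for index, image in enumerate(images):
        # Zero-padded page number plus the original (lowercased) suffix.
        new_name = f"{index:0{digits}d}{image.suffix.lower()}"
        shutil.copy2(image, target / new_name)
        new_names.append(new_name)
    return new_names
```

A call like copy_numbered("scans", "numbered") would then fill the hypothetical folder "numbered" with 000.jpg, 001.jpg and so on. For items with more than 1000 pages, pass digits=4.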

Packing files for upload

Create a zip file containing all your images. The filename of your zip file is very important: it must end with “_images.zip”, e.g. “MyMagazine_images.zip”. This suffix is the trigger which initiates the automatic creation of various derivatives based on your upload. The “MyMagazine” part of the filename will later be used to name derivatives like PDF or ePub files.
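The packing step can also be scripted. Below is a minimal sketch using Python's standard zipfile module; the function name pack_images and its parameters are assumptions. It places the images at the top level of the archive, which is also an assumption, since this tutorial does not say whether archive.org requires a particular layout inside the zip.

```python
import zipfile
from pathlib import Path


def pack_images(image_dir, item_name, output_dir="."):
    """Create <item_name>_images.zip from the files in image_dir.

    The "_images.zip" ending is what triggers the derivative creation
    described above. Returns the path of the new zip file.
    """
    image_dir = Path(image_dir)
    zip_path = Path(output_dir) / f"{item_name}_images.zip"

    # Collect the files first so the zip never tries to include itself.
    images = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.resolve() != zip_path.resolve()
    )

    # ZIP_STORED skips compression; JPEG scans barely shrink anyway.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as archive:
        for image in images:
            # arcname drops the folder part, keeping files at the top level.
            archive.write(image, arcname=image.name)
    return zip_path
```

For example, pack_images("numbered", "MyMagazine") would write MyMagazine_images.zip to the current folder, where "numbered" is a hypothetical folder holding the renamed pages.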

Preparing the upload

The entry point for uploads to the Internet Archive is http://archive.org/upload

If you plan to upload a number of similar items, like multiple issues of the same magazine, you can preset some metadata using parameters in the request URL. Save the URL as a bookmark: its values will prefill some form fields in the next step, and they may still be changed or extended later.

Some parameters I use (with sample values):
collection=opensource
language=eng
title=Disney News Magazine
creator=Walt Disney Productions
subject=Disney,Disneyland,Disney News Magazine

Resulting URL for the sample values above: http://archive.org/upload/?collection=opensource&language=eng&title=Disney News Magazine&creator=Walt Disney Productions&subject=Disney,Disneyland,Disney News Magazine
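Browsers usually encode the spaces in such a URL on their own. If you would rather build a clean, already encoded bookmark URL, a short Python sketch could look like this. The function name preset_upload_url is an assumption; the parameter names and values are the samples listed above, and whether archive.org accepts further parameters is not covered here.

```python
from urllib.parse import quote, urlencode

UPLOAD_URL = "http://archive.org/upload/"


def preset_upload_url(fields):
    """Build an upload URL that prefills the given metadata fields.

    fields is a dict such as {"title": "Disney News Magazine"}.
    Spaces become %20; commas are kept so the subject list stays readable.
    """
    query = urlencode(fields, safe=",", quote_via=quote)
    return f"{UPLOAD_URL}?{query}"


sample_url = preset_upload_url({
    "collection": "opensource",
    "language": "eng",
    "title": "Disney News Magazine",
    "creator": "Walt Disney Productions",
    "subject": "Disney,Disneyland,Disney News Magazine",
})
```

Here sample_url holds the same URL as above, with every space written as %20.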

Fields like the Page Title and also the Page URL may be changed and/or extended to reflect the specific item you are uploading. The Date should be set to the year (and month) of the item’s publication. The Page URL will end with the string ‘images’, which you should remove.

There are “collections” at archive.org which could be used to organize similar items like issues of the same magazine. However, they can only be created by archive.org admins on request, and only once someone has created more than 50 items to be grouped.
As an alternative, I’d advise adding a grouping key phrase in the Subject Tags field (like “Disney News Magazine” in the sample above).

Start upload … and wait

After setting all metadata values, simply hit the upload button. Once the upload is finished, the archive.org system starts to process it: it re-packs the zip file’s contents, does some security checks, runs OCR on the images, and generates new formats like PDF, ePub, and plain text, as well as the book reader format which makes the item readable right in the browser.

The derivation process may take some time, depending on the size of your upload.

Summary

Long story short:

  • Sort and name images numerically (e.g. 000.jpg, 001.jpg, 002.jpg, etc.)
  • Create a zip file containing your images
    The name of the zip file must end with “_images.zip”, e.g. MyBook_images.zip
    The name of the zip file will also be used to name derived files like PDFs (e.g. MyBook.pdf)
  • Start Upload to archive.org => http://archive.org/upload
  • Don’t forget to set the metadata

The DIX project (DIX = Disney index) collects and indexes various sources like podcasts, websites, magazines or books with references to the history of Disney animation, films and theme parks.

Check the main website http://www.dix-project.net/ for a searchable index of more than 2700 articles and podcasts. Also check out the DIX Twitter and Pinterest accounts.