DataTurks: Uploading links to text files for text annotation

DataTurks allows you to work with private links to images and text files. Instead of uploading raw data files, you can choose to upload a text file containing URLs to your images or text files.

For images, this works out of the box, but text files need a short one-time setup:

  • Convert PDF, DOC, and similar files to plain text files.
  • Set up the browser to work with text file links.

Convert files to plain text files (ETA: 10 minutes, one-time)

Browsers don’t natively handle PDF or DOC files as annotatable text, and they don’t cope well with special characters in text files. We provide a utility that converts all your files to plain text and pre-processes them to make them browser friendly.

Use the utility below to convert every file in a directory, or one file at a time. It is a Java utility, so Java must be installed on your computer.

java -jar ./FileConverter.jar <Folder Path To Input Files> <Folder Path To Output Converted Files>

Example:

java -jar ./FileConverter.jar ../all_resumes/ ../all_resumes_converted/

Example output:

Success: Converted /Users/Test/../all_resumes/Abhik_Banerjee_UC_1.docx to /Users/Test/../all_resumes_converted/Abhik_Banerjee_UC_1.docx.txt
Success: Converted /Users/Test/../all_resumes/CEO_resume.pdf to /Users/Test/../all_resumes_converted/CEO_resume.pdf.txt
Success: Converted /Users/Test/../all_resumes/Christof-cv.pdf to /Users/Test/../all_resumes_converted/Christof-cv.pdf.txt
Successfully : Converted 3 files
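
If you would rather trigger the conversion from a script, the command above can be wrapped roughly as follows. This is only a sketch: the function names are hypothetical, it assumes the jar accepts exactly the two folder arguments shown above, and pre-creating the output folder is a precaution we assume is harmless rather than something the utility is known to require.

```python
import shutil
import subprocess
from pathlib import Path


def build_converter_command(jar_path, input_dir, output_dir):
    """Return the argument list for the FileConverter invocation shown above."""
    return ["java", "-jar", str(jar_path), str(input_dir), str(output_dir)]


def run_converter(jar_path, input_dir, output_dir):
    """Run the converter, failing early if Java is not on the PATH."""
    if shutil.which("java") is None:
        raise RuntimeError("Java was not found on PATH; install it first.")
    # Assumption: creating the output folder up front does no harm.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_converter_command(jar_path, input_dir, output_dir)
    # check=True raises CalledProcessError if the utility exits with an error status.
    return subprocess.run(command, check=True)
```

Calling run_converter("./FileConverter.jar", "../all_resumes/", "../all_resumes_converted/") would then run the same command as the example above.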

Now upload the converted files to your own server and create URLs for them. Then generate a text file containing those URLs, which you can upload to Dataturks.
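
Generating the URL file by hand gets tedious for large folders, so here is a minimal sketch of one way to script it. It assumes the converted files are served from a single base URL under their converted names (for example CEO_resume.pdf.txt) and that the URL file holds one URL per line; the base URL https://files.example.com/resumes and both function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote


def build_url_list(converted_dir, base_url):
    """Return one URL per converted .txt file, sorted by file name."""
    base = base_url.rstrip("/")
    names = sorted(p.name for p in Path(converted_dir).glob("*.txt") if p.is_file())
    # quote() percent-encodes spaces and other characters that are unsafe in URLs.
    return [f"{base}/{quote(name)}" for name in names]


def write_url_file(converted_dir, base_url, output_file):
    """Write the URL list to output_file, one URL per line (assumed format)."""
    urls = build_url_list(converted_dir, base_url)
    Path(output_file).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls
```

For example, write_url_file("../all_resumes_converted/", "https://files.example.com/resumes", "urls.txt") would produce a urls.txt file ready for upload, provided your server really does expose the files at those paths.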

While uploading this file, remember to select the “URLs?” checkbox under “Advanced”.

Set up the browser to work with text file links (ETA: 2 minutes, one-time)

Because of the browser’s same-origin policy, a web page cannot read content from another domain unless that domain’s server explicitly allows it through CORS headers. As a result, Dataturks.com can’t easily load text files stored at xyz.com (this is not an issue for images).
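
To make the restriction concrete, the check a browser applies can be approximated as below. This is a deliberately simplified sketch, not the browser's real algorithm: it only looks at the Access-Control-Allow-Origin response header and ignores credentials and preflight requests. The function name and the origin https://dataturks.com are used purely for illustration.

```python
def allows_cross_origin_read(response_headers, page_origin):
    """Simplified check: does the server's response let page_origin read it?"""
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    # "*" permits any origin; otherwise the value must match the page's origin.
    return allowed == "*" or allowed == page_origin
```

A file server that sends no such header fails this check for every page, which is why plain text links on xyz.com cannot be read by default.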

The workaround is a browser extension that grants Dataturks.com permission to read files from xyz.com.

For Chrome, please install the Allow CORS extension.

Then, on the Dataturks website, enable the extension and allow it access to your text links, as shown in the image below.

Enabling Chrome CORS Extension

Note: every annotator needs to do this CORS setup once.

All done :)

If you face any issues or have questions about the steps above, please contact us at support@dataturks.com.