DataTurks On-prem: A fully self-hosted data annotation solution.

--

Supports end-to-end tagging of data items such as images, text, and video for your machine learning projects: http://dataturks.com

At Dataturks, we have a streamlined set of tools and workflows that make it easy for large teams to collaborate and build high-quality datasets for their ML applications.

The cloud version of Dataturks provides an easy-to-use platform right in the browser and has been used by more than 10,000 ML practitioners. Still, there are situations where companies need to run the solution inside their own infrastructure. Below is a guide to setting up Dataturks internally.

Highlights:

  • Total setup time: 15 minutes.
  • Works fully offline; no internet connection is required.
  • Docker-based installation: we provide a Docker image.
  • Supports Linux, Windows, etc.
  • Supports all features available on the Dataturks web version.

Steps:

1. Install Docker (skip this step if Docker is already installed)

Below is an example of setting up Docker on an AWS EC2 instance running the Amazon Linux AMI:

[ec2-user ~]$ sudo yum update -y
[ec2-user ~]$ sudo yum install docker -y
#Start the Docker Service
[ec2-user ~]$ sudo service docker start
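Before moving on, it can help to confirm that the Docker daemon is actually reachable. The following is a minimal sketch, not part of the official Dataturks instructions: `check_docker` and the `DOCKER_CMD` override are hypothetical names introduced here, and the default assumes Docker is invoked through `sudo` as in the commands above.

```shell
# Hypothetical helper (not from the Dataturks docs): succeeds only if
# "docker info" can talk to the daemon.
# DOCKER_CMD is an assumed override hook; it defaults to "sudo docker".
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"

check_docker() {
  # $DOCKER_CMD is left unquoted on purpose so "sudo docker" splits into two words.
  if $DOCKER_CMD info >/dev/null 2>&1; then
    echo "docker: daemon reachable"
  else
    echo "docker: not installed or daemon not running" >&2
    return 1
  fi
}
```

Calling `check_docker` after starting the service should print `docker: daemon reachable`; if it fails, re-run the install and start commands before loading the image.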

2. Load the Dataturks Docker image:

#Download the image file from the link
[ec2-user ~]$ curl -o dataturks_docker.tar.gz https://s3-us-west-2.amazonaws.com/images.onprem.com.dataturks/dataturks_docker_3_3_0.tar.gz
#Extract the Docker image archive
[ec2-user ~]$ tar -xvzf dataturks_docker.tar.gz
#Load dataturks docker image
[ec2-user ~]$ sudo docker load --input ./dataturks_docker.tar
#Start a container from the Dataturks image
[ec2-user ~]$ sudo docker run -d -p 80:80 dataturks/dataturks:3.3.0
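To check that the container really started, one option is to look it up by the image it was created from. This is a sketch under assumptions: `dataturks_container_id` and `DOCKER_CMD` are hypothetical names, and only the image tag `dataturks/dataturks:3.3.0` comes from the command above.

```shell
# Hypothetical helper: print the ID of the first running container
# that was started from the Dataturks image (prints nothing if none is running).
DOCKER_CMD="${DOCKER_CMD:-sudo docker}"   # assumed override hook
IMAGE="dataturks/dataturks:3.3.0"         # image tag used in the run command

dataturks_container_id() {
  # "ancestor" filters running containers by the image they were created from.
  $DOCKER_CMD ps --filter "ancestor=$IMAGE" --format '{{.ID}}' | head -n 1
}
```

If the function prints an ID, `sudo docker logs --tail 50 <ID>` shows the most recent startup output, which is usually the first place to look when the web page does not load.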

Wait 2–3 minutes for the system to come up. Then open your browser and go to http://localhost (or use the server's IP address or DNS name).
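Instead of watching the clock, the wait can be scripted as a small retry loop. The sketch below is generic and makes no assumptions about Dataturks itself: `wait_until` is a hypothetical helper that re-runs any command until it succeeds or the attempt limit is reached.

```shell
# wait_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: retries COMMAND until it exits 0,
# giving up after MAX_ATTEMPTS tries.
wait_until() {
  wu_max="$1"
  wu_interval="$2"
  shift 2
  wu_attempt=1
  until "$@" >/dev/null 2>&1; do
    if [ "$wu_attempt" -ge "$wu_max" ]; then
      echo "gave up after $wu_attempt attempts" >&2
      return 1
    fi
    wu_attempt=$((wu_attempt + 1))
    sleep "$wu_interval"
  done
}
```

For example, `wait_until 60 5 curl -fsS -o /dev/null http://localhost && echo "Dataturks is up"` polls every 5 seconds for up to roughly 5 minutes. The `-f` flag makes `curl` fail on HTTP error responses, so the loop only ends once the server answers successfully.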

3. All done and ready to use.

Now you can click “Login” to create accounts, start a new project, and so on.

Things to note:

  • No license is needed. Older versions of the image required a license, which is no longer the case. [Ignore the page that shows a form for entering a license.]
  • All data you upload and all your metadata are stored in the container. Always download your data before you destroy the container.
  • Prefer uploading image URLs: anything you upload directly, such as images, is stored locally and therefore requires sufficient disk space for the container.
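Because direct uploads consume disk space on the host, a quick free-space check before a large upload can be useful. This is a rough sketch: `free_mb` is a hypothetical helper, and `/var/lib/docker` is Docker's default data directory on Linux, which may differ on your installation.

```shell
# Hypothetical helper: print the free space, in whole megabytes,
# on the filesystem that holds the given path (defaults to /).
free_mb() {
  # -P forces one line per filesystem; -k reports sizes in 1024-byte blocks.
  df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024) }'
}
```

On a default Linux setup, `free_mb /var/lib/docker` reports the space left for containers, and `sudo docker system df` breaks down how much Docker itself is using. Note also that `docker stop` followed by `docker start` keeps the container's files, whereas `docker rm` discards them, so download your data before removing the container.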

If you have any questions or need help, please contact us at support@dataturks.com
