Coding Period Week 13 (30th August to 6th September)

Saksham Gautam
Sep 6, 2023

--

Goals —

  1. Get access to the Gallo server and set up the environment.
  2. Database setup
  3. Batch job scheduler pipeline

Progress —

  1. During the meeting held on September 1st, I got access to the Gallo server. The mentors generously set up a functional workspace, installed the crucial project dependencies, and created a MySQL database specifically for the anonymization process.
  2. Just after the meeting, I completed the Python environment setup using a virtual environment.
  3. Subsequently, I transferred the webpage code to the Gallo server, and with slight modifications the web app was running. Currently we are using the Flask development server; once back-end development is complete, we will shift to an Apache server.
  4. Next, I edited the Flask app code slightly to eliminate the need for mysqladmin to create new tables in the database. A sketch of this approach follows below.
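For illustration, here is a minimal sketch of letting the app create its own tables at startup instead of relying on mysqladmin. The connection details and the schema are assumptions, not the project's actual code:

```python
# Sketch: the app creates its own tables on startup, so mysqladmin
# is no longer needed. Credentials, table, and column names below
# are illustrative placeholders.
import mysql.connector

def init_db():
    conn = mysql.connector.connect(
        host="localhost",
        user="anonymizer",          # assumed credentials
        password="secret",
        database="anonymizer_db",
    )
    cur = conn.cursor()
    # IF NOT EXISTS makes this safe to run on every app start.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS projects (
            id INT AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(255) NOT NULL,
            output_path VARCHAR(1024),
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()
    cur.close()
    conn.close()
```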
  5. To view the web app on the local computer, forward the port on which the server is running on the remote host to a port on the local machine (for Flask's development server, that is port 5000 by default):

ssh -L local_port:localhost:remote_port user@server

6. As previously discussed, Gallo doesn't have a GPU, so the processing will be done on hpc8 while the web app is hosted on Gallo. We need to build a pipeline such that when a project is created and submitted by the user, the batch jobs are scheduled automatically and the database is updated with the path of the output videos. From the Gallo server, we only have access to gallinahome, so we will leverage that shared storage.

7. Established a directory within gallinahome called “anonymizer_storage,” which includes two subdirectories: “input” for storing incoming files and “output” for storing the processed files.
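Creating that layout is a one-time step; a quick sketch, where the base path under gallinahome is an assumption:

```python
# Sketch: create the shared storage layout inside gallinahome.
# The base path is illustrative; substitute the real gallinahome path.
from pathlib import Path

base = Path("/gallinahome/username/anonymizer_storage")
for sub in ("input", "output"):
    (base / sub).mkdir(parents=True, exist_ok=True)
```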

8. Proposed method (a sketch of the scheduler follows this list) —
1. User-uploaded files are stored in a sub-directory, named after the project, within the “input” directory. Anonymization parameters for each project are saved in a corresponding .json file within this directory.
2. A scheduler script scans the “input” directory and transfers files, project by project, to the /scratch/users/ directory.
3. From there the batch job is scheduled; the files are copied into $TMPDIR and processed.
4. Processed files are then returned to the “output” directory, within a subdirectory bearing the project’s name.
5. A cron job runs this scheduler script every 60 minutes.
6. Note that we update the path of the processed files in the MySQL database even before processing starts; in effect, we pre-decide the location where the output files will be stored.
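To make the flow concrete, here is a minimal sketch of such a scheduler script. All paths, the table and column names, and the `anonymize.sbatch` submission script are illustrative assumptions; the real pipeline's details may differ:

```python
# Sketch of the cron-driven scheduler: scan "input" for new projects,
# pre-record the eventual output path in MySQL, stage the files on
# scratch, and submit a batch job. Paths, credentials, and the sbatch
# script name are assumptions.
import shutil
import subprocess
from pathlib import Path

import mysql.connector

STORAGE = Path("/gallinahome/username/anonymizer_storage")  # assumed base path
SCRATCH = Path("/scratch/users/username")                   # assumed scratch area

def schedule_pending_projects():
    conn = mysql.connector.connect(
        host="localhost", user="anonymizer",
        password="secret", database="anonymizer_db",
    )
    cur = conn.cursor()
    for project_dir in sorted((STORAGE / "input").iterdir()):
        if not project_dir.is_dir():
            continue
        project = project_dir.name
        # Pre-decide the output location before processing starts
        # (step 6 above) and record it in the database.
        output_dir = STORAGE / "output" / project
        cur.execute(
            "UPDATE projects SET output_path = %s WHERE name = %s",
            (str(output_dir), project),
        )
        conn.commit()
        # Stage the project's files on scratch, then submit the job.
        work_dir = SCRATCH / project
        shutil.copytree(project_dir, work_dir)
        subprocess.run(
            ["sbatch", "anonymize.sbatch", str(work_dir), str(output_dir)],
            check=True,
        )
        # Remove the consumed input so the next cron run skips it.
        shutil.rmtree(project_dir)
    cur.close()
    conn.close()

if __name__ == "__main__":
    schedule_pending_projects()
```

A crontab entry such as `0 * * * * python3 /path/to/scheduler.py` runs the script at the top of every hour, i.e., every 60 minutes. Deleting the consumed input directory is one way to keep the next run from resubmitting the same project; a status column in the database would work as well.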

9. I’ve successfully finished setting up this pipeline. However, an issue arises when more than one job is scheduled simultaneously: one of the jobs encounters a directory-not-found error. This is peculiar, because when the jobs are executed one after the other, they both produce output.
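One plausible cause, still to be confirmed, is that simultaneous jobs collide on a shared working directory under /scratch/users/, with one job deleting or moving it while the other still needs it. A hedged sketch of isolating each submission in a unique directory; the helper and paths are hypothetical, not the project's actual code:

```python
# Hypothetical helper: give each submitted job its own scratch
# directory so simultaneous jobs cannot collide. Paths are illustrative.
import os
import uuid
from pathlib import Path

SCRATCH = Path("/scratch/users") / os.environ.get("USER", "unknown")

def unique_workdir(project: str) -> Path:
    # A uuid4 suffix keeps directories distinct even when two jobs
    # for the same project are submitted at the same time.
    workdir = SCRATCH / f"{project}-{uuid.uuid4().hex[:8]}"
    workdir.mkdir(parents=True, exist_ok=False)
    return workdir
```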

10. As far as I know, we should be able to schedule two parallel batch jobs. I will confirm this in the upcoming meeting.
