Google Summer of Code Week 1: Coding begins

苏唯
Shuwei's Recording in GSoC
3 min readMay 20, 2018

Before I start, I wish to highlight that from now onward, I will be continuing my #10WeeksChallenge in the form of GSoC weekly blogs.

So, the coding period for GSoC 2018 began on May 14. This period will last for a total of 12 Weeks (May 14— August 14). This is my second blog regarding Google Summer of Code. If you like, you can read the first one here. Let’s start 😎

1. Deep Speech 2 on PaddlePaddle

This week, the most important breakthrough is that we found an open source program named DeepSpeech2 on PaddlePaddle which better suit Chinese Pipeline rather than Mozilla’s DeepSpeech. DeepSpeech2 on PaddlePaddle is an open-source implementation of end-to-end Automatic Speech Recognition (ASR) engine, based on Baidu’s Deep Speech 2 paper, with PaddlePaddle platform. The paper shows that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech–two vastly different languages. Accordingly, I decided to practice on Deep Speech 2 Project using PaddlePaddle this week. Since we are still not able to run PaddlePaddle on remote server, I started my first attempt on my local mac.

1.1 PaddlePaddle Installation on Local Mac

You can use pip to install PaddlePaddle with a single command. But there are many little problems during installation and cost me a lot of time to fix(See the following Notes for detail).

sudo pip install paddlepaddle
  • Note 1: Make sure that your default python version Python 2.7 series.
  • Note 2: pip only supports manylinux1 standard, you’ll need to upgrade your pip to >9.0.0.
  • Note 3: Use sudo pip instead or you’ll get permission denied error.

1.2 PaddlePaddle Use on Local Mac

Create a new file called housing.py, and paste this Python code:

Run python housing.py and voila! It should print out a list of predictions for the test housing data.

1.3 Deep Speech Installation on Local Mac

  • Make sure these libraries or tools installed: pkg-config, flac, ogg, vorbis, boost and swig,(I installed them via homebrew with proxy):
brew install pkg-config
brew install flac
brew install vorbis-tools
brew install boost
brew install swig
  • Run the setup script for the remaining dependencies.
git clone https://github.com/PaddlePaddle/DeepSpeech.git
cd DeepSpeech
sudo sh setup.sh

Note : Remember to use “sudo” and using “brew install gcc” to install Fortran compiler.

1.4 PaddlePaddle Installation on CWRC HPC

Our project is based on PaddlePaddle framework. However, we ran into some trouble about installing PaddlePaddle on HPC. The reason is that the makers of the Docker image put stuff in /root, which is not accessible in Singularity unless you have root rights on the host machine (see error message below). And that defeats the main purpose in using Singularity with Docker images, i.e. have the students do the work without our intervention and without sudo rights directly on the Case HPC.

The Red Hen is still working on this problem, I hope we can work out ASAP so that we can run PaddlePaddle on the server😊

Reference

PaddlePaddle Book
Deep Speech 2 Github

2. Reading Chinese data on gallina

2.1 Modify the path

Since the server path changed to /mnt/rds/redhen/gallina, I updated my function day() in ~/.bashrc accordingly. The demo is shown as follows.

2.2 Basic commands

$ ssh sxx186@redhen1.case.edu
$ ssh sxx186@rider.case.edu
$ cd /mnt/rds/redhen/gallina/tv/2018/2018-04/2018-04-30
$ ls -al *_CN_*

3. Submit Jobs on CWRU HPC

3.1 First Try Using Batch mode

I tried a little “hello world!” python demo to test my work.

  • Firstly, I wrote a SLURM script
#!/bin/bash#SBATCH -N 1#SBATCH -c 1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=0-00:30:00 # 30 minutes
#SBATCH --output=my.stdout
#SBATCH --mail-user=sxx186@case.edu
#SBATCH --mail-type=ALL
#SBATCH --job-name="just_a_test"
# Put commands for executing job below this line
# This example is loading the default Python module and then
# printing "hello world!"
module load python
python test.py
  • Then submit this batch script
[sxx186@hpclogin SLURM]$ sbatch simple.slurmSubmitted batch job 7800793
  • Run!
[sxx186@hpclogin SLURM]$ python test.pyhello world!

Reference

User support from CWRU
Tutorial from Peking University

4. Conclusion

It was a pleasant first week of Google Summer of Code. If all goes well, I will be able to run Deep Speech 2 model at HPC in the next week. I am strictly following my timeline. The 2nd week ends on May 27. Till then, happy coding and cheers. 😀

--

--