Applied Bioinformatics — Homework 4

This is what I have done so far. (I will come back to this.)

• Install Entrez Direct on your computer.

• Obtain the genome sequence corresponding to the 1978 outbreak of the Ebola virus.

• Obtain a FASTA file that contains all the protein sequences as published in the Ebola virus genome sequencing paper. We know the accession number of the project is PRJNA257197. How many sequences have you got?

Lessons Learned

Installing Entrez Direct was pretty easy. I learned how to set the PATH so the shell can find and run the program.

I got an error message saying that the computer couldn't find the esearch command. I googled it and found out that my PATH didn't include the Entrez Direct directory.

echo $PATH
export PATH=$PATH:/path/to/my/program

I used the command below to find the file path of Entrez Direct.

readlink -f esearch
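
Putting the PATH steps together, here is a minimal sketch of how the fix could look. It assumes Entrez Direct was installed in $HOME/edirect (the usual default location); EDIRECT_DIR is a variable name I made up for this example, so change it if your install is somewhere else.

```shell
# Assumption: Entrez Direct lives in $HOME/edirect.
# EDIRECT_DIR is a hypothetical variable used only in this sketch.
EDIRECT_DIR="${EDIRECT_DIR:-$HOME/edirect}"

# Append the directory to PATH only if it is not already listed,
# so re-running this does not keep growing PATH.
case ":$PATH:" in
  *":$EDIRECT_DIR:"*) ;;
  *) export PATH="$PATH:$EDIRECT_DIR" ;;
esac

# command -v prints the resolved location when the shell can find esearch.
if command -v esearch >/dev/null 2>&1; then
  echo "esearch found at: $(command -v esearch)"
else
  echo "esearch not found; check that $EDIRECT_DIR is the right directory" >&2
fi
```

The export only lasts for the current terminal session. To make it permanent, the same export line would have to go into the shell startup file (for example ~/.bashrc if you use bash).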

I think my computer doesn't have enough memory to complete this homework assignment: I got a "Segmentation fault (core dumped)" message. I will circle back to this. However, I did see that there are 249 sequencing runs.
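
For when I come back to this, here is a rough sketch of how the protein step could go. I have not run it to completion, so treat it as a plan and not a result. The two function names and the output file name are made up for this example. The esearch/efetch pipeline needs network access and Entrez Direct on the PATH, and I don't know yet whether it avoids the segmentation fault.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# since every record starts with a line beginning with ">".
count_fasta_records() {
  grep -c '^>' "$1"
}

# Hypothetical helper: search the protein database for a project
# accession and write the sequences straight to a file in FASTA format.
# $1 = project accession, $2 = output file name.
fetch_project_proteins() {
  esearch -db protein -query "$1" \
    | efetch -format fasta > "$2"
}

# Usage (nothing here runs automatically):
#   fetch_project_proteins PRJNA257197 proteins.fa
#   count_fasta_records proteins.fa
```

Counting the ">" header lines answers the "how many sequences" question without having to open the whole file in an editor.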

