Removing all lines containing a specific string from a text file with sed
I am working with the TAC KBP sequencing dataset, which includes brat annotation files. Brat is a nice tool for annotation and visualization, but this time all the unrelated annotations in the dataset made it impossible to see the data for my specific task: sequence detection.


So all I needed was to get rid of all the other relations in the .ann files. A brat annotation file holds one annotation per line, so deleting the lines with coreference annotations would make the visualization a lot nicer.
sed -i '' "/Coreference/d" /path/to/file
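Sed did the job for me. Here `-i` edits the file in place; the empty `''` after it is the backup suffix that the BSD sed on OSX expects, and an empty suffix means no backup is kept (GNU sed takes `-i` without that separate argument). Of course, I needed to run this not just for one file but for all the annotation files in the folder.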
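Here is a minimal sketch of the same deletion on a throwaway file, in case you want to try it before touching real data. The annotation lines and the `demo_dir` variable are made up for illustration, and it swaps the empty suffix for `-i.bak`, a form that both GNU and BSD sed should accept and that keeps a backup copy.

```shell
# Hypothetical sample data: made-up lines in brat's one-annotation-per-line style.
demo_dir=$(mktemp -d)
printf 'T1\tEvent 0 6\tattack\nT2\tEvent 20 26\tstrike\nR1\tCoreference Arg1:T1 Arg2:T2\nR2\tAfter Arg1:T1 Arg2:T2\n' > "$demo_dir/sample.ann"

# Preview which lines would be removed before editing the file.
grep -n 'Coreference' "$demo_dir/sample.ann"

# Delete the matching lines in place; the original is kept as sample.ann.bak.
sed -i.bak '/Coreference/d' "$demo_dir/sample.ann"

# Show what is left.
cat "$demo_dir/sample.ann"
```

Note that the pattern is case-sensitive and matches anywhere on the line, so the `grep` preview is a cheap way to check that nothing unexpected gets deleted.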
for ann_file in data/training/*.ann; do sed -i '' "/Coreference/d" "$ann_file"; done
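As a rough alternative to the loop, the batch step can also be sketched with `find`, which only hands the `.ann` files to sed and leaves the source texts alone. The temporary folder `train_dir` and its contents are hypothetical stand-ins for `data/training/`, and `-i.bak` is used so that the command should run on both GNU and BSD sed.

```shell
# Hypothetical folder standing in for data/training/, with made-up contents.
train_dir=$(mktemp -d)
printf 'R1\tCoreference Arg1:T1 Arg2:T2\nR2\tAfter Arg1:T1 Arg2:T2\n' > "$train_dir/doc1.ann"
printf 'R1\tCoreference Arg1:T3 Arg2:T4\n' > "$train_dir/doc2.ann"
printf 'Coreference is mentioned in this source text.\n' > "$train_dir/doc1.txt"

# Only *.ann files are passed to sed; -i.bak keeps a backup of each one.
find "$train_dir" -type f -name '*.ann' -exec sed -i.bak '/Coreference/d' {} +

# Count remaining matches per annotation file.
# grep exits non-zero when nothing matches, hence the "|| true".
grep -c 'Coreference' "$train_dir"/*.ann || true
```

Restricting the match to `*.ann` matters because a source text can legitimately contain the word being deleted, as `doc1.txt` does here.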
The visualization still looks messy because of long-distance relations, but this helped tidy it up a little.


I also wanted to get rid of the line breaks, because the source txt files had line breaks in the middle of sentences everywhere. This time, however, I had to replace them with another character rather than delete them, since the annotation offsets needed to stay unchanged. So I replaced every line break with a space. (I needed gnu-sed on OSX to run this second one.)
gsed -i ':a;N;$!ba;s/\n/ /g' data/training/$file
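For anyone without gnu-sed, here is a sketch of the same idea using `tr` instead, together with a quick check that the file size did not change. The sample text and the `work_dir` variable are made up. One difference from the sed command: `tr` also turns the final newline into a space, whereas the sed script joins the lines and keeps the file's last newline.

```shell
# Hypothetical sample text with line breaks in the middle of a sentence.
work_dir=$(mktemp -d)
printf 'The attack began\nat dawn. A second strike\nfollowed.\n' > "$work_dir/doc.txt"

# Byte count before the change (tr -d strips the padding some wc versions add).
size_before=$(wc -c < "$work_dir/doc.txt" | tr -d ' ')

# Swap every newline for exactly one space. tr cannot edit in place,
# so write to a temporary file and move it over the original.
tr '\n' ' ' < "$work_dir/doc.txt" > "$work_dir/doc.tmp"
mv "$work_dir/doc.tmp" "$work_dir/doc.txt"

size_after=$(wc -c < "$work_dir/doc.txt" | tr -d ' ')

# A one-for-one swap should leave the size unchanged, a quick sanity
# check that the annotation offsets did not shift.
if [ "$size_before" -eq "$size_after" ]; then
    echo "size unchanged"
else
    echo "size changed: $size_before -> $size_after"
fi
```

This assumes the files use plain `\n` line endings; with Windows-style `\r\n` endings the carriage returns would stay in the text.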
References:
How would I use sed to delete the whole line in a text file that contains a specific string? (stackoverflow.com)