Keeping your copyrights up to date.

tristansokol/copyright-updater

Recently I took the SDI assessment, and in the process I was told that I like things with structure and order. That really resonated with me when I thought about one of the things I hate: outdated copyrights in code. I also thought this would be a good time to learn some bash tools I was unfamiliar with, like awk and sed.

I also talked with some friendly legal professionals about what the “right” answer is for the copyright year, and like most legal things, it isn’t super straightforward. The answer we settled on was a copyright date that runs from the first year the code was written up to the last time a significant change was made, like Copyright 2013-2016.
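As a small illustration of that rule, here is a sketch of a helper that prints a single year when the first and last year are the same, and a range otherwise. The name format_copyright is made up for this example and is not part of the script.

```shell
#!/bin/bash
# Hypothetical helper: build a notice from the first and last year.
format_copyright() {
  local first=$1 last=$2
  if [[ "$first" -eq "$last" ]]; then
    # Written and last changed in the same year: no range needed.
    printf 'Copyright %s\n' "$first"
  else
    printf 'Copyright %s-%s\n' "$first" "$last"
  fi
}

format_copyright 2013 2016
```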

To start out, I use the first argument to my script ($1) to specify which directory I want to update the copyrights for, and then switch into that directory.

#!/bin/bash
# A simple script
echo "looking in $1"
# switch into the target directory, and stop if that fails
cd "$1" || exit 1
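If the argument is missing or is not a directory, the script would otherwise carry on in whatever directory it was started from. A minimal sketch of a guard for that, assuming a helper named enter_target_dir (a made-up name, not in the original script):

```shell
#!/bin/bash
# Hypothetical helper: validate the directory argument before using it.
enter_target_dir() {
  local target=$1
  if [[ -z "$target" ]]; then
    echo "usage: pass the directory to scan as the first argument" >&2
    return 1
  fi
  if [[ ! -d "$target" ]]; then
    echo "not a directory: $target" >&2
    return 1
  fi
  # Quoting keeps paths with spaces in one piece.
  cd "$target" || return 1
}

# Example call, run in a subshell so the caller's directory is unchanged.
( enter_target_dir "${1:-.}" && echo "looking in $PWD" )
```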

Next, I figure out which branch I’m on. I don’t want to make any changes on master, so I make sure that I’m either already on another branch, or I create a new branch and switch to it.

#check what git branch we are on
branch=$(git branch | sed -n -e 's/^\* \(.*\)/\1/p')
#if we are on master, switch to a new branch
if [[ "$branch" == "master" ]]; then
  git checkout -b updating-copyrights
fi
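To see what the sed expression does without touching a real repository, here is a sketch that feeds it canned text shaped like `git branch` output. The helper name current_branch_from and the sample branch names are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: pull the current branch out of `git branch` output,
# which marks the checked-out branch with a leading "* ".
current_branch_from() {
  sed -n -e 's/^\* \(.*\)/\1/p'
}

# Canned sample standing in for real `git branch` output.
branch=$(printf '  feature-x\n* master\n' | current_branch_from)

# The spaces around == matter: [[ $branch=='master' ]] is a single non-empty
# word, so it is true on every branch.
if [[ "$branch" == "master" ]]; then
  echo "on master, a new branch is needed"
fi
```

If parsing the listing feels fragile, `git rev-parse --abbrev-ref HEAD` prints the current branch name directly and could probably replace the sed step.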

After that I search the entire directory for any instance of “copyright”. The -lir options make grep return only file names (l), search case-insensitively (i), and recurse through subfolders (r).

grep -lir copyright ./ | while IFS= read -r filename; do
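Here is a small sketch of that search run against a throwaway directory, so the effect of each flag is visible. The file names and contents are invented for the demo.

```shell
#!/bin/bash
# Build a throwaway directory tree to search.
demo=$(mktemp -d)
mkdir -p "$demo/src/lib"
printf '// Copyright 2014 Example\n' > "$demo/src/a.js"
printf '# COPYRIGHT 2015 Example\n' > "$demo/src/lib/b.py"
printf 'nothing to see here\n' > "$demo/src/c.txt"

# -l: print file names only, -i: ignore case, -r: recurse into subfolders.
matches=$(cd "$demo" && grep -lir copyright . | sort)
echo "$matches"

# IFS= and -r keep leading spaces and backslashes in file names intact.
echo "$matches" | while IFS= read -r filename; do
  echo "would inspect: $filename"
done

rm -rf "$demo"
```

Only a.js and b.py should be listed: b.py matches because of -i, and both are found because of -r.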

Now that I’m iterating through all of the files with the word copyright in them, I can start with the real work:

numCommits=$(git log --oneline -- "$filename" | wc -l)
if [[ "$numCommits" -eq 1 ]]; then
  echo "$filename"
  printf '\e[1;33m%-6s\e[m\n' "Has a copyright notice, but only the initial commit"
else

I first check that the file has more than one commit. This is an added protection so that files with legitimately older copyrights, which were only just added to the repo, are left alone.

lastupdated=$(git log -1 --date=format:'%Y' --pretty=format:'%cd' -- "$filename")
if [[ $lastupdated ]]; then

Now I start looking at the file itself: I make sure it is tracked in git (an untracked file has no log output, so lastupdated is empty) and grab the year of its last commit.
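The two git queries can be wrapped in small functions to make them easier to try on their own. This is only a sketch: commit_count and last_commit_year are made-up names, and it assumes a git version that supports --date=format.

```shell
#!/bin/bash
# Hypothetical helpers wrapping the two git queries.
commit_count() {
  # One line per commit that touched the file; tr strips wc's padding.
  git log --oneline -- "$1" | wc -l | tr -d ' '
}

last_commit_year() {
  # %cd is the committer date, rendered with the --date format (year only).
  # Prints nothing for a file git has no history for.
  git log -1 --date=format:'%Y' --pretty=format:'%cd' -- "$1"
}

# Usage inside a repository, with a hypothetical file name:
#   [[ $(commit_count README.md) -gt 1 ]] && last_commit_year README.md
```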

nextword=$(awk '{for(i=1;i<=NF;i++) if ($i=="copyright"||$i=="Copyright") print $(i+1)}' "$filename")
if [[ $nextword == '©' ]]; then
  nextword=$(awk '{for(i=1;i<=NF;i++) if ($i=="©") print $(i+1)}' "$filename")
fi
if [[ $nextword =~ ^-?[0-9]+$ ]]; then
  if [[ $nextword -eq $lastupdated ]]; then
    printf '\e[1;32m%-6s\e[m\n' "$filename"
  fi

Then I check what the word after “copyright” is and whether it is a year, in particular the year that the file was last updated. I also log the file name (with some fun colors for the terminal window).
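Here is a sketch of the same idea as a pair of standalone helpers. It is a variant rather than the script's exact logic: it stops at the first match, skips a © in a single awk pass, and only accepts plain digits as a year. The names word_after_copyright and is_year are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: print the word that follows the first
# "copyright"/"Copyright" in a file, skipping a © symbol if present.
word_after_copyright() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "copyright" || $i == "Copyright") {
        next_word = $(i + 1)
        if (next_word == "©" && i + 2 <= NF) next_word = $(i + 2)
        print next_word
        exit
      }
    }
  }' "$1"
}

# Hypothetical helper: succeed only for a run of ASCII digits.
is_year() {
  [[ $1 =~ ^[0-9]+$ ]]
}

sample=$(mktemp)
printf '/* Copyright © 2014 Example Inc. */\n' > "$sample"
word=$(word_after_copyright "$sample")
if is_year "$word"; then
  echo "found year: $word"
fi
rm -f "$sample"
```

Stopping at the first match keeps the result to a single word, which matters because the later numeric comparisons expect one value, not one per line.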

  if [[ $nextword -ne $lastupdated ]]; then
    printf '\e[1;31m%-6s\e[m %s vs %s\n' "$filename" "$lastupdated" "$nextword"
    # sed -i '' is the BSD/macOS form; GNU sed takes -i with no argument
    sed -i '' -e "s@Copyright $nextword@Copyright $nextword-$lastupdated@g" "$filename"
    sed -i '' -e "s@Copyright © $nextword@Copyright $nextword-$lastupdated@g" "$filename"
    # result looks like: Copyright 2014-2016
  fi
else
  echo "$filename"
  grep -B 1 -A 1 -i copyright "$filename"

At this point it gets a little bonkers: I look at the word after “copyright” and then do in-place replacements with sed to produce a notice that looks like Copyright 2014-2016.
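Because the in-place flag of sed differs between BSD/macOS and GNU, one option is to skip it and write through a temporary file. The sketch below does that; extend_copyright_year is a made-up name, and it assumes the caller has already confirmed that the notice holds a single year.

```shell
#!/bin/bash
# Hypothetical helper: turn "Copyright 2014" (with or without ©) into
# "Copyright 2014-2016" without relying on sed -i.
# Assumes the notice currently holds a single year; running it on an
# already-extended notice would append the range again.
extend_copyright_year() {
  local file=$1 first=$2 last=$3 tmp
  tmp=$(mktemp) || return 1
  sed -e "s@Copyright $first@Copyright $first-$last@g" \
      -e "s@Copyright © $first@Copyright $first-$last@g" \
      "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the contents back so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

sample=$(mktemp)
printf '// Copyright © 2014 Example Inc.\n' > "$sample"
extend_copyright_year "$sample" 2014 2016
cat "$sample"
rm -f "$sample"
```

Like the sed commands in the script, this drops the © symbol when it rewrites the notice.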

Sometimes the file just mentions the word “copyright” or is an Apache license, so I print a notification about that to the console as well.

printf '\e[1;33m%-6s\e[m\n' "None of these were determined to be real copyright notices"

Look at all those copyrights I can update automatically in Retrofit!

That is basically it! I’m sure there is a ton of room for improvement, so please check out the repo, and let your feedback rain over me like a thunderstorm.
