Very interesting. I was just working on a similar problem, which I wrote about here. Although our stories started out very similarly, both using a full-size U-Net and noticing heavy overfitting, we then took different directions.
Dilated convolutions didn’t help me when applied to U-Net. Using them without any downsampling even hurt performance, because I had to limit the size of the architecture. I also ended up with about 200k parameters, but with an almost vanilla U-Net that uses 32 output channels everywhere. I noticed that I needed the depth for performance, but the “double the channels” rule resulted in overfitting. Static 32 channels and 0.5 dropout in the middle was my best-performing architecture.
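To make the "static 32 channels" point concrete, here is a back-of-the-envelope parameter count. Everything about the setup is my assumption (three pooling steps, two 3x3 convs per level, 2x2 up-convs, single-channel input and output), so treat it as a sketch, not my exact model:

```python
# Rough parameter count for a constant-width U-Net.
# Assumed setup: 3 pooling steps, two 3x3 convs per level,
# 32 channels everywhere, 1 input and 1 output channel.

def conv_params(c_in, c_out, k):
    """Weights + biases of a single k x k convolution."""
    return k * k * c_in * c_out + c_out

C = 32
total = 0

# Contracting path: the first conv maps the input image to 32 channels,
# every following 3x3 conv is 32 -> 32.
total += conv_params(1, C, 3) + conv_params(C, C, 3)   # level 1
total += 2 * 2 * conv_params(C, C, 3)                  # levels 2 and 3
total += 2 * conv_params(C, C, 3)                      # bottleneck

# Expanding path per level: a 2x2 up-conv, then a 3x3 conv on the
# concatenated (64-channel) skip features, then a 32 -> 32 conv.
per_up_level = (conv_params(C, C, 2)
                + conv_params(2 * C, C, 3)
                + conv_params(C, C, 3))
total += 3 * per_up_level

total += conv_params(C, 1, 1)                          # final 1x1 conv

print(total)  # -> 160609
```

So a plain constant-32 variant lands around 160k parameters, in the same ballpark as the ~200k I mentioned; the exact number obviously shifts with depth, normalization layers, and channel counts.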
Other than that, you talk a lot about the receptive field. I don’t have the reference at hand, but there was a paper showing that the effective receptive field in practice is smaller than the theoretical one. By the way, how did you calculate the receptive field of the U-Net to be 68px? My best guess would have been 80x80.
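For what it’s worth, the standard per-layer recurrence does produce exactly 68 if one counts only the contracting path of a three-pool U-Net with two 3x3 convs per level plus a two-conv bottleneck. Whether that matches your architecture is pure speculation on my part:

```python
# Theoretical receptive field via the standard recurrence:
#   r <- r + (k - 1) * j,   j <- j * s
# for each layer with kernel size k and stride s,
# where j is the cumulative stride ("jump") of the feature map.

def receptive_field(layers):
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

# Assumed contracting path: three levels of (3x3 conv, 3x3 conv,
# 2x2 max-pool), followed by two 3x3 bottleneck convs.
encoder = 3 * [(3, 1), (3, 1), (2, 2)] + [(3, 1), (3, 1)]
print(receptive_field(encoder))  # -> 68
```

If that is indeed where the 68 comes from, the gap to my 80x80 guess would just be a different assumption about which layers to include in the count.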
Anyway, thanks for this amazing article. All the best!