Automatically Recognize Crops from Landsat with U-Net, Keras and TensorFlow

Ziheng Jensen Sun
Artificial Intelligence In Geoscience
6 min read · Aug 29, 2019

How many crops can you recognize? For many people, most crops are just “plants,” and distinguishing them feels like detective work. Information about what is growing in a crop field is very valuable to agricultural policy makers, insurance companies, farmers, food retailers, and others. Knowing this information in time helps them adjust their strategies for policy making, insurance pricing, and growing decisions more wisely and efficiently.

Crop maps are the most direct way for people to get such information, and remote-sensing-based crop mapping is one of the major methods used to produce them. USDA scientists have used remote sensing observations as input to image classification models to produce seamless crop maps such as the Cropland Data Layer (CDL). However, one major disadvantage of traditional image classification methods is that the trained models cannot generalize well. A popular research direction is therefore to find a model that can be reused many times on new data to derive crop maps without any ground truth assistance. Machine learning, and especially deep learning, is a natural candidate, since such models are expected to be more universally applicable after being trained.

Cropland Data Layer (USDA NASS, from CropScape) Yellow: Corn; Green: Soybean

We proposed this Ag-Net as an ESIP GSoC project idea and got a lot of contributions from our talented students. You can check out their results in this Ag-Net GitHub repository (including Jupyter Notebooks). In this article we distill from these results a common procedure for using U-Net and Keras (TensorFlow backend) to automatically classify Landsat images into crop maps in Python. It basically includes four steps:

  1. U-Net Model Creation

There are different versions of U-Net with various configurations. In this case, the U-Net is supposed to take the 7 bands of Landsat as inputs and output a one-band image of the same size (width*height). The mapping is image-to-image: every pixel in the Landsat image is translated into a crop pixel whose value indicates its type. Example code based on the Keras functional API is shown below:

from keras.models import Model
from keras.layers import (Input, Conv2D, Conv2DTranspose, GlobalMaxPooling2D,
                          Reshape, Activation, BatchNormalization, LeakyReLU,
                          Add, Multiply, concatenate)


def conv2d_block(input_tensor, n_filters, kernel_size=3, strides=1, batchnorm=True):
    # Convolution + optional batch normalization + LeakyReLU activation
    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size),
               strides=(strides, strides), padding="same",
               kernel_initializer="he_normal")(input_tensor)
    if batchnorm:
        x = BatchNormalization()(x)
    x = LeakyReLU(alpha=0.1)(x)
    return x


def encoder_decoder(x1, ni, kernel_size=3, batchnorm=True, times=None):
    # Attention-style gate: a global-pooling (channel) branch and a spatial
    # branch re-weight the feature map x1, and the two weighted maps are summed
    x = GlobalMaxPooling2D()(x1)
    x = Reshape(target_shape=(1, 1, times))(x)
    x = Conv2D(filters=ni // 2, kernel_size=(kernel_size, kernel_size),
               padding="same", kernel_initializer="he_normal")(x)
    x = LeakyReLU(alpha=0.1)(x)
    x = Conv2D(filters=ni, kernel_size=(kernel_size, kernel_size),
               padding="same", kernel_initializer="he_normal")(x)
    x = Activation('sigmoid')(x)

    x2 = Conv2D(filters=ni, kernel_size=(kernel_size, kernel_size),
                padding="same", kernel_initializer="he_normal")(x1)
    x2 = Activation('sigmoid')(x2)

    x11 = Multiply()([x1, x2])
    x12 = Multiply()([x1, x])
    x13 = Add()([x11, x12])
    return x13


def DownBlock(x, ni, nf, kernel_size=3, batchnorm=True, down=None):
    # Downsampling block: strided convolutions plus a residual shortcut
    inp = x
    x = conv2d_block(x, nf, 3, 2)
    x = conv2d_block(x, nf, 3)
    x = Add()([x, conv2d_block(inp, nf, 3, 2)])
    if down is not None:
        return encoder_decoder(x, nf, kernel_size=3, batchnorm=True, times=128)
    return x


def UpBlock(down, cross, ni, nf, kernel_size=3, batchnorm=True, down1=None):
    # Upsampling block: transposed convolution, skip connection (cross), convolution
    x = Conv2DTranspose(filters=nf, kernel_size=(3, 3), strides=(2, 2),
                        padding="same", kernel_initializer="he_normal")(down)
    x = concatenate([x, cross])
    x = conv2d_block(x, nf, 3)
    if down1 is not None:
        return encoder_decoder(x, nf, kernel_size=3, batchnorm=True, times=256)
    return x


def get_unet(input_img, n_filters=128, dropout=0.15, batchnorm=True):
    # Encoder: four down blocks; decoder: three up blocks with skip connections
    d1 = DownBlock(input_img, 7, 128, 3, True, 12)
    d2 = DownBlock(d1, 128, 256)
    d3 = DownBlock(d2, 256, 512)
    d4 = DownBlock(d3, 512, 1024)
    u1 = UpBlock(d4, d3, 1024, 512)
    u2 = UpBlock(u1, d2, 512, 256, 3, True, 12)
    u3 = UpBlock(u2, d1, 256, 128)

    # 255 output channels (one per CDL class code), softmax over every pixel
    outputs = Conv2DTranspose(filters=255, kernel_size=(3, 3), strides=(2, 2),
                              padding="same", kernel_initializer="he_normal")(u3)
    outputs = Reshape((128 * 128, 255))(outputs)
    outputs = Activation('softmax')(outputs)

    model = Model(inputs=[input_img], outputs=[outputs])
    return model
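For reference, the model can then be instantiated on 7-band, 128×128 Landsat tiles (the tile size matches the 128*128 reshape in the output layer):

# Build the U-Net for 7-band Landsat tiles of 128 x 128 pixels
input_img = Input((128, 128, 7), name='img')
model = get_unet(input_img, n_filters=128, dropout=0.15, batchnorm=True)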

2. Training Data Preparation

We used the Ag-Net-Dataset generated from a Landsat 8 scene captured on Sep 2, 2015. It has four folders: input, input_32bit, input_rgb, and target. The input folder contains all seven bands of the scene as 16-bit tiles, the input_32bit folder contains the same seven bands as 32-bit tiles, and the input_rgb folder contains the true-color tiles of the scene. You can pair any of the three input folders with the target folder to build a training dataset. All images in the input and target folders are matched exactly by location, and the spatial resolution is identical (30 meters).

Landsat RGB Visual Image
Corresponding Crop Map
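For reference, the list of training tile IDs used later (train_ids) can be built by simply listing the target folder; the local paths below are assumptions about where the dataset is unpacked:

import os

# Assumed local paths to the unpacked Ag-Net-Dataset folders
path_train_input = 'Ag-Net-Dataset/input/'    # seven 16-bit Landsat band tiles
path_train_output = 'Ag-Net-Dataset/target/'  # matching CDL crop-type tiles

# Every target tile becomes one training sample; input tiles are paired by name
train_ids = sorted(os.listdir(path_train_output))
print(len(train_ids), 'training tiles found')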

3. Learning

One of the most challenging jobs in training the U-Net is streaming the images into it. Python provides easy-to-use methods for reading multi-dimensional arrays from image files. The NumPy arrays are reshaped into a specific format, e.g., (batch_size, height, width, band_count) for the input images, while the corresponding target arrays are reshaped into (batch_size, height*width, classes) holding one-hot class probabilities. The mapping between input arrays and output arrays must match exactly pixel by pixel; otherwise the training will be invalid.

import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from skimage.transform import resize

# Tile size fed into the U-Net; path_train_input, path_train_output and
# train_ids are set in the data-preparation step above
im_height, im_width = 128, 128


def get_data(ids, batch_size):
    # Generator that yields (X, y) batches of tiles indefinitely for training
    while True:
        ids_batches = [ids[i:min(i + batch_size, len(ids))]
                       for i in range(0, len(ids), batch_size)]
        for b in range(len(ids_batches)):
            X = np.zeros((len(ids_batches[b]), im_height, im_width, 7), dtype=np.float32)
            y = np.zeros((len(ids_batches[b]), im_height * im_width, 255), dtype=np.float32)
            for c in range(len(ids_batches[b])):
                # Load the seven Landsat bands of this tile (files suffixed _1 .. _7)
                for r in range(1, 8):
                    img = load_img(path_train_input + 'lc8' + ids_batches[b][c][3:-4]
                                   + '_' + str(r) + '.tif', color_mode="grayscale")
                    x_img = img_to_array(img)
                    x_img = resize(x_img, (im_height, im_width), mode='constant',
                                   preserve_range=True)
                    # Scale the band values to [0, 1]
                    X[c, :, :, r - 1] = x_img[:, :, 0] / 255.0

                # Load the matching crop mask and one-hot encode every pixel
                # into one of 255 possible crop class codes
                mask = img_to_array(load_img(path_train_output + ids_batches[b][c],
                                             color_mode="grayscale"))
                mask = resize(mask, (im_height, im_width), mode='constant',
                              preserve_range=True)
                inc = -1
                for p in range(im_height):
                    for q in range(im_width):
                        num = int(mask[p][q])
                        inc = inc + 1
                        y[c][inc][num] = 1.0

            yield X, y

For training, we compile the U-Net with the Adam optimizer, a learning rate of 0.1, and categorical_crossentropy as the loss function.

from keras.optimizers import Adam

model.compile(optimizer=Adam(lr=0.1), loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])
model.summary()

The data are then fed into the U-Net, batch by batch, through the generator with the following code:

# get_data returns a Python generator, so the model is trained with fit_generator
train_gen = get_data(train_ids[0:2], batch_size=4)
results = model.fit_generator(train_gen, steps_per_epoch=1, epochs=50, verbose=1)

After training for tens of epochs, training can be stopped once the training and testing accuracies reach the expected threshold.
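One way to automate that stopping rule (not used in the original notebooks) is a Keras EarlyStopping callback; the patience value here is just an illustrative choice:

from keras.callbacks import EarlyStopping

# Stop once categorical_accuracy has not improved for 5 consecutive epochs
# (pass callbacks=[early_stop] to model.fit_generator above)
early_stop = EarlyStopping(monitor='categorical_accuracy', mode='max',
                           patience=5, verbose=1)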

4. Testing

Using the trained U-Net, we can test it on several images for validation. The output of the U-Net is a probability map, which needs to be collapsed with an argmax (the crop type with the largest probability takes the pixel). We can take a bigger image, slice it into tiles, feed each tile into the U-Net to get the predicted crop maps, and finally merge all the tile maps into one big map. The result is a one-band raster like:

U-Net Predicted One-band Crop Layer
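As a sketch of that step (X_test here is an assumed batch of 128×128×7 tiles prepared the same way as in get_data), the per-pixel crop class can be recovered with an argmax over the 255-way softmax and reshaped back into tiles:

import numpy as np

# Predict: the model outputs shape (n_tiles, 128*128, 255)
probs = model.predict(X_test)

# Keep the most probable class per pixel and restore the 128 x 128 tile shape
pred_tiles = np.argmax(probs, axis=-1).reshape(-1, 128, 128).astype(np.uint8)

# pred_tiles[i] is the one-band crop map of the i-th tile; tiles cut from a
# larger scene can be mosaicked back in their original row/column order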

To make it readable, we apply the USDA NASS legend to the layer. The map is then rendered as:

Colorized Crop Layer
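For illustration, the one-band layer (pred_tiles[0] from the snippet above) can be colorized with matplotlib; the color table below is only a small excerpt standing in for the full USDA NASS CDL legend (e.g., code 1 = corn in yellow, code 5 = soybeans in green):

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Example colors for a few CDL codes; a complete table would cover every class
color_table = {0: (0.0, 0.0, 0.0),     # background
               1: (1.0, 0.82, 0.0),    # corn (yellow)
               5: (0.15, 0.44, 0.0)}   # soybeans (green)
colors = [color_table.get(code, (0.8, 0.8, 0.8)) for code in range(255)]

plt.imshow(pred_tiles[0], cmap=ListedColormap(colors), vmin=0, vmax=254)
plt.axis('off')
plt.show()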

We tested the U-Net on several Landsat scenes and got satisfactory results. The overall accuracy compared to CDL is around 72%–85% (over all pixels). Generally speaking, the accuracy for major crops such as corn and soybeans is higher than for other crops and land cover types.
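For reference, the pixel-wise agreement against CDL can be computed with a couple of small helpers (pred_map and cdl_map are assumed to be co-registered one-band arrays of the same shape):

import numpy as np

def pixel_accuracy(pred_map, cdl_map):
    # Fraction of pixels whose predicted crop code matches the CDL code
    return float(np.mean(pred_map == cdl_map))

def class_accuracy(pred_map, cdl_map, crop_code):
    # Accuracy for one crop type, e.g. 1 (corn) or 5 (soybeans) in the CDL coding
    mask = (cdl_map == crop_code)
    return float(np.mean(pred_map[mask] == crop_code)) if mask.any() else float('nan')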

An Example Test Result of U-Net

References

Sun, Z., Di, L. and Fang, H., 2019. Using long short-term memory recurrent neural network in land cover classification on Landsat and Cropland data layer time series. International journal of remote sensing, 40(2), pp.593–614.

Sun, Z. (2019, January 30). Some Basics of Deep Learning in Agriculture (Version 1). ESIP. https://doi.org/10.6084/m9.figshare.7631615.v1

Jupyter Notebooks

Ayush Patel’s Agriculture Net.ipynb

SANKALP MITTAL’s Ag-Net.ipynb
