How To Determine The Ripeness Of Watermelon With Keras: A Complete Cycle, From The Idea To The Program On Google Play
How it all began
It all started with Apple's app store: I found an app there that claims to determine the ripeness of a watermelon.
The program is... strange. Take, for instance, its suggestion to knock on the watermelon not with your knuckles but... with the phone! Nevertheless, I wanted to repeat this achievement on the more familiar Android platform.
Selection of tools
Our problem can be solved in several ways, and frankly, I had to make a considerable effort not to take the "simple" route, that is, Fourier transforms, wavelets, and a signal editor. However, I wanted to gain experience with neural networks, so let the networks analyze the data.
As the library for building and training neural networks I chose Keras, Google's high-level layer on top of TensorFlow and Theano. If you are just getting started with deep learning, you will hardly find a better tool. On the one hand, Keras is a powerful tool, optimized for speed, memory, and hardware (yes, it can run on video cards and clusters of them). On the other hand, everything that can be hidden from the user is hidden, so you do not have to puzzle over, say, how to join the layers of the network together. Very convenient.
Both Keras and neural networks in general require knowledge of Python. This language, like a snake, has wrapped itself around... sorry, sore subject. In short, there is no point in getting into modern deep learning without Python. Fortunately, Python can be learned in two weeks, a month at most.
Besides Python, you will need a few more libraries, but these are trifles once you have dealt with Python itself. You will need a (very superficial) acquaintance with NumPy, PyPlot, and possibly a couple of other libraries from which we take literally a couple of functions. It is not difficult. Really.
Finally, note that the above-mentioned clusters of video cards will not be needed: our task is solved perfectly well on an ordinary CPU, slowly, but not critically so.
First, we need to create the neural network, in Python and Keras. You can work in an Ubuntu emulator; you can use Windows as well, but the extra time you lose there would be enough to learn Ubuntu and keep working under it.
The next step is to write a program. I plan to do this in Java for Android. It will be a prototype, in the sense that it will have a user interface but no neural network yet.
What is the point of writing such a program, you ask? Here it is: any data-analysis task sooner or later comes down to finding data, in our case data to train the program on. Indeed, how many watermelons must be tapped and tasted before the neural network can build a reliable model from the data? A hundred? More?
This is where our program helps: we upload it to Google Play, hand it out (okay, impose it, with some arm-twisting) to all friends unlucky enough to own an Android phone, and the data begins to trickle in... to where, by the way?
The next step is writing a server program that receives data from our Android client. True, this server program is very simple; I finished it in twenty minutes. Nevertheless, it is a separate stage.
Finally, there is enough data. We train the neural network.
Then we port the neural network to Java and release an update of our program.
Creating a Neural Network
Working with audio, which a tap on a watermelon certainly is, means either a recurrent neural network or a so-called one-dimensional convolutional network. Lately convolutional networks have clearly taken the lead, displacing recurrent ones. The idea of a convolutional network is that a window slides over the data array, the plot of sound intensity against time, and instead of analyzing hundreds of thousands of samples at once, we work only with what falls inside the window. The following layers combine and analyze the results of this layer.
To make this clearer, imagine that you need to find a seagull in a photo of a seascape. You scan the picture: the "window" of your attention moves along imaginary rows and columns, looking for a white check mark. This is how a 2D convolutional network works. A one-dimensional network scans along a single coordinate, which makes it the best choice when we are dealing with an audio signal.
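The sliding window can be illustrated without any libraries. The sketch below is not the Keras implementation, only a toy model of what a single filter of a 1D convolutional layer with 'valid' padding computes; the function names conv1d_valid and conv1d_output_length and the toy signal are made up for this illustration.

```python
def conv1d_valid(signal, kernel, stride):
    """Slide `kernel` over `signal` with the given stride ('valid' padding).

    Like Keras, this does not flip the kernel (it is a cross-correlation).
    """
    k = len(kernel)
    out = []
    # The window starts at 0, stride, 2*stride, ... as long as it fits entirely
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


def conv1d_output_length(n_samples, kernel_size, stride):
    """Number of window positions; assumes n_samples >= kernel_size."""
    return (n_samples - kernel_size) // stride + 1


signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # toy "recording"
kernel = [1, 0, -1]                        # toy filter: first minus last sample in the window
result = conv1d_valid(signal, kernel, stride=3)
print(result)                              # one value per window position
print(conv1d_output_length(len(signal), len(kernel), 3))
```

The same formula shows how much a first layer with a wide window and a stride of 3 shrinks the data: each filter produces roughly one third as many values as there are input samples.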
I note, however, that there is no need to fixate on 1D networks. As an exercise, I plotted the sound and analyzed the resulting bitmap as a picture, using a 2D convolutional network. To my surprise, the result was no worse than with the raw one-dimensional data.
The network used had the following structure:
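from keras.models import Sequential
from keras.layers import Conv1D, Activation

model = Sequential()
# First layer: 32 filters, each a 512-sample window moving in steps of 3
model.add(Conv1D(filters=32, kernel_size=512, strides=3,
    padding='valid', use_bias=False, input_shape=(nSampleSize, 1), name='c1d'))
model.add(Activation('relu'))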
This network has two output values (it predicts two values): sweetness and ripeness.
- Sweetness is 0 (unsweetened), 1 (normal) and 2 (excellent).
- Ripeness, respectively, is 0 (too hard), 1 (just right), and 2 (overripe, like cotton wool with sand).
The ratings for the samples are assigned by a person; exactly how is covered in the section on the Android program. The task of the neural network is to predict, from the recording of the tapping, what rating a person would give the watermelon.
Writing a program
I already mentioned that the program should come out in two versions. The first, preliminary one honestly warns the user that its predictions are complete nonsense. But it lets the user record a knock on a watermelon, rate the taste of that watermelon, and send it all over the Internet to the author of the program. That is, the first version simply collects data.
Here is the program's page on Google Play; of course, the program is free.
What does it do?
1. Press the button with the microphone to start recording. You have five seconds to knock three times on the watermelon. The button with the watermelon makes a "prediction"; we do not touch it yet.
Note: if Google Play still has the old version, recording and prediction are combined in the watermelon button, and there is no microphone button.
2. The saved file is temporary and will be overwritten the next time you press the record button. This lets you repeat the tapping if someone talks over it (you cannot imagine how hard it is to get people to be quiet for five seconds!), or if the water is running, the dishes are rattling, the neighbor is drilling...
Now the watermelon has been chosen and bought. You brought it home, recorded the sound, and cut it open. Now you are ready to rate its taste. Select the Save tab.
On this tab there are two combo boxes for rating: sweetness and ripeness (the translation is in progress). Set the ratings and click Save.
Save can be pressed only once, so set the ratings first. When the button is clicked, the sound file is renamed, and it will no longer be deleted at the next recording.
3. Finally, having recorded (and therefore eaten) a dozen watermelons, you return from the dacha, where there was no Internet. Now there is. Open the Submit tab and click the button. A package (with a dozen watermelons) goes to the developer's server.
Writing a server program
It is all simple, so I had better just post the full code of the script. The program "catches" the files, gives them unique names, and puts them in a directory accessible only to the site owner.
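<?php
if (is_uploaded_file($_FILES['file']['tmp_name'])) {
    $uploads_dir = './melonaire/';
    $tmp_name = $_FILES['file']['tmp_name'];
    $pic_name = $_FILES['file']['name'];
    $filename = md5(date('Y-m-d H:i:s:u'));
    move_uploaded_file($tmp_name, $uploads_dir . $filename);
} else {
    echo "File not uploaded successfully.";
}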
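One caveat about the naming scheme: PHP's date() works from an integer timestamp, so its 'u' (microseconds) field is always 000000, and two uploads arriving within the same second would get the same md5 name. The sketch below is not the author's server code; it only illustrates, in Python, a name that does not depend on the clock. The function name unique_upload_name is hypothetical.

```python
import uuid


def unique_upload_name(extension=""):
    """Return a random 32-character hex name, optionally with an extension."""
    # uuid4 is random, so names do not collide even within the same second
    return uuid.uuid4().hex + extension


# Generate a batch of names "at the same moment" and check they all differ
names = {unique_upload_name() for _ in range(1000)}
print(len(names))
```

In PHP the same idea could be expressed with a random or counter-based component in the hashed string; that part is left as an assumption here.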
Neural network training
The data is split into training and test sets, 70 and 30 percent respectively. The neural network converges. No surprises here, but a note for beginners: do not forget to normalize the input data; it will save you a lot of nerves. Something like this:
import os

arr_data = []
arr_labels = []
for file_name in os.listdir(path):
    nSweetness, nRipeness, arr_loaded = loadData(file_name)
    # Scale each recording so that its peak amplitude is 1
    arr_data.append(arr_loaded / max(abs(arr_loaded)))
    # 2 is the number of combo-box options minus 1, so labels fall in [0, 1]
    arr_labels.append([nSweetness / 2.0, nRipeness / 2.0])
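The 70/30 split mentioned above is not shown in the listing. A minimal sketch of both steps in plain Python follows; the helper names normalize and split_train_test, the fixed seed, and the toy recordings are assumptions made for this illustration, not the author's code. The guard for a silent recording is also an addition: dividing by a zero peak would otherwise fail.

```python
import random


def normalize(samples):
    """Scale a recording so that its largest absolute sample becomes 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)      # silent recording: nothing to scale
    return [s / peak for s in samples]


def split_train_test(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Ten toy "recordings" of three samples each
recordings = [[i + 1, -(i + 2), 0] for i in range(10)]
normalized = [normalize(r) for r in recordings]
train, test = split_train_test(normalized)
print(len(train), len(test))
```

Shuffling before the cut matters when the files arrive grouped, for example all recordings from one user in a row; otherwise the test set would not resemble the training set.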
Porting of a neural network
There are several ways to port the network from the Python environment to Java. Recently Google has made this process more convenient, so if you read tutorials, make sure they are not out of date. Here is how I did it:
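import os
import sys
import tensorflow as tf   # TensorFlow 1.x API (tf.GraphDef, sessions)
from keras import backend as K
from keras.models import Model
from keras.models import load_model
from keras.layers import *

def print_graph_nodes(filename):
    # Load the frozen graph and print the nodes whose names the Java code needs
    g = tf.GraphDef()
    g.ParseFromString(open(filename, 'rb').read())
    print([n for n in g.node if n.name.find('input') != -1])
    print([n for n in g.node if n.name.find('output') != -1])
    print([n for n in g.node if n.name.find('keras_learning_phase') != -1])

def get_script_path():
    # Assumed body: the directory of the running script
    return os.path.dirname(os.path.realpath(sys.argv[0]))

def keras_to_tensorflow(keras_model, output_dir, model_name, out_prefix="output_"):
    # The "output_" default for out_prefix is an assumption
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)
    out_nodes = []
    for i in range(len(keras_model.outputs)):
        out_nodes.append(out_prefix + str(i + 1))
        # Give every output tensor a predictable name (output_1, output_2, ...)
        tf.identity(keras_model.outputs[i], out_prefix + str(i + 1))
    sess = K.get_session()
    from tensorflow.python.framework import graph_util, graph_io
    init_graph = sess.graph.as_graph_def()
    # Freeze the graph: replace variables with constants holding the trained weights
    main_graph = graph_util.convert_variables_to_constants(sess, init_graph, out_nodes)
    graph_io.write_graph(main_graph, output_dir, name=model_name, as_text=False)
    from tensorflow.python.tools import import_pb_to_tensorboard

model = load_model(get_script_path() + "/models/model.h5")
#keras_to_tensorflow(model, output_dir=get_script_path() + "/models/model.h5",
#    model_name=get_script_path() + "/models/converted.pb")
print_graph_nodes(get_script_path() + "/models/converted.pb")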
Pay attention to the last line: in the Java code you will need to specify the names of the network's input and output nodes. This call simply prints them.
So, we put the resulting converted.pb file into the assets folder of the project in Android Studio and connect the tensorflowinferenceinterface library.
That is all. When I did this for the first time, I expected it to be difficult, but... it worked on the first attempt.
Here is what the call to the neural network looks like from the Java code:
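protected Void doInBackground(Void... params) {
    try {
        //Pass input into the tensorflow: batch of 1, m_arrInput.length samples, 1 channel
        tf.feed(INPUT_NAME, m_arrInput, 1, m_arrInput.length, 1);
        //run inference (OUTPUT_NAME and m_arrPrediction are assumed names)
        tf.run(new String[]{OUTPUT_NAME});
        //copy the output into the PREDICTIONS array
        tf.fetch(OUTPUT_NAME, m_arrPrediction);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}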
Here m_arrInput holds the recorded sound, while the PREDICTIONS array filled at the end has two elements and contains (ta-da!) our prediction, normalized from zero to one.
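Since the labels were divided by 2 before training, the two numbers returned by the network have to be mapped back to the 0, 1, 2 scale before they are shown to the user. The app would do this in Java; the sketch below shows the same arithmetic in Python. The function name decode_rating and the sample prediction values are made up for this illustration.

```python
SWEETNESS = ["unsweetened", "normal", "excellent"]
RIPENESS = ["too hard", "just right", "overripe"]


def decode_rating(value, max_rating=2):
    """Map a network output in [0, 1] to the nearest rating 0..max_rating."""
    # The network is not guaranteed to stay inside [0, 1], so clip first
    clipped = min(max(value, 0.0), 1.0)
    # Undo the division by max_rating and round to the nearest integer
    return int(clipped * max_rating + 0.5)


prediction = [0.81, 0.46]          # hypothetical network output
sweetness = decode_rating(prediction[0])
ripeness = decode_rating(prediction[1])
print(SWEETNESS[sweetness], "/", RIPENESS[ripeness])
```

Rounding to the nearest class throws away the network's uncertainty; an app could instead show the raw value on a scale, which is a design choice this sketch does not make.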
This is the point where one is supposed to thank the reader for their attention and express the hope that it was interesting. Instead, I note that the first version of the program is on Google Play. The second is completely ready, but there is not enough data. So, if you like watermelons, please install the app on your Android phone. The more data you send, the better the second version will work...
Of course, it will be free.
Good luck, and yes: thanks for your attention. I hope it was interesting.