Skychain: AI for Heart Disease Recognition in Development

Skychain has started developing a neural network for heart disease recognition that detects abnormalities by analyzing digitized ECGs.

The problem

Heart disease is the leading cause of death in the United States. More than 600,000 Americans die of heart disease each year. That is one in every four deaths in this country.

Every year about 735,000 Americans have a heart attack. Of these, 525,000 are a first heart attack and 210,000 happen in people who have already had a heart attack.

Skychain developments

Skychain is developing an AI to recognize heart diseases from digitized ECGs. Maria Zorkaltseva, Skychain's data science developer, took the training data from the PTB ECG database available at PhysioBank, which contains 549 records from 290 subjects, with an ECG sampling frequency of 1000 Hz.

The database includes records of the following diseases:

  • Myocardial infarction,
  • Cardiomyopathy/Heart failure,
  • Bundle branch block,
  • Dysrhythmia,
  • Myocardial hypertrophy,
  • Valvular heart disease,
  • Myocarditis,
  • Miscellaneous,
  • Healthy controls.

We take only the first ECG record for each patient. Each record has a header file with additional features such as age, smoker/non-smoker status, and the accompanying diagnosis.
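As a rough illustration of how such header features might be read, here is a minimal sketch. It assumes the metadata sits in comment lines of the form `# key: value`; the function name `parse_header_comments` and the sample text are made up for illustration and are not taken from a real record or from Skychain's code.

```python
def parse_header_comments(header_text):
    """Collect '# key: value' comment lines of a header file into a dict.

    Assumption: metadata lines start with '#' and separate key and value
    with a colon. Lines without a value are skipped.
    """
    features = {}
    for line in header_text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # not a comment line, so not metadata
        key, sep, value = line.lstrip("#").partition(":")
        if sep and value.strip():
            features[key.strip().lower()] = value.strip()
    return features


# Made-up sample text in the assumed layout (not a real patient record).
sample_header = """\
# age: 60
# sex: male
# Smoker: yes
# Reason for admission: Myocardial infarction
"""

features = parse_header_comments(sample_header)
print(features["age"], features["smoker"])
```

The values come back as strings, so a numeric feature such as age would still need to be converted and a categorical one such as smoker status encoded before being fed to a model.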

Models were trained in two modes:

  1. using only ECG signals;
  2. using ECG signals and additional features extracted from header files.

To artificially increase the number of examples for training and testing, the signals were divided into windows. ECG signal preprocessing can thus be conducted in two ways:

  1. dividing signals into fixed-size windows (for recurrent neural networks);
  2. dividing signals into segments of two heartbeats (cardiac cycles) each (for convolutional neural networks).
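The two windowing schemes can be sketched as follows. This is an illustration under stated assumptions, not Skychain's code: the function names are hypothetical, the R-peak positions are assumed to come from a separate detector, and delimiting a segment from one R peak to the R peak two beats later is only one possible reading of "two heartbeats each".

```python
def fixed_windows(signal, window_size, step=None):
    """Split a sequence of samples into fixed-size windows.

    `step` defaults to `window_size` (non-overlapping windows); a smaller
    step gives overlapping windows. A trailing remainder shorter than
    `window_size` is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    step = window_size if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


def beat_windows(signal, r_peaks, beats_per_window=2):
    """Split a signal into segments spanning `beats_per_window` beats.

    `r_peaks` holds the sample indices of detected R peaks (assumed given).
    Each segment runs from one R peak to the R peak `beats_per_window`
    beats later; leftover beats at the end are dropped.
    """
    segments = []
    for i in range(0, len(r_peaks) - beats_per_window, beats_per_window):
        start, end = r_peaks[i], r_peaks[i + beats_per_window]
        segments.append(signal[start:end])
    return segments


samples = list(range(10))
print(fixed_windows(samples, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Beat-based segments vary in length with heart rate, so they would normally be resampled or padded to a common `window_size` before being stacked into a tensor.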

The data was divided into train and test sets by patient id in the ratio of 80% to 20%, respectively.
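Splitting by patient id matters because one patient yields many windows; if windows from the same patient landed in both sets, the test score would be inflated. A minimal sketch of such a split, using a hypothetical helper `split_by_patient` and assuming that the 80/20 ratio applies to patients rather than to windows:

```python
import random


def split_by_patient(patient_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no patient is in both sets.

    `patient_ids` has one entry per example (window); the returned lists
    hold example indices.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_patients = set(unique[:n_test])
    train_idx = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test_idx = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train_idx, test_idx


# Ten hypothetical patients with three windows each.
ids = [p for p in range(10) for _ in range(3)]
train_idx, test_idx = split_by_patient(ids)
print(len(train_idx), len(test_idx))
```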

Neural network models

In this work, various architectures and types of neural networks (convolutional, recurrent) with different dimensions of the input tensors were tested:

a) a one-dimensional convolutional neural network with an input tensor (window_size, channel);

b) a two-dimensional convolutional neural network with an input tensor (cardiac_cycles, window_size, channel);

c) a recurrent neural network (LSTM) with an input tensor (window_size, channel).

Categorical cross-entropy and categorical accuracy were used as the loss function and the metric, respectively. The loss was monitored during training, and the model was saved after each epoch in which the loss reached a new best value, so that in the end the model with “the best” weights was used for prediction. If the loss did not improve for 5 epochs, the learning rate was reduced by a factor of two, down to a minimum of 0.00001. The initial learning rate was 0.001. The Adam adaptive optimizer was used for optimization.
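The learning rate rule can be illustrated with a small framework-free simulation. This is a sketch under assumptions: the reduction is taken to be a halving (factor 0.5), "did not change" is read as "did not improve", and the function name `reduce_on_plateau` is hypothetical. The article does not name the framework; the behaviour resembles the `ReduceLROnPlateau` callback in Keras.

```python
def reduce_on_plateau(losses, initial_lr=1e-3, factor=0.5,
                      patience=5, min_lr=1e-5):
    """Return the learning rate used in each epoch.

    `losses` is the monitored loss per epoch. After `patience` consecutive
    epochs without a new best loss, the rate is multiplied by `factor`,
    but never drops below `min_lr`.
    """
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in losses:
        schedule.append(lr)  # rate in effect during this epoch
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
    return schedule


# A loss that improves once and then stalls for 58 epochs.
rates = reduce_on_plateau([1.0, 0.9] + [0.9] * 58)
print(rates[0], rates[-1])
```

With these defaults the rate stays at 0.001 until the loss has stalled for 5 epochs, then halves every 5 stalled epochs until it reaches the floor of 0.00001.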

Training stopped if the loss did not improve for 10 epochs, a technique known as early stopping (EarlyStopping). Dropout with a probability of 0.5 was used for regularization.
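The stopping rule, combined with keeping the best weights, can be sketched without any framework. The function name `early_stopping_epoch` is hypothetical, and "remained the same" is read here as "no new best value"; a real training loop would save the model at each new best epoch instead of only recording its index.

```python
def early_stopping_epoch(losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch losses.

    Training stops at the first epoch that completes `patience` consecutive
    epochs without a new best loss. `best_epoch` is the epoch whose weights
    would be kept for prediction. If the rule never triggers, the last
    epoch is returned as `stop_epoch`.
    """
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(losses) - 1, best_epoch


# The loss improves for three epochs and then stalls.
stop, best = early_stopping_epoch([1.0, 0.8, 0.7] + [0.7] * 20)
print(stop, best)  # 12 2
```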

Development results

With the proposed neural networks, two classes taken from the PTB ECG database (myocardial infarction and healthy control) are distinguished with sufficiently high accuracy.

In the task of detecting heart diseases from digitized ECGs, convolutional neural networks show higher accuracy than recurrent ones. In turn, two-dimensional convolutional networks cope with the task better than one-dimensional ones. Adding the additional features yields higher accuracy than training on ECG signals alone. The best results are shown by a two-dimensional convolutional network with additional features.

The results of the two-dimensional convolutional networks with additional features (case 2bd) and without them (case 1bd) are given below.

Case 1bd:

Average categorical_accuracy for 5 folds (train data): 87.53% (+/- 7.56%)
Accuracy score on test data: 0.83
Matthews coeff on test data: 0.61

Case 2bd:

Average categorical_accuracy for 5 folds (train data): 99.39% (+/- 1.21%)
Accuracy score on test data: 1.0
Matthews coeff on test data: 1.0
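For readers unfamiliar with the Matthews correlation coefficient, the sketch below computes it for a two-class problem from confusion-matrix counts, using the standard formula MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The counts in the example are hypothetical and are not the confusion matrices behind the figures above.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for binary classification.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of examples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: a perfect classifier and an imperfect one.
print(accuracy(10, 10, 0, 0), matthews_corrcoef(10, 10, 0, 0))  # 1.0 1.0
print(accuracy(8, 7, 3, 2), matthews_corrcoef(8, 7, 3, 2))
```

Unlike accuracy, the coefficient takes all four cells of the confusion matrix into account, which makes it more informative when the classes are imbalanced.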

For a better understanding of the precision and recall metrics, see the article here: https://en.wikipedia.org/wiki/Precision_and_recall

Future work

Further work includes expanding the datasets so that the models can classify additional classes of heart disease with high accuracy. It is also planned to add residual blocks to the convolutional neural networks and compare the results with the current ones.

Iva Chernysheva, Marketing Manager