Extract value from AWS API logs with Kinesis & Lambda (Python 3.6)

Tracking EC2 API calls across our accounts

The Project

Consolidated CloudTrail logging provides the raw material necessary for analytics, and this pipeline forwards a subset of it to a Lambda function, which populates a custom CloudWatch Metrics namespace. The setup assumes you are already collecting CloudTrail logs and forwarding them to CloudWatch Logs.

We use Kinesis Streams to send the CloudTrail log data to Lambda, where we extract the log events that meet our criteria and push summary metrics to a new metrics namespace for dashboarding (as above).

The meat of the project is the three lines of Python that re-encode the log data (below). …
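
The author's own three lines are not shown in this excerpt. As a rough sketch of the decoding such a pipeline generally needs: Lambda receives Kinesis record data base64-encoded, and CloudWatch Logs gzip-compresses the JSON payload it writes to the stream. The function names and the EC2 filter below are hypothetical stand-ins for "our criteria", not the article's code.

```python
import base64
import gzip
import json


def decode_kinesis_record(record):
    """Decode one Kinesis record carrying a CloudWatch Logs subscription payload."""
    # Lambda delivers the Kinesis data field base64-encoded.
    compressed = base64.b64decode(record["kinesis"]["data"])
    # CloudWatch Logs gzip-compresses the JSON payload before writing to Kinesis.
    return json.loads(gzip.decompress(compressed))


def extract_ec2_events(event):
    """Return CloudTrail events whose source is EC2 (a hypothetical criterion)."""
    matches = []
    for record in event["Records"]:
        payload = decode_kinesis_record(record)
        # Skip control messages that CloudWatch Logs sends to test the destination.
        if payload.get("messageType") != "DATA_MESSAGE":
            continue
        for log_event in payload.get("logEvents", []):
            # Each log event message is one CloudTrail event serialized as JSON.
            trail_event = json.loads(log_event["message"])
            if trail_event.get("eventSource") == "ec2.amazonaws.com":
                matches.append(trail_event)
    return matches
```

A Lambda handler could call extract_ec2_events(event) and then publish counts to CloudWatch; the publishing step is left out because it depends on the metric names and dimensions you choose.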


R, G, & B — Arabic numeral ‘3’

Data pre-processing is critical for computer vision applications, and properly converting grayscale images to the RGB format expected by current deep learning frameworks is an essential technique. What does that mean?

Understanding Color Image Structure

Most color photos are composed of three interlocked arrays, one each for the Red, Green, and Blue values (hence RGB), and each integer within an array represents the intensity of a single pixel in that channel. Black-and-white or grayscale photos, by contrast, have only a single channel, read from one array.

Using the matplotlib library, let’s look at a color (RGB) image:

import matplotlib.pyplot as plt

img = plt.imread('whales/547b59eec.jpg')  # returns the image as a NumPy array
plt.imshow(img)
print(img.shape)
(525, 1050, 3)

The shape attribute of the array returned by plt.imread tells us that the image has a height of 525 pixels, a width of 1050 pixels, and three arrays (channels) of this size. …
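
To make the channel structure concrete, here is a minimal sketch of one common way to turn a single-channel image into a three-channel one: repeating the grayscale array along a new last axis with NumPy. The helper name gray_to_rgb and the small synthetic array are assumptions for illustration, and this is not necessarily the approach the full article takes.

```python
import numpy as np


def gray_to_rgb(gray):
    """Stack a 2-D grayscale array into a (height, width, 3) RGB-shaped array."""
    if gray.ndim != 2:
        raise ValueError("expected a 2-D (height, width) array")
    # Reuse the same values for the R, G, and B channels.
    return np.stack([gray, gray, gray], axis=-1)


# Synthetic 3x4 "image" standing in for a real grayscale photo.
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
rgb = gray_to_rgb(gray)
print(gray.shape)  # (3, 4)
print(rgb.shape)   # (3, 4, 3)
```

Because all three channels hold identical values, the result still displays as gray; only the shape changes to match what an RGB pipeline expects.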



The ‘hello world’ of deep learning is often the MNIST handwritten digit dataset, and I wanted to apply the same techniques to a more interesting application: the Arabic Handwritten Characters Dataset (AHCD), developed at the American University in Cairo.¹

In this example I use the fast.ai library to train a convolutional neural net (CNN) to correctly classify the AHCD at 99+% accuracy. Here’s how:

First, import the libraries we need and set our GPU to use CUDA:

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
from fastai.metrics

About

Matthew Arthur

Former NSA analyst working on compliance & security automation in the cloud. Background in applied mathematics, student of machine learning & neural nets.
