SQL or ML? The same dataset, two ways.

  • Figuring out whether you want to use your data to generate future predictions or to analyze historical trends. If it’s the latter, you don’t need ML.
  • Why ML is often a good solution for generating predictions on video, image, or audio data

Finding patterns in your data

For this section let’s assume you’ve found yourself at this step in the flowchart:

ML on text, numerical, and categorical data

Working with the same log file dataset as above, I’d now like to change what I’m predicting. Instead of flagging logs that indicate errors, I want to identify logs that indicate anomalies, which this dataset records as alerts. The dataset is already labeled for this: a “-” as the first character of a log marks it as a “non-alert” log, and logs without the dash are alerts. This makes it a good dataset for alert detection, but I’m still not 100% convinced that machine learning is the right tool for the job. Before I decide, I want to compare alert vs. non-alert logs to see if I can spot any patterns.
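As a rough illustration of the labeling rule described above, the sketch below derives a label from the first character of each log line and counts the two groups. The sample lines and the `label_log` helper are made up for illustration; they are not taken from the actual dataset, whose fields will differ.

```python
from collections import Counter


def label_log(line: str) -> str:
    """Return 'non_alert' if the line starts with '-', otherwise 'alert'."""
    return "non_alert" if line.startswith("-") else "alert"


# Hypothetical sample lines, invented for this sketch.
sample_logs = [
    "- node-12 session opened for user root",
    "- node-07 synchronized to time server",
    "ALERTCODE node-31 data error interrupt",
    "- node-12 session closed for user root",
]

# Count how many logs fall into each group before comparing them further.
counts = Counter(label_log(line) for line in sample_logs)
print(counts)
```

With the groups separated like this, comparing them (for example by message length or most frequent words) is a plain aggregation problem, which is the kind of check worth doing before reaching for a model.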

  • Option 1: My system doesn’t flag any non-alerts as alerts. In the process, it may miss some logs that are alerts.
  • Option 2: My system flags a high percentage of possible alerts. In the process, it may surface some false positives (logs flagged as alerts that actually aren’t alerts).

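One common way to describe these two options is in terms of precision and recall: Option 1 favors precision (few or no false positives), while Option 2 favors recall (few missed alerts). The sketch below shows how moving a decision threshold trades one for the other. The labels, scores, and thresholds are invented for illustration and do not come from the dataset or from any trained model.

```python
def precision_recall(y_true, scores, threshold):
    """Compute (precision, recall) when scores >= threshold are flagged as alerts."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(predicted, y_true))          # alerts correctly flagged
    fp = sum(p and not t for p, t in zip(predicted, y_true))      # non-alerts flagged as alerts
    fn = sum((not p) and t for p, t in zip(predicted, y_true))    # alerts that were missed
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Hypothetical ground truth (True = alert) and model scores.
y_true = [True, True, True, False, False, False]
scores = [0.95, 0.70, 0.40, 0.60, 0.20, 0.10]

strict = precision_recall(y_true, scores, threshold=0.9)   # Option 1 style
lenient = precision_recall(y_true, scores, threshold=0.3)  # Option 2 style
print("strict:", strict)
print("lenient:", lenient)
```

In this toy data the strict threshold flags no non-alerts but misses two of the three alerts, while the lenient threshold catches every alert at the cost of one false positive.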
  • Linear regression: predict a numerical value (like revenue)
  • Binary logistic regression: predict the probability that your input belongs to one of two classes
  • Multi-class logistic regression: predict which of several possible classes a particular input belongs to

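To make the binary case a little more concrete: logistic regression passes a weighted sum of the input features through the logistic (sigmoid) function, which yields a value between 0 and 1 that can be read as a probability. The sketch below is a minimal version of that calculation. The feature values, weights, and bias are invented; a trained model would learn them from data.

```python
import math


def predict_probability(features, weights, bias):
    """Return the sigmoid of the weighted sum of features plus a bias term."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two made-up numeric features.
weights = [1.5, -2.0]

# With all-zero features and zero bias the weighted sum is 0, so the output is 0.5.
p = predict_probability([0.0, 0.0], weights, bias=0.0)
print(p)
```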
  • In the first line we create the model and give it a name.
  • Then we tell BQML the type of model we’re building (logistic regression) and the column from our table that will be the label: this is the thing our model is predicting. In our case, it’s the alert column.
  • The rest is a regular old SQL query where we extract all of the data that will be used as input and output to our model. Remember that the alert column had more than two values: non_alert, or the particular alert message if the log was an alert. I’ve used a CASE statement to convert this data to two classes.
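The CASE step in the last bullet collapses the many alert messages into two classes. The same mapping is sketched here in Python rather than SQL, purely to show the logic. Only the `non_alert` value comes from the text; the other alert messages, the `alert` class name, and the `to_binary_label` helper are hypothetical.

```python
def to_binary_label(alert: str) -> str:
    """Map the multi-valued alert column to two classes.

    Mirrors the idea of a SQL CASE expression: keep 'non_alert' as is and
    treat every other value as a generic 'alert' (hypothetical class name).
    """
    return "non_alert" if alert == "non_alert" else "alert"


# Hypothetical raw values of the alert column.
raw_alert_column = ["non_alert", "DISK_FAILURE", "non_alert", "KERNEL_PANIC"]

binary_labels = [to_binary_label(value) for value in raw_alert_column]
print(binary_labels)
```

Reducing the label to two values is what makes the problem fit binary logistic regression instead of the multi-class variant.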

Sara Robinson
