Data and ML Monitoring is Easier with whylogs v1.1

WhyLabs Team
WhyLabs
Sep 28, 2022 · 6 min read

whylogs v1.1 is out with new features that make data and ML monitoring easier than ever

The release adds several features to the whylogs data logging API that make it easier to monitor your data and ML models.

whylogs is the open-source standard for data logging, allowing you to create statistical profiles of datasets to monitor for data quality, data drift, model drift, and more in Python or Java environments. Learn more about whylogs on GitHub.

Profiles generated with whylogs can also be used with WhyLabs Observatory to easily configure a customizable monitoring experience. Learn more about the WhyLabs Observatory here.

What’s new with whylogs v1.1?

If you’re a longtime whylogs user, you may notice some of these features were already available in whylogs v0, and now they’re all available in the simplified v1 API.

New features in whylogs v1.1:

  • Segments: Gain visibility within a sub-group of data
  • Log image data: Monitor data for computer vision models
  • Log rotation: Monitor continuous data streams
  • Condition count metrics: Detect specific values in datasets
  • String tracking: Monitor string data for NLP
  • Model performance: Track and monitor model performance in WhyLabs

Keep reading to learn more.

Monitor subgroups of data with segments

Specific subgroups of data can behave differently from the overall dataset. When monitoring the health of a dataset, it can be helpful to have visibility at a subgroup level to better understand how these subgroups contribute to trends in the overall dataset. This can be crucial for detecting dataset bias and assessing fairness. whylogs v1.1 supports data segmentation for this purpose.

Segmentation in whylogs can be done by a single feature or by multiple features simultaneously.

from whylogs.core.segmentation_partition import segment_on_column
column_segments = segment_on_column("category")

See a full code example on GitHub
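To make the idea concrete, here is a minimal plain-Python sketch of what segmenting on one column versus several columns means. It is only an illustration and not how whylogs implements segments; the helper names (segment_rows, summarize) and the sample rows are made up for this example.

```python
from collections import defaultdict
from statistics import mean


def segment_rows(rows, keys):
    """Group rows (dicts) by the values of one or more key columns."""
    segments = defaultdict(list)
    for row in rows:
        segments[tuple(row[k] for k in keys)].append(row)
    return dict(segments)


def summarize(segments, value_column):
    """Return the row count and the mean of value_column for each segment."""
    return {
        key: {"count": len(rows), "mean": mean(r[value_column] for r in rows)}
        for key, rows in segments.items()
    }


# Hypothetical sample data
rows = [
    {"category": "toys", "region": "EU", "price": 10.0},
    {"category": "toys", "region": "US", "price": 14.0},
    {"category": "books", "region": "EU", "price": 8.0},
    {"category": "toys", "region": "EU", "price": 12.0},
]

# Segment on a single feature, then on two features at once
by_category = summarize(segment_rows(rows, ["category"]), "price")
by_category_and_region = summarize(segment_rows(rows, ["category", "region"]), "price")

for key, stats in by_category_and_region.items():
    print(key, stats)
```

Segmenting on both columns yields one group per distinct pair of values, so a shift that only affects, say, toys sold in the EU stays visible instead of being averaged away in the overall dataset.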

Segmented profiles can also be uploaded to WhyLabs, where each segment will appear in the “Segments” section of the model dashboard within a particular project.

[Figure: Example of segments in WhyLabs]

Learn more about monitoring subgroups of data with segments in whylogs here.

Monitor Computer Vision data with image logging

In addition to tabular and textual data, whylogs can generate profiles of image data. It computes a number of image-related metrics that can be used to detect data drift and quality issues, such as low lighting levels.

from whylogs.extras.image_metric import log_image

results = log_image([img1, img2])
print(results.view().get_column("image_1").to_summary_dict())

Image metrics tracked in whylogs:

  • Brightness (mean, standard deviation)
  • Hue (mean, standard deviation)
  • Saturation (mean, standard deviation)
  • Image pixel height and width
  • Colorspace (e.g. RGB, HSV)

[Figure: Example of a data quality issue with low lighting]
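As a rough illustration of how brightness statistics can surface a low-lighting issue, the sketch below computes hue, saturation, and brightness summaries with Python's standard colorsys module. It is a simplified stand-in, not the whylogs implementation; image_stats, the tiny pixel lists, and the BRIGHTNESS_FLOOR threshold are assumptions made for this example.

```python
import colorsys
from statistics import mean, pstdev


def image_stats(pixels):
    """Summarize hue, saturation, and brightness for (r, g, b) pixels in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    stats = {}
    for i, name in enumerate(("hue", "saturation", "brightness")):
        channel = [p[i] for p in hsv]
        # Population standard deviation over all pixels of the image
        stats[name] = {"mean": mean(channel), "stddev": pstdev(channel)}
    return stats


# Hypothetical "images", each reduced to four pixels for brevity
bright = [(250, 250, 250), (200, 220, 240), (255, 255, 255), (230, 230, 210)]
dark = [(10, 10, 10), (20, 25, 30), (0, 0, 0), (15, 15, 12)]

BRIGHTNESS_FLOOR = 0.2  # hypothetical threshold on a 0-1 scale

bright_stats = image_stats(bright)
dark_stats = image_stats(dark)
low_light = dark_stats["brightness"]["mean"] < BRIGHTNESS_FLOOR

print("bright image brightness:", bright_stats["brightness"])
print("dark image brightness:", dark_stats["brightness"])
print("low lighting detected:", low_light)
```

In a monitoring setup, the same comparison would run against a baseline profile rather than a fixed threshold, but the principle is the same: a sustained drop in mean brightness points at a lighting or sensor problem.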

To learn more about logging image data with whylogs, check out our documentation and stay tuned for an upcoming blog post about it!

Log rotation (rolling logs) for continuous data streams

Logging continuous streams of data can be challenging. With log rotation in whylogs, you can ingest data as it is generated, without adding delay or running into memory constraints.

Instead of planning how to batch data into logging intervals yourself, you can let whylogs handle it. The logger creates a session and profiles incoming data over the requested interval (in seconds, minutes, hours, or days). At the end of each interval, it writes the profile to a .bin file and flushes the log, ready to receive more data.

import whylogs as why

class MyApp:
    def __init__(self):
        # example of the rolling logger at a 15 min interval
        self.logger = why.logger(mode="rolling", interval=15, when="M",
                                 base_name="message_profile_")
        # write to our local path, there are other writers though
        self.logger.append_writer("local", base_dir="example_output")
        self.dataset_logged = 0  # this is simple for our logging

    def close(self):
        # On exit the rest of the logging will be saved
        self.logger.close()

    def consume(self, data_df):
        self.logger.log(data_df)  # log it into our data set profile
        self.dataset_logged += 1
        # count_files and tmp_path are not defined in this snippet; see the full example
        print("Inputs Processed: " + str(self.dataset_logged) +
              " Dataset Files Written to Local: " + str(count_files(tmp_path)))

See a full code example on GitHub
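The rotation idea itself can be sketched in a few lines of plain Python. This is a simplified illustration, not the whylogs rolling logger: the RollingBuffer class and its write callback are hypothetical, and the interval check happens on each log call with an injectable clock so the example runs without waiting.

```python
import time


class RollingBuffer:
    """Collect records and write them out as one batch each time the interval elapses."""

    def __init__(self, interval_seconds, write, clock=time.monotonic):
        self.interval = interval_seconds
        self.write = write  # callback that persists one batch
        self.clock = clock  # injectable so the sketch can run without sleeping
        self.batch = []
        self.window_start = clock()

    def log(self, record):
        # Rotate first if the current window has expired, then accept the record
        if self.clock() - self.window_start >= self.interval:
            self.flush()
        self.batch.append(record)

    def flush(self):
        if self.batch:
            self.write(list(self.batch))
            self.batch.clear()
        self.window_start = self.clock()

    def close(self):
        # Write whatever is left so no data is lost on exit
        self.flush()


# Simulated clock: 15-minute (900 s) windows, records arriving at t = 0, 300, 900, 1000
now = [0.0]
written = []
buf = RollingBuffer(interval_seconds=900, write=written.append, clock=lambda: now[0])
for t, record in [(0, "a"), (300, "b"), (900, "c"), (1000, "d")]:
    now[0] = t
    buf.log(record)
buf.close()

print(written)
```

Because each batch is written and cleared at the window boundary, memory use is bounded by one interval's worth of data, which is the property that makes rotation a good fit for continuous streams.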

Learn more about log rotation to monitor data streams here.

Condition count metrics

By default, whylogs tracks several metrics, such as type counts, distribution metrics, cardinality, and frequent items. While these metrics are helpful for many use cases, such as monitoring data drift, sometimes custom metrics are needed to monitor an application properly.

Condition count metrics allow users to define custom conditions and return the number of times each condition held for a given column. This feature is useful for detecting personally identifiable information (PII) or checking whether specific numerical values appear in a dataset.

Users can create condition count metrics with regex for string matching, conditionals for numerical values, or a custom function for any given condition.

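from typing import Dict
# Assumed import paths for whylogs v1.1; verify against the full example
import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric
from whylogs.core.metrics.condition_count_metric import (
    Condition, ConditionCountConfig, ConditionCountMetric, Relation as Rel, relation as rel)
from whylogs.core.resolvers import Resolver
from whylogs.core.schema import ColumnSchema, DatasetSchema

class CustomResolver(Resolver):
    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
        return {"condition_count": ConditionCountMetric.zero(column_schema.cfg)}

conditions = {
    "containsEmail": Condition(rel(Rel.fullmatch, r"[\w.]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}")),
    "containsCreditCard": Condition(rel(Rel.match, r".*4[0-9]{12}(?:[0-9]{3})?")),
}
config = ConditionCountConfig(conditions=conditions)
resolver = CustomResolver()
schema = DatasetSchema(default_configs=config, resolvers=resolver)
prof_view = why.log(df, schema=schema).profile().view()
prof_view.to_pandas()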

See a full code example on GitHub
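Conceptually, a condition count is a per-column tally of how often each condition held. The plain-Python sketch below reuses the two regular expressions from the snippet above with the standard re module to show the difference between a full match and a prefix match. It is an illustration only, not the whylogs implementation; condition_counts and the sample values are made up for this example.

```python
import re

EMAIL = re.compile(r"[\w.]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}")
CREDIT_CARD = re.compile(r".*4[0-9]{12}(?:[0-9]{3})?")

# Each condition is a predicate over a single value
conditions = {
    # fullmatch: the whole string must be an email address
    "containsEmail": lambda s: EMAIL.fullmatch(s) is not None,
    # match: the pattern only has to match from the start of the string
    "containsCreditCard": lambda s: CREDIT_CARD.match(s) is not None,
}


def condition_counts(values, conditions):
    """Count how many values satisfy each condition, plus the total seen."""
    counts = {name: 0 for name in conditions}
    counts["total"] = 0
    for value in values:
        counts["total"] += 1
        for name, predicate in conditions.items():
            if predicate(value):
                counts[name] += 1
    return counts


# Hypothetical column values; the card number is "4" followed by fifteen "1" digits
values = [
    "bob@example.com",
    "bob@example.com is my email",  # not a full match, so not counted as an email
    "pay with " + "4" + "1" * 15,
    "hello world",
]

counts = condition_counts(values, conditions)
print(counts)
```

Note how the second value is not counted as an email because the condition requires the entire string to match; choosing between a full match and a prefix match changes what the metric reports.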

Condition Validators can be used with these metrics to trigger actions.

Learn more about using condition count metrics in whylogs:

Basic string tracking

String tracking lets users perform essential text monitoring on datasets with whylogs. By default, columns of type str will have the following metrics when logged with whylogs:

  • Counts
  • Types
  • Frequent Items/Frequent Strings
  • Cardinality

Further string metrics can be tracked by counting the number of characters that fall in a given Unicode range for each string record, and then generating distribution metrics, such as mean, stddev, and quantile values, based on these counts. whylogs can apply the same approach to the overall string length.

Example use cases include detecting changes in communication style, shifts between languages, and how many emojis are used.

The example below tracks two specific ranges of characters:

  • ASCII Digits (unicode range 48–57)
  • Latin alphabet (unicode range 97–122)

from typing import Dict

import whylogs as why
from whylogs.core.datatypes import DataType
from whylogs.core.metrics import Metric, MetricConfig
from whylogs.core.metrics.unicode_range import UnicodeRangeMetric
from whylogs.core.resolvers import Resolver
from whylogs.core.schema import ColumnSchema, DatasetSchema

class UnicodeResolver(Resolver):
    def resolve(self, name: str, why_type: DataType, column_schema: ColumnSchema) -> Dict[str, Metric]:
        return {UnicodeRangeMetric.get_namespace(column_schema.cfg): UnicodeRangeMetric.zero(column_schema.cfg)}

config = MetricConfig(unicode_ranges={"digits": (48, 57), "alpha": (97, 122)})
schema = DatasetSchema(resolvers=UnicodeResolver(), default_configs=config)
prof_results = why.log(df, schema=schema)
prof = prof_results.profile()
profile_view_df = prof.view().to_pandas()
profile_view_df

See a full code example on GitHub
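To see what the resulting numbers mean, here is a minimal plain-Python sketch of the counting step: for each string, count the characters whose code point falls in each range, then summarize those counts across records. It is an illustration under simplifying assumptions, not the whylogs implementation; range_counts, summarize, and the sample strings are made up for this example.

```python
from statistics import mean, pstdev

# Same ranges as in the example above (inclusive code point bounds)
unicode_ranges = {"digits": (48, 57), "alpha": (97, 122)}


def range_counts(text, ranges):
    """Count the characters of text whose code point falls in each inclusive range."""
    return {
        name: sum(lo <= ord(ch) <= hi for ch in text)
        for name, (lo, hi) in ranges.items()
    }


def summarize(strings, ranges):
    """Turn per-string counts into distribution summaries, including string length."""
    per_string = [range_counts(s, ranges) for s in strings]
    lengths = [len(s) for s in strings]
    summary = {"string_length": {"mean": mean(lengths), "stddev": pstdev(lengths)}}
    for name in ranges:
        counts = [c[name] for c in per_string]
        summary[name] = {"mean": mean(counts), "stddev": pstdev(counts)}
    return summary


# Hypothetical column of strings
strings = ["ab12", "abcd", "1234"]
summary = summarize(strings, unicode_ranges)

for name, stats in summary.items():
    print(name, stats)
```

In this sketch the range 97–122 only covers lowercase a to z, so uppercase letters would not be counted unless the text is lowercased first or a second range is added.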

Learn more about string tracking with whylogs here.

NOTE: More text and NLP logging features are coming to whylogs soon!

Model performance monitoring

Monitoring model performance is critical to understanding how well ML models continue to function once deployed. To track performance, log model predictions and ground truth data with whylogs; scoring metrics can then be calculated in your home-grown ML monitoring solution or in the WhyLabs Observatory.

Users can set custom monitors in WhyLabs to detect anomalies in model performance, such as a drop in model accuracy.

WhyLabs will calculate scoring metrics for both classification and regression models.

Classification metrics: Total output and input count, accuracy, ROC, precision-recall chart, confusion matrix, recall, FPR, precision, and F1 score.

results = why.log_classification_metrics(
    df,
    target_column="output_discount",
    prediction_column="output_prediction",
    score_column="output_score",
)

See a full code example on GitHub
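For intuition about the scalar classification metrics listed above, here is a minimal plain-Python sketch for the binary case, built from the four confusion-matrix counts. It is not how whylogs or WhyLabs computes them; binary_classification_metrics and the sample labels are hypothetical, and score-based outputs such as the ROC and precision-recall charts are left out.

```python
def binary_classification_metrics(targets, predictions, positive=1):
    """Compute confusion-matrix counts and derived scores for binary labels."""
    pairs = list(zip(targets, predictions))
    if not pairs:
        raise ValueError("at least one (target, prediction) pair is required")

    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard each ratio against an empty denominator
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    return {
        "confusion_matrix": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1": f1,
    }


# Hypothetical ground truth and model output
targets = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]

metrics = binary_classification_metrics(targets, predictions)
print(metrics)
```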

Regression metrics: Total output and input count, mean squared error, mean absolute error, root mean squared error.

results = why.log_regression_metrics(
    df,
    target_column="temperature",
    prediction_column="prediction_temperature",
)

See a full code example on GitHub
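The regression metrics listed above follow directly from the per-record prediction errors. The sketch below shows the arithmetic in plain Python; regression_metrics and the sample temperatures are made up for illustration and are not part of the whylogs API.

```python
from math import sqrt


def regression_metrics(targets, predictions):
    """Compute MSE, MAE, and RMSE from paired targets and predictions."""
    errors = [p - t for t, p in zip(targets, predictions)]
    if not errors:
        raise ValueError("at least one (target, prediction) pair is required")
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {
        "count": n,
        "mean_squared_error": mse,
        "mean_absolute_error": mae,
        "root_mean_squared_error": sqrt(mse),
    }


# Hypothetical temperature readings and model predictions
temperature = [20.0, 22.0, 19.0, 25.0]
prediction_temperature = [21.0, 20.0, 19.0, 28.0]

metrics = regression_metrics(temperature, prediction_temperature)
print(metrics)
```

Because squared errors weight large misses more heavily, a single bad prediction moves the mean squared error much more than the mean absolute error, which is why it helps to monitor both.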

Get started with monitoring model performance:

Conclusion

We’re excited about the functionality whylogs v1.1 brings, allowing users to monitor model performance, subgroups, images, strings, and continuous data streams in our easy-to-use data logging API.

If you’re interested in trying whylogs or getting involved with our community of AI builders, here are some steps you can take:

Originally published at https://whylabs.ai.
