Introducing Db2 11.5.4 — The database with AI, for AI

Steven Astorino
IBM Data Science in Practice
9 min readJul 10, 2020

As organizations embark or progress on their journey to AI, one of the most important aspects is making sure their data is ready for AI. IBM describes this journey as a series of steps as part of the AI Ladder: Collect, Organize, Analyze, Infuse AI.

IBM Db2 is primarily aligned with the Collect stage as shown in figure #1.

Figure #1: Db2 within the Ladder to AI.

Db2 11.5.4 introduces many enhancements that deliver new value over its previous versions. For simplicity I categorize these as follows:

Enterprise Readiness

Performance is a key attribute of transaction and analytic workloads. This latest release of Db2 is designed to be able to start the database many times faster than previous releases. When the database is activated, buffer pools are allocated by the database manager and the operating system commits these to memory using an asynchronous process — after the database is deemed ready for use or “open for business”.

Parallelism implies doing multiple things at the same time. Previously, Db2 federation only worked on one Db2 node, regardless of the number of nodes/partitions. This latest release enables federation on all active nodes and the process of fetching data from remote data sources is distributed among them, making the fetching of data faster. Part of the way that this is achieved is by enabling the query compiler to generate a parallel access plan which can be accessed and executed by every partition.

Space reclamation (Columnar) can also help improve performance. Enhancements in this release include removing space ­­ — to help alleviate performance overheads due to fragmented regions. It identifies the fragmented regions within the specified tables, moves and tightly packs them, freeing and reclaiming the space from the fragmented regions, performs index updates as necessary and supports concurrent Query/Insert/Update/Delete activities.

Many customers store highly sensitive business data in their Db2 tables. Security is uppermost in many organizations’ minds. Customers can now enjoy single sign-on (SSO) with 3rd Party applications using Db2 in the back-end utilizing the JWT Token for SSO. Previously, while using an application with Db2, a user’s application authentication and Db2 authentication had to be done by users separately. This is no longer the case. A user authenticates on the front-end application that can create JWT tokens. The application will be able to generate a user-specific token. The token is then passed to Db2 to assert and authenticate the user using the front-end application.

Authentication caching is another performance feature designed to help relieve the performance impact that can occur due to authentication bottlenecks — and this capability can result in performance improvements for these types of workloads by having this data cached. This helps with workloads that have very short duration connections which occur repeatedly using the same, limited set of authorization IDs. This capability applies to all password plugins supported by Db2. The cache is configured based on the maximum number of users to be cached and duration of time for cache. It is important to note that credentials are not cached.

Time savings are important to any business. The adaptive workload manager in Db2 provides automated resource tuning for workloads to help maximize performance on hardware, thereby saving time and simplifying efforts on optimization. Doing so manually, typically involves costly human effort. It schedules jobs appropriately, based on available resources making sure that the system is well-utilized but not overcommitted. Incoming work is guided to different processing “lanes” based on expected memory and CPU consumption and duration. Each “lane” is assigned a defined resource allotment in an attempt to maximize predictability. Admission of new work is on the fit of the work to the available resources in that class and latency. Analysis of historical feedback based on past executions is also leveraged.

As part of a tech preview, the Db2 Machine Learning Optimizer provides an additional level of intelligent optimization to deliver query execution strategies that help improve upon traditional cost-based query optimization. We have seen that basic workload cost optimizers make recommendations for query execution, that are not sensitive to recent changes in the database and they don’t take into account previous history. The Db2 Machine Learning Optimizer, by contrast, incorporates feedback from actual query performance to recommend execution strategies that may deliver improved results.

System administrators can now lock down a query plan at a release level. This eliminates the time to reoptimize the same query after an upgrade. Previously, query plans did not get saved and administrators had to rewrite query plans after each upgrade. System administrators can still choose to write new query plans if they want to.

Db2 provides advanced log space management for Crash Recovery scenarios, reducing the likelihood of hitting transaction log full conditions caused by long running, low volume transactions. This is particularly beneficial for crash restarts. It extracts the log records for active transactions and allows the log file to be closed, archived, and reused.

Also, in this release, there is more support for non-dictionary-based compression for unencoded string data to improve compression. Algorithms are chosen to balance compression and encode/decode performance. Query runtime can decompress pages before processing and is typically used during bulk insert processing. This applies to string datatypes of CHAR/VARCHAR, GRAPHIC/VARGRAPHIC, BINARY/VARBINARY.

Modern Development

As part of a Tech Preview(1), customers will be able to use the Db2 engine for performing graph analytics without investing in a new graph database or needing to migrate data to a different system. This means an organization’s data remains in Db2 and can continue to read and make updates to the data while leaving applications as they are. Db2 Graph creates a virtual graph view of relational data. It uses the referential information to automatically create the graph schema. Users can map tables or views into graph vertices and edges, adding more relationships by editing the Graph schema — provided that the underlying data exists in Db2.

Over the years, more machine learning (ML) capabilities have been added to Db2 releases. Customers can now build end-to-end ML workflows using Db2’s built-in capabilities without the need to move data to other platforms. The in-database ML provides data exploration, data pre-processing, model building, and evaluation, model scoring, and model management. Figure 2 below helps describe the breadth and depth of this feature.

Figure 2: In-database machine learning capabilities.

New to this release, Db2 customers can perform spatial analytics on row and column organized tables. This is based on the Open Geospatial Consortium (OGC) guidelines providing over 100 spatial functions including but not limited to:

  • Constructor functions to create shape objects from transport formats
  • Property functions to retrieve geometry properties
  • Relation functions to determine relations between geometries
  • Shape functions to produce geometries from existing geometries

Catering more to the application developer is also a big theme or this Db2 release. Application developers will be able to use more of the latest features of their favorite programming language and integrated development environment (IDE). A summary is included below:

Db2 Programming Driver Updates ­­­­ — including native installation for Db2 DS drivers on Linux.

.NET

  • Ability for application developers to leverage new capabilities of .NET 2.1 Core
  • .NET 2.2 Core Support (Progressive completion) and .NET 3.0 Core (New)
  • .NET Core Driver Support on Azure and AWS

Open Source

  • Ruby 2.6.5 and Rails 6.0.x Support
  • Node.js V12.x Support
  • Django 2.2 Support
  • Compliance with JDBC 4.2 Specifications

Visual Studio Code Extension

  • Ability to store connection profiles by the user (enables users to reuse connection information for various Db2 instances)
  • Ability to maintain the history of queries run by the user

Db2 provides application developers with APIs to create, discover and execute their own REST end-points for static and dynamic SQL, referred to as services, specific to their application needs. These provide Db2 access without managing client drivers on end-user machines. Each developer-defined service is associated with a single SQL statement (for example SELECT, INSERT, UPDATE, DELETE initially). Services can be executed synchronously, or as a job that supports result set paging.

Python user defined function (UDF) — This enables organizations to bridge the gap between Python developers and Db2 data at scale. It allows developers to call Python code directly, as database operations or functions, pushing the computation closer to the data so that they can perform in-database data wrangling techniques and use machine learning algorithms as database operations.

Consumability and Resilience.

In this release of Db2, organizations can deploy Db2 Phase 2-Operators on Red Hat OpenShift, enabling users to install and upgrade Db2 on the OpenShift platform. Customers can take advantage of orchestration, monitoring, governance, elasticity and other features of the Cloud Pak for Data platform for Db2.

System administrators can now deploy automated Db2 high availability disaster recovery (HADR) on RHEL 8 platforms and public cloud to achieve high availability for cluster services by detecting and recovering from node and resource-level failures.

Other features under this category enable system administrators to block or restrict certain operations that would otherwise result in a re-org pending state. These controlled restrictions are designed to increase the availability of the Db2 system.

Db2 Cloud Managed Services.

Db2 is available as a cloud managed service. IBM DevOps & CloudOps teams are composed of engineers and experts that manage Db2 and cloud-based services who manage:

  • operating system updates and security patches
  • new offering features and capabilities, including engine updates and console enhancements
  • rolling out new hardware releases, when available
  • unexpected software and hardware failure

For over 35 years Db2 has been handling some of the toughest mission critical workloads with focus on:

Scalability — with independent elastic scaling of storage and compute. Elasticity means Db2 can scale up its compute capacity during peak demand and scale down when demand falls. It also enables dynamic ramping up of data storage as and when an organization’s data grows.

Highly Availability — deploy HA nodes to achieve necessary levels of redundancy and availability. This includes operational data stores and data warehousing as follows:

Operational data stores:

  • Active/Passive deployment — Single read/write node (primary), read-only replicas (HA replica), spanning multiple availability zones (AZ)
  • Dedicated or shared Kubernetes plane for container management — Choose between a shared Kubernetes control plane or a completely dedicated deployment (per customer).
  • HA replicas for read-only access — Use the HA replicas for read-only access to your transactional/operational data.
  • Zero downtime with Automatic Client Rerouting — Instance scaling, failure recovery (failover) and maintenance are all designed for near-zero downtime with automatic client rerouting.

Data warehousing:

  • In-cluster HA managed by Kubernetes — Unhealthy compute nodes are immediately detected, removed from the cluster, and replaced by a new node from a hot standby pool or provisioned just-in-time. Storage is built on highly redundant, highly performant, highly available, SSD-backed block storage.
  • Multi-cloud HA with database replication — Copy tables and replicate changes to your source database to multiple target databases, across cloud vendors. Use source and target databases for active workloads, and just replicate the changes when needed.

Reliability — self-service, automated and fully managed backups with snapshot, disaster recovery, and point-in-time recovery at the following levels:

  • Database — Scheduled or ad-hoc Db2-level backups available for all Db2-based cloud offerings. Backups offloaded to object storage, automatically replicated to other availability zones.
  • Snapshot — Storage-level backups ideally suited for massive data warehouses where taking a database-level backup is too slow or impractical.
  • Point-in-time — For operational workloads, log files continuously copied to object storage for point-in-time recovery.

Summary and Next Steps

I hope as you read this blog post you understand the continued investment in Db2 and its related offerings. In summary, this release has been designed to help organizations with enterprise readiness through capabilities for improving performance such as improving federation efficiency as well as security. To help reduce costs this release incorporates more machine learning and automation driven capabilities such as adaptive workload management.

To support modern development methods and trends this release introduces new capabilities including Db2 Graph, Rest APIs for Db2 and laying the foundation for in-database machine learning. We also present Db2 as a fully managed service so that organizations can focus more on what they need to, while IBM addresses the management and operational aspects.

To find out more information and the latest capabilities or how Db2 can help your organization, click here.

Steve -

Footnotes:

1. Technology Preview https://www.ibm.com/support/pages/node/6205946

--

--