From Data to Insights: Exploring the Potential of Histogram Analytics in Multiple Industries

Mohammad Taha Wahab
Published in Teradata · 16 min read · Jul 20, 2023

As a result of rising expenses caused by inflation, businesses are actively seeking out creative approaches such as histogram analytics to manage the impact on their supply chains and operations. By applying this technique, companies can gain insight into the potential consequences of different occurrences, enabling them to adapt their strategies to economic fluctuations and unforeseen disruptions. This blog discusses the practical application of histogram analytics in optimizing business operations, including a real-life example in which we helped a semiconductor manufacturer identify a bottleneck within their operations.

Overview

Histogram analytics is a methodology that models transactional data for anything that moves through multiple processing steps (sequential or parallel). You apply it by generating cycle-time histograms for each step and then building a path table that captures the transition from step to step with an associated probability. These transition probabilities can then be aggregated using a set of rules to accomplish the following:

  • Location of items: current or last known location of all items of interest, effectively identifying the step or queue within the path/channel. This metric is used with cycle time, yield, and path selection models to establish the boundaries for prediction.
  • Quantity of items tracked into the current step (used with yield model for predicting output quantity).
  • Time of last event (used with cycle time model for predicting completion time).

Some use cases where histogram analytics can be leveraged include:

  • Supply chains, where goods move through a series of steps before reaching their destination (e.g., production, packing, transportation, warehouse, shelf).
  • Manufacturing industries requiring a series of steps to convert raw materials to finished goods.
  • Airport security clearance, where a person must pass through multiple steps before boarding the aircraft.

Approach

The approach for introducing histogram analytics involves the following steps:

  • Organize the supply chain/manufacturing sequence into sequential events.
  • Define (expected) paths through the sequence, with probabilities.
  • Define event-to-event models (detailed level) for basic cycle time and/or yield.
  • Aggregate the models over multiple events and sequences, according to the expected paths, to evaluate performance and make predictions.

Now let’s further explore each step.

Events, Segments, and Queues

Events

As transactional data is collected, the transactions can be used to demarcate the beginning and end of processes. This suggests defining a segment as the process interval between appropriate data collection events.

In this context, a data collection event must contain the following:

  • A timestamp of the event
  • Items observed (e.g., shipments, products)
  • Location in the supply chain (e.g., physical, logical, equipment)

This point-in-time collection of information is termed an event. Events can be defined from most transactional systems in use, such as shop floor, warehouse, logistics, and shipping systems.
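As a minimal illustration (the field names and values here are hypothetical, not from the client systems), an event record needs little more than those three elements:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    """One data collection event: when, what, and where."""
    timestamp: datetime  # when the event was recorded
    items: list          # items observed (e.g., shipment or lot IDs)
    location: str        # physical or logical location in the supply chain

e = Event(datetime(2023, 7, 20, 14, 30), ["LOT-1001"], "fab1/etch/station-07")
print(e)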

Segments

A segment is then defined as the process between two significant sequential events. Note that it might be judicious to ignore some data collection points (too much data) and focus on the most important events for segment definition.

The segment process should be thought of in a very general way. It could represent a physical transformation (e.g., manufacturing), a movement (logistics), verification (testing, inspection), or other activities. Note that with these definitions, manufacturing activities and supply chains effectively have the same structure.

Identification of a segment is then associated (at a minimum) with the starting event and ending event. In some cases, it may be advantageous to include attributes of the items within the segment, although these can frequently be included in the event definitions or handled independently with the item management.

Queues

The end event of a segment may correspond to the starting event of the next segment. For example, a truck-in-yard transaction may begin the next process, docking. In these cases, there are no queues or other inventory hold locations between the segments.

Frequently, however, the end of one segment will not correspond to the beginning of the next. In this case, a queue will exist where products are temporarily held while awaiting the start of the next segment. Effectively, a queue has its own start and end events, corresponding to the end event of the last segment and the start event of the next, respectively. A queue is modeled differently from a segment, however: the cycle time of a queue is generally a function of the number of items in the queue and the processing rate of the next process.
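To make that relationship concrete, here is a minimal sketch with made-up numbers (not part of the client implementation): the expected time spent in a queue can be approximated from its current depth and the rate at which the next segment consumes items.

def estimate_queue_cycle_time(items_in_queue, processing_rate_per_hour):
    """Rough queue cycle-time estimate (hours): items waiting divided by
    the rate at which the next segment consumes them."""
    if processing_rate_per_hour <= 0:
        raise ValueError("processing rate must be positive")
    return items_in_queue / processing_rate_per_hour

# Example: 120 pallets waiting, next process handles 15 pallets/hour -> about 8 hours in queue.
print(estimate_queue_cycle_time(120, 15))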

We will be examining a real-life scenario where a semiconductor manufacturer encountered bottlenecks in their supply chain process. By employing the methodology of histogram analytics discussed in this blog, we were able to identify and address those specific issues.

Histogram Models

In the fast-paced world of semiconductor manufacturing, optimizing the supply chain process is vital for accurate prediction of wafer outs and overall efficiency. However, challenges arise as variability and bottlenecks impact the smooth flow of operations. That’s where the power of histogram analytics comes in. By leveraging this methodology, we dive deep into the supply chain, identify the specific processes that act as bottlenecks, and visualize them through plotted data. This allows us to not only pinpoint the problem areas but also pave the way for precise corrections and enhancements, ensuring seamless production and unlocking the true potential of semiconductor manufacturing.

The semiconductor manufacturer provided a step-by-step definition to identify a part flow number based on the combination of {site, technology, techcode, part, stage, eqpType}. We then created a table to extract part flow numbers and filter to only the lots that are in production.

The table and its underlying data are then used for building histogram models.

For each of the steps, we built histogram models for step cycle time, queue cycle time, and process cycle time:

  • Step cycle time (metric=’sCT’) — time taken by a lot to complete the entire step from arrival to finish of processing
  • Queue cycle time (metric=’qCT’) — time spent waiting in a queue for processing a lot
  • Process cycle time (metric=’pCT’) — time required to process the lot by the machine

This process was used to build histogram models of cycle time by machine process time (pCT), queue time (qCT), and total step time (sCT = pCT + qCT).
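Outside the database, the same three metrics can be sketched in a few lines of pandas. The column names mirror the step start/end and track-in/track-out event timestamps used in the SQL later in this post (expressed in fractional days); the data is made up:

import pandas as pd

# Hypothetical lot-level event data; *_fday columns are timestamps in fractional days,
# mirroring the startinst/endinst (step) and trackininst/trackoutinst (machine) events.
lots = pd.DataFrame({
    "lotid":             ["L1", "L2"],
    "startinst_fday":    [100.00, 100.50],
    "trackininst_fday":  [100.20, 100.90],
    "trackoutinst_fday": [100.45, 101.10],
    "endinst_fday":      [100.50, 101.15],
})

lots["sCT"] = lots["endinst_fday"] - lots["startinst_fday"]         # total step time
lots["pCT"] = lots["trackoutinst_fday"] - lots["trackininst_fday"]  # machine process time
lots["qCT"] = lots["sCT"] - lots["pCT"]                             # time spent queued
print(lots[["lotid", "sCT", "pCT", "qCT"]])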

The SQL for building the models included the following:

  • Data limits are computed based on Tukey outlier limits: we computed the lower quartile (Q1) and upper quartile (Q3) and trimmed values outside the Tukey fences, smoothing the tails of the histograms so they better approximate a normal distribution.
  • The actual models are stored as binary objects (BLOBs) in the database.

A Tukey limit is a quantile-based limit for removing outliers. Anything below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) is considered an outlier, where Q1 is the first quartile and Q3 is the third quartile of the distribution.
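As a quick sanity check outside the database, the same fences can be computed in a few lines of NumPy (a sketch only; the actual models use the SQL percentile functions shown below, and the SQL additionally clamps the lower limit at zero since cycle times cannot be negative):

import numpy as np

def tukey_limits(values):
    """Return (lower, upper) Tukey fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

cycle_times = np.array([1.1, 1.3, 1.2, 1.4, 1.5, 1.2, 9.7])  # 9.7 is an obvious outlier
lcl, ucl = tukey_limits(cycle_times)
trimmed = cycle_times[(cycle_times >= lcl) & (cycle_times <= ucl)]
print(lcl, ucl, trimmed)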

Different cycle times yield different insights. Queue cycle time, for example, can tell us which steps contain a bottleneck that affects the total time or overall performance.

To get started, we need to build histogram models on the base tables.

create table histogram_models as ( 
WITH tmp_data AS (
SELECT
a.model_id,
a.model_type,
a.segment_step, -- keeping model_type for convenience
CASE WHEN a.model_type = 'sCT' THEN d.endinst_fday - d.startinst_fday WHEN a.model_type = 'pCT' THEN d.trackoutinst_fday - d.trackininst_fday WHEN a.model_type = 'qCT' THEN d.endinst_fday - d.startinst_fday -(
d.trackoutinst_fday - d.trackininst_fday
) ELSE NULL END AS CT_fday,
Count(*) Over (
PARTITION BY a.model_id, a.model_type
) AS data_cnt -- how many data points
FROM
Your_database.client_base_table1 d
JOIN (
-- get segment information from SME table
-- model ID for all possible technologies, parts, etc. built here for example
SELECT
Row_Number() Over (
ORDER BY
1
) AS model_id,
s.partflow AS segment_step,
s.site,
s.technology,
s.techcode,
s.part,
s.stage,
s.eqpType,
CASE c.day_of_calendar WHEN 1 THEN 'sCT' WHEN 2 THEN 'pCT' WHEN 3 THEN 'qCT' END AS model_type
FROM
database_name.client_base_table s,
sys_calendar.CALENDAR c
WHERE
c.day_of_calendar < 4
GROUP BY
2,
3,
4,
5,
6,
7,
8,
9
) a ON a.technology = d.technology
AND a.site = d.site
AND a.part = d.part
AND a.stage = d.stage
AND a.eqptype = d.eqptype
WHERE
d.technology = '******' -- most common technology
AND d.part = '*******'
AND d.lotType = 'P' -- production lot
AND d.prcdKind = 'N' -- normal processing
),
tmp_limits AS (
SELECT
t.model_id,
t.model_type,
t.segment_step,
Percentile_Cont(0.25) Within GROUP (
ORDER BY
ct_fday
) AS q25_ct,
Percentile_Cont(0.75) Within GROUP (
ORDER BY
ct_fday
) AS q75_ct,
CASE WHEN q25_ct < (
1.5 *(q75_ct - q25_ct)
) THEN 0 ELSE q25_ct - 1.5 *(q75_ct - q25_ct) END AS lcl_ct,
q75_ct + (
1.5 *(q75_ct - q25_ct)
) AS ucl_ct
FROM
tmp_data t
GROUP BY
1,
2,
3
)
SELECT
model_id,
segment_step AS xIn,
1 AS yIn,
model_type,
data_cnt,
avg_val,
stdev_val, -- added some additional stats for info
Current_Timestamp(0) AS model_create_dttm,
-- if data points less than bins, use normal distribution.
CASE WHEN data_cnt < 50 THEN customfunctions.histgrgendistnorm(avg_val, stdev_val, 4, 1, 50) ELSE customfunctions.histgrgendataquant(
tb.lcl_ct, tb.ucl_ct, 50, 0, tb.ct_fday
) END AS histgrmdl
FROM
(
SELECT
d.model_id,
d.segment_step,
d.model_type,
d.ct_fday,
l.lcl_ct,
l.ucl_ct,
d.data_cnt,
Average(ct_fday) Over (
PARTITION BY d.model_id, d.segment_step,
d.model_type
) AS avg_val,
StdDev_Pop(d.ct_fday) Over (
PARTITION BY d.model_id, d.segment_step,
d.model_type
) AS stdev_val
FROM
tmp_limits l
JOIN tmp_data d ON d.model_id = l.model_id
AND d.segment_step = l.segment_step
AND d.model_type = l.model_type
WHERE
d.ct_fday BETWEEN l.lcl_ct
AND l.ucl_ct
AND l.ucl_ct > l.lcl_ct
) tb
GROUP BY
1,
2,
3,
4,
5,
6,
avg_val,
stdev_val
) with data;
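For intuition, the model stored in each row is conceptually just a set of bin edges plus bin probabilities. The Python sketch below mirrors the CASE logic above (it is not the actual Teradata UDFs, which serialize the model into a BLOB): build a 50-bin histogram between the Tukey limits, or fall back to a fitted normal curve when there are fewer than 50 data points.

import numpy as np

N_BINS = 50

def build_histogram_model(cycle_times, lcl, ucl, min_points=50):
    """Return (bin_edges, bin_probabilities) for cycle times between the Tukey
    limits; with fewer than `min_points` observations, substitute a normal
    curve fitted to the data, echoing the CASE logic in the SQL above."""
    edges = np.linspace(lcl, ucl, N_BINS + 1)
    data = np.asarray(cycle_times, dtype=float)
    if len(data) < min_points:
        centers = (edges[:-1] + edges[1:]) / 2.0
        mu, sigma = data.mean(), max(data.std(), 1e-9)
        weights = np.exp(-0.5 * ((centers - mu) / sigma) ** 2)
    else:
        weights, _ = np.histogram(data, bins=edges)
    return edges, weights / weights.sum()

edges, probs = build_histogram_model(np.random.gamma(2.0, 1.0, 500), lcl=0.0, ucl=8.0)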

Now we must create a view on top of the base tables that specifies the part flow for each lot and applies filters on specific columns (to pull out only the lots that are in production).

REPLACE VIEW v384_Some_view AS LOCKING ROW FOR ACCESS  
SELECT
t1.partflow,
t2.*
FROM
Your_database.client_base_table1 t1
JOIN Your_database.client_base_table2 t2 ON t1.site = t2.site
AND t1.technology = t2.technology
AND t1.techcode = t2.techcode
AND t1.part = t2.part
AND t1.eqptype = t2.eqptype
AND t1.stage = t2.stage
WHERE
t2.part IN (
SELECT
part
FROM
Your_database.client_base_table1
WHERE
site = '***'
AND technology = '******'
AND techcode = '******'
GROUP BY
1
HAVING
Max(partflow) = 384
)
AND lottype = 'P';

Generating Process Flow

The connection of segments to other segments and queues forms paths that are consistent with the sequence of processing. The paths can be thought of as a roadmap, connecting segment to segment and allowing for splits and merges between steps (e.g., parallel processing on components of an item). An important concept required by the methodology outlined here is the requirement to weight the paths, so that an expected load on a process or a predicted path can be derived. This is equivalent to assigning probabilities at splits (probabilities are not needed at merges).
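A minimal in-memory picture of such a weighted roadmap (the step numbers and fractions below are hypothetical; the real path table is built in SQL later in this post): each step maps to its possible next steps, and the split probabilities leaving a step sum to one.

# from_step -> list of (to_step, probability); probabilities at a split sum to 1.0
path_table = {
    1: [(2, 1.00)],             # no split
    2: [(3, 0.70), (4, 0.30)],  # split: 70% of lots go to step 3, 30% to step 4
    3: [(5, 1.00)],
    4: [(5, 1.00)],             # merge back at step 5 (no probability needed at a merge)
}

assert all(abs(sum(p for _, p in nxt) - 1.0) < 1e-9 for nxt in path_table.values())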

Sets of goods move from event to event through the supply chain. These are the items that are tracked through the events and segments. Some expected characteristics of a set of goods as it progresses include:

  • Different tracking identifications may be used — depending on the segment/process/channel.
  • Unit of measure changes may occur (many to few, few to many, or a change of name).
  • Consolidation or “break up” (e.g., multiple packages into a crate, or a truckload to multiple pallets).
  • Virtual vs. real items (tracking order status vs. actual shipments).

The set of goods can influence the path selection. For example, in semiconductor manufacturing, a lot (i.e., a set of goods) that passes testing at a very high speed may require different packaging to handle the expected heat and use profiles. Once the result of the test is known, the specific path for that lot can be selected.

Figure 1 shows the path a lot or wafer takes to complete its process.

Segments (and queues) can be derived from the event-to-event sequencing of sets of goods. While both segments and queues are defined as the interval (process) between two sequential events, the analytical models are different for a process segment than for a queue. It is therefore necessary to explicitly identify queues — based on the likelihood of inventory being held at the queue prior to processing.

Evaluation of the event-to-event data also provides information about the paths and interconnections (connections between segments). As paths split and merge, historical data will typically be used to estimate the probability of a split. Note that there are two distinct types of splits:

  • Path splits — in this case, the set of goods is kept intact and a selection of one or more paths is made. This is equivalent to selecting a specific road when presented with two or more roads at an intersection.
  • Lot splits — in this case, the individual components in the set of goods are physically split and proceed on different paths. This is analogous to a single passenger load splitting into multiple aircraft at a hub.
Figure 2 shows the example of Figure 1 with path/lot splits identified. Note the percentages assigned to the splits and the types of splits. Since the segments in Figure 2 are connected (no isolation exists), there can only be one channel.

The process flow is derived by tracking an item through the process. This is conveniently done with Teradata’s nPath® function on the event sequence. Since some steps are skipped, additional modifications must be made to build a Cartesian type of path. nPath is then used to track lots through the steps.

nPath is a Teradata Vantage™ analytics function that looks for a pattern we specify by scanning a set of rows in the database. The function gives users the flexibility to define a complex pattern in the input data and to define the values that are output for each matched input set.

nPath is especially useful when the main objective is to identify a particular path that leads to an outcome, which is why it is widely used across industries.
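Conceptually, the 'A.B' pattern used below simply pairs every step with the step that follows it for each lot, ordered by start time. A rough pandas equivalent of that pairing, on made-up data (illustrative only; the real work is done in-database by nPath):

import pandas as pd

events = pd.DataFrame({
    "lotid":     ["L1", "L1", "L1", "L2", "L2"],
    "partflow":  [1, 2, 4, 1, 2],    # step numbers; L1 skips step 3
    "startinst": [10, 20, 35, 12, 25],
})

events = events.sort_values(["lotid", "startinst"])
events["path_from"] = events["partflow"]
events["path_to"] = events.groupby("lotid")["partflow"].shift(-1)  # the next step per lot
pairs = events.dropna(subset=["path_to"])[["lotid", "path_from", "path_to"]]
print(pairs)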

create table path_table as( 
WITH path_tmp AS (
SELECT
path_from AS from_xVal,
1 AS from_yVal,
path_to AS to_xVal,
1 AS to_yVal,
path_to - path_from - 1 AS nmbr_skip, -- compute if next step skipped
Count(
DISTINCT(lotid)
) AS no_of_lots
FROM
NPath (
ON (
SELECT
*
FROM
partflow_model_view
WHERE
lotType = 'P' -- production lots
AND prcdKind = 'N' -- normal lots
) PARTITION BY lotId -- traceable unit through process
ORDER BY
startinst -- step start time
USING MODE(OVERLAPPING) PATTERN ('A.B') Symbols(True AS A, True AS B) RESULT(
First(lotid OF A) AS lotid,
First(partflow OF A) AS path_from,
-- defines numeric sequence
First(partflow OF B) AS path_to
)
) AS dt
GROUP BY
1,
3
HAVING
no_of_lots > 15
)
SELECT
t2.*,
Sum(t2.step_move) Over (PARTITION BY t2.from_x, t2.from_y) AS ttl_move,
(
t2.step_move(FLOAT)
)/(
NullIfZero(ttl_move)
) AS fract_move -- split fraction
FROM
(
SELECT
new_from_xval AS from_x,
new_from_yval AS from_y,
new_to_xval AS to_x,
new_to_yval AS to_y,
Sum(no_of_lots) AS step_move -- compute lots that moved into 2 (or more) steps
FROM
(
SELECT
t1.*,
c.day_of_calendar AS adder, -- handle skipped steps
from_xval + adder - 1 AS new_from_xval,
CASE WHEN adder > 1 THEN 2 ELSE from_yVal END AS new_from_yVal,
new_from_xval + 1 AS new_to_xval,
CASE WHEN new_to_xval = to_xVal THEN to_yval ELSE 2 END AS new_to_yval
FROM
path_tmp t1
LEFT JOIN sys_calendar.CALENDAR c ON nmbr_skip + 1 >= c.day_of_calendar
WHERE
nmbr_skip >= 0
) x
GROUP BY
1,
2,
3,
4
) t2
) with data;

The output of the above SQL will be like this:

Figure 3: Teradata nPath function output to calculate the path of the production lot.

Now we must add terminal steps to the path table:

insert into path_table values (0,0,1,1,null,null,1.00); 
insert into path_table values (384,1,0,0,null,null,1.00);

Segment Combinations

The core functionality of histogram analytics is the combination of segments. There are two possibilities for this combination:

  • Serial combinations: one segment immediately follows another with no splits or merges in between
  • Parallel combinations: two segments move items equivalent distances in parallel

These simple combinations are described in detail in the following sections.

Figure 4: Illustration of combining 2 serial segments.
Figure 5 illustrates two segments in parallel (same skey values) and their respective histogram models.
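A rough sketch of the two operations on simplified histograms that share the same cycle-time bins (illustrative NumPy only; the actual combinations are performed by the Teradata UDFs used in the stored procedure later in this post): a serial combination convolves the two distributions, because the total time is the sum of the two segment times, while a parallel combination mixes them, weighted by the path fractions.

import numpy as np

def combine_serial(p_a, p_b):
    """Cycle times add across consecutive segments, so the combined
    distribution is the convolution of the two bin-probability vectors
    (assuming equal-width bins starting at zero)."""
    out = np.convolve(p_a, p_b)
    return out / out.sum()

def combine_parallel(p_a, p_b, frac_a, frac_b):
    """Across parallel paths, the combined distribution is a mixture
    weighted by the fraction of items taking each path."""
    out = frac_a * p_a + frac_b * p_b
    return out / out.sum()

# Two toy segment histograms over the same cycle-time bins (probabilities sum to 1).
seg1 = np.array([0.2, 0.5, 0.3])
seg2 = np.array([0.1, 0.6, 0.3])
print(combine_serial(seg1, seg2))           # distribution of seg1 time + seg2 time
print(combine_parallel(seg1, seg2, 0.7, 0.3))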

Now we need to add null histograms, using the path and model tables, for rows where y = 2 (these rows are generated in the path table). This table can then be used for aggregating the histograms.

create table path_mdls_d as(
select
a.model_id,
a.xin,
case when b.from_y = 1 then 1 else 2 end as yin,
a.model_type as metric,
a.model_create_dttm,
case when b.from_y = 1 then a.histgrmdl else null end as histgrblob
from
histogram_models a
inner join (
select
distinct from_x,
from_y
from
path_table
) b on a.xin = b.from_x
) with data;

Projections

Based on current positions, individual segment models, and known path probabilities, the statistics can be applied to project lead time and yield statistics for future positions. This can be done from any existing location to any location by the following steps:

  • Determine all possible paths between the locations
  • Aggregate the histogram models for all segments along the possible paths to build a single aggregated histogram
  • Compute the probability of observing the item at the future location (i.e., the path load percentage).
  • Compute the statistics desired from the resulting histogram (typically any known time or yield in the current segment is offset from this calculation).

This capability allows the prediction of yield, lead times, and the most likely paths based on statistical models. Conceptually the technique is similar to running Monte Carlo simulations and building a histogram to represent the probable outcomes.
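To make the Monte Carlo analogy concrete, here is a toy simulation with made-up segment distributions and path probabilities (illustrative only, not the production logic): repeatedly pick a path according to the split probabilities, sample a cycle time for each segment on that path, sum them, and histogram the totals.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-segment cycle-time samplers (days) and two possible paths with probabilities.
segments = {
    "A": lambda n: rng.gamma(2.0, 1.0, n),
    "B": lambda n: rng.gamma(3.0, 0.5, n),
    "C": lambda n: rng.gamma(1.5, 2.0, n),
}
paths = [(["A", "B"], 0.7), (["A", "C"], 0.3)]

n = 100_000
choices = rng.choice(len(paths), size=n, p=[p for _, p in paths])  # pick a path per trial
totals = np.zeros(n)
for i, (steps, _) in enumerate(paths):
    mask = choices == i
    for s in steps:
        totals[mask] += segments[s](mask.sum())  # add a sampled cycle time per segment

hist, edges = np.histogram(totals, bins=50)      # the aggregated outcome histogram
print("median lead time (days):", round(float(np.median(totals)), 2))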

The most complex part of the projections is building the single aggregated histogram to represent all likely paths for a shipment or product. As items move in the supply chain, the calculations must be repeated to update the aggregated model.

A stored procedure was written for the projection: starting from the last step, we combine the histogram models and aggregate their probabilities from end to start based on certain conditions. Histogram models are added in series, following the path table, until we reach a merge point and a split; at that point, we combine the split paths in parallel and multiply in their respective probabilities. These steps are repeated until we have a final aggregated histogram, a combination of all the steps, which can then be used for further calculations.

Figure 6 shows the complete idea of Histogram Analytics.

The stored procedure SQL is given below:

  REPLACE PROCEDURE projection_procedure( 
IN xstop INTEGER,
IN met varchar(5)
) SQL SECURITY INVOKER BEGIN DECLARE xstart INTEGER;
-- starting point at end of path
DECLARE cntr INTEGER;
-- general purpose counter
DECLARE vDummy CHAR(1);
-- create handler for nonexistent volatile tables
DECLARE CONTINUE HANDLER FOR SqlState '42000'
SET
vDummy = 'd';
/* output table. Assumes 1st model is always singular */

/* may need modification for more complex systems */
DROP
TABLE tmp_mdlAgg_out;
CREATE VOLATILE TABLE tmp_mdlAgg_out AS (
SELECT
from_x AS last_x,
from_y AS last_y,
fract_move AS path_fraction,
histgrblob AS curr_model
FROM
path_table p
JOIN path_mdls_d m ON p.from_x = m.xin -- model always associated with from step
AND p.from_y = m.yin
WHERE
m.metric = met
AND p.to_x = 0
AND p.to_y = 0
) WITH DATA PRIMARY INDEX (last_x, last_y) ON COMMIT PRESERVE ROWS;
/* create work table for holding intermediate results */
DROP
TABLE tmp_mdlAgg;
CREATE VOLATILE TABLE tmp_mdlAgg (
last_x INTEGER,
last_y INTEGER,
path_fraction FLOAT,
curr_model VARBYTE(30000),
nmbr_rows SMALLINT,
row_ID SMALLINT
) PRIMARY INDEX(last_x, last_y) ON COMMIT PRESERVE ROWS;
SELECT
Max(last_x) INTO xstart
FROM
tmp_mdlAgg_out;
WHILE xstart > xstop DO
DELETE FROM
tmp_mdlagg;
INSERT INTO tmp_mdlagg
SELECT
p.from_x,
p.from_y,
p.fract_move * t.path_fraction AS path_fract,
CASE WHEN m.histgrblob IS NULL THEN t.curr_model ELSE customfunctions.histgrcombseradd(
t.curr_model, m.histgrblob, 50, 1E-6
) END AS curr_model,
Count(*) Over (PARTITION BY p.from_x, p.from_y) AS nmbr_rows,
Row_Number() Over (
PARTITION BY p.from_x,
p.from_y
ORDER BY
path_fract
) AS row_id
FROM
tmp_mdlagg_Out t
JOIN path_table p ON p.to_x = t.last_x
AND p.to_y = t.last_y
JOIN path_mdls_d m ON m.xin = p.from_x
AND m.yin = p.from_y
AND t.last_x = (
SELECT
Min(last_x)
FROM
tmp_mdlagg_out
)
WHERE
m.metric = met;
INSERT INTO tmp_mdlAgg_out
SELECT
last_x,
last_y,
path_fraction,
curr_model
FROM
tmp_mdlagg
WHERE
nmbr_rows = 1;
DELETE FROM
tmp_mdlagg
WHERE
nmbr_rows = 1;
SELECT
Count(*) INTO cntr
FROM
tmp_mdlagg;
IF cntr > 1 THEN
/* only works for 2 paths currently */
INSERT INTO tmp_mdlagg_out
SELECT
t1.last_x,
t2.last_y,
1.0,
customfunctions.histgrcombpar(
t1.curr_model, t2.curr_model, t1.path_fraction,
t2.path_fraction, 50, 1.0E-6
)
FROM
tmp_mdlagg t1
JOIN tmp_mdlagg t2 ON t1.last_x = t2.last_x
AND t1.last_y = t2.last_y
AND t1.nmbr_rows = 2
AND t2.nmbr_rows = t1.nmbr_rows
WHERE
t1.row_id = 1
AND t2.row_id = 2;
END IF;
SELECT
Min(last_x) INTO xstart
FROM
tmp_mdlagg_out;
END WHILE;
DROP
TABLE tmp_mdlagg;
END;

Procedure call for step cycle time:

call projection_procedure(1, 'sCT');

The results can be queried with a simple query like:

SELECT last_x 
, customfunctions.histgrcalcquant2val(curr_model, 0.05) AS q05
, customfunctions.histgrcalcmed(curr_model) AS median_val
, customfunctions.histgrcalcquant2val(curr_model, 0.95) AS q95
, customfunctions.histgrplot(curr_model) AS histgrplot
FROM tmp_mdlAgg_out
WHERE last_x = 1;

Figure 7 shows the combined histogram of all segments after running the projection procedure.

Materialize the volatile table created by the procedure into a permanent table:

CREATE TABLE outproj_sct_days AS ( 
SELECT
*
FROM
tmp_mdlAgg_out
) WITH DATA;

You can call the above procedure multiple times for process cycle time (pCT) or queue cycle time (qCT).

Computing Statistics

After projecting the final histogram, certain calculations are required to examine variability and bottlenecks. SQL UDFs were used to compute different percentiles of the final histogram, and the results are used to draw the variability graph.

Below is the SQL to extract percentiles from the histogram.

CREATE TABLE cumulative_distribution_sct_days AS( 
SELECT last_x
, customfunctions.histgrcalcmin(curr_model) AS min_time
, customfunctions.histgrcalcquant2val(curr_model, 0.01) AS q01_time
, customfunctions.histgrcalcquant2val(curr_model, 0.02) AS q02_time
, customfunctions.histgrcalcquant2val(curr_model, 0.05) AS q05_time
, customfunctions.histgrcalcquant2val(curr_model, 0.1) AS q10_time
, customfunctions.histgrcalcquant2val(curr_model, 0.20) AS q20_time
, customfunctions.histgrcalcquant2val(curr_model, 0.25) AS q25_time
, customfunctions.histgrcalcquant2val(curr_model, 0.5) AS q50_time
, customfunctions.histgrcalcquant2val(curr_model, 0.75) AS q75_time
, customfunctions.histgrcalcquant2val(curr_model, 0.8) AS q80_time
, customfunctions.histgrcalcquant2val(curr_model, 0.9) AS q90_time
, customfunctions.histgrcalcquant2val(curr_model, 0.95) AS q95_time
, customfunctions.histgrcalcquant2val(curr_model, 0.98) AS q98_time
, customfunctions.histgrcalcquant2val(curr_model, 0.99) AS q99_time
, customfunctions.histgrcalcmax(curr_model) AS max_time
FROM outproj_sct_days
WHERE last_y = 1
) with data;

Now we can use the results in Python to plot the variability graph with Matplotlib and seaborn.

import matplotlib.pyplot as plt
import seaborn as sns
import teradataml as tdml

td_df = tdml.DataFrame(tdml.in_schema("YourDatabase", "cumulative_distribution_sct_days"))
py_df = td_df.to_pandas()
py_df = py_df.sort_values("last_x").set_index("last_x")  # one row per part flow step
py_df['projected_average'] = py_df.mean(axis=1)

fig, ax = plt.subplots(figsize=(12, 5))
sns.lineplot(x=py_df.index, y=py_df["q01_time"], label="Projected 01st pctile")
sns.lineplot(x=py_df.index, y=py_df["q05_time"], label="Projected 05th pctile")
sns.lineplot(x=py_df.index, y=py_df["q95_time"], label="Projected 95th pctile")
sns.lineplot(x=py_df.index, y=py_df["q99_time"], label="Projected 99th pctile")
sns.lineplot(x=py_df.index, y=py_df["q50_time"], label="Projected median")
ax.set_title('Projected Time to out (CHD)', fontsize=20, loc='center', fontdict=dict(weight='bold'))
ax.set_xlabel('Partflow', fontsize=16, fontdict=dict(weight='bold'))
ax.set_ylabel('Hours', fontsize=16, fontdict=dict(weight='bold'))
plt.show()

The output graph generated from the above Python script:

Figure 8 shows the variability graph used to identify any bottlenecks in production.

In the graph, the user can observe that step 302 shows some uncertainty, which can be investigated further at the hardware level (the manufacturing machine, etc.). The same can be confirmed through the box plot, where we can see that step 302 has the highest queue time.
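One way such a box plot could be assembled from the queue-time percentile table is with Matplotlib's precomputed-stats boxplot (a sketch with hypothetical numbers; it assumes a cumulative-distribution table built for qCT with the same column names as above):

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-step qCT percentiles, as produced by the cumulative-distribution query for qCT.
q = pd.DataFrame({
    "last_x":   [300, 301, 302, 303],
    "q05_time": [0.1, 0.2, 0.5, 0.1],
    "q25_time": [0.3, 0.4, 1.2, 0.3],
    "q50_time": [0.5, 0.6, 2.0, 0.5],
    "q75_time": [0.8, 0.9, 3.1, 0.7],
    "q95_time": [1.2, 1.4, 4.5, 1.0],
})

stats = [
    dict(label=str(r.last_x), whislo=r.q05_time, q1=r.q25_time, med=r.q50_time,
         q3=r.q75_time, whishi=r.q95_time, fliers=[])
    for r in q.itertuples()
]
fig, ax = plt.subplots(figsize=(8, 4))
ax.bxp(stats, showfliers=False)  # draw boxes from precomputed percentiles
ax.set_xlabel("Partflow")
ax.set_ylabel("Queue time (hours)")
plt.show()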

All the statistics and calculations are performed in Teradata Vantage. The total number of steps in our case is 384, so only 384 rows were extracted for data visualization, but the approach scales to larger datasets.

Key Takeaways

We’ve developed a proven and patented methodology that greatly improves cycle-time prediction for sequential events.

This methodology can run entirely on Teradata. There’s no need to move your data around and everything stays securely within the Teradata ecosystem, saving you time, effort, and potential headaches.

Our solution is highly scalable: whether you are dealing with a small-scale project or a massive operation, it can be applied.

Once you implement our methodology, you will gain valuable insight into your sequential events. Using this knowledge, you can make data-driven decisions and optimize your processes with precision. You will have a clearer understanding of cycle times, enabling you to streamline operations, identify bottlenecks, and allocate resources effectively.

Feedback and Questions

We value your insights and perspective! Share your thoughts, feedback, and ideas in the comments below. Feel free to also explore the wealth of resources available on the Teradata Developer Portal and Teradata Developer Community.

About the Author

Taha has been working as a Data Scientist within the Data Science Department at Teradata for more than two years. He has enjoyed leveraging his expertise in Python, SQL, SAS, and Machine Learning to contribute towards projects within finance, retail, and supply chain.

Taha has a Bachelor of Computer Science from the National University of Computer and Emerging Sciences — FAST.

Connect with Taha on LinkedIn.
