From Data to Insights: Exploring the Potential of Histogram Analytics in Multiple Industries

Mohammad Taha Wahab
Published in Teradata · 16 min read · Jul 20, 2023

As a result of rising expenses caused by inflation, businesses are actively seeking out creative approaches such as histogram analytics to manage the impact on their supply chains and operations. By applying this technique, companies can gain insight into the potential consequences of different occurrences, enabling them to adapt their strategies to economic fluctuations and unforeseen disruptions. This blog discusses the practical application of histogram analytics in optimizing business operations, including a real-life example in which we helped a semiconductor manufacturer identify a bottleneck within their operations.

Overview

Histogram analytics is a methodology that models transactional data for anything that moves through multiple processing steps (sequential or parallel). You apply it by generating cycle-time histograms for each step and then building a path table that captures the transition from step to step with an associated probability. These transition probabilities can then be aggregated using a set of rules to accomplish the following:

  • Location of items: current or last known location of all items of interest, effectively identifying the step or queue within the path/channel. This metric is used with cycle time, yield, and path selection models to establish the boundaries for prediction.
  • Quantity of items tracked into the current step (used with yield model for predicting output quantity).
  • Time of last event (used with cycle time model for predicting completion time).

Some use cases where histogram analytics can be leveraged include:

  • Supply chains, where goods move through a series of steps before reaching their destination (e.g., production, packing, transportation, warehouse, shelf).
  • Manufacturing industries requiring a series of steps to convert raw materials to finished goods.
  • Airport security clearance, where a person must pass through multiple steps before boarding the aircraft.

Approach

The approach for introducing histogram analytics involves the following steps:

  • Organize the supply chain/manufacturing sequence into sequential events.
  • Define (expected) paths through the sequence, with probabilities.
  • Define event-to-event models (detailed level) for basic cycle time and/or yield.
  • Aggregate the models over multiple events and sequences, according to the expected paths, to evaluate performance and make predictions.

Now let’s further explore each step.

Events, Segments, and Queues

Events

As transactional data is collected, the transactions can be used to demarcate the beginning and end of processes. This suggests defining a segment as the process interval between appropriate data collection events.

In this context, a data collection event must contain the following:

  • A timestamp of the event
  • Items observed (e.g., shipments, products)
  • Location in the supply chain (e.g., physical, logical, equipment)

This point-in-time collection of information is termed an event. Events can be defined from most transactional systems in use, such as shop floor, warehouse, logistics, and shipping systems.
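As a minimal illustration (the field names and values here are hypothetical, not from the client systems), an event record needs little more than those three elements:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    """One data collection event: when, what, and where."""
    timestamp: datetime  # when the event was recorded
    items: list          # items observed (e.g., shipment or lot IDs)
    location: str        # physical or logical location in the supply chain

e = Event(datetime(2023, 7, 20, 14, 30), ["LOT-1001"], "fab1/etch/station-07")
print(e)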

Segments

A segment is then defined as the process between two significant sequential events. Note that it might be judicious to ignore some data collection points (too much data) and focus on the most important events for segment definition.

The segment process should be thought of in a very general way. It could represent a physical transformation (e.g., manufacturing), a movement (logistics), verification (testing, inspection), or other activities. Note that with these definitions, manufacturing activities and supply chains effectively have the same structure.

Identification of a segment is then associated (at a minimum) with the starting event and ending event. In some cases, it may be advantageous to include attributes of the items within the segment, although these can frequently be included in the event definitions or handled independently with the item management.

Queues

The end event of a segment may correspond to the starting event of the next segment. For example, a truck-in-yard transaction may begin the next process, docking. In these cases, there are no queues or other inventory hold locations between the segments.

Frequently, however, the end of one segment will not correspond to the beginning of the next. In this case, a queue will exist where products are temporarily held while awaiting the start of the next segment. Effectively, a queue has its own start and end events, corresponding to the end event of the last segment and the start event of the next, respectively. A queue is modeled differently from a segment, however: the cycle time of a queue is generally a function of the number of items in the queue and the processing rate of the next process.
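To make that relationship concrete, here is a minimal sketch with made-up numbers (not part of the client implementation): the expected time spent in a queue can be approximated from its current depth and the rate at which the next segment consumes items.

def estimate_queue_cycle_time(items_in_queue, processing_rate_per_hour):
    """Rough queue cycle-time estimate (hours): items waiting divided by
    the rate at which the next segment consumes them."""
    if processing_rate_per_hour <= 0:
        raise ValueError("processing rate must be positive")
    return items_in_queue / processing_rate_per_hour

# Example: 120 pallets waiting, next process handles 15 pallets/hour -> about 8 hours in queue.
print(estimate_queue_cycle_time(120, 15))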

We will be examining a real-life scenario where a semiconductor manufacturer encountered bottlenecks in their supply chain process. By employing the methodology of histogram analytics discussed in this blog, we were able to identify and address those specific issues.

Histogram Models

In the fast-paced world of semiconductor manufacturing, optimizing the supply chain process is vital for accurate prediction of wafer outs and overall efficiency. However, challenges arise as variability and bottlenecks impact the smooth flow of operations. That’s where the power of histogram analytics comes in. By leveraging this methodology, we dive deep into the supply chain, identify the specific processes that act as bottlenecks, and visualize them through plotted data. This allows us to not only pinpoint the problem areas but also pave the way for precise corrections and enhancements, ensuring seamless production and unlocking the true potential of semiconductor manufacturing.

The semiconductor manufacturer provided a step-by-step definition to identify a part flow number based on the combination of {site, technology, techcode, part, stage, eqpType}. We then created a table to extract part flow numbers and filter to only the lots that are in production.

The table and its underlying data are then used for building histogram models.

For each of the steps, we built histogram models for step cycle time, queue cycle time, and process cycle time:

  • Step cycle time (metric=’sCT’) — time taken by a lot to complete the entire step from arrival to finish of processing
  • Queue cycle time (metric=’qCT’) — time spent waiting in a queue for processing a lot
  • Process cycle time (metric=’pCT’) — time required to process the lot by the machine

This process was used to build histogram models of cycle time by machine process time (pCT), queue time (qCT), and total step time (sCT = pCT + qCT).
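Outside the database, the same three metrics can be sketched in a few lines of pandas. The column names mirror the step start/end and track-in/track-out event timestamps used in the SQL later in this post (expressed in fractional days); the data is made up:

import pandas as pd

# Hypothetical lot-level event data; *_fday columns are timestamps in fractional days,
# mirroring the startinst/endinst (step) and trackininst/trackoutinst (machine) events.
lots = pd.DataFrame({
    "lotid":             ["L1", "L2"],
    "startinst_fday":    [100.00, 100.50],
    "trackininst_fday":  [100.20, 100.90],
    "trackoutinst_fday": [100.45, 101.10],
    "endinst_fday":      [100.50, 101.15],
})

lots["sCT"] = lots["endinst_fday"] - lots["startinst_fday"]         # total step time
lots["pCT"] = lots["trackoutinst_fday"] - lots["trackininst_fday"]  # machine process time
lots["qCT"] = lots["sCT"] - lots["pCT"]                             # time spent queued
print(lots[["lotid", "sCT", "pCT", "qCT"]])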

The SQL for building the models included the following:

  • Data limits are computed based on Tukey outlier limits: we computed the lower quartile (Q1) and upper quartile (Q3) and trimmed values outside the Tukey fences, smoothing the tails of the histograms so they better approximate a normal distribution.
  • The actual models are stored as binary objects (BLOBs) in the database.

A Tukey limit is a quantile-based limit for removing outliers. Anything below Q1 - 1.5(Q3 - Q1) or above Q3 + 1.5(Q3 - Q1) is considered an outlier, where Q1 is the first quartile and Q3 is the third quartile of the distribution.
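As a quick sanity check outside the database, the same fences can be computed in a few lines of NumPy (a sketch only; the actual models use the SQL percentile functions shown below, and the SQL additionally clamps the lower limit at zero since cycle times cannot be negative):

import numpy as np

def tukey_limits(values):
    """Return (lower, upper) Tukey fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

cycle_times = np.array([1.1, 1.3, 1.2, 1.4, 1.5, 1.2, 9.7])  # 9.7 is an obvious outlier
lcl, ucl = tukey_limits(cycle_times)
trimmed = cycle_times[(cycle_times >= lcl) & (cycle_times <= ucl)]
print(lcl, ucl, trimmed)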

Different cycle times yield different insights. Queue cycle time, for example, can tell us which steps contain a bottleneck that affects the total time or overall performance.

To get started, we need to build histogram models on the base tables.

create table histogram_models as ( 
WITH tmp_data AS (
SELECT
a.model_id,
a.model_type,
a.segment_step, -- keeping model_type for convenience
CASE WHEN a.model_type = 'sCT' THEN d.endinst_fday - d.startinst_fday WHEN a.model_type = 'pCT' THEN d.trackoutinst_fday - d.trackininst_fday WHEN a.model_type = 'qCT' THEN d.endinst_fday - d.startinst_fday -(
d.trackoutinst_fday - d.trackininst_fday
) ELSE NULL END AS CT_fday,
Count(*) Over (
PARTITION BY a.model_id, a.model_type
) AS data_cnt -- how many data points
FROM
Your_database.client_base_table1 d
JOIN (
-- get segment information from SME table
-- model ID for all possible technologies, parts, etc. built here for example
SELECT
Row_Number() Over (
ORDER BY
1
) AS model_id,
s.partflow AS segment_step,
s.site,
s.technology,
s.techcode,
s.part,
s.stage,
s.eqpType,
CASE c.day_of_calendar WHEN 1 THEN 'sCT' WHEN 2 THEN 'pCT' WHEN 3 THEN 'qCT' END AS model_type
FROM
database_name.client_base_table s,
sys_calendar.CALENDAR c
WHERE
c.day_of_calendar < 4
GROUP BY
2,
3,
4,
5,
6,
7,
8,
9
) a ON a.technology = d.technology
AND a.site = d.site
AND a.part = d.part
AND a.stage = d.stage
AND a.eqptype = d.eqptype
WHERE
d.technology = '******' -- most common technology
AND d.part = '*******'
AND d.lotType = 'P' -- production lot
AND d.prcdKind = 'N' -- normal processing
),
tmp_limits AS (
SELECT
t.model_id,
t.model_type,
t.segment_step,
Percentile_Cont(0.25) Within GROUP (
ORDER BY
ct_fday
) AS q25_ct,
Percentile_Cont(0.75) Within GROUP (
ORDER BY
ct_fday
) AS q75_ct,
CASE WHEN q25_ct < (
1.5 *(q75_ct - q25_ct)
) THEN 0 ELSE q25_ct - 1.5 *(q75_ct - q25_ct) END AS lcl_ct,
q75_ct + (
1.5 *(q75_ct - q25_ct)
) AS ucl_ct
FROM
tmp_data t
GROUP BY
1,
2,
3
)
SELECT
model_id,
segment_step AS xIn,
1 AS yIn,
model_type,
data_cnt,
avg_val,
stdev_val, -- added some additional stats for info
Current_Timestamp(0) AS model_create_dttm,
-- if data points less than bins, use normal distribution.
CASE WHEN data_cnt < 50 THEN customfunctions.histgrgendistnorm(avg_val, stdev_val, 4, 1, 50) ELSE customfunctions.histgrgendataquant(
tb.lcl_ct, tb.ucl_ct, 50, 0, tb.ct_fday
) END AS histgrmdl
FROM
(
SELECT
d.model_id,
d.segment_step,
d.model_type,
d.ct_fday,
l.lcl_ct,
l.ucl_ct,
d.data_cnt,
Average(ct_fday) Over (
PARTITION BY d.model_id, d.segment_step,
d.model_type
) AS avg_val,
StdDev_Pop(d.ct_fday) Over (
PARTITION BY d.model_id, d.segment_step,
d.model_type
) AS stdev_val
FROM
tmp_limits l
JOIN tmp_data d ON d.model_id = l.model_id
AND d.segment_step = l.segment_step
AND d.model_type = l.model_type
WHERE
d.ct_fday BETWEEN l.lcl_ct
AND l.ucl_ct
AND l.ucl_ct > l.lcl_ct
) tb
GROUP BY
1,
2,
3,
4,
5,
6,
avg_val,
stdev_val
) with data;
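For intuition, the model stored in each row is conceptually just a set of bin edges plus bin probabilities. The Python sketch below mirrors the CASE logic above (it is not the actual Teradata UDFs, which serialize the model into a BLOB): build a 50-bin histogram between the Tukey limits, or fall back to a fitted normal curve when there are fewer than 50 data points.

import numpy as np

N_BINS = 50

def build_histogram_model(cycle_times, lcl, ucl, min_points=50):
    """Return (bin_edges, bin_probabilities) for cycle times between the Tukey
    limits; with fewer than `min_points` observations, substitute a normal
    curve fitted to the data, echoing the CASE logic in the SQL above."""
    edges = np.linspace(lcl, ucl, N_BINS + 1)
    data = np.asarray(cycle_times, dtype=float)
    if len(data) < min_points:
        centers = (edges[:-1] + edges[1:]) / 2.0
        mu, sigma = data.mean(), max(data.std(), 1e-9)
        weights = np.exp(-0.5 * ((centers - mu) / sigma) ** 2)
    else:
        weights, _ = np.histogram(data, bins=edges)
    return edges, weights / weights.sum()

edges, probs = build_histogram_model(np.random.gamma(2.0, 1.0, 500), lcl=0.0, ucl=8.0)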

Now we must create a view on top of the base tables that specifies the part flow for each lot and applies filters on specific columns (to pull out only the lots that are in production).

REPLACE VIEW v384_Some_view AS LOCKING ROW FOR ACCESS  
SELECT
t1.partflow,
t2.*
FROM
Your_database.client_base_table1 t1
JOIN Your_database.client_base_table2 t2 ON t1.site = t2.site
AND t1.technology = t2.technology
AND t1.techcode = t2.techcode
AND t1.part = t2.part
AND t1.eqptype = t2.eqptype
AND t1.stage = t2.stage
WHERE
t2.part IN (
SELECT
part
FROM
Your_database.client_base_table1
WHERE
site = '***'
AND technology = '******'
AND techcode = '******'
GROUP BY
1
HAVING
Max(partflow) = 384
)
AND lottype = 'P';

Generating Process Flow

The connection of segments to other segments and queues forms paths that are consistent with the sequence of processing. The paths can be thought of as a roadmap, connecting segment to segment and allowing for splits and merges between steps (e.g., parallel processing on components of an item). An important concept required by the methodology outlined here is the requirement to weight the paths, so that an expected load on a process or a predicted path can be derived. This is equivalent to assigning probabilities at splits (probabilities are not needed at merges).
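A minimal in-memory picture of such a weighted roadmap (the step numbers and fractions below are hypothetical; the real path table is built in SQL later in this post): each step maps to its possible next steps, and the split probabilities leaving a step sum to one.

# from_step -> list of (to_step, probability); probabilities at a split sum to 1.0
path_table = {
    1: [(2, 1.00)],             # no split
    2: [(3, 0.70), (4, 0.30)],  # split: 70% of lots go to step 3, 30% to step 4
    3: [(5, 1.00)],
    4: [(5, 1.00)],             # merge back at step 5 (no probability needed at a merge)
}

assert all(abs(sum(p for _, p in nxt) - 1.0) < 1e-9 for nxt in path_table.values())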

Sets of goods move from event to event through the supply chain. These are the items that are tracked through the events and segments. Some expected characteristics of a set of goods as it progresses include:

  • Different tracking identifications may be used — depending on the segment/process/channel.
  • Unit of measure changes may occur (many to few, few to many, or a change of name).
  • Consolidation or “break up” (e.g., multiple packages into a crate, or a truckload to multiple pallets).
  • Virtual vs. real items (tracking order status vs. actual shipments).

The set of goods can influence the path selection. For example, in semiconductor manufacturing, a lot (i.e., a set of goods) that passes testing at a very high speed may require different packaging to handle the expected heat and use profiles. Once the result of the test is known, the specific path for that lot can be selected.

Figure 1 shows the path a lot or wafer takes to complete its process.

Segments (and queues) can be derived from the event-to-event sequencing of sets of goods. While both segments and queues are defined as the interval (process) between two sequential events, the analytical models are different for a process segment than for a queue. It is therefore necessary to explicitly identify queues — based on the likelihood of inventory being held at the queue prior to processing.

Evaluation of the event-to-event data also provides information about the paths and interconnections (connections between segments). As paths split and merge, historical data will typically be used to estimate the probability of a split. Note that there are two distinct types of splits:

  • Path splits — in this case, the set of goods is kept intact and a selection of one or more paths is made. This is equivalent to selecting a specific road when presented with two or more roads at an intersection.
  • Lot splits — in this case, the individual components in the set of goods are physically split and proceed on different paths. This is analogous to a single passenger load splitting into multiple aircraft at a hub.
Figure 2 shows the example of Figure 1 with path/lot splits identified. Note the percentages assigned to the splits and the types of splits. Since the segments in Figure 2 are connected (no isolation exists), there can only be one channel.

The process flow is derived by tracking an item through the process. This is conveniently done with Teradata’s nPath® function on the event sequence. Since some steps are skipped, additional modifications must be made to build a Cartesian type of path. nPath is then used to track lots through the steps.

nPath is a Teradata Vantage™ analytics function that looks for a pattern we specify by scanning a set of rows in the database. The function gives users the flexibility to define a complex pattern in the input data and to define the values that are output for each matched input set.

nPath is especially useful when the main objective is to identify a particular path that leads to an outcome, which is why it is widely used across industries.
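Conceptually, the 'A.B' pattern used below simply pairs every step with the step that follows it for each lot, ordered by start time. A rough pandas equivalent of that pairing, on made-up data (illustrative only; the real work is done in-database by nPath):

import pandas as pd

events = pd.DataFrame({
    "lotid":     ["L1", "L1", "L1", "L2", "L2"],
    "partflow":  [1, 2, 4, 1, 2],    # step numbers; L1 skips step 3
    "startinst": [10, 20, 35, 12, 25],
})

events = events.sort_values(["lotid", "startinst"])
events["path_from"] = events["partflow"]
events["path_to"] = events.groupby("lotid")["partflow"].shift(-1)  # the next step per lot
pairs = events.dropna(subset=["path_to"])[["lotid", "path_from", "path_to"]]
print(pairs)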

create table path_table as( 
WITH path_tmp AS (
SELECT
path_from AS from_xVal,
1 AS from_yVal,
path_to AS to_xVal,
1 AS to_yVal,
path_to - path_from - 1 AS nmbr_skip, -- compute if next step skipped
Count(
DISTINCT(lotid)
) AS no_of_lots
FROM
NPath (
ON (
SELECT
*
FROM
partflow_model_view
WHERE
lotType = 'P' -- production lots
AND prcdKind = 'N' -- normal lots
) PARTITION BY lotId -- traceable unit through process
ORDER BY
startinst -- step start time
USING MODE(OVERLAPPING) PATTERN ('A.B') Symbols(True AS A, True AS B) RESULT(
First(lotid OF A) AS lotid,
First(partflow OF A) AS path_from,
-- defines numeric sequence
First(partflow OF B) AS path_to
)
) AS dt
GROUP BY
1,
3
HAVING
no_of_lots > 15
)
SELECT
t2.*,
Sum(t2.step_move) Over (PARTITION BY t2.from_x, t2.from_y) AS ttl_move,
(
t2.step_move(FLOAT)
)/(
NullIfZero(ttl_move)
) AS fract_move -- split fraction
FROM
(
SELECT
new_from_xval AS from_x,
new_from_yval AS from_y,
new_to_xval AS to_x,
new_to_yval AS to_y,
Sum(no_of_lots) AS step_move -- compute lots that moved into 2 (or more) steps
FROM
(
SELECT
t1.*,
c.day_of_calendar AS adder, -- handle skipped steps
from_xval + adder - 1 AS new_from_xval,
CASE WHEN adder > 1 THEN 2 ELSE from_yVal END AS new_from_yVal,
new_from_xval + 1 AS new_to_xval,
CASE WHEN new_to_xval = to_xVal THEN to_yval ELSE 2 END AS new_to_yval
FROM
path_tmp t1
LEFT JOIN sys_calendar.CALENDAR c ON nmbr_skip + 1 >= c.day_of_calendar
WHERE
nmbr_skip >= 0
) x
GROUP BY
1,
2,
3,
4
) t2
) with data;

The output of the above SQL will be like this:

Figure 3: Teradata nPath function output to calculate the path of the production lot.

Now we must add terminal steps to the path table:

insert into path_table values (0,0,1,1,null,null,1.00); 
insert into path_table values (384,1,0,0,null,null,1.00);

Segment Combinations

The core functionality of histogram analytics is the combination of segments. There are two possibilities for this combination:

  • Serial combinations: one segment immediately follows another with no splits or merges in between
  • Parallel combinations: two segments move items equivalent distances in parallel

These simple combinations are described in detail in the following sections.

Figure 4: Illustration of combining 2 serial segments.
Figure 5 illustrates two segments in parallel (same skey values) and their respective histogram models.
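A rough sketch of the two operations on simplified histograms that share the same cycle-time bins (illustrative NumPy only; the actual combinations are performed by the Teradata UDFs used in the stored procedure later in this post): a serial combination convolves the two distributions, because the total time is the sum of the two segment times, while a parallel combination mixes them, weighted by the path fractions.

import numpy as np

def combine_serial(p_a, p_b):
    """Cycle times add across consecutive segments, so the combined
    distribution is the convolution of the two bin-probability vectors
    (assuming equal-width bins starting at zero)."""
    out = np.convolve(p_a, p_b)
    return out / out.sum()

def combine_parallel(p_a, p_b, frac_a, frac_b):
    """Across parallel paths, the combined distribution is a mixture
    weighted by the fraction of items taking each path."""
    out = frac_a * p_a + frac_b * p_b
    return out / out.sum()

# Two toy segment histograms over the same cycle-time bins (probabilities sum to 1).
seg1 = np.array([0.2, 0.5, 0.3])
seg2 = np.array([0.1, 0.6, 0.3])
print(combine_serial(seg1, seg2))           # distribution of seg1 time + seg2 time
print(combine_parallel(seg1, seg2, 0.7, 0.3))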

Now we need to add null histograms, using the path and model tables, for rows where y = 2 (these rows are generated in the path table). This table can then be used for aggregating the histograms.

create table path_mdls_d as(
select
a.model_id,
a.xin,
case when b.from_y = 1 then 1 else 2 end as yin,
a.model_type as metric,
a.model_create_dttm,
case when b.from_y = 1 then a.histgrmdl else null end as histgrblob
from
histogram_models a
inner join (
select
distinct from_x,
from_y
from
path_table
) b on a.xin = b.from_x
) with data;

Projections

Based on current positions, individual segment models, and known path probabilities, the statistics can be applied to project lead time and yield statistics for future positions. This can be done from any existing location to any location by the following steps:

  • Determine all possible paths between the locations
  • Aggregate the histogram models for all segments along the possible paths to build a single aggregated histogram
  • Compute the probability of observing the item at the future location (i.e., the path load percentage).
  • Compute the statistics desired from the resulting histogram (typically any known time or yield in the current segment is offset from this calculation).

This capability allows the prediction of yield, lead times, and the most likely paths based on statistical models. Conceptually the technique is similar to running Monte Carlo simulations and building a histogram to represent the probable outcomes.
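To make the Monte Carlo analogy concrete, here is a toy simulation with made-up segment distributions and path probabilities (illustrative only, not the production logic): repeatedly pick a path according to the split probabilities, sample a cycle time for each segment on that path, sum them, and histogram the totals.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-segment cycle-time samplers (days) and two possible paths with probabilities.
segments = {
    "A": lambda n: rng.gamma(2.0, 1.0, n),
    "B": lambda n: rng.gamma(3.0, 0.5, n),
    "C": lambda n: rng.gamma(1.5, 2.0, n),
}
paths = [(["A", "B"], 0.7), (["A", "C"], 0.3)]

n = 100_000
choices = rng.choice(len(paths), size=n, p=[p for _, p in paths])  # pick a path per trial
totals = np.zeros(n)
for i, (steps, _) in enumerate(paths):
    mask = choices == i
    for s in steps:
        totals[mask] += segments[s](mask.sum())  # add a sampled cycle time per segment

hist, edges = np.histogram(totals, bins=50)      # the aggregated outcome histogram
print("median lead time (days):", round(float(np.median(totals)), 2))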

The most complex part of the projections is building the single aggregated histogram to represent all likely paths for a shipment or product. As items move in the supply chain, the calculations must be repeated to update the aggregated model.

A stored procedure was written for the projection: starting from the last step, we combine the histogram models and aggregate their probabilities from end to start based on certain conditions. Histogram models are added in series, following the path table, until we reach a merge point and a split; at that point, we combine the split paths in parallel and multiply in their respective probabilities. These steps are repeated until we have a final aggregated histogram, a combination of all the steps, which can then be used for further calculations.

Figure 6 shows the complete idea of Histogram Analytics.

The stored procedure SQL is given below:

  REPLACE PROCEDURE projection_procedure( 
IN xstop INTEGER,
IN met varchar(5)
) SQL SECURITY INVOKER BEGIN DECLARE xstart INTEGER;
-- starting point at end of path
DECLARE cntr INTEGER;
-- general purpose counter
DECLARE vDummy CHAR(1);
-- create handler for nonexistent volatile tables
DECLARE CONTINUE HANDLER FOR SqlState '42000'
SET
vDummy = 'd';
/* output table. Assumes 1st model is always singular */

/* may need modification for more complex systems */
DROP
TABLE tmp_mdlAgg_out;
CREATE VOLATILE TABLE tmp_mdlAgg_out AS (
SELECT
from_x AS last_x,
from_y AS last_y,
fract_move AS path_fraction,
histgrblob AS curr_model
FROM
path_table p
JOIN path_mdls_d m ON p.from_x = m.xin -- model always associated with from step
AND p.from_y = m.yin
WHERE
m.metric = met
AND p.to_x = 0
AND p.to_y = 0
) WITH DATA PRIMARY INDEX (last_x, last_y) ON COMMIT PRESERVE ROWS;
/* create work table for holding intermediate results */
DROP
TABLE tmp_mdlAgg;
CREATE VOLATILE TABLE tmp_mdlAgg (
last_x INTEGER,
last_y INTEGER,
path_fraction FLOAT,
curr_model VARBYTE(30000),
nmbr_rows SMALLINT,
row_ID SMALLINT
) PRIMARY INDEX(last_x, last_y) ON COMMIT PRESERVE ROWS;
SELECT
Max(last_x) INTO xstart
FROM
tmp_mdlAgg_out;
WHILE xstart > xstop DO
DELETE FROM
tmp_mdlagg;
INSERT INTO tmp_mdlagg
SELECT
p.from_x,
p.from_y,
p.fract_move * t.path_fraction AS path_fract,
CASE WHEN m.histgrblob IS NULL THEN t.curr_model ELSE customfunctions.histgrcombseradd(
t.curr_model, m.histgrblob, 50, 1E-6
) END AS curr_model,
Count(*) Over (PARTITION BY p.from_x, p.from_y) AS nmbr_rows,
Row_Number() Over (
PARTITION BY p.from_x,
p.from_y
ORDER BY
path_fract
) AS row_id
FROM
tmp_mdlagg_Out t
JOIN path_table p ON p.to_x = t.last_x
AND p.to_y = t.last_y
JOIN path_mdls_d m ON m.xin = p.from_x
AND m.yin = p.from_y
AND t.last_x = (
SELECT
Min(last_x)
FROM
tmp_mdlagg_out
)
WHERE
m.metric = met;
INSERT INTO tmp_mdlAgg_out
SELECT
last_x,
last_y,
path_fraction,
curr_model
FROM
tmp_mdlagg
WHERE
nmbr_rows = 1;
DELETE FROM
tmp_mdlagg
WHERE
nmbr_rows = 1;
SELECT
Count(*) INTO cntr
FROM
tmp_mdlagg;
IF cntr > 1 THEN
/* only works for 2 paths currently */
INSERT INTO tmp_mdlagg_out
SELECT
t1.last_x,
t2.last_y,
1.0,
customfunctions.histgrcombpar(
t1.curr_model, t2.curr_model, t1.path_fraction,
t2.path_fraction, 50, 1.0E-6
)
FROM
tmp_mdlagg t1
JOIN tmp_mdlagg t2 ON t1.last_x = t2.last_x
AND t1.last_y = t2.last_y
AND t1.nmbr_rows = 2
AND t2.nmbr_rows = t1.nmbr_rows
WHERE
t1.row_id = 1
AND t2.row_id = 2;
END IF;
SELECT
Min(last_x) INTO xstart
FROM
tmp_mdlagg_out;
END WHILE;
DROP
TABLE tmp_mdlagg;
END;

Procedure call for step cycle time:

call projection_procedure(1, 'sCT');

The results can be queried with a simple query like:

SELECT last_x 
, customfunctions.histgrcalcquant2val(curr_model, 0.05) AS q05
, customfunctions.histgrcalcmed(curr_model) AS median_val
, customfunctions.histgrcalcquant2val(curr_model, 0.95) AS q95
, customfunctions.histgrplot(curr_model) AS histgrplot
FROM tmp_mdlAgg_out
WHERE last_x = 1;

Figure 7 shows the combined histogram of all segments after running the projection procedure.

Materialize the volatile table created by the procedure into a permanent table:

CREATE TABLE outproj_sct_days AS ( 
SELECT
*
FROM
tmp_mdlAgg_out
) WITH DATA;

You can call the above procedure multiple times for process cycle time (pCT) or queue cycle time (qCT).

Computing Statistics

After projecting the final histogram, certain calculations are required to examine variability and bottlenecks. SQL UDFs were used to compute different percentiles of the final histogram, and the results are used to draw the variability graph.

Below is the SQL to extract percentiles from the histogram.

CREATE TABLE cumulative_distribution_sct_days AS( 
SELECT last_x
, customfunctions.histgrcalcmin(curr_model) AS min_time
, customfunctions.histgrcalcquant2val(curr_model, 0.01) AS q01_time
, customfunctions.histgrcalcquant2val(curr_model, 0.02) AS q02_time
, customfunctions.histgrcalcquant2val(curr_model, 0.05) AS q05_time
, customfunctions.histgrcalcquant2val(curr_model, 0.1) AS q10_time
, customfunctions.histgrcalcquant2val(curr_model, 0.20) AS q20_time
, customfunctions.histgrcalcquant2val(curr_model, 0.25) AS q25_time
, customfunctions.histgrcalcquant2val(curr_model, 0.5) AS q50_time
, customfunctions.histgrcalcquant2val(curr_model, 0.75) AS q75_time
, customfunctions.histgrcalcquant2val(curr_model, 0.8) AS q80_time
, customfunctions.histgrcalcquant2val(curr_model, 0.9) AS q90_time
, customfunctions.histgrcalcquant2val(curr_model, 0.95) AS q95_time
, customfunctions.histgrcalcquant2val(curr_model, 0.98) AS q98_time
, customfunctions.histgrcalcquant2val(curr_model, 0.99) AS q99_time
, customfunctions.histgrcalcmax(curr_model) AS max_time
FROM outproj_sct_days
WHERE last_y = 1
) with data;

Now we can use the results in Python to plot the variability graph with Matplotlib and seaborn.

import matplotlib.pyplot as plt
import seaborn as sns
import teradataml as tdml

td_df = tdml.DataFrame(tdml.in_schema("YourDatabase", "cumulative_distribution_sct_days"))
py_df = td_df.to_pandas()
py_df = py_df.sort_values("last_x").set_index("last_x")  # one row per part flow step
py_df['projected_average'] = py_df.mean(axis=1)

fig, ax = plt.subplots(figsize=(12, 5))
sns.lineplot(x=py_df.index, y=py_df["q01_time"], label="Projected 01st pctile")
sns.lineplot(x=py_df.index, y=py_df["q05_time"], label="Projected 05th pctile")
sns.lineplot(x=py_df.index, y=py_df["q95_time"], label="Projected 95th pctile")
sns.lineplot(x=py_df.index, y=py_df["q99_time"], label="Projected 99th pctile")
sns.lineplot(x=py_df.index, y=py_df["q50_time"], label="Projected median")
ax.set_title('Projected Time to out (CHD)', fontsize=20, loc='center', fontdict=dict(weight='bold'))
ax.set_xlabel('Partflow', fontsize=16, fontdict=dict(weight='bold'))
ax.set_ylabel('Hours', fontsize=16, fontdict=dict(weight='bold'))
plt.show()

The output graph generated from the above Python script:

Figure 8 shows the variability graph used to identify any bottlenecks in production.

In the graph, the user can observe that step 302 shows some uncertainty, which can be investigated further at the hardware level (the manufacturing machine, etc.). The same can be confirmed through the box plot, where we can see that step 302 has the highest queue time.
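One way such a box plot could be assembled from the queue-time percentile table is with Matplotlib's precomputed-stats boxplot (a sketch with hypothetical numbers; it assumes a cumulative-distribution table built for qCT with the same column names as above):

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-step qCT percentiles, as produced by the cumulative-distribution query for qCT.
q = pd.DataFrame({
    "last_x":   [300, 301, 302, 303],
    "q05_time": [0.1, 0.2, 0.5, 0.1],
    "q25_time": [0.3, 0.4, 1.2, 0.3],
    "q50_time": [0.5, 0.6, 2.0, 0.5],
    "q75_time": [0.8, 0.9, 3.1, 0.7],
    "q95_time": [1.2, 1.4, 4.5, 1.0],
})

stats = [
    dict(label=str(r.last_x), whislo=r.q05_time, q1=r.q25_time, med=r.q50_time,
         q3=r.q75_time, whishi=r.q95_time, fliers=[])
    for r in q.itertuples()
]
fig, ax = plt.subplots(figsize=(8, 4))
ax.bxp(stats, showfliers=False)  # draw boxes from precomputed percentiles
ax.set_xlabel("Partflow")
ax.set_ylabel("Queue time (hours)")
plt.show()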

All the statistics and calculations are performed in Teradata Vantage. The total number of steps in our case is 384, so only 384 rows were extracted for data visualization, but the approach scales to larger datasets.

Key Takeaways

We’ve developed a proven and patented methodology that greatly improves cycle-time prediction for sequential events.

This methodology can run entirely on Teradata. There’s no need to move your data around and everything stays securely within the Teradata ecosystem, saving you time, effort, and potential headaches.

Our solution is highly scalable: whether you are dealing with a small-scale project or a massive operation, it can be applied.

Once you implement our methodology, you will gain valuable insight into your sequential events. Using this knowledge, you can make data-driven decisions and optimize your processes with precision. You will have a clearer understanding of cycle times, enabling you to streamline operations, identify bottlenecks, and allocate resources effectively.

Feedback and Questions

We value your insights and perspective! Share your thoughts, feedback, and ideas in the comments below. Feel free to also explore the wealth of resources available on the Teradata Developer Portal and Teradata Developer Community.

About the Author

Taha has been working as a Data Scientist within the Data Science Department at Teradata for more than two years. He has enjoyed leveraging his expertise in Python, SQL, SAS, and Machine Learning to contribute towards projects within finance, retail, and supply chain.

Taha has a Bachelor of Computer Science from the National University of Computer and Emerging Sciences — FAST.

Connect with Taha on LinkedIn.
