5. Stored Procedures for Database Init, Data ETL and Data Reporting with MySQL

Julia R
35 min read · Jul 29, 2023


This article is about creating stored procedures for database init, data ETL and data reporting with MySQL. It is part of a project in which everything is automated: from the initial data collection to publishing the analytics on the website, there is no manual step.

More can be found on GitHub.

Architecture Overview

Requirements

* MySQL Workbench

* AWS RDS (MySQL)

* SQL queries (stored procedures for database init, data loading, and report data generation)

Deploying

ECS (AWS Elastic Container Service) and Lambda are used to execute all the procedures in the shared files automatically.

1) Once the .sql files are uploaded to the S3 bucket, SQS triggers Lambda to check the files.

2) The file information is passed to the ECS containers.

3) ECS uses the MySQL command line to connect and execute all SQL statements in each file at once.

4) The MySQL database in the same VPC is then fully set up. (A quick verification sketch follows below.)
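
Once the scripts have run, a quick sanity check from MySQL Workbench confirms that everything was created. This is just an illustrative check, assuming the schema is named your_schema:

SHOW PROCEDURE STATUS WHERE Db = 'your_schema';
SHOW TABLES FROM your_schema;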

Note:

Because the MySQL command line is used to execute the .sql files against AWS RDS for MySQL, the procedures must not return any result sets when they are created. This is different from building procedures in a tool like MySQL Workbench. If rows are returned during creation, AWS RDS for MySQL will not continue with the rest of the .sql file, and the procedures will not be created completely.

The solution is:

aa) to eliminate all ‘SELECT’ statements in the procedures that would generate result sets.

bb) to apply ‘/* … */’ to comments. The .sql files are written with different DELIMITERs, and the MySQL command line splits the statements in a .sql file using one delimiter.

If a comment written with '--' or '#' is split across two lines, it will definitely cause an error in AWS RDS for MySQL.
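
As a minimal sketch of solution aa), assuming a hypothetical procedure in a schema called your_schema: instead of a bare SELECT that would send rows back to the client, the value is captured with SELECT ... INTO (or written to a log table), and comments use the /* ... */ style:

DROP PROCEDURE IF EXISTS your_schema.sp_example_no_resultset;
DELIMITER &&
CREATE PROCEDURE your_schema.sp_example_no_resultset (OUT row_count BIGINT)
BEGIN
/* count rows without sending a result set back to the client */
SELECT COUNT(*) INTO row_count FROM information_schema.tables
WHERE table_schema = 'your_schema';
END &&
DELIMITER ;

Calling it later with an OUT variable (for example, call your_schema.sp_example_no_resultset(@n);) keeps the creation run silent, which is what the automated pipeline needs.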

=======================> init.sql

/*
#### Database Init ####
#### This file builds procedures which can be used to create all the tables needed for data loading.####
#### The tables will store the data loaded in the following file, 'loading.sql'.
-- --------------------------------------------------------
-- 0. schema building
-- 1. create stored procedure (sp) for log table for loading process
-- Note: creating the sp does not create the log table itself; the sp has to be called to create the table in the end.
-- 2. create stored procedure (sp) for general table for original data (Price Index)
-- create stored procedures (sp) for the smaller tables according to their category (food, cosmetics, energy, tobacco)
-- 3. create a parent SP to include all the above child SPs
-- 4. just call the parent SP and all child SPs will be called automatically
-- tables will be created and the database is ready for data loading in the next step
-- ----------------------------------------------------------
-- 0. for schema building
-- data for this website is about price indexes for 'Food', 'Cosmetics', 'Energy' and 'Tobacco'.
-- tables are under one schema: one general table for the original data and one general table for reporting.
-- Four sub-tables, one for each category.
-- Procedures are mainly used for database init/data loading, analyzing and reporting.
-- A user will be created for database maintenance.
-- 0.1 create user

-- MySQL allows creating a user with a SHA-256 password
-- drop user 'your_username'@'%';
-- create user 'your_username'@'%' Identified by 'your_password';
-- the .sql files uploaded automatically by Terraform will contain the hashed password
-- instead of plain text.
-- keep the plain-text password safe
-- also, we can use AWS Secrets Manager to store the username and password
-- options vary.

-- log in to RDS (MySQL) with the username and plain-text password in MySQL Workbench
-- build a new connection to the cloud MySQL instance using its endpoint

-- 0.2 grant privileges to users
-- to grant privileges as admin
-- GRANT ALL PRIVILEGES ON your_schema.* TO 'your_username'@'%';
-- more can be found: https://www.strongdm.com/blog/mysql-create-user-manage-access-privileges-how-to#:~:text=To%20create%20a%20user%20with,their%20privileges%20to%20other%20users.
-- REMEMBER to always grant the least privilege to users

-- 1. to create the sp for the log table (for loading and reporting)
-- the log table I created only shows the information I need for error handling.
-- select CURRENT_TIMESTAMP; the result has limited fractional-second precision.
-- select now(6)
-- to make the time precision 6. the performance of the db is very fast,
-- so a lower precision is not enough to differentiate all log events
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_logtable_for_loading ;
CREATE PROCEDURE your_schema.sp_init_logtable_for_loading (
IN schema_name varchar(50),
IN event_source varchar(50),
IN table_name varchar (50),
IN event_status tinyint,
IN total_rows bigint,
IN note_s varchar(800)
)
BEGIN
-- to create log table if not exists
set @sql =concat('
create table IF NOT EXISTS `',schema_name, '`.`log_for_loading` (
Time_stamp datetime not null,
EventSource varchar (50) NOT NULL,
Tablename varchar (50) NOT NULL,
Status tinyint not null,
TotalRowsAffected bigint null,
Notes varchar (800) null
);');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

-- to insert log into log table
set @sql=concat ('
insert into `',schema_name, '`.`log_for_loading` Values (''', now(6),''',''', replace(event_source,"'","''"),''',''',table_name,''',''',event_status,''',''',total_rows,''',''', replace(note_s,"'","''"),''');');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

END &&
DELIMITER ;
/*
-- ----------------------------------------------------------
-- call your_schema.sp_init_logtable_for_loading ("your_schema", ...);
-- call your_schema.sp_init_logtable_for_reporting ("your_schema", ...);
-- ----------------------------------------------------------
-- there will be a log table for reporting as well:
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_logtable_for_reporting ;
CREATE PROCEDURE your_schema.sp_init_logtable_for_reporting (
IN schema_name varchar(50),
IN procedure_name varchar(50),
IN note_s varchar(800)
)
BEGIN
-- to create log table if not exists
set @sql =concat('
create table IF NOT EXISTS `',schema_name, '`.`log_for_reporting` (
Time_stamp datetime not null,
ProcedureName varchar(50),
Notes varchar (800) null

);');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

-- to insert log into log table
set @sql=concat ('
insert into `',schema_name, '`.`log_for_reporting` Values (''', now(6),''',''', procedure_name,''',''',replace(note_s,"'","''"),''');');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

END &&
DELIMITER ;
-- 2. to create sp for general table --> Price Index
-- to create smaller subcategory tables as well
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_create_tables_for_loading ;
CREATE PROCEDURE your_schema.sp_init_create_tables_for_loading (
IN schema_name varchar(50),
IN today_date varchar(50)
)
BEGIN
/*
-- create the original data table, defining all columns as strings,
-- allowing all data to be loaded first; later it is split into smaller tables based on
-- their categories, and duplicates are checked in the smaller tables as well
-- mysql does not allow a table name to be passed in as a parameter;
-- the solution is to use set @sql = <the sql string>
-- and then execute the sql later
-- within the sql string, + (plus sign) can't be used for concatenation;
-- instead, we need to use concat to combine and get a complete string
*/
set @sql =concat('
drop table IF EXISTS `',schema_name, '`.`0.PriceIndex` ;');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

set @sql =concat('
create table IF NOT EXISTS `',schema_name, '`.`0.PriceIndex` (
Date varchar(255),
GEO varchar(255),
DGUID varchar(255),
Products varchar(255),
UOM varchar(255),
UOM_ID varchar(255),
SCALAR_FACTOR varchar(255),
SCALAR_ID varchar(255),
VECTOR varchar(255),
COORDINATE varchar(255),
VALUE varchar(255),
STATUS varchar(255),
SYMBOL varchar(255),
`TERMINATED` varchar(255),
DECIMALS varchar(255),
Primary Key (GEO,Date,Products)

);');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
/*
-- the original data table should be created per date,
-- as we don't want new data inserted into one ever-growing table every time,
-- with the database doing a lot of work checking/excluding duplicates;
-- only today's new data will be processed


### while ####
-- for each category, only one table will be created
-- in the original data file, the composite pk is Date and Products
-- the divided tables will inherit this pk
-- but the process is a little bit different:
-- load data into a temp table first,
-- and if everything goes well, UPSERT the data into the persistent table

-- create table for food
*/
set @sql=concat('
create table IF NOT EXISTS `', schema_name , '`.`1.Food` (
GEO char(10),
Date char(10),
Year int,
Month tinyint,
Products varchar(100),
Measurement varchar(100),
`Products Details` varchar(255),
Price decimal(10,2),
Status tinyint DEFAULT NULL,
Primary Key (GEO,Date,`Products Details`)
); ');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
-- create table for Energy (gasoline)
set @sql = concat('
create table IF NOT EXISTS `', schema_name , '`.`3.Energy` (
GEO char(10),
Date char(10),
Year int,
Month tinyint,
Products varchar(100),
Measurement varchar(100),
`Products Details` varchar(255),
Price decimal(10,2),
Status tinyint DEFAULT NULL,
Primary Key (GEO,Date,`Products Details`)
); ');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
-- create table for Cosmetics
set @sql =concat('
create table IF NOT EXISTS `', schema_name , '`.`2.Cosmetics` (
GEO char(10),
Date char(10),
Year int,
Month tinyint,
Products varchar(100),
Measurement varchar(100),
`Products Details` varchar(255),
Price decimal(10,2),
Status tinyint DEFAULT NULL,
Primary Key (GEO,Date,`Products Details`)
); ');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
-- create table for Tobacco
set @sql=concat('
create table IF NOT EXISTS `', schema_name , '`.`4.Tobacco` (
GEO char(10),
Date char(10),
Year int,
Month tinyint,
Products varchar(100),
Measurement varchar(100),
`Products Details` varchar(255),
Price decimal(10,2),
Status tinyint DEFAULT NULL,
Primary Key (GEO,Date,`Products Details`)
); ');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
/*
-- there are other tables to be created as well
-- 1. log table for loading
-- 2. log table for reporting
*/

END &&
DELIMITER ;
/*
-- ----------------------------------------------------------
-- call your_schema.sp_init_create_tables_for_loading ("your_schema","20230528");

-- 4. to create parent SP for database init
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_database ;
CREATE PROCEDURE your_schema.sp_init_database (
IN schema_name varchar(50),
IN today_date varchar(50),
IN db_event_source varchar(50),
IN table_name varchar (50),
IN db_event_status tinyint,
IN total_rows_affected bigint,
IN note_s varchar(800)
)
BEGIN
call your_schema.sp_init_logtable_for_loading (schema_name,db_event_source,table_name,db_event_status,total_rows_affected,note_s);
call your_schema.sp_init_logtable_for_reporting (schema_name,db_event_source,note_s);
call your_schema.sp_init_create_tables_for_loading (schema_name,today_date);
call your_schema.sp_init_calendar_month (schema_name);

END &&
DELIMITER ;
/*
-- ----------------------------------------------------------
-- call your_schema.sp_init_database ("your_schema","20230528","testing DB init"," ",1,0,"database init successfully");
-- THIS ONE SINGLE SQL QUERY WILL GET MYSQL READY TO USE

-- AAA) for testing, we can use the sp below to clear all tables created in one schema
-- Note:
-- In order to make it more flexible, I added a parameter 'excluding_tables';
-- this allows the sp to delete all tables except for the tables named in this parameter
-- SELECT * FROM information_schema.tables;
-- this command will help you find all tables in the db
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_drop_all_tables ;
CREATE PROCEDURE your_schema.sp_init_drop_all_tables (
IN schema_name varchar(50),
IN excluding_tables varchar(255)
)
BEGIN
-- the query below returns a column with multiple rows
-- use a CURSOR to loop through all of the rows
DECLARE var_tables varchar (255);
DECLARE finished INT DEFAULT 0;
DECLARE cursor_table_list CURSOR for SELECT table_name FROM information_schema.tables
WHERE table_schema = schema_name;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished =1;
OPEN cursor_table_list;

table_loop: LOOP
IF finished =1 THEN
LEAVE table_loop;
END IF;
FETCH cursor_table_list INTO var_tables;
IF excluding_tables is not null AND locate(excluding_tables,var_tables)>0 THEN
-- DONT DELETE
SET @sql= concat('select ',1);
ELSE
SET @sql=CONCAT('DROP TABLE IF EXISTS `',schema_name,'`.`', var_tables, '`;');
END IF;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END LOOP table_loop;

END &&
DELIMITER ;
/*
-- call your_schema.sp_init_drop_all_tables("your_schema","priceindex")

-- BBB) I found most reports require month, weekday and date info
-- a stored procedure or user-defined function for lookup is sometimes better than the built-in functions
-- the SP below accepts numeric or non-numeric input
-- it will check if the input is valid month info and find the corresponding month number
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_calendar_month ;
CREATE PROCEDURE your_schema.sp_init_calendar_month (
IN schema_name varchar(50)
)
BEGIN
/*
-- note : 2 digit month number can be achieved by LPAD(MONTHNUMBER,2,0)
-- set @schema_name ='your_schema';
*/
set @sql= concat('
create table IF NOT EXISTS `', schema_name, '`.`99.month` (
month_number_1_digit tinyint,
month_name_short char(3),
month_name_long varchar(20) ,
month_name varchar(50),
PRIMARY KEY (month_number_1_digit)

);'
);

PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
set @sql= concat('
TRUNCATE table `', schema_name, '`.`99.month` ;'
);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
-- insert values
set @sql= concat('
insert ignore into `', schema_name, '`.`99.month` values
("1","jan","january","jan, january"),
("2","feb","february","feb, February"),
("3","mar","march","mar, march"),
("4","apr","april","apr, april"),
("5","may","may","may, may"),
("6","jun","june","jun, june"),
("7","jul","july","jul, july"),
("8","aug","august","aug, august"),
("9","sep","september","sep, september"),
("10","oct","october","oct, october"),
("11","nov","november","nov, november"),
("12","dec","december","dec, december");'
);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

END &&
DELIMITER ;
/*
-- call your_schema.sp_init_calendar_month ('your_schema');

-- call your_schema.sp_init_calendar_month_checking('your_schema','12',@single_month_string);
-- select @single_month_string;
-- the SP below checks single_month_string; the result will be either 0 or a correct month number
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_init_calendar_month_checking;
CREATE PROCEDURE your_schema.sp_init_calendar_month_checking (
IN schema_name varchar(50),
IN month_str varchar(50),
OUT month_num tinyint
)
BEGIN
/*
-- note:
-- 2 digit month number * 1 = 1 digit month number
-- character month name * 1 =0
*/
set @result='';
set @month_str=lower(month_str);
set @sql= concat('
select month_number_1_digit into @result from `', schema_name, '`.`99.month`
WHERE lower(month_name_short)= "', @month_str, '"
or lower(month_name_long)= "', @month_str, '"
or month_number_1_digit= "', @month_str, '"*1 ;'
);
-- select @sql;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

IF @result='' THEN
/*
-- the month_string is not a valid month name or month number
-- there is no 'isnumeric' function in mysql to determine
-- whether a value is a number or not;
-- the solution is to use REGEXP '^[0-9]+$' to check for an integer,
-- or REGEXP '^[0-9]+\\.?[0-9]*$' for a float
*/
IF @month_str REGEXP '^[0-9]+$' THEN
IF @month_str>12 THEN
set month_num=12;
ELSEIF @month_str <1 THEN
set month_num=1;
END IF;
ELSE
/*
-- it is an invalid string
*/
set month_num=0;
END IF;
ELSE
set month_num=@result;
END IF;
-- select @month_num;
END &&
DELIMITER ;

call your_schema.sp_init_database ("your_schema","20230528","testing DB init"," ",1,0,"database init successfully");

=======================> loading.sql

/*
#### Data Loading ####
#### This file builds procedures that can be used to load and split data.####
#### The data can then be used for reporting in the following file, "reporting.sql".####
-- ----------------------------------------------------------
-- 0. create stored procedure (sp) for logging
-- this part is included in the file of 'init.sql'
-- 1. create stored procedure (sp) to load data from s3 (or another data source) into a temp table
-- 1.1 create temp table
-- here, we need to communicate with Lambda for the data loading,
-- because RDS MySQL does not support the 'load data from s3' command
-- the way aurora mysql does.
-- 1.2 count rows in the temp table
-- 1.3 load from the data file into the temp table
-- 2. create stored procedure (sp) to upsert data from the temp table into the permanent table (the general table in this project)
-- 3. create stored procedure (sp) to upsert data from the permanent table into sub tables based on their category
-- 3.1 update the general table's status column according to categories
-- 3.2 load data into sub tables according to status
-- 4. create stored procedure (sp) to check if there is data not yet split into its category table
-- return the check result as an OUT parameter
-- 5. create a parent SP to include all the above child SPs
-- 6. just call the parent SP and all child SPs will be called automatically
-- data will be loaded, and a log entry will be written if there is an error
-- ----------------------------------------------------------
-- things we should know about AWS RDS MySQL
-- Please note, I don't mean Aurora MySQL.
-- aaa) RDS MySQL does not support importing/exporting data from/to s3 with one command;
-- other db engines like aurora, postgres, sql server, oracle, and even redshift
-- make the data transfer easier, since those engines can access s3 for direct import/export,
-- but for RDS MySQL, we have no choice but to use another AWS service to do the ETL.
-- AWS recommends a Glue data pipeline, but it is not free....
-- To reduce cost, I figured the best way is to use Lambda, my favorite, to load data with python.
-- If you wish to spend less and don't mind coding, Lambda is a great choice.
-- If you don't want any coding, try and pay for Glue.
-- (I will share the Lambda function in python in another repository.)

-- bbb) when using lambda to do ETL, be aware it only has a payload limit of about 6 MB,
-- but we can learn from online examples to overcome this limitation,
-- or we can use other compute services in aws, like EC2 or ECS (of course, these more powerful
-- compute options are more expensive).

-- ----------------------------------------------------------
-- 1. lambda checks the original data file and counts the ROW NUMBER
-- 2. lambda calls the sp in mysql to create a temp table
-- 3. after checking the log table for the temp table creation,
-- lambda loads data from the data source (s3) into the temp table in mysql (using a cursor;
-- an illustrative sketch of these inserts follows this header comment)
-- this is done by Lambda from outside of mysql
-- 4. after loading into the temp table, lambda calls the parent SP and MySQL proceeds from here
-- until it finishes the loading completely
-- ----------------------------------------------------------------
-- 1 to create the sp to build the temp table
-- the temp table is used to store data from the .csv file in s3
-- the columns and their order in the mysql table must be the same as in the .csv file
-- the CREATE TABLE statement can be generated dynamically by lambda or EC2 or ECS,
-- or it can be pre-defined here in the sp; of course, when the original file changes,
-- the sp here needs changing accordingly.
-- to reduce the management overhead, I use lambda to do some of the ETL
-- and make the original data file in s3 contain only the columns we need for the report
-- these columns do not change frequently; if there is any change in the data source,
-- lambda will detect it and make sure the .csv file is still loadable for mysql
*/
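
/*
-- Illustrative sketch only, not part of the automated files: the rows Lambda pushes into the
-- temp table are ordinary INSERT statements issued through its database cursor from python.
-- Assuming the temp table `priceindex_all_temp` was created LIKE `0.PriceIndex`, each batch
-- looks roughly like:
--
-- INSERT INTO `your_schema`.`priceindex_all_temp`
-- (Date, GEO, DGUID, Products, UOM, UOM_ID, SCALAR_FACTOR, SCALAR_ID,
-- VECTOR, COORDINATE, VALUE, STATUS, SYMBOL, `TERMINATED`, DECIMALS)
-- VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s);
--
-- with the parameters bound from each .csv row, in the same column order as the file.
*/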

DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_loading_create_temp_table;
CREATE PROCEDURE your_schema.sp_loading_create_temp_table (
IN schema_name varchar (50),
IN temp_tablename varchar(50),
IN table_name varchar(50),
OUT result_for_sp_loading_create_temp_table tinyint
)
BEGIN
-- 1. Declare variables to hold diagnostics area information
DECLARE exit handler for SQLEXCEPTION
BEGIN
GET DIAGNOSTICS CONDITION 1 @sqlstate = RETURNED_SQLSTATE,
@errno = MYSQL_ERRNO, @text = MESSAGE_TEXT;
SET @full_error = CONCAT("ERROR ", @errno, " (", @sqlstate, "): ", @text);
SELECT @full_error;
set result_for_sp_loading_create_temp_table=0;
call your_schema.sp_init_logtable_for_loading (schema_name, '1. temp table creation', temp_tablename,0, 0,@full_error);

END;

-- 2. the body SP
START TRANSACTION;
set @sql =concat('
DROP TABLE IF EXISTS `',schema_name, '`.`', temp_tablename, '` ;'
);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

-- to copy the table structure from the permanent table
-- into the temp table
set @sql =concat ('
CREATE TABLE `',schema_name, '`.`', temp_tablename, '` LIKE `',schema_name, '`.`', table_name, '`;'

);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
set result_for_sp_loading_create_temp_table=1;
-- after creating the temp table
-- write log in the log table
call your_schema.sp_init_logtable_for_loading (schema_name, '1. temp table creation', temp_tablename,1, 0, concat("created from `",schema_name,"`.`",table_name,"`"));

COMMIT;
select @result_for_sp_loading_create_temp_table;
END &&
DELIMITER ;
/*
-- call your_schema.sp_loading_create_temp_table ('your_schema', 'priceindex_all_temp', '0.PriceIndex',@temp_table_created);
-- select @temp_table_created;
-- DROP TABLE IF EXISTS `your_schema`.`priceindex_all_temp`;
-- select * from priceindex_all_temp;
-- CREATE TABLE `your_schema`.`priceindex_all` LIKE `your_schema`.`0.priceindex`;
-- select status from log_for_loading where locate('temp table creation',EventSource)>0 and timediff(now(),Time_stamp)<10
-- ----------------------------------------------------------------
-- 2. after lambda checks the log table for an eventsource of 'temp table creation' and finds its status is 1
-- (or, to get the returned value @temp_table_created from sp)
-- it will load the data first and then call the following SP
-- call your_schema.sp_loading_PriceIndex("your_schema","temp","0.PriceIndex", "priceindex.csv", 18516,5, @data_loading_splitting);
-- select @data_loading_splitting;
*/

DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_loading_PriceIndex;
CREATE PROCEDURE your_schema.sp_loading_PriceIndex (
IN schema_name varchar (50),
IN temp_tablename varchar(50),
IN table_name varchar(50),
IN file_name varchar (50),
IN total_rows bigint, maxerrors_allowed tinyint,
OUT result_for_sp_loading_PriceIndex tinyint
)
BEGIN
-- 1. Declare variables to hold diagnostics area information
DECLARE exit handler for SQLEXCEPTION
BEGIN
GET DIAGNOSTICS CONDITION 1 @sqlstate = RETURNED_SQLSTATE,
@errno = MYSQL_ERRNO, @text = MESSAGE_TEXT;
SET @full_error = CONCAT("ERROR ", @errno, " (", @sqlstate, "): ", @text);
SELECT @full_error;
set result_for_sp_loading_PriceIndex=0;
call your_schema.sp_init_logtable_for_loading (schema_name, '2. data loading and splitting', temp_tablename,0, 0,@full_error);
END;
-- 2. the body SP
START TRANSACTION;
-- 1. MySQL counts how many rows were loaded into the temp table by lambda
call your_schema.sp_loading_count_table(schema_name, temp_tablename, @result_rows,"");

-- 2. to compare the numbers
IF @result_rows >= (total_rows -maxerrors_allowed) THEN
-- to load
call your_schema.sp_loading_upsert_general_table (schema_name, temp_tablename, table_name);
-- to write log for successful loading
call your_schema.sp_init_logtable_for_loading (schema_name, concat("2.1 permanent table loading from temp table : ",temp_tablename), table_name,1, @result_rows,concat("loading from temptable to general table successfully. ","Rows skipped: ",(total_rows - @result_rows)));

-- after loading into the general table successfully,
-- start loading the sub tables
-- 3. to decide whether to split the general table into sub tables
-- for some reports, sub tables make report building easier;
-- on my website, I need the subcategory tables
IF locate ('priceindex',table_name)>0 THEN
-- update general table with status and load data to sub tables
call your_schema.sp_loading_priceindex_split (schema_name, table_name);
-- check if the loading is complete
call your_schema.sp_loading_check_remaining_after_split (schema_name, table_name, @check_remain);

IF @check_remain =0 THEN
-- complete loading
-- write log
set result_for_sp_loading_PriceIndex =1;
call your_schema.sp_init_logtable_for_loading (schema_name, "2.2 general table splitting", table_name,1, @result_rows,"loading to subcategory tables successfully");

ELSE
set result_for_sp_loading_PriceIndex =0;
call your_schema.sp_init_logtable_for_loading (schema_name, "2.2 general table splitting", table_name,0, 0,"loading to subcategory tables failed");
END IF;
-- ELSE
-- loading process for other data files other than priceindex
END IF;
ELSE
-- don't load data from temp_table to permanent table
-- just to write log for unsuccessful loading
set result_for_sp_loading_PriceIndex =0;
call your_schema.sp_init_logtable_for_loading (schema_name, "2.1 permanent table loading failed", file_name,0, 0,'temp table contains incomplete data from original data file');

END IF;
commit;
-- use session variable to carry the value
select @result_for_sp_loading_PriceIndex;

END &&
DELIMITER ;
/*
-- -------------------------------------------------------------------------
-- 3. below is another child SP to count rows in the temp table
-- if the number is the same as what lambda counts in the .csv file in s3,
-- it means all rows were loaded into mysql from s3
-- Note :
-- In order to make this sp more versatile in this project,
-- a 'WHERE clause' parameter is added, so that the sp can count rows under certain conditions
-- later, when we check the remaining rows in the general table after splitting,
-- we use "where status is null" to get the row number
*/

DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_loading_count_table;
CREATE PROCEDURE your_schema.sp_loading_count_table (
IN schema_name varchar(50),
IN table_name varchar(50),
OUT total_for_sp_loading_count_table bigint,
IN where_clause varchar (255)
)
BEGIN
-- in order to get a value from sp
-- we need a 'OUT' parameter
-- the steps are :
-- a) to reset @aaaa before using
-- b) to get the count value into a variable @aaaa
-- c) to set the out parameter equals @aaaa
-- d) to select the session variable @aaaa in the end of procedure

IF where_clause is null THEN
set @sql = concat ('
select COUNT(*) into @row_no from `',schema_name, '`.`', table_name, '`;'
);
ELSE
set @sql = concat ('
select COUNT(*) into @row_no from `',schema_name, '`.`', table_name, '` ',
where_clause, ' ;'
);
END IF;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

set total_for_sp_loading_count_table = @row_no;

select @total_for_sp_loading_count_table;
END &&
DELIMITER ;
/*
-- to get the out parameter:
-- call sp_loading_count_table ("your_schema", "temp",@result_row);
-- select @result_row;
-- -------------------------------------------------------------------------
-- 4. below is the child sp to upsert from the temp table to the permanent table (the general table in this project)
-- If there is a PK or unique index in the table, any duplicate loading would cause a 'key violation' error
-- in order to insert only new records into the permanent table and avoid key violation errors,
-- UPSERT is applied here
-- there are 3 ways to upsert; I chose the 'REPLACE' method for this project
-- because one report on the dashboard is based on this table, and we have to keep the data in this table complete, accurate, up to date, and free of duplicates
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_loading_upsert_general_table;
CREATE PROCEDURE your_schema.sp_loading_upsert_general_table (
IN schema_name varchar(50),
IN temp_tablename varchar(50),
IN table_name varchar(50)
)
BEGIN
-- make sure all status is null in the temporary table
-- later the status column will be updated according to sub category
set @sql=concat ('
UPDATE `', schema_name, '`.`', temp_tablename, '`',
' set status = null;');
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;

-- load for 0.PriceIndex
set @sql=concat ('
REPLACE INTO `', schema_name, '`.`', table_name, '` ',
'select
*
from `', schema_name, '`.`', temp_tablename,'`');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END &&
DELIMITER ;
/*
-- -------------------------------------------------------------------------
-- call your_schema.sp_loading_upsert_general_table ("your_schema", "temp", "0.priceindex")
-- call your_schema.sp_loading_upsert_general_table ("your_schema", "0.PriceIndex","temp") ; for testing purpose
-- call your_schema.sp_loading_priceindex_split ("your_schema", "0.priceindex");
-- 5. to create SP for loading --> Food, Energy, Cosmetics, Tobacco
*/
DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_loading_priceindex_split;
CREATE PROCEDURE your_schema.sp_loading_priceindex_split (
IN schema_name varchar(50),
IN table_name varchar(50)
)
BEGIN
-- the original data table contains various products
-- the data will be split into sub tables according to their categories
-- @tablename=your_schema.`0.PriceIndex`;

-- use the number of commas to get how many kinds of product there are.
set @food_list ="'steak', 'roast', 'beef', 'chicken', 'pork', 'bacon', 'wiener', 'salmon','milk','butter','cheese','egg','bread', 'cracker','macaroni','flour','corn','apple','banana','grape','orange','juice','cabbage','carrot','celery','mushroom','onion','potato','fried','bake','bean','canned','ketchup','sugar','coffee','tea','cooking','salad','oil','soup','food','peanut','fruit','drink','cola','lemon'";

set @cosmetics_list="'laundry','detergent','facial','towel','tissue','bathroom','shampoo','deodorant','toothpaste'";

set @energy_list="'gasoline'";

set @tobacco_list="'cigarette'";

SET @sql=concat('update
`', schema_name, '`.`',table_name ,
'` set STATUS =1
WHERE products REGEXP "steak|roast|beef|chicken|pork|bacon|wiener|salmon|milk|butter|cheese|egg|bread|cracker|macaroni|flour|corn|apple|banana|grape|orange|juice|cabbage|carrot|celery|mushroom|onion|potato|fried|bake|bean|canned|ketchup|sugar|coffee|tea|cooking|salad|oil|soup|food|peanut|fruit|drink|cola|lemon"
and STATUS is null;'
);
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;

-- after we update STATUS for all food products,
-- now to update for the second category: cosmetics

SET @sql=concat('update
`', schema_name, '`.`',table_name ,
'` set STATUS =2
WHERE products REGEXP "laundry|detergent|facial|towel|tissue|bathroom|shampoo|deodorant|toothpaste"
and STATUS is null;'
);
-- safe update mode would stop the statement below from running
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;

-- next is to update according to the third category

SET @sql=concat('
update `', schema_name, '`.`',table_name ,
'` set STATUS=3
where products REGEXP "gasoline"
and STATUS is null;' );
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;

-- next is to update according to the fourth category
SET @sql=concat('
update `', schema_name, '`.`',table_name ,
'` set STATUS=4
where products REGEXP "cigarette"
and STATUS is null;' );

SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;
/*
-- after the original data table has been updated,
-- start loading for the 1st category
-- in this project, the data is continuously pulled from online sources,
-- so the new data might duplicate existing data in the database;
-- therefore, an UPSERT is applied here.
-- UPSERT CAN BE ACHIEVED IN 3 WAYS:
-- https://www.javatpoint.com/mysql-upsert#:~:text=UPSERT%20is%20one%20of%20the,words%20named%20UPDATE%20and%20INSERT.
-- for TESTING purposes, all 3 methods are used
-- 1. INSERT IGNORE INTO...
*/
-- load for food
set @sql=concat ('
insert IGNORE into `', schema_name, '`.`1.Food` (GEO,Date, Year, Month, Products, Measurement,`Products Details`, Price, Status)
select
GEO,
Date,
LEFT(date,4) as Year,
RIGHT(date,2) as Month,
SUBSTRING_INDEX(Products,",",(LENGTH(Products)-LENGTH(REPLACE(Products,",","")))) as Products,
substring_index(Products,",",-1) as Measurement,
Products as `Products Details`,
Value as Price,
STATUS
from `', schema_name , '`.`', table_name ,
'` where status =1;');

SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;
/*
-- 2. UPSERT using Replace:
-- load for cosmetics
*/
set @sql=concat('
REPLACE INTO `', schema_name , '`.`2.Cosmetics` (GEO,Date, Year, Month, Products,Measurement, `Products Details`, Price, Status)
select
GEO,
Date,
LEFT(date,4) as Year,
RIGHT(date,2) as Month,
SUBSTRING_INDEX(Products,",",(LENGTH(Products)-LENGTH(REPLACE(Products,",","")))) as Products,
substring_index(Products ,",",-1) as Measurement,
Products as `Products Details`,
Value as Price,
Status
from `', schema_name , '`.`', table_name ,
'` where status =2;');
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;
-- 3. UPSERT with ON DUPLICATE KEY UPDATE clause:
-- load for energy:
set @sql=concat('
INSERT INTO `', schema_name , '`.`3.Energy` (GEO,Date, Year, Month, Products, Measurement, `Products Details`, Price, Status)
select
GEO,
Date,
LEFT(date,4) as Year,
RIGHT(date,2) as Month,
SUBSTRING_INDEX(Products,",",(LENGTH(Products)-LENGTH(REPLACE(Products,",","")))) as Products,
substring_index(Products,",",-1) as Measurement,
Products as `Products Details`,
Value as Price,
Status
from `', schema_name , '`.`', table_name ,
'` where status =3
ON DUPLICATE KEY UPDATE Status=0;');
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;
# 1 when upserting with INSERT IGNORE INTO, a duplicate record is
# just ignored and won't be loaded at all
# 2 when upserting with REPLACE INTO, the existing record is simply
# deleted and the new record overrides it
# 3 when upserting with the ON DUPLICATE KEY UPDATE clause, it is a little more flexible
# for data loading: on a duplicate, the whole row of data won't be deleted,
# only specific values are updated according to the clause.
# But if there is more than one unique index in the table,
# and multiple existing rows are matched as duplicates by one new row of data, only one of the existing rows is updated according to
# the ON DUPLICATE KEY UPDATE clause.

# for this project, an overwrite (REPLACE INTO) would be better. The original data file has no pk,
# so all records will be kept safe though duplicates may exist.

-- load for tobacco
-- use REPLACE INTO
set @sql=concat('
REPLACE INTO `', schema_name , '`.`4.Tobacco` (GEO,Date, Year, Month, Products, Measurement,`Products Details`, Price, Status)
select
GEO,
Date,
LEFT(date,4) as Year,
RIGHT(date,2) as Month,
SUBSTRING_INDEX(Products,",",(LENGTH(Products)-LENGTH(REPLACE(Products,",","")))) as Products,
substring_index(Products,",",-1) as Measurement,
Products as `Products Details`,
Value as Price,
Status
from `', schema_name , '`.`', table_name ,
'` where status =4;');
SET SQL_SAFE_UPDATES = 0;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
SET SQL_SAFE_UPDATES = 1;
END &&
DELIMITER ;
/*
-- 6.
-- after the data is split into the 4 sub tables,
-- now check if there are products which have not been marked with a category
-- in the status column
-- create a table to carry data of this kind and send a message back to lambda (or ec2 or ecs)
-- a message or email will be sent by lambda using SNS (or SES) to the person in charge

-- call your_schema.sp_loading_check_remaining_after_split ("your_schema","0.PriceIndex",@remain);
-- select @remain
*/

DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_loading_check_remaining_after_split;
CREATE PROCEDURE your_schema.sp_loading_check_remaining_after_split (
IN schema_name varchar(50),
IN table_name varchar(50),
OUT result_for_sp_loading_check_remaining_after_split boolean
)
BEGIN
-- the structure of remaining_data is the same as the general table, except it has a Time_stamp column
-- it helps the data engineer find the rows that need to be worked on today
set @sql =concat ('
CREATE TABLE IF NOT EXISTS `',schema_name, '`.`remaining_data` LIKE `',schema_name, '`.`', table_name, '`'
);

PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
-- if this is not the db init run, the remaining_data table already exists
-- and the 'Time_stamp' column has already been added to it.
-- unfortunately, there is no "add column if not exists" in mysql;
-- we need to check before adding

set @table_name = 'remaining_data';
set @schema_name =schema_name;
set @column_name="Time_stamp";
IF NOT EXISTS( SELECT NULL
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = @table_name
AND table_schema = @schema_name
AND column_name = @column_name) THEN

set @sql =concat ('
ALTER TABLE `',schema_name, '`.`remaining_data` ADD COLUMN Time_stamp TIMESTAMP not null DEFAULT CURRENT_TIMESTAMP ;'
);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END IF;

call your_schema.sp_loading_count_table (schema_name, table_name, @remain_rows," where status is null ");
IF @remain_rows >0 THEN
-- Someone needs to work today...
-- the product might be new to the database; just add the product name to the list,
-- execute the modified sp and rerun the parent SP; hopefully, all rows get split into their category tables.
set result_for_sp_loading_check_remaining_after_split = 1;
-- insert the remaining rows to table of 'remaining_data'
set @sql = concat('
INSERT INTO `', schema_name , '`.`remaining_data`
select * from `', schema_name , '`.`', table_name ,
'` where status is null;');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
ELSE
-- hooray!
set result_for_sp_loading_check_remaining_after_split =0;
END IF;

select @result_for_sp_loading_check_remaining_after_split;
END &&
DELIMITER ;

/*
-- Note:
-- if you wish to use mysql locally for testing purposes:
-- you can load data from a local file (.csv or .txt) into localhost mysql
-- the sql command is :
-- LOAD DATA LOCAL INFILE '/path/to/priceindex.csv' INTO TABLE `0.PriceIndex` FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
-- however, there might be two errors (3948 and 2068)
-- there are posts online which only suggest the solution for one or the other; the steps below solve both problems.
-- no worries: newer versions of mysql disable this kind of import and export by default.
-- if you use a mac, $ sudo nano /etc/my.cnf
-- and input the following:

[mysqld]
secure_file_priv=''
local_infile = 1

[client]
local_infile = 1

-- in your mysql client:
-- execute the sql query below:
-- set global local_infile=1
-- restart MySQL (this is important)
-- then, you can use the load data command locally. you might need to set global local_infile=1 every time you reboot,
-- so I usually run this command before the load data command.
-- loading files from local disk to localhost is just for testing, so I don't mind the repetition.

-- finally, to call the parent sp:
-- call your_schema.sp_loading_PriceIndex("your_schema","temp","0.PriceIndex", "priceindex.csv", 2316,5, @result)
-- (a quick way to check the loading log is sketched right after this note)
*/
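
/*
-- a quick manual check (run it in MySQL Workbench, not inside the automated .sql files, since it returns rows):
-- after calling the parent sp, the outcome of every step can be inspected in the log table, for example:
-- SELECT * FROM your_schema.log_for_loading ORDER BY Time_stamp DESC;
*/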

=======================> reporting.sql

/*
#### MySQL Reporting ####
#### This file builds procedures which can be used to obtain all the report data needed for my website.####
#### The report data will be sent to AMAZON Simple Storage Service (S3) to serve as the static content of the website.####
-- --------------------------------------------------------
-- ALTER TABLE might be slow on AWS RDS for MySQL
-- *test and determine whether to use the alter command
-- *test and determine whether to create a large table with renamed columns

-- --------------------------------------------------------
-- This time, I create the parent SP ahead of the child SPs for report building.
-- 1. to create an SP for a general report (which contains info of year, month, geo, product category and its sub category as well)
-- 6 parameters are allowed for the SP
-- in order to increase fault tolerance and improve user experience,
-- there will be no restriction on the inputs' data types;
-- if anyone who uses the SP passes invalid values for the parameters,
-- the SP will try to return a reasonable result
-- 1.1 to create a general table for reporting
-- 1.2 to create a Stored Procedure (SP) for the general report
-- 1.2.1 to process the 6 parameters respectively
-- 1.2.1.1 I tested different ways to process the parameters
-- for year and month, I tested whether the values are int or not
-- for geo_limit, i.e., the values for regions like 'Canada, USA, etc.',
-- I tested whether they can be found in the data table
-- for the category parameter, I used a loop to separate the inputs for the main category from those for the sub category
-- and put them into 2 lists respectively
-- 1.2.2 to get the correct condition statement for these 6 parameters
-- 1.2.2.1 use a loop to find which statements are null and which are not
-- 1.2.2.2 to add 'where' and 'and' at the correct places to make the where_statement work
-- 1.2.3 to construct the nested query using the where_statement
-- 1.2.4 to export the result so that my website can use it
-- 2. to create child SPs for the parent SP in the above step
-- the queries which are executed often can be isolated from the parent SP and made into child SPs,
-- so the parent SP does not become too complicated and long.
-- 2.1 to create an SP for month number checking,
-- to verify that a string can be converted into a month number and put the number in the where-clause
-- 2.2 to create SP for max values in a table
-- to get the max(year) or min(year) from a table , for example
-- 2.3 to create SP to check if a value exists in a table
-- to check if the values in geo_limit exist in the GEO column in the data table
-- meanwhile construct condition_statement for GEO parameter
#### Again, these SPs will be called by lambda or EC2 or ECS from outside of MySQL####

-- ----------------------------------------------------------------------------------
-- 1.1 below is to create a general table for reporting
-- it contains all the information we need for reports
-- this table is specifically for the general report.
-- there are also smaller tables, based on detailed categories, created in the file 'loading.sql';
-- they are specifically for detailed reports, and they make the nested queries less complicated.
-- so, different reports are based on different tables
*/
DROP PROCEDURE IF EXISTS your_schema.sp_reporting_50_general_table_for_report_building;
DELIMITER &&
CREATE PROCEDURE your_schema.sp_reporting_50_general_table_for_report_building (
IN schema_name varchar(50),
IN table_name varchar(50),
IN reporting_table_name varchar(50),
OUT result_for_sp_reporting_50_general_table_for_report_building tinyint
)
BEGIN
-- 1. Declare variables to hold diagnostics area information
DECLARE exit handler for SQLEXCEPTION
BEGIN
GET DIAGNOSTICS CONDITION 1 @sqlstate = RETURNED_SQLSTATE,
@errno = MYSQL_ERRNO, @text = MESSAGE_TEXT;
SET @full_error = CONCAT("ERROR ", @errno, " (", @sqlstate, "): ", @text);
SELECT @full_error;
set result_for_sp_reporting_50_general_table_for_report_building=0;
call your_schema.sp_init_logtable_for_reporting (schema_name, 'sp_reporting_50_general_table_for_report_building', @full_error);
END;
-- 2. the body SP
START TRANSACTION;
set @schema_name=schema_name;
set @table_name=table_name;
/*
-- a). copy data from another table
-- modify and generate new table for reporting
*/
set @sql = concat ('
Drop table if exists `',@schema_name,'`.`',reporting_table_name,'`;'
);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

set @sql= concat_ws('',
'create table IF NOT EXISTS `',@schema_name,'`.`',reporting_table_name,'`
select ',
' Date, GEO, DGUID,
Products as `Products Details`,
UOM, UOM_ID, SCALAR_FACTOR, SCALAR_ID, VECTOR, COORDINATE,
VALUE as Price,
STATUS, SYMBOL, `TERMINATED`, `DECIMALS`,
LEFT(Date, 4) as Year,
RIGHT(Date, 2) as Month, ',
'CASE WHEN locate(",",Products)>0
THEN SUBSTRING_INDEX(Products,",",(LENGTH(Products)-LENGTH(REPLACE(Products,",",""))))
WHEN locate("(", Products)>0
THEN SUBSTRING(Products, 1, locate("(",Products)-1)
END Products, ',
'CASE WHEN locate(",",Products)>0
THEN substring_index(Products,",",-1)
WHEN locate("(", Products)>0
THEN SUBSTRING(Products, locate("(",Products),(locate(")",Products)-locate("(",Products)))
END Measurement ',
' from ', '`',@schema_name, '`.`',@table_name,'`');

-- select @sql;
/* to pack a sql query into the Concat function,
the easier way is to write a hard-coded query that runs successfully,
then translate the hard-coded query into the dynamic one
*/

PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;


commit;
set result_for_sp_reporting_50_general_table_for_report_building=1;
call your_schema.sp_init_logtable_for_reporting (schema_name, 'sp_reporting_50_general_table_for_report_building', 'general table created successfully');
-- select @result_for_sp_reporting_50_general_table_for_report_building;
END &&
DELIMITER ;

/*
-- call your_schema.sp_reporting_50_general_table_for_report_building('your_schema','0.PriceIndex','1.report',@table_for_reporting);
-- select @table_for_reporting;
-- select @sql;
-- select * from log_for_reporting;
-- select * from `1.report`;
-- lambda will get feedback on whether the table for reporting was created and modified or not
-- ----------------------------------------------------------------------------------
-- 1.2 to create SP for the general report
-- within this stored procedure, child procedures will be called.
-- 6 parameters can be passed in at the same time
*/
DROP PROCEDURE IF EXISTS your_schema.sp_reporting_1_price_by_year_month_geo_category;
DELIMITER &&
CREATE PROCEDURE your_schema.sp_reporting_1_price_by_year_month_geo_category (
IN schema_name varchar(50),
IN table_name varchar(50),
IN delimiter_reporting_1 varchar(10),
IN year_no varchar(50),
IN month_no varchar(50),
IN geo_limit varchar(50),
IN category varchar(50),
IN reporting_data_table varchar(50),
OUT result_for_sp_reporting_1_price_by_year_month_geo_category varchar(800)
)
BEGIN
-- 1. Declare variables to hold diagnostics area information
DECLARE exit handler for SQLEXCEPTION
BEGIN
GET DIAGNOSTICS CONDITION 1 @sqlstate = RETURNED_SQLSTATE,
@errno = MYSQL_ERRNO, @text = MESSAGE_TEXT;
SET @full_error = CONCAT("ERROR ", @errno, " (", @sqlstate, "): ", @text);
-- SELECT @full_error;
set result_for_sp_reporting_1_price_by_year_month_geo_category=0;
call your_schema.sp_init_logtable_for_reporting (schema_name, 'sp_reporting_1_price_by_year_month_geo_category', @full_error);
END;
-- 2. the body SP
START TRANSACTION;
/*
-- to work on the parameters
-- the more parameters are added to the sp, the more complex the sp is;
-- that does not necessarily mean the sp will run slowly,
-- it all depends on how we construct the nested query
-- 1 for the parameter year_no
*/
call your_schema.sp_reporting_99_aggregation(schema_name,table_name,"max","year","",@latest_year);
call your_schema.sp_reporting_99_aggregation(schema_name,table_name,"min","year","",@earliest_year);
/*
-- Note: @latest_year in the data file might not be the current year
-- find the max and min year numbers using one of the child procedures
-- if the year_no that users input exceeds the year range in the data file,
-- the SP will return the @latest_year or @earliest_year
-- year_no also accepts null and any characters, to increase fault tolerance
-- if a string (null or characters) is not convertible to int,
-- the cast() function will return '0' and the SP will return all years' records
-- if users input multiple years when calling the SP,
-- each single year will be taken out of the string and verified; if there are
-- inputs that are not years, the SP will still give a result.
-- that doesn't mean the SP can't give an accurate result: if the input is all correct,
-- the SP will give the correct result accordingly; if the input has, say, a typo by the user,
-- the SP won't return null, it will give a closest match anyway.
-- whether it is my project or not, making the SP accurate, flexible and user-friendly is always my goal.
*/
set @year_string=year_no;
/*
-- add a trailing comma (the delimiter) to the users' input;
-- this makes the number of delimiters equal the number of values,
-- so no matter whether it is a single year or multiple years, both situations
-- can be handled by the same loop
*/
set @year_string =concat(@year_string,delimiter_reporting_1);
-- select @year_string;
set @count_year=LENGTH(@year_string)-LENGTH(REPLACE(@year_string, delimiter_reporting_1,''));
/*
-- get how many year values are in the year string
-- below is a loop through the year string to verify every value
*/
set @count =1; -- for each value in the string
set @start_position=1;
-- start searching from the beginning of the string
set @year_number = '';
/*
-- collect both valid and modified values in the end
-- if we set @year_number = null, concat(null, anystring) returns null
*/
year_loop: LOOP
WHILE @count <= @count_year DO
-- @start_position marks the point where locate function starts
set @single_year_string=substring(@year_string,@start_position,locate(delimiter_reporting_1,@year_string,@start_position)-@start_position);
-- the content between two commas are one single year value
set @start_position_new=locate(delimiter_reporting_1,@year_string,@start_position)+1 ;
set @start_position=@start_position_new; -- move the staring position to the next comma
set @single_year_string=trim(@single_year_string);
/*
-- always trim first
-- after we get one single year,
-- use cast() to convert the string to int (signed)
-- the IF clause below was not working at first, so it was changed to use the IF function instead
*/
IF @single_year_string REGEXP '^-?[0-9]+$' THEN
set @single_year_string=cast(@single_year_string as signed);
-- select @single_year_string;
IF @single_year_string > @latest_year THEN
SET @single_year_string = @latest_year;
ELSEIF @single_year_string<@earliest_year THEN
SET @single_year_string =@earliest_year;
END IF;
SET @year_number=concat(@year_number,@single_year_string,delimiter_reporting_1);
/*
-- accumulate the values one by one into the final year string --> @year_number
*/
ELSE
set @year_number = 'all';
-- once a character string is found,
-- let the sp return all years' data
LEAVE year_loop;
/*
-- which means the sp will return records for all years
*/
END IF;
set @count=@count+1;
END WHILE;
LEAVE year_loop;
END LOOP year_loop;
/*
-- up until now, @year_number can either be 'all' or a string of number(s)
-- Note:
-- the latter contains a trailing comma, which needs to be eliminated later in the nested sql query
-- to build the where-clause for the parameter 'year_no'
-- here, we only build part of the where-clause:
-- 1) we don't know yet whether all inputs are null; if they are, there will be no where-clause at all
-- 2) we can't put the word 'where' into every where-clause fragment; there is only one "where" in one query
-- 3) operators like 'and' and 'or' are also decided at the end, in the query, as we don't know
-- which parameters are null and which are not.
-- 4) if we have more after the where-clause, like an order by clause, we need to decide where the comma goes;
-- it won't necessarily be after the last parameter, as the last parameter can be null.
*/
IF @year_number<>'all' THEN
set @year_number = left(@year_number,length(@year_number)-1);
END IF;
/*
-- delete the trailing comma
-- if we put quotes around each value, we need to make sure there is no unnecessary space in the value:
-- '1995' and '1995 ' are totally different when they are put in the where-clause and quoted.
-- that's why, in the above loop, all values are trimmed.
*/
set @delimiter_with_quotes_reporting_1=concat('"',delimiter_reporting_1,'"');
set @where_year= IF(@year_number='all',' ',concat(' year in ("', replace(@year_number,delimiter_reporting_1,@delimiter_with_quotes_reporting_1),'")'));
/*
-- IF @year_number ='all' THEN
-- all years are required, and there will be no where-clause for the year parameter

-- 2.2 for the parameter month_no
#### ATTENTION ####
-- when we define a user variable (@aaaa), it is a variable that lives outside the sp;
-- after the sp executes, @aaaa won't be reset to null or ''
-- if two SPs share the same @aaaa, it can cause errors;
-- try to name @aaaa uniquely.
*/
set @month_string = month_no;
set @month_string =concat(@month_string,delimiter_reporting_1);
/*
-- attention: there must be no space between concat and '(',
-- otherwise mysql can't run the concat function and @month_string becomes ''.
*/
set @count_month=LENGTH(@month_string)-LENGTH(REPLACE(@month_string, delimiter_reporting_1,''));
set @count =1;
set @start_position=1;
set @month_number = '';
/*
-- this variable is to collect all the numeric strings
-- attention: if we set @month_number = NULL, the result will be null after concatenating any string
-- we need to set it to '' or ','
*/
month_loop: LOOP
WHILE @count <= @count_month DO
-- @start_position marks the point where locate function starts
set @single_month_string=substring(@month_string,@start_position,locate(delimiter_reporting_1,@month_string,@start_position)-@start_position);
set @start_position_new=locate(delimiter_reporting_1,@month_string,@start_position)+1 ;
set @start_position=@start_position_new;
/*
-- after we get one single month string,
-- use the child sp from the file 'init.sql' to check its validity
-- with the help of this sp, the result will be either 0 or a correct month number
-- when calling an sp directly within an sp, use local variables, not user variables
*/
set @schema_name =schema_name;
set @single_month_string=trim(@single_month_string);
call your_schema.sp_init_calendar_month_checking(schema_name,@single_month_string, @check_result);
-- select @check_result;
IF @check_result =0 THEN
-- the month input is not valid
-- sp will return all data
set @month_number ='all';
LEAVE month_loop;
ELSE
-- continue to verify and collect valid month values
set @check_result=LPAD(@check_result, 2, '0');
-- to add leading zero if single digit
set @month_number=concat(@month_number,@check_result,delimiter_reporting_1);
/*
-- attention: there is a trailing and beginning comma in the string
*/
END IF;
set @count=@count+1;
END WHILE;
LEAVE month_loop;
END LOOP month_loop;
IF @month_number<>'all' THEN
set @month_number = left(@month_number,length(@month_number)-1);
END IF;
-- delete the trailing comma
set @where_month= IF(@month_number='all',' ',concat(' month in ("', replace(@month_number,delimiter_reporting_1,@delimiter_with_quotes_reporting_1),'")'));
/*
-- IF @month_number ='all' THEN
-- all months are required, and there will be no where-clause for the month parameter

select @where_month;

-- 2.3 for the parameter geo_limit
-- although my website only contains price info for one country and its provinces,
-- I still make the sp accept a multi-value parameter.
*/
set @geo_string =geo_limit;
-- check the values by searching in the data table
call your_schema.sp_reporting_99_value_exists(schema_name,table_name,"GEO",@geo_string,delimiter_reporting_1,0,@geo_exists,@where_clause_for_geo);
-- select @geo_exists;
set @where_geo= IF(@geo_exists=0,' ',@where_clause_for_geo);
/*
-- Note:
-- when we call this SP with a geo_limit parameter like 'Canada, USA',
-- all the countries are quoted together as one string,
-- while in the @sql string each country should be quoted separately,
-- like GEO in ('Canada', 'USA')
-- the solution is to replace every comma in the parameter with a comma surrounded by quotes
-- this is extremely important! otherwise, there is no way my website can call the SP with a multi-value string from java.
-- the query will be constructed at the end of the SP with the other parameters
select @where_geo;


-- 2.4 for parameter of category
-- this parameter accepts both large categories and sub-categories
-- if the user inputs 'food' or 'energy' (a large category),
-- the SP searches and returns results based on the column Status
-- if the user inputs 'milk' or 'orange',
-- the SP searches and returns results based on the column Products

-- check for large categories first and add them to the where-clause
-- then check for sub-categories and add them to the where-clause as well
-- select distinct values, as the two where-clauses might overlap
*/
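/*
-- a minimal sketch of the comma-to-quoted-comma trick mentioned above (hypothetical values, not part of the procedure):
-- set @geo_list = 'Canada,USA';
-- select concat(' GEO in ("', replace(@geo_list, ',', '","'), '")');
-- returns:  GEO in ("Canada","USA")
*/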
set @category_string=category;

-- for inner loop (main):
set @main_category_string=concat_ws(delimiter_reporting_1,'food', 'energy', 'cosmetics','tobacco');
set @count_main_category=LENGTH(@main_category_string)-LENGTH(REPLACE(@main_category_string, delimiter_reporting_1,''));
set @count_inner_loop =1;
set @start_position_inner_loop=1;
set @main_category_list = ''; -- use it to search in column status

-- for outer loop (sub):
set @sub_category_string=lower(@category_string);
set @sub_category_string=concat(@sub_category_string,delimiter_reporting_1);
set @count_sub_category=LENGTH(@sub_category_string)-LENGTH(REPLACE(@sub_category_string, delimiter_reporting_1,''));
set @count_outer_loop =1;
set @start_position_outer_loop=1;
set @sub_category_list = ''; -- use it to search in column products
set @sub_category_in_locate_function='';
/*
-- to get locate('milk', Products)>0 or locate('steak',Products)>0
select @sub_category_string ;
select @sub_category_list ;
select @single_sub_category_string ;
select @main_category_list;

-- attention: if we set @value_list = NULL, the result will be NULL after concatenating any string
-- we need to set it to '' or ','
-- ---------------------------------------------------------------------------------------------------------
*/
category_loop: LOOP
WHILE @count_outer_loop <= @count_sub_category DO
-- @start_position marks the point where locate function starts
set @single_sub_category_string=substring(@sub_category_string,@start_position_outer_loop,locate(delimiter_reporting_1,@sub_category_string,@start_position_outer_loop)-@start_position_outer_loop);
set @start_position_outer_loop_new=locate(delimiter_reporting_1,@sub_category_string,@start_position_outer_loop)+1 ;
set @start_position_outer_loop=@start_position_outer_loop_new;
set @single_sub_category_string=trim(@single_sub_category_string);
-- --------------------------------------------------------------------------------------------------
-- after we get one single_sub_category_string, check whether it is one of the main categories
IF LOCATE(@single_sub_category_string,@main_category_string)>0 THEN
-- if the user's input value belongs to the main categories,
-- add it to the main category list, later used to search in the column Status
-- attention: there is a trailing delimiter_value (comma) in the string
set @main_category_list = concat(@main_category_list,@single_sub_category_string,delimiter_reporting_1);
ELSE -- otherwise add the value to the sub-category list, later used to search in the column Products
set @sub_category_list= concat(@sub_category_list,@single_sub_category_string,delimiter_reporting_1);
set @sub_category_in_locate_function=concat(@sub_category_in_locate_function,'or LOCATE("',@single_sub_category_string,'", Products)>0 ');
END IF;
/*
-- select @sub_category_in_locate_function;
-- --------------------------------------------------------------------------------------------------
*/
set @count_outer_loop=@count_outer_loop+1;
END WHILE;
LEAVE category_loop;
END LOOP category_loop;
/*
-- ---------------------------------------------------------------------------------------------------------
-- there is one more step for the main categories:
-- in the data table, each main category is represented by a number
*/
set @main_category_list= replace(@main_category_list, 'food', 1);
set @main_category_list= replace(@main_category_list, 'cosmetics', 2);
set @main_category_list= replace(@main_category_list, 'energy', 3);
set @main_category_list= replace(@main_category_list, 'tobacco', 4);
-- delete the trailing delimiter from both lists
IF trim(@main_category_list) <>'' THEN
set @main_category_list= left(@main_category_list,length(@main_category_list)-length(delimiter_reporting_1));
END IF;
IF trim(@sub_category_list) <>'' THEN
set @sub_category_list= left(@sub_category_list,length(@sub_category_list)-length(delimiter_reporting_1));
END IF;
/*
-- now build the where-clause
-- in the IF() below we have to compare the list with ''; we can't test "is null", because in MySQL '' is not the same as NULL
-- (a quick illustration of the difference follows right after this comment)
*/
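/*
-- a quick illustration of '' versus NULL (example values, not part of the procedure):
-- select '' = '';        -- 1
-- select NULL = '';      -- NULL, so an equality test against a NULL list would never be true
-- select NULL is null;   -- 1
*/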
set @where_status= IF(@main_category_list='', ' ', concat(' status in ("', replace(@main_category_list,delimiter_reporting_1,@delimiter_with_quotes_reporting_1),'")'));
IF @sub_category_list='all' THEN
set @sub_category_list='';
END IF;
-- if the user inputs 'all', it ends up in @sub_category_list,
-- but 'all' is not a valid sub-category, so it is cleared above

set @where_products= IF(@sub_category_list='', ' ', concat('(', substr(@sub_category_in_locate_function,4),')'));
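-- note: substr(@sub_category_in_locate_function, 4) drops the leading 'or ' so the clause starts with the first LOCATE(...) term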

/*
-- LOCATE(substring, string, start);
-- LOCATE is used so that every item in @sub_category_list is matched against the Products column;

-- select @where_products;
-- select @sub_category_list;

-- up until now we have created the individual parts of the where-clause
-- now we build the complete, nested query
-- the where-clause is the trickiest part
-- we have: @where_year, @where_month, @where_geo, @where_status, @where_products
-- each part has two possibilities: blank or not blank
-- the where statement looks like: WHERE condition1 (and condition2) (and condition3) (...)
-- the word 'WHERE' is only used if at least one condition exists
-- the word 'and' is only used if more than one condition exists
-- (a small sketch of the assembly follows right after this comment block)
*/
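/*
-- a minimal sketch of the assembly below (hypothetical values, not part of the procedure):
-- suppose @where_year = ' year in ("2022")', @where_geo = ' lower(GEO) in ("canada")' and the other parts are blank (' ')
-- the parts are joined with '|' (blank parts stay in as ' '), the loop keeps only the non-blank pieces,
-- then every '|' becomes ' and ' and 'WHERE ' is prefixed, giving roughly:
-- WHERE year in ("2022") and lower(GEO) in ("canada")
*/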
set @delimiter_where_condition ='|';
set @condition_string = concat_ws(@delimiter_where_condition,@where_year, @where_month, @where_geo, @where_status,@where_products);
set @condition_string=concat(@condition_string,@delimiter_where_condition);
set @count_condition=LENGTH(@condition_string)-LENGTH(REPLACE(@condition_string, @delimiter_where_condition,''));
set @count_condition_loop =1;
set @start_position_condition_loop=1;
set @condition_list = '';
/*
-- this variable collects all the non-blank conditions;
-- later the list is turned into the where_statement

select @condition_string;
select @condition_list;
select @single_sub_category_string;
select @main_category_list;

-- attention: if we set @value_list = NULL, the result will be NULL after concatenating any string
-- we need to set it to '' or ','
-- ---------------------------------------------------------------------------------------------------------
*/
condition_loop: LOOP
WHILE @count_condition_loop <= @count_condition DO
-- @start_position marks the point where locate function starts
set @single_condition_string=substring(@condition_string,@start_position_condition_loop,locate(@delimiter_where_condition,@condition_string,@start_position_condition_loop)-@start_position_condition_loop);
set @start_position_condition_loop_new=locate(@delimiter_where_condition,@condition_string,@start_position_condition_loop)+1 ;
set @start_position_condition_loop=@start_position_condition_loop_new;
/*
-- --------------------------------------------------------------------------------------------------
-- after we get one single_condition_string, check whether it is blank;
-- only non-blank conditions are written to the where_statement
*/
IF trim(@single_condition_string)<>'' THEN
set @condition_list= concat(@condition_list,@single_condition_string,@delimiter_where_condition);
END IF;
/*
-- --------------------------------------------------------------------------------------------------
*/
set @count_condition_loop=@count_condition_loop+1;
END WHILE;
LEAVE condition_loop;
END LOOP condition_loop;
/*
-- now we have @condition_list based on the user's input
-- aaa) the list is empty: the user entered nothing for these parameters and wants all data
-- bbb) the list is not empty: compose the list into a complete where_statement
*/
IF trim(@condition_list)='' THEN
set @where_statement=' ';
ELSE
/*
-- the conditions in the list are separated by '|', with a trailing '|' as well
-- 1st, delete that trailing delimiter_where_condition
*/
set @condition_list=left(@condition_list,length(@condition_list)-length(@delimiter_where_condition));
-- 2nd, replace every '|' with the word 'and'
set @condition_list=replace(@condition_list,@delimiter_where_condition,' and ');
-- 3rd, add 'WHERE' at the front
set @where_statement=concat(' WHERE ',@condition_list);
END IF;

-- FINALLY, create the SQL query
set @sql = concat_ws('',
'Drop table if exists `',@schema_name,'`.`',reporting_data_table,'`;'
);
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

set @sql =concat_ws('',
'Create table `',schema_name,'`.`',reporting_data_table,'`
Select GEO, Year, Month, Products, Measurement, Price, Status
from `',schema_name, '`.`', table_name, '` ',
@where_statement,';'
);

PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
set result_for_sp_reporting_1_price_by_year_month_geo_category=@sql;
call your_schema.sp_init_logtable_for_reporting (schema_name, 'sp_reporting_1_price_by_year_month_geo_category', 'report data generated and saved successfully');
commit;
-- select @result_for_sp_reporting_1_price_by_year_month_geo_category;
END &&
DELIMITER ;
/*
-- ----------------------------------------------------------------------------------
-- let's test the SP:
-- call your_schema.sp_reporting_1_price_by_year_month_geo_category ('your_schema', '1.report',',',' 1995 ,543,5009','12,7','canada','food','priceindex_sourcedata_16894754139474',@final_sql_string);
call your_schema.sp_reporting_1_price_by_year_month_geo_category ('your_schema', '1.report',',','all','all','all','all','priceindex_sourcedata',@final_sql_string);
select * from priceindex_sourcedata;
select @final_sql_string;
select * from log_for_reporting;

-- I added spaces around the year number 1995
-- the year parameter also contained the number 543
-- the month parameter has multiple values
-- the country name has a typo: the initial letter is not upper case
-- only the main category 'food' is given
-- but the SP still constructs a workable and correct SQL query, as below:


Select GEO, Year, Month, Products, Measurement, Price, Status
from `your_schema`.`1.report` WHERE year in ("1995","1995","2022") and month in ("12","7") and lower(GEO) in ("canada") and status in ("1");


-- let's test what happens when no parameter except the schema/table name is given;
-- call your_schema.sp_reporting_1_price_by_year_month_geo_category ('your_schema', '1.report',',','','','','',@test_result);
-- again, the SP gives the correct result

Select GEO, Year, Month, Products, Measurement, Price, Status
from `your_schema`.`1.report` ;

-- select @test_result;
#### this part is not finished #####
-- after we build the report and get the result, we could
-- export the report to S3, where the static content of my website resides
-- this will be achieved by the containerized Lambda shared in another repo


-- ----------------------------------------------------------------------------------
-- 2. to create child SPs
-- 2.1 as the reports use aggregation a lot,
-- create a specific SP for this purpose (a sketch of the dynamic SQL it builds follows right after this comment block)

-- call your_schema.sp_reporting_99_aggregation("your_schema","1.report","max","year"," ",@latest_year);
-- call your_schema.sp_reporting_99_aggregation("your_schema","1.report","min","year","",@earliest_year);
-- select @latest_year;
*/
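/*
-- a minimal sketch (hypothetical values, not part of the script) of the dynamic SQL the SP below prepares:
-- with aggregation = 'max', aggr_column = 'year' and where_clause left NULL, it runs roughly:
-- select max(year) into @result_aggre from `your_schema`.`1.report`;
*/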

DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_reporting_99_aggregation;
CREATE PROCEDURE your_schema.sp_reporting_99_aggregation (
IN schema_name varchar(50),
IN table_name varchar(50),
IN aggregation varchar(50),
IN aggr_column varchar(50),
IN where_clause varchar(255),
out result_for_sp_reporting_99_aggregation varchar(100)
)
BEGIN
set @result_aggre='';
IF where_clause is null THEN
set @sql = concat('
select ', aggregation, '(', aggr_column, ') into @result_aggre
from `',schema_name, '`.`',table_name,'`;');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
ELSE
set @sql = concat('
select ', aggregation, '(', aggr_column, ') into @result_aggre
from `',schema_name, '`.`',table_name,'` ',
where_clause,';');
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END IF;

set result_for_sp_reporting_99_aggregation=@result_aggre;
-- select @result_for_sp_reporting_99_aggregation;
END &&
DELIMITER ;
/*
-- ----------------------------------------------------------------------------------
-- 2.3 to create an SP for geo_limit
-- in order to make this SP more flexible,
-- multiple values in one string are accepted for the parameter 'value_s';
-- the delimiter used inside the string is also a parameter, set to ',' by default
-- there might be other possibilities; if you'd like to make the SP more versatile,
-- you can try making every SP accept a delimiter parameter
-- (a small sketch of what the SP produces follows right after this comment block)

-- call your_schema.sp_reporting_99_value_exists("your_schema","1.report","geo","CANADA",",",0,@value_exists,@where_clause_for_geo);
-- select @value_exists; select @where_clause_for_geo;
*/
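/*
-- a minimal sketch of what the call above produces (hypothetical data, not part of the script):
-- with value_s = 'CANADA', delimiter_value = ',' and case_sensitive = 0, the SP builds roughly:
-- select count(*) into @result_value_exist
-- from `your_schema`.`1.report` WHERE lower(geo) in ("canada");
-- @value_exists becomes 1 if any row matches, 0 otherwise,
-- and @where_clause_for_geo holds:  lower(geo) in ("canada")
*/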

DELIMITER &&
DROP PROCEDURE IF EXISTS your_schema.sp_reporting_99_value_exists;
CREATE PROCEDURE your_schema.sp_reporting_99_value_exists (
IN schema_name varchar(50),
IN table_name varchar(50),
IN column_name varchar(50),
IN value_s varchar(255),
IN delimiter_value varchar(10),
IN case_sensitive boolean,
OUT result_for_sp_reporting_99_value_exists varchar(100),
OUT where_clause_ready_to_use varchar(255)
)
BEGIN

-- split the string into single values (each one trimmed, and lower-cased if required) using a loop
set @value_s =value_s;
set @value_s =concat(@value_s,delimiter_value);
IF case_sensitive = 0 THEN
-- make all values to be compared in lower case
set @value_s=lower(@value_s);
-- remember to lower the values in the table as well
END IF;
/*
select @value_s;

-- attention: there must be no space between concat and '('
-- otherwise MySQL can't run the concat function and the variable becomes ''.
*/
set @count_value=LENGTH(@value_s)-LENGTH(REPLACE(@value_s, delimiter_value,''));
set @count =1;
set @start_position=1;
set @value_list = '';
/*
-- this variable collects all the values into one string
-- attention: if we set @value_list = NULL, the result will be NULL after concatenating any string
-- we need to set it to '' or ','
*/
value_loop: LOOP
WHILE @count <= @count_value DO
/*
-- @start_position marks the point where locate function starts
*/
set @single_value_string=substring(@value_s,@start_position,locate(delimiter_value,@value_s,@start_position)-@start_position);
set @start_position_new=locate(delimiter_value,@value_s,@start_position)+1 ;
set @start_position=@start_position_new;
-- after we get one single value string
set @single_value_string=trim(@single_value_string);
-- select @single_value_string;
set @value_list=concat(@value_list,@single_value_string,delimiter_value);
-- select @value_list;
/*
-- attention: there is a trailing delimiter_value (comma) in the string
*/
set @value_list= left(@value_list,length(@value_list)-length(delimiter_value));
set @count=@count+1;
END WHILE;
LEAVE value_loop;
END LOOP value_loop;

set @result_value_exist='';
set @delimiter_with_quotes=concat('"',delimiter_value,'"');
IF case_sensitive = 0 THEN
set @sql = concat('
select count(*) into @result_value_exist
from `',schema_name, '`.`',table_name,'`
WHERE lower(', column_name, ') in ("', replace(@value_list,delimiter_value,@delimiter_with_quotes),'")');
set where_clause_ready_to_use=concat (' lower(', column_name, ') in ("', replace(@value_list,delimiter_value,@delimiter_with_quotes),'")');
ELSE
set @sql = concat('
select count(*) into @result_value_exist
from `',schema_name, '`.`',table_name,'`
WHERE `', column_name, '` in ("', replace(@value_list,delimiter_value,@delimiter_with_quotes),'")');
set where_clause_ready_to_use=concat (' `',column_name, '` in ("', replace(@value_list,delimiter_value,@delimiter_with_quotes),'")');

END IF;
-- select @sql;
PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

IF @result_value_exist = 0 THEN
-- no such value
set result_for_sp_reporting_99_value_exists=0;
ELSE
set result_for_sp_reporting_99_value_exists=1;
END IF;
-- select @where_clause_ready_to_use;
-- select @result_for_sp_reporting_99_value_exists;
END &&
DELIMITER ;

/* line chart -- price indexes for 4 categories in recent years
*/
DROP PROCEDURE IF EXISTS your_schema.sp_reporting_2_priceindex_by_category_year;
DELIMITER &&
CREATE PROCEDURE your_schema.sp_reporting_2_priceindex_by_category_year (
IN schema_name varchar(50),
IN table_name varchar(50),
IN reporting_data_table varchar(50),
OUT result_for_sp_reporting_2_priceindex_by_category_year tinyint
)
BEGIN
-- 1. Declare variables to hold diagnostics area information
DECLARE exit handler for SQLEXCEPTION
BEGIN
GET DIAGNOSTICS CONDITION 1 @sqlstate = RETURNED_SQLSTATE,
@errno = MYSQL_ERRNO, @text = MESSAGE_TEXT;
SET @full_error = CONCAT("ERROR ", @errno, " (", @sqlstate, "): ", @text);
-- SELECT @full_error;
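-- e.g. if the reporting table already exists, @full_error would read something like: ERROR 1050 (42S01): Table 'priceindex_line' already exists (illustrative)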
set result_for_sp_reporting_2_priceindex_by_category_year=0;
call your_schema.sp_init_logtable_for_reporting (schema_name, 'sp_reporting_2_priceindex_by_category_year', @full_error);
END;
-- 2. the body of the SP
START TRANSACTION;

set @sql =concat_ws('',
'Create table `',schema_name,'`.`',reporting_data_table,'`
Select Status,Year,AVG(Price) as AVG
from `',schema_name, '`.`', table_name, '` ',
'Group by Status, Year Order by Status, Year;'
);

PREPARE dynamic_statement FROM @sql;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;

set result_for_sp_reporting_2_priceindex_by_category_year=1;
call your_schema.sp_init_logtable_for_reporting (schema_name, 'sp_reporting_2_priceindex_by_category_year', 'report data generated and saved successfully');
commit;

END &&
DELIMITER ;
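/*
-- a minimal sketch (using the table names from the test call below) of the statement the SP above prepares:
-- Create table `your_schema`.`priceindex_line`
-- Select Status, Year, AVG(Price) as AVG
-- from `your_schema`.`1.report` Group by Status, Year Order by Status, Year;
*/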

-- below: calls that generate the report data for the respective charts
/*
call your_schema.sp_reporting_2_priceindex_by_category_year('your_schema','1.report','priceindex_line',@result_for_report_2);
select * from priceindex_line;
drop table priceindex_line;
select * from `1.report`;
*/

/* bar chart -- price indexes for milk and steak for provinces in the most recent year
call your_schema.sp_reporting_1_price_by_year_month_geo_category ('your_schema', '1.report',',','2022','2','all','milk,steak','priceindex_bar1',@final_sql_string);
select * from priceindex_bar1;
select @final_sql_string;
select * from log_for_reporting;
Create table `your_schema`.`priceindex_bar1`
Select GEO, Year, Month, Products, Measurement, Price, Status
from `your_schema`.`1.report` WHERE year in ("2022") and month in ("02") and (LOCATE("milk", Products)>0 or LOCATE("steak", Products)>0 );
*/
