How to Improve Timeliness of Data Analysis by 30 Times?

Walter Gui
Kyligence
3 min read · Oct 19, 2022

Timeliness of data analysis is commonly defined as the time between when data is generated and when it is available for analysis.

Poor timeliness can leave business needs unmet. In this blog, we discuss a real use case: a company that deploys its services on multiple cloud platforms. Because of the volume and complexity of the billing data, the enterprise has to aggregate and analyze its cloud cost data manually, and it can only do so once a month.

That is, the data generated this month only reaches the business team for analysis at the beginning of next month. As a result, a steep increase in the cloud bill can go unnoticed until the following month.

Image by our-team on Freepik

We learned from the previous blog (How to use metrics to bring your cloud costs under control) that Kyligence Zen can be used to quickly establish a one-stop metrics platform for real-time cloud cost management. In this blog, we elaborate on how to improve the timeliness of data analysis from a month to several days, based on S3 buckets and Kyligence Zen’s support for incremental data updates.

First, we select Amazon S3 as the data source on the Data page of Kyligence Zen. Then we follow the configuration wizard on the right to complete the authorization and fill in the bucket, file paths, and other information.

Set Amazon S3 (Image: Kyligence Zen)

Next, we go to the cloud platform and create a job that automatically generates cloud bills on a daily basis. Taking the Amazon cloud platform as an example, we configure it to generate cloud billing data every day and store it as a CSV file. The bucket used here can be the same as the one from the previous step, but we recommend storing the original billing file under a different file path.
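To illustrate the path recommendation, here is a minimal sketch of a key layout that keeps the original bills and the processed files under separate paths in the same bucket. The prefix names (`raw-billing/`, `zen-source/`) and the date-based file naming are assumptions made for illustration; neither the cloud platform nor Kyligence Zen is known to require them.

```python
from datetime import date

# Hypothetical prefixes: any two non-overlapping paths in the bucket would do.
RAW_PREFIX = "raw-billing/"      # where the cloud platform writes original bills
PROCESSED_PREFIX = "zen-source/"  # where cleaned files for Kyligence Zen would go


def daily_key(prefix: str, day: date) -> str:
    """Build an object key such as 'raw-billing/2022/10/19.csv'."""
    return f"{prefix}{day:%Y/%m/%d}.csv"


def prefixes_overlap(a: str, b: str) -> bool:
    """Return True if one prefix contains the other.

    Overlapping prefixes would let raw and processed files mix.
    """
    return a.startswith(b) or b.startswith(a)
```

Checking `prefixes_overlap(RAW_PREFIX, PROCESSED_PREFIX)` before wiring up the job is a cheap way to make sure the data source path never picks up an unprocessed bill.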

Amazon S3 configuration (Image: Kyligence Zen)

Given the complexity of the native cloud billing data, we can use AWS Glue or Byzer (an open-source tool) to clean and process it (we will cover this topic in another blog, so please stay tuned). We then save the processed data as a new file and upload it every day to the Amazon S3 path that serves as the Kyligence Zen data source. The overall dataflow is shown in the figure below.
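As a rough idea of what such a cleaning step might do, the sketch below collapses raw line items into one row per date and service, using only the Python standard library. It is not the Glue or Byzer job itself, and the column names (`usage_date`, `service`, `cost`) are hypothetical; a real billing export has different headers and many more columns.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical column names, chosen for illustration only.
DATE_COL, SERVICE_COL, COST_COL = "usage_date", "service", "cost"


def summarize_bill(raw_csv: str) -> str:
    """Aggregate raw line items into one CSV row per (date, service)."""
    totals = defaultdict(Decimal)  # Decimal avoids float rounding on money
    for row in csv.DictReader(io.StringIO(raw_csv)):
        cost = (row.get(COST_COL) or "").strip()
        if not cost:  # skip line items that carry no cost value
            continue
        totals[(row[DATE_COL], row[SERVICE_COL])] += Decimal(cost)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([DATE_COL, SERVICE_COL, COST_COL])
    for (day, service), total in sorted(totals.items()):
        writer.writerow([day, service, f"{total:.2f}"])
    return out.getvalue()
```

The returned text could then be uploaded to the data source path, for example with the boto3 S3 client's `put_object(Bucket=..., Key=..., Body=...)` call, assuming boto3 and suitable credentials are available.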

Dataflow (Image: Kyligence Zen)

We have now set up the dataflow, and cloud billing data is stored in Amazon S3 on a daily basis. When performing data analysis, Kyligence Zen automatically retrieves the CSV files under the configured S3 bucket paths and queries them together as a single dataset, thus improving the timeliness of data analysis from a month to several days. Even better timeliness (such as several hours) can be achieved by following the same process.
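To make the idea of querying the daily files "as a whole" concrete, here is a small stand-in written in plain Python. It is not how Kyligence Zen works internally. It only shows that when each day adds one more file, an aggregate over all files always reflects the latest upload. The file keys and the `service` and `cost` columns are hypothetical.

```python
import csv
import io
from decimal import Decimal


def total_cost_by_service(daily_files: dict) -> dict:
    """Sum cost per service across every daily CSV, treated as one dataset.

    `daily_files` maps an object key to that file's CSV text.
    """
    totals = {}
    for key in sorted(daily_files):  # order does not affect the sums
        for row in csv.DictReader(io.StringIO(daily_files[key])):
            service = row["service"]
            totals[service] = totals.get(service, Decimal("0")) + Decimal(row["cost"])
    return totals
```

Adding tomorrow's file to `daily_files` and calling the function again is all it takes to include the new day, which is the property the incremental update relies on.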

Kyligence Zen offers a free trial. Feel free to check it out on the Kyligence Zen website!
