Jacqui Wu


The maximum length of a Google Analytics payload is 8192 bytes. It is worth checking whether any of your hits approach this limit, because if a payload exceeds it, the hit is never sent to GA.
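To make the limit concrete, here is a minimal sketch (in plain Python, not GTM) of how a Measurement Protocol hit payload is built and measured. The parameter values and property ID are hypothetical placeholders:

```python
from urllib.parse import urlencode

def payload_size(params):
    """Return the byte length of the URL-encoded hit payload."""
    return len(urlencode(params).encode("utf-8"))

# Hypothetical Measurement Protocol hit parameters.
hit = {
    "v": "1",               # protocol version
    "tid": "UA-XXXXXXX-1",  # placeholder property ID
    "cid": "555.123",       # placeholder client ID
    "t": "event",           # hit type
    "ec": "engagement",     # event category
    "ea": "click",          # event action
    "el": "x" * 9000,       # deliberately oversized label
}

MAX_PAYLOAD_BYTES = 8192

print(payload_size(hit))                       # well over 8192
print(payload_size(hit) <= MAX_PAYLOAD_BYTES)  # False: this hit would be dropped
```

A hit like this one would silently fail, which is exactly why recording the payload length as a custom dimension is useful.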

How can we know the payload size of each hit?

Today I will show you how to send the payload size to GA as a custom dimension with each hit, using Google Tag Manager.

Before starting, it is essential to create a new hit-scoped custom dimension in GA named ‘Hit Payload Length’ and note its index, which will be used in the next step.

Then, create a custom task in GTM. A custom task lets you modify the request sent to GA before it is dispatched. We can take Client ID as an example. …

Spark is built to handle the massive data volumes that arise in many big data use cases. It is an open-source Apache project.

Spark can use data stored in a variety of formats, including parquet files.

What is Spark?

Spark is a general-purpose distributed data processing engine that is suitable for use in a wide range of circumstances.

On top of the Spark core data processing engine sit libraries for SQL, machine learning, graph computation, and stream processing. The tasks most frequently associated with Spark include ETL and SQL batch jobs across large data sets.

What Does Spark Do?

It has an extensive set of developer libraries and APIs and supports languages such as Java, Python, R, and Scala. Its flexibility makes it well suited to a range of use cases; in this blog, we will focus on data integration. The Spark programming model for working with structured data is exposed to Python through the Spark Python API, known as PySpark. …

The first question is: why do we need data integration?

Let me give you an example here to answer this question.

Every company has many departments, and different departments use different tools to store their data. For example, the marketing team may use HubSpot.

So within one company, we have different departments storing different types of data in different places.

However, making business decisions requires drawing insightful information out of all that data.

What can we do?

Maybe we could connect to all the databases every time we need to generate a report. However, that would cost us a large amount of time. This is where data integration comes in.

What is data integration?

Data integration is a process in which heterogeneous data is retrieved and combined into a unified form and structure. …


Jacqui Wu

Currently a BI Analyst at a subsidiary of Webjet, with experience in applying data science techniques to business.
