On this part I agree. This is what I meant.
But you wrote in the first comment that:
We used Redis with a sliding window of the records sub sequence to de-duplicate data.
This is completely wrong, and it will mislead people. (You also need to fix the illustration you’ve…
You may be confused, but we didn’t use Redis for record deduplication at IronSource. What we actually did was use the following logic in the KCL processors:
Record Processor uses a fixed number of records per Amazon S3 file, such as 5000.
The file name uses this schema…
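Here is a minimal sketch of why this kind of logic can replace Redis for deduplication. It rests on assumptions of mine, not on IronSource's code. Sequence numbers are plain integers. A dict stands in for Amazon S3. The processor checkpoints only after a full batch is written. `batch_key` is a hypothetical placeholder, because the actual file-name schema is not spelled out here. Every file holds a fixed number of records, and its name depends only on the batch. So a processor that re-reads records after a failure rebuilds the same batches and overwrites the same files instead of writing duplicates.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumption: one shard's records as (sequence_number, payload) pairs, in order.
# Real Kinesis sequence numbers are large numeric strings; ints keep this simple.
Record = Tuple[int, str]


def batch_key(shard_id: str, first_seq: int) -> str:
    # Hypothetical placeholder for the file-name schema, which is not given
    # in the comment. The sketch relies only on the key being deterministic.
    return f"{shard_id}-{first_seq}"


def process(shard_id: str, records: Iterable[Record], store: Dict[str, List[str]],
            batch_size: int = 5000) -> Optional[int]:
    """Write fixed-size batches under deterministic keys; `store` stands in for S3.

    Returns the sequence number to checkpoint (the last record of the last
    full batch written), or None if no full batch was written.
    """
    checkpoint: Optional[int] = None
    batch: List[Record] = []
    for seq, payload in records:
        batch.append((seq, payload))
        if len(batch) == batch_size:
            # Writing the same batch again hits the same key, so it overwrites
            # the earlier file instead of creating a duplicate.
            store[batch_key(shard_id, batch[0][0])] = [p for _, p in batch]
            checkpoint = batch[-1][0]
            batch = []
    # A trailing partial batch is not written; it is re-read on the next run.
    return checkpoint
```

For example, running `process` twice over the same seven records with `batch_size=3` simulates a crash after upload but before checkpointing. The result is still two files holding six records in total, not twelve. The seventh record stays unwritten until a later run completes its batch.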