ABDERRAHIM EL OUTMADI
Persist() Vs Unpersist in Spark
In Apache Spark, persist() and unpersist() are methods used to manage the persistence of an RDD (Resilient Distributed Dataset), DataFrame…
Nov 29, 2023

ABDERRAHIM EL OUTMADI
MongoDB Schema D
Designing a schema in MongoDB, despite its schema-less nature, is a critical step to ensure optimal performance, maintainability, and…
Oct 30, 2023

ABDERRAHIM EL OUTMADI
Leveraging Lazy Evaluation in PySpark for Optimized Data Processing
Apache Spark, a powerful open-source unified analytics engine, has become a go-to solution for big data processing and analytics. One of…
Sep 26, 2023

ABDERRAHIM EL OUTMADI
Navigating Schema Design in Azure Synapse Analytics: A Deep Dive into Star Schemas
Azure Synapse Analytics stands as a beacon of integration, bringing together the realms of big data and data warehousing. With its…
Sep 25, 2023

ABDERRAHIM EL OUTMADI
Getting Started with Amazon Redshift: A Beginner’s Guide to Setting up and Optimizing Your Data…
Amazon Redshift is a powerful data warehousing service that allows users to easily store, manage, and analyze large amounts of data in the…
Jan 14, 2023

ABDERRAHIM EL OUTMADI
Linger.ms & Batch.size of Kafka Producer
➡️ By default, Kafka tries to send records as soon as possible • It will have up to 5 requests in flight, meaning up to 5 messages…
Oct 17, 2022

ABDERRAHIM EL OUTMADI
Idempotent Producer to avoid duplicate message in Kafka
Here’s the problem: the Producer can introduce duplicate messages in Kafka due to network errors
Oct 16, 2022

ABDERRAHIM EL OUTMADI
What is ETL (Extract-Transform-Load) Data Integration?
Extract, transform, load (ETL) refers to the process of moving data from a source system into a data warehouse. Before loading into the…
Oct 15, 2022

ABDERRAHIM EL OUTMADI
Acks & min.insync.replicas of Kafka Producer 101
✔ acks = 0 (no acks): ➡️ No response is requested ➡️ If the broker goes offline or an exception happens, we won’t know and will…
Oct 15, 2022

ABDERRAHIM EL OUTMADI
Basic Ideas about Kafka 101
Kafka is an extremely important Distributed Messaging System to understand as it was the first of its kind and…
Oct 14, 2022