<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Shubhi Gupta on Medium]]></title>
        <description><![CDATA[Stories by Shubhi Gupta on Medium]]></description>
        <link>https://medium.com/@shubhigupta1503?source=rss-67ba212a8130------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*mv3YOxiyJfUKp74_</url>
            <title>Stories by Shubhi Gupta on Medium</title>
            <link>https://medium.com/@shubhigupta1503?source=rss-67ba212a8130------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 21 May 2026 10:26:06 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@shubhigupta1503/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Boost forecasting with Multiprocessing for SARIMA]]></title>
            <link>https://medium.com/@shubhigupta1503/boost-forecasting-with-multiprocessing-for-sarima-a2b8a21808b0?source=rss-67ba212a8130------2</link>
            <guid isPermaLink="false">https://medium.com/p/a2b8a21808b0</guid>
            <dc:creator><![CDATA[Shubhi Gupta]]></dc:creator>
            <pubDate>Mon, 05 Sep 2022 09:40:11 GMT</pubDate>
            <atom:updated>2022-09-05T09:40:11.005Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/559/1*ICYVBF3GfHa5JCfaWq4_UQ.png" /><figcaption><a href="https://realpython.com/intro-to-python-threading/">Source</a></figcaption></figure><p><strong>Introduction</strong></p><p>When forecasting time series data, there comes a point when you need to fine-tune the parameters of your algorithm to get better speed and accuracy. Running that tuning with multiprocessing decreases the processing time drastically. Here is a step-by-step guide on how to do this in Python.</p><p><strong>Time Series Model - SARIMA (Seasonal Autoregressive Integrated Moving Average)</strong></p><p>If the time series data has seasonality, we use the SARIMAX model, which applies seasonal differencing. Seasonal differencing is similar to regular differencing, except that regular differencing subtracts consecutive terms, whereas seasonal differencing subtracts the value from the previous season.</p><p>The model is represented as SARIMAX(p,d,q)(P,D,Q)m</p><p>where p, d and q are defined as follows:</p><p><strong>p</strong> is the order of the AR term, <strong>q</strong> is the order of the MA term, and <strong>d</strong> is the number of differences required to make the time series stationary.</p><p>and P, D, Q and m are defined as follows:</p><p><strong>P</strong> is the seasonal AR term, <strong>Q</strong> is the seasonal MA term, <strong>D</strong> is the seasonal difference order, and <strong>m</strong> is the number of time steps in a single seasonal period.</p><p><strong>Let’s get started with the example -</strong></p><p>Python’s multiprocessing module allows your code to run functions in parallel by offloading calls to available processors.</p><p><strong>Step 1:</strong></p><p>We need to import all the necessary libraries required for multiprocessing.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a 
href="https://medium.com/media/147403f5f9f3fd85b4dbece3cb73510a/href">https://medium.com/media/147403f5f9f3fd85b4dbece3cb73510a/href</a></iframe><p><strong>Step 2:</strong></p><p>We will use the <a href="https://www.kaggle.com/datasets/rakannimer/air-passengers"><strong>Air Passengers Dataset</strong></a> for this example. This dataset contains the number of air travel passengers from the start of 1949 to the end of 1960. It has a positive trend and annual seasonality.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/091ba837d0a84e57b5710362b9b9b4b3/href">https://medium.com/media/091ba837d0a84e57b5710362b9b9b4b3/href</a></iframe><p>As soon as the dataset is read, the index is set to the date. This is standard practice when working with time-series data in Pandas and makes it easier to implement time series models.</p><p><strong>Step 3:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b490d763f0166b69c278cc89e41024cc/href">https://medium.com/media/b490d763f0166b69c278cc89e41024cc/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0s3kfxeSowEZ9kqaIPu99A.png" /><figcaption><strong>No. of Airline Passengers by Date</strong></figcaption></figure><p>The line chart shows the number of airline passengers by date. 
At first glance, the plot shows that the dataset exhibits an upward trend as well as seasonality, and there do not appear to be any major irregularities or noise in the data.</p><p><strong>Step 4:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f5cb551c709c9d12de1b2a6174644ad2/href">https://medium.com/media/f5cb551c709c9d12de1b2a6174644ad2/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*L1C60Et2HIr6s6zsqMCKCQ.png" /><figcaption><strong>Additive Seasonal Decompose</strong></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Wxj2Ee1rtRgQTqWdMwAyAw.png" /><figcaption><strong>Multiplicative Seasonal Decompose</strong></figcaption></figure><p>Seasonal decomposition allows you to break (or “decompose”) time series data into its seasonal, trend, and residual components. For this, we use the <strong>seasonal_decompose() </strong>function provided by the <strong>statsmodels</strong> library. 
Among the input parameters, we can specify the decomposition model (additive or multiplicative) and whether or not to extrapolate the trend. By analyzing these components, we can identify which terms to include in our SARIMA model.</p><p><strong>Step 5:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0ca01e17dd7689be5e7c35eb140bd5f9/href">https://medium.com/media/0ca01e17dd7689be5e7c35eb140bd5f9/href</a></iframe><p><strong>Output</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/13a5bb5e1d26aad08fe539fdab41ba98/href">https://medium.com/media/13a5bb5e1d26aad08fe539fdab41ba98/href</a></iframe><p>Here, the last <strong>3 months</strong> will be used for <strong>testing </strong>the seasonal ARIMA model and everything else will be used for training.</p><p><strong>Step 6:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ab7bf0b925cec391a7df9182fbed7be4/href">https://medium.com/media/ab7bf0b925cec391a7df9182fbed7be4/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/889/1*6N2OtD0Rq730IEEdqf8MKA.png" /><figcaption><strong>ACF and PACF plot</strong></figcaption></figure><p><em>Autocorrelation Function (ACF)-</em></p><p>The ACF is the correlation between a time series and a lagged version of itself, that is, the correlation between the observation at the current time step and the observations at previous time steps. We will be using the <strong>plot_acf</strong> function from the <strong>statsmodels.graphics.tsaplots</strong> module.</p><p><em>Partial Autocorrelation Function (PACF)-</em></p><p>The PACF is the additional correlation explained by each successive lagged term: the correlation between observations at two time steps after accounting for their correlation with the observations at the intermediate time steps. 
We will be using the <strong>plot_pacf</strong> function from the <strong>statsmodels.graphics.tsaplots</strong> module.</p><p><strong>Step 7:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7168330f5d9799f2a0c084303fc70d7c/href">https://medium.com/media/7168330f5d9799f2a0c084303fc70d7c/href</a></iframe><p>Here, the custom parameter grid search covers the ARIMA terms d = [1,2], p = [1,2] and q = [1,2]. The seasonal terms are D = [1,2], P = [1,2] and Q = [1,2]. Finally, m, the number of time steps in each season, takes the values [6,12].</p><p>Seven <strong>hyperparameters</strong> with two candidate values each give 2<sup>7</sup> combinations, so a total of <strong>128 models</strong> are validated. If the number of hyperparameters is increased, the number of combinations grows exponentially.</p><p><strong>Step 8:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/5d6b45bf5cb8b26d98f643aa9a7ca046/href">https://medium.com/media/5d6b45bf5cb8b26d98f643aa9a7ca046/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/82ce39c23c12e17442b33452c8a6b6b8/href">https://medium.com/media/82ce39c23c12e17442b33452c8a6b6b8/href</a></iframe><p>Here, I have defined a function <strong>Sarima()</strong> that predicts the test values; from those predictions we compute the percentage error for each parameter set in <strong>param_grid_models_ls </strong>using the <strong>MAPE </strong>(Mean Absolute Percentage Error) metric. Then I have used the <strong>multiprocessing </strong>module to run the iterations on the whole dataset so that the work is divided among processes, which leads to a <strong>significant speedup in tasks</strong>. 
Multiprocessing enables the computer to utilize multiple cores of a CPU to run tasks/processes in parallel.</p><p><strong>Step 9:</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/ca22c2b8a8bc7301ee23c4053135124c/href">https://medium.com/media/ca22c2b8a8bc7301ee23c4053135124c/href</a></iframe><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yOrlYihAmi3_SjUm9cR5Jg.png" /></figure><p>In the above code, we produce an out-of-sample forecast for the next year, i.e. 1961, using the forecast() function. It took <strong>0.3 seconds</strong> to execute.</p><p>The <strong><em>forecast()</em></strong> function has an argument called <strong><em>steps</em></strong> that allows you to specify the number of time steps to forecast.</p><p><strong>Conclusion</strong></p><p>In this article, I have demonstrated how to apply multiprocessing to tuning a SARIMA forecasting model on this dataset, along with an out-of-sample forecast.</p><p>Further, we can use the same approach to speed up forecasting with other time series models as well.</p><p>For more information, you can refer to this <a href="https://github.com/ShubhiGupta15"><strong>Link</strong></a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/581/1*UHc1GIDJHZf9F3LWfDSu-Q.jpeg" /><figcaption><a href="https://www.stxnext.com/hs-fs/hubfs/STX%20Next%202020/blog/images/time_is_money_meme.png__581x329_q85_crop_subsampling-2_upscale.jpg">Source</a></figcaption></figure><p><strong>References</strong></p><p><a href="https://statskernel.ca/sarima-models-hyperparameter-tuning-in-parallel/"><em>https://statskernel.ca/sarima-models-hyperparameter-tuning-in-parallel/</em></a></p><p><a 
href="https://medium.com/analytics-vidhya/fine-tune-sarima-hyperparams-using-parallel-processing-with-joblib-step-by-step-python-code-2037fec1659"><em>https://medium.com/analytics-vidhya/fine-tune-sarima-hyperparams-using-parallel-processing-with-joblib-step-by-step-python-code-2037fec1659</em></a></p><p><a href="https://ao.ms/multiprocessing-pools-in-python/"><em>https://ao.ms/multiprocessing-pools-in-python/</em></a></p><p><a href="https://towardsdatascience.com/time-series-forecasting-with-arima-sarima-and-sarimax-ee61099e78f6"><em>https://towardsdatascience.com/time-series-forecasting-with-arima-sarima-and-sarimax-ee61099e78f6</em></a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Make your Python Program run faster]]></title>
            <link>https://medium.com/@shubhigupta1503/make-your-python-program-run-faster-3fa80f9e981d?source=rss-67ba212a8130------2</link>
            <guid isPermaLink="false">https://medium.com/p/3fa80f9e981d</guid>
            <dc:creator><![CDATA[Shubhi Gupta]]></dc:creator>
            <pubDate>Fri, 19 Aug 2022 12:12:52 GMT</pubDate>
            <atom:updated>2022-08-19T12:12:52.315Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/733/1*sEQgYsRmcLBXc6IUR3MkJQ.png" /><figcaption><a href="https://realpython.com/python-concurrency/">Source</a></figcaption></figure><p><strong>Introduction</strong></p><p>In this tutorial we will build an understanding of multi-threading and multi-processing and see in practice how these techniques can be implemented in Python. Before discussing threading and multi-processing, it’s important to understand multitasking.</p><p><strong>Multitasking</strong></p><p>Multitasking means the operating system performs multiple tasks at the same time; for example, our computer runs multiple applications at once. It is like eating breakfast while listening to music. Multitasking often involves the CPU switching between tasks so that users can interact with each program at the same time. The biggest benefit of multitasking is improved efficiency and resource utilization.</p><p><em>Types of multitasking-</em></p><ul><li><em>Process-based Multitasking</em></li><li><em>Thread-based Multitasking</em></li></ul><p><strong>Process-based Multitasking</strong></p><ul><li>In process-based multitasking, two or more processes or programs can run concurrently.</li><li>It cannot make use of the CPU’s idle time.</li><li>It achieves a faster data rate because two or more processes/programs can run simultaneously.</li><li>Example: Downloading, listening to songs and playing a game.</li></ul><p><strong>Thread-based Multitasking</strong></p><ul><li>In thread-based multitasking, two or more threads can run concurrently.</li><li>It can make use of the CPU’s idle time.</li><li>Threads are lighter and cause less overhead. 
Also, because they share the same memory inside a process, it is easier and faster to share data, although access to shared data must be coordinated.</li><li>Example: In a word-processing application like MS Word, we can type text in one thread while a spell checker checks for mistakes in another thread.</li></ul><p><strong>Thread vs Process</strong></p><p>A <strong>thread</strong> is a unit of execution in a process. Threads execute independently while sharing their process’s resources, and a process can have multiple threads running concurrently, each taking on a different part of the task.</p><p>A <strong>process</strong> is basically a program in execution. When you start an application on your computer (like a browser or text editor), the operating system creates a process. Multiple processes can be running the same program, but they can use different data and compute resources.</p><p><strong>The Basics</strong></p><p>Threading and multi-processing are two of the most fundamental concepts in programming. Python supports several mechanisms that enable multiple tasks to be executed at almost the same time. The key differences between multi-threading and multi-processing are:</p><p>1. A process is an independent instance executed on a processor core. A thread is an entity that resides within a process and runs concurrently with other threads (inside that process).</p><p>2. True parallelism can ONLY be achieved using multiprocessing by taking advantage of multi-core machines, since processes can run on different CPU cores.</p><p>3. Only one thread can be executed at a given time inside a process due to Python’s global interpreter lock (GIL).</p><p>4. Threads share the same memory and can write to and read from shared variables, whereas each process has its own memory space.</p><p><strong>Concurrency and Parallelism</strong></p><p><strong>Concurrent execution</strong> means that two or more tasks can start, run and complete in overlapping time periods. 
These tasks don’t necessarily have to run simultaneously; they just need to make progress in an overlapping manner. One of the main goals of concurrency is to prevent tasks from blocking each other by switching back and forth when one of the tasks is forced to wait.</p><p>An application consisting of two tasks that are being executed concurrently on a single core is illustrated in the diagram below.</p><figure><img alt="Concurrent Execution" src="https://cdn-images-1.medium.com/max/601/1*jS8KC6gj5lL1eZfKXafGjg.png" /><figcaption><a href="https://www.quora.com/For-multiprocessing-in-Python-which-library-should-I-use-threading-or-subprocess">Concurrency</a></figcaption></figure><p><strong>Parallel execution</strong> implies that two or more jobs are being executed simultaneously. Therefore, it is not possible to have parallelism on machines with a single processor and single core. With parallelism we can maximize the use of hardware resources.</p><p>In multi-core environments, each core can execute one task at exactly the same time, as illustrated in the diagram below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*AhN2w4cYxOVDLF9NxYKWkg.png" /><figcaption><a href="https://www.quora.com/For-multiprocessing-in-Python-which-library-should-I-use-threading-or-subprocess">Parallelism</a></figcaption></figure><p><strong><em>Multi-threading implements concurrency whereas Multi-processing implements parallelism</em></strong><em>.</em></p><p><strong>The Global Interpreter Lock (GIL)</strong></p><p>We know that threads share the same memory space, so special precautions must be taken so that two threads don’t write to the same memory location at the same time. The CPython interpreter handles this using a mechanism called the GIL, or Global Interpreter Lock.</p><p>The <strong>Global Interpreter Lock (GIL)</strong> is a process-wide lock that prevents multiple threads from executing Python code simultaneously in a Python process. 
Even though multiple threads can be running concurrently in a process, only one thread can be executing code at any given time, and the rest must wait. This lock is necessary mainly because CPython’s memory management is not thread-safe. As a result, threads in a single Python process cannot use multiple cores in parallel. However, the multiprocessing module sidesteps this problem because each process has its own interpreter and its own GIL.</p><p><strong>Multithreading: I/O bound tasks</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/778/1*niLSW_DbhlD7_Qey8gmiKA.jpeg" /><figcaption><a href="https://www.google.com/imgres?imgurl=https%3A%2F%2Fimg.devrant.com%2Fdevrant%2Frant%2Fr_1541402_ZzPjJ.jpg&amp;imgrefurl=https%3A%2F%2Fdevrant.com%2Frants%2F1541402%2Fmultithreading-in-python&amp;tbnid=nJPsviJuedoqmM&amp;vet=12ahUKEwiJvMyg5tL5AhX-mtgFHXTIA6QQMygAegQIARA5..i&amp;docid=oXfIKtGGON2BBM&amp;w=800&amp;h=698&amp;q=gil%20python%20meme%20tom%20and%20jerry&amp;ved=2ahUKEwiJvMyg5tL5AhX-mtgFHXTIA6QQMygAegQIARA5">Source</a></figcaption></figure><p><strong>Multithreading</strong> is a program execution technique that allows a single process to have multiple code segments (threads) sharing the same CPU and memory. However, because of the GIL in Python, not all tasks can be executed faster by using multithreading. Multiple threads cannot execute code simultaneously, but when one thread is idly waiting, another thread can start executing code.</p><p>Multithreading in Python is well suited to I/O bound tasks, which are tasks whose execution time is primarily bound by the time spent waiting for input and output. 
Examples include downloading data from the Internet and writing data to files.</p><p><strong>Let’s do some hands-on practice-</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/80a942909ca9c33b7a4c1ab34cea7a53/href">https://medium.com/media/80a942909ca9c33b7a4c1ab34cea7a53/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f406a380b60f4fd90cb9f8ff9764ea29/href">https://medium.com/media/f406a380b60f4fd90cb9f8ff9764ea29/href</a></iframe><p><strong>Multiprocessing: CPU Bound tasks</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/395/0*oyReqgnXVGFTBANp.jpeg" /><figcaption><a href="https://medium.com/analytics-vidhya/exploiting-multithreading-and-multiprocessing-in-python-as-a-data-scientist-e2c98b61997a">Source</a></figcaption></figure><p><strong>Multiprocessing</strong> is when multiple processes are spawned from the main process, each with its own memory and able to run on its own CPU core. These additional CPUs help increase the computing speed of the system. Processes and resources are shared among the processors dynamically so that no processor sits idle or gets overloaded.</p><p>Multiprocessing in Python is ideal for CPU-bound tasks, whose execution time is essentially constrained by the speed of the CPU. 
Because the workload is distributed among several CPUs, multiprocessing helps speed up tasks that have a high CPU utilization.</p><p><strong>Let’s do some hands-on practice-</strong></p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/180a5a8406deee907ec6ab9983b84cb1/href">https://medium.com/media/180a5a8406deee907ec6ab9983b84cb1/href</a></iframe><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/9c6efe9ece4a8f3633b8fb16f1c2b2c9/href">https://medium.com/media/9c6efe9ece4a8f3633b8fb16f1c2b2c9/href</a></iframe><p><strong>concurrent.futures</strong></p><p>Python also provides the concurrent.futures module, which offers a simpler interface on top of both the threading and multiprocessing modules. Its ThreadPoolExecutor and ProcessPoolExecutor classes manage thread and process pools and share much of the same interface, which makes switching between multithreading and multiprocessing easier.</p><p><strong>Conclusion</strong></p><p>In today’s article we introduced multitasking and two of the most fundamental concepts in programming, namely concurrency and parallelism, and how they differ when it comes to execution. Furthermore, we discussed threading and multi-processing and explored their main differences as well. 
Finally, we showcased how to implement threaded and multi-processing applications with Python.</p><p><strong>References</strong></p><p><a href="https://towardsdatascience.com/multithreading-multiprocessing-python-180d0975ab29"><em>https://towardsdatascience.com/multithreading-multiprocessing-python-180d0975ab29</em></a></p><p><a href="https://machinelearningmastery.com/multiprocessing-in-python/"><em>https://machinelearningmastery.com/multiprocessing-in-python/</em></a></p><p><a href="https://towardsdatascience.com/multithreading-vs-multiprocessing-in-python-3afeb73e105f"><em>https://towardsdatascience.com/multithreading-vs-multiprocessing-in-python-3afeb73e105f</em></a></p><p><a href="https://www.guru99.com/difference-between-multiprocessing-and-multithreading.html"><em>https://www.guru99.com/difference-between-multiprocessing-and-multithreading.html</em></a></p><p><a href="https://www.youtube.com/watch?v=fKl2JW_qrso&amp;list=WL&amp;index=3&amp;t=1657s"><em>https://www.youtube.com/watch?v=fKl2JW_qrso&amp;list=WL&amp;index=3&amp;t=1657s</em></a></p>]]></content:encoded>
        </item>
    </channel>
</rss>