<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Peng Xie on Medium]]></title>
        <description><![CDATA[Stories by Peng Xie on Medium]]></description>
        <link>https://medium.com/@pengxie_81077?source=rss-4b06657d384f------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*6y6CoNqvWccyk_TN</url>
            <title>Stories by Peng Xie on Medium</title>
            <link>https://medium.com/@pengxie_81077?source=rss-4b06657d384f------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 27 May 2026 08:26:34 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@pengxie_81077/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Building an AI-Powered Data Lake Monitor: How We Automated Failure Detection for Houston’s…]]></title>
            <link>https://medium.com/@pengxie_81077/building-an-ai-powered-data-lake-monitor-how-we-automated-failure-detection-for-houstons-81f3b0d5d6f6?source=rss-4b06657d384f------2</link>
            <guid isPermaLink="false">https://medium.com/p/81f3b0d5d6f6</guid>
            <category><![CDATA[aws-lambda]]></category>
            <category><![CDATA[automation]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[data-lake]]></category>
            <dc:creator><![CDATA[Peng Xie]]></dc:creator>
            <pubDate>Wed, 20 Aug 2025 18:50:19 GMT</pubDate>
            <atom:updated>2025-08-20T18:50:19.833Z</atom:updated>
            <content:encoded><![CDATA[<h3>Building an AI-Powered Data Lake Monitor: How We Automated Failure Detection for Houston’s Wastewater Infrastructure</h3><p><em>How we built an intelligent monitoring system that uses Claude 3.5 to automatically detect, analyze, and report data processing failures in AWS</em></p><h3>The Challenge: Monitoring Critical Data Infrastructure</h3><p>In the world of municipal infrastructure, data isn’t just about analytics; it’s about public safety, regulatory compliance, and operational efficiency. At the Houston Water Department, our Wastewater Infrastructure Program (WWIP) processes terabytes of sensor data daily through AWS Glue jobs and Lambda functions. When these processes fail, the consequences can range from delayed regulatory reporting to missed critical infrastructure alerts.</p><p><strong>The Problem</strong>: Traditional monitoring approaches left us with:</p><ul><li>Manual error investigation consuming hours of engineering time</li><li>Delayed failure detection leading to cascading data quality issues</li><li>Complex error logs requiring specialized knowledge to interpret</li><li>No automated way to prioritize and summarize failures for stakeholders</li></ul><p><strong>The Solution</strong>: We built an AI-powered monitoring system that automatically detects, analyzes, and reports failures using Amazon Bedrock’s Claude 3.5 Sonnet.</p><h3>Architecture Overview: Serverless Intelligence</h3><p>Our solution leverages AWS’s serverless ecosystem to create a fully automated monitoring pipeline:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ZBn8yjwGkR7P3OzmPxL5IQ.png" /></figure><h3>Key Design Principles</h3><ol><li><strong>Serverless First</strong>: No infrastructure to manage, automatic scaling</li><li><strong>AI-Enhanced</strong>: Claude 3.5 provides intelligent error summarization</li><li><strong>Zero Configuration</strong>: Automatically discovers WWIP-related 
resources</li><li><strong>Professional Reporting</strong>: HTML-formatted emails with actionable insights</li></ol><h3>The Technical Implementation</h3><h3>1. Dynamic Resource Discovery</h3><p>Instead of hardcoding job names, our system automatically discovers all WWIP-related resources:</p><pre>import boto3<br><br><br>def get_job_names():<br>    &quot;&quot;&quot;<br>    Get the names of Glue jobs and Lambda functions that start with &#39;wwip&#39;.<br>    Returns a tuple containing lists of job names and function names.<br>    &quot;&quot;&quot;<br>    glue_job_names = load_glue_job()<br>    # load_lambda_function() is not shown here; per the docstring above,<br>    # it applies the same prefix filter to Lambda functions<br>    lambda_function_names = load_lambda_function()<br><br>    return glue_job_names, lambda_function_names<br><br><br>def load_glue_job() -&gt; list:<br>    &quot;&quot;&quot;Return the names of all Glue jobs whose name starts with &#39;wwip&#39;.&quot;&quot;&quot;<br>    glue_client = boto3.client(&quot;glue&quot;)<br>    job_names = []<br><br>    # Paginate so that accounts with many jobs are fully scanned<br>    paginator = glue_client.get_paginator(&quot;get_jobs&quot;)<br>    for page in paginator.paginate():<br>        for job in page[&quot;Jobs&quot;]:<br>            job_name = job[&quot;Name&quot;]<br>            # only select jobs whose name starts with &#39;wwip&#39; (case-insensitive)<br>            if job_name.lower().startswith(&quot;wwip&quot;):<br>                job_names.append(job_name)<br><br>    return job_names</pre><p>This approach removes the need to maintain a job list by hand: any new resource is picked up automatically as long as its name starts with “wwip”.</p><h3>2. 
Comprehensive Error Collection</h3><p>Our GlueErrorFetcher and LambdaErrorFetcher classes collect detailed error information for each failed run. The Glue fetcher looks like this:</p><pre>class GlueErrorFetcher:<br>    # Assumes __init__ (not shown) sets self.glue, self.logs_client,<br>    # self.error_log_group_name, and a timezone-aware self.start_time.<br>    def fetch_failed_runs(self, job_name):<br>        &quot;&quot;&quot;Fetch failed Glue job runs and their CloudWatch logs for a single job.&quot;&quot;&quot;<br>        failed_runs_info = {}<br>        response = self.glue.get_job_runs(JobName=job_name, MaxResults=50)<br><br>        for run in response[&#39;JobRuns&#39;]:<br>            if run[&#39;StartedOn&#39;] &gt;= self.start_time and run[&#39;JobRunState&#39;] == &#39;FAILED&#39;:<br>                run_id = run[&#39;Id&#39;]<br>                started_on = run[&#39;StartedOn&#39;].strftime(&#39;%Y-%m-%d %H:%M:%S %Z&#39;)<br>                glue_error_message = run.get(&#39;ErrorMessage&#39;, &#39;No error message available&#39;)<br><br>                # Find the correct CloudWatch log stream<br>                streams = self.logs_client.describe_log_streams(<br>                    logGroupName=self.error_log_group_name,<br>                    logStreamNamePrefix=f&quot;{run_id}&quot;<br>                )[&#39;logStreams&#39;]<br><br>                # Default value so the name is always bound, even when no log stream exists<br>                detailed_errors = &quot;No detailed logs available&quot;<br>                if streams:<br>                    log_stream_name = streams[0][&#39;logStreamName&#39;]<br>                    events = self.logs_client.filter_log_events(<br>                        logGroupName=self.error_log_group_name,<br>                        logStreamNames=[log_stream_name]<br>                    )<br>                    detailed_errors = &quot;\n&quot;.join(e[&#39;message&#39;] for e in events[&#39;events&#39;])<br><br>                failed_runs_info[run_id] = {<br>                    &quot;started_on&quot;: started_on,<br>                    &quot;glue_error_message&quot;: glue_error_message,<br>                    &quot;detailed_errors&quot;: detailed_errors<br>                }<br><br>        return failed_runs_info</pre><p>The two services write their logs to different places:</p><ul><li><strong>AWS Lambda logs</strong> → Stored in <strong>CloudWatch 
Logs</strong> under the log group:</li></ul><pre>/aws/lambda/{function_name}</pre><ul><li><strong>AWS Glue job logs</strong> → Stored in <strong>CloudWatch Logs</strong> under:</li></ul><pre>/aws-glue/python-jobs/error</pre><p>This distinction is important when troubleshooting, because people often expect both services to use the same log group structure, but Glue has a dedicated one. Note that the group above is the default error log group for Glue Python shell jobs; Glue Spark jobs write errors to /aws-glue/jobs/error by default.</p><h3>3. AI-Powered Error Analysis</h3><p>The heart of our system is Claude 3.5’s ability to transform raw error logs into actionable insights:</p><pre>import json<br><br>import boto3<br><br># Assumed setup: a Bedrock Runtime client in a region where the model is enabled<br>bedrock_client = boto3.client(&quot;bedrock-runtime&quot;)<br><br><br>def summarize_errors_with_llm(errors_dict, job_type):<br>    &quot;&quot;&quot;<br>    Send the combined errors to Amazon Bedrock Claude 3.5 for summarization.<br>    &quot;&quot;&quot;<br>    # The Messages API carries the conversation roles, so the prompt does not<br>    # need the legacy &quot;Human:&quot; / &quot;Assistant:&quot; markers.<br>    prompt_text = f&quot;&quot;&quot;Please summarize the causes of the following {job_type} job failures:<br><br>    {json.dumps(errors_dict, indent=2)}<br><br>    You don&#39;t have to provide the solution, just summarize the causes of the failures.<br>    &quot;&quot;&quot;<br><br>    native_request = {<br>        &quot;anthropic_version&quot;: &quot;bedrock-2023-05-31&quot;,<br>        &quot;max_tokens&quot;: 500,<br>        &quot;temperature&quot;: 0.5,<br>        &quot;messages&quot;: [<br>            {<br>                &quot;role&quot;: &quot;user&quot;,<br>                &quot;content&quot;: [{&quot;type&quot;: &quot;text&quot;, &quot;text&quot;: prompt_text}],<br>            }<br>        ],<br>    }<br><br>    response = bedrock_client.invoke_model(<br>        modelId=&quot;anthropic.claude-3-5-sonnet-20240620-v1:0&quot;,<br>        body=json.dumps(native_request),<br>    )<br><br>    result = json.loads(response[&quot;body&quot;].read().decode(&quot;utf-8&quot;))<br>    # &quot;content&quot; is a list of blocks; guard against a missing or empty list<br>    content_blocks = result.get(&quot;content&quot;) or [{}]<br>    summary = content_blocks[0].get(&quot;text&quot;, &quot;&quot;).strip()<br><br>    return summary</pre><p>For more details on how to call the Claude 3.5 API, you can refer to:</p><p><a 
href="https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-runtime_example_bedrock-runtime_InvokeModel_AnthropicClaude_section.html">https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-runtime_example_bedrock-runtime_InvokeModel_AnthropicClaude_section.html</a></p><h3>4. Professional Email Reporting</h3><p>We generate HTML-formatted emails that provide clear, actionable information:</p><pre>import html<br><br>import boto3<br><br><br>def format_error_message_and_send_emails(errors, list_of_recipients):<br>    html_lines = []<br>    html_lines.append(&quot;&quot;&quot;<br>    &lt;html&gt;<br>      &lt;body style=&quot;font-family:Arial,sans-serif; line-height:1.5; color:#333;&quot;&gt;<br>        &lt;h2 style=&quot;color:#D32F2F;&quot;&gt;🚨 AWS Job Failure Report 🚨&lt;/h2&gt;<br>    &quot;&quot;&quot;)<br><br>    # Glue errors (escape the message so characters like &lt; and &gt; in logs cannot break the markup)<br>    if errors.get(&quot;glue_errors&quot;):<br>        html_lines.append(&#39;&lt;h3 style=&quot;color:#1976D2;&quot;&gt;Glue Job Errors&lt;/h3&gt;&lt;ul&gt;&#39;)<br>        for job, message in errors[&quot;glue_errors&quot;].items():<br>            html_lines.append(f&quot;&lt;li&gt;&lt;strong&gt;Job:&lt;/strong&gt; {job}&lt;br&gt;&lt;pre style=&#39;background:#f2f2f2; padding:10px; border-radius:5px;&#39;&gt;{html.escape(str(message))}&lt;/pre&gt;&lt;/li&gt;&quot;)<br>        html_lines.append(&quot;&lt;/ul&gt;&quot;)<br><br>    # Lambda errors<br>    if errors.get(&quot;lambda_errors&quot;):<br>        html_lines.append(&#39;&lt;h3 style=&quot;color:#388E3C;&quot;&gt;Lambda Job Errors&lt;/h3&gt;&lt;ul&gt;&#39;)<br>        for job, message in errors[&quot;lambda_errors&quot;].items():<br>            html_lines.append(f&quot;&lt;li&gt;&lt;strong&gt;Job:&lt;/strong&gt; {job}&lt;br&gt;&lt;pre style=&#39;background:#f2f2f2; padding:10px; border-radius:5px;&#39;&gt;{html.escape(str(message))}&lt;/pre&gt;&lt;/li&gt;&quot;)<br>        html_lines.append(&quot;&lt;/ul&gt;&quot;)<br><br>    html_lines.append(&quot;&quot;&quot;<br>        
&lt;p style=&quot;font-size:0.9em; color:#666;&quot;&gt;This is an automated message from AWS SES.&lt;/p&gt;<br>      &lt;/body&gt;<br>    &lt;/html&gt;<br>    &quot;&quot;&quot;)<br><br>    email_body_html = &quot;\n&quot;.join(html_lines)<br><br>    # Send using AWS SES<br>    ses = boto3.client(&quot;ses&quot;, region_name=&quot;us-east-1&quot;)<br>    response = ses.send_email(<br>        Source=&quot;xphn1985@gmail.com&quot;,<br>        Destination={&quot;ToAddresses&quot;: list_of_recipients},<br>        Message={<br>            &quot;Subject&quot;: {&quot;Data&quot;: &quot;AWS Error Report&quot;, &quot;Charset&quot;: &quot;UTF-8&quot;},<br>            &quot;Body&quot;: {&quot;Html&quot;: {&quot;Data&quot;: email_body_html, &quot;Charset&quot;: &quot;UTF-8&quot;}}<br>        }<br>    )<br><br>    return email_body_html</pre><h3>The Workflow: From Detection to Action</h3><h3>1. Automated Resource Discovery</h3><p>Every execution starts by scanning AWS for WWIP-related resources:</p><ul><li><strong>Glue Jobs</strong>: All jobs with names starting with “wwip”</li><li><strong>Lambda Functions</strong>: All functions with names starting with “wwip”</li></ul><h3>2. Failure Detection &amp; Collection</h3><p>For each resource, we collect comprehensive error information:</p><ul><li><strong>Job Metadata</strong>: Execution timestamps, error messages, run states</li><li><strong>CloudWatch Logs</strong>: Detailed error logs and stack traces</li><li><strong>Temporal Filtering</strong>: Configurable lookback periods (1–2 days by default)</li></ul><h3>3. 
AI-Powered Analysis</h3><p>Claude 3.5 processes the raw error data to:</p><ul><li><strong>Identify Root Causes</strong>: Distinguish between configuration, data, and infrastructure issues</li><li><strong>Remove Duplicates</strong>: Consolidate similar errors across multiple runs</li><li><strong>Provide Context</strong>: Explain the business impact of failures</li><li><strong>Prioritize Issues</strong>: Highlight critical vs. non-critical failures</li></ul><h3>4. Professional Reporting</h3><p>Automated email reports include:</p><ul><li><strong>Visual Hierarchy</strong>: Color-coded sections for different service types</li><li><strong>Actionable Content</strong>: AI-summarized insights for quick understanding</li><li><strong>Professional Formatting</strong>: HTML emails that work across all clients</li></ul><h3>Business Impact: Measurable Results</h3><h3>Operational Efficiency</h3><ul><li><strong>90% Reduction</strong> in manual error investigation time</li><li><strong>Immediate Detection</strong> of failures (vs. 
hours of delay)</li><li><strong>Automated Prioritization</strong> of issues requiring attention</li><li><strong>Consistent Reporting</strong> format for all stakeholders</li></ul><h3>Data Quality Improvements</h3><ul><li><strong>Proactive Monitoring</strong> prevents cascading data quality issues</li><li><strong>Faster Resolution</strong> of critical infrastructure problems</li><li><strong>Reduced MTTR</strong> (Mean Time To Resolution) for data processing failures</li><li><strong>Enhanced Visibility</strong> into system health and performance</li></ul><h3>Cost Optimization</h3><ul><li><strong>Preventive Maintenance</strong> avoids costly data processing failures</li><li><strong>Resource Efficiency</strong> through intelligent error analysis</li><li><strong>Reduced Operational Overhead</strong> through automation</li><li><strong>Scalable Solution</strong> that grows with infrastructure needs</li></ul><h3>Lessons Learned: Building Production AI Systems</h3><h3>1. Prompt Engineering Matters</h3><p>Our initial prompts were too generic. We learned to:</p><ul><li><strong>Be Specific</strong>: Ask for causes, not solutions</li><li><strong>Provide Context</strong>: Include job type and failure patterns</li><li><strong>Set Boundaries</strong>: Limit response length and focus</li><li><strong>Iterate Quickly</strong>: Test prompts with real error data</li></ul><h3>2. Error Handling is Critical</h3><p>AI systems can fail in unexpected ways:</p><ul><li><strong>Graceful Degradation</strong>: Fall back to raw error messages if AI fails</li><li><strong>Timeout Management</strong>: Set appropriate limits for AI processing</li><li><strong>Error Logging</strong>: Capture AI failures for continuous improvement</li><li><strong>Retry Logic</strong>: Handle transient AI service issues</li></ul><h3>3. 
Security and Privacy</h3><p>When processing error logs with AI:</p><ul><li><strong>Data Sanitization</strong>: Remove sensitive information before AI processing</li><li><strong>Access Controls</strong>: Limit who can trigger AI analysis</li><li><strong>Audit Logging</strong>: Track all AI interactions for compliance</li><li><strong>Encryption</strong>: Ensure data is encrypted in transit and at rest</li></ul><h3>4. Monitoring the Monitor</h3><p>Our monitoring system needs its own oversight:</p><ul><li><strong>Self-Monitoring</strong>: Track the health of our monitoring Lambda</li><li><strong>Performance Metrics</strong>: Monitor AI processing times and costs</li><li><strong>Accuracy Validation</strong>: Periodically review AI-generated summaries</li><li><strong>Feedback Loops</strong>: Incorporate user feedback to improve prompts</li></ul><h3>Future Enhancements: Scaling Intelligence</h3><h3>Planned Features</h3><ul><li><strong>Slack Integration</strong>: Real-time notifications for critical failures</li><li><strong>Web Dashboard</strong>: Visual monitoring interface with historical trends</li><li><strong>Predictive Alerts</strong>: Use AI to predict potential failures before they occur</li><li><strong>Custom Filters</strong>: Allow users to define their own error detection rules</li><li><strong>Multi-Region Support</strong>: Monitor resources across multiple AWS regions</li></ul><h3>Advanced AI Capabilities</h3><ul><li><strong>Trend Analysis</strong>: Identify patterns in recurring failures</li><li><strong>Root Cause Prediction</strong>: Suggest likely causes based on error patterns</li><li><strong>Automated Remediation</strong>: Suggest or execute fixes for common issues</li><li><strong>Natural Language Queries</strong>: Allow stakeholders to ask questions about system health</li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Setting up AWS Glue for Local Development with PyCharm and Docker on Windows]]></title>
            <link>https://medium.com/@pengxie_81077/setting-up-aws-glue-for-local-development-with-pycharm-and-docker-on-windows-086c53c0de69?source=rss-4b06657d384f------2</link>
            <guid isPermaLink="false">https://medium.com/p/086c53c0de69</guid>
            <dc:creator><![CDATA[Peng Xie]]></dc:creator>
            <pubDate>Thu, 14 Aug 2025 15:21:12 GMT</pubDate>
            <atom:updated>2025-08-14T15:40:04.202Z</atom:updated>
            <content:encoded><![CDATA[<p>As a data engineer, I sometimes need to develop PySpark applications for AWS Glue. While it’s possible to do this directly in the AWS console, it’s far less convenient than developing locally. It took me many hours to figure out how to set up a local environment with Docker to run PySpark. This guide walks you through configuring AWS Glue locally using Docker with PyCharm on a Windows machine. The setup uses AWS Glue version 5.0, which was the latest version at the time of writing.</p><p>This guide follows the official AWS documentation, and it also covers additional crucial steps for Glue version 5.0 that the official documentation does not fully describe.</p><h3>Prerequisites</h3><p>Before starting, ensure you have the following installed and configured:</p><ul><li>Docker</li><li>PyCharm Professional</li><li>An AWS account with an IAM identity (user or role) configured</li></ul><h4>High-Level Setup Steps</h4><p>The configuration process involves five main steps:</p><p>1. Pulling the AWS Glue Docker image.</p><p>2. Configuring the Docker daemon settings.</p><p>3. Configuring the PyCharm Python interpreter to use Docker.</p><p>4. Mounting your AWS credentials into the Docker container.</p><p>5. Configuring the AWS connection in PyCharm.</p><h3>Detailed Setup Instructions</h3><h4>Step 1: Pull the AWS Glue 5.0 Docker Container Image</h4><p>Open your command line and execute the following command to pull the AWS Glue 5.0 Docker image:</p><pre>docker pull public.ecr.aws/glue/aws-glue-libs:5</pre><h4>Step 2: Configure Docker Daemon Settings</h4><p>1. Right-click the Docker application icon and navigate to <strong>Settings</strong>.</p><p>2. Ensure that the option “<strong>Expose daemon on tcp://localhost:2375 without TLS</strong>” is selected. <strong>Note</strong>: This step is specifically required for Windows machines and may not be necessary for Mac.</p><p>3. 
If you changed this option, click <strong>Apply &amp; Restart</strong> to restart Docker.</p><h4>Step 3: Configure PyCharm Python Interpreter to Leverage Docker</h4><p>1. In PyCharm, go to <strong>File &gt; Settings</strong>.</p><p>2. Under your project settings, select <strong>Python Interpreter</strong>.</p><p>3. Click <strong>Add Interpreter</strong> and choose <strong>Docker</strong>.</p><p>4. Select <strong>Pull or use existing</strong>.</p><p>5. In the Image field, type and select the image you pulled in Step 1 (public.ecr.aws/glue/aws-glue-libs:5).</p><p>6. Click <strong>Next</strong>, then <strong>Next</strong> again.</p><p>7. Ensure <strong>System interpreter</strong> is selected, then click <strong>Create</strong>.</p><h4>Step 4: Edit Docker Container Settings for Credential Files</h4><p>This step ensures your AWS credentials are available within the Docker container.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/978/1*WaYMq770M92qXiur80Dhng.png" /></figure><p>1. Open the <strong>Run/Debug Configurations</strong> window (<strong>Run &gt; Edit Configurations</strong>) and locate the Docker container settings (usually represented by a folder icon).</p><p>2. Under Volume bindings, you will see Host path and Container path.</p><p>3. <strong>Add a new volume binding</strong> to map your local AWS credentials folder to the container.</p><p>◦ For the <strong>Host path</strong>, specify the exact location of your AWS credentials folder on your local machine (e.g., C:\Users\YourUser\.aws).</p><p>◦ For the <strong>Container path</strong>, set it to /root/.aws.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/652/1*dAD6-te8JNYZahnZxAp_XQ.png" /></figure><h4>Step 5: Configure AWS Connection in PyCharm</h4><p>1. In the <strong>Run/Debug Configurations</strong> window, go to the <strong>AWS Connection</strong> tab.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/984/1*uVKDvvBrHNwEoVkodBVGWg.png" /></figure><p>2. By default, it might be set to None.</p><p>3. 
Choose <strong>Other credentials profile / region</strong>.</p><p>4. <strong>Select the correct AWS profile</strong> that has the necessary credentials for your AWS account.</p><p>5. <strong>Ensure your IAM role has the required permissions</strong> to interact with data in AWS, such as s3:GetObject and s3:ListBucket for accessing S3 data.</p><p>6. <strong>Set your AWS region</strong>.</p><p>7. Click <strong>Apply</strong> and then <strong>OK</strong>.</p><h3><strong>Bonus: Debugging in PyCharm with Pandas DataFrames</strong></h3><p>◦ At the time of writing, PyCharm’s data viewer could not display PySpark DataFrames directly.</p><p>◦ To view data, convert your PySpark DataFrame to a pandas DataFrame using .toPandas(). Because .toPandas() collects every row onto the driver, apply .limit() first when the dataset is large.</p><p>◦ Set a breakpoint and run the script in <strong>debugger mode</strong>.</p><p>◦ Once the debugger stops at your breakpoint, you can click on the pandas DataFrame variable and select <strong>View DataFrame</strong> to inspect the data directly within PyCharm.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Publish Docker Images to AWS ECR from a Windows System Successfully]]></title>
            <link>https://medium.com/@pengxie_81077/how-to-publish-docker-images-to-aws-ecr-from-a-windows-system-successfully-078ada9d9cff?source=rss-4b06657d384f------2</link>
            <guid isPermaLink="false">https://medium.com/p/078ada9d9cff</guid>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[windows]]></category>
            <category><![CDATA[aws]]></category>
            <category><![CDATA[docker-compose]]></category>
            <category><![CDATA[aws-ecr]]></category>
            <dc:creator><![CDATA[Peng Xie]]></dc:creator>
            <pubDate>Wed, 23 Jul 2025 16:20:39 GMT</pubDate>
            <atom:updated>2025-08-01T16:31:20.874Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*X2IqPjyledXDFsr4p1-fHA.png" /></figure><h3>Background</h3><p>Recently, I transitioned my AWS work environment from Linux to Windows, a move that significantly disrupted several of my data pipelines. One major challenge was publishing Docker images to AWS Elastic Container Registry (ECR), as the change in operating systems brought unexpected compatibility hurdles and necessitated workflow adjustments. In this article, I’ll share my journey of successfully publishing Docker images to ECR from a Windows environment, along with practical tips for overcoming common issues and streamlining the process.</p><h3>Publishing a Docker Image to AWS ECR: General Step-by-Step Procedure</h3><p>To publish a Docker image to AWS Elastic Container Registry (ECR) from a Windows system, follow these key steps. First, ensure you have the AWS CLI and Docker Desktop installed and configured. Create an ECR repository using the AWS Management Console or the following CLI command:</p><pre>aws ecr create-repository --repository-name &lt;repo-name&gt;</pre><p>Next, authenticate Docker to your ECR registry by running:</p><pre>aws ecr get-login-password --region &lt;region&gt; | docker login --username AWS --password-stdin &lt;aws-account-id&gt;.dkr.ecr.&lt;region&gt;.amazonaws.com</pre><p>Build your Docker image locally with:</p><pre>docker build -t &lt;image-name&gt; .</pre><p>Then tag it for ECR using:</p><pre>docker tag &lt;image-name&gt;:latest &lt;aws-account-id&gt;.dkr.ecr.&lt;region&gt;.amazonaws.com/&lt;repo-name&gt;:latest</pre><p>Finally, push the image to ECR with:</p><pre>docker push &lt;aws-account-id&gt;.dkr.ecr.&lt;region&gt;.amazonaws.com/&lt;repo-name&gt;:latest</pre><p>For more details, see <a href="https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html"><em>Pushing a Docker image to an Amazon ECR private repository - Amazon 
ECR</em></a></p><h3>Potential Issue: Docker Manifest V1 or V2</h3><p>AWS ECR supports the Docker Image Manifest V2, but older Docker images or tools might default to the deprecated Manifest V1, leading to push failures with errors like “manifest invalid” or “unsupported media type.”</p><p>The screenshot below displays information about a Docker image uploaded to AWS ECR, featuring a V1 manifest format.</p><figure><img alt="An example of a Docker image using a V1 manifest in AWS ECR" src="https://cdn-images-1.medium.com/max/755/1*xG4PYyyKQ7-aTnrz5-iReg.png" /><figcaption>An example of a Docker image using a V1 manifest in AWS ECR</figcaption></figure><p>The artifact type detail indicates the use of a V1 manifest format, with an additional image listed at 0 KB alongside the pushed image. This is likely due to the container image being uploaded as a multi-architecture image component, referred to as a <a href="https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/">manifest list or image index</a>. For further insights, see the explanation in <a href="https://stackoverflow.com/questions/77207485/why-are-there-extra-untagged-images-in-amazon-ecr-after-doing-docker-push">“Why are there extra untagged ‘images’ in Amazon ECR after doing docker push?” on Stack Overflow</a>.</p><h4>Solution:</h4><ol><li>Ensure Docker Desktop is updated to a recent version that defaults to Manifest V2, Schema 2.</li><li>Change the docker build command to:</li></ol><pre>docker build --platform linux/amd64 --provenance=false -t docker-image:test .</pre><ul><li><strong>--platform linux/amd64</strong>: Specifies the target platform for the image, ensuring it’s built for a Linux AMD64 architecture. 
This is particularly useful on Windows systems to ensure compatibility with the AWS services that pull images from ECR, such as ECS or EKS, which mostly expect Linux-based images.</li><li><strong>--provenance=false</strong>: Disables the generation of provenance metadata, which is part of Docker’s BuildKit (available in newer Docker versions). Provenance metadata provides build attestations, but setting this to false avoids including this metadata, which can be useful if ECR or your deployment environment doesn’t support it or if you want to reduce image metadata size.</li></ul>]]></content:encoded>
        </item>
    </channel>
</rss>