Stories by Satya Sirisha Bolloju on Medium

Web Scraping Wikipedia Using Python and Selenium

Satya Sirisha Bolloju — Fri, 30 May 2025 12:36:04 GMT

🔍 Why I Built This Project

As part of my coursework in MSc Data Science, we were introduced to web scraping using Selenium. I wanted to go beyond static scraping and understand how automated browsers can interact with dynamic web pages — like clicking, navigating, and reading real data.

So I built a script that uses Selenium WebDriver to scrape structured content from Wikipedia, using real-time interactions.

---

🚀 What This Project Does

- Launches Chrome using Selenium
- Visits a Wikipedia page
- Locates elements using:
 - ID
 - Class name
 - Tag name
 - Link text
 - CSS selector
- Clicks links and navigates pages
- Extracts paragraphs and prints structured content

---

🧠 Technologies Used


| Tool             | Purpose                          |
|------------------|----------------------------------|
| Python           | Core scripting                   |
| Selenium         | Web browser automation           |
| webdriver-manager| Simplified driver installation   |

🛠 Code Highlights


driver.find_element(By.ID, "mp-welcome")
driver.find_element(By.CLASS_NAME, "vector-search-box-input")
driver.find_element(By.TAG_NAME, "p")
driver.find_element(By.LINK_TEXT, "Wikibooks")
driver.find_element(By.CSS_SELECTOR, "#p-logo a")

These five locator strategies cover the most common ways of finding elements in the DOM and are fundamental to automating dynamic sites.
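
Put together, locator calls like these might sit in a script along the following lines. This is a minimal sketch, not the project's actual code: the function names `clean_paragraphs` and `scrape_main_page`, the default URL, and the 10-second wait are assumptions made for illustration. Selenium and webdriver-manager are imported inside the function so that the small text helper can be reused without a browser installed.

```python
def clean_paragraphs(texts):
    """Strip whitespace and drop empty paragraphs."""
    return [t.strip() for t in texts if t and t.strip()]


def scrape_main_page(url="https://en.wikipedia.org/wiki/Main_Page"):
    """Open the page, follow the Wikibooks link and return its paragraphs."""
    # Imported lazily so clean_paragraphs works even without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    # webdriver-manager downloads a matching ChromeDriver on first use.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        # Wait for the welcome banner instead of sleeping for a fixed time.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "mp-welcome"))
        )
        # Follow a link by its visible text, then read the new page.
        driver.find_element(By.LINK_TEXT, "Wikibooks").click()
        paragraphs = driver.find_elements(By.TAG_NAME, "p")
        return clean_paragraphs(p.text for p in paragraphs)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `scrape_main_page()` launches Chrome and returns a list of paragraph strings. The element ID and link text come from the calls shown above and may need updating if Wikipedia changes its markup.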

📚 Educational Context

This project was part of a college module titled:
“Web Scraping Using Selenium in Python”, taught at Symbiosis School of Online and Distance Learning.

It was inspired by this GitHub repo:
🔗 naru94/Web-Scraping-Using-Selenium-in-Python

I extended it with added logging, documentation, and portability.

💼 Real-World Applications

While this project was academic in origin, the same techniques are used in:

- Automating repetitive web tasks (form-fills, click-throughs)
- Collecting dynamic data from dashboards or portals
- Building bots for research or alerts

🔗 GitHub Repository

View the full code, requirements, and setup guide:
👉 github.com

👨‍💻 About Me

I’m currently a Data Analyst at Sheetal Manufacturing Co. and am pursuing my MSc in Data Science from Symbiosis.
I specialize in automating real-world workflows using Python, Excel, APIs, and dashboards.

🔗 LinkedIn

🙌 Let’s Collaborate

Have feedback or want to extend this scraper into something bigger?
Fork the repo or comment below — I’d love to hear your ideas!

Automating Excel Data Comparison with Streamlit and Python

Satya Sirisha Bolloju — Fri, 30 May 2025 06:18:12 GMT

💡 The Real-World Problem

Comparing diamond or inventory data row by row across Excel files is a slow, manual process. Each row might require checking the right reference sheet, finding the correct block, applying formulas, and highlighting differences — all by hand.

Our internal analysis team needed a faster solution.

---

✅ The Solution

I built a Streamlit-based web tool that automates this:

Upload two Excel files:
- A base file with rows needing comparison
- A source file with multiple sheets of reference blocks

The app:
- Fuzzy-matches the correct sheet name for each row
- Finds the matching “weight group”
- Extracts a 12x8 data block
- Pastes it into the base file
- Applies formulas and red/green formatting to highlight differences

Finally, it lets you download the fully updated Excel file.
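
The sheet-name matching step could look roughly like this. To keep the sketch dependency-free it uses the standard library's `difflib` as a stand-in for fuzzywuzzy, so the scores will not be identical to the real tool's. The function name `match_sheet`, the 0.6 cutoff, and the sheet names are assumptions for illustration.

```python
import difflib


def match_sheet(name, sheet_names, cutoff=0.6):
    """Return the sheet name closest to `name`, or None if none is close enough."""
    # Compare case-insensitively but return the original spelling.
    by_key = {s.strip().lower(): s for s in sheet_names}
    hits = difflib.get_close_matches(
        name.strip().lower(), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[hits[0]] if hits else None


sheets = ["Round Brilliant", "Princess", "Oval"]  # made-up sheet names
print(match_sheet("round brillant", sheets))  # misspelled on purpose
print(match_sheet("Emerald", sheets))         # nothing similar in the list
```

Returning `None` below the cutoff lets the app flag a row for manual review instead of silently pasting a block from the wrong sheet. With fuzzywuzzy, `process.extractOne(name, sheet_names)` plays the same role and returns the best match together with its score.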

---

🧰 Tech Stack

| Tool         | Purpose                      |
|--------------|------------------------------|
| Python       | Core logic                   |
| Streamlit    | Web interface                |
| openpyxl     | Excel read/write + formatting|
| fuzzywuzzy   | Sheet name matching          |

---

📊 What It Automates

- Sheet-to-sheet fuzzy matching
- Cell-by-cell comparison formulas
- Block-level copy-paste
- Excel styling and color formatting
- Streamlit-powered UI with instant download

---

📂 Excel Format (Input)

The base Excel file expects:

| B (Shape) | F (Weight Group) | M (Sheet Name) | … | N to W (Output Block) |
|-----------|------------------|----------------|---|------------------------|

Each row gets its matching block pasted and processed visually.
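
As a rough illustration of how one base row leads to a block, the sketch below works on plain row tuples such as those produced by openpyxl's `ws.iter_rows(values_only=True)`. The column positions for B, F and M follow the table above. Everything about the reference sheet is an assumption: the label sitting in column A, the block starting one row down and one column to the right, and "12x8" meaning 12 rows by 8 columns. The names `parse_base_row` and `find_block` are hypothetical.

```python
def parse_base_row(row):
    """Pick shape (B), weight group (F) and sheet name (M) from one row tuple."""
    # Tuples are 0-indexed, so column B is index 1, F is 5 and M is 12.
    return {"shape": row[1], "weight_group": row[5], "sheet_name": row[12]}


def find_block(sheet_rows, weight_group, height=12, width=8):
    """Return the block below the row whose first cell equals `weight_group`.

    Assumed layout: the label sits in column A and the data block starts on
    the next row, one column to the right. Adjust this to the real sheet.
    """
    for i, row in enumerate(sheet_rows):
        if row and row[0] == weight_group:
            block = sheet_rows[i + 1 : i + 1 + height]
            return [list(r[1 : 1 + width]) for r in block]
    return None  # no such weight group on this sheet
```

Keeping the lookup separate from the Excel I/O makes it easy to check against small hand-built lists before pointing it at a real workbook.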

---

📦 Output

The final Excel file contains:
- Pasted data blocks (12x8)
- Green cells = improvement
- Red cells = drop in value
- Formulas to calculate differences
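
The red/green rule can be sketched as a small pure function plus a thin openpyxl wrapper. This is an illustration under assumptions, not the tool's code: the hex colours, the names `diff_color` and `paint`, and the reading of "improvement" as "the new value is higher" are all choices made for the example.

```python
GREEN = "C6EFCE"  # light green fill
RED = "FFC7CE"    # light red fill


def diff_color(base_value, new_value):
    """Return a fill colour for the change, or None when nothing changed."""
    if base_value is None or new_value is None:
        return None  # blank cells are left unstyled
    if new_value > base_value:
        return GREEN  # assumed to mean "improvement"
    if new_value < base_value:
        return RED  # assumed to mean "drop in value"
    return None


def paint(cell, base_value, new_value):
    """Apply the fill to an openpyxl cell (openpyxl is imported lazily)."""
    from openpyxl.styles import PatternFill

    color = diff_color(base_value, new_value)
    if color:
        cell.fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
```

One design point: openpyxl stores a formula as text and does not calculate it, so a static fill needs the two values compared in Python, as here. Excel's own conditional formatting is an alternative when the colours should follow later edits.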

---

🔗 GitHub Project

View full code and install guide:
👉 github.com/yourusername/streamlit-excel-comparator

---

📈 Business Impact

- ✅ Saves hours of cross-sheet copy-pasting
- ✅ Ensures consistency and transparency in comparisons
- ✅ Empowers analysts with an easy-to-use tool

---

🔚 Call to Action

Found this useful or want to collaborate on similar tools?
Let’s connect on LinkedIn or fork the repo on GitHub.

---

🙌 Want to Collaborate?

Fork the repo, comment below, or share how you’d improve this. I’d love to connect!

Automating Market Price Fetching for Diamonds Using Python and Excel

Satya Sirisha Bolloju — Wed, 21 May 2025 11:33:11 GMT

The Problem

At our diamond company, we often need to compare our internal inventory prices with live market listings. Doing this manually, row by row, on platforms like Nivoda is not only time-consuming but also error-prone.

I needed a solution that could automatically:
- Read our internal diamond inventory (Excel)
- Search for each item on [Nivoda](https://www.nivoda.com/)
- Pull the best price and price/ct
- Write it all back to Excel for team analysis

---

🛠️ The Solution: Python Automation

I created a simple but powerful Python tool that does exactly this:

- Reads diamonds from Excel (shape, weight, color, clarity)
- Sends a GraphQL request to the Nivoda API
- Fetches best price & price/ct for each entry
- Writes it directly into the same Excel sheet

This replaced hours of tedious searching with a one-click solution.
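
A GraphQL request over HTTP is a JSON POST carrying a `query` string and a `variables` object, which the sketch below builds and sends. The query text, its field names, the Bearer token scheme, and the function names `build_payload` and `fetch` are placeholders invented for illustration; they are not Nivoda's actual schema or authentication flow, so check the API documentation before adapting this.

```python
# Hypothetical query: the field names are placeholders, not Nivoda's schema.
QUERY = """
query Search($shape: String!, $weight: Float!, $color: String!, $clarity: String!) {
  diamonds(shape: $shape, weight: $weight, color: $color, clarity: $clarity) {
    price
    pricePerCarat
  }
}
"""


def build_payload(shape, weight, color, clarity):
    """Build the JSON body for a GraphQL POST: a query plus its variables."""
    return {
        "query": QUERY,
        "variables": {
            "shape": shape,
            "weight": float(weight),  # Excel may hand back text or int
            "color": color,
            "clarity": clarity,
        },
    }


def fetch(endpoint, token, payload):
    """POST the payload and return the `data` part of the GraphQL response."""
    import requests  # imported lazily; only needed for real calls

    resp = requests.post(
        endpoint,
        json=payload,
        headers={"Authorization": f"Bearer {token}"},  # auth scheme assumed
        timeout=30,
    )
    resp.raise_for_status()
    body = resp.json()
    # GraphQL servers often report failures in an "errors" list with HTTP 200.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]
```

Passing values through `variables` rather than formatting them into the query string avoids quoting problems when a cell contains unexpected characters.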

---

⚙️ Tech Stack

|   Tool   |      Purpose                |
| -------- | --------------------------- |
| Python   | Core scripting logic        |
| requests | API communication           |
| openpyxl | Excel read/write automation |
| GraphQL  | Query format used by Nivoda |

---

📂 Excel Format

Your sheet should look like this:

| B (Shape) | D (Weight) | E (Color) | F (Clarity) | ... | N (Price/ct) | O (Total Price) |
|-----------|------------|-----------|-------------|-----|---------------|------------------|

Each row is processed, searched via API, and updated.
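
The column layout above can be captured in a small mapping so the letters live in one place. This is a sketch under assumptions: the helper names are hypothetical, offers are treated as a plain list of total prices, and "best price" is taken to mean the lowest one.

```python
def col_index(letter):
    """Convert an Excel column letter such as "N" to a 1-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


# Column letters taken from the table above.
READ_COLS = {"shape": "B", "weight": "D", "color": "E", "clarity": "F"}
WRITE_COLS = {"price_per_ct": "N", "total_price": "O"}


def row_to_search(row):
    """Turn one row tuple (0-indexed) into search parameters."""
    return {key: row[col_index(col) - 1] for key, col in READ_COLS.items()}


def best_offer(offers, weight):
    """Pick the cheapest offer and derive its price per carat.

    `offers` is a list of total prices; "best" is assumed to mean lowest.
    """
    if not offers:
        return None  # nothing found for this stone
    total = min(offers)
    return {"total_price": total, "price_per_ct": round(total / weight, 2)}
```

With openpyxl, each result can then be written back with `ws.cell(row=r, column=col_index(WRITE_COLS["price_per_ct"]), value=...)`, and likewise for the total price.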

---

🔗 GitHub Repo

You can view the full project here:
👉 nivoda-excel-fetcher

---

📈 Business Impact

- ✅ Saves ~90% of the time previously spent on manual search
- ✅ Enables instant price comparisons between inventory and market
- ✅ Boosts accuracy and efficiency in procurement decisions

---

Let’s connect:

🔗 LinkedIn | GitHub

---

🙌 Let’s Collaborate

If you liked this post or have suggestions, drop a comment or fork the repo!

How I Automated Diamond Data Extraction from RapNet Using Python and Excel

Satya Sirisha Bolloju — Wed, 23 Apr 2025 17:05:46 GMT

In high-volume data environments, manual processes are bottlenecks. As a Data Analyst, I saw this first-hand while helping my team analyze diamond listings on RapNet. Every search was manual, repetitive, and time-consuming.

So I built a Python-based automation that connected to the RapNet API, pulled filtered data, and exported dynamic Excel files — all with a single run.

The Challenge I Faced

We were spending hours manually applying filters like shape, size, clarity, and color across RapNet’s dashboard. Copying and cleaning this data slowed down decision-making — and made it hard to scale.

🧠 Key Python Concepts Used
- **API authentication** with secure token handling
- **POST requests** and custom headers with `requests`
- **JSON parsing** and nested data extraction
- **Dynamic Excel file creation** using `pandas` and `openpyxl`
- **Config-driven architecture** with `.txt` and `.json` templates
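
Three of these ideas fit in a few lines each: token headers, a JSON filter template with per-run overrides, and flattening nested JSON into rows. The sketch below is illustrative only. The Bearer scheme, the template keys, the sample response item, and the function names are assumptions, not RapNet's actual API.

```python
import json


def auth_headers(token):
    """Headers for a token-authenticated JSON POST (Bearer scheme assumed)."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def load_filters(template_text, **overrides):
    """Parse a JSON filter template and apply per-run overrides."""
    filters = json.loads(template_text)
    filters.update(overrides)
    return filters


def flatten(record, prefix=""):
    """Flatten nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Made-up template and response item, for illustration only.
template = '{"shape": "Round", "size_from": 1.0, "size_to": 1.5}'
print(load_filters(template, color="G"))
print(flatten({"id": 1, "price": {"total": 5000, "per_carat": 4000}}))
```

Reading the token from an environment variable (for example with `os.environ`) keeps it out of the script and the config files. A list of flattened records can be handed to `pandas.DataFrame(rows).to_excel("out.xlsx", index=False)` to produce the Excel output.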

📈 Real-World Results

- Reduced hours of repetitive manual work
- Delivered clean, consistent Excel outputs for every saved search
- Tool is now reusable and adaptable across new diamond filters or teams

You can find the full code and setup guide here: GitHub Repo

🚀 What’s Next?

I’m extending this automation into:

- A Streamlit dashboard for live search interaction
- Future integration with Google Sheets API for cloud collaboration

🔚 Call to Action

Found this useful or want to collaborate on similar tools?
Let’s connect on LinkedIn or fork the repo on GitHub.