Automatically Download Any Kind of Image from Google Using MechanicalSoup
Farkhod Khushvaktov | LinkedIn
I have borrowed some introductory text to make this article understandable for developers of all levels.
Python is one of the most popular languages for web scraping. The flexibility it offers while extracting data from websites is one of the main reasons it is a preferred choice for data extraction. It also has several widely used libraries, such as BeautifulSoup and Selenium, which can be used to build powerful and efficient scrapers.
Most people reading this article may have heard of the terms “Data Extraction” or “Web Scraping.” If you have not come across this yet, don’t worry, as this article is planned for all types of developers who have just started with web scraping or want to gain more information about it.
Web Scraping is the process of extracting a specific set of information from websites in the form of text, videos, images, and links. In today’s world, web scraping is an important skill to learn, as it can be used for a variety of purposes, such as lead generation, price monitoring, SERP monitoring, etc.
Why Python for Web Scraping?
There are many reasons why developers choose Python for web scraping over any other language:
Simple Syntax — Python is one of the simplest programming languages to understand. Even beginners can understand and write scraping scripts due to the clear and easy-to-read syntax.
Extreme Performance — Python provides many powerful libraries for web scraping, such as Requests, Beautiful Soup, Scrapy, Selenium, etc. These libraries can be used for making high-performance and robust scrapers.
Adaptability — Python provides a couple of great libraries that can be utilized for various conditions. You can use Requests for making simple HTTP requests and, on the other end, Selenium for scraping dynamically rendered content.
Let’s continue with the MechanicalSoup web scraping part.
1. Installation
Download and install the latest released version from PyPI:
pip install MechanicalSoup
Download and install the development version from GitHub:
pip install git+https://github.com/MechanicalSoup/MechanicalSoup
Installing from source (installs the version in the current working directory):
pip install .
2. Open the Google page
import mechanicalsoup
browsers = mechanicalsoup.StatefulBrowser()
url = "https://google.com/"
browsers.open(url)
print(browsers.get_url())
We get the response: “https://www.google.com/”
So how do we enter the keyword for the images we want to download? Let’s see.
3. Put the keyword into Google search
As you can see, the search field has name='q':
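keyword = 'Bird'
# A form must be selected before a field can be set.
# Assumption: the search form on the page has action="/search".
browsers.select_form('form[action="/search"]')
browsers['q'] = keyword
# Optional: open the filled-in page in a local browser for inspection
browsers.launch_browser()
response = browsers.submit_selected()
print("new Url", browsers.get_url())
browsers.open(browsers.get_url())
page = browsers.get_current_page()
print(page)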
We entered the new value, Bird, into the search field and submitted the form (launch_browser() opens the filled-in page in a local browser for inspection). We then got the new URL with get_url(), opened it, and finally extracted the HTML code.
4. Get the image src
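To make the "new URL" less mysterious: submitting a GET form with a field named q amounts to requesting the form's action URL with q in the query string. Here is a minimal sketch of that idea using only the standard library. The helper name build_search_url and the base URL https://www.google.com/search are assumptions for illustration, and the URL Google actually returns will usually carry extra parameters.

```python
from urllib.parse import urlencode


def build_search_url(keyword, base="https://www.google.com/search"):
    """Build the URL that a GET form with a single q field would request.

    The base URL is an assumption; the real form may add hidden fields.
    """
    return base + "?" + urlencode({"q": keyword})


print(build_search_url("Bird"))       # https://www.google.com/search?q=Bird
print(build_search_url("blue bird"))  # https://www.google.com/search?q=blue+bird
```

Note that urlencode takes care of escaping, so a keyword with spaces is still safe to use.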
all_page = page.find_all("img")
image_source = []
for image in all_page:
    image = image.get('src')
    image_source.append(image)
image_source
The output is a list of src values, one for each img tag found on the page.
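If you want to see what this step produces without going online, here is a small offline sketch of the same idea. It uses the standard library's html.parser instead of BeautifulSoup, and the HTML snippet and the class name ImgSrcCollector are made up for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (None when it is missing)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # dict.get returns None when the tag has no src attribute
            self.sources.append(dict(attrs).get("src"))


# Made-up HTML, only for illustration
sample_html = """
<div>
  <img src="https://example.com/bird1.jpg" alt="bird">
  <img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=" alt="placeholder">
  <img alt="no source">
</div>
"""

collector = ImgSrcCollector()
collector.feed(sample_html)
print(collector.sources)
```

The list mixes a normal https link, an inline data: value, and None, so not every entry can be downloaded as-is.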
5. Filter src
# Skip missing src values (None) and keep only absolute https URLs
image_source = [image for image in image_source if image and image.startswith("https")]
6. Create a folder
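One detail worth knowing here: get('src') gives None for an img tag without a src attribute, and calling startswith on None raises an error. Below is a small sketch of a filter that tolerates such entries. The function name keep_https_sources and the sample values are hypothetical.

```python
def keep_https_sources(sources):
    """Return only the entries that are non-empty strings starting with "https"."""
    return [src for src in sources if src and src.startswith("https")]


# Made-up src values of the kinds an <img> tag can produce
raw_sources = [
    "https://example.com/bird1.jpg",
    None,                                  # <img> without a src attribute
    "data:image/gif;base64,R0lGODlhAQAB",  # inline placeholder image
    "/images/logo.png",                    # relative URL
]

print(keep_https_sources(raw_sources))  # ['https://example.com/bird1.jpg']
```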
import os
cwd = os.getcwd()
# The folder is named after the keyword, for example "Birds"
path = os.path.join(cwd, keyword + "s")
path
if os.path.exists(path):
    print('Folder already exists')
else:
    os.mkdir(path)
7. Save the images
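As an alternative to checking os.path.exists first, os.makedirs with exist_ok=True creates the folder only when it is missing. A minimal sketch follows. The helper name ensure_folder is hypothetical, and a temporary directory is used so the example leaves nothing behind.

```python
import os
import tempfile


def ensure_folder(base_dir, keyword):
    """Create <base_dir>/<keyword>s if it is missing and return its path."""
    path = os.path.join(base_dir, keyword + "s")
    # exist_ok=True makes this a no-op when the folder is already there
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrated inside a temporary directory
with tempfile.TemporaryDirectory() as base:
    first = ensure_folder(base, "Bird")
    second = ensure_folder(base, "Bird")  # calling it again does not raise
    created_ok = os.path.isdir(first)

print(created_ok, first == second)  # True True
```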
import requests
counter = 0
for image in image_source:
    save_as = os.path.join(path, keyword + str(counter) + ".jpg")
    r = requests.get(image)
    with open(save_as, 'wb') as f:
        f.write(r.content)
    counter += 1
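A single failed download would stop the loop above, so it can help to skip bad links and carry on. Here is a hedged sketch of that variation. The names save_images and fake_fetch are hypothetical, and the download itself is passed in as a function so the example can run offline.

```python
import os
import tempfile


def save_images(image_urls, folder, keyword, fetch):
    """Save each URL's bytes as <keyword><n>.jpg in folder; return the saved paths.

    fetch is a callable that takes a URL and returns bytes, or raises on failure.
    """
    saved = []
    for counter, url in enumerate(image_urls):
        try:
            data = fetch(url)
        except Exception as error:
            # Skip images that cannot be downloaded instead of stopping the loop
            print("skipped", url, error)
            continue
        save_as = os.path.join(folder, keyword + str(counter) + ".jpg")
        with open(save_as, "wb") as f:
            f.write(data)
        saved.append(save_as)
    return saved


def fake_fetch(url):
    """Stand-in for a real download so the sketch runs offline."""
    if "broken" in url:
        raise IOError("download failed")
    return b"fake image bytes"


urls = [
    "https://example.com/a.jpg",
    "https://example.com/broken.jpg",
    "https://example.com/c.jpg",
]
with tempfile.TemporaryDirectory() as folder:
    saved_paths = save_images(urls, folder, "Bird", fake_fetch)
    saved_names = [os.path.basename(p) for p in saved_paths]

print(saved_names)  # ['Bird0.jpg', 'Bird2.jpg']
```

For real downloads, fetch could be a small function that calls requests.get(url, timeout=10), then raise_for_status() on the response, and returns its content.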
Thank you to all readers.
See you in the next post!