Image web crawler with PHP

Vo Tinh Thuong
Published in Quick Code
4 min read, Jan 28, 2018

Every time I want to give up, a voice deep inside me whispers, very softly: “Just try a little more, and you will have anything you want!”

Learning is a long, hard road when you have nothing but yourself. But when you use the knowledge you gained along the way to build something from scratch, it becomes very interesting!

A year ago I got an idea about how to download all the images from a given link. It could be a news website, or maybe a photo site. Instead of clicking “Save image as…” for every single image the page contains, why not use something that downloads them all at once?

Want to see my complete code? Check here!

https://github.com/votinhthuong/crawler_image_php

Luckily, I have learned PHP and can program a little with it. So, let’s try!

Think I will demonstrate this tutorial with my idol? Nope! You’re wrong!

I want to make a tool that works not just for one website, but for any website I want to get images from. So I viewed the source of some sites and realized that the HTML tag used for images is <img src="…">. Let’s look at one to confirm that!

As you can see:

<img alt="Hot girl 10X cao 1,74 m, van hay chu dep, rat thich Son Tung M-TP hinh anh 1" src="https://znews-photo-td.zadn.vn/w1024/Uploaded/kcwvouvs/2017_08_16/1_zing.jpg" width="5431" height="3621" style="height: 345px; width: 517.452px;">

Now I am sure about the tag that contains an image. Because this works with the web, a server-side scripting language such as PHP is a good choice for me. A few words about the title of this article: on the Internet, many people call the act of collecting a multitude of images from websites “web scraping”. But there is another name for this concept, “web crawler”. I still get confused about these two terms! =.=

After checking some dictionaries, I decided to use “Image web crawler” instead of “Image web scraping”. The reason is that “scraping” sounds less polite than “crawler”. That is just my opinion, not an international convention!

Back to the main content of this article. I didn’t want to build literally everything from scratch. That would take too much time, no matter how hard I tried. So I searched around the Internet and discovered a PHP library for this purpose called “PHP Simple HTML DOM Parser”. I can’t tell you how excited I was when I found it! The author of this awesome library even wrote a manual for anyone who wants to use it.


I have to design an interface for my web tool. In this tutorial I don’t care much about how it looks, so I just use some basic HTML tags and no CSS. It’s not important!

<form id="form1" name="form1" method="post" action="">
<table width="700" border="1" align="center" cellpadding="1" cellspacing="1">
<tr>
<td colspan="2">
<label for="textfield"></label>
<input style="width:100%;" type="text" name="url" id="textfield" />
</td>
</tr>
<tr>
<td colspan="2" align="center" valign="middle">
<input type="submit" name="submit" id="button" value="Submit" />
</td>
</tr>
</table>
</form>

Just remember the names of two fields, because the PHP code reads them: the text input is named “url” and the button is named “submit”. The rest doesn’t matter.

In the processing code, I put everything inside a check that the “submit” button has been clicked.

if (isset($_POST['submit'])) {
    // code here…
}

But first of all, you must include() or require_once() the library in this page. Without it, you will be flooded with errors.

include('simple_html_dom.php');

In this step, I create a DOM from a URL or a file (in this case, from a URL).

$url = file_get_html($_POST['url']);
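Since the URL comes straight from a form field, it may be worth checking it before handing it to file_get_html(). The following is only a minimal sketch under my own assumptions: isFetchableUrl() is a hypothetical helper name, not part of the original tool or of the library, and it only accepts http and https links.

```php
<?php
// Hypothetical helper (not in the original tool): accept only http(s) URLs.
function isFetchableUrl(string $input): bool
{
    $url = trim($input);

    // filter_var() returns false when the string is not a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Reject schemes such as ftp: or file:, which a web image crawler does not need.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));

    return in_array($scheme, ['http', 'https'], true);
}

var_dump(isFetchableUrl('https://example.com/news/article.html'));
var_dump(isFetchableUrl('not a url'));
```

Inside the submit handler, the call to file_get_html() could then be wrapped in if (isFetchableUrl($_POST['url'])) { … }.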

After that, I find all the <img> tags in the page.

$image = $url->find("img");
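To make it clearer what this lookup does, here is a rough equivalent written with PHP’s built-in DOM extension instead of PHP Simple HTML DOM Parser. It is a sketch, not the code of my tool: extractImageSources() is a hypothetical name, and the sample HTML string is made up for the demonstration.

```php
<?php
// Sketch with the built-in DOMDocument class, used here in place of the library.
// extractImageSources() is a hypothetical helper name.
function extractImageSources(string $html): array
{
    if ($html === '') {
        return [];   // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real pages are rarely valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {           // skip <img> tags that have no src
            $sources[] = $src;
        }
    }

    return $sources;
}

// Made-up sample markup: three <img> tags, one of them without a src.
$html = '<p><img src="https://example.com/a.jpg" alt="A"><img alt="no src"><img src="/b.png"></p>';
print_r(extractImageSources($html));
```

The library’s find() returns element objects, while this sketch returns plain src strings, so the two are similar in spirit but not interchangeable.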

With all the information we need, let’s move on to the most important step.

foreach ($image as $img) {
    // Get attribute of images
}

For every $img element, I read its attributes. Specifically, I want the “src” attribute, and I take the basename() of each one to build the local file name.

$souceImg = $img->src;
$imgName = 'myImages/' . basename($souceImg);
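One thing to watch with basename(): if the image link carries a query string, such as “?w=1024”, that part ends up in the file name too. A possible way around it, sketched below under my own assumptions, is to look only at the path part of the URL. localImagePath() is a hypothetical helper, and the md5 fallback name is just one arbitrary choice.

```php
<?php
// Hypothetical helper: derive a local file path from an image URL.
function localImagePath(string $src, string $dir = 'myImages'): string
{
    // Use only the path component so "?w=1024" or "#top" never reach the file name.
    $path = parse_url($src, PHP_URL_PATH);
    $name = is_string($path) ? basename($path) : '';

    if ($name === '') {
        // No usable file name in the URL: fall back to a hash of the whole URL.
        $name = 'image_' . md5($src);
    }

    return $dir . '/' . $name;
}

echo localImagePath('https://example.com/photos/1_zing.jpg?w=1024'), PHP_EOL;
```

Two different images with the same file name would still overwrite each other; the sketch does not try to solve that.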

At this point, I could display all the images from the link in my web app. But my purpose is to get all the images onto my computer, so I need one more step.

file_put_contents($imgName, file_get_contents($souceImg));
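The one-liner above assumes the target folder exists and that every download succeeds. A slightly more defensive version might look like the sketch below. saveImage() is a hypothetical helper name of mine, and the 0755 permission is only an example value.

```php
<?php
// Hypothetical helper: download one image and report success or failure.
function saveImage(string $source, string $target): bool
{
    $dir = dirname($target);

    // file_put_contents() does not create missing folders, so create the folder first.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }

    // @ hides the warning; the false return value is checked instead.
    $data = @file_get_contents($source);
    if ($data === false) {
        return false;
    }

    return file_put_contents($target, $data) !== false;
}
```

In the loop this could be used as if (!saveImage($souceImg, $imgName)) { continue; } so that one broken image does not stop the rest.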

With the information for each image in hand, I fetch it and store it in the place specified earlier: a folder called “myImages”, which must already exist.

Now run the tool.

After pressing the “Submit” button, it may take a few minutes to finish, depending on your network speed and the max_execution_time setting in your system’s php.ini. Nothing is shown in the browser; the results appear only in your folder.
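If editing php.ini is not convenient, the limit can usually be raised for a single request from inside the script. This is only a sketch; the value of 120 seconds is an arbitrary example, and some hosting setups may not allow the change.

```php
<?php
// Sketch: raise the execution time limit for this request only.
// 120 is an arbitrary example value, not a recommendation.
$seconds = 120;

// set_time_limit() returns true on success and false on failure.
$raised = set_time_limit($seconds);

echo $raised
    ? "Time limit set to {$seconds} seconds\n"
    : "Could not change the time limit\n";
```

This call would go near the top of the processing code, before the loop that downloads the images.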

If you want to display the downloaded images in the browser as well, just add one line of code inside the loop:

echo "<img src='$imgName'/>";
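Because the file name comes from someone else’s website, it may contain quotes or other characters that break the tag. A cautious variant, sketched here with a hypothetical imgTag() helper, escapes the path with htmlspecialchars() before printing it.

```php
<?php
// Hypothetical helper: build an <img> tag with the path escaped for HTML output.
function imgTag(string $path): string
{
    // ENT_QUOTES escapes both single and double quotes inside the attribute value.
    return '<img src="' . htmlspecialchars($path, ENT_QUOTES, 'UTF-8') . '" />';
}

echo imgTag('myImages/1_zing.jpg'), PHP_EOL;
```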

Now, you can use this tool for any site you like.

PS: Thanks to onik on Stack Overflow, who helped me figure out why my code sometimes doesn’t work with a few sites. His solution is to add an if() check before reading the attributes of each image.

if (!empty($img->src)) {
    // Get attribute of images
}
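Another case that might trip up the download on some sites, and this is my own assumption rather than something onik pointed out, is a relative “src” such as /img/a.jpg, which file_get_contents() cannot fetch without the site address in front. The sketch below uses a hypothetical absoluteUrl() helper. It is deliberately simplified: it ignores “../” segments, <base> tags and data: URIs.

```php
<?php
// Hypothetical, simplified helper: turn an image src into an absolute URL.
// Handles absolute, protocol-relative, root-relative and plain relative paths only.
function absoluteUrl(string $src, string $pageUrl): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src;                               // already absolute
    }

    $page   = parse_url($pageUrl) ?: [];
    $scheme = $page['scheme'] ?? 'http';
    $host   = $page['host'] ?? '';
    $origin = $scheme . '://' . $host . (isset($page['port']) ? ':' . $page['port'] : '');

    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;               // protocol-relative: //cdn.example.com/a.jpg
    }
    if (strpos($src, '/') === 0) {
        return $origin . $src;                     // root-relative: /img/a.jpg
    }

    // Plain relative path: resolve against the folder of the current page.
    $path = $page['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $src;
}

echo absoluteUrl('/img/a.jpg', 'https://example.com/news/article.html'), PHP_EOL;
```

The second argument would be the address typed into the form, so inside the loop the call might read absoluteUrl($img->src, $_POST['url']).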

VO TINH THUONG

votinhthuong9@gmail.com
