Selenium tutorial part 2
In part 1 we learned how to automate the Twitter login. In part 2 we will see how to extract tweets, like counts, retweets, comments, etc. from a Twitter page.
After a successful login you will land on this page.
Now let’s say you want to search for a person or a topic on Twitter. To search for something, we go to Explore and then type into the Search Twitter box. We will learn how to do that in the section below:
First, right-click on the page and click Inspect. Then right-click on the Explore option and select Inspect again. Now you need to find the portion of the code that highlights the Explore option. It is easy to find: as you hover over the code, the corresponding portion of the page gets highlighted.
Now, to click the Explore option, we will write the XPath as taught in part 1. Explore sits in an <a> tag, but there are many <a> tags on the page, so to identify the Explore one uniquely we can use its href attribute. Combining all this, our XPath will be "//a[@href='/explore']". We can also check it by entering it in the Inspect panel as follows:
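As an offline sanity check, the same idea can be tried with Python's standard-library xml.etree.ElementTree, which supports this kind of attribute predicate. This is only a sketch: the snippet is a made-up stand-in for the navigation bar, not Twitter's real markup, and ElementTree needs the expression to start with a dot, so "//a" becomes ".//a".

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the navigation bar; the real page has far more markup.
SAMPLE_NAV = """
<nav>
  <a href="/home">Home</a>
  <a href="/explore">Explore</a>
  <a href="/notifications">Notifications</a>
</nav>
"""

root = ET.fromstring(SAMPLE_NAV)

# Selecting by tag name alone matches every link, so it cannot single out Explore.
all_links = root.findall(".//a")

# Adding the href predicate narrows the match down to a single element.
explore_links = root.findall(".//a[@href='/explore']")

print(len(all_links), "links in total")
print(len(explore_links), "link with href='/explore':", explore_links[0].text)
```

The tag name alone is ambiguous, while the tag plus its href attribute picks out exactly one element, which is what we want before clicking it.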
Now we can use this XPath in our code to click on the Explore option as follows:
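A minimal sketch of that click is given below. It assumes `driver` is the logged-in WebDriver object from part 1; the helper name `click_explore` is made up for illustration. It uses the generic `find_element(By.XPATH, ...)` call, which is available in both older and current Selenium releases.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # By.XPATH is simply the string "xpath"; this fallback only keeps the
    # sketch importable on a machine without Selenium installed.
    XPATH = "xpath"

EXPLORE_XPATH = "//a[@href='/explore']"


def click_explore(driver):
    """Find the Explore link by its href attribute and click it.

    `driver` is assumed to be an already logged-in WebDriver (see part 1).
    """
    explore_link = driver.find_element(XPATH, EXPLORE_XPATH)
    explore_link.click()
    return explore_link
```

With the driver from part 1 this would be called as `click_explore(driver)`.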
Now you will see this page:
Next, we need to find the input box that has Search Twitter written in it. Right-click on Search Twitter and select Inspect; you will see that an input tag gets highlighted.
Next, we simply need to use this XPath in our program. Then you need to pass a search term and press Enter:
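The typing and the Enter key press could look roughly like the sketch below. The XPath for the search box is an assumption (an input whose placeholder reads "Search Twitter"); use whatever the Inspect panel shows for your page. The helper name `search_twitter` is also made up, and `driver` is assumed to be the logged-in WebDriver from part 1.

```python
try:
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    XPATH, ENTER = By.XPATH, Keys.ENTER
except ImportError:
    # Same values Selenium defines; the fallback only keeps the sketch
    # importable on a machine without Selenium installed.
    XPATH, ENTER = "xpath", "\ue007"

# ASSUMPTION: the search box is an <input> with this placeholder text.
# Replace it with the XPath you found in the Inspect panel.
SEARCH_INPUT_XPATH = "//input[@placeholder='Search Twitter']"


def search_twitter(driver, query):
    """Type `query` into the search box and press Enter."""
    search_box = driver.find_element(XPATH, SEARCH_INPUT_XPATH)
    search_box.send_keys(query)   # type the search term
    search_box.send_keys(ENTER)   # submit it with the Enter key
    return search_box
```

For example, `search_twitter(driver, "selenium")` would search for the term "selenium".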
You will be able to see this page:
Now we come to our main task: extracting tweets and other info. When you right-click on this page, select Inspect, and hover over the part of the code shown below, you will see that each tweet is a kind of block:
In the figure above, each div tag represents one Twitter post, and we need to grab all of them.
To do this, expand any one of these div tags and you will see the div tag below:
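This tag appears once in every post, so it can be used to grab all the posts on a Twitter page. To do this we just need to use the find_elements_by_xpath() function instead of find_element_by_xpath(). (In Selenium 4.3 and later, where these helpers were removed, the equivalents are find_elements(By.XPATH, ...) and find_element(By.XPATH, ...).)
Since we did not get any error, the call ran successfully. The function returns a list containing one web element for each post. Because it returns an empty list rather than raising an error when nothing matches, it is worth checking that the list is not empty.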
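Collecting the posts might be sketched as follows. The XPath used here is an assumption (a div carrying a `data-testid='tweet'` attribute); substitute the div tag you actually found in the Inspect panel. The helper name `get_posts` is made up, and `driver` is assumed to be the WebDriver showing the search results.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # By.XPATH is simply the string "xpath"; this fallback only keeps the
    # sketch importable on a machine without Selenium installed.
    XPATH = "xpath"

# ASSUMPTION: each post contains one div like this. Replace the expression
# with the tag you found when expanding a post in the Inspect panel.
POST_XPATH = "//div[@data-testid='tweet']"


def get_posts(driver):
    """Return a list with one web element per post currently on the page."""
    # find_elements (plural) returns every match; it returns [] and does
    # not raise when nothing matches, so we check for that explicitly.
    posts = driver.find_elements(XPATH, POST_XPATH)
    if not posts:
        raise RuntimeError("No posts matched; check the XPath or wait for the page to load.")
    return posts
```

Calling `len(get_posts(driver))` then tells you how many posts were picked up.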
Now that we have the posts, we will see how to extract the Twitter account name and handle:
First, go to the div code of the first post and keep expanding until you reach two sibling div elements, as follows:
Of these two div tags, the second one holds our info. When you expand it you will again get two div tags; expand the second one and you will again get two div tags. Expand this one as well and you will get three div tags, which contain the actual info that we want. The hierarchy is shown below:
Now let’s see how we can go down the hierarchy of div tags, or any other tags:
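Going down a level is done with a relative XPath: "./div[2]" means "the second div child of the current element", and steps can be chained with "/". The sketch below shows the idea offline with the standard-library xml.etree.ElementTree on a made-up, shortened post structure; the real nesting on Twitter is deeper and the texts are placeholders.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened stand-in for one post; the real page nests more deeply.
SAMPLE_POST = """
<div>
  <div>avatar column</div>
  <div>
    <div>header row</div>
    <div>
      <div>Account Name</div>
      <div>@handle</div>
      <div>timestamp</div>
    </div>
  </div>
</div>
"""

post = ET.fromstring(SAMPLE_POST)

# One step down: the second div child of the post (XPath positions start at 1).
second_div = post.find("./div[2]")

# Two steps chained: the second div inside that second div.
info_block = post.find("./div[2]/div[2]")

# All direct div children of the block we ended up in.
info_divs = info_block.findall("./div")

print([d.text for d in info_divs])
```

With Selenium the same relative expression is passed to a post element rather than to the driver, for example `post.find_element(By.XPATH, "./div[2]/div[2]")`. Keep the leading dot: in Selenium an expression starting with "//" searches the whole page again, even when it is called on an element.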