Web Scraping and Login using Python Selenium

Rahman Taufik
Published in Analytics Vidhya
2 min read · Dec 16, 2020

Need to scrape a website that requires you to log in first?

Well, we can use Selenium for that. Selenium is primarily a tool for automated testing of web applications, but it also works for scraping: it drives a real browser from a script, so it copes easily with JavaScript, the DOM, and complex HTML.

For example, we will try to scrape news from websites that require a login first, such as www.wsj.com or www.barrons.com.

The first thing we do is install the libraries (the selenium Python library and the webdriver manager library) and import the Selenium functions we need in our file.

The Libraries
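
On PyPI the two packages are named `selenium` and `webdriver-manager` (the latter is imported as `webdriver_manager`). As a small sanity check before writing any scraping code, the sketch below reports which of them cannot be imported yet. The helper name `missing_packages` is made up for this illustration.

```python
# Install from a shell first:
#   pip install selenium webdriver-manager
import importlib.util

# Maps the import name to the pip package name.
REQUIRED = {
    "selenium": "selenium",
    "webdriver_manager": "webdriver-manager",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the required packages that are not importable."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


# Example use:
#   missing = missing_packages()
#   if missing:
#       print("Run: pip install " + " ".join(missing))
```

Once both are installed, a login script typically starts with imports such as `from selenium import webdriver`, `from selenium.webdriver.common.by import By`, and `from webdriver_manager.chrome import ChromeDriverManager`.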

Create a function or class for the login. The code should:

  • set the URL
  • set the web driver options (e.g. window size, headless mode, etc.), and
  • log in with your username and password
Login to Website through Selenium
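
The three steps above can be sketched roughly as follows, assuming Selenium 4 with Chrome. This is not the original code: the function names, the login URL you pass in, and the element locators (`username`, `password`, the submit button) are placeholders, so inspect the real login form and adjust them.

```python
def build_chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Build the Chrome command-line switches for the web driver options."""
    width, height = window_size
    arguments = [f"--window-size={width},{height}"]
    if headless:
        arguments.append("--headless")
    return arguments


def make_driver(arguments):
    """Start Chrome with the given switches; needs selenium and webdriver-manager."""
    # Imported here so build_chrome_arguments() works without a browser setup.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    # webdriver-manager downloads a chromedriver binary and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)


def login(driver, login_url, username, password, timeout=20):
    """Open the login page, fill in the form, and submit it."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(login_url)
    wait = WebDriverWait(driver, timeout)
    # Placeholder locators: replace them with the ones on the real login form.
    user_field = wait.until(EC.visibility_of_element_located((By.ID, "username")))
    user_field.send_keys(username)
    driver.find_element(By.ID, "password").send_keys(password)
    driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()


# Example use (login_url and the credentials are yours to supply):
#   driver = make_driver(build_chrome_arguments(headless=False))
#   login(driver, login_url, "my-user", "my-password")
```

Waiting for the username field with `WebDriverWait` rather than reading it immediately helps on login pages that render their form with JavaScript.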

After a successful login, we can continue with the code that gets the news. We can choose the information we need (e.g. title, article, date, etc.) and store it in a CSV file.

Web Scraping using Selenium
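
A minimal sketch of this step, assuming `driver` is an already logged-in Selenium web driver. The locators (`h1`, `time`, `article p`) are guesses at a typical article page, not the real markup of any particular site, and the function names and column order are chosen for this illustration.

```python
import csv

FIELDS = ["title", "date", "article"]


def scrape_article(driver, url):
    """Open one article page and pull out the title, date, and body text."""
    # Imported here so save_to_csv() can be used without selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    # Placeholder locators: adjust them to the page you are scraping.
    title = driver.find_element(By.TAG_NAME, "h1").text
    date = driver.find_element(By.TAG_NAME, "time").text
    paragraphs = driver.find_elements(By.CSS_SELECTOR, "article p")
    article = "\n".join(paragraph.text for paragraph in paragraphs)
    return {"title": title, "date": date, "article": article}


def save_to_csv(rows, path):
    """Write a list of article dictionaries to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Example use (article_urls is a list of links you collected yourself):
#   rows = [scrape_article(driver, url) for url in article_urls]
#   save_to_csv(rows, "news.csv")
#   driver.quit()
```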

Sometimes we still can’t get data from the website because of a CAPTCHA or similar bot detection. If that happens, we can try to reduce the chance of being blocked with methods such as setting a user agent or slowing down the script execution.

For the user agent, we can use the fake_useragent library and add a random user agent to the web driver options. To slow down the script execution, we can use time.sleep(second).
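
Both ideas can be sketched as below. `UserAgent().random` comes from the fake_useragent package (installed with `pip install fake-useragent`). The helper names are made up for this sketch, and pausing for a random interval instead of a fixed `time.sleep(second)` is a variation added here, not something from the original code.

```python
import random
import time


def random_user_agent():
    """Return a random browser user-agent string from fake_useragent."""
    # Imported here so the other helpers work without fake_useragent installed.
    from fake_useragent import UserAgent

    return UserAgent().random


def user_agent_argument(user_agent):
    """Format a user-agent string as a Chrome command-line switch."""
    return f"--user-agent={user_agent}"


def polite_pause(min_seconds=2.0, max_seconds=5.0):
    """Sleep for a random time within the given range and return the delay."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


# Example use with Selenium's Chrome options:
#   options.add_argument(user_agent_argument(random_user_agent()))
#   ...
#   polite_pause()  # call between page loads
```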

Web scraping with Selenium is still tricky, but it is at least another tool for getting data from a website, and it makes logging in to the website easy.

**This code was adapted from here and for more information please check here
