Web Scrape News Articles: The ABC of Python’s Abstract Methods
How to use Python’s abstract class functionality to scrape news articles online.
Many programmers complain that Python lacks abstract classes. This is not true. Python’s standard library includes the abc module (short for Abstract Base Classes), with which programmers can create broad, general-purpose classes to reuse later in more specific contexts.
By creating abstract classes, the programmer can later define more specific and nuanced child classes that inherit from them. Abstract classes are helpful when several classes inherit from a common base class but differ significantly in their details, like different types of media.
Scrape news articles online with Python
For example, we could create an abstract class named Item to reuse throughout a Python program that scrapes news articles online. The program could contain child classes for both articles and videos that inherit from the Item class.
```python
from abc import ABC, abstractmethod  # import ABC and the abstractmethod decorator

class Item(ABC):  # inherits from ABC
    def __init__(self, title, author_first, author_last):
        self.title = title
        self.author_first = author_first
        self.author_last = author_last

    @abstractmethod  # subclasses must override this method
    def display_info(self):
        pass

    @abstractmethod
    def title_author(self):
        pass

class Article(Item):  # inherits from Item
    def __init__(self, title, author_first, author_last):
        # Let Item's initializer store the shared attributes
        super().__init__(title, author_first, author_last)

    def display_info(self):  # override the abstract method
        return "Author: {0} {1} Title: {2}".format(
            self.author_first, self.author_last, self.title)

    def title_author(self):
        return "{0} by {1} {2}".format(self.title, self.author_first, self.author_last)

class Video(Item):  # inherits from Item
    def __init__(self, title, author_first, author_last, youtube_account, date_posted):
        super().__init__(title, author_first, author_last)
        self.youtube_account = youtube_account  # attributes specific to videos
        self.date_posted = date_posted

    def display_info(self):  # override the abstract method
        return "{0} by {1} {2} ({3}, posted {4})".format(
            self.title, self.author_first, self.author_last,
            self.youtube_account, self.date_posted)

    def title_author(self):
        return "{0} by {1} {2}".format(self.title, self.author_first, self.author_last)
```
The abstract class, Item, inherits from the ABC class in the abc module, which you can import at the beginning of your Python file with from abc import ABC, abstractmethod. ABC is a helper class whose metaclass, ABCMeta, enforces abstract methods: Python refuses to create an instance of a class that still has abstract methods and raises a TypeError instead.
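A minimal sketch of that behavior, using a cut-down Item with a single abstract method (the class and the sample title are illustrative, not the article’s full code). It checks that ABC’s metaclass is ABCMeta and captures the TypeError raised when we try to instantiate the abstract class. The exact wording of the error message varies between Python versions, so the sketch only prints it.

```python
from abc import ABC, ABCMeta, abstractmethod

# ABC is an ordinary helper class whose metaclass is ABCMeta
assert type(ABC) is ABCMeta


class Item(ABC):  # simplified version of the article's Item class
    def __init__(self, title):
        self.title = title

    @abstractmethod
    def display_info(self):
        pass


def try_to_instantiate():
    """Return the error message raised when instantiating the abstract Item."""
    try:
        Item("Breaking news")
    except TypeError as exc:
        return str(exc)
    return None


message = try_to_instantiate()
print(message)  # mentions the abstract method display_info
```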
Think of the abstract method decorator as a prefix
Now that we’ve imported the abc module, we must decide which methods are abstract and mark them with the @abstractmethod decorator.
A quick word here about decorators. Think of a decorator as a prefix placed on the line above a function or class definition: it hands that function or class to another function, which can change how it is called and used. We use basic decorators all the time, for example when we create properties for a class with the @property decorator.
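To make the “prefix” idea concrete, here is a small sketch with a hypothetical decorator named shout (not part of any library) that upper-cases whatever the wrapped function returns. Writing @shout above a def is shorthand for reassigning the function to shout(function), so the decorated and hand-wrapped versions behave the same.

```python
import functools


def shout(func):
    """Hypothetical decorator: upper-cases whatever the wrapped function returns."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout  # the @ line is shorthand for: headline = shout(headline)
def headline(title):
    return "Breaking: " + title


def plain_headline(title):
    return "Breaking: " + title


manual_headline = shout(plain_headline)  # the same decoration, applied by hand

print(headline("markets rally"))         # BREAKING: MARKETS RALLY
print(manual_headline("markets rally"))  # BREAKING: MARKETS RALLY
```

The built-in @property works the same way: it passes the method to property() and stores the result on the class.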
The @property decorator tells us that this method is a property, which means we can access it as though it were an attribute. The @abstractmethod decorator does something quite different: it tells us that the method must be overridden by a subclass, and Python will not let us create an instance of any class that has not overridden it.
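A sketch contrasting the two decorators, assuming a hypothetical author_name property that is not in the article’s code (the sample title and author are invented). The property is read without parentheses, while the abstract method only forces subclasses to override it: its body can still be reused through super().

```python
from abc import ABC, abstractmethod


class Item(ABC):
    def __init__(self, title, author_first, author_last):
        self.title = title
        self.author_first = author_first
        self.author_last = author_last

    @property  # read as an attribute: item.author_name, no parentheses
    def author_name(self):
        return self.author_first + " " + self.author_last

    @abstractmethod  # every concrete subclass must override this
    def display_info(self):
        return "Title: " + self.title  # a shared default subclasses may reuse


class Article(Item):
    def display_info(self):
        # The abstract method's body is still callable through super()
        return super().display_info() + " | Author: " + self.author_name


article = Article("Rates rise again", "Jane", "Doe")
print(article.author_name)     # Jane Doe
print(article.display_info())  # Title: Rates rise again | Author: Jane Doe
```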
Each child class calls super().__init__(...) in its own __init__ method so that Item’s initializer sets the shared attributes (title, author_first, and author_last). This line reuses the parent’s initializer; it does not override anything. Overriding is a separate requirement: for each method bearing the @abstractmethod decorator, there must be a corresponding method with the same name in the child class. If we were to make a class for short stories that lacked one of these methods, Python would raise a TypeError as soon as we tried to create a short story object.
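A sketch of that short-story scenario. ShortStory, CompleteShortStory, the word_count attribute, the can_instantiate helper, and the sample values are all hypothetical. ShortStory calls super().__init__ correctly but forgets title_author, so Python refuses to instantiate it; adding the missing override fixes that.

```python
from abc import ABC, abstractmethod


class Item(ABC):
    def __init__(self, title, author_first, author_last):
        self.title = title
        self.author_first = author_first
        self.author_last = author_last

    @abstractmethod
    def display_info(self):
        pass

    @abstractmethod
    def title_author(self):
        pass


class ShortStory(Item):  # hypothetical subclass that forgets title_author
    def __init__(self, title, author_first, author_last, word_count):
        # super().__init__ reuses Item's initializer; it does NOT override anything
        super().__init__(title, author_first, author_last)
        self.word_count = word_count

    def display_info(self):
        return "{0} ({1} words)".format(self.title, self.word_count)


class CompleteShortStory(ShortStory):  # adds the missing override
    def title_author(self):
        return "{0} by {1} {2}".format(self.title, self.author_first, self.author_last)


def can_instantiate(cls, *args):
    """Return True if cls(*args) succeeds, False if Python rejects it as abstract."""
    try:
        cls(*args)
    except TypeError:
        return False
    return True


print(can_instantiate(ShortStory, "A Quiet Harbor", "Jane", "Doe", 2500))          # False
print(can_instantiate(CompleteShortStory, "A Quiet Harbor", "Jane", "Doe", 2500))  # True
```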
Don’t forget to override the abstract class methods!
That’s it! The abstract class provides a template for its subclasses and shows us which methods we need to create for a group of similar classes. We override the display_info method to return a very simple message containing the title and the author.
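One payoff of that template is that code written against Item can call display_info on any mix of subclasses. The sketch below uses simplified Article and Video classes with only display_info; the summarize helper, the headlines, the names, and the @examplenews account are invented for illustration.

```python
from abc import ABC, abstractmethod


class Item(ABC):
    def __init__(self, title, author_first, author_last):
        self.title = title
        self.author_first = author_first
        self.author_last = author_last

    @abstractmethod
    def display_info(self):
        pass


class Article(Item):
    def display_info(self):
        return "Article: {0} by {1} {2}".format(
            self.title, self.author_first, self.author_last)


class Video(Item):
    def __init__(self, title, author_first, author_last, youtube_account):
        super().__init__(title, author_first, author_last)
        self.youtube_account = youtube_account

    def display_info(self):
        return "Video: {0} by {1} {2} on {3}".format(
            self.title, self.author_first, self.author_last, self.youtube_account)


def summarize(items):
    """Works with any Item subclass, because each one must implement display_info."""
    return [item.display_info() for item in items]


feed = [
    Article("City council approves budget", "Maria", "Lopez"),
    Video("Budget vote explained", "Sam", "Lee", "@examplenews"),
]
for line in summarize(feed):
    print(line)
```

Because display_info returns a string instead of printing it, the caller decides whether to print, log, or store the result.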
The key is remembering to override every abstract method in each child class. Calling super().__init__() does not do this for you; it only runs the parent’s initializer. If a child class does not override these methods, Python raises a TypeError when you try to instantiate it. For example, removing the title_author method from Article would make every attempt to create an Article fail.
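When that TypeError appears, a quick way to see which overrides are still missing is the __abstractmethods__ attribute that ABCMeta sets on each class. In this sketch, Article deliberately leaves out title_author, and missing_overrides is a hypothetical helper name.

```python
from abc import ABC, abstractmethod


class Item(ABC):
    @abstractmethod
    def display_info(self):
        pass

    @abstractmethod
    def title_author(self):
        pass


class Article(Item):  # title_author deliberately left out
    def display_info(self):
        return "Article"


def missing_overrides(cls):
    """List the abstract methods cls still has to override (empty if it is concrete)."""
    return sorted(getattr(cls, "__abstractmethods__", frozenset()))


print(missing_overrides(Item))     # ['display_info', 'title_author']
print(missing_overrides(Article))  # ['title_author']
```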