Simple News Aggregator

JLorenc
4 min read · Jun 22, 2020

For my first blog post, I wanted to build something in a language other than Ruby. After a bit of searching and experimenting, I decided to create a simple content aggregator in Python, following a guide by Rajan at HackersFriend.com. In this post we will cover the basic steps and code used to make this web app.

Before we start it is good to have basic knowledge of:

  1. Django Framework
  2. BeautifulSoup
  3. Requests Module

(Screenshot: what the site will look like when “finished”.)

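Let's start by installing the required packages. Enter the following commands in the terminal. The scraper below uses the html5lib parser, so that package is installed as well.

pip install bs4
pip install requests
pip install html5lib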

Now let's start creating a scraper to find the headlines on each site.

On the Onion website we can see that the headlines are in h4 tags. With this information we can create this simple scraper to grab them.

import requests
from bs4 import BeautifulSoup

onion_r = requests.get("https://www.theonion.com/")
onion_soup = BeautifulSoup(onion_r.content, 'html5lib')

onion_headings = onion_soup.find_all('h4')

Next we should grab the headlines from Car and Driver. We use the /news version of the website to make scraping a bit easier. These headlines are in div elements with the “full-item-title item-title” class.

cd_r = requests.get("https://www.caranddriver.com/news")
cd_soup = BeautifulSoup(cd_r.content, 'html5lib')
cd_headings = cd_soup.find_all("div", {"class": "full-item-title item-title"})

These bits of code will act as the base for our scraper.
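
Text pulled out of a tag can include stray newlines and extra spaces, and the same headline may appear more than once on a page. The helper below is a minimal sketch of one way to tidy the scraped strings before showing them. The name clean_headlines and the sample strings are made up for illustration and are not part of the original guide.

```python
from typing import Iterable, List


def clean_headlines(raw_texts: Iterable[str]) -> List[str]:
    """Collapse whitespace, drop empty strings, and remove duplicates.

    The order of first appearance is preserved.
    """
    cleaned: List[str] = []
    seen = set()
    for text in raw_texts:
        # split() with no arguments splits on any run of whitespace,
        # so joining with a single space normalizes newlines and tabs.
        headline = " ".join(text.split())
        if not headline or headline in seen:
            continue
        seen.add(headline)
        cleaned.append(headline)
    return cleaned


# Made-up sample input standing in for [tag.text for tag in onion_headings].
sample = ["\n  Area Man Wins  \n", "", "Area Man Wins", "New   Sedan\tReviewed"]
result = clean_headlines(sample)
print(result)
```

In the scraper, the same function could be applied to the text of each heading, for example clean_headlines(tag.text for tag in onion_headings).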

Next we need to build the Django web app. Run these commands.

pip install django
django-admin startproject news_aggregator

Once the project is created move into its directory and run the following.

python manage.py startapp news

Make sure to add ‘news’ to INSTALLED_APPS in the project's settings.py.

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'news',
]

Next we need to create a specific directory for our index.html file.

  1. In the news directory create a directory with the name “templates”
  2. In templates create another directory called “news”
  3. In news/templates/news create the index.html file
  4. Copy the following Django template into index.html

<!DOCTYPE html>
<html>
<head>
<title></title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
</head>
<body>
<div class="jumbotron">
<center><h1>Simple News Aggregator</h1>
<a href="/" class="btn btn-danger">Refresh News</a>
</center>
</div>
<div class="container">
<div class="row">
<div class="col-6">
<h3 class="text-center">News from the Onion</h3>
{% for n in onion_news %}
<h5> - {{n}} </h5>
<hr>
{% endfor %}
<br>
</div>
<div class="col-6">
<h3 class="text-center">News from Car and Driver</h3>
{% for htn in cd_news %}
<h5> - {{htn}} </h5>
<hr>
{% endfor %}
<br>
</div>
</div>
</div>
<script
src="https://code.jquery.com/jquery-3.3.1.min.js"
integrity="sha256-FgpCb/KJQlLNfOu91ta32o/NMZxltwRo8QtmkMRdAu8="
crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</body>
</html>
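
If you would rather script the directory steps listed before the template than create the folders by hand, something like the following should work. This is only a sketch: the helper name make_template_file is made up, and the demo runs in a temporary directory so it does not touch a real project.

```python
import tempfile
from pathlib import Path


def make_template_file(app_dir: Path, app_name: str = "news") -> Path:
    """Create <app_dir>/templates/<app_name>/index.html and return its path."""
    template_dir = app_dir / "templates" / app_name
    # parents=True also creates any missing parent directories;
    # exist_ok=True makes the call safe to repeat.
    template_dir.mkdir(parents=True, exist_ok=True)
    index_file = template_dir / "index.html"
    index_file.touch(exist_ok=True)
    return index_file


# Demo in a throwaway directory standing in for the project root.
with tempfile.TemporaryDirectory() as tmp:
    created = make_template_file(Path(tmp) / "news")
    relative = created.relative_to(tmp).as_posix()
    print(relative)
```

In the real project you would call make_template_file(Path("news")) from the directory that contains manage.py, then paste the template into the empty index.html.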

Next, let's work on the views.py file. It should look like this.

from django.shortcuts import render
import requests
from bs4 import BeautifulSoup

# Scrape the Onion headlines (h4 tags).
onion_r = requests.get("https://www.theonion.com/")
onion_soup = BeautifulSoup(onion_r.content, 'html5lib')
onion_headings = onion_soup.find_all('h4')
onion_news = []
for th in onion_headings:
    onion_news.append(th.text)

# Scrape the Car and Driver headlines.
cd_r = requests.get("https://www.caranddriver.com/news")
cd_soup = BeautifulSoup(cd_r.content, 'html5lib')
cd_headings = cd_soup.find_all("div", {"class": "full-item-title item-title"})
cd_headings = cd_headings[2:]  # skip the first two matches
cd_news = []
for hth in cd_headings:
    cd_news.append(hth.text)

def index(req):
    return render(req, 'news/index.html', {'onion_news': onion_news, 'cd_news': cd_news})
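
One thing to be aware of: the scraping code in views.py sits at module level, so it runs once when the module is imported. The headlines then stay the same until the server restarts, and the Refresh News button reloads the same list. One possible way around this, not taken from the original guide, is to wrap each scraper in a small time-based cache and call it from inside the view. The class HeadlineCache, the fake loader, and the fake clock below are all hypothetical names used only for this sketch.

```python
import time
from typing import Callable, List, Optional


class HeadlineCache:
    """Re-run a scraper function only when the cached result is too old."""

    def __init__(
        self,
        loader: Callable[[], List[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # function that scrapes and returns headlines
        self._ttl = ttl_seconds        # how long a result stays fresh
        self._clock = clock            # injectable so the demo can fake time
        self._value: List[str] = []
        self._loaded_at: Optional[float] = None

    def get(self) -> List[str]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            try:
                self._value = self._loader()
                self._loaded_at = now
            except Exception:
                # If a scrape fails, keep serving the previous headlines.
                pass
        return self._value


# Demo with a fake loader and a fake clock instead of a real scrape.
calls = []

def fake_loader() -> List[str]:
    calls.append(1)
    return [f"headline batch {len(calls)}"]

fake_time = [0.0]
cache = HeadlineCache(fake_loader, ttl_seconds=60, clock=lambda: fake_time[0])

first = cache.get()    # first call runs the loader
second = cache.get()   # still fresh, served from the cache
fake_time[0] = 61.0
third = cache.get()    # older than 60 seconds, loader runs again
print(first, second, third)
```

In views.py, each scraper would become a function that returns its list of headlines, and the index view would pass the result of each cache's get() to render instead of the module-level lists.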

Next, go to the project's “news_aggregator” directory (the one that contains settings.py) so we can modify the urls.py file. It should look like this.

"""news_aggregator URL Configuration

The `urlpatterns` list routes URLs to views. For more information please see:
https://docs.djangoproject.com/en/2.0/topics/http/urls/
Examples:
Function views
1. Add an import: from my_app import views
2. Add a URL to urlpatterns: path('', views.home, name='home')
Class-based views
1. Add an import: from other_app.views import Home
2. Add a URL to urlpatterns: path('', Home.as_view(), name='home')
Including another URLconf
1. Import the include() function: from django.urls import include, path
2. Add a URL to urlpatterns: path('blog/', include('blog.urls'))
"""
from django.contrib import admin
from django.urls import path
from news import views

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', views.index, name="home"),
]

With that we should be able to run the server!

python manage.py runserver

Now you can Ctrl + click the link printed in the terminal (by default http://127.0.0.1:8000/) to open the site.

If everything is working correctly, you will see a list of headlines from each website.

With this you have a simple two-site news aggregator. It can easily be modified to pull headlines from any two sites onto one page, and you can add more sites by writing another scraper in views.py and another column in the template. You can also add functionality by scraping more data, such as URLs and images. This would let you click a headline to go to the news article, as well as make the site look better.
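
As a rough idea of what scraping URLs could look like, here is a minimal sketch that collects each headline together with its link. To keep it runnable without a network connection or extra packages, it uses Python's built-in html.parser rather than BeautifulSoup, and the sample markup (links inside h4 tags on example.com) is an assumption, not the real structure of either site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineLinkParser(HTMLParser):
    """Collect (headline text, absolute URL) pairs for links inside <h4> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_h4 = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self._in_h4 = True
        elif tag == "a" and self._in_h4:
            # Only links with an href attribute are recorded.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            # urljoin turns relative links like /story-one into absolute URLs.
            self.links.append((title, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "h4":
            self._in_h4 = False


# Made-up markup standing in for a scraped page.
sample_html = """
<h4><a href="/story-one">Story One</a></h4>
<h4><a href="https://example.com/story-two"> Story
 Two </a></h4>
<p><a href="/not-a-headline">Ignore me</a></p>
"""

parser = HeadlineLinkParser("https://example.com/")
parser.feed(sample_html)
parser.close()
links = parser.links
print(links)
```

If a list of pairs like this were passed to the template, the loop could unpack it with {% for title, url in links %} and render each headline as a link.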

Another guide worth looking at: https://data-flair.training/blogs/django-project-news-aggregator-app/
