A Customizable News Collector

Erdi Gürbüz
2 min read · Jan 14, 2023


Aristotle is a highly customizable tool that collects the title, description, images, and publish date of articles from news websites. The data can be stored in any relational database supported by SQLAlchemy, which Aristotle uses as its ORM. The customization comes from the source*.yaml files (example below), which can be edited as needed. A usage guide for the project is available on GitHub.

article:
  - domain: cnn.com
    active: true
    link: https://edition.cnn.com/
    filterForLink:
      mandatoryWords: ["/politics/"]
      permissibleWords: []
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate:
    publishDateFormat: "%Y-%m-%d"

technology:
  - domain: mashable.com
    active: true
    link: https://mashable.com
    filterForLink:
      mandatoryWords: ["-"]
      permissibleWords: ['/article/']
      impermissibleWords: []
    tagForMetadata:
      title:
      description:
      image:
      publishDate: datetime
    publishDateFormat: "%d.%m.%Y"
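To make the role of filterForLink and publishDateFormat more concrete, here is a minimal Python sketch of how such settings could be applied. It is not taken from Aristotle's code. The filter semantics are an assumption: every mandatory word must appear in the link, at least one permissible word must appear when that list is non-empty, and no impermissible word may appear. The names `source`, `link_passes`, and `parse_publish_date`, along with the sample links, are hypothetical.

```python
from datetime import datetime

# Hypothetical in-memory form of one source entry, reusing the key names
# from the YAML sample. Aristotle's real loader may structure this differently.
source = {
    "domain": "mashable.com",
    "filterForLink": {
        "mandatoryWords": ["-"],
        "permissibleWords": ["/article/"],
        "impermissibleWords": [],
    },
    "publishDateFormat": "%d.%m.%Y",
}


def link_passes(link: str, rules: dict) -> bool:
    """Decide whether a link should be collected.

    Assumed semantics (not confirmed by the project):
    - every mandatory word must occur in the link,
    - if permissible words are given, at least one must occur,
    - no impermissible word may occur.
    """
    mandatory = rules.get("mandatoryWords") or []
    permissible = rules.get("permissibleWords") or []
    impermissible = rules.get("impermissibleWords") or []

    if not all(word in link for word in mandatory):
        return False
    if permissible and not any(word in link for word in permissible):
        return False
    return not any(word in link for word in impermissible)


def parse_publish_date(raw: str, fmt: str) -> datetime:
    """Parse a scraped date string using a strptime-style format."""
    return datetime.strptime(raw, fmt)


# Made-up links for illustration only.
links = [
    "https://mashable.com/article/some-new-phone",
    "https://mashable.com/video/some-new-phone",
    "https://mashable.com/article/about",
]
kept = [link for link in links if link_passes(link, source["filterForLink"])]
published = parse_publish_date("14.01.2023", source["publishDateFormat"])
```

Under these assumed rules only the first link is kept: the second lacks "/article/" and the third contains no "-". The format strings in the sample use the same directives as Python's `datetime.strptime`, so "%d.%m.%Y" reads a date such as 14.01.2023.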

I created this tool for scan-find-collect needs. I used Aristotle for my own news website in 2017. It can be used to build all kinds of news sites, such as technology, culture, sports, or world news. My website featured daily, sports, technology, column, culture, and video news headlines.
