Langflow Micro Tutorials — Links Scraper

By Rodrigo Nader, published in Langflow on Oct 10, 2023

Welcome back to our Langflow micro tutorials series! In this article, we’ll continue exploring simple Langflow examples and custom component design.

You can download the flow we’ll be discussing here to modify and understand the components in use.

Today, we’ll focus on a basic web scraper flow for extracting links that may contain valuable information. Hope you enjoy it!

Key Features

HTML Loader: The HTML Loader custom component retrieves the HTML at a given URL and parses it with the BeautifulSoup library, producing a soup object. This makes it easy to extract the relevant information from the HTML document.

HTML Links Extractor: The HTML Links Extractor collects the links found in the HTML content. By leveraging BeautifulSoup, this component reduces the amount of text before it is processed by the LLM, saving time and token costs.
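To make the two components more concrete, here is a minimal sketch of what they might do under the hood. It is not the code from the downloadable flow: the function names `load_soup` and `extract_links`, the choice of `urllib` for fetching, and the de-duplication step are all assumptions made for illustration. The example runs on an inline HTML string so it works without network access.

```python
from typing import List, Tuple
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def load_soup(url: str) -> BeautifulSoup:
    """Hypothetical stand-in for the HTML Loader: fetch a page and parse it."""
    with urlopen(url, timeout=10) as response:
        html = response.read()
    return BeautifulSoup(html, "html.parser")


def extract_links(soup: BeautifulSoup, base_url: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for the HTML Links Extractor.

    Returns (anchor text, absolute URL) pairs, skipping anchors without
    an href and URLs that were already seen (an assumed extra step).
    """
    links = []
    seen = set()
    for anchor in soup.find_all("a", href=True):
        url = urljoin(base_url, anchor["href"])  # resolve relative links
        if url in seen:
            continue
        seen.add(url)
        links.append((anchor.get_text(strip=True), url))
    return links


# Inline sample page, so the example does not depend on a live website.
sample_html = """
<html><body>
  <a href="/about">About Us</a>
  <a href="/contact">Contact Us</a>
  <a href="/about">About (footer)</a>
  <a name="top">No href here</a>
  <a href="https://blog.example.com/">Blog</a>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")
links = extract_links(soup, "https://example.com")
for text, url in links:
    print(text, url)
```

Only the link text and URL survive this step, which is where the savings in tokens come from: the rest of the page never reaches the LLM.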

Objective

The objective of this flow is to extract relevant links from an HTML document and present them as bullet points to a language model.

The model's task is to identify the links most likely to contain the company’s email, such as “About Us” and “Contact Us” pages. (“Email” is just the query used in this example; you can imagine similar use cases with different queries.)

Notice that the HTML is pre-processed before it is passed to the prompt template: only the extracted links go to the model, not the full page. This kind of pre-processing is part of what makes custom components and LLM pipelines so attractive.
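As a rough illustration of that last step, the sketch below turns a list of links into bullet points and drops them into a prompt. The template wording and the helper names `format_links_as_bullets` and `build_prompt` are assumptions for this example; the actual prompt in the flow may differ.

```python
from typing import List, Tuple

# Assumed prompt wording; the flow's real template is not shown in this article.
PROMPT_TEMPLATE = (
    "Below is a list of links found on a company's website:\n"
    "{links}\n\n"
    "Which of these links are most likely to contain the company's {query}? "
    "Answer with the URLs only."
)


def format_links_as_bullets(links: List[Tuple[str, str]]) -> str:
    """Render (text, url) pairs as one bullet point per line."""
    return "\n".join(
        f"- {text or '(no text)'}: {url}" for text, url in links
    )


def build_prompt(links: List[Tuple[str, str]], query: str = "email") -> str:
    """Fill the template with the bullet list and the thing we are looking for."""
    return PROMPT_TEMPLATE.format(
        links=format_links_as_bullets(links), query=query
    )


# Hypothetical links, as the extractor step might return them.
example_links = [
    ("About Us", "https://example.com/about"),
    ("Contact Us", "https://example.com/contact"),
    ("", "https://example.com/pricing"),
]

bullets = format_links_as_bullets(example_links)
prompt = build_prompt(example_links)
print(prompt)
```

Changing the `query` argument (for example to "phone number") is all it takes to reuse the same flow for a different search.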

Download Flow (gist)
