How to read a website with ChatGPT using web-to-text
If you’re working with the ChatGPT API, you may need to pull text from websites to give the model context for its responses. The web-to-text Node.js module extracts plain text from web pages, which you can then pass to ChatGPT for analysis. In this tutorial, we'll show you how to extract text from a website with web-to-text and send it to ChatGPT through the ChatGPT API.
Prerequisites
Before getting started, you’ll need the following:
- Node.js and npm
- A ChatGPT API key
You’ll also need to have the web-to-text package installed in your project. You can install it using npm by running the following command:
npm install web-to-text
Getting text from a website
To get started, create a new JavaScript file and add the following code:
const getText = require('web-to-text');
const getTextFromURL = async url => {
  const result = await getText(url);
  console.log(result);
};
getTextFromURL('https://example.com');
This code imports the web-to-text package and defines a function called getTextFromURL that takes a URL as an argument. The function uses the getText function from web-to-text to extract plain text from the website, and then logs the result to the console.
To test this code, replace https://example.com with the URL of a website that you want to extract text from. Run the code using the node command, and you should see the plain text output to the console.
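When you start pointing the script at real sites, a malformed URL or a failed request will otherwise surface as an unhandled promise rejection. The following is a minimal sketch of a wrapper that checks the URL first and catches extraction errors. The name safeGetText is hypothetical and not part of web-to-text; the extractor is passed in as an argument, so it makes no assumptions about how web-to-text behaves beyond returning a promise of text.

```javascript
// Hypothetical helper (not part of web-to-text): validates the URL and
// catches extraction errors. `extract` is whatever async function fetches
// the text, for example the getText function imported earlier.
const safeGetText = async (extract, url) => {
  let parsed;
  try {
    parsed = new URL(url); // throws a TypeError on malformed input
  } catch (err) {
    throw new Error(`Invalid URL: ${url}`);
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  try {
    const text = await extract(parsed.href);
    // Return a trimmed string, or an empty string for non-string results.
    return typeof text === 'string' ? text.trim() : '';
  } catch (err) {
    // Log the failure and fall back to an empty string.
    console.error(`Could not extract text from ${parsed.href}: ${err.message}`);
    return '';
  }
};
```

You would call it as safeGetText(getText, 'https://example.com') and check for an empty string before doing anything else with the result.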
Using ChatGPT to analyze the text
Now that you’ve extracted text from a website, you can pass it on to ChatGPT for analysis. Here’s an example of how to do that using the ChatGPT API:
npm install openai-api
With the package installed, add the following code:
const getText = require('web-to-text');
const OpenAI = require('openai-api');
// set up the OpenAI client
const OPENAI_API_KEY = 'YOUR_API_KEY';
const chatGPT = new OpenAI(OPENAI_API_KEY);
// send the text to the API and log the completion
const analyzeText = async text => {
  const response = await chatGPT.complete({
    engine: 'davinci',
    prompt: `Please analyze the following text: "${text}"`,
    maxTokens: 10, // caps the length of the reply; raise it for a longer analysis
  });
  console.log(response.data.choices[0].text);
};
// extract the text with web-to-text, then analyze it
const getTextFromURL = async url => {
  const result = await getText(url);
  await analyzeText(result);
};
// run it
getTextFromURL('https://example.com');
// Result
// EXAMPLE DOMAIN
// This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
// More information... [https://www.iana.org/domains/example]
This code imports the openai-api package and creates a new instance of the OpenAI class using your ChatGPT API key. It then defines a function called analyzeText that takes a string of text as an argument. The function uses the complete method of the OpenAI class to send the text to ChatGPT for analysis, and then logs the result to the console. The getTextFromURL function now passes the extracted text to analyzeText instead of logging it.
To use this code, replace YOUR_API_KEY with your actual ChatGPT API key, and then call the getTextFromURL function with the URL of the website you want to extract text from. The code will extract plain text from the website using web-to-text, and then pass it on to ChatGPT for analysis.
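Pasting the key into the source file is fine for a quick test, but it is easy to commit by accident. One common alternative is to read it from an environment variable. This is a minimal sketch: the helper name getApiKey and the variable name OPENAI_API_KEY are assumptions chosen for illustration, not something the openai-api package requires.

```javascript
// Hypothetical helper: reads the API key from an environment variable.
// The variable name OPENAI_API_KEY is an assumption made for this example.
const getApiKey = (env = process.env) => {
  const key = (env.OPENAI_API_KEY || '').trim();
  if (!key) {
    throw new Error('Set the OPENAI_API_KEY environment variable first.');
  }
  return key;
};

// Usage (assumes the OpenAI class required in the example above):
// const chatGPT = new OpenAI(getApiKey());
```

Failing early with a clear message is usually easier to debug than an authentication error returned by the API later on.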
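Note: OpenAI models limit the number of tokens per request, and the limit (about 4,000 tokens for some models) covers the prompt and the completion together. As a rough guide, one token is about three-quarters of an English word, so 4,000 tokens is around 3,000 words. If the website has a lot of content, you may have to limit the amount of text you send to the API. You can also take the more robust route of indexing the text for the AI model with LlamaIndex (formerly known as GPT Index).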
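A simple way to limit the text is to cut it to an estimated token budget before sending it. The sketch below assumes roughly 4 characters per token, which is only a rule of thumb for English text and not an exact count; the helper name truncateToTokenBudget is hypothetical. An exact count would need a real tokenizer.

```javascript
// Rough heuristic (an assumption): about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: cuts `text` so its estimated token count stays within
// `maxTokens`. When it has to cut, it backs up to the last space or newline
// inside the limit so that no word is split in half.
const truncateToTokenBudget = (text, maxTokens) => {
  const maxChars = Math.max(0, Math.floor(maxTokens * CHARS_PER_TOKEN));
  if (text.length <= maxChars) return text;
  const slice = text.slice(0, maxChars);
  const cut = Math.max(slice.lastIndexOf(' '), slice.lastIndexOf('\n'));
  // If there is no whitespace to cut at, fall back to the hard character cut.
  return (cut > 0 ? slice.slice(0, cut) : slice).trimEnd();
};
```

For example, analyzeText(truncateToTokenBudget(result, 1500)) would leave room for the prompt wording and the completion; the figure 1500 is illustrative and should be chosen to fit the model you use.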
Conclusion
In this tutorial, we showed you how to use the web-to-text package to extract plain text from a website and pass it on to ChatGPT for analysis using the ChatGPT API. With this knowledge, you can build applications, from chatbots to content analysis tools, that use ChatGPT to give more informed responses based on the content of a web page.
Remember to always respect the legal and ethical boundaries when scraping web pages. Additionally, keep in mind that the accuracy of the plain text extracted by web-to-text depends on the quality of the HTML of the web page being scraped. You may encounter issues with sites that heavily use JavaScript, for example.
Overall, web-to-text is a useful tool for extracting plain text from web pages and could be a valuable addition to your project's toolset. Happy coding!
