Content Summarization with Generative AI and OCI
A new era of productivity has arrived with Generative AI, and this project shows how easy it is to get started on Oracle Cloud Infrastructure (OCI). Our goal is to generate compelling copy for social media from the top projects trending on GitHub.
Use cases include creating content for social media, a website, or video scripts. You can build other solutions with the same tools, but the core idea is to use generative AI to bring in text from a variety of sources, summarize that text, and prepare the summary for sharing online. This is why we’re calling it a “content extractor and summarizer.”
If your organization has a large amount of text, especially if you generate new text on a regular basis (perhaps a blog or frequent media releases), you may wish to use this pipeline to generate summaries. These summaries could feed an FAQ, for example, or even a “live” Q&A system that also uses Oracle Generative AI.
In this project we will find the top 25 projects on GitHub’s /trending page, extract their README Markdown files, and summarize them with a focus on generating a captivating insight to draw attention. You may choose between Meta’s Llama 2 or Cohere’s models for this project, and many other parameters can be adjusted for your needs.
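The first two steps of that pipeline (list the trending repositories, then pull each README) can be sketched roughly as follows. This is a minimal illustration, not the project's own code: to stay dependency-free it parses the page with the standard library's HTMLParser rather than scrapy, and it assumes each trending entry links to its repository from inside an h2 heading, which may not match GitHub's current markup. The names TrendingParser, extract_trending_repos, and fetch_readme are hypothetical.

```python
from html.parser import HTMLParser


class TrendingParser(HTMLParser):
    """Collect "owner/repo" slugs from links found inside <h2> headings.

    Assumption: each trending entry links to its repository from an <h2>.
    """

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.repos = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True
        elif tag == "a" and self.in_heading:
            href = dict(attrs).get("href") or ""
            parts = href.strip("/").split("/")
            # Keep only links shaped like /owner/repo, without duplicates.
            if len(parts) == 2 and all(parts):
                slug = "/".join(parts)
                if slug not in self.repos:
                    self.repos.append(slug)

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False


def extract_trending_repos(html, limit=25):
    """Return up to `limit` repository slugs in page order."""
    parser = TrendingParser()
    parser.feed(html)
    return parser.repos[:limit]


def fetch_readme(full_name):
    """Return the README text for "owner/repo" via PyGithub (network call)."""
    # Imported lazily so the parser above works without PyGithub installed.
    from github import Github

    readme = Github().get_repo(full_name).get_readme()
    return readme.decoded_content.decode("utf-8")
```

Calling Github() without a token works for light use but is subject to GitHub's unauthenticated rate limits, so a run over 25 repositories may need a personal access token.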
For example, a summary with high “extractiveness” will generally reuse text from the README file largely intact, while a low extractiveness value will paraphrase it more concisely.
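To make the extractiveness setting concrete, here is a hedged sketch of how the summarization parameters might be assembled and passed to OCI. It assumes the OCI Python SDK's SummarizeTextDetails model accepts the fields shown (input, extractiveness, length, format, temperature, additional_command) and that LOW, MEDIUM, HIGH, and AUTO are the valid extractiveness levels. The helper names, the default model id, and the wording of the additional command are illustrative placeholders, not the project's actual settings.

```python
ALLOWED_LEVELS = {"LOW", "MEDIUM", "HIGH", "AUTO"}


def build_summary_params(text, extractiveness="LOW", length="SHORT",
                         temperature=0.25):
    """Build keyword arguments for a summarization request.

    Field names are assumed to mirror the SDK's SummarizeTextDetails model.
    """
    level = extractiveness.upper()
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"extractiveness must be one of {sorted(ALLOWED_LEVELS)}")
    return {
        "input": text,
        "extractiveness": level,   # HIGH keeps source wording, LOW paraphrases
        "length": length.upper(),
        "format": "PARAGRAPH",
        "temperature": temperature,
        # Hypothetical steering instruction for social media copy.
        "additional_command": "Lead with one captivating insight.",
    }


def summarize(params, compartment_id, endpoint, model_id="cohere.command"):
    """Send the request to OCI Generative AI (needs the oci package and config)."""
    import oci  # imported lazily so build_summary_params works without the SDK

    models = oci.generative_ai_inference.models
    client = oci.generative_ai_inference.GenerativeAiInferenceClient(
        config=oci.config.from_file(), service_endpoint=endpoint
    )
    details = models.SummarizeTextDetails(
        compartment_id=compartment_id,
        serving_mode=models.OnDemandServingMode(model_id=model_id),
        **params,
    )
    return client.summarize_text(details).data.summary
```

Running the same README through build_summary_params with "HIGH" and then "LOW" is a quick way to compare how much of the original wording survives in each summary.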
Don’t forget, you’ll need the OCI SDK, plus scrapy, PyGithub, and NLTK for tokenization of the summaries. Additional requirements are Python 3.10 and Conda.
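A quick way to confirm the environment before running anything is a small check like the one below. It is only a sketch: the mapping assumes the usual import names for each package (PyGithub is imported as github), and the helper names are hypothetical.

```python
import sys
from importlib.util import find_spec

# Install name -> import name (assumed; PyGithub is imported as "github").
REQUIRED = {"oci": "oci", "scrapy": "scrapy", "PyGithub": "github", "nltk": "nltk"}


def missing_packages(required=REQUIRED):
    """Return the install names of packages that cannot be imported."""
    return [name for name, module in required.items() if find_spec(module) is None]


def python_ok(minimum=(3, 10)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing packages:", missing_packages() or "none")
```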