Content Summarization with Generative AI and OCI

Victor Agreda Jr
Published in Oracle Developers
2 min read · Feb 16, 2024
Photo by Aaron Burden on Unsplash

A new era of productivity has arrived with Generative AI, and this project shows how easy it is to get started on OCI. Our goal is to generate compelling social media copy from the top projects trending on GitHub.

Use cases include creating content for social media, a website, or video scripts. You can build other solutions with the same tools, but the core idea is to use generative AI to bring in text from a variety of sources, summarize that text, and prepare the summary for sharing online. This is why we’re calling it a “content extractor and summarizer.”

If your organization has a large amount of text, especially if you generate new text on a regular basis (perhaps a blog or frequent media releases), you may wish to use this pipeline to generate summaries. These summaries could feed FAQs, for example, or even a “live” Q&A system that also uses Oracle Generative AI.

In this project we will find the top 25 projects on GitHub using /trending, extract their README Markdown files, and summarize each one with a focus on a captivating insight that draws attention. You may choose between Meta’s Llama 2 and Cohere’s models for this project, and many other parameters can be tuned to your needs.
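The first step, collecting repository names from the trending page, can be sketched with the standard library alone. This is a minimal sketch, not the project's own scraper (which relies on scrapy): it assumes each trending entry is a link of the form /owner/repo nested inside an h2 heading, and the names TrendingRepoParser, top_trending, and SAMPLE_HTML are hypothetical.

```python
from html.parser import HTMLParser


class TrendingRepoParser(HTMLParser):
    """Collect "owner/repo" names from links nested inside <h2> headings."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.repos = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True
        elif tag == "a" and self.in_heading:
            href = dict(attrs).get("href") or ""
            parts = href.strip("/").split("/")
            # Keep only links of the form /owner/repo (assumed page layout).
            if len(parts) == 2 and all(parts):
                name = "/".join(parts)
                if name not in self.repos:
                    self.repos.append(name)

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False


def top_trending(html, limit=25):
    """Return up to `limit` repository names in page order."""
    parser = TrendingRepoParser()
    parser.feed(html)
    return parser.repos[:limit]


# Made-up sample markup standing in for the downloaded trending page.
SAMPLE_HTML = """
<article><h2 class="h3"><a href="/octo-org/alpha">octo-org / alpha</a></h2></article>
<article><h2 class="h3"><a href="/octo-org/beta">octo-org / beta</a></h2></article>
<a href="/login">Sign in</a>
"""

print(top_trending(SAMPLE_HTML))
```

Each name returned this way could then be passed to PyGithub, whose get_repo and get_readme calls return the README content for the summarization step.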

For example, a summary with high “extractiveness” will generally leave text from the README file intact, while a low extractiveness value will paraphrase it more concisely.
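To make the extractiveness setting concrete, here is a small sketch of how the option might be validated before a request is built. It does not call OCI. The helper build_summary_options, its dictionary keys, and the allowed values (LOW, MEDIUM, HIGH, AUTO) are assumptions for illustration and should be checked against the SDK documentation for the model you choose.

```python
# Assumed set of accepted values; verify against the SDK documentation.
ALLOWED_EXTRACTIVENESS = {"LOW", "MEDIUM", "HIGH", "AUTO"}


def build_summary_options(text, extractiveness="AUTO", length="SHORT"):
    """Return a plain dict of summarization options (hypothetical helper)."""
    level = extractiveness.upper()
    if level not in ALLOWED_EXTRACTIVENESS:
        raise ValueError(f"unsupported extractiveness: {extractiveness!r}")
    return {"input": text, "extractiveness": level, "length": length.upper()}


# Made-up README text for illustration.
readme = "Alpha is a small library for parsing logs. It supports JSON and CSV."

# HIGH asks for a summary that stays close to the README wording;
# LOW asks for a freer paraphrase.
close_to_source = build_summary_options(readme, extractiveness="high")
paraphrased = build_summary_options(readme, extractiveness="low")
print(close_to_source["extractiveness"], paraphrased["extractiveness"])
```

Validating the value up front surfaces a typo in the setting before any request is sent.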

Don’t forget, you’ll need the OCI SDK, plus Scrapy, PyGithub, and NLTK for tokenizing the summaries. Additional requirements are Python 3.10 and Conda.
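The post does not say how the tokenized summaries are used. One plausible use, sketched here as an assumption, is trimming a summary to a character budget on sentence boundaries. The sketch prefers NLTK's sent_tokenize and falls back to a simple regular expression when NLTK or its tokenizer data is not installed. The helper names and the 280-character default are hypothetical.

```python
import re


def split_sentences(text):
    """Split text into sentences, preferring NLTK when it is usable."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Fallback: NLTK or its tokenizer data is not installed.
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def fit_to_limit(summary, max_chars=280):
    """Keep whole sentences from the start until the character budget is used."""
    kept = []
    for sentence in split_sentences(summary):
        candidate = " ".join(kept + [sentence])
        if len(candidate) > max_chars:
            break
        kept.append(sentence)
    return " ".join(kept)


# Made-up summary text for illustration.
summary = "Alpha parses logs quickly. It supports JSON and CSV. A plugin system is planned."
print(fit_to_limit(summary, max_chars=60))
```

Cutting on sentence boundaries avoids posts that end mid-sentence, at the cost of sometimes leaving part of the budget unused.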


Published in Oracle Developers

Aggregation of articles from Oracle engineers, Groundbreaker Ambassadors, Oracle ACEs, and Java Champions on all things Oracle technology. The views expressed are those of the authors and not necessarily of Oracle.