STORIES | PUBLISHING TOOL | KNIME ANALYTICS PLATFORM

How we use KNIME to help manage our Medium journal

Creating article drafts automatically via the Medium REST API

SJ Porter
Low Code for Data Science


The new KNIME Community journal, Low Code for Advanced Data Science, was created as a space for community members to share their stories. Since there’s also a KNIME Blog, we needed an easy way to synchronize new and existing blog articles written by KNIME team members with our Medium publication.

As it turns out, it’s straightforward to import an existing blog article from an external website into Medium as a draft: the “Import a story” button on the Medium stories page lets users import content from a provided URL. However, we had a few requirements that made things more interesting:

  • Our evangelism team tracks all published (and unpublished) articles in Google Sheets. Preventing articles that have already been published from being submitted as drafts again saves a lot of headaches.
  • We’d like to be able to import multiple articles at a time into Medium.
  • We’d like fine-tuned control over the canonical links and tags on behalf of our writers — when moving their articles from the KNIME Blog to Medium, we’d like to ensure everything is set up properly.

To meet these requirements, we turned to KNIME Analytics Platform and the Medium API to automate the scraping and publishing of KNIME Blog articles on Medium.

Workflow overview

At the highest level, our workflow has three components:

  • Select articles to publish (from Google Sheets)
  • Scrape and review articles
  • Publish draft(s) to Medium

Components are handy for documentation as well as for making reusable workflow segments. Check out the hundreds of components other community members have shared here.

Each component corresponds to one view in our data app, and two of our views require input from end-users. Their selections will control behavior within the components.

Let’s go component by component and take a look at what’s happening under the hood!

Select articles to publish (from Google Sheets)

The evangelism team at KNIME tracks all articles (published and upcoming) in a Google Sheets spreadsheet dubbed the “Blog Article Tracker”. Our KNIME workflow integrates directly with that spreadsheet using the KNIME Google Connectors extension.

This view in our data app prompts the user to select relevant articles from the “Blog Article Tracker” spreadsheet for publication. Any articles that have already been published are filtered from the view.

The “Medium Account” drop-down menu allows the user to select which Medium account the selected draft(s) will be published to.

The two selected articles written by Rosaria Silipo will be scraped from the KNIME Blog and published (as drafts) to Rosaria’s user account on Medium.

Inside the component, there are a few things going on. We’re connecting to Google and Google Sheets, reading in the content, and filtering that content based on user selections.

The Value Selection Widget node corresponds to the “Medium Account” drop-down menu. The available values are populated based on a tab in the Blog Article Tracker spreadsheet. The user’s selection will determine the integration token that will be used for authenticating with the Medium API. If you’re looking to try this yourself, see this article for instructions on how to generate an integration token.

Future enhancements to this component should include schema validation (using the Table Validator node) to ensure that the structure of the spreadsheet hasn’t changed, as well as a more secure storage location for the integration tokens to protect user privacy.

Contents of the “Select articles to publish (from Google Sheets)” component.

Scrape and review articles

The “Scrape and review articles” component does most of the heavy lifting in the workflow. Any articles that were selected in the prior view are retrieved from the web, cleaned up, and presented to the user for review. Any articles that are selected in this view will be published when the final component in our workflow executes.

None of our articles have tags at the moment, but it is possible to automate the creation of tags when publishing drafts via the Medium API.

When an article is selected, it is rendered in the article preview section. Images, styling, and even GIFs have been successfully imported!

Taking a look under the hood, each selected article is read into KNIME and parsed in a loop. The Webpage Retriever node is used to retrieve the contents of each blog article. The Replace relative URLs with absolute URLs option allows us to fix the relative links for images in the article. Otherwise, any images with relative links would not render correctly or be imported successfully into Medium.
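
Conceptually, that option resolves each relative link against the URL of the page it came from. Here is a minimal Python sketch of the same idea (the URLs below are made up for illustration):

from urllib.parse import urljoin

page_url = "https://www.knime.com/blog/some-article"  # hypothetical article URL
relative_src = "/sites/default/files/diagram.png"     # relative image link from the HTML

# resolve the relative link against the page URL, which is what the
# "Replace relative URLs with absolute URLs" option does for every link
absolute_src = urljoin(page_url, relative_src)
print(absolute_src)  # https://www.knime.com/sites/default/files/diagram.png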

A Python Script node is used to parse the contents of the articles, with the Beautiful Soup package handling the HTML parsing. The content is filtered down to the first <article> tag in the HTML document, and all <aside> tags are removed (or “decomposed”, in Beautiful Soup’s terminology).

# import Beautiful Soup for HTML parsing
from bs4 import BeautifulSoup

# make a copy of the input table
df = input_table_1.copy()

# keep only the contents of the first <article> tag in each document
df['body'] = df['body'].apply(
    lambda x: "".join(
        str(y) for y in BeautifulSoup(x, 'html.parser').find_all('article')[0].contents
    )
)

# define a function which removes all <aside> tags from an article
def article_preprocessor(soup: BeautifulSoup) -> str:
    for aside in soup.find_all('aside'):
        aside.decompose()
    return soup.prettify()

# preprocess articles
df['body'] = df['body'].apply(
    lambda x: article_preprocessor(BeautifulSoup(x, 'html.parser'))
)

output_table_1 = df

It’s worth mentioning that the Empty Table Switch and Try (Data Ports) nodes are used in a few places within this workflow to handle exceptions and empty tables.

Publish draft(s) to Medium

At this point, we can publish our article by sending a POST Request to the /posts endpoint of the Medium API (docs). Here’s what an example request looks like as of June 2021:

POST /v1/users/{{ user_id }}/posts HTTP/1.1
Host: api.medium.com
Authorization: Bearer {{ integration_token }}
Content-Type: application/json
Accept: application/json
Accept-Charset: utf-8

{
  "title": "Liverpool FC",
  "contentFormat": "html",
  "content": "<h1>Liverpool FC</h1><p>You’ll never walk alone.</p>",
  "canonicalUrl": "http://jamietalbot.com/posts/liverpool-fc",
  "tags": ["football", "sport", "Liverpool"],
  "publishStatus": "public"
}

So, we can see that the authorization method is a bearer token (our “Integration Token”) and that we need to provide the Medium user ID in the request. We can define the title, content format, canonical URL (the original URL of the blog article), the tags for SEO, and the publish status. In the example above, the POST request would publish the article immediately because publishStatus is set to public. By setting it to draft instead, we can automate the creation of drafts on Medium.
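
For illustration, here is roughly the same call issued with Python’s requests library instead of a KNIME node (the token, user ID, and article values are placeholders, not real data):

import requests

# placeholder credentials -- in our workflow, the token comes from the
# Google Sheet selection and the user ID from the Medium API
integration_token = "YOUR_INTEGRATION_TOKEN"
user_id = "YOUR_MEDIUM_USER_ID"

payload = {
    "title": "My KNIME Blog Article",
    "contentFormat": "html",
    "content": "<h1>My KNIME Blog Article</h1><p>Article body...</p>",
    "canonicalUrl": "https://www.knime.com/blog/my-knime-blog-article",
    "tags": ["knime", "data science"],
    "publishStatus": "draft",  # create a draft instead of publishing immediately
}

response = requests.post(
    f"https://api.medium.com/v1/users/{user_id}/posts",
    headers={
        "Authorization": f"Bearer {integration_token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json=payload,
)
response.raise_for_status()
print(response.json())  # the response data includes the new draft's id and URL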

Fortunately, there are only a handful of required fields for the articles themselves. With a bit of column renaming, cleanup, and a Table to JSON node, we can structure our request inside KNIME.

Getting the Medium User ID requires another API request.
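
Per the Medium API docs, a GET call to the /me endpoint returns the authenticated user’s details, including the id required by the /posts endpoint. Continuing the sketch above:

# fetch the authenticated user's profile; the "id" field is the
# user ID required by the /posts endpoint
me = requests.get(
    "https://api.medium.com/v1/me",
    headers={"Authorization": f"Bearer {integration_token}"},
)
user_id = me.json()["data"]["id"]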

Once the publishing process is complete, the data app displays the view below with relevant status information.

As expected, the drafts show up in Medium!

Conclusion

Integrating with the Medium API was a fun challenge and gave us a few benefits over the regular article import feature in Medium. Integrating with REST APIs is easy work in KNIME Analytics Platform, and there are numerous possibilities for what can be done with the Medium API beyond our use case described here. Our workflow is deployed to an internal KNIME Server instance so that our evangelism team members can run the data app from the web.
