Content Discovery Tools for Commercial Publishers

Identifying content in a CMS relevant to content of a published electronic document

Commercial publishers commonly research content that others have published. Such research is particularly common for content published on the internet, or “online”, such as news and entertainment. In these areas, commercial publishers need to ensure that the content they make available online is interesting to potential viewers and up to date.

Each major commercial news outlet typically has a “home page” on the internet on which it publishes headlines for major stories of the moment. The home page generally has many headlines, which typically are hypertext links that can be used to access a full story. In some cases, a few sentences of the story may also be provided on the home page. The organization of the headlines on the home page generally changes several times per day as new stories become available and older stories become less frequently viewed. Thus, online content from commercial publishers, particularly in news and entertainment, can change very quickly.

Identifying content published online by other publishers, and comparing that content to its own resources, is a challenging task for a commercial publisher because of the high volume of content, the rapid change of content and the limited access to content. Users can consume a large amount of time and computer resources reviewing online content and content stored in their content management systems.

Several technical challenges arise when a commercial publisher compares content published online by other publishers to its own resources. In particular, the content published by other publishers is only available to the commercial publisher in its published format, such as through a “home page” of a web site.

Thus, any analysis of the published content available from another publisher is based on the structure and content of a published electronic document. A commercial publisher generally does not have access to the database of content, or the various metadata about that content, owned by competing commercial publishers. A computer-based analysis of another publisher’s content therefore involves extracting information from the structure and content of a published electronic document, typically a home page.

The extracted information is used to generate queries to find relevant content in a content management system. Results from such queries are processed to communicate to a user whether the content management system has content available that corresponds to the query and is relevant to the published electronic document, and whether that content is already published in electronic documents available from the second source, that is, the publisher whose content management system is queried.

When the query results are processed based on the relative importance of the information extracted from the published document, the communication to users can include indications of the relative importance of the query results. This allows users to focus their attention on the more important content and reduces consumption of computer resources, such as processing and network bandwidth.

Accordingly, in one aspect, a computer system receives a published electronic document from a first source of published electronic documents. The computer system analyzes structure and content of the published electronic document to extract information, and data indicative of relative importance of the extracted information. Such extracted information can include keywords, based on content, and information indicative of relative importance of those keywords, based on structure.
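As an illustration only, the extraction step could be sketched as follows. The sketch is not taken from the patent application: it assumes that headlines are hypertext links inside heading tags, and that the heading level stands in for "structure" when estimating relative importance. The names `HEADING_WEIGHTS`, `STOPWORDS`, `HeadlineExtractor` and `extract_keywords` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: a headline inside a larger heading is more important.
HEADING_WEIGHTS = {"h1": 3.0, "h2": 2.0, "h3": 1.0}
# Assumption: a tiny stopword list is enough for this illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "as", "at", "by"}


class HeadlineExtractor(HTMLParser):
    """Collects (headline text, weight) pairs for links found inside headings."""

    def __init__(self):
        super().__init__()
        self._heading = None   # heading tag currently open, if any
        self._in_link = False  # True while inside a link within a heading
        self._weight = 0.0     # weight captured when the link was opened
        self._text = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_WEIGHTS:
            self._heading = tag
        elif tag == "a" and self._heading is not None:
            self._in_link = True
            self._weight = HEADING_WEIGHTS[self._heading]
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = " ".join("".join(self._text).split())
            if text:
                self.headlines.append((text, self._weight))
            self._in_link = False
        elif tag == self._heading:
            self._heading = None


def extract_keywords(html):
    """Maps each keyword to the highest weight of any headline containing it."""
    parser = HeadlineExtractor()
    parser.feed(html)
    scores = {}
    for text, weight in parser.headlines:
        for word in text.lower().split():
            word = word.strip(".,:;!?\"'()")
            if word and word not in STOPWORDS:
                scores[word] = max(scores.get(word, 0.0), weight)
    return scores


page = (
    '<h1><a href="/x">Storm hits the coast</a></h1>'
    '<h3><a href="/y">Coast cleanup begins</a></h3>'
)
keyword_weights = extract_keywords(page)
```

A real home page would likely need other structural signals as well, such as position on the page or font size; heading level is used here only because it is easy to read from the markup.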

The computer system generates queries based on the extracted information to query a content management system of a second source of published electronic documents. The results can indicate whether the content management system has content available corresponding to the query, and whether the content is published in electronic documents available from the second source.
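A minimal sketch of the query step, under stated assumptions: the content management system is modeled as an in-memory list, a query is a single keyword, and a match is a whole-word hit in an item's title. `CmsItem`, `build_queries` and `run_query` are hypothetical names, and a real content management system would expose its own search interface.

```python
from dataclasses import dataclass


@dataclass
class CmsItem:
    item_id: str
    title: str
    published: bool  # whether the second source has already published it


def build_queries(keyword_weights, limit=5):
    """Returns the most important keywords first (ties broken alphabetically)."""
    ranked = sorted(keyword_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword for keyword, _ in ranked[:limit]]


def run_query(cms_items, keyword):
    """Returns the items whose title contains the keyword as a whole word."""
    return [item for item in cms_items if keyword in item.title.lower().split()]


cms = [
    CmsItem("a1", "Storm damage report", published=True),
    CmsItem("a2", "Coast guard interview", published=False),
    CmsItem("a3", "Local election results", published=False),
]
queries = build_queries({"storm": 3.0, "coast": 3.0, "cleanup": 1.0}, limit=2)
results = {keyword: run_query(cms, keyword) for keyword in queries}
```

Each result carries the `published` flag, so the caller can tell both whether matching content exists and whether it has already been published.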

The computer system can process the results received from these queries, using the relative importance of the extracted information, to communicate information indicative of content that is available in the content management system, relevant to the published document, and not yet published in electronic documents available from the second source. This information can be used for several purposes: to reduce consumption of computer resources, to improve the productivity of users, and to reduce the amount of time needed to make content available for distribution.
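One way to picture this processing step is the sketch below. It is an assumption-laden illustration, not the method of the patent application: each query result is reduced to an `(item_id, published)` pair, and an unpublished item's score is simply the sum of the weights of the keywords that matched it. The function name `rank_unpublished` is hypothetical.

```python
def rank_unpublished(results_by_keyword, keyword_weights):
    """Ranks unpublished items by the summed weight of the keywords that matched them."""
    scores = {}
    for keyword, items in results_by_keyword.items():
        weight = keyword_weights.get(keyword, 0.0)
        for item_id, published in items:
            if not published:  # already published content needs no attention
                scores[item_id] = scores.get(item_id, 0.0) + weight
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


weights = {"storm": 3.0, "coast": 3.0, "cleanup": 1.0}
query_results = {
    "storm": [("a1", True), ("a2", False)],
    "coast": [("a2", False)],
    "cleanup": [("a4", False)],
}
ranking = rank_unpublished(query_results, weights)
```

The ordered list lets a user look first at the unpublished items that match the most prominent headlines of the other publisher.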

In the following description, reference is made to the accompanying drawings which form a part hereof, and in which are shown, by way of illustration, specific example implementations. Other implementations may be made without departing from the scope of the disclosure.

Referenced Patent Application

Written by

Author of https://www.theinnovationmode.com/ Opinions and views are my own
