Analysis of Bookmarks and Revisiting the Forgotten Topics

John Samuel
Dec 27, 2020

Our thoughts and interests evolve, and our reading sometimes reflects this evolution. A major part of our reading has moved to the internet, and the way we read news online has changed dramatically over the past few years. The shift from reading news on specific news outlets to reading headlines on social media gives a glimpse of how rapidly reading habits have changed. However, we still use some form of bookmarking to save articles for later reading and reference. These bookmarks, whether in browsers or on social media websites, contain valuable information that can be used to improve our future reading and to remind us of forgotten topics of past interest. A reading profile can be built by analyzing the articles read in the past and the tags that readers associated with them. This article explores some of the key information that can be obtained from bookmarked and tagged articles, especially for revisiting forgotten topics.


Analysis of Bookmarks

1. Identifying the Data Sources

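Bookmarks can come from internet browsers as well as from social media websites, where saving an article for later takes forms such as:

  • Likes and favorites
  • Retweets and reposts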

Internet browsers store bookmarks in various formats (HTML, XML, JSON or, in Firefox's case, an SQLite database), and the format is not standardized across browsers. Browsers do, however, let you export and import bookmarks in HTML. In some cases, such as Firefox, user tags are not present in the exported HTML file. One option for Firefox is to use its bookmark backups, which contain much of the information usually absent from exported files, such as the URL of the bookmarked site's icon, the time of bookmarking, and user-generated tags. Social media websites provide application programming interfaces (APIs) through which developers can access some of this information.
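As an illustration of the Firefox option, the sketch below flattens a bookmark backup into one record per bookmark, keeping the folder (category), the tags, and the timestamps. It assumes an uncompressed JSON backup (the kind produced by the manual backup command in Firefox's bookmark library) whose nodes use the keys `uri`, `title`, `dateAdded`, `lastModified`, `children`, and a comma-separated `tags` string, with timestamps in microseconds since the Unix epoch. These key names reflect our understanding of the format and should be checked against a real backup file; the `Bookmark` class and the function names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Bookmark:
    url: str
    title: str
    added: datetime      # time of bookmarking (UTC)
    modified: datetime   # time of last modification (UTC)
    folder: str          # title of the enclosing folder, i.e. the category
    tags: list = field(default_factory=list)  # user-generated tags


def from_micros(value):
    # Assumption: timestamps are microseconds since the Unix epoch.
    return datetime.fromtimestamp(value / 1_000_000, tz=timezone.utc)


def flatten_backup(node, folder=""):
    """Walk a backup tree and yield one Bookmark per bookmark node."""
    # Assumption: bookmarks carry a "uri" key, folders a "children" list,
    # and tags are stored as a single comma-separated string.
    if "uri" in node:
        raw_tags = node.get("tags", "")
        yield Bookmark(
            url=node["uri"],
            title=node.get("title", ""),
            added=from_micros(node.get("dateAdded", 0)),
            modified=from_micros(node.get("lastModified", 0)),
            folder=folder,
            tags=[t.strip() for t in raw_tags.split(",") if t.strip()],
        )
    for child in node.get("children", []):
        # A folder's title becomes the category of everything inside it.
        yield from flatten_backup(child, node.get("title", folder))


def load_backup(path):
    """Read an uncompressed JSON backup file into a flat list of bookmarks."""
    with open(path, encoding="utf-8") as f:
        return list(flatten_backup(json.load(f)))
```

Calling `load_backup(path)` on such a file returns a flat list, which is convenient for the counting and grouping discussed in the next section. The automatic backups kept in the Firefox profile folder are, as far as we know, LZ4-compressed (`.jsonlz4`) and would need to be decompressed first.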

2. Data Analysis

Each bookmark can provide the following information:

Source URL

  • Article title
  • Article content (usually for feeds, after fetching the content)
  • Website URL
  • Website title
  • Website description (often available)

Time

  • Bookmarked time
  • Time of last modification

User-generated content

  • Categories/Folders
  • Tags or hashtags

In addition, social media companies provide information derived from their own analysis, which is also available to readers:

  • Number of users who bookmarked the article
  • Number of times an article was shared
  • The users who shared the article

This is a treasure trove of information. For example, since the time of bookmarking is available, one interesting question is which topics interested me during specific periods and when a particular topic first caught my attention. Tags and categories (or folders) can also reveal a user's classification style, that is, how the user files articles under categories and subcategories. Internet browsers usually allow only one category (or subcategory) per bookmark, whereas an article can have more than one tag, so tags can play an important role in classification.
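Both questions in this example can be answered with a few lines of grouping code. The following is a minimal sketch that assumes each bookmark has been reduced to a hypothetical `(bookmarked_at, tags)` pair: `first_seen` finds when each tag was first used, and `topics_by_year` lists the dominant tags of each year.

```python
from collections import Counter, defaultdict
from datetime import datetime


def first_seen(records):
    """Map each tag to the earliest time it was used.

    `records` is an iterable of (bookmarked_at, tags) pairs, a hypothetical
    shape chosen for illustration.
    """
    first = {}
    for when, tags in records:
        for tag in tags:
            if tag not in first or when < first[tag]:
                first[tag] = when
    return first


def topics_by_year(records, top=3):
    """Return the `top` most frequent tags of each year, oldest year first."""
    per_year = defaultdict(Counter)
    for when, tags in records:
        per_year[when.year].update(tags)
    return {year: [tag for tag, _ in counts.most_common(top)]
            for year, counts in sorted(per_year.items())}
```

Grouping by year is an arbitrary choice; keying on `(when.year, when.month)` would give a finer picture of how interests shift.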

Now that we have seen the data available for analysis, let us look at how to draw interesting insights from it. Some (non-exhaustive) questions are listed below:

Bookmarks

  • Total number of bookmarks
  • Average number of bookmarks per day/week/month/year
  • Maximum number of bookmarks made in a day/week/month/year
  • Minimum number of bookmarks made in a day/week/month/year
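The bookmark questions reduce to counting bookmarks per period. A minimal sketch, assuming the bookmarking times are available as Python `datetime` objects; `period_key` and `bookmark_stats` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def period_key(when, period):
    """Bucket a timestamp into its day, ISO week, month, or year."""
    if period == "day":
        return when.date()
    if period == "week":
        iso = when.isocalendar()
        return (iso[0], iso[1])  # (ISO year, ISO week number)
    if period == "month":
        return (when.year, when.month)
    if period == "year":
        return when.year
    raise ValueError(f"unknown period: {period!r}")


def bookmark_stats(times, period="month"):
    """Total, average, busiest, and quietest period for bookmark timestamps."""
    counts = Counter(period_key(t, period) for t in times)
    if not counts:
        return {"total": 0}
    return {
        "total": sum(counts.values()),
        # Averaged over periods that contain at least one bookmark.
        "average": mean(counts.values()),
        "max": max(counts.items(), key=lambda kv: kv[1]),
        "min": min(counts.items(), key=lambda kv: kv[1]),
    }
```

Periods without any bookmark never enter the counter, so the minimum is the quietest active period; counting months with zero bookmarks would require filling in the empty periods explicitly.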

Websites

  • Total number of unique websites
  • Average number of bookmarks per website
  • Website(s) with the most bookmarks, i.e., the most frequently referenced website(s)
  • Website(s) with the fewest bookmarks, i.e., the least frequently referenced website(s)
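For the website questions, each URL first has to be mapped to a website. The sketch below assumes that the host name, with a leading `www.` removed, is a good enough notion of "website", so subdomains such as `news.example.com` are counted separately from `example.com`. It relies only on `urllib.parse.urlsplit` from the standard library; `website_of` and `website_stats` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlsplit


def website_of(url):
    """Reduce a bookmarked URL to its host name, dropping a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def website_stats(urls):
    counts = Counter(website_of(u) for u in urls)
    if not counts:
        return {"unique_websites": 0}
    most, least = max(counts.values()), min(counts.values())
    return {
        "unique_websites": len(counts),
        "average_per_website": sum(counts.values()) / len(counts),
        # Ties are kept, hence lists of websites rather than a single one.
        "most_referenced": sorted(w for w, c in counts.items() if c == most),
        "least_referenced": sorted(w for w, c in counts.items() if c == least),
    }
```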

Tags and categories

  • Total number of unique tags/categories
  • Tags/categories with the most bookmarks, i.e., the most frequently used tags
  • Tags/categories with the fewest bookmarks, i.e., the least frequently used tags
  • Tags/categories spanning the most websites, i.e., the most widely used tags
  • Tags/categories spanning the fewest websites, i.e., the most narrowly used tags
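The tag and category questions combine two counts per tag: how many bookmarks carry it, and how many distinct websites those bookmarks come from. A minimal sketch, assuming a hypothetical list of `(url, tags)` pairs and using the host name as the website; folders can be analyzed the same way by passing the folder name as a one-element tag list, since a browser bookmark usually sits in a single folder.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def extremes(counts):
    """Return (keys with the highest count, keys with the lowest count)."""
    hi, lo = max(counts.values()), min(counts.values())
    return (sorted(k for k, c in counts.items() if c == hi),
            sorted(k for k, c in counts.items() if c == lo))


def tag_stats(bookmarks):
    """Tag statistics from (url, tags) pairs, a hypothetical input shape."""
    bookmarks_per_tag = defaultdict(int)
    websites_per_tag = defaultdict(set)
    for url, tags in bookmarks:
        host = urlsplit(url).hostname or ""
        for tag in set(tags):  # count each tag at most once per bookmark
            bookmarks_per_tag[tag] += 1
            websites_per_tag[tag].add(host)
    if not bookmarks_per_tag:
        return {"unique_tags": 0}
    most_used, least_used = extremes(bookmarks_per_tag)
    widest, narrowest = extremes({t: len(s) for t, s in websites_per_tag.items()})
    return {
        "unique_tags": len(bookmarks_per_tag),
        "most_used": most_used,
        "least_used": least_used,
        "most_websites": widest,
        "fewest_websites": narrowest,
    }
```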

Analytics Information

  • Tags of interest
  • Users interested in similar topics

All the above questions rely on aggregation functions such as count, minimum, maximum, and average, which are common in data analysis in almost every domain. This information is sufficient to build a basic reading profile. However, care must be taken that the profile does not ignore the least frequently used bookmarks or websites. In other words, data analysis must also help identify forgotten topics, bookmarks, and websites. To revisit forgotten topics, one needs to focus on the least frequented websites, the least used tags and categories, and so on. Additionally, the focus must be on older bookmarks rather than recent ones.
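One way to put this into practice is to flag tags that are both rarely used and not used for a long time. The sketch below is one possible heuristic, not a standard method: the thresholds `min_age_days` and `max_uses` are arbitrary assumptions, `forgotten_tags` is a hypothetical name, and the input is a hypothetical list of `(bookmarked_at, tags)` pairs.

```python
from datetime import datetime


def forgotten_tags(records, now, min_age_days=365, max_uses=2):
    """Tags used at most `max_uses` times and not used for `min_age_days`.

    records: iterable of (bookmarked_at, tags) pairs (hypothetical shape).
    """
    last_used = {}
    uses = {}
    for when, tags in records:
        for tag in set(tags):
            uses[tag] = uses.get(tag, 0) + 1
            if tag not in last_used or when > last_used[tag]:
                last_used[tag] = when
    candidates = [
        tag for tag in uses
        if uses[tag] <= max_uses and (now - last_used[tag]).days >= min_age_days
    ]
    # Tags whose last use is oldest come first; ties go to the less used tag.
    return sorted(candidates, key=lambda t: (last_used[t], uses[t]))
```

The same filter can be applied to websites or folders by replacing the tags with host names or folder titles.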

Unfortunately, most current recommendation systems emphasize aspects such as recency. As a result, recent posts and articles occupy a prominent place in the results, and some past topics of interest never appear at all.

3. Data Visualization
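A simple visualization is the number of bookmarks per year, which shows at a glance when reading activity peaked and when it faded. A plotting library such as matplotlib would be the usual choice; to stay dependency-free, the sketch below draws a plain-text bar chart instead, assuming the bookmarking times are given as `datetime` objects. `text_histogram` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def text_histogram(times, width=40):
    """Render the number of bookmarks per year as a plain-text bar chart."""
    counts = Counter(t.year for t in times)
    if not counts:
        return ""
    peak = max(counts.values())
    lines = []
    for year in sorted(counts):
        # Scale bars to the busiest year; every active year gets at least one mark.
        bar = "#" * max(1, round(width * counts[year] / peak))
        lines.append(f"{year} | {bar} {counts[year]}")
    return "\n".join(lines)
```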

Building a Reading Profile

Another important aspect of reading is the surprise factor. Users should receive a certain number of surprising articles that may help them step out of their filter bubble. As discussed above, past topics of interest and the least visited websites can be a source of such articles.
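A reading list with a surprise factor can be sketched by reserving a share of the suggestions for bookmarks on forgotten topics. The sketch assumes a hypothetical list of `(url, tags)` pairs ordered newest first, plus a set of tags already judged forgotten (for example, rarely and not recently used); `reading_list` is a hypothetical name and `surprise_ratio` is an arbitrary knob, not a recommended value.

```python
import random


def reading_list(bookmarks, forgotten, k=10, surprise_ratio=0.3, seed=None):
    """Mix recent bookmarks with a share of bookmarks on forgotten topics.

    bookmarks: list of (url, tags) pairs, newest first (hypothetical shape).
    forgotten: set of tags considered forgotten.
    """
    rng = random.Random(seed)
    # A bookmark is "surprising" if any of its tags is a forgotten one.
    surprising = [b for b in bookmarks if forgotten & set(b[1])]
    usual = [b for b in bookmarks if not forgotten & set(b[1])]
    n_surprise = min(len(surprising), round(k * surprise_ratio))
    picks = rng.sample(surprising, n_surprise)
    # Fill the rest with the most recent bookmarks on current interests.
    picks += usual[: k - n_surprise]
    return picks
```

Sampling the surprising items at random, rather than always taking the oldest ones, keeps repeated runs from suggesting the same articles; passing a `seed` makes the choice reproducible.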

Originally published at

The Startup