This has happened to everybody, I’m sure.

The Internet is open for all, as are social networks. Each conversation — on politics, sex, religion, sport, and “Justin Bieber” — generates a huge trail of opinions and information.

Unfortunately, no one can stop the randomization of the content in his or her social stream. What’s more, most of the content is useless spam. We can summarize the problem in brief: “A huge amount of uncategorized and random content.” Data fuels the Internet, and it can’t be rejected.

In the last couple of years, social networks have been angling to become the main source of online content, and they’ve made progress; nowadays, news agencies and websites turn to social networks as their first source of news and reports.

“Every two days now we create as much information as we did from the dawn of civilization up until 2003.”
Eric Schmidt, CEO of Google

Take the Syrian uprising and civil war, for example. Major news outlets such as Al Jazeera, CNN, and Russia Today turned to YouTube, Facebook, and Twitter for content (whether it was true or fabricated). You can even find social media accounts that belong to news agencies and are dedicated to a specific topic. (See Al Arabiya Syria.)

These benefits aside, social media’s got problems. Example: When I followed a friend of mine on her new Twitter account, I liked her posts. But after a while, she started tweeting about topics in which I’m not interested. My stream was full of undesirable content, so I unfollowed her. And for the record, I check her profile from time to time to decide whether to follow her again or not.

At Startup Weekend Ramallah, I found a solution (a prototype): a social platform that’s a mixture of Quora, Twitter, Facebook, and Reddit. It is a systematic, user-driven topic generator and matcher, which leads to a topic-focused platform. You can follow topics and users, and filter their posts by the topics you care about.

Technically, this can be done. NLP (Natural Language Processing) and text tagging are not so hard (I’ve done a simple one in Arabic), and a lot of libraries out there do NLP. However, the issue is the speed of this kind of processing, which will prevent us from getting content in real time.
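
To give a rough idea of what topic tagging and filtering could look like, here is a minimal sketch in Python. It is not the prototype itself: it swaps real NLP for plain keyword matching, and the names `TOPIC_KEYWORDS`, `tag_topics`, and `filter_stream`, along with the keyword lists, are made up for illustration.

```python
import re

# Hypothetical topic lexicon: each topic maps to keywords that signal it.
# A real system would learn or curate these; this is only an illustration.
TOPIC_KEYWORDS = {
    "politics": {"election", "parliament", "president", "vote"},
    "sport": {"football", "goal", "match", "league"},
    "music": {"album", "concert", "song", "bieber"},
}


def tag_topics(text):
    """Return the set of topics whose keywords appear in the text."""
    words = set(re.findall(r"\w+", text.lower()))
    return {
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if words & keywords
    }


def filter_stream(posts, followed_topics):
    """Keep only the posts tagged with at least one followed topic."""
    return [post for post in posts if tag_topics(post) & followed_topics]


posts = [
    "Great goal in last night's match!",
    "The president called for an early election.",
    "New Bieber album is out.",
    "Just had coffee.",
]

# A user who follows only sport and politics sees the first two posts.
visible = filter_stream(posts, {"sport", "politics"})
```

Keyword lookup like this is fast but crude. Proper tagging (stemming, entity recognition, handling a language like Arabic) is where the processing cost mentioned above starts to hurt.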

Lately, you’ll notice Facebook trying to get you to like a page by matching the page’s posts to your interests. Also, YouTube has rolled out auto-generated topics, which collect videos about a topic (person, company, project, etc.) and place them in a channel you can subscribe to.

Categorization, or classification, is the core of human intelligence. And these classifications determine (or at least highly influence) the way we interpret the world around us; topic categories enable us to simplify incoming information and to reason about it. It’s time to figure out what on the web matters to us.