Telegram Group/Channel Data Extraction (Users' Information, Chats, and Specific Messages) and Data Processing

Telegram data extraction using the Telethon package, and data processing to get a dataframe of the required information.

Dayal Chand Aichara
Game of Data
Mar 19, 2019

Telegram is a cloud-based instant messaging and voice over IP service developed by Telegram Messenger LLP, a privately held company registered in London, United Kingdom.

Most blockchain and cryptocurrency companies use Telegram to communicate with their customers and supporters worldwide because of its unique features.

Data from Telegram groups, such as users' information and the chats of specific channels, is analyzed to gain insight into a channel or to collect information such as airdrop participants.

In this tutorial, I will explain step by step how to get users' information, chats, and messages that contain a keyword.

Step 1. Prerequisites.

First of all, you need a Telegram account in order to extract information. Follow the steps below to get your credentials (api_id and api_hash).

  1. Register on Telegram with your mobile number.
  2. Create a Telegram app here.
  3. Note down the app's api_id and api_hash.
  4. Join the Telegram group whose information you need. You can join via the group's share link (example link: https://t.me/c0ban_global).
Fig 1. Telegram App

Step 2. Telethon installation.

Now install the Telethon Python package on your system with the terminal command pip install telethon.

Step 3. Telegram client creation.

Create a Telegram client with the api_id and api_hash noted in Step 1.
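A minimal sketch of this step follows. The environment variable names TG_API_ID and TG_API_HASH, the session name, and the helper names are assumptions made for illustration; only TelegramClient and its start method come from Telethon. It uses telethon.sync so that the client can be called without async/await.

```python
import os


def load_credentials(env=os.environ):
    """Read the credentials from Step 1 out of an environment-style mapping.

    TG_API_ID and TG_API_HASH are hypothetical variable names chosen for
    this sketch; any storage works as long as the values stay out of your
    source code.
    """
    return int(env["TG_API_ID"]), env["TG_API_HASH"]


def make_client(api_id, api_hash, session_name="telegram_session"):
    """Build a Telethon client. No login happens until start() is called."""
    # Imported here so load_credentials works without Telethon installed.
    # telethon.sync lets the client methods be called without async/await.
    from telethon.sync import TelegramClient

    return TelegramClient(session_name, api_id, api_hash)


def connect():
    """Log in and return a ready-to-use client."""
    api_id, api_hash = load_credentials()
    client = make_client(api_id, api_hash)
    client.start()  # asks for phone number and login code on the first run
    return client
```

Calling connect() once creates a session file named after session_name, so later runs should not need the login code again.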

Step 4. Getting User's Information.

Extract users' information as a list using client.get_participants. Each entry in the participants list has the format below.

User(id=357343635, is_self=False, contact=False, mutual_contact=False, deleted=False, bot=False, bot_chat_history=False, bot_nochats=False, verified=False, restricted=False, min=False, bot_inline_geo=False, access_hash=-7182373398681298465, first_name='DC', last_name='Aichara', username='dcaichara', phone=None, photo=UserProfilePhoto(photo_id=1534779226214999983, photo_small=FileLocation(dc_id=2, volume_id=238230043, local_id=76130, secret=-6996523002233662137, file_reference=b'\x00\\\x90s\xa46\x89\xa0/Z0\xc0K]\x8a!:\x15\x8f\x07\x90'), photo_big=FileLocation(dc_id=2, volume_id=238230043, local_id=76132, secret=-2346907879263725163, file_reference=b'\x00\\\x90s\xa4w\xc0\xa5K"l+\xe0\x94\x9b\xb4\xa9\xf2\x07\xbe\xe8')), status=UserStatusOffline(was_online=datetime.datetime(2019, 3, 19, 0, 42, 32, tzinfo=datetime.timezone.utc)), bot_info_version=None, restriction_reason=None, bot_inline_placeholder=None, lang_code=None)

You can extract the required information from the participants list and process it further to get a dataframe of the intended information.
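One way to do this is sketched below. The helper names and the choice of columns are my assumptions; the attribute names are the ones visible in the User record above, and client.get_participants is the Telethon call named in this step. pandas is assumed to be installed for the dataframe part.

```python
from types import SimpleNamespace


def participants_to_rows(participants):
    """Keep a handful of fields per user as plain dictionaries."""
    rows = []
    for user in participants:
        rows.append({
            "id": user.id,
            "first_name": user.first_name,
            "last_name": user.last_name,
            "username": user.username,
            "phone": user.phone,
            "bot": user.bot,
        })
    return rows


def fetch_participants_frame(client, group):
    """Download the member list of `group` and return it as a dataframe."""
    import pandas as pd  # imported here so the helper above works without pandas

    participants = client.get_participants(group)
    return pd.DataFrame(participants_to_rows(participants))


# Offline check with a stand-in object that carries values from the sample
# record above; a real run would pass the list from client.get_participants.
sample_user = SimpleNamespace(
    id=357343635, first_name="DC", last_name="Aichara",
    username="dcaichara", phone=None, bot=False,
)
rows = participants_to_rows([sample_user])
```

With a logged-in client, fetch_participants_frame(client, "c0ban_global") would return one row per member of the example group from Step 1.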

Step 5. Getting Chats.

We use client.get_messages to get the chat history. Each message in the chat history has the format below.

Message(id=105406, to_id=PeerChannel(channel_id=1050637540), date=datetime.datetime(2019, 3, 18, 21, 46, 40, tzinfo=datetime.timezone.utc), message='Hello everyone ?', out=False, mentioned=False, media_unread=False, silent=False, post=False, from_scheduled=False, from_id=667966548, fwd_from=None, via_bot_id=None, reply_to_msg_id=None, media=None, reply_markup=None, entities=[], views=None, edit_date=None, post_author=None, grouped_id=None)
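The chat history can be flattened in the same way. This is a sketch under assumptions: the helper names, the column names, and the default limit of 100 messages are mine, while client.get_messages and the Message attributes come from the record above.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def messages_to_rows(messages):
    """Keep the id, timestamp, sender, and text of each message."""
    rows = []
    for msg in messages:
        rows.append({
            "id": msg.id,
            "date": msg.date,
            # An integer in the record above; newer Telethon releases
            # return a peer object here instead.
            "from_id": msg.from_id,
            "text": msg.message,
        })
    return rows


def fetch_chat_frame(client, group, limit=100):
    """Download the latest `limit` messages of `group` as a dataframe."""
    import pandas as pd  # imported here so the helper above works without pandas

    messages = client.get_messages(group, limit=limit)
    return pd.DataFrame(messages_to_rows(messages))


# Offline check with a stand-in for the sample message shown above.
sample_message = SimpleNamespace(
    id=105406,
    date=datetime(2019, 3, 18, 21, 46, 40, tzinfo=timezone.utc),
    from_id=667966548,
    message="Hello everyone ?",
)
chat_rows = messages_to_rows([sample_message])
```

Passing limit=None to client.get_messages asks for the whole history, which can take a long time on a busy group.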

Step 6. Extracting Specific messages.

To extract messages that contain a specific keyword, use client.iter_messages. Below is an example message for the keyword c0ban from the chat history of the Telegram channel c0ban Global Community.

Message(id=29, to_id=PeerChannel(channel_id=1272398905), date=datetime.datetime(2018, 11, 16, 8, 17, 9, tzinfo=datetime.timezone.utc), message='Welcome to c0ban global community 👋', out=False, mentioned=False, media_unread=False, silent=False, post=False, from_scheduled=False, from_id=656819292, fwd_from=None, via_bot_id=None, reply_to_msg_id=None, media=None, reply_markup=None, entities=[], views=None, edit_date=None, post_author=None, grouped_id=None)
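A sketch of the keyword search follows. search_messages relies on the search argument of client.iter_messages, which asks Telegram to do the matching. filter_by_keyword is an extra helper of my own, not part of Telethon, for filtering messages that are already downloaded. Both function names are assumptions.

```python
from types import SimpleNamespace


def search_messages(client, group, keyword, limit=None):
    """Ask Telegram to search `group` for `keyword` and collect the matches."""
    return list(client.iter_messages(group, search=keyword, limit=limit))


def filter_by_keyword(messages, keyword):
    """Case-insensitive local filter over already downloaded messages."""
    needle = keyword.lower()
    # Media-only messages may have no text, so skip empty ones.
    return [m for m in messages if m.message and needle in m.message.lower()]


# Offline check with stand-ins for the two sample messages in this article,
# plus a hypothetical message without text.
sample_messages = [
    SimpleNamespace(id=29, message="Welcome to c0ban global community 👋"),
    SimpleNamespace(id=105406, message="Hello everyone ?"),
    SimpleNamespace(id=0, message=None),
]
matches = filter_by_keyword(sample_messages, "C0BAN")
```

The server-side search avoids downloading the full history, while the local filter is handy when the chat dataframe from Step 5 is already in memory.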

I hope you liked this article. The complete code is available on GitHub. Reach out to me on LinkedIn or Twitter if you have any questions.

Reference: https://media.readthedocs.org/pdf/telethon/stable/telethon.pdf

Note: To extract a private channel's data, you must have admin privileges.
