How to Get Data From Telegram Using Python

A Python tutorial on getting Telegram channel messages and member lists

Amir Yousefi
Aug 14 · 6 min read

For research purposes, and to analyze the content of a Telegram channel, you may need the channel’s data in a clean JSON format.

I created a Python script to get data from Telegram channels. It has two main files: one gets the member data of a channel, and the other gets the channel’s messages.

This script saves this data into JSON files; you can use them for analysis or to import into your databases.

Requirements

You need Python 3 installed. Also, I used telethon, a Python package to work with Telegram.

To install telethon you need to use a pip command:

pip3 install telethon

You can read Telethon’s documentation to learn about the package’s full functionality.

Get your Telegram API credentials

To connect to Telegram, we need an api_id and an api_hash. To get these parameters, you need to log in to your Telegram core and go to the API development tools area. There is a form that you need to fill out, and after that, you receive your api_id and api_hash.

Here’s Telegram’s help documentation about how to get your API credentials.

Create a Telegram client in your Python Script

This part is pretty much the same for both getting channel members and channel messages. First, we need basic imports:

I used configparser to read API credentials from a config file and json to dump data into JSON-formatted files.

We import what we need from Telethon to create a Telegram client in our script.

As you may know, it’s not secure to store your Telegram API credentials in your source code. If you put credentials of any kind directly into your source code, you risk your own security, and you also mislead anyone who reuses that code into doing the same.

So to avoid security issues, we put our API credentials in another file called config.ini. It has a simple structure like this:

Now, to create a Telegram client in our Python script, first, we read these credentials in our code:
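The config file and the reading step might look like the following minimal sketch. The section name Telegram and the keys api_id, api_hash, phone, and username are assumptions chosen for illustration, and the values are placeholders; use whatever names your own config.ini contains. The sample file is embedded as a string only to keep the snippet self-contained.

```python
import configparser

# Hypothetical contents of config.ini; section and key names are assumptions,
# and the values are placeholders rather than real credentials.
SAMPLE_CONFIG = """
[Telegram]
api_id = 1234567
api_hash = 0123456789abcdef0123456789abcdef
phone = +15550100
username = my_session
"""


def load_credentials(config):
    """Return the Telegram credentials stored in a parsed config."""
    section = config["Telegram"]
    return {
        "api_id": section.getint("api_id"),  # api_id is numeric
        "api_hash": section["api_hash"],
        "phone": section["phone"],
        "username": section["username"],
    }


config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)  # in a real script: config.read("config.ini")
credentials = load_credentials(config)
```

Keeping config.ini out of version control (for example, by listing it in .gitignore) is what actually keeps the credentials out of shared code.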

Now that we have everything we need, we attempt to log in to Telegram and create a client object to use for getting data:
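A minimal sketch of this login step, assuming the explicit flow of checking authorization, requesting a code, and falling back to the two-step password. The helper name ensure_logged_in and its ask parameter are hypothetical; connect, is_user_authorized, send_code_request, and sign_in are Telethon client methods. Calling client.start() would handle the same prompts for you.

```python
async def ensure_logged_in(client, phone, ask=input):
    """Log in interactively unless the saved session is already authorized."""
    await client.connect()
    if await client.is_user_authorized():
        return

    # Imported here so the function can be exercised with a stand-in client
    # on a machine where Telethon is not installed.
    from telethon.errors import SessionPasswordNeededError

    await client.send_code_request(phone)
    try:
        await client.sign_in(phone, ask("Enter the code: "))
    except SessionPasswordNeededError:
        # Raised when two-step verification is enabled on the account.
        await client.sign_in(password=ask("Password: "))


# Usage with real credentials (not run here; the variable names are assumptions):
#   from telethon import TelegramClient
#   client = TelegramClient(username, api_id, api_hash)
#   client.loop.run_until_complete(ensure_logged_in(client, phone))
```

The first argument to TelegramClient is a session name; Telethon stores the authorized session in a file under that name, so later runs usually skip the code prompt.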

Telegram authorizes your credentials, and then requests a verification code and, if you have set one for your account, a password. This is exactly as if you were logging in to your Telegram account on the app or online.

Be aware that when this script runs, it has access to your Telegram account. Make sure that you run the script in a secure environment.

We now have a client object ready, and we can use it to connect and talk to Telegram.

Getting channel members

We will do this in two steps. First, we get the data of all channel members from Telegram, and then we save this data into a JSON file.

Before these steps, remember to add three more imports to your script head:

from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch
from telethon.tl.types import (
PeerChannel
)

Request for channel members from Telegram

First of all, we ask the user for a Telegram channel. You may give the script a channel’s URL, or the channel’s unique ID.

So, we get user input and convert it to a Telegram channel:

If the user gives us a channel ID, we can convert it to a PeerChannel object. And if the user gives us a Telegram channel URL (like https://t.me/channel) we can use that directly.
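The conversion described above might be sketched as follows. The function names parse_channel_input and resolve_channel are hypothetical; PeerChannel and client.get_entity come from Telethon. The sketch assumes a purely numeric input is a channel ID and treats anything else as a URL or username.

```python
def parse_channel_input(text):
    """Return an int for a numeric channel ID, otherwise the text unchanged."""
    text = text.strip()
    if text.isdigit():
        return int(text)
    return text


async def resolve_channel(client, text):
    """Turn user input into a channel entity using a Telethon client."""
    target = parse_channel_input(text)
    if isinstance(target, int):
        # Imported here so the URL path works with a stand-in client
        # even where Telethon is not installed.
        from telethon.tl.types import PeerChannel
        target = PeerChannel(target)  # bare IDs are wrapped in PeerChannel
    return await client.get_entity(target)


# Usage inside an async function (not run here):
#   my_channel = await resolve_channel(client, input("Channel URL or ID: "))
```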

The next step is to get the channel members. First, you need to know that Telegram does not return all of the requested data at once but delivers it in batches. We can get up to 100 members in each request.

We set a limit of 100, start from offset 0, and create a list to hold the channel members. Inside an infinite loop, we create a GetParticipantsRequest object that searches the channel’s member list for an empty string, which matches every user. As I mentioned, each request returns at most 100 members. After each request, we check whether the participants object contains any users. If it contains none, we have received all users, so we break out of the loop. Otherwise, we add the new members to the list of all members and increase the offset by the number of members received, so the next request asks for users starting from this offset.

This loop continues until it gets all members of the channel.
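The loop can be sketched in two parts: a generic batching loop, and a small adapter that issues the Telethon request. The names collect_in_batches and telethon_participant_fetcher are hypothetical; GetParticipantsRequest and ChannelParticipantsSearch are the Telethon classes imported earlier. This is one possible arrangement, not necessarily how the original script is laid out.

```python
async def collect_in_batches(fetch_batch, limit=100):
    """Call fetch_batch(offset, limit) until it returns an empty batch."""
    offset = 0
    collected = []
    while True:
        batch = await fetch_batch(offset, limit)
        if not batch:
            break  # an empty batch means every member has been received
        collected.extend(batch)
        offset += len(batch)  # the next request starts after what we have
    return collected


def telethon_participant_fetcher(client, channel):
    """Build a fetch_batch callable backed by GetParticipantsRequest."""
    # Imported here so the generic loop above runs without Telethon installed.
    from telethon.tl.functions.channels import GetParticipantsRequest
    from telethon.tl.types import ChannelParticipantsSearch

    async def fetch_batch(offset, limit):
        participants = await client(GetParticipantsRequest(
            channel=channel,
            filter=ChannelParticipantsSearch(""),  # empty search matches all
            offset=offset,
            limit=limit,
            hash=0,
        ))
        return participants.users

    return fetch_batch


# Usage inside an async function (not run here):
#   fetch = telethon_participant_fetcher(client, my_channel)
#   all_participants = await collect_in_batches(fetch)
```

Separating the loop from the request makes the stopping condition easy to check with a fake fetch_batch before pointing it at a real channel.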

Store data in JSON file

This is the easy part. Although you can save the data into any database, such as MySQL, MongoDB, etc., the easiest way is to store the data in a JSON file. However, if you have a lot of data, it’s better to consider storing it in a database.

You can store the whole object of a member in the JSON file, but I prefer to store just what I need instead. So, I created a list to add member data to, and then wrote a JSON dump of this list into a file.

Simple and easy: I created a dictionary of each member’s data and appended it to the list. After that, I wrote the JSON dump into the file.
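A minimal sketch of this step, assuming you want only a handful of fields. The attribute names id, first_name, last_name, username, and bot exist on Telethon user objects; the function names, the selection of fields, and the output path are illustrative choices. A SimpleNamespace stands in for a real user so the snippet runs on its own.

```python
import json
from types import SimpleNamespace


def member_to_dict(user):
    """Keep only the fields we care about from a user object."""
    return {
        "id": user.id,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "username": user.username,
        "is_bot": user.bot,
    }


def save_members(users, path):
    """Write the selected member fields to a JSON file and return them."""
    details = [member_to_dict(user) for user in users]
    with open(path, "w", encoding="utf-8") as outfile:
        # ensure_ascii=False keeps non-Latin names readable in the file
        json.dump(details, outfile, ensure_ascii=False, indent=2)
    return details


# Stand-in for a Telethon user object, for illustration only.
sample_user = SimpleNamespace(
    id=1, first_name="Ada", last_name=None, username="ada", bot=False
)

# Usage (hypothetical file name):
#   save_members(all_participants, "user_data.json")
```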

Here is the full code to get members of a Telegram channel:


Getting channel messages

Before starting this step you need to add these imports to your script’s head:

from telethon.tl.functions.messages import (GetHistoryRequest)
from telethon.tl.types import (
PeerChannel
)

After you edit the imports, creating a Telegram client in your Python code is exactly the same as in the previous section. Getting a channel ID or URL from the user is also the same. So, I assume you have a Telegram client ready and you’ve created a channel object, which I call my_channel:

Sending a GetHistoryRequest object to the Telegram client returns a history object with the list of messages. Again, we have a limit of 100 messages for each request, so we repeat this request inside an infinite loop. After each request, we check whether the history object contains any messages. If it doesn’t, we have reached the end of the messages in the channel, so we can break out of the loop.

I also added a total_count_limit variable. You may not want all messages, or getting all messages may take too much time, so you can set how many messages you want to get from the channel. If you set this to 0, the script will get all messages from the channel.

Setting the offset is a little tricky this time. GetHistoryRequest receives an offset_id, which tells it which message to start the history from. You need to set the offset to the ID of the last message every time you receive a message list:

offset_id = messages[-1].id

To save messages as JSON data you need to convert the message object to a dictionary. You can use a to_dict function to get the message object in a dictionary format:

for message in messages:
    all_messages.append(message.to_dict())

The last two lines of the loop check whether total_count_limit is set higher than 0 and whether the total number of messages received has reached that limit. If both conditions are true, the script breaks out of the loop.
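Put together, the message loop might look like the sketch below. The names collect_messages and telethon_history_fetcher are hypothetical; GetHistoryRequest and its parameters are Telethon’s. Because messages arrive in whole batches, the result can overshoot total_count_limit by up to one batch.

```python
async def collect_messages(fetch_history, limit=100, total_count_limit=0):
    """Page through a channel's history, newest messages first."""
    offset_id = 0  # 0 asks for the newest messages
    all_messages = []
    while True:
        messages = await fetch_history(offset_id, limit)
        if not messages:
            break  # no more history
        all_messages.extend(message.to_dict() for message in messages)
        offset_id = messages[-1].id  # continue from the oldest one received
        # A limit of 0 means "no limit"; otherwise stop once we have enough.
        if total_count_limit != 0 and len(all_messages) >= total_count_limit:
            break
    return all_messages


def telethon_history_fetcher(client, channel):
    """Build a fetch_history callable backed by GetHistoryRequest."""
    # Imported here so the generic loop above runs without Telethon installed.
    from telethon.tl.functions.messages import GetHistoryRequest

    async def fetch_history(offset_id, limit):
        history = await client(GetHistoryRequest(
            peer=channel,
            offset_id=offset_id,
            offset_date=None,
            add_offset=0,
            limit=limit,
            max_id=0,
            min_id=0,
            hash=0,
        ))
        return history.messages

    return fetch_history


# Usage inside an async function (not run here):
#   fetch = telethon_history_fetcher(client, my_channel)
#   all_messages = await collect_messages(fetch, total_count_limit=500)
```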

Now that you have all of the message data, you can store this list in a JSON file. It is as easy as I explained in the previous section.
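One detail to watch: message dictionaries typically contain datetime values (and sometimes raw bytes), which the json module cannot serialize by default. A small encoder subclass is one way to handle this. The class name DateTimeEncoder, the choice of output formats, and the file name in the usage comment are assumptions in this sketch.

```python
import json
from datetime import datetime, timezone


class DateTimeEncoder(json.JSONEncoder):
    """JSON encoder that also accepts datetime and bytes values."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()  # e.g. "2020-01-02T12:00:00+00:00"
        if isinstance(o, bytes):
            return list(o)  # store raw bytes as a list of integers
        return super().default(o)


# A hand-made dictionary standing in for message.to_dict() output.
sample_messages = [
    {
        "id": 1,
        "date": datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc),
        "raw": b"\x01\x02",
    }
]
encoded = json.dumps(sample_messages, cls=DateTimeEncoder)

# Usage (hypothetical file name):
#   with open("channel_messages.json", "w") as outfile:
#       json.dump(all_messages, outfile, cls=DateTimeEncoder)
```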

Here you can see the complete code:


I shared the full repository of this script. You can browse all of the code, fork the repository, and change it as you like. Also, if you find any improvements to my source code, I’ll be happy to accept pull requests.

Better Programming

Advice for programmers.

