How to Get Data From Telegram Using Python
A Python tutorial on getting a Telegram channel’s messages and member list
For research purposes, and to analyze the content of a Telegram channel, you may need the channel’s data in a clean JSON format.
I created a Python script to get data from Telegram channels. It has two main files: one for getting a channel’s member data, and a second for getting the channel’s messages.
This script saves this data into JSON files; you can use them for analysis or to import into your databases.
You need Python 3 installed. I also used telethon, a Python package for working with Telegram. To install telethon, use pip:
pip3 install telethon
You can read Telethon’s documentation to learn about everything this package can do.
Get your Telegram API credentials
To connect to Telegram, we need an api_id and an api_hash. To get these parameters, log in to your Telegram core and go to the API development tools area. Fill out the form there, and you will receive your api_id and api_hash.
Here’s Telegram’s help documentation about how to get your API credentials.
Create a Telegram client in your Python Script
This part is pretty much the same for both getting channel members and channel messages. First, we need some basic imports: configparser to read API credentials from a config file, and json to dump data into JSON-formatted files. We also import what we need from Telethon to create a Telegram client in our script.
As you may know, it’s not secure to store your Telegram API credentials in your source code. Hard-coding credentials of any kind puts your own security at risk, and it sets a bad example for anyone who reuses your code.
So to avoid security issues, we put our API credentials in another file called config.ini. It has a simple structure like this:
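As an illustration, a config.ini along these lines would work. The section name Telegram and the phone and username keys are assumptions made for this sketch, and every value is a placeholder:

```ini
[Telegram]
# No quotes are needed around the values
api_id = 1234567
api_hash = 0123456789abcdef0123456789abcdef
# Full phone number, including + and country code
phone = +10000000000
username = your_username
```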
Now, to create a Telegram client in our Python script, first, we read these credentials in our code:
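A minimal sketch of reading the credentials with configparser could look like this. The helper name read_credentials, the Telegram section name, and the phone and username keys are assumptions for illustration, not part of any library:

```python
import configparser


def read_credentials(path="config.ini"):
    """Load Telegram API credentials from an INI file (assumed layout)."""
    config = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not config.read(path):
        raise FileNotFoundError(f"Could not read config file: {path}")

    section = config["Telegram"]  # assumed section name
    return {
        "api_id": int(section["api_id"]),
        "api_hash": section["api_hash"],
        "phone": section["phone"],        # assumed key
        "username": section["username"],  # assumed key
    }
```

Converting api_id to an integer up front means a malformed value fails here rather than later during login.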
Now that we have everything we need, we attempt to log in to Telegram and create a client object to use for getting data:
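One way to sketch the login step uses Telethon’s TelegramClient and its start method. The function name create_client is hypothetical, and passing the username as the session name is an assumption made for this example:

```python
import asyncio


async def create_client(username, api_id, api_hash, phone):
    """Create a Telethon client and log in interactively (sketch)."""
    # Imported inside the function so Telethon is only required once a
    # client is actually created.
    from telethon import TelegramClient

    # The first argument names the local session file that Telethon keeps.
    client = TelegramClient(username, int(api_id), api_hash)
    # start() connects and, if the session is not authorized yet, prompts
    # for the verification code and for the password when one is set.
    await client.start(phone=phone)
    return client


# Hypothetical usage, with values loaded from config.ini:
#   client = asyncio.run(create_client(username, api_id, api_hash, phone))
```

After the first successful login, Telethon stores the session locally, so later runs normally do not ask for the code again.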
Telegram checks your credentials and then requests a verification code, plus your password if you have set one for your account. This is exactly what happens when you log in to your Telegram account in the app or online.
Be aware that when this script runs, it has access to your Telegram account. Make sure that you run the script in a secure environment.
We have the client object ready now, and we can use it to connect and talk to Telegram.
Getting channel members
We will do this in two steps. First, we get all of the channel’s member data from Telegram, and then we save this data into a JSON file.
Before these steps, remember to add three more imports to your script head:
from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch
from telethon.tl.types import PeerChannel
Request for channel members from Telegram
First of all, we ask the user for a Telegram channel. You may give the script a channel’s URL, or the channel’s unique ID.
So, we get user input and convert it to a Telegram channel:
If the user gives us a channel ID, we can convert it to a PeerChannel object. And if the user gives us a Telegram channel URL (like https://t.me/channel) we can use that directly.
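The conversion described above could be sketched as follows. The helper names parse_channel_input and resolve_channel are hypothetical; PeerChannel and client.get_entity come from Telethon:

```python
def parse_channel_input(user_input):
    """Return an int for a numeric channel ID, otherwise the text itself."""
    text = user_input.strip()
    return int(text) if text.isdigit() else text


async def resolve_channel(client, user_input):
    """Turn a channel ID or URL into a channel entity (sketch)."""
    # Imported here so Telethon is only needed when a channel is resolved.
    from telethon.tl.types import PeerChannel

    parsed = parse_channel_input(user_input)
    # A numeric ID is wrapped in PeerChannel; a URL is passed through as-is.
    entity = PeerChannel(parsed) if isinstance(parsed, int) else parsed
    return await client.get_entity(entity)
```

Keeping the parsing separate from the network call makes the input handling easy to check without a Telegram connection.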
The next step is to get the channel members. Telegram does not return everything you ask for in one response; it returns data in batches. We can get up to 100 members per request.
We set a limit of 100, start from offset 0, and create a list to hold the channel members. Inside an infinite loop, we create a GetParticipantsRequest object that searches the channel’s member list for an empty string, which matches all users. As mentioned, we can only get 100 members per request. After each request, we check whether the participants object has a non-empty users property. If users is empty, we have already received all users, so we break out of the loop. Otherwise, we add the new members to the list of all members and increase the offset by the number of members received, so the next request asks for users starting from this offset.
This loop continues until it gets all members of the channel.
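The batching loop can be sketched in two parts: a generic helper that keeps requesting batches until one comes back empty, and a thin wrapper that plugs in GetParticipantsRequest. The names collect_in_batches and fetch_all_participants are hypothetical, and splitting the loop this way is a choice made for this example:

```python
async def collect_in_batches(fetch_batch, limit=100):
    """Call fetch_batch(offset, limit) until it returns an empty batch."""
    offset = 0
    collected = []
    while True:
        batch = await fetch_batch(offset, limit)
        if not batch:
            break  # nothing left to fetch
        collected.extend(batch)
        offset += len(batch)  # the next request starts after these items
    return collected


async def fetch_all_participants(client, channel, limit=100):
    """Fetch every member of a channel in batches (sketch)."""
    # Imported here so Telethon is only needed when members are fetched.
    from telethon.tl.functions.channels import GetParticipantsRequest
    from telethon.tl.types import ChannelParticipantsSearch

    async def fetch_batch(offset, batch_limit):
        participants = await client(GetParticipantsRequest(
            channel=channel,
            filter=ChannelParticipantsSearch(''),  # empty search matches all
            offset=offset,
            limit=batch_limit,
            hash=0,
        ))
        return participants.users

    return await collect_in_batches(fetch_batch, limit)
```

Because the paging logic does not depend on Telegram, it can be exercised with a fake fetch_batch function before running it against a real channel.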
Store data in JSON file
This is the easy part. Although you can save the data into any database, such as MySQL, MongoDB, etc., the easiest way is to store the data in a JSON file. However, if you have a lot of data, it’s better to consider storing it in a database.
You can store the whole member object in the JSON file, but I prefer to store just what I need. So, I created a list to hold the members’ data and then wrote a JSON dump of this list into a file.
Simple and easy: I create a dictionary for each member’s data and append it to the list. After that, I write the JSON dump to the file.
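As a rough sketch of this step, the helpers below keep a handful of fields per member and dump the list to a file. The function names, the output file name user_data.json, and the choice of fields are assumptions for illustration; the attributes read from each member (id, first_name, last_name, username, phone, bot) are those of Telethon’s User objects:

```python
import json


def member_to_dict(user):
    """Keep only the fields we care about from a member object."""
    return {
        "id": user.id,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "username": user.username,
        "phone": user.phone,
        "is_bot": user.bot,
    }


def save_members(members, path="user_data.json"):
    """Write the selected member fields to a JSON file and return them."""
    details = [member_to_dict(member) for member in members]
    with open(path, "w", encoding="utf-8") as outfile:
        # ensure_ascii=False keeps non-Latin names readable in the file.
        json.dump(details, outfile, ensure_ascii=False, indent=2)
    return details
```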
Here is the full code to get members of a Telegram channel:
Getting channel messages
Before starting this step you need to add these imports to your script’s head:
from telethon.tl.functions.messages import GetHistoryRequest
from telethon.tl.types import PeerChannel
After you edit the imports, creating a Telegram client in your Python code is exactly the same as in the previous section. Getting a channel ID or URL from the user is also the same. So, I assume you have a Telegram client ready and you’ve created a channel object. Passing a GetHistoryRequest object to the Telegram client returns a history object with the list of messages. Again, we are limited to 100 messages per request, so we repeat the request inside an infinite loop. After each request, we check whether the history object’s messages property contains any messages. If it doesn’t, we have reached the end of the channel’s messages and can break out of the loop.
I also added a total_count_limit variable. You may not want all messages, or getting all messages may take too much time, so you can set how many messages you want to get from the channel. If you set this to 0, the script will get all messages from the channel.
Setting the offset is a little tricky this time. GetHistoryRequest takes an offset_id, which tells it which message to start fetching the history from. Each time you receive a list of messages, set the offset to the ID of the last message in that list:
offset_id = messages[-1].id
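To save messages as JSON data, you need to convert each message object to a dictionary. You can use the to_dict function to get the message object in dictionary format:
for message in messages:
    # all_messages is assumed to be the list that collects the results
    all_messages.append(message.to_dict())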
The last two lines of the loop check whether total_count_limit is greater than 0 and whether the number of messages received has reached that limit. If both conditions are true, the loop breaks.
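Putting the pieces of this section together, the message loop might look like the sketch below. The function names fetch_messages and limit_reached and the list name all_messages are hypothetical; the GetHistoryRequest parameters follow Telethon’s raw API:

```python
def limit_reached(total_received, total_count_limit):
    """True when a non-zero limit is set and enough messages have arrived."""
    return total_count_limit != 0 and total_received >= total_count_limit


async def fetch_messages(client, channel, total_count_limit=0, limit=100):
    """Fetch channel messages in batches as dictionaries (sketch)."""
    # Imported here so Telethon is only needed when messages are fetched.
    from telethon.tl.functions.messages import GetHistoryRequest

    offset_id = 0  # 0 means "start from the newest message"
    all_messages = []
    while True:
        history = await client(GetHistoryRequest(
            peer=channel,
            offset_id=offset_id,
            offset_date=None,
            add_offset=0,
            limit=limit,
            max_id=0,
            min_id=0,
            hash=0,
        ))
        if not history.messages:
            break  # reached the end of the channel history
        messages = history.messages
        for message in messages:
            all_messages.append(message.to_dict())
        # Continue from the last (oldest) message of this batch.
        offset_id = messages[-1].id
        if limit_reached(len(all_messages), total_count_limit):
            break
    return all_messages
```

Since messages arrive in batches, the result can hold slightly more than total_count_limit items; slice the list afterwards if an exact count matters.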
Now that you have all the message data, you can store this list in a JSON file. It is as easy as I explained in the previous section.
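One detail to watch for: the dictionaries produced by to_dict may contain values such as datetime objects and bytes, which the json module cannot serialize by default. A minimal sketch of one way to handle this is a small custom encoder. The class name DateTimeEncoder, the function save_messages, and the file name channel_messages.json are assumptions for this example:

```python
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    """JSON encoder that also handles datetime and bytes values."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()  # e.g. "2020-01-02T03:04:05"
        if isinstance(o, bytes):
            return list(o)  # store raw bytes as a list of integers
        return super().default(o)


def save_messages(all_messages, path="channel_messages.json"):
    """Dump the list of message dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(all_messages, outfile, cls=DateTimeEncoder, ensure_ascii=False)
```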
Here you can see the complete code:
I shared the full repository for this script. You can browse all of the code, fork the repository, and change it as you like. Also, if you find any improvements to my source code, I’ll be happy to accept pull requests.