Abstract
This article presents an alternative approach to tracking games in various categories, including “New & Trending”, “Top Sellers”, “Global Top Sellers”, “Popular Upcoming”, and “Specials”, using Steam’s search API without any HTML processing. The filter values of the API are explored and listed so that search queries can be customized.
Foreword
In my previous articles, we delved into the world of the Steam API and explored how to create custom game databases and review databases. These techniques are instrumental in gathering valuable data for research projects, such as topic modeling over game reviews. While researching, I stumbled on a Medium article describing how to create a daily database of the different promotional tabs on the front page of Steam, such as “New & Trending”, “Top Sellers”, “Popular Upcoming” and “Specials”, using web scraping and HTML processing.
However, I was convinced that there should be an API-only method that does not require HTML processing, so I did a little research of my own. After some digging, I found a Steam API that supports searching these categories and returns easy-to-read JSON data.
Python is used for this project. The requests package is required, which can be installed through pip or conda.
# pip
pip install requests
# conda
conda install anaconda::requests
Body
The /search/results API is a GET endpoint that can be thought of as an alias of the /search API, which normally directs to Steam’s search page (returning an HTML webpage). The major difference between the two is that /search/results supports returning structured JSON, which is more suitable for data scraping.
According to the unofficial internal API documentation, it returns a list of apps with their name and logos, which are URLs to the images. A sample result of the API is shown in the screenshot below.
The API offers many parameters for customizing a search. They are almost identical to the criteria shown on the right side of Steam’s search page, such as hiding free-to-play items, tags, types (called category in the API), OS, language, result ordering and more.
Initiating a request with the /search/results API is simple in Python. Extra error handling is included for good measure.
import requests

# print_log is the timestamped print helper defined in the full scraper below

params = {
    "filter": "topsellers",
    "hidef2p": 1,
    "page": 1,  # controls which page of results is returned, similar to what "cursor" does when scraping the reviews of a game
    "json": 1
}

def get_search_results(params):
    req_sr = requests.get(
        "https://store.steampowered.com/search/results/",
        params=params)
    # any non-200 response is treated as an empty result
    if req_sr.status_code != 200:
        print_log(f"Failed to get search results: {req_sr.status_code}")
        return {"items": []}
    try:
        search_results = req_sr.json()
    except Exception as e:
        print_log(f"Failed to parse search results: {e}")
        return {"items": []}
    return search_results

search_results = get_search_results(params)
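As a minimal sketch of what to do with the returned data, the snippet below walks through the "items" list and pulls out each name and logo URL. The sample_results payload is a made-up stand-in that only follows the shape described above (an "items" list whose entries carry "name" and "logo"), and the helper name summarize_items is hypothetical.

```python
def summarize_items(search_results):
    """Return (name, logo) pairs from a /search/results JSON payload."""
    pairs = []
    for item in search_results.get("items", []):
        # .get() keeps the loop going if an entry lacks one of the fields
        pairs.append((item.get("name"), item.get("logo")))
    return pairs


# Illustrative payload only; real data comes from get_search_results(params)
sample_results = {
    "items": [
        {"name": "Example Game A", "logo": "https://example.com/steam/apps/10/capsule.jpg"},
        {"name": "Example Game B", "logo": "https://example.com/steam/apps/20/capsule.jpg"},
    ]
}

for name, logo in summarize_items(sample_results):
    print(f"{name}: {logo}")
```

Because get_search_results falls back to {"items": []} on failure, the same loop can be applied to its output without an extra check.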
Finding all the “filters”
The unofficial documentation does not provide a list of options for the “filter” field. Suspecting that the /search/results API works in a similar way to the /search page, I opened the search page from each of the four tabs, “New & Trending”, “Top Sellers”, “Popular Upcoming” and “Specials”, mainly by clicking the buttons next to the “See more” phrase, and noted the query parameters in the resulting URLs.
The resulting filter values are listed in the table below. Note that the “Specials” tab has no filter value of its own; it is triggered using the separate “specials” field of the API.
+---------------------+--------------------------------------------------------------------------+
| “filter” value      | Description                                                              |
+---------------------+--------------------------------------------------------------------------+
| “” (empty)          | The default value of the “filter” field; displays all games on Steam.    |
|                     | Usually combined with “sort_by”=“Released_DESC” to list all games from   |
|                     | the latest release date to the earliest.                                 |
+---------------------+--------------------------------------------------------------------------+
| “popularnew”        | Corresponds to the “New & Trending” tab                                  |
+---------------------+--------------------------------------------------------------------------+
| “topsellers”        | Corresponds to the “Top Sellers” tab                                     |
+---------------------+--------------------------------------------------------------------------+
| “globaltopsellers”  | Corresponds to the “Global Top Sellers” button next to the “See more:”   |
|                     | phrase at the bottom of the “Top Sellers” list                           |
+---------------------+--------------------------------------------------------------------------+
| “popularcomingsoon” | Corresponds to the “Popular Upcoming” tab                                |
+---------------------+--------------------------------------------------------------------------+
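To make the mapping concrete, here is a small sketch that turns each tab into the query string it would send. It only builds URLs and makes no request. The names TAB_PARAMS and build_search_url are hypothetical, and the “Specials” entry follows the note above by leaving “filter” empty and setting the “specials” field.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://store.steampowered.com/search/results/"

# Tab name -> query parameters, taken from the table above.
# "Specials" leaves the filter empty and uses the separate "specials" field.
TAB_PARAMS = {
    "New & Trending": {"filter": "popularnew"},
    "Top Sellers": {"filter": "topsellers"},
    "Global Top Sellers": {"filter": "globaltopsellers"},
    "Popular Upcoming": {"filter": "popularcomingsoon"},
    "Specials": {"filter": "", "specials": 1},
}


def build_search_url(tab, page=1):
    """Build the request URL for one tab and page, asking for JSON output."""
    params = dict(TAB_PARAMS[tab])
    params.update({"page": page, "json": 1})
    return f"{SEARCH_URL}?{urlencode(params)}"


for tab in TAB_PARAMS:
    print(f"{tab}: {build_search_url(tab)}")
```

Pasting one of the printed URLs into a browser is a quick way to check what a filter value returns before wiring it into a scraper.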
The Scraper
With these fields in mind, we can build a basic scraper that searches Steam daily and records the recommended games and their corresponding appid for each filter. We will scrape the top 100 games of the five “categories”: “New & Trending”, “Top Sellers”, “Global Top Sellers”, “Popular Upcoming” and “Specials”. To make the scraped data more meaningful, the details of each game are also retrieved using the /api/appdetails API. Finally, one pickle file is saved for each of the five “categories”, named with the date on which the scraping was performed.
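The scraper recovers each appid from the item's logo URL rather than from a dedicated field. The sketch below isolates that step with the same regular expression the scraper uses. The function name extract_appid and the example URLs are made up for illustration; the URLs only mimic the steam/apps/{appid} and steam/bundles/{appid} path layout.

```python
import re

# Same pattern as the scraper: the path segment after "steam/" can be
# "apps" or "bundles", followed by the numeric id.
APPID_PATTERN = re.compile(r"steam/\w+/(\d+)")


def extract_appid(logo_url):
    """Return the numeric id embedded in a logo URL as a string, or None."""
    match = APPID_PATTERN.search(logo_url or "")
    return match.group(1) if match else None


# Made-up URLs that only mimic the path layout
print(extract_appid("https://example.com/steam/apps/12345/capsule.jpg"))
print(extract_appid("https://example.com/steam/bundles/678/capsule.jpg"))
print(extract_appid("https://example.com/no-id-here.jpg"))
```

Returning None instead of raising mirrors the scraper's behavior, where a failed extraction stores None and the app-details lookup is skipped for that item.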
# Imports and helper functions
from datetime import datetime
import time
import requests
import pickle
from pathlib import Path
import re

def print_log(*args):
    # prefix every message with a timestamp trimmed to milliseconds
    print(f"[{str(datetime.now())[:-3]}] ", end="")
    print(*args)

def get_search_results(params):
    req_sr = requests.get(
        "https://store.steampowered.com/search/results/",
        params=params)
    # any non-200 response is treated as an empty result
    if req_sr.status_code != 200:
        print_log(f"Failed to get search results: {req_sr.status_code}")
        return {"items": []}
    try:
        search_results = req_sr.json()
    except Exception as e:
        print_log(f"Failed to parse search results: {e}")
        return {"items": []}
    return search_results

def get_app_details(appid):
    # nothing to look up if the appid could not be extracted
    if appid is None:
        print_log("App Id is None.")
        return {}
    # retry until the request succeeds or fails with an unexpected status
    while True:
        appdetails_req = requests.get(
            "https://store.steampowered.com/api/appdetails/",
            params={"appids": appid, "cc": "hk", "l": "english"})  # change the country code to your own region
        if appdetails_req.status_code == 200:
            appdetails = appdetails_req.json()
            appdetails = appdetails[str(appid)]
            print_log(f"App Id: {appid} - {appdetails['success']}")
            break
        elif appdetails_req.status_code == 429:
            print_log("Too many requests. Sleep for 10 sec.")
            time.sleep(10)
            continue
        elif appdetails_req.status_code == 403:
            print_log("Forbidden to access. Sleep for 5 min.")
            time.sleep(5 * 60)
            continue
        else:
            print_log("ERROR: status code:", appdetails_req.status_code)
            print_log(f"Error in App Id: {appid}.")
            appdetails = {}
            break
    return appdetails

# Main code
execute_datetime = datetime.now()
search_result_folder_path = Path(f"search_results_{execute_datetime.strftime('%Y%m%d')}")
if not search_result_folder_path.exists():
    search_result_folder_path.mkdir()

# a list of filters
params_list = [
    {"filter": "topsellers"},
    {"filter": "globaltopsellers"},
    {"filter": "popularnew"},
    {"filter": "popularcomingsoon"},
    {"filter": "", "specials": 1}
]
page_list = list(range(1, 5))  # 4 pages x 25 results = top 100

params_sr_default = {
    "filter": "topsellers",
    "hidef2p": 1,
    "page": 1,  # page is used to go through different parts of the ranking. Each page contains 25 results
    "json": 1
}

for update_param in params_list:
    items_all = []
    if update_param["filter"]:
        filename = f"{update_param['filter']}_{execute_datetime.strftime('%Y%m%d')}.pkl"
    else:
        filename = f"specials_{execute_datetime.strftime('%Y%m%d')}.pkl"
    # skip categories that were already scraped today
    if (search_result_folder_path / filename).exists():
        print_log(f"File {filename} exists. Skip.")
        continue
    for page_no in page_list:
        param = params_sr_default.copy()
        param.update(update_param)
        param["page"] = page_no
        search_results = get_search_results(param)
        print_log(search_results)
        if not search_results:
            continue
        items = search_results.get("items", [])
        # preprocess the search results to retrieve the appid of each game
        for item in items:
            try:
                item["appid"] = re.search(r"steam/\w+/(\d+)", item["logo"]).group(1)  # the URL can be steam/bundles/{appid} or steam/apps/{appid}
            except Exception as e:
                print_log(f"Failed to extract appid: {e}")
                item["appid"] = None
        # request game information using the appid
        for item in items:
            appid = item["appid"]
            appdetails = get_app_details(appid)
            item["appdetail"] = appdetails
        items_all.extend(items)
    # save the search results of this category
    with open(search_result_folder_path / filename, "wb") as f:
        pickle.dump(items_all, f)
    print_log(f"Saved {filename}")
The actual file and detailed usage can be found on my GitHub.
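Once a pickle has been saved, it can be read back for analysis. The sketch below is one possible way to do that, assuming the items are stored in the order the API returned them so that list position can stand in for rank. The function name load_ranking and the demo items are hypothetical; the demo writes a temporary file so the snippet runs without scraping first.

```python
import pickle
import tempfile
from pathlib import Path


def load_ranking(pkl_path):
    """Load one saved pickle and return (rank, appid, name) rows."""
    with open(pkl_path, "rb") as f:
        items = pickle.load(f)
    # Items were appended page by page, so list position is used as the rank
    return [(rank, item.get("appid"), item.get("name"))
            for rank, item in enumerate(items, start=1)]


# Round trip with made-up items that follow the scraper's item layout
demo_items = [
    {"name": "Example Game A", "appid": "10", "appdetail": {"success": True}},
    {"name": "Example Game B", "appid": None, "appdetail": {}},
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "topsellers_demo.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump(demo_items, f)
    demo_rows = load_ranking(demo_path)

print(demo_rows)
```

Pointing load_ranking at a real file such as one inside the dated search_results folder gives the day's ranking for that category.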
Conclusion
In this short article, we explored the /search/results API and its potential for creating custom databases to track games in different “categories”. We also delved into the filter values of the API and recorded them for future reference.
This is Part 3 out of 3 of the series of “Scraping Steam: A Comprehensive Guide (2024 ver)”.
Further Reading
The unofficial internal wiki of the /search/results Steam API: Get Search Results