“Breaking the Barrier: Resolving the 10K Log Insights Limit in CloudWatch and Consolidating Records into a CSV”

Aishwarya Deep Rastogi
3 min read · Jun 23, 2023

Introduction: CloudWatch has become a go-to choice for application logging. With its powerful Log Insights query feature, CloudWatch enables developers to extract valuable insights from log data generated by compute resources such as AWS EC2, ECS, and Lambda. One common frustration, however, arises when dealing with large-scale application logs: the 10,000-record Log Insights query result limit. This limit can hinder log analysis and data export, making both a challenging task. But fear not! In this blog post, we will walk through a simple Python script that helps developers overcome this hurdle by exporting Log Insights query results into a convenient CSV file.

The Challenge of the 10K Log Insights Query Result Limit: When working with CloudWatch, developers often hit the Log Insights limit of 10,000 records per query, which can be a major roadblock when analyzing large volumes of log data. However, by breaking the query’s time frame into smaller intervals and querying each interval separately, we can work around this limit and retrieve all the required log data.
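The core trick can be sketched as a small helper that splits a time range into fixed-size windows (a minimal illustration with made-up times, not the full script):

```python
from datetime import datetime, timedelta

def split_range(start, end, interval):
    """Yield (period_start, period_end) pairs covering [start, end)."""
    period_start = start
    while period_start < end:
        # The last window is clipped so it never overshoots the end time.
        period_end = min(period_start + interval, end)
        yield period_start, period_end
        period_start = period_end

# A 4-hour window split into 90-minute chunks yields three periods,
# the last one shorter than the interval.
periods = list(split_range(datetime(2023, 6, 22, 10, 45),
                           datetime(2023, 6, 22, 14, 45),
                           timedelta(minutes=90)))
```

Each sub-range is then queried on its own, so no single query has to return more than the 10K cap.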

Introducing a Python Solution: To simplify the process of exporting log insights query results into a CSV file, I have developed a handy Python script. This script enables developers to overcome the 10K result limit by splitting the query into smaller time intervals, thus ensuring all log records are captured.

How the Python Script Works: The script uses the AWS SDK for Python (Boto3) to run the Log Insights query against CloudWatch. Given a time range and query parameters, it retrieves log data in chunks, ensuring each chunk stays within the 10K result limit. The retrieved log records are then exported into a CSV file, a convenient and organized format for further analysis.

import csv
import time
import traceback
from datetime import datetime, timedelta

import boto3


def process_results(starttime, endtime):
    try:
        cw_client = boto3.client('logs')

        # Interval into which the overall time range is divided; pick it small
        # enough that each sub-query returns fewer than 10,000 records.
        interval = timedelta(minutes=20)

        count = 0
        with open('query_results.csv', mode='a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(['name'])
            period_start = starttime
            while period_start < endtime:
                period_end = min(period_start + interval, endtime)

                logs = get_logs_for_query(cw_client, period_start, period_end)
                for log in logs['results']:
                    writer.writerow([log[0]['value']])
                    count += 1
                period_start = period_end
        print(count)

    except Exception as e:
        print("Error: " + str(e))
        traceback.print_exc()


def get_logs_for_query(logs_client, start_time, end_time):
    query = ('fields @timestamp, @message, @logStream, @log '
             '| filter @message like "Hey there" '
             '| parse @message "*Hey there*" as @a1, @name '
             '| display @name')
    response = logs_client.start_query(
        logGroupName='/aws/lambda/<Lambda-Function-Name>',
        startTime=int(start_time.timestamp()),
        endTime=int(end_time.timestamp()),
        queryString=query,
        limit=10000,
    )
    query_id = response['queryId']

    # Poll until the query leaves the Scheduled/Running states.
    final_response = None
    while final_response is None or final_response['status'] in ('Scheduled', 'Running'):
        print('Waiting for query to complete ...')
        time.sleep(1)
        final_response = logs_client.get_query_results(queryId=query_id)

    return final_response


def main():
    # Start and end time of the Log Insights query that you want to divide
    # into smaller intervals.
    starttime = datetime.fromisoformat('2023-06-22T10:45:00')
    endtime = datetime.fromisoformat('2023-06-22T14:45:00')

    process_results(starttime, endtime)


if __name__ == '__main__':
    main()

Benefits and Practical Usage: Using this Python script offers several benefits. Firstly, it simplifies the process of exporting log insights query results, saving developers valuable time and effort. Secondly, it eliminates the frustration caused by the 10K result limit, enabling comprehensive log analysis even for large-scale applications. Lastly, the resulting CSV file provides a structured representation of the log data, facilitating easy filtering, sorting, and visualization.
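To illustrate that last point, here is a tiny sketch of filtering the exported CSV with nothing but the standard library (the sample rows are hypothetical stand-ins for real query output):

```python
import csv
from collections import Counter

# Hypothetical sample rows standing in for the exported query_results.csv.
with open('query_results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name'])
    writer.writerows([['alice'], ['bob'], ['alice']])

# Count the most frequent values in the 'name' column.
with open('query_results.csv', newline='') as f:
    names = [row['name'] for row in csv.DictReader(f)]
print(Counter(names).most_common(2))  # [('alice', 2), ('bob', 1)]
```

The same file drops straight into spreadsheet tools or pandas for richer sorting and visualization.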

Conclusion: CloudWatch’s Log Insights query feature is undoubtedly a powerful tool for extracting insights from application logs. However, the 10K query result limit can pose challenges for developers dealing with substantial log volumes. With the Python script presented in this blog post, developers now have an efficient solution at their disposal. By breaking queries down into smaller intervals, this script enables seamless log data export to CSV, empowering developers to gain valuable insights and streamline their log analysis workflow. Happy coding!

Author Bio:

Aishwarya Deep Rastogi is an SDE-2 at Amazon with expertise in large-scale software design. With 2 years of experience in software engineering, he has worked with large-scale distributed systems teams such as AWS Redshift and Amazon Ads. Aishwarya is passionate about designing and building robust, scalable, and highly available systems. Outside of work, he enjoys playing tennis and listening to music.

Connect with Aishwarya Deep Rastogi on LinkedIn: https://www.linkedin.com/in/aish07/
