The line break problem when using Server-Sent Events (SSE)
Today, I want to share a significant challenge I recently encountered and the steps I took to overcome it. By walking you through my solution, I’ll demonstrate how you can implement it using Python, making the process clear and actionable.
The problem
Recently, a colleague at Alby brought a challenging issue to my attention. They noticed that the markdown streamed from our backend was occasionally displayed incorrectly on the frontend: line breaks were misplaced, and markdown markers like * or **, and even lists, were not rendering properly. Interestingly, when we checked the database, everything was perfectly formatted. The issue was elusive, occurring sporadically and making it difficult to reproduce.
Given that the database was correct, we suspected the problem lay in the Server-Sent Events (SSE) stream from our backend. Let’s consider an example of a valid SSE event according to the specification:
"event: eventName\ndata: my-data\n\n"
When consumed by the frontend (JavaScript), it should look like this:
{"event": "eventName", "data": "my-data"}
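To make these parsing rules concrete, here is a minimal sketch of a spec-style SSE parser in Python. It is illustrative only: parse_sse is a hypothetical helper, not the JavaScript parser our frontend uses, and it only recognises \n line endings and the event and data fields.

```python
def parse_sse(stream: str) -> list[dict[str, str]]:
    """Parse a complete SSE stream into a list of event dicts.

    Simplified sketch of the event-stream rules: each line ends with "\n",
    a blank line dispatches the pending event, and only the "event" and
    "data" fields are handled (id, retry and comments are ignored).
    """
    events: list[dict[str, str]] = []
    event_type = ""
    data_lines: list[str] = []
    for line in stream.split("\n"):
        if line == "":
            # Blank line: dispatch the pending event, but only if at least
            # one data field was seen (an empty data buffer dispatches nothing).
            if data_lines:
                event: dict[str, str] = {"data": "\n".join(data_lines)}
                if event_type:
                    event["event"] = event_type
                events.append(event)
            event_type, data_lines = "", []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # Any other field name is ignored.
    return events
```

For the event above, parse_sse("event: eventName\ndata: my-data\n\n") returns a single event equal to {"event": "eventName", "data": "my-data"}. Note that multiple data: lines are joined with "\n"; a newline can only be carried that way, never inside a single data: line.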
Everything appears normal here. Now, imagine our backend needs to stream a chunk that is just a literal newline (\n). The event would look like this:
"event: eventName\ndata: \n\n\n"
The question arises: how will the SSE library parse this? According to the documented logic, we would receive the following events:
{"event": "eventName", "data": ""}
{"data": ""}
Here’s why this happens: \n is the line terminator in SSE, so the newline we meant to send as data instead ends the data: line, leaving its value empty. The next \n is a blank line, which dispatches the event, and the final \n produces a second, empty block. When the frontend parses this, it essentially ignores the second event and appends an empty string to the existing text. Consequently, the line break is lost, which can disrupt the entire markdown rendering.
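The same failure explains the broken lists. The sketch below uses a hypothetical naive serializer (naive_event is our name, not a library function) that interpolates raw markdown into a single data: line; splitting the result on \n shows which lines a consumer actually treats as data.

```python
def naive_event(name: str, data: str) -> str:
    # Interpolates raw text straight into the data field: the bug described above.
    return f"event: {name}\ndata: {data}\n\n"


def data_values(wire: str) -> list[str]:
    """Collect the values of 'data:' lines in one serialized event.

    Lines that don't start with a known field name are ignored by SSE
    consumers, so they never reach the application.
    """
    values = []
    for line in wire.split("\n"):
        if line.startswith("data:"):
            values.append(line[len("data:"):].removeprefix(" "))
    return values


markdown = "- item one\n- item two"
wire = naive_event("eventName", markdown)
print(wire.split("\n"))
# prints ['event: eventName', 'data: - item one', '- item two', '', '']
print(data_values(wire))
# prints ['- item one']
```

The second list item ends up on a line of its own with no data: prefix, so it is silently dropped. With data = "\n", naive_event reproduces the payload above and data_values returns [''], because the newline became a line terminator. A strictly spec-compliant consumer such as the browser's EventSource would not even dispatch the trailing empty block, since no data: field preceded it; either way, the newline never reaches the frontend.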
Solution
After evaluating various approaches, we found the best solution was to serialize the data as a JSON object rather than as a raw string. JSON escapes control characters, so a newline inside the message is encoded as the two characters \ and n and can never terminate the data: line. This approach also gives us more flexibility in the types of data we can return.
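A quick check with the standard library’s json module shows that encoding turns every newline into an escaped \n sequence, so the payload stays on a single line, and that decoding restores the text exactly. The json module stands in for orjson here only so the snippet has no third-party dependencies.

```python
import json

markdown = "# Title\n\n- item one\n- item two\n"

payload = json.dumps({"message": markdown})

# The encoded payload is a single line: no raw newline survives encoding.
assert "\n" not in payload
print(payload)
# prints {"message": "# Title\n\n- item one\n- item two\n"}

# Decoding on the other side restores the original markdown exactly.
assert json.loads(payload)["message"] == markdown
```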
Since we were using FastAPI and Pydantic, we decided to model each event as a Pydantic object and then serialize it to a string. Here’s the implementation, using orjson for faster serialization:
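```python
from typing import Any

import orjson
from pydantic import BaseModel

class SSEEventMessage(BaseModel):
    message: Any

class SSEEvent(BaseModel):
    event: str  # We actually have an enum here, but I will use str
    data: SSEEventMessage

    def serialize(self) -> str:
        # orjson can't encode Pydantic models directly, so dump to a dict first (Pydantic v2).
        payload = orjson.dumps(self.data.model_dump(), option=orjson.OPT_SERIALIZE_NUMPY)
        # JSON keeps the payload on one "data:" line; the trailing blank line ends the event.
        return f"event: {self.event}\ndata: {payload.decode()}\n\n"
```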
By structuring the events this way, we could stream any content without worrying about its formatting, as long as it is serializable. We chose orjson for its speed and because it natively handles types like datetime and UUID, as well as numpy arrays when the OPT_SERIALIZE_NUMPY option is passed.
Next, we updated our logic to use this new structure. Here’s an example of how to implement it:
```python
from collections.abc import AsyncIterator

from fastapi import APIRouter
from starlette.responses import StreamingResponse

api_router = APIRouter()

@api_router.get("")
async def stream():
    return StreamingResponse(my_streaming_logic(), media_type="text/event-stream")

async def my_streaming_logic() -> AsyncIterator[str]:
    # SSEEvent and SSEEventMessage are the models defined above.
    for i in range(10):
        event = SSEEvent(event="my_event", data=SSEEventMessage(message=i))
        yield event.serialize()  # StreamingResponse encodes str chunks to UTF-8
```
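To check the round trip without a browser, here is a stdlib-only sketch of both sides. It assumes the event format described in this post (an event: line, a single JSON-encoded data: line, then a blank line) and uses json instead of orjson; fake_stream and read_messages are hypothetical names standing in for the endpoint and for the frontend’s EventSource handler.

```python
import asyncio
import json
from collections.abc import AsyncIterator


async def fake_stream(chunks: list[str]) -> AsyncIterator[str]:
    # Hypothetical stand-in for my_streaming_logic(): one JSON-encoded
    # event per markdown chunk, each terminated by a blank line.
    for chunk in chunks:
        data = json.dumps({"message": chunk})
        yield f"event: my_event\ndata: {data}\n\n"


async def read_messages(stream: AsyncIterator[str]) -> str:
    # Hypothetical consumer: rebuild the markdown from the "data:" lines.
    buffer = ""
    text = ""
    async for piece in stream:
        buffer += piece
        # A blank line ("\n\n") marks the end of a complete event; the JSON
        # payload itself can never contain a raw newline.
        while "\n\n" in buffer:
            block, buffer = buffer.split("\n\n", 1)
            for line in block.split("\n"):
                if line.startswith("data: "):
                    text += json.loads(line[len("data: "):])["message"]
    return text


chunks = ["# Title\n", "\n", "- item one\n", "- item two\n"]
result = asyncio.run(read_messages(fake_stream(chunks)))
assert result == "".join(chunks)
```

Even the chunk that is nothing but "\n" arrives intact, which is exactly the case that broke with raw strings.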
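Job done! You now have a robust way to stream markdown, or any other serializable data, to your frontend safely and efficiently. On the frontend, remember that each event’s data is now a JSON string, so parse it (for example with JSON.parse(event.data).message) before appending it to the rendered text.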