How 100 Smartcards Killed a Server
This is a story that had me banging my head against the wall for a couple of months (OK, not literally, but I wasn’t too far from that). Pain, desperation, awe, and ultimately success. We have been in production for some time and all seems to be working … for now.
But let me briefly re-live the “desperation stage”.
We were writing a proxy for smartcards with a TCP/IP interface. They come in multiples of 100, and because the smartcards sit behind an FPGA, you need “massively” parallel client software to really make use of their CPUs. (The hardware does up to 1.2Mbps on each smartcard, which maxes out at about 140Mbps of raw data with just one lot of smartcards.)
Anyway, we needed a Python proxy (we thought: simple = fast development = Python) that translates APDUs into “SIGN”, “GET CERTS”, … Nice architecture, multiprocessing, scalable. Tests against a handful of smartcards went swimmingly. But when the customer plugged it into just one board with 100 smartcards, the whole proxy started falling over.
We tried loads of things, so let me just mention the last problem. The proxy starts and launches 100 processes (multiprocessing), one for each smartcard. They do initialisation, which takes a few seconds (80–120 commands sent to each smartcard). We have a join() at the end so we can communicate upstream that the hardware is…
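For readers who have not used multiprocessing this way, here is a minimal sketch of the start-then-join pattern described above. It is not the actual proxy code: init_card, start_all, the sleep that stands in for a command round trip, and the choice of start method are all assumptions made for illustration.

```python
import multiprocessing as mp
import time


def init_card(card_id, commands):
    """Hypothetical stand-in for initialising one smartcard."""
    for _ in range(commands):
        time.sleep(0.001)  # placeholder for one command round trip


def start_all(card_count=100, commands=100):
    """Start one process per card, wait for all of them, return exit codes."""
    # Assumption: use "fork" where the platform offers it, else the default.
    methods = mp.get_all_start_methods()
    ctx = mp.get_context("fork" if "fork" in methods else None)
    workers = [
        ctx.Process(target=init_card, args=(card_id, commands))
        for card_id in range(card_count)
    ]
    for w in workers:
        w.start()
    # join() blocks until each worker has exited, so after this loop
    # every card has either finished initialising or failed.
    for w in workers:
        w.join()
    return [w.exitcode for w in workers]


if __name__ == "__main__":
    codes = start_all()
    failed = [i for i, code in enumerate(codes) if code != 0]
    print("hardware ready" if not failed else f"cards failed: {failed}")
```

The point of the join() loop is that the parent only reports upstream once every per-card process has exited; a non-zero exit code would mark a card whose initialisation did not complete.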