Optimization pt 5

When to stop at Near Enough

Peter Ward
Nerd For Tech
Jun 7, 2022


GPXmagic reads and writes GPX files, which are an XML representation of a route made of longitude & latitude pairs, usually with altitudes.

Many athletes plan & track activities with GPX files

Obviously, I use an XML parser and process the syntax tree to access the data, right? No. I never bothered with that. When I started, I reached for regular expressions to get going quickly. (The escaping backslashes make them look worse than they are.)
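GPXmagic's own patterns aren't reproduced here. As a rough Python illustration of the idea, a single pattern can pull latitude, longitude and an optional altitude out of each track point. The names `TRKPT` and `track_points` are hypothetical, and the pattern assumes `lat` is written before `lon` and that numbers are plain decimals. Both are common in GPX files but not guaranteed.

```python
import re

# Hypothetical pattern for illustration only.
# Assumes lat="..." precedes lon="..." and numbers are plain decimals.
NUM = r"[-+]?\d+(?:\.\d+)?"
TRKPT = re.compile(
    rf'<trkpt\s+lat="({NUM})"\s+lon="({NUM})"\s*>'  # the point's coordinates
    rf"(?:\s*<ele>({NUM})</ele>)?"  # optional altitude immediately inside the point
)


def track_points(gpx: str):
    """Return (lat, lon, ele) for every <trkpt>, with ele None when absent."""
    points = []
    for m in TRKPT.finditer(gpx):
        lat, lon, ele = m.groups()
        points.append((float(lat), float(lon), float(ele) if ele is not None else None))
    return points
```

Note that this yields one flat list of points. Any `<trkseg>` grouping in the file is simply ignored.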

That worked, and I never felt the need to change it. Until recently. In the XML, the list of <trkpt> tags is wrapped in a <trkseg> tag, so a route can be composed of more than one segment, each with many points. In practice, though, my routes never had more than one.

Proposed support for an upcoming RGT feature changed this. I would need to extract segments, each with an optional <namedSegment> tag.

First, I tried to do this with an outer regular expression that captured each segment in turn. That really didn’t go too well. It “sort of” worked, but sooo slowly. I guess that’s down to the absurdly large captured strings, with the attendant memory thrashing.
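A minimal Python sketch of the "outer regular expression" approach, assuming `<namedSegment>` sits inside its `<trkseg>` (an assumption, not a documented RGT format). Each outer match captures an entire segment body as a new string, and inner patterns then run over that copy. For a route with one huge segment, that capture is most of the file. How slow this gets depends on the regex engine and runtime. This sketch doesn't reproduce GPXmagic's behaviour. The names used here are hypothetical.

```python
import re

# Hypothetical "outer" pattern: one capture per <trkseg>...</trkseg>.
# re.DOTALL lets '.' span newlines; the lazy '.*?' stops at the first closing tag.
SEGMENT = re.compile(r"<trkseg>(.*?)</trkseg>", re.DOTALL)
NAME = re.compile(r"<namedSegment>(.*?)</namedSegment>", re.DOTALL)
TRKPT = re.compile(r'<trkpt\s+lat="([^"]+)"\s+lon="([^"]+)"')


def segments_via_outer_regex(gpx: str):
    """Return (name or None, [(lat, lon), ...]) for each <trkseg>."""
    result = []
    for seg in SEGMENT.finditer(gpx):
        body = seg.group(1)  # a new string holding the entire segment body
        name = NAME.search(body)
        points = [(float(lat), float(lon)) for lat, lon in TRKPT.findall(body)]
        result.append((name.group(1).strip() if name else None, points))
    return result
```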

It seemed time to do “The Right Thing” and use a full-on XML parser. Surely this would be the easiest and most efficient option, both in coding effort and in execution time. You’d hope.
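For comparison, here is a Python sketch of the parser route using the standard library's `xml.etree.ElementTree`. This is not the parser GPXmagic uses. Real GPX files declare a default namespace, so ElementTree reports tags as `{namespace}trkpt`, and the hypothetical helper `local_name` strips that prefix. As before, `<namedSegment>` is assumed to be a direct child of `<trkseg>`.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def segments_via_parser(gpx: str):
    """Return (name or None, [(lat, lon), ...]) for each <trkseg>."""
    root = ET.fromstring(gpx)
    result = []
    for seg in root.iter():  # every element, including the root
        if local_name(seg.tag) != "trkseg":
            continue
        name, points = None, []
        for child in seg:  # direct children of this segment only
            kind = local_name(child.tag)
            if kind == "namedSegment":
                name = (child.text or "").strip() or None
            elif kind == "trkpt":
                points.append((float(child.get("lat")), float(child.get("lon"))))
        result.append((name, points))
    return result
```

The structure comes for free from the tree, which is exactly the appeal. The catch is that the whole document must be parsed before anything useful happens.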

Long story short, at first it seemed totally fine. I could load files up to 84KB. My next largest file, at 93KB, would not load. No errors, nothing. Several minutes passed. Nothing. No way to know what was happening. Any larger file (and I have a 22MB test file) failed in the same way.

I thought it might still be my code building up too many intermediate results, though that would not explain the catastrophic failure mode. I massaged all the code into what I figured was a super-efficient single traversal with minimal data copying. Same result.
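A "single traversal" can be sketched in Python as one walk over the tree in document order. Each element is visited once, and points are appended directly to the segment currently being filled, so no intermediate lists are built. This is a generic illustration with hypothetical names, not GPXmagic's code. It attributes every `<namedSegment>` and `<trkpt>` to the most recently opened `<trkseg>`.

```python
import xml.etree.ElementTree as ET


def segments_single_pass(gpx: str):
    """One walk over the tree in document order, appending as it goes."""
    result = []
    current = None  # [name, points] for the segment being filled
    for el in ET.fromstring(gpx).iter():  # visits every element once
        kind = el.tag.rsplit("}", 1)[-1]  # ignore any namespace prefix
        if kind == "trkseg":
            current = [None, []]
            result.append(current)
        elif current is None:
            continue  # nothing before the first segment is of interest
        elif kind == "namedSegment":
            current[0] = (el.text or "").strip() or None
        elif kind == "trkpt":
            current[1].append((float(el.get("lat")), float(el.get("lon"))))
    return [(name, points) for name, points in result]
```

Tightening the traversal like this only helps if the cost lies in your own code. If the parser itself stalls, as described above, it changes nothing.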

This looked like a show-stopper. I can’t have a ludicrously low file-size limit, and I need to preserve the nesting of track points within segments and retain the segment names.

Then the penny dropped. Regular expression match results include the offset of the match in the file. So, for example, I know that the track point tags are at offsets 652, 723, 781, &c. I can simply do more regex searches, for the segment and segment-name tags, and note where those fall too.
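In Python, the equivalent information comes from `re.finditer`, whose match objects report their position via `Match.start()`. A minimal sketch follows, with the hypothetical function `tag_offsets` and simplified patterns. It runs three independent, flat searches, one per tag of interest, and never captures a whole segment. Each result list comes back in file order.

```python
import re


def tag_offsets(gpx: str):
    """Character offsets of the tags of interest, each list in file order."""
    return {
        # \b stops '<trkseg' from matching a longer tag name
        "trkseg": [m.start() for m in re.finditer(r"<trkseg\b", gpx)],
        "namedSegment": [
            (m.start(), m.group(1).strip())
            for m in re.finditer(r"<namedSegment>(.*?)</namedSegment>", gpx, re.DOTALL)
        ],
        "trkpt": [
            (m.start(), (float(m.group(1)), float(m.group(2))))
            for m in re.finditer(r'<trkpt\s+lat="([^"]+)"\s+lon="([^"]+)"', gpx)
        ],
    }
```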

Then it’s merely a question of comparing the relative positions of each to infer the structure: a track point belongs to whichever segment tag most recently precedes it. Neat, not perfect, a bit of a hack, but it’s fast and stable.
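One way to do that comparison, sketched in Python with the standard `bisect` module: for each name or point offset, binary-search the sorted segment offsets for the last segment that starts before it. The function name and input shapes are hypothetical. The sketch assumes segments don't nest, which holds for `<trkseg>` in GPX, and it drops anything that appears before the first segment.

```python
from bisect import bisect_right


def infer_segments(seg_offsets, named, points):
    """
    seg_offsets: sorted offsets of <trkseg> tags.
    named: (offset, name) pairs for <namedSegment> tags.
    points: (offset, point) pairs for <trkpt> tags.
    Each item is assigned to the last <trkseg> that starts before it.
    """
    segments = [[None, []] for _ in seg_offsets]
    for offset, name in named:
        i = bisect_right(seg_offsets, offset) - 1  # index of preceding segment
        if i >= 0:  # ignore anything that precedes the first segment
            segments[i][0] = name
    for offset, point in points:
        i = bisect_right(seg_offsets, offset) - 1
        if i >= 0:
            segments[i][1].append(point)
    return [(name, pts) for name, pts in segments]


# Offsets as a regex search might report them (made-up values for illustration):
print(infer_segments([100, 500], [(110, "Climb")], [(150, "p1"), (200, "p2"), (520, "p3")]))
# → [('Climb', ['p1', 'p2']), (None, ['p3'])]
```

Each lookup is a binary search, so the whole pass is O(n log s) for n tags and s segments. Because all three lists are already sorted, a single merge-style walk would also work in linear time.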

“Near enough”, my Dad would have said.


I ride with the “mid-week professionals”, write stories and technical stuff, and enjoy the low-stress life of semi-retirement.