Look before you leap: Reviewing a B2C HealthTech app for security, API and ToS.

The North remembers

After having taken a 30k foot view of the HealthTech sector, I’ve decided to try out an application for myself. Before handing over all my PHI (protected health information) to a 3rd party, I decided to take a deep-dive into one company’s B2C application. This is the story of my journey.

My feeling is that adherence to security best practice and well-reasoned architecture permeate a firm, so if both are solid in the B2C application, my health data is probably safe on the back-end.

TL;DR

Not perfect, but a very reasonable application for the aspects I examined.
Didn’t discover any major security holes, but in fairness, I sent this to the Faceless Men (the company in question) for review before unmasking the servant of the Many-Faced God.

Security

Could have gotten slightly fancy (proxy: mitmproxy, charlesproxy, sniffing: wireshark), but I decided to work the old fashioned way, just a network console and curl.¹

Occasional improper token revocation

Most tokens are properly rejected with {"detail":"Invalid token"} upon logout. However, occasionally a token survives after logout. Maybe this is an edge-case bug like improperly configured caching (they’re not using JWTs).
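Something like the following would automate the replay instead of doing it by hand with curl. It's a rough sketch under assumptions: the URL is a placeholder, the Token header format is copied from the curl call used in this review, and treating 401/403 as "rejected" is my guess at how the API signals an invalid token.

```python
import urllib.error
import urllib.request
from typing import Callable, List


def http_status(url: str, token: str) -> int:
    """Return the HTTP status of a GET sent with the given auth token."""
    req = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the status code is what we want
        return err.code


def surviving_attempts(url: str, token: str, attempts: int = 20,
                       fetch: Callable[[str, str], int] = http_status) -> List[int]:
    """Replay a logged-out token; return the attempt numbers that were NOT rejected."""
    rejected = {401, 403}  # assumption: how the API answers an invalid token
    return [i for i in range(attempts) if fetch(url, token) not in rejected]
```

Run it right after logging out: an empty list means every replay was rejected; anything else is a token that outlived its session.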

No back-end request checking: CORS, referrer, CSRF or XHR

After clicking through the application and logging XHR calls, I tried calling a few with curl. All requests work out of the box from the command-line with an auth token in the header:

curl -H 'authorization:Token <access_token>' <url> 

Not suggesting they nonce every set of XHR at initial page-load, but I was a little surprised by the complete lack of request checking.

Even though it’s a private API, they’re in production so I’d have expected minimal steps to lock-down the API, particularly end-points discoverable from the front-end application.

A CORS/referrer/CSRF policy restricted to *.company.com end-points would prevent the API from being usable by 3rd parties without the use of a proxy. Blocking non-XHR access would require scripting or spoofing to circumvent. Both are fairly straightforward to get around, but they raise the bar just a wee bit. Guess I should be grateful.
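To make the idea concrete, here is a minimal sketch of that kind of back-end check. Everything in it is illustrative: company.com is the placeholder domain from above, the function names are made up, and X-Requested-With is just the conventional XHR marker header that many front-end libraries send. As noted, all of these headers can be spoofed from curl, so this raises the bar rather than securing anything.

```python
from urllib.parse import urlsplit

ALLOWED_SUFFIX = ".company.com"  # placeholder domain, not the real one


def host_allowed(url: str) -> bool:
    """True if the URL is https and its host is company.com or a sub-domain of it."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_domain = host == ALLOWED_SUFFIX[1:] or host.endswith(ALLOWED_SUFFIX)
    return parts.scheme == "https" and on_domain


def request_allowed(headers: dict) -> bool:
    """Coarse check: trusted Origin (or Referer) plus the XHR marker header."""
    h = {k.lower(): v for k, v in headers.items()}
    source = h.get("origin") or h.get("referer") or ""
    is_xhr = h.get("x-requested-with") == "XMLHttpRequest"
    return host_allowed(source) and is_xhr
```

A bare curl call carrying only the authorization header fails this check; the same call with a spoofed Origin and X-Requested-With passes, which is exactly the "scripting or spoofing" caveat.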

Wildcard cert in production

The company uses the same wildcard cert everywhere except its main www website. My feeling is that while wildcard certs are great for development, each production deployment should be independently cert’d.

Further, for an app whose purpose is to deal in PHI, I’d recommend an EV Cert for the actual web application. It’s an additional layer of protection for users, e.g. against phishing attacks in your login area.

Partial HTTPS redirection

The main www site does not HTTP → HTTPS redirect. I prefer an HSTS policy that includes #AllTheThings, including sub-domains.

Both the front-end application and API include HTTPS redirection. Looks like an nginx redirect since there’s no special HSTS header.
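For reference, HSTS is a single response header, Strict-Transport-Security, with a max-age directive and optional includeSubDomains and preload flags. A small sketch for checking whether a header value covers #AllTheThings; the function names are mine, and the one-year minimum is my own assumption, not something the spec mandates.

```python
def parse_hsts(value: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in value.split(";"):
        name, _, arg = part.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            policy["max_age"] = int(arg.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def covers_all_the_things(value: str, min_age: int = 31536000) -> bool:
    """True if the policy lasts at least min_age seconds and includes sub-domains."""
    policy = parse_hsts(value)
    long_enough = policy["max_age"] is not None and policy["max_age"] >= min_age
    return long_enough and policy["include_subdomains"]
```

Feed it the header value from a curl -I response against each host; a plain redirect with no header at all never gets this far.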

Untrusted vendor certs

+1 for using a Comodo cert (wildcard, grr… see above) instead of Symantec. Unfortunately, your vendors were not so wise.

Your business/product folks may have noticed a small drop in your Chrome/Chromium user numbers, here’s why:

Why mixpanel, why!

From CDN link using Chrome 53: https://cdn.mxpnl.com/libs/mixpanel-2-latest.min.js

Improper config at mixpanel + Symantec being (rightly) punished for playing funny-money with their certs. Issuing unaudited ‘test’ certs for major domains like google, c’mon Symantec! Read more here.

Wildcard accept header

Content Type is set appropriately, but:

Accept: */*

Should be application/json or similar. Nuff said.

AWS s3 asset handling

Fine. All assets have a 1 hr expiry. Fine for icons; not fine for more sensitive content (e.g. PDF of medical records). Quick and dirty would be to use Expiry=Date.now(). Though, there’s the question of whether you’d wanna host that yourself or proxy s3 through your own servers for an additional layer of access control. As with all things, YMMV.
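If you did go the proxy route, one way to get short-lived links is to sign them yourself. The sketch below is not S3's own pre-signed URL mechanism; it is a generic HMAC-signed expiring link that a proxy in front of the bucket could verify. The secret, paths and function names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-me"  # hypothetical server-side signing key


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry timestamp."""
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_link(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return path plus an expiry timestamp and a signature over both."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    return f"{path}?expires={expires}&sig={_signature(path, expires)}"


def link_valid(path: str, expires: int, sig: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it has not expired and the signature matches."""
    current = time.time() if now is None else now
    return current <= expires and hmac.compare_digest(_signature(path, expires), sig)
```

With this, a medical-record PDF could get a 60 second link while icons keep their 1 hr expiry, and the proxy can layer a session check on top.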

API

From an ‘evil’ (all the data are belong to us) business-side, I understand closed-APIs. API design reflects a company’s identity. Given that APIs are so easy to reverse engineer and in an API-first culture APIs are first class assets, design and architecture should not be ignored.²

Many end-points, many requests per page view

I counted 23 collection-level end-points. This does not include resource end-points (i.e. included: api/collection/, not included: api/collection/<id>) or multiple allowed methods (i.e. GET, POST, DELETE, etc. only counted once per collection).

From a mobile performance perspective, the downside is multiple round-trips per page load. For example, loading the main page requires 10 GET XHR round-trips. I’d consider this bloated, particularly for a mobile experience.

Nested end-points

The most deeply nested end-point was 4 levels deep: api/1/2/3/4/ ³

No defined JSON spec, JSON schema or hypermedia links

The JSON packet structure looks self-consistent between all their end-points, but doesn’t seem to adhere to a well defined spec such as JSONAPI. Packets aren’t deeply nested, but not completely flat either (e.g. Ember REST adapter).

Also don’t have links to JSON schema, nor a links hash for self-discovery/introspection. HATEOAS isn’t for everyone and this is a private API, so fine on all counts.

Versioning

+1 for bothering to version a private API. Should make the shift to productionization easier down the line. They chose a URL versioning scheme instead of a custom request header or accept header:

https://<api>.<company>.com/v1.1/api/

With the 1.1, they may even be following semver.

Monolithic back-end likely

With all requests going to the same server-side API, and no additional x-<blah> proxy-style headers, it feels like a monolithic backend.

Looks like they proxy a few 3rd party API vendors, including HumanAPI and GoodRx. Good sign IMHO, means someone is at least marginally successful at keeping “Not invented here” at bay.

Performance

The app initiates more than 6 requests on a load. Browsers typically cap concurrent HTTP/1.1 connections at about 6 per host, so some of the responses spend a lot of time (up to a second) in queue. Concerning myself with API, so only looking at XHR requests.⁴

+----------------------------------------------+
|          wait (ms) receive (ms)     w+r (ms) |
+----------------------------------------------+
| count   252.000000   252.000000   252.000000 |
| mean    121.224683    18.711679   139.936361 |
| std     139.071488    44.048733   140.497541 |
| min       0.000000     0.000000     0.000000 |
| 25%      35.099500     2.537000    47.089500 |
| 50%     100.152500     4.164500   111.145000 |
| 75%     144.142750     9.189750   168.644250 |
| max     988.830000   460.650000   996.554000 |
+----------------------------------------------+

Wait is Time to first byte (TTFB); receive is content download time.

To put this in perspective, Wikipedia’s median wait/receive times are ~25/20 ms (no XHR); Facebook’s are 50/2 ms (only XHR). So the API performance isn’t bad at all.
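For anyone wanting to reproduce numbers like these, here is a minimal sketch that pulls wait and receive timings out of a HAR export from the browser's network console. The numbers above came out of Pandas; this version sticks to the standard library. Assumptions: the capture is saved as a HAR file, and XHR entries are identified by the Chrome-specific _resourceType field.

```python
import statistics


def xhr_timings(har: dict) -> dict:
    """Collect wait (TTFB) and receive timings in ms for the XHR entries of a HAR."""
    waits, receives = [], []
    for entry in har["log"]["entries"]:
        if entry.get("_resourceType") != "xhr":  # Chrome-specific field (assumption)
            continue
        timings = entry["timings"]
        waits.append(timings["wait"])
        receives.append(timings["receive"])
    return {"wait": waits, "receive": receives}


def summarize(values: list) -> dict:
    """A cut-down version of the describe() style summary."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
    }
```

Load the file with json.load, then call summarize on the wait and receive lists separately.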

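Odd: OPTIONS pre-flight on every GET

Oddly, every GET request was preceded by an OPTIONS request. A preflight is only required for non-simple cross-origin requests: methods like DELETE, PUT and PATCH, or any request carrying a non-safelisted header. I think this is an Angular issue, bless you Stack Overflow, though the authorization header alone is enough to make a cross-origin GET non-simple. Could cut their page load times by 1/4–1/3 by only doing an initial preflight for cross-site GET (e.g. letting the browser cache it via Access-Control-Max-Age); back of the envelope guesstimation.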
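As far as I understand the CORS rules, what triggers a preflight is not idempotency but whether the cross-origin request counts as "simple": a GET, HEAD or POST carrying only safelisted headers. A rough sketch of that decision follows. It is a simplification that ignores the finer points (header value limits, Range, upload listeners), and the function name is mine.

```python
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method: str, headers: dict) -> bool:
    """Approximate whether a browser sends OPTIONS before a cross-origin request."""
    if method.upper() not in SAFELISTED_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SAFELISTED_HEADERS:
            return True  # e.g. Authorization or any custom header
        if lname == "content-type":
            media_type = value.split(";")[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False
```

By this logic, a cross-origin GET carrying an authorization header is preflighted no matter which front-end framework sends it.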

ToS (Terms of Service) and Privacy policy

Mostly standard boiler-plate, but I take a few issues here, namely that there is no brief summary that people may actually read. Your health information is important and potentially very damaging. Companies in the space absolutely must act in good faith.

PHI disclosures not available in-app

Allowable PHI disclosures are enumerated at length, but a (complete?) record of the actual disclosures is only available upon written request; not in-app. IMHO, unnecessary obfuscation + lack of transparency. I can see my security/action log on most major providers (Aside: I like GitHub’s take on this). I should be able to see all PHI disclosures in-app, at any time, as well as standing permissions/data sharing agreements. The latter should be revocable by the client at any time.

Presumably old “30 day free trial” branding in on-boarding

This freaked me out a bit. There is no mention of the app being anything but “free” (gratis: free as in beer, not free as in freedom) for consumers. Digging into the ToS and the newer branding, I think this is an old marketing experiment they forgot to take out. Here’s hoping; guess I’ll find out in 30 days.

Tangentially related, but any excuse will do.

Other curiosities

The App is for consumer educational use and convenience only

Erm…by choosing what to display the app will impact people’s choices. Maybe be a bit more honest?

Agree not to (i) reverse engineer, translate, copy or otherwise access…

Ha. Somebody’s gotta be watchdog. Do you really want an adversarial relationship with whitehats?

Our Right to Terminate: We may, at any time, in our sole discretion and for any or no reason and without notice to you, terminate your access to the App

So you want me to trust you with all my health information, but you can cut me off at any time? Ouch bro.

Marketing: Your PHI will not be sold, used or disclosed for marketing purposes without your authorisation except where permitted by law
Underwriting: We will not use or make available genetic information about you for underwriting purposes

Both great. Both shouldn’t be buried in legalese.


Footnotes

[1]: Shameful admission. Had to check the man page on curl. Postman and SwaggerUI hath spoiled me rotten.

[2]: Assume folks, security conscious customers and prospective candidates will reverse engineer and kick the tyres on your API. I suggest leaving Easter eggs for them. Good places are some ASCII art in the web console, or a “sorry bud, but this is a non-public API” when a request is blocked.

[3]: If you’re curious, I discuss nesting within the context of consuming from EmberData here.

[4]: Heterogeneous end-points. Only XHR requests, including GET, POST and DELETE, from normal application use over about 20 page views. Pandas output is a little ugly, so thank you: https://senseful.github.io/web-tools/text-table/