Choosing your start-up tech stack

Horatiu Jeflea
Dec 20, 2018


https://unsplash.com/photos/fEKMMV0LF2o

You have a startup idea and financing is in place, so now let’s look at the tech. The approach is, and should be, different from that of an enterprise software company: you want to avoid unnecessary costs, yet keep your app available through traffic spikes and keep latency sub-second at all times. A slow-loading app that users are not yet familiar with will quickly scare them away. Let’s highlight these traits:

  • Pay only for what you consume 💰
  • Handle traffic spikes and idle time automatically 📈
  • Keep latency low (<100ms) 🏎

Let’s add a few more:

  • Take security seriously 👮
  • Ease of setup, maintenance and development 🏌

I’m going to suggest an architecture and technologies that handle all of the above. Each app should be tailored to its specific needs, so I wouldn’t follow this exact approach every time; take what works best for you.

Technologies

Infrastructure

Choosing a cloud provider over an on-premise deployment has clear benefits: no upfront hardware costs, pay-as-you-go pricing and managed services that scale with you.

Top pick: AWS for its stability and range of services.

Alternative: Azure if you are leaning towards Microsoft. Google Cloud Platform should be used with care: some of its services are great, while others are less stable than their counterparts at other cloud providers. I will go into detail in the sections below.

Computing

There are two approaches here: microservices and FaaS (Function as a Service). When starting out I recommend FaaS, as it’s easier to maintain. I will continue with FaaS in this article and suggest a good microservices architecture at the end.

Top pick: AWS Lambda 💰 📈🏌with Node 🏎. With Node, cold starts are kept to a minimum. JavaScript is also the de facto FaaS language, so tutorials and documentation are easier to find. Stability and consistency are pretty good, and so is ease of deployment (check out the Serverless Framework).

Alternative: Google Cloud Functions 💰 📈🏌, which I don’t necessarily recommend because in my tests cold starts were above 1 second.
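To illustrate why cold starts matter, here is a minimal TypeScript sketch of the pattern that keeps warm invocations fast: expensive setup runs once per container and is cached at module scope. The names `getConfig`, `handleEvent` and `tableName` are hypothetical, and in a real Lambda the exported handler would be an `async` function wrapping this logic.

```typescript
// Hypothetical configuration for the function; tableName is an assumption.
interface Config {
  tableName: string;
}

let initCount = 0; // counts how often the expensive setup actually ran
let cachedConfig: Config | undefined;

// Stand-in for expensive setup (creating SDK clients, reading secrets, ...).
// It runs on the first call only; later calls reuse the cached value.
function getConfig(): Config {
  if (cachedConfig === undefined) {
    initCount += 1;
    cachedConfig = { tableName: "restaurants" };
  }
  return cachedConfig;
}

interface GreetingResult {
  message: string;
  table: string;
  initialisations: number;
}

// Core logic of the function. State kept at module scope survives between
// warm invocations, so only the first (cold) invocation pays for the setup.
function handleEvent(event: { name?: string }): GreetingResult {
  const config = getConfig();
  const name = event.name !== undefined ? event.name : "world";
  return {
    message: `Hello, ${name}`,
    table: config.tableName,
    initialisations: initCount,
  };
}
```

Calling `handleEvent` repeatedly in the same process leaves `initialisations` at 1, which mirrors what happens on a warm container.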

Programming Language

This should be chosen based on the needs of the app. For FaaS, Node is definitely the way to go. If you are working on AI, use Python; for Big Data with Spark, use Scala; and so on. Performance, functionality, features and documentation differ greatly between languages.

There is indeed an overhead in learning a particular programming language, but in the end you are better off investing a bit of time at the beginning and doing things right than having to migrate your whole codebase later because you chose the language out of convenience.

Database

This can be SQL or NoSQL, depending on the app and your preference. The things to consider here are cold start and auto-scaling.

NoSQL Top Pick: DynamoDB 💰📈🏎🏌, as it now supports on-demand pricing and transactions, and it is easy to integrate with Lambda and other AWS services.

NoSQL Alternative: Any NoSQL database that has a cloud/serverless offering with auto-scaling. A very good one is FaunaDB 💰📈🏎🏌 Cloud. It offers ACID transactions, strong read consistency and very good indexing. Google Firestore 💰📈🏌 is very similar, but it’s currently in beta and is not yet stable enough.

NoSQL Worth mentioning: MongoDB Atlas 🏎🏌 offers Mongo as a service, but it does not auto-scale and you pay per node, not on demand.

SQL Top Pick: Citus Cloud 🏎🏌, as it supports horizontal scaling as well as everything you expect from SQL. The downside is scaling: you pay per node and have to scale manually.

SQL Alternative: Google Cloud Spanner 🏎🏌 promises a lot, but I would use it only when hitting higher traffic as it’s pretty costly. Scaling is manual but seems easy to adjust.

SQL Worth mentioning: AWS Aurora Serverless 💰📈🏌, which seems like the holy grail, but at this point it still needs improvements in speed and features. For example, the Data API is in beta and takes 200ms per call. The Data (HTTP) API is a must, because accessing the database directly from a Lambda requires the function to be in a VPC, which adds 10 seconds to the cold start time.

Storage

Storing in the cloud makes more sense as objects/files are replicated and highly available (regardless of traffic).

Top Pick: Amazon S3 💰📈🏎🏌

Analytics

https://unsplash.com/photos/unRkg2jH1j0

Running your own analytics pipeline, such as Redshift, is costly in both maintenance and money. I recommend starting with a third-party provider and moving analytics into your own cloud much later. These providers offer an API for sending events and a dashboard for building reports and charts, either with SQL or with a visual tool.

Top Pick: MixPanel 💰📈🏎🏌, 5 million free events per month

Alternative: Amplitude 💰📈🏎🏌, 10 million free events per month

The downside is the price, which becomes very high once you exceed the free tier. At that point it may be fair to consider a self-hosted analytics cluster (Redshift, for example).

Regarding web/mobile analytics, there are specialised providers such as HotJar and Heap Analytics. They might be a better fit.
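A quick back-of-the-envelope estimate helps to see when the free tier runs out. The sketch below assumes a hypothetical app whose traffic is described by daily active users and events per user per day; the free-tier limits are the ones quoted above and may have changed since.

```typescript
// Free monthly events as quoted in this article (not verified against current pricing).
const FREE_EVENTS_PER_MONTH = {
  mixpanel: 5000000,
  amplitude: 10000000,
};

// Rough monthly event volume, assuming every active user sends the same
// number of events each day (a simplifying assumption).
function estimateMonthlyEvents(
  dailyActiveUsers: number,
  eventsPerUserPerDay: number,
  daysPerMonth: number = 30
): number {
  return dailyActiveUsers * eventsPerUserPerDay * daysPerMonth;
}

// True when the estimated volume still fits in the given free allowance.
function fitsFreeTier(monthlyEvents: number, freeEvents: number): boolean {
  return monthlyEvents <= freeEvents;
}

// Hypothetical example: 10,000 daily active users sending 20 events a day each.
const monthly = estimateMonthlyEvents(10000, 20);
const withinMixpanel = fitsFreeTier(monthly, FREE_EVENTS_PER_MONTH.mixpanel);
const withinAmplitude = fitsFreeTier(monthly, FREE_EVENTS_PER_MONTH.amplitude);
```

With these example numbers the app produces 6 million events a month, which is above the first allowance and below the second.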

Web Hosting

In order to serve web pages (dynamic or static), you need a CDN, DNS and storage.

Storage Top Pick: Amazon S3 💰📈🏎🏌. Just create a bucket, enable static website hosting on it and it’s good to go.

CDN, DNS Top Pick: Cloudflare💰📈🏎👮🏌 handles these very nicely and intuitively, with a pleasant UI. Pricing is flat ($20 or $200 depending on plan) and there is a free version which can accommodate many use cases. It also offers a free shared SSL certificate, so you can benefit from HTTPS without extra costs.

CDN, DNS Alternative: AWS Route 53 💰📈🏎👮🏌 and CloudFront💰📈🏎👮🏌. They offer mostly the same thing as Cloudflare but have a clunkier UI and are harder to set up.

Web Frameworks

We have the web hosting figured out, so the only thing left to do is create a web app and deploy it to the S3 bucket. My recommendation here is to use a popular JavaScript framework (React, Angular …). It’s better to steer clear of solutions that compile another language to JavaScript (GWT, Dart, …). Again, these are solutions chosen out of convenience, and they sacrifice performance and functionality.

Streaming/Queues/Messages

Message-driven applications are very useful: components are decoupled and you benefit from retries, prioritisation and rate control. I would recommend a solution that you pay for on demand and that scales almost infinitely, so I would go with AWS SNS 💰📈🏎🏌 and SQS 💰📈🏎🏌. Choose those over Kinesis 🏎 or Managed Kafka 🏎 for their ease of maintenance, lower costs and auto-scaling.
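The retry behaviour mentioned above can be sketched with a small in-memory queue. This is not the SQS API, only a stand-in under the assumption that a failed message is redelivered until a maximum receive count is reached and then moved to a dead-letter list; `RetryQueue` and its methods are hypothetical names.

```typescript
interface QueuedMessage<T> {
  body: T;
  receiveCount: number; // how many times the message was handed to a consumer
}

// In-memory illustration of queue retries with a dead-letter list.
class RetryQueue<T> {
  private pending: QueuedMessage<T>[] = [];
  private maxReceives: number;
  deadLetters: T[] = [];

  constructor(maxReceives: number) {
    this.maxReceives = maxReceives;
  }

  // The producer only knows about the queue, never about the consumer.
  send(body: T): void {
    this.pending.push({ body: body, receiveCount: 0 });
  }

  get size(): number {
    return this.pending.length;
  }

  // Hand every pending message to the consumer once. A message whose
  // consumer throws is re-queued, or dead-lettered after maxReceives attempts.
  drain(consumer: (body: T) => void): void {
    const batch = this.pending;
    this.pending = [];
    for (const message of batch) {
      message.receiveCount += 1;
      try {
        consumer(message.body);
      } catch (error) {
        if (message.receiveCount >= this.maxReceives) {
          this.deadLetters.push(message.body);
        } else {
          this.pending.push(message);
        }
      }
    }
  }
}
```

A consumer that keeps failing on one message does not block the others: the healthy messages are processed on the first pass, while the failing one is retried and eventually set aside for inspection.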

Auth

Authentication and authorisation carry a high security risk, so I recommend handling them with an external service.

Top Pick: Auth0 💰📈🏎👮🏌, which is easy to set up and maintain and has many features. The cost is based on regular active users, though, so it can get quite high with higher traffic. If cost is the main concern, I would recommend the alternative.

Alternative: AWS Cognito💰📈🏎👮, which is comprehensive but not quite a pleasure to work with. It’s cheaper than Auth0: for example, the free tier covers 50,000 users and the next 50,000 cost $275, whereas the Auth0 free tier covers 7,000 users and 50,000 extra users cost $850.
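Using only the figures quoted above, a rough comparison can be sketched as follows. It assumes, for illustration, that users beyond the free tier are billed linearly at the quoted price per 50,000 users; real pricing is tiered and changes over time, so treat the output as an order-of-magnitude estimate.

```typescript
interface AuthPricing {
  freeUsers: number;     // users covered by the free tier
  blockSize: number;     // size of the paid block quoted in the article
  blockPriceUsd: number; // quoted price for one such block
}

// Figures as quoted in this article; not verified against current price lists.
const COGNITO: AuthPricing = { freeUsers: 50000, blockSize: 50000, blockPriceUsd: 275 };
const AUTH0: AuthPricing = { freeUsers: 7000, blockSize: 50000, blockPriceUsd: 850 };

// Assumption: cost grows linearly with the number of users above the free tier.
function estimateMonthlyCostUsd(pricing: AuthPricing, activeUsers: number): number {
  const billableUsers = Math.max(0, activeUsers - pricing.freeUsers);
  return (billableUsers * pricing.blockPriceUsd) / pricing.blockSize;
}

// Hypothetical app with 57,000 active users.
const cognitoCost = estimateMonthlyCostUsd(COGNITO, 57000);
const auth0Cost = estimateMonthlyCostUsd(AUTH0, 57000);
```

Under this linear assumption, an app with 57,000 active users is only 7,000 users past the Cognito free tier but a full 50,000-user block past the Auth0 one, which is where most of the price gap comes from.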

API

There are two things to consider when creating an API: protocol and clients. The most used protocols are REST and GraphQL; I’ll only cover REST in this article. Second, it’s important to know who the client is: internal or external. If it’s internal (for example a web app or mobile app), you can be more relaxed about documentation and authorisation. If it’s external, the API needs to be documented properly and handled with more care.

Top pick: AWS API Gateway 💰📈🏎👮, which can be configured through the Serverless Framework🏌. This is the easiest way to set up the API, and it is also scalable and cost-effective.
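As a rough sketch of what sits behind API Gateway, the function below maps a request to a response in the shape used by the Lambda proxy integration (`statusCode`, `headers` and a string `body`). The event type is a simplified subset of the real one, and the route, the `route` function and the `RESTAURANTS` data are hypothetical.

```typescript
// Simplified subset of the event API Gateway passes to a Lambda proxy integration.
interface ProxyEvent {
  httpMethod: string;
  path: string;
  pathParameters: { [name: string]: string } | null;
  body: string | null;
}

// Response shape expected back: the body must be a string.
interface ProxyResult {
  statusCode: number;
  headers: { [name: string]: string };
  body: string;
}

interface Restaurant {
  name: string;
}

// Hypothetical in-memory data standing in for a database lookup.
const RESTAURANTS: { [id: string]: Restaurant } = {
  "42": { name: "Example Bistro" },
};

function jsonResponse(statusCode: number, payload: unknown): ProxyResult {
  return {
    statusCode: statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Handles GET /restaurants/{id}; a real handler would be an async function
// that calls this and returns its result.
function route(event: ProxyEvent): ProxyResult {
  if (event.httpMethod !== "GET") {
    return jsonResponse(405, { error: "Method not allowed" });
  }
  const id: string | undefined =
    event.pathParameters !== null ? event.pathParameters["id"] : undefined;
  if (id === undefined || !Object.prototype.hasOwnProperty.call(RESTAURANTS, id)) {
    return jsonResponse(404, { error: "Restaurant not found" });
  }
  return jsonResponse(200, RESTAURANTS[id]);
}
```

Keeping the routing logic in a plain function like this makes it easy to unit test without deploying anything.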

Big Data & ML

In order to run ad-hoc queries on your data, I recommend the combination S3 + ORC + AWS Athena💰📈🏌. It is serverless, so it’s much easier to maintain and use than Hadoop, even with AWS EMR🏌.

Regarding ML, your particular need may already be covered by a managed service from AWS or Google Cloud Platform, for example extracting items from a photo with Google Cloud Vision💰📈🏎🏌. If not, AWS SageMaker💰📈🏌 can help you train your model, with some assistance along the way. No need to buy a graphics card for this.

Managed service providers

As a rule of thumb, try to rely as much as possible on external service providers. It will speed things up and may even be cost-effective. If not, you can do an in-house implementation later on. Examples of such cases:

  • Logs: Dashbird.io 💰📈🏎🏌 which supports full text search, analytics and more
  • Cache: Redis Labs 💰📈🏎🏌 offers managed Redis with auto-scaling. Using AWS ElastiCache requires Lambdas in a VPC, which raises the cold start from 50ms (with Redis Labs) to 10 seconds
  • Search: use Elastic Cloud 🏎🏌, for the same reason as with caching
  • SMS/Calls: Twilio, Nexmo, Plivo 💰🏎🏌 are the most popular telephony providers. They are operated entirely through API calls and webhooks.
  • Mails: SendGrid💰📈🏎🏌 can help with email campaigns. Doing them on your own carries a higher risk of landing in spam.

Examples

https://unsplash.com/photos/oqStl2L5oxI

Example 1, fully 💰📈🏎👮🏌, Restaurants on Maps App

  • AWS Lambda with Node.js 8, 1.5 GB memory
  • DynamoDB, or FaunaDB if you need multiple indexes with strongly consistent reads on them
  • AWS S3 storage and web hosting, with Cloudflare for DNS/CDN
  • Mixpanel Analytics
  • SNS, SQS
  • Auth0 (for an app with fewer than 20,000 active users)
  • API gateway
  • Dashbird.io for logs and SendGrid for mails
  • Serverless framework or Amplify for structure and deployment
  • MapBox for maps

Example 2, 🏎👮 and benefiting from SQL, Financial App

  • Microservices running on Google Kubernetes Engine
  • Node.js (with Express, Docker) or Java 8 (Spring, Spring Boot, Spring MVC, Docker)
  • Google Cloud Spanner
  • Google Memorystore for caching
  • Google Cloud Pub/Sub
  • Apigee

Example 3, 🏎👮🏌, Large Ride Hailing App

  • AWS Lambda with Node.js 8, 1.5 GB memory
  • Queries can get complex and also require strong consistency, so Citus Cloud
  • Redshift for analytics
  • AWS Cognito

Conclusion

These technologies and stacks keep costs low when idle, handle unexpected bursts correctly and are easy to set up and maintain. Another example will be implemented on https://github.com/horatiujeflea/quote-management.

Still having issues with latency or choosing the right service? Say hi on AngelList, LinkedIn or mail.


Horatiu Jeflea

Full Stack Engineer || Software Contractor (Python, Java, Node, React)