April 2020 — End of month update

Caleb Andersen
Published in NativShark
26 min read · Apr 25, 2020

Hey everyone, Caleb here.

First, let’s just rip the band-aid right off: we’re not going to release Phase One of NativShark at the end of April as we had previously promised.

That said, we have just put out a general stability update that takes advantage of much of the work we’ve done this past month. Those who have had issues resetting passwords, creating accounts, or subscribing should no longer run into them.

I’m not going to set another public deadline. We’ve removed the public timeline from the roadmap page and are thinking deeply about how to make a better one that is accurate and represents NativShark well. I know that what we need to do right now is simply deliver the Phase One experience, so that’s what we will do.

This is a very hard message for me to send to everyone, but I know it’s the right call. What we have now isn’t working well enough to release, and we want what we do launch to be high quality. I can assure you that we are focused and that it is coming soon.

Perhaps the message I sent to the rest of our team earlier this week sums it up best:

We’ve been through a lot and we’ve learned a lot this past month.

We learned that we were really bad at estimating how long things would take us to do. But from this process I think we’ve gotten better at estimating time to completion, and we’ve learned a lot about better ways to work as a team and operate our development and deployment strategies so that they suit the complexity of our system and its rapid growth.

Also, I’ve noticed everyone working extra hard the last two weeks and both Niko and I really appreciate that a lot. It’s been amazing to see, and I’m really glad we’ve all put so much into this and are doing something that we believe in, enjoy, and want to see through. Not only are we building an awesome product for the world, we’re also building a really special team. You’re all becoming NativShark experts.

With that said, when you’re working on something you love, it can be easy to accidentally push too hard. It can begin to feel like there’s no end in sight, and then the thing you loved can become a point of pain. We don’t want to see that happen, and I fear we’re on the verge of team-wide burnout. So we’re gonna nip that right in the bud. I personally don’t like to push the team on overdrive for more than 2 weeks at a time, and we’ve definitely hit that point. So to say the obvious thing that we all know: despite how much we’d like to, we cannot launch this week on April 25th. That’s just reality. We’ve got more work in front of us and we can’t keep pushing at this rate. So we need to reframe our deadlines and deliver a polished Phase One experience, which includes the admin panel working as we need it to for Phase One publishing.

While we do still need to keep working hard here until the Phase One experience is out, we need to lighten up a little bit. We need to return to better sleep schedules and work schedules.

Learning Japanese is a long-term commitment. So is building a comprehensive platform for learning Japanese.

This isn’t something we started because it was easy. We started it because we know what it’s like to be a student and we all have a passion for learning.

Just like setting out to learn a language, we knew there would be bumps along the way. Still, it sucks when you actually hit one of those bumps. Like we have just now.

Like good students, however, we won’t let some setbacks break our will to succeed. We will find the parts of our estimates that were unrealistic, adjust them a bit, and keep pushing forward.

Reading this, you may be curious as to why, exactly, we’ve pushed back the Phase One release again. This is going to be a long post. We want to give you as much context as we can into what’s going on behind the scenes and what it’s like to bring a project of this scale to life. We’re so close. Anyway, taking a look at how some of our team members have spent the last few weeks should provide some insight…

The image Chie drew specially for the team to help keep our spirits high

Niko, President

In the first part of April, I finished writing the base content for the Phase One lessons. This meant preparing a 450+ page document with over 140,000 words teaching Japanese while also throttling the amount of new information students are exposed to (i.e. limiting sentences to 1 new word max; teaching kanji before the vocab using them shows up; not using any grammar until it has been introduced; etc.).

This would have been easier if we allowed ourselves to “cheat” and just make students study extra vocab flashcards when we wanted to use specific vocab in an upcoming lesson, dialogue, etc. Instead, we very rarely go over 7 new vocabulary cards / 5 new kanji cards per unit in the first 100 or so lessons, and 8 new vocabulary cards / 5 (very rarely, 6) new kanji cards after lesson 100 or so.

Official stats from the Phase One lesson word document. Now to transfer all of this into our content management system and add all the formatting…

+1 tracking

For the majority of April, I’ve spent most of my waking hours staring at our many spreadsheets.

Since our admin panel is not fully set up yet, I have to track the +1 iteration manually. This means that I need to record:

  • The unit in which each word appears for the first time. This includes each written form of a word: if we have 美味しい, we also need to know that we have already used the same word without kanji, おいしい, because it is commonly written both ways.
  • When kanji have been introduced, so that vocab flashcards only show up after all of the kanji in them have been introduced (there are a very small number of exceptions; a sketch of this kind of check appears after this list). This one is tricky because the vocab flashcards also frequently double as the onyomi or kunyomi example of a kanji that shows up in a kanji flashcard. So, our spreadsheet also needs to know (1) whether we have taught a word in Phase One containing that reading of a particular kanji, and (2) that we haven’t accidentally assigned more than one word as the onyomi or kunyomi example for the flashcard of a particular kanji. This is further complicated by the fact that a single word can be the reading example for multiple kanji. For example, 前髪 (bangs) could be the kunyomi example for both 前 (before) and 髪 (hair).
  • Duplicate sentences. In order to keep example dialogues in our lessons natural-sounding, and to teach all of the things we want to teach in what we think is the most effective way, we occasionally need to use the same exact sentence in multiple places in a phase. We need to watch out for when this happens so that students don’t get assigned the same flashcards all over again.
  • Which content slotted for Phase One remains to be introduced. In the past six months, we have had 4 or so dictionaries/reference books manually entered into spreadsheets so that we could be sure we weren’t missing any useful language. Rei pored over these lists, then ranked each entry by phase. For Phase One, for example, we determined that there were 16 onomatopoeic words we wanted to teach, along with 6 四字熟語 (4-kanji compound words) and 32 phrasal idioms. I had to make sure that all of these were taught in Phase One without breaking the +1 iteration, meaning the individual parts of an idiom had to be taught first (e.g. 赤 [red], the prefix 真-, and 嘘 [lie] are all taught before the idiom 真っ赤な嘘 [complete lie; outright lie]).
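To make the kanji-before-vocab constraint concrete, here is a minimal sketch of that kind of check as code rather than spreadsheet formulas. It’s illustrative only: our actual tracking lives in Google Sheets, and the module name and data shapes here are hypothetical.

```elixir
defmodule PlusOneCheck do
  @kanji ~r/\p{Han}/u

  # Returns :ok if `word` may be introduced in `unit`, or {:error, kanji}
  # naming a kanji that hasn't been taught by then.
  # `kanji_intro_units` maps each kanji to the unit that introduces it.
  def check_word(word, unit, kanji_intro_units) do
    offender =
      @kanji
      |> Regex.scan(word)
      |> List.flatten()
      |> Enum.find(fn kanji ->
        case Map.fetch(kanji_intro_units, kanji) do
          {:ok, intro_unit} -> intro_unit >= unit
          :error -> true
        end
      end)

    if offender == nil, do: :ok, else: {:error, offender}
  end
end

# PlusOneCheck.check_word("美味しい", 12, %{"美" => 3, "味" => 7})
# #=> :ok
```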

More adventures in Spreadsheet Land

Managing all of this in Google Sheets means using LOTS of formulas, separate cross-referenced sheets within a single workbook, and all kinds of conditional formatting rules.

This led to other problems, though:

  • You can only have 5 million cells in a Google Sheets workbook. I was repeatedly breaking this limit, which meant deciding when and where to move data to a separate workbook that could not be cross-referenced in formulas.
  • When you have millions of formulas being recalculated over and over, Google Sheets puts a heavy load on your CPU, even if you have a pretty fantastic setup. So I was doing a constant dance of disabling/removing various formulas and conditional formatting, then reapplying them, editing things in pieces or when other team members were sleeping or not using the workbook.
  • A program like Excel might have been another solution, but I can work a lot faster in Sheets (though for years I only used Excel). Also, most days we had 5–8 people looking at the workbook simultaneously, with about 4 of us actively editing cells. We’ve found Google Sheets to be more reliable and generate fewer conflicts than Microsoft products for this level of simultaneous work.
Screenshot from our master document. These are some of the dialogues at the end of Phase One. Once you’ve gone through the whole of Phase One you’ll have been exposed to and studied all of the Japanese you see here, before seeing it used in a real-life dialogue example.

Japanese audio checks/editing

In the latter part of the month, I’ve been checking and editing thousands of audio files.

In the past, we’ve hired one female and one male voice artist to record each sentence for a particular set of learning materials. We want NativShark to be better, though. Accordingly, even though we already had 2 recordings for a significant number of sentences in Phase One, we hired an additional 8 voice actors to record every sentence. This way students get exposed to different people’s voices and pronunciation, which naturally vary from person to person (especially because we have them record at natural speed and not in a “textbook” or slowed-down voice).

Long story short, I find myself needing to check, organize, and properly label around 25,000 sentence recordings. In this one week, I’ve gotten through just under 10,000 of them. This includes not only listening to each sentence but also making sure that the title of the file matches the Japanese being read and that the filename matches the unique slug in our system, so that we can tie it to that sentence and have it show up in NativShark. Which means, again, I’m spending a lot of time in spreadsheets. But even more time goes to staring at the titles in my audio player and making sure they match the words being said, to say nothing of cutting out errors the voice actors make, silencing background noise in places, deleting improperly recorded audio, and so on.
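The slug-matching part of this is mechanical enough that a small script can flag the worst mismatches before a human listens to anything. A hypothetical sketch (the directory layout and slug source here are assumptions, not our actual tooling):

```elixir
defmodule AudioCheck do
  # List audio files in `dir` whose base filename isn't a known sentence slug.
  # `slugs` is a MapSet of slugs exported from our content system.
  def mismatches(dir, slugs) do
    dir
    |> File.ls!()
    |> Enum.filter(&String.ends_with?(&1, ".mp3"))
    |> Enum.map(&Path.rootname/1)
    |> Enum.reject(&MapSet.member?(slugs, &1))
  end
end

# AudioCheck.mismatches("recordings/actor_03", MapSet.new(["sentence-00123", "sentence-00124"]))
# #=> ["sentnce-00125"]  <- a typo'd filename that needs fixing
```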

I sound like I’m complaining, but I actually love this part of the job because I’m fascinated with the way Japanese sounds, and I get lots of ideas for tools and lessons that we need to teach in the future. The work would be a complete indulgence if I didn’t have to do it quickly.

Once all that audio stuff is done, native speakers and I will need to sift through it and choose which male and female voice we want showing up for each individual sentence. So, 25,000 sentences, Round 2!

NativShark usage guides and whatnot

I also wrote several of these posts/articles this month, and there are several more in the works. I almost forgot to mention this job because I can write a few rough drafts of these a day, which feels like light speed compared to the monstrous tasks described above.

All in all, I’m repeatedly surprised at just how huge of an undertaking the building of NativShark is. I can’t help but get a little bit excited every time that happens, though.

You can’t take a monstrous thing like an entire language and make it simple for people to learn without putting in a monstrous amount of effort.

I get so thrilled thinking about how new students of Japanese will now learn the language so much faster than I was able to. It’s a great feeling.

Screenshot of the way dialogues appear in your “Study Now” track, at the end of each unit. The dialogues use incredibly natural Japanese, casual when it should be, formal when it should be.

Ty, Editor

One thing I’ve learned: doing things right takes a very long time. And it’s harder than it looks to estimate that time. I also know that I don’t want to sacrifice quality so we can get something out faster.

At the start of the month, I was full-focus on finishing up my last edit run through the kanji. Doing so took several days, at the pace of about 400 kanji elements and mnemonics checked a day.

And that was me going fast so I could keep pace with the speed of everything else happening.

Thing is though, the faster you go when you edit, the more you miss. So I had to tell myself to slow down so I could read everything and actually catch typos / weird phrasings.

Editing is a very precise job, and I’ve been learning that more and more every day. All I can do is keep thinking about ways to be efficient, and keep practicing my own writing so I can deliver even better content through NativShark.

Another precise job is doing quality translations, especially when you know students are going to be using these to learn. It’s a whole new level of “I have to make sure this is perfect,” and the content team and I want to make sure that every single translation is high quality.

So there are always discussions going on among the content team about things like: “Does this translation capture the nuance of the Japanese? Do we need to add a note to this, and if we do, did we do a good enough job with the translation to begin with?”

These discussions can take a while once you add up all the sentences and dialogues that are going into Phase One.

We have similar discussions on the lessons in general, too:

Does this explanation do a good enough job of explaining this concept? Is it precise enough to explain it well, and not so long-winded as to bore readers? Is the lesson as a whole fun to read, or is it slow and boring to get through? Do the example sentences properly show off good ways to use this grammar / main point? Is everything easy to digest? Do we stay on topic, or are we side-tracking every other paragraph? Etc. etc.

I’m also, at the time of writing this, editing lessons now that the admin site for that is up and running (after a long fight to get it working, which Jacob will explain shortly). There are bugs, which I take time to report in detail to the devs, along with suggestions on how we can improve the editor to smooth out the workflow of going through it. They’re really good at getting stuff fixed and implemented, so thank you devs!

What the admin side editor looks like in action. It allows us to write out lessons with our formatting happening in real time.

All things considered, I can get through usually up to 10 lessons per workday if I only focus on editing for that day. (There are 166 lessons to go through.)

Though I’ve slowed this down a bit due to needing to focus on content, I also spend the start of my mornings checking the Discord and answering questions where I can. Our community is great, so I basically never have to worry about stepping into a situation as a mod, and can just do so as a fellow community member. I’m grateful to all the community members and mods there who make that possible. Thank you.

In addition to that, I’ve been writing usage guides / explanations for NativShark and thoroughly testing the site while doing so, trying my best to break it as much as I could. And oh boy did I. I’m sure the devs love me for it, too.

Rough list of things done in April (this is just the main points):

  • At the start of April I was finishing editing through everything I wrote for the kanji. So I went through almost 400 kanji elements and mnemonics a day, spending the entire day on it until it was done. Didn’t even stop to check Discord.
  • Checked and discussed all 165 Dialogue translations with the content team
  • Translating takes longer than one might initially think. It’s not just understanding the sentence and throwing it into English. There is a lot of discussion on how exactly to word our English translations to convey as much of the nuance of the Japanese as possible, and there is a LOT of back and forth over many of the sentences. We’re making these translations as accurate as we possibly can.
  • Reading/modding all of Discord (almost) every time I start the day.
  • The search for more native materials that I can get onto the list continues. Got some suggestions from community members too, so thanks!
  • Wrote/outlined “Using NativShark” articles with Niko which will be coming with Phase One. While doing so I also took the time to….
  • …do some in-depth testing of our tools throughout the site. I caught and reported many bugs that we’ll be fixing up, along with QoL changes. It also led to discussions on design updates that we can do to make the experience smoother, easier to use, and overall better.
  • Discussed our study streaks in relation to new art that Chie drew for them with Caleb. The numbers are pretty different from the first time we showed them off, but we think the new numbers will feel better when studying.
  • Lots of discussion with the content team about kanji flashcard design and function. And I mean lots.
  • Community updates and ideas for down the road with Chengaiz. We’ve put a slight pause on them so we can both focus on the mountain of tasks left for Phase One though.
  • Edited all of our kanji readings for Phase One, so make sure you blame me if something slipped by.
  • When the editor was released, we found some bugs that were vital to fix, which slowed our devs down.
  • At the time of writing this, my life is editing and I love it.
  • Also finding bugs, another enjoyable pastime of mine.
  • Also constantly thinking of QoL and design suggestions for both sides of the system.

A personal look at this month in general:

I still have a lot of learning to do in a lot of areas. I have a way to go yet in regards to learning and using Japanese, and I mean it when I say that; I’m not just being humble. That’s part of the reason I’m so excited for NativShark to be out. It teaches natural language, so there’s no disconnect between the Japanese used by actual people and what I’ll be studying.

My writing is something I’m working on, too. For example, I spent a good while writing up a “Using NativShark” article and asked Niko to give it a look. He gave me great feedback… which I used to rewrite the whole thing. I feel a lot better about what I produced with Niko’s feedback on this particular article, but I wish I had gotten it right the first time.

But I don’t let those things discourage me. As Niko told me in that PDF almost 5 years ago, I just need to show up and keep trying. Even if it doesn’t go fantastic every day, I’ve still gotten further than someone who didn’t show up at all. Just keep swimming.

I’ve carried that advice with me ever since I read it, and believe it applies to our work at NativShark too.

While we sadly can’t release Phase One quite yet, we’re closer than we have ever been, and progress is made every day. I’m honored to be on a team with as much dedication, drive, and ability to find solutions as this one. We won’t stop swimming.

Jacob, CTO

This last month seemed like both the longest month and the shortest month I’ve ever experienced. On top of that, the COVID-19 situation makes every day feel like it takes 10 years to pass.

Looking at the past month in the team activity log, it’s apparent that I’ve actually been able to accomplish quite a bit. Most of what was done doesn’t affect students directly, nor will they see it, but some things (such as the Dictionary service) will likely make their way into their hands in the future for things like custom flashcards.

Before getting into issues in the codebase itself, I want to talk about some general development issues we faced.

The Commanded library

Commanded is an Elixir library for event sourcing that comes with some auxiliary libraries to make implementing this pattern easier. While event sourcing is a great concept, it’s only a great concept when the following hold true:

  1. It works without any issues
  2. All developers understand the errors that occur and can fix them; otherwise, their development environment is broken

For #1 above, we started to notice various bug reports coming in from the community that didn’t make sense to us: things such as being unable to change passwords, parts of profiles failing to save, billing information not updating correctly, etc. When looking at our logs, we saw that these events were being dispatched in the system successfully, so what was going wrong? As it turns out, a few of our projectors were stuck. Projectors have the job of processing events and updating the database based on those events. What this means is that when a projector is stuck, the database never gets updated.
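For anyone curious what a projector looks like, here’s a rough sketch using the commanded_ecto_projections helper library. The event and schema names are made up, but the shape is accurate: if the function handed to `project` fails for one event, the projector halts on that event and the read model silently stops updating.

```elixir
defmodule MyApp.Projectors.UserProfile do
  use Commanded.Projections.Ecto,
    application: MyApp.CommandedApp,
    repo: MyApp.Repo,
    name: "user_profile_projector"

  import Ecto.Query

  # Apply each PasswordChanged event to the read model.
  project %MyApp.Events.PasswordChanged{} = event, _metadata, fn multi ->
    query = from(u in MyApp.Accounts.User, where: u.id == ^event.user_id)

    Ecto.Multi.update_all(multi, :user, query,
      set: [password_hash: event.password_hash]
    )
  end
end
```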

For #2, we would run into issues where developers would encounter an error on their end that was the result of events failing to process correctly on the server. Due to the nature of event sourcing, when they restarted their development environment, it would try to replay those events, crashing the environment. Because of the distributed nature of our team, this meant that developers were often unable to proceed with their work until Manuel or I was awake and able to fix the server. This led to lost time and frustration.

In the end, we removed all event sourcing from our system. I have looked into how to better use events in the future and how we can decouple our services while maintaining stability. Although none of these new approaches have been tested or implemented yet, the system is being written in a way that will make it relatively easy to decouple later.

The lack of a staging server

The title here is a bit misleading: it’s not that we didn’t have a staging server, it’s more that the entire development team was too busy trying to write code to hit a deadline, and we didn’t have time to change our deployment process to keep a staging server running. On top of that, no single team member had enough time to test the new things being finished, because everyone was scrambling to complete a mountain of work in time for the deadline.

Once it was reported that the admin site was functional enough for lessons to be created, we decided to deploy it. We ran into countless errors when deploying the site because of some core architectural changes that were made to the codebase, which I’ll discuss now.

Separate projects sharing components

Our frontend is written in React. React is marketed as a JavaScript library for building user interfaces, which is pretty succinct. In summary, it allows you to break down a large, complex experience into small components that can be composed together to make something special. When we thought about the admin site, it didn’t make sense to bundle it together with the student-facing site for a couple of reasons:

  1. There will likely be different security requirements for the admin site such as wanting to deploy it behind a firewall, restricting access points, randomizing the access URL, etc., etc.
  2. There will often be times when changes are pushed that affect the admin site but not the student site, and vice-versa. We want to avoid unnecessarily increasing the payload size of either site, and to avoid re-deploying one when only the other has changed

After deciding that we needed to separate these sites into two separate entities, we decided that we needed to restructure the project in a way that allowed common components to be shared across both sites. We shouldn’t have to re-code a button on the admin site if it already exists on the student site, for example. This led us down the path of using something called Lerna to manage our frontend applications. I won’t go into what Lerna is, but you can think of it as a system that understands what each sub-project needs and intelligently shares the components between the two.

Lerna is amazing!… While developing

A goofy example of building a component with the help of Lerna packages :)

With Lerna in place, our frontend developers were once again up and running, and now were able to create components that were shared across both sites. This was great and resulted in improved productivity generally across that group. Everything was great until we wanted to deploy the site, where a few things went wrong:

  1. The way Lerna manages your code is different from how almost anything else assumes your code is managed
  2. All of our deployment pipelines and hosting that we chose had no idea how to handle a Lerna project

This meant that we needed to redo our entire deployment pipeline while under the pressure of looming deadlines. Normally this would be pretty bad, but it was amplified by the fact that our entire content team was now basically unable to do their job, because they needed the admin site to do it. This led to Ben and me putting in roughly 36–38 hours over the span of two days to get this system back in place. It was probably one of the most frustrating things I’ve experienced as a developer.

If you haven’t had to work on a deployment pipeline, it’s frustrating because it takes forever. In general, the steps go like this:

  1. Make a tweak to the CI/CD (continuous integration/continuous deployment) script
  2. Push the new code into source control
  3. Wait 10–15 minutes for it to build and try to run your code
  4. Get errors and attempt to fix them locally
  5. Go back to step 1 and repeat ad infinitum until you have a working system

In the end, we got the new pipeline set up and discovered a couple of other issues with our staging server that we were able to resolve. While that time wasn’t enjoyable, it resulted in a staging server that is more performant than the current production server. This means that we’ll also be changing where our production site is deployed, which comes with a ton of other issues (looking at you, NihongoShark.com redirects).

Time to evaluate the various services that encountered issues last month and how they were fixed.

The dictionary service

As one might expect, this service and the SRS service are probably the two most complicated systems that we have at this point. The progression service will likely also land in this group once it’s finished, though not enough progress has been made on it yet to say for sure. I’m going to break down some of the problems this service ran into.

Problem #1 — Mecab has no idea what it received

This was one of the first problems that we encountered, specifically when Jannis tried to break a sentence down to the point where every individual kana/kanji was its own word. When Mecab received things such as ー or ゃ, it was unable to do anything with them and returned a blank result set.

This edge case was unaccounted for and thus resulted in the server erroring out. As mentioned earlier, this meant that our developers were stuck because their development environments were crashing. The fix was rather quick, but it meant that I had to refactor how the service worked and write more tests to ensure this error wouldn’t happen again. I coordinated with Jannis, the primary frontend developer using this service, to figure out how best to approach situations like this, and we came up with a solution that basically means our system creates new words out of nothingness. This works quite well and improves the general stability of the system overall.
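A simplified sketch of that fallback (the function names and token shape here are hypothetical): if Mecab hands back nothing for a fragment like ー or ゃ, we synthesize a minimal token from the raw text instead of letting the request crash.

```elixir
defmodule Dictionary.Tokenizer do
  # Tokenize `text` with Mecab, falling back to a synthesized token
  # when Mecab can't make sense of the input.
  def tokenize(text) do
    case mecab_parse(text) do
      [] -> [%{surface: text, reading: text, synthesized: true}]
      tokens -> tokens
    end
  end

  # Stub standing in for the actual call out to the Mecab binary.
  defp mecab_parse(_text), do: []
end
```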

Problem #2 — Special characters and numbers

This was an interesting problem that is a combination of something that was fixed this month and something that was fixed last month. To break this down even more, we’ll first look at numbers.

Numbers and Furigana

After searching high and low, I could not find an easy algorithm, solution, or library to import that would automatically generate furigana for an arbitrarily-large number in Japanese. What this means is that NativShark now has a custom-built library that will automatically generate furigana for an arbitrarily-large number in Japanese.

What’s curious about Japanese is how the numbers are logically grouped differently than they are in English. To talk about this, we’ll look at a couple of examples:

  • English: 1,000 — Japanese: 千
  • So far nothing too horrendous; this is a pretty logical mapping. Then we get to 10,000…
  • English: 10,000 — Japanese: 1万
  • Yeah, I know. Anyone unfamiliar with Japanese would probably have assumed something like 十千, which isn’t a thing.

The above is a really small example, but it gets quite confusing when you get into higher numbers, such as 938,535,235,831 (which is, by the way: きゅうせんさんびゃくはちじゅうごおくさんぜんごひゃくにじゅうさんまんごせんはっぴゃくさんじゅういち). Not only was this slightly brain-melting to figure out by hand, coding it was even worse.
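The grouping step itself can be sketched in a few lines. This is far from our actual library (which also handles the reading side, including sound changes like さんぜん and はっぴゃく via lookup tables), but it shows why Japanese numbers split into 4-digit groups rather than English’s 3-digit ones:

```elixir
defmodule NumberGroups do
  # Myriad units: each step is 10,000x the previous.
  @units ["", "万", "億", "兆"]

  def group(n) when is_integer(n) and n > 0 do
    n
    |> chunks_of_ten_thousand()
    |> Enum.zip(@units)
    |> Enum.reject(fn {chunk, _unit} -> chunk == 0 end)
    |> Enum.reverse()
    |> Enum.map_join(fn {chunk, unit} -> "#{chunk}#{unit}" end)
  end

  # Split into 4-digit chunks, least significant first.
  defp chunks_of_ten_thousand(0), do: []
  defp chunks_of_ten_thousand(n), do: [rem(n, 10_000) | chunks_of_ten_thousand(div(n, 10_000))]
end

# NumberGroups.group(938_535_235_831)
# #=> "9385億3523万5831"
```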

Needless to say, we now have a system that is tested and works. Moving on to symbols…

Symbols and what is (I)?

In the earlier lessons on our platform, we often include things such as “(I)” or “(their)” to bring attention to the fact that these common English words are dropped in Japanese. Mecab takes this and splits it up into the following (only after adding the neologd dictionary):

  • (
  • I
  • )

It’s probably obvious why this isn’t helpful. We don’t want ( as a word in our dictionary; that’s not helping anyone. This was a relatively quick fix: although Mecab gives us back 3 words, the system knows we only asked about a single word, so it combines them back together. The only thing left to do was to structure the “reading” of the result so that the frontend could display it correctly and intelligently.
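In sketch form (token shapes hypothetical), the combining step is nothing fancy: if we asked Mecab about one bracketed word and got several tokens back, we glue them back into a single token whose surface is the original text.

```elixir
defmodule Dictionary.Recombine do
  # We asked about one word; Mecab answered with several tokens.
  def recombine(original, tokens) when length(tokens) > 1 do
    [%{surface: original, reading: original, compound: true}]
  end

  def recombine(_original, tokens), do: tokens
end
```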

The above issues weren’t too bad, but they were unforeseen things that came up during actual use.

SRS, flashcards, and new friends

It’s hard to express how complex this system is. While it’s possible that others will suggest easier ways to handle this (and there are easier ways), those ways are also generally less flexible than what we have. Timezones are hard. Allowing users to state exactly when they want their “study day” to reset, and having that respect their timezone, is harder. And having the system understand and respect all of these configurations together is harder still.

Complexities with old friends

While the SRS and Flashcard services were happy with their only friend, User Flashcards, they were already complicated. The SQL queries were pretty intense and required writing custom database functions to simplify the code in other places. These functions had to account for the database server’s timezone, the user’s timezone, the fact that we store everything in the database in UTC, converting between timezones for any arbitrary date AND time, and other things.
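As an illustration of the kind of helper involved (the real functions differ), here’s how one of these might be defined from an Ecto migration: a SQL function that maps a UTC timestamp to a user’s local “study day”, given their timezone and the hour their day resets.

```elixir
defmodule MyApp.Repo.Migrations.AddStudyDayFunction do
  use Ecto.Migration

  def up do
    execute """
    CREATE FUNCTION study_day(ts timestamptz, tz text, reset_hour int)
    RETURNS date AS $$
      -- Shift into the user's timezone, then subtract the reset hour so
      -- that e.g. 02:00 with a 04:00 reset still counts as the previous
      -- study day.
      SELECT ((ts AT TIME ZONE tz) - make_interval(hours => reset_hour))::date;
    $$ LANGUAGE SQL STABLE;
    """
  end

  def down do
    execute "DROP FUNCTION study_day(timestamptz, text, int);"
  end
end
```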

This led to a friendship like most others — a tight circle that could let other friends in, but not without conflicts within the group. It wasn’t until the Flashcard service made a new friend, Kanji Flashcards, that the group’s complexities were tested, broken, and resolved.

Kanji flashcards want to join the group

Kanji Flashcards started integrating themselves into the system easily enough. The only big change that needed to be made was updating the GraphQL API for the flashcard queries and mutations to understand that a flashcard could now be either a UserFlashcard or a KanjiFlashcard. This took roughly half a day of coding, along with updating all of the server tests and ensuring the entire test suite still passed. The good news was that this worked! The test suite passed, the server was happy, and the Flashcards service welcomed Kanji Flashcards with open arms. The problems started when the Flashcards service wanted to introduce its new friend to the SRS service.

If you’ve done much with databases, you know that it can be complex to join tables across null values, compare null values, and avoid missing rows because of null values. I’ll be the first to say that I am not a SQL expert, nor a database expert, and I fully expect to dive deep into the workings of SQL, joins, and graceful null handling as soon as time permits. With that stated, it’s probably obvious that a lot of problems were encountered.

Without diving too much into this: getting the SRS system to work with kanji flashcards took another day and exposed frustrations with our system. Ecto (the Elixir library that we use to talk to the database) was throwing random errors that experts couldn’t figure out. Postgrex (the underlying driver that Ecto uses to talk directly to the database) wasn’t happy. And our test cases and environments were complex and time-consuming to debug.

In the end, the solution was to side-step everything and use SQL directly to solve my problem. The only issue is that this touches roughly 5 separate queries required during various actions across the system (give me all of my archived flashcards, give me my unarchived flashcards, give me my next set of flashcards for a given deck, give me how many flashcards I have learned today from a given deck, among others that I can’t remember right now). While the entire test suite has passed, I am still not sure if it works 100%, but user testing will find all of the edge cases, resulting in new tests and a stronger test suite.
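To give a flavor of the null handling involved (this is illustrative, not one of our actual queries): a flashcard row now points at either a user flashcard or a kanji flashcard, so every join and comparison has to survive a NULL in one of the two columns.

```elixir
# Hypothetical table and column names; COALESCE picks whichever side exists.
sql = """
SELECT f.id,
       COALESCE(uf.due_at, kf.due_at) AS due_at
FROM flashcards f
LEFT JOIN user_flashcards  uf ON uf.id = f.user_flashcard_id
LEFT JOIN kanji_flashcards kf ON kf.id = f.kanji_flashcard_id
WHERE f.user_id = $1
  AND COALESCE(uf.archived, kf.archived, false) = false
ORDER BY due_at
"""

{:ok, result} = MyApp.Repo.query(sql, [user_id])
```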

General technology concerns and discoveries

During development to this point, as the system has grown in complexity, it has become obvious that there are some shortcomings to our selected technology stack. Ecto has some issues, Commanded had issues under load (and has since been removed from the project), various domains need to talk to other domains and are currently dependent on each other, most cloud services do not support Elixir as a first-class citizen, etc. It has also become obvious that we need to start employing design and architectural patterns that give us flexibility and adaptability moving forward.

Elixir and alternatives

One positive thing about our current architecture is that, without much more work, we can slowly split our system up to explore alternatives if we want. We do not need to rewrite the entire system at once, because we have a clean separation in Elixir that allows us to proxy a single layer out to some other service. I am exploring alternatives, discussing them with Caleb and the development team, and keeping the following things in mind:

  1. The server must remain in a functional language. Once you start developing in a functional style, it becomes hard to go back to anything else: the code is too clean, too reusable, and way easier to test.
  2. The chosen technology must be able to handle the complexity of the project. Everything looks golden on the surface and doesn’t show its ugly side until you hit a project with complexity like ours.
  3. The current server developers must be able to quickly adapt to the language, and new developers should be able to be trained. Right now the server developers are Manuel and myself, and whatever we choose, we will both need to maintain documentation that helps get new developers up to speed.
  4. IF we change, we must upgrade, not downgrade. Currently this means the new technology needs a rich ecosystem, should probably have a strong type system, and should not sacrifice performance.

With those in mind, the current front-runner is Scala.

Patterns and architecture are important

When working on smaller projects, it’s easy to overlook design patterns and general architecture guidelines; usually, things like that are overkill. But our project is now the most complex thing I have ever worked on, and it’s only growing in complexity every day. This means that it is now time to look for patterns, refactor the code to make those patterns easier to replicate, and stay informed.

When pulling Commanded out of the project, I discovered a pattern that I thought gave us good flexibility and allowed us to cleanly separate the concerns of our business logic from the restrictions of the database. This started with the Kanji domain and made that code some of the cleanest in the system. There are still some issues with it that should be refactored (the error-handling portion of every function currently looks the same, with slight differences; this can be cleaned up).
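A rough sketch of the shape of that pattern (module and function names are hypothetical): pure business logic lives in a core module with no database access, and a thin boundary module is the only place that touches the Repo, normalizing errors in one spot.

```elixir
defmodule Kanji.Core do
  # Pure function: no Repo, trivially unit-testable.
  def normalize_reading(reading) do
    reading |> String.trim() |> String.normalize(:nfc)
  end
end

defmodule Kanji.Boundary do
  alias MyApp.Repo

  def update_reading(kanji, reading) do
    kanji
    |> MyApp.Content.Kanji.changeset(%{reading: Kanji.Core.normalize_reading(reading)})
    |> Repo.update()
    |> handle_result()
  end

  # The repeated error handling mentioned above, centralized once.
  defp handle_result({:ok, record}), do: {:ok, record}
  defp handle_result({:error, changeset}), do: {:error, changeset.errors}
end
```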

I really want to sit down with Manuel and, together, find books about patterns and architectures that fit our server and our system. Ideally the frontend engineers would do the same on their half of the system, so that all of our code is not only easy to understand, but also easy to contribute to and flexible enough to pivot when necessary to solve any problem that comes up.

One last reflection, personally

It’s crazy to go from where I was to being the CTO of this company, with 4 other developers working alongside me to try and change the world of language education. Our project is insane, our goal is insane, and our potential is limitless. I have no doubt that we will create what we set out to do, and I can’t see a future where this doesn’t succeed. Our content is too good, our system is too good, and our passion is far more than what I’ve seen from any other team I’ve been a part of. It’s on us to make sure that our execution matches everything else. Everything we make must be built with the best-possible user experience in mind. Everything must feel good, every interaction must be flawless, every reasonable device must work. That is our only job.

With this dump of thoughts and analysis, I am signing off for the day, eager to jump back into the project and keep pushing forward towards our bright future.

In short…

…we’re hard at work.

And every decision we make is with you, our user and student, at the front of our minds.

Thank you for joining us on this journey and thanks for reading.

Talk to you again soon.

~ Caleb
