Step 1: scratch my own itch

Julian Harris
Published in Knowcast
7 min read · Jun 20, 2023

The best kind of problem to solve is one you personally have.

How do you design a great product? I forget who said this but I took it to heart:

“Know your customers better than anyone else and go with your gut”

This is why being the customer yourself is an ideal starting point: you have some direct working knowledge of the domain you’re trying to design a solution for. It’s not 100% risk free but generally considered a pretty good way to reduce “requirements risks”.

So this is me: I routinely listen to audio when I’m walking or cycling, preferably something I can learn from.

I thought: here are a few things I’d like to listen to and take notes from when I’m out and about (the phrase “on the go” is what I settled on later).

  • Listening to podcasts. Can’t take notes though. Hopeless.
  • Listening to web page content and taking notes. I’d start reading an article, maybe over breakfast, and want to continue during all that dead time when I was travelling, exercising or doing chores (these three core activities were cemented by field research later on). There was no easy way to do this. There was one app, Voicedream, with beautiful speech, but the flow of “choose an article from a web page on mobile, listen to it and take notes” was unnecessarily complex and frustrating to use. (Later on, Instapaper and Pocket both launched audio players. More about that later, I promise.)
  • Listening to PDFs and taking notes. When the latest IPCC report on the climate crisis came out, it was 212 pages. I could probably listen to that over a series of long walks, but I’m not going to sit down and read it.

What about YouTube and Audiobooks?

I deliberately restricted myself to content that was:

  • Widely used
  • Free to access

This ruled out:

  • Audiobooks and ebooks as Amazon has that pretty much wrapped up.
  • YouTube content, as this is proprietary with no sanctioned method for accessing the material.

I remained confident though that the free sources would still be a viable play. And there are always API partnerships in future.

Finding the jobs to be done

In the product design world, this is the early stage of what is called “finding the jobs to be done”.

Later on in the journey, with field research from over a dozen people across the world, I settled on this:

  • “I’m an affluent, well-educated knowledge professional”
  • “My job to be done is to better use my time while I’m on the go”

Looking at the three scenarios I mentioned (web pages, PDFs, podcasts), I chose web page content because technically it was by far the easiest, and I had an idea of how to build this.

Prototype 1: web page note taking

Lisn was my first product name idea. It was just me, untested, and it quickly fell by the wayside as I wanted the emphasis to be on learning, not listening.

Product design ideal: you run a series of quick experiments to validate the idea and you optimise for “time to insight”.

The product design I did: I taught myself just enough tech to get something going, and personally tried it in the field.

What did it look like?

This is a demo of the first prototype of Knowcast I built in March 2021. It still has features that are missing from apps today (June 2023) such as effortless sentence-level highlighting.

It was an Android app that allowed you to do this:

  • Browse the web normally using whatever browser or app you wanted (e.g. Flipboard etc)
  • Share article to my app “lisn”
  • Listen to the web article read in a highly engaging voice, tapping sentence by sentence on the pieces you liked
  • Later you could revisit your favourite parts by filtering just the highlights.
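The tap-to-highlight flow above can be modelled very simply. What follows is a minimal sketch in Python rather than the app's actual code (the prototype was a mobile app, and the `Article` class and its method names are made up for illustration). It assumes an article is stored as a list of sentences plus the set of positions the listener has tapped.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """An article held as sentences plus the positions of highlighted ones."""
    title: str
    sentences: list[str]
    highlighted: set[int] = field(default_factory=set)

    def tap(self, index: int) -> None:
        """Toggle the highlight on one sentence, mirroring a single tap."""
        if not 0 <= index < len(self.sentences):
            raise IndexError(f"no sentence at position {index}")
        # Symmetric difference with a one-element set toggles membership.
        self.highlighted ^= {index}

    def highlights(self) -> list[str]:
        """Return only the highlighted sentences, in reading order."""
        return [self.sentences[i] for i in sorted(self.highlighted)]


article = Article(
    title="Example article",
    sentences=["First point.", "Some filler.", "Second point."],
)
article.tap(0)
article.tap(2)
article.tap(1)
article.tap(1)  # a second tap on the same sentence removes the highlight
print(article.highlights())
```

Because the unit of selection is a whole sentence, a highlight is just an integer, which is part of why a tap can replace a tap-and-drag.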

What insights did prototype v1 bring?

  • An ultra-low-noise consumption experience helps focus, cognitive load and consumption quality: having just the text from the web page, broken into sentences, was a beautiful thing. I found I actually started preferring to read an article using lisn because it was so much easier to read than a cluttered web page.
  • Sentence-level highlighting and chunking: 500% more efficient. Having the sentence as the basic unit of value really worked. I calculated that highlighting two interesting sentences takes under a second with two taps, but several seconds if you tap and drag across those same sentences. This is an “inner loop” activity: my usability background underscored how massive productivity gains can come from optimising frequent inner loops, and this is definitely one of them.
  • Audio playback of web content works well. In 2020, most text to speech systems were unlistenable for longer passages. Amazon Polly Newscaster was really the first generally-available service that worked. I alternated voices too — I personally found an occasional change in the presenter voice helped engagement and think there’s real merit in a sort of “tag team” approach to voice presenter variety, a bit like dual news anchors on TV.

I was inspired by the insights I captured. It took several months, but most of that was tech ramp-up: learning a stack of new stuff, finding some freelance help and experimenting with various ways to reliably get good quality web article content to the app.

Lesson learned: be very clear about outcomes when building prototypes

I learned a stack building and testing the web page audio POC. It created a number of inventions and gave us a lot of groundwork for future web page building that likely wouldn’t need to change much. It created new insights into what would be possible that just weren’t available before.

But did I build it in the most effective way possible? I was pretty focused on building something that would do what I wanted, but honestly, the first build took longer because I had the “quality” and “speed” parameters set a little closer to “product engineering” than “prototype for rapid insight”. I wanted a beautiful product. It looked great — but probably I could’ve built something that produced the same insights in 1/3rd the time. That would’ve translated to as much as 4–6 weeks of calendar time saved, potentially.

It’s hard when working on truly new ideas

When working with novel ideas, truly new inventions with new behaviours, it’s incredibly hard to strike the right balance between presenting a new solution that inspires an “aha!” moment and designing incremental solutions to well-articulated problems, while avoiding the “faster horses” problem. Steve Jobs said:

“It’s really hard to design products by focus groups. A lot of times, people don’t know what they want until you show it to them”

What next?

The insights drove confidence but at the same time I knew if I really wanted this solution to work, I needed to test it with others. And that opened a whole can of worms.

Read next:

Knowcast technology: under the hood

Tech moves fast, so it’s important to timestamp this: how I would approach it technically today is very different from how I approached it only 2.5 years ago.

Insights: AWS Lambda

“Serverless” was a Big Thing around 2020/2021 and I drank some of that Kool-Aid. One headache was the limitations of Lambda back then. Some have gone away but some are still a problem:

  • Limited memory (this has since gone up)
  • Limited access to disk space (this has gone up)
  • Limited control over latency (cold-start times could be slow; provisioned concurrency has helped)
  • IO: by having little self-contained pieces they’re attractive but you do end up with a lot of bytes going in and out if you’re not careful and AWS is legendary for its eye-watering “dark costs” like data transfer fees.
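One common way to soften the startup-latency point above is to do expensive setup once per container rather than once per request. The sketch below shows that pattern in the `(event, context)` handler shape used by Python Lambda functions. `_load_resources` and the event's `"text"` key are hypothetical stand-ins, not anything from the Knowcast back end.

```python
import time

# Module-level state survives between invocations of a warm Lambda container,
# so setup cached here is paid for only on a cold start.
_RESOURCES = None
_INIT_COUNT = 0


def _load_resources() -> dict:
    """Hypothetical stand-in for slow setup (loading a model, opening clients)."""
    global _INIT_COUNT
    _INIT_COUNT += 1
    return {"loaded_at": time.time()}


def handler(event, context):
    """Entry point; reuses cached resources when the container is warm."""
    global _RESOURCES
    if _RESOURCES is None:  # only true on the first call in this container
        _RESOURCES = _load_resources()
    text = event.get("text", "")
    # Return something small: bytes in and out of a function are not free.
    return {"characters": len(text), "init_count": _INIT_COUNT}
```

This does nothing for the very first request a container serves, which is the gap provisioned concurrency is meant to cover.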

Insights: Python is where it’s at

Back-end text processing / machine learning / NLP is all Python. There’s no contest, Python is the language to use. It’s where the libraries are at.

Insights: Google Flutter

  • Dart. Flutter uses its own language, Dart. Dart is nice but it’s not used anywhere else so the skillset marketplace is intrinsically smaller. And Flutter back then (2020) wasn’t nearly as mature as the primary competing cross-platform mobile framework, React Native. This introduces friction that’s hard to anticipate. E.g. Receive_Sharing_Intent at the time had a few bugs in it that were getting in the way of what I was doing.
  • State management is a PITA in Flutter. Flutter in its core has no state. It’s left as an exercise for the developer. As a result there are half a dozen ways to manage state in Flutter (Bloc, Provider, Riverpod just to name a few). To be honest, state management isn’t any better in React, but at least React ostensibly has a “built-in” approach that can be overridden.

Libraries / Services I used

Sentence-splitter: intelligent, fast, and low-resource splitting of sentences.
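As a rough illustration of what this step looks like, the sketch below uses the `sentence-splitter` package when it is installed and otherwise falls back to a deliberately naive regular expression. The fallback is my own stand-in for the example, not part of the library, and the wrapper name `split_sentences` is made up.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, preferring sentence-splitter when available."""
    try:
        from sentence_splitter import SentenceSplitter  # pip install sentence-splitter
    except ImportError:
        # Naive fallback: break after ., ! or ? when followed by whitespace and
        # a capital letter. It mishandles abbreviations such as "Dr. Smith",
        # which is exactly the case a dedicated splitter exists to handle.
        parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)
        return [part.strip() for part in parts if part.strip()]
    return SentenceSplitter(language="en").split(text=text)


print(split_sentences("Audio works well on a walk. Notes are the hard part."))
```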

Trafilatura: amazing url-to-text content. This is still the fastest and best way to extract the core text from a web page.

Extruct: Trafilatura was great but it was not very good at extracting metadata such as publishers, authors, categories, publication dates, etc. While I didn’t use ScrapingHub I did happily use their free open source Extruct library in combination with Trafilatura.
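A hedged sketch of how the two libraries can be combined: Trafilatura supplies the body text and Extruct supplies structured metadata. The function names, the choice to read only JSON-LD and the set of schema.org types checked are all assumptions made for the example. The original pipeline may well have merged the data differently.

```python
def pick_article_metadata(jsonld_items: list[dict]) -> dict:
    """Pull author, publisher and date out of schema.org JSON-LD items."""

    def name_of(value):
        # JSON-LD allows a plain string, an object with "name", or a list of either.
        if isinstance(value, list):
            value = value[0] if value else None
        if isinstance(value, dict):
            return value.get("name")
        return value

    for item in jsonld_items:
        # Assumption: "@type" is a single string; some sites publish a list.
        if item.get("@type") in ("Article", "NewsArticle", "BlogPosting"):
            return {
                "author": name_of(item.get("author")),
                "publisher": name_of(item.get("publisher")),
                "published": item.get("datePublished"),
            }
    return {}


def fetch_article(url: str) -> dict:
    """Download a page, then combine Trafilatura text with Extruct metadata."""
    import extruct      # third-party: pip install extruct
    import trafilatura  # third-party: pip install trafilatura

    html = trafilatura.fetch_url(url)
    if html is None:
        raise RuntimeError(f"could not download {url}")
    text = trafilatura.extract(html) or ""
    structured = extruct.extract(html, base_url=url, syntaxes=["json-ld"])
    return {"text": text, **pick_article_metadata(structured.get("json-ld", []))}
```

Keeping the metadata picker separate from the network call makes it easy to try against saved pages, since publishers vary a lot in how they fill in these fields.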

Amazon Polly: partly because for the first 12 months you could use it for free, but also because it really was the first text to speech service that provided decent prosody over longer sentence tracts. I used the “newscaster” speaking style. In layman’s terms it means that the way the voice goes up and down would sound more natural over several sentences rather than a robotic, unintelligent sentence by sentence fall-off.
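For the curious, here is a minimal sketch of requesting the newscaster style through `boto3`, with voices alternating in the “tag team” spirit described earlier. Alternating per paragraph, the Matthew/Joanna pairing and the function names are assumptions for illustration. Only the final function talks to AWS, and it needs `boto3` and credentials.

```python
from itertools import cycle
from xml.sax.saxutils import escape

# Assumed voice pair; both are US English voices that offer the newscaster style.
VOICES = ("Matthew", "Joanna")


def newscaster_ssml(text: str) -> str:
    """Wrap plain text in the SSML tag that selects Polly's newscaster style."""
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'


def plan_requests(paragraphs: list[str]) -> list[dict]:
    """Build one synthesize_speech request per paragraph, alternating voices."""
    return [
        {
            "Engine": "neural",  # the newscaster style needs the neural engine
            "OutputFormat": "mp3",
            "TextType": "ssml",
            "Text": newscaster_ssml(paragraph),
            "VoiceId": voice,
        }
        for paragraph, voice in zip(paragraphs, cycle(VOICES))
    ]


def synthesize(paragraphs: list[str]) -> list[bytes]:
    """Send each planned request to Polly and collect the MP3 bytes."""
    import boto3  # third-party: pip install boto3

    polly = boto3.client("polly")
    return [
        polly.synthesize_speech(**request)["AudioStream"].read()
        for request in plan_requests(paragraphs)
    ]
```

Sending a whole paragraph per request, rather than a sentence at a time, is what gives the service a chance to shape intonation across sentences.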

Libraries / Services I discarded

Newspaper: it was great for the first prototypes but was too limited in the end.

spaCy: it’s a super amazing NLP text library, but it requires hundreds of MB of disk space, which at the time was more than AWS Lambda could hold. So basically it consumed more resources than I had available.
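A quick way to catch this kind of problem before deploying is to measure the unpacked dependency folder. The sketch below assumes the 250 MB unzipped limit that applies to zip-based Lambda deployments (worth checking against current AWS documentation). The helper names and the idea of pointing it at a local build directory are hypothetical.

```python
import os

# Assumed limit: unzipped size cap for a zip-based Lambda deployment, layers included.
LAMBDA_UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):  # skip symlinks to avoid double counting
                total += os.path.getsize(full_path)
    return total


def fits_in_lambda(path: str, limit: int = LAMBDA_UNZIPPED_LIMIT_BYTES) -> bool:
    """True when the unpacked dependencies under path stay within the limit."""
    return directory_size(path) <= limit
```

Running `fits_in_lambda` on the folder produced by `pip install --target` gives an early warning when a library and its models are too heavy for this style of deployment.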

Julian Harris
Knowcast

Ex-Google Technical Product guy specialising in generative AI (NLP, chatbots, audio, etc). Passionate about the climate crisis.