Jeff Kofman on how AI can empower newsrooms
The voice economy has found its way into newsrooms, changing journalists' workflows and freeing up time for reporting. GEN talked to Jeff Kofman, Trint CEO and an Emmy award-winning correspondent, about the challenges he faced when launching Trint, how startups can compete in a market dominated by tech giants, and how A.I. in general can empower newsrooms.
GEN: Could you tell us a little bit about the early days? What were the challenges you faced when launching Trint? And what are the challenges today?
Jeff Kofman: For me the biggest challenge was simply learning how to start a business and build it. As a reporter with 30 years in the field as a foreign correspondent, as a war correspondent, I just had no experience building a team, raising money, managing a company. It was an incredibly steep learning curve. What motivated me was the vision our team had of the solution Trint offers. It was on a track that no one else had travelled. And so I was just determined to see it through. My gut told me there was an opportunity here to build something unlike anything that had come before: a tool that enabled us to make the spoken word — whether audio or video — searchable, discoverable, in a way that would liberate us from the tedious task of transcription and unlock the value of recordings that already exist.
Today the challenge is simply scaling the company. When we started in December 2014, we were a team of four. Today we're almost 50 people headquartered in London, England, with a growing team in Toronto, Canada, which is our North American sales office. It's really complex at this level, and that's required hiring an experienced management team, a chief operating officer who sits beside me and really helps us scale the company. The other challenge when you grow is just deciding what you're actually going to build and what you're not going to build right now. There are so many exciting opportunities for a company like ours, and on the innovation front you can't do them all, so you have to decide: this is the thing that matters most right now.
Transcription software allows journalists to save time on what used to be a time-consuming task. What other aspects of journalism could potentially benefit from automated processes, and why?
You know, automating transcription is the first thing that we did. We were simply a transcription tool, but now we’re much more than that. The whole process of automation but also workflow is really critical to unlocking value and liberating journalists so that they can apply their time to content production and not tedious tasks. And so even here at Trint we don’t want to be seen simply as a transcription tool. That’s why our tagline is Beyond Transcription.
The introduction of our collaborative tool means that a whole team of people, whether in print or audio or video, can look at the same transcript, make comments, and extract what they need without having to bother each other. That means a social media person can find a quote for Twitter, a radio person can pull some audio, and the digital-first person can write a new top for the piece online, all from the same single source of truth, with everybody sending notes back and forth. So A.I. is the starting point; it's the software, the workflow that you put on top, that really transforms people's lives at work.
Beyond facilitating repetitive tasks, how could automation contribute to the quality of a journalistic piece?
I think the answer is that artificial intelligence is not the end, it's a means to the end, and in the case of Trint and a lot of other technologies it's not that AI is going to do all the work. It can do a lot of the heavy lifting. It can do a lot of the tedious work. And what that means is that reporters under increasing pressure to get stuff out fast to multiple platforms can focus on actual content production, the really interesting, challenging and fun part of our jobs, rather than doing tedious stenography. And so you have to look at A.I. as an opportunity to liberate us to do our jobs, and to allow us to focus on where our real skills are so that we don't have to do the tedious stuff.
As large tech companies like Google and Microsoft are heavily investing in the development of speech recognition and its transcription capabilities, how does Trint plan to stay competitive given its relatively small size?
The reality is that every startup or scale-up, whether it's Trint or another one, faces the same question about the 800-pound gorillas of Google, Microsoft, Apple, Amazon, etc. The thing that startups and scale-ups can do that those big companies can't do is focus on solving a single problem really well, and they can do it in depth and they can do it quickly. You know, as much as those companies talk about agility, and as much as they can throw bodies and money at things at a scale we can't compete with, what we can do is focus on a specific challenge.
In this case that's the workflow of recorded and live audio and video: allowing people to get at the value, to find the moments that matter, so that they can actually produce content and not get bogged down in searches and transcriptions. Yes, Google and Microsoft can do that, but are they spending as much time, do they have the dedication, the devotion, have they got a team that is single-mindedly focused on tackling that issue? And the answer is not really. Yes, they're in these areas, but don't expect magical solutions from them anytime soon. If that were the case there would be no startups. The reason startups exist is that they can solve problems in a very focused way that those big companies can't.
Where do you see Trint in a sector very much dominated by a small number of large tech companies?
There are a lot of small companies in this space, which I call the voice economy. That's because there's a huge amount of very focused innovation, and a huge amount of opportunity to solve problems that people don't even know exist, and Trint is one of them. You know, people just accepted that transcription was part of our daily lives. It was a bit like our mothers washing lettuce.
You had to wash lettuce because that’s the only way you could make sure that it was clean and you wouldn’t get sick. Then somebody in California came up with the idea of pre-washing lettuce and injecting some inert gas into the bag so that it would last on the shelf and you could just open it and eat it safely. Well, you know, Trint washes the lettuce, and that’s what really smart startups and scale-ups can do in a way that the big companies just don’t have the agility, the focus, the determination to do, and that’s frankly why they buy some of these small companies.
Where is Trint headed in the future and what are some of the features you plan to enhance? Would you like to go beyond automated transcription?
No. We began when we came to market in September 2016 as an automated transcription tool. Two and a half years later we are already much more than that. We are a workflow tool, a productivity tool that takes raw recorded content and makes it instantly searchable, verifiable and exportable, whether it's for writing a text piece, for captioning, or for cutting an audio or video piece, a podcast or a video story. We see the opportunities for innovation here as limitless.
Right now our focus is on live transcription in a collaborative workflow, and that's really transformative. What that means is that a group of people can be working on a single story with a single reporter in the field. Say you're covering the primaries for the Democratic nomination for 2020 and you're out covering Elizabeth Warren. She's giving five speeches, you're on your own, and you can record on the Trint app. That goes straight to the newsroom in New York, and in Washington, and elsewhere, and they can watch that transcript live; 6, 10, 15 people can be looking at it, making notes to each other and extracting the content as it happens. That completely transforms the way content is accessed from the field, and it completely transforms our ability to collaborate, and to satisfy and feed that so-called beast that is always chasing us as reporters in the field, and as news organisations having to juggle so many platforms and so many demands on a single piece of content.
After that our roadmap goes into live multilingual translation powered by A.I. What that means is that I could be covering something in a language I don't speak, like Portuguese. Normally I would have to have an interpreter whispering in my ear; instead I could be watching it get Trinted, or transcribed, in Portuguese and instantly translated into English. It's not a fully reliable translation, but it's a first draft, and it would allow me to say, "Ah, in this hourlong news conference here are the three moments that are most interesting. Can you translate these precisely?" So, again, it's about enabling people to get at their content fast. It's about liberating that recorded content so that, like text, and with the text linked to the audio, it's instantly searchable. It can quickly be verified and we can disseminate it without delay.
Where do you draw the line when it comes to automation in the newsroom? How important will a human element remain in the years to come? And will the rise of automated content production actually eliminate a large part of newsroom jobs?
I think it's a mistake to see A.I. as replacing reporters. I think it allows newsrooms to do things that they couldn't otherwise do, or that they weren't interested in doing. You know, one of the first uses of A.I. was to produce earnings stories from companies' quarterly reports. The Associated Press does this, Bloomberg does this. These are really tedious, mechanical jobs, and it was only done for large companies. Now it can be done automatically for every company. The same with high school sports scores, things like that. People can input a small amount of data and suddenly you can get very regionalised news from school sports, regional sports, that wasn't being covered. So A.I. is really opening up opportunities for coverage.
It’s not displacing, it’s enhancing. Does it mean that people will have to be involved in A.I. in the future? Absolutely. You know, in our area, speech-to-text, I don’t believe there will be a time in the near or intermediate future where you will be able to take a conversation, an interview, a news conference, a panel discussion, simply transcribe it automatically, and publish it with impunity.
There will always be mistakes, whether it's overtalk, applause, heavy accents, or an airplane going overhead, and the kinds of mistakes that natural language processing makes could be fatal: legal errors or embarrassing transcriptions that news organisations simply cannot afford to stand behind. And so there will always be the need for a final level of verification for something like that. Will it eliminate jobs? I don't think so. I think it's going to liberate people to do their jobs and to do more interesting jobs. I think you'll actually get more content with fewer people. Let's face it, jobs are already being eliminated; the challenge is to maximise the value of the journalists who still have their jobs.
How do you see the skillsets of journalists changing with the increased use of AI and machine learning in newsrooms?
There's no question that journalism has changed massively since I entered the craft. You now need to be, in some form, a technician as well as a craftsman, an interviewer, a writer, a researcher. It's simply not good enough, if you want to be successful as a young journalist, to be a great writer or a great TV presenter. You need to have some technical mastery, and it's not just AI: you need to understand the fundamentals of audio and video production, how to post blogs, how to get material onto a website.
You may not need to be a master of all those things, but if the person next to you has your skill set plus all of those things, she or he is going to get that job and you're not, so it is a much more technical craft. I think in terms of A.I., the whole question of data science now enters the newsroom. Even small newsrooms are going to have to have a couple of developers, software engineers with some data science experience, to unlock the opportunities that could help them make their content more attractive, and stickier, in a very competitive news landscape.
What are some of the projects you are following in the field of AI/ machine learning right now?
Well, obviously my focus right now is on voice technology and natural language processing, and what we're focused on is getting more accurate content, solving the problem even better than we do. That means better speaker recognition, so that when two people are speaking the A.I. can understand that one is the interviewer and one is the subject, and get what's called diarization correct. Another is punctuation.
It's very hard to punctuate the spoken word in a way that you can stand behind. Right now our punctuation is only periods; I think in time we can get question marks and commas in, and that's what people want. But ultimately the most important thing in speech-to-text is getting the words right, and with good audio and clear speech we are 99% accurate or better. The challenge comes with multiple speakers, bad audio and heavy foreign accents. I think we'll get better at all of those things, but it's going to be a gradual process.
Jeff Kofman is the CEO and founder of automated transcription platform Trint. Before becoming a tech entrepreneur, he spent more than three decades as an Emmy award-winning network television news foreign correspondent and war correspondent with ABC, CBS and CBC News. Jeff has covered many of the biggest stories of our time including the Iraq War, the Arab Spring, Hurricane Katrina, the Gulf Oil Spill and the Chile Mine Rescue.
He has won an Edward R. Murrow Award, a duPont Award and two Emmys, including one for his coverage of the fall of Muammar Gadhafi in Libya in 2011. In 2016, Trint was winner of GEN’s Startup for News competition.