In 2020, Hariti Patel tested several transcription services to determine which was the most accurate and worth the money [7]. She concluded that GoTranscript was the most accurate transcription service at the time [7]. Since then, automatic transcription services have improved significantly, particularly through the integration of more advanced AI technology [10], and many of the services tested in that earlier blog post have evolved or changed completely. For example, services like Otter AI and TranscribeMe have enhanced their accuracy and speed, making them more competitive in the market. Four years later, I repeated Patel’s experiment to determine the current best transcription service on the market.
What is transcription?
Transcription is simply turning audio or video into text [10]. While it can be done manually, advancements in technology and AI have led to the creation of automatic transcription services [10]. You’ve probably come across transcription in your everyday life, whether it’s through closed captions on Netflix or using voice control features like Siri. Transcription is now commonly included as a default feature in apps like Zoom, Google Meet, Microsoft Teams, and YouTube [3].
Uses and Benefits
Transcription is instrumental in many fields of work, such as the legal, healthcare, research, and business industries [10]. For example, in the courtroom, transcription can be used to document entire court hearings, so lawyers can easily review and refer back to them with full accuracy. Various other industries can benefit from using transcription to convert recorded meetings and interviews into written form for easy reference and future fact checking. With all the advanced transcription services available today, it is much easier to work with a text file accompanied by an audio/video file than it is to work with the audio/video file alone. Having the written document allows for things like searching for words or phrases using Control+F, copying and pasting, and making comments on and editing the document [10].
A major benefit of transcription is improved accessibility for people who are deaf or hard of hearing [4]. Providing a transcript or captions alongside audio and video content allows all audiences to consume it. Additionally, the value of an audio or video file increases drastically when it is accompanied by its transcription [4]. Transcription boosts search engine optimization by making audio/video content more searchable, as titles and descriptions alone cannot cover all topics discussed in the content [10]. It also facilitates translating content into different languages, helping to reach wider audiences [4]. Lastly, many people simply prefer to use closed captions with their videos, which is made possible with transcription [4].
Types of Transcription Services
There are two types of transcription services: manual and automatic [10]. Manual transcription relies on human transcribers who listen to the audio and write down the words they hear. Most human transcription services have a detailed, multi-step process to ensure maximum accuracy. This process often includes an initial transcription step, multiple proofreading steps, and the addition of timestamps and speaker tracking. For this reason, manual transcription services usually have slower turnaround times of about a day on average. Depending on the specific service and level of accuracy used, delivery could take up to a couple of days. While manual transcription used to be far more accurate than automatic transcription, technological advances in recent years mean this is not always the case [10].
Automatic transcription offers a fast and cheap solution, but it still faces challenges in accuracy and speaker differentiation [10]. It uses AI to detect words and punctuation cues, and records these findings in a text file. Most automated transcription services claim to return the transcript within minutes, depending on the length of the file. Since automatic transcription uses software instead of human labor, it is very cheap compared to manual transcription. Even with such advanced technology, AI often struggles to accurately differentiate between speakers or understand speakers with heavy accents [10].
As technology continues to advance, automatic transcription is becoming much more common, convenient, and relevant than manual transcription [10]. This is due to its increasingly high accuracy and fast delivery time. Automatic transcription is already a large part of our daily lives, from speech-to-text gadgets like Amazon Alexa to real-time captions in Zoom meetings [10]. For this reason, and because manual transcription services are hard to come by nowadays, all but one of the transcription services I tested in my experiment are automatic.
Method and Results
To repeat the experiment in Patel’s blog post as faithfully as possible, I recorded a 5-minute audio clip of myself reading “A Little Cloud” by James Joyce. Based on recent articles about transcription services, Patel’s experiment, and word of mouth, I chose to test six services: GoTranscript, Vook.AI, TranscribeMe (both automatic and manual), MacWhisper, Rev, and Otter AI [3, 5]. I uploaded the clip to each service, which produced seven transcripts in total because TranscribeMe was tested twice. With the help of a difference checker, I then counted the number of punctuation errors, capitalization errors, and incorrect words in each transcript, as shown below in Figure 1.
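The counting step described above was done by hand with a difference checker. As a rough illustration only, the same kind of tally could be approximated in Python with the standard library's `difflib`. This is a minimal sketch, not the method used in the experiment: the function names, the tokenizing rule, and the way a mismatch is assigned to one of the three categories are all assumptions made for the example.

```python
import difflib
import re


def tokenize(text):
    """Split text into word tokens and individual punctuation marks."""
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def is_word(token):
    """Treat any token containing a letter or digit as a word."""
    return any(ch.isalnum() for ch in token)


def count_errors(reference, transcript):
    """Tally differences between a reference text and a transcript.

    Assumed rules (hypothetical, for illustration):
    - tokens are aligned case-insensitively;
    - aligned tokens that differ only in case are capitalization errors;
    - unaligned word tokens are incorrect words;
    - unaligned punctuation tokens are punctuation errors.
    """
    ref, hyp = tokenize(reference), tokenize(transcript)
    matcher = difflib.SequenceMatcher(
        a=[t.lower() for t in ref],
        b=[t.lower() for t in hyp],
        autojunk=False,
    )
    counts = {"punctuation": 0, "capitalization": 0, "incorrect_words": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        ref_part, hyp_part = ref[i1:i2], hyp[j1:j2]
        if tag == "equal":
            # Same token ignoring case; count the ones whose casing differs.
            counts["capitalization"] += sum(
                r != h for r, h in zip(ref_part, hyp_part)
            )
            continue
        ref_words = sum(is_word(t) for t in ref_part)
        hyp_words = sum(is_word(t) for t in hyp_part)
        counts["incorrect_words"] += max(ref_words, hyp_words)
        counts["punctuation"] += max(
            len(ref_part) - ref_words, len(hyp_part) - hyp_words
        )
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    # Made-up sentences, not taken from the recording used in the experiment.
    reference = "The meeting starts at noon, so please arrive early."
    transcript = "the meeting starts at noon so please arrive late."
    print(count_errors(reference, transcript))
```

An automated tally like this will not necessarily agree with a manual count, since a human reviewer can judge cases (such as numbers written as digits versus words) that a simple token comparison cannot.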
GoTranscript uses automatic transcription and charges $0.20 per minute. It allows users to create workspaces and teams to easily share audio and transcription files. Unlike many of the other services, it offers no way to jump between a point in the audio and the corresponding word in the transcript, which can make it difficult to pinpoint specific parts of the audio/video file.
Vook.AI is another AI-based automatic transcription service at a cost of $0.05 per minute. It can transcribe audio in six languages, with an average claimed accuracy of 95%. The website is very simple and offers minimal functions. While users can easily edit the transcript, it does not allow users to highlight or comment on the text or find the corresponding location in the audio.
TranscribeMe offers both manual and automatic transcription services. Its manual transcriptions come in varying levels of accuracy, all of which include speaker IDs and guarantee delivery within 1-5 business days. For my experiment, I selected the “Standard Transcripts” level, which costs $1.50 per minute and falls between the highest and lowest accuracy options. TranscribeMe’s automatic transcription, available for $0.07 per minute, provides a faster, lower-cost alternative. Both the manual and automated transcripts let users jump between corresponding parts of the audio and text with just a click. The editor also allows for easy editing, highlighting, striking through, and commenting directly on the text.
MacWhisper is a free, automatic transcription service with limited transcript editing and audio traversing functions. Unlike the other transcription services I explored, MacWhisper is an application downloaded onto your computer, which alleviates the concern some may have about uploading their audio files onto the internet [5]. It builds on a powerful, accurate, and free transcription service that OpenAI created in 2022. Unfortunately, that service was fairly inaccessible on its own because it required familiarity with the Terminal app. MacWhisper is a graphical user interface built on top of OpenAI’s transcription service, which hides all unnecessary commands and logistics from users [5].
Rev also uses AI to power its automatic transcripts, at a rate of $0.25 per minute following a 30-minute free trial. Rev’s text editor is easy to use, as it allows users to highlight, comment, and share clips. It also highlights words as they are spoken in the corresponding audio file. Additional features, such as summaries and increased accuracy, come with the purchase of upgraded plans that range from $9.99 to $34.99 per month.
Otter AI uses AI to automate transcriptions and has a turnaround time of just a few minutes, depending on the length of the file. The free plan includes 300 transcription minutes per month. Users can purchase additional minutes by upgrading to paid plans, ranging from $8.33 to $20 per month. Otter AI was designed for business and office use. It can sync with users’ calendars and create teams and workspaces to easily share audio files and transcripts. The AI meeting assistant automatically transcribes the audio and provides an accurate summary of it in real time. Users can easily edit the transcript, highlight, add comments, and jump between a point in the audio and the corresponding word in the transcript. One of the best features of this service is the Otter AI chat, which provides a live chat where users can ask questions like “How should I prepare for my meeting tomorrow?” and “According to my last meeting, what’s the status of…?” Otter AI can be used with Google Meet, Zoom, and MS Teams.
Transcription Tool | Transcription Type | Cost per Minute | Punctuation Errors | Capitalization Errors | Incorrect Words | Total Errors |
---|---|---|---|---|---|---|
GoTranscript | Automatic | $0.20 | 30 | 4 | 5 | 39 |
Vook.AI | Automatic | $0.05 | 32 | 4 | 4 | 40 |
TranscribeMe | Automatic | $0.07 | 27 | 4 | 10 | 41 |
TranscribeMe | Human/Manual | $1.50 | 30 | 4 | 9 | 43 |
MacWhisper | Automatic | $0.00 | 37 | 4 | 8 | 49 |
Rev | Automatic | $0.25 | 44 | 3 | 16 | 63 |
Otter AI | Automatic | $0.00 | 49 | 10 | 8 | 67 |
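The arithmetic behind the table can be double-checked with a few lines of Python. This is a minimal sketch whose data is copied from the rows above; the names (`Row`, `mismatched_totals`, `clip_cost_dollars`, `mean_total`) are hypothetical helpers introduced for the example, and prices are stored in cents purely to avoid floating-point rounding. The clip cost assumes the 5-minute recording is billed at the listed per-minute rate with no free trial or minimum charge.

```python
from typing import NamedTuple


class Row(NamedTuple):
    tool: str
    kind: str
    cents_per_minute: int  # listed price, in cents
    punctuation: int
    capitalization: int
    incorrect_words: int
    total: int


# Values copied from the results table above.
ROWS = [
    Row("GoTranscript", "Automatic", 20, 30, 4, 5, 39),
    Row("Vook.AI", "Automatic", 5, 32, 4, 4, 40),
    Row("TranscribeMe", "Automatic", 7, 27, 4, 10, 41),
    Row("TranscribeMe", "Human/Manual", 150, 30, 4, 9, 43),
    Row("MacWhisper", "Automatic", 0, 37, 4, 8, 49),
    Row("Rev", "Automatic", 25, 44, 3, 16, 63),
    Row("Otter AI", "Automatic", 0, 49, 10, 8, 67),
]

CLIP_MINUTES = 5  # length of the test recording


def mismatched_totals(rows):
    """Return the tools whose three error counts do not add up to the total."""
    return [
        r.tool
        for r in rows
        if r.punctuation + r.capitalization + r.incorrect_words != r.total
    ]


def clip_cost_dollars(row, minutes=CLIP_MINUTES):
    """Cost of transcribing the clip at the listed per-minute rate."""
    return row.cents_per_minute * minutes / 100


def mean_total(rows, kind):
    """Average total errors across all rows of one transcription type."""
    totals = [r.total for r in rows if r.kind == kind]
    return sum(totals) / len(totals)


if __name__ == "__main__":
    print("Rows with inconsistent totals:", mismatched_totals(ROWS))
    for row in sorted(ROWS, key=lambda r: r.total):
        print(
            f"{row.tool} ({row.kind}): {row.total} errors, "
            f"{row.incorrect_words} incorrect words, "
            f"${clip_cost_dollars(row):.2f} for the clip"
        )
    print(f"Mean total, automatic: {mean_total(ROWS, 'Automatic'):.1f}")
```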
Takeaways
GoTranscript, Vook.AI, TranscribeMe’s automatic transcription, and TranscribeMe’s manual transcription had low total errors of 39, 40, 41, and 43, respectively. Because of its high cost of $1.50 per minute and its slow delivery time, I concluded that TranscribeMe’s manual transcription is not comparable to the other low-error services. TranscribeMe’s automatic transcription had only 41 total errors, but 10 of them were incorrect words, compared to the 5 and 4 incorrect words produced by GoTranscript and Vook.AI, respectively. For most industries and transcription uses, transcribing the correct words is more important than correct punctuation, since an incorrect word can alter the meaning of sentences and entire audio files. Therefore, GoTranscript and Vook.AI are the two most accurate transcription services that I tested in this experiment.
Despite its high error rate, I would still argue that Otter AI is the most useful and usable transcription service on the market because of its AI meeting assistant and Otter AI chat. It also provided the most powerful transcript editor and ability to follow along with the audio, so some users may overlook the decreased accuracy.
In her 2020 experiment, Patel found that GoTranscript’s manual transcription was the most accurate, with only 27 errors [7]. Every automatic, AI-powered transcription service she tested produced at least 86 total errors. Clearly, the AI used in these transcription services has greatly improved since 2020: the automatic transcription services I tested averaged fewer than 50 total errors, with a minimum of 39 [7]. As seen in Figure 2, the total number of errors fell for every automatic transcription service tested by both Patel and myself. This difference reflects how rapidly AI-powered transcription has evolved in recent years.
Transcription Tool | Total Errors in 2020 | Total Errors in 2024 | Difference in Errors |
---|---|---|---|
TranscribeMe | 86 | 41 | 45 |
Otter AI | 108 | 67 | 41 |
Rev | 93 | 63 | 30 |
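To put the differences in the table above on a common scale, each drop can also be expressed as a percentage of the 2020 error count. The sketch below is illustrative only: the figures are copied from the table, while the `ERRORS` dictionary and the `error_change` helper are hypothetical names introduced for the example.

```python
# (total errors in 2020, total errors in 2024), copied from the table above.
ERRORS = {
    "TranscribeMe": (86, 41),
    "Otter AI": (108, 67),
    "Rev": (93, 63),
}


def error_change(before, after):
    """Return the absolute drop in errors and the drop as a percentage."""
    difference = before - after
    percent = round(100 * difference / before, 1)
    return difference, percent


if __name__ == "__main__":
    for tool, (errors_2020, errors_2024) in ERRORS.items():
        difference, percent = error_change(errors_2020, errors_2024)
        print(f"{tool}: {difference} fewer errors ({percent}% reduction)")
```

Because the two experiments used different recordings and were scored by different people, these percentages are best read as a rough indication of the trend rather than a precise measurement.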
Overall, transcription can increase productivity in various industries, improve search engine optimization, and improve accessibility. There are many transcription services on the market, all with varying costs, accuracies, and delivery times. Otter AI is the most usable and has the most features to assist with things like meetings, summarizing audio, and editing transcripts. For a high-accuracy transcription, however, I would recommend GoTranscript or Vook.AI.
Works Cited
[1] “#1 Speech to Text Service in the World.” Rev, www.rev.com/. Accessed 5 Nov. 2024.
[2] “Best Audio & Video Transcription Services.” GoTranscript, gotranscript.com/. Accessed 5 Nov. 2024.
[3] Guay, Matthew. “The Best Transcription Services.” The New York Times, The New York Times, 15 Oct. 2018, www.nytimes.com/wirecutter/reviews/best-transcription-services/.
[4] Lewis, Elisa. “8 Benefits of Video Transcription & Captioning.” 3Play Media, 26 June 2024, www.3playmedia.com/blog/8-benefits-of-transcribing-captioning-videos/.
[5] Novak, Matt. “MacWhisper Is the Free Transcription Software I’ve Been Waiting For.” Forbes, Forbes Magazine, 20 Feb. 2024, www.forbes.com/sites/mattnovak/2023/02/04/macwhisper-is-the-free-transcription-software-ive-been-waiting-for/.
[6] “Online Audio-to-Text Transcription Service.” Convert Your Speech to Text in Minutes, www.vook.ai/en. Accessed 5 Nov. 2024.
[7] Patel, Hariti. “Determining the Most Accurate Transcription Services.” MoCHI Research Group, 10 Dec. 2020, mochiresearch.com/2020/12/10/determining-the-most-accurate-transcription-services/.
[8] “Remember, Search, and Share Your Voice Conversations.” Otter Voice Meeting Notes, otter.ai/home. Accessed 5 Nov. 2024.
[9] “TranscribeMe! – Fast & Accurate Human Transcription Services.” TranscribeMe, 10 June 2024, www.transcribeme.com/.
[10] Walker, Ben. “Benefits of Transcription Services for Different Businesses.” Ditto, 19 Jan. 2024, www.dittotranscripts.com/blog/benefits-of-transcription-services-for-different-businesses/.