MP3 to Text: How to Turn Audio Files to Text

By Thomas Carter on July 2, 2019

When so much information is consumed via video or audio, we forget how helpful it is to have that information noted down to read. 

You can read far faster than a video will play. When it comes to ‘skimming’, a good reader can take in information far faster, and with better comprehension, than someone listening to a sped-up audio file. So, how can you quickly and easily get all that information written out? How can you turn MP3 to text?

The answer is transcription!

This doesn’t mean you have to transcribe it yourself. You can, but it’s a time-consuming process that would involve you rewinding the file countless times while scrambling to jot everything down. 

The faster solution is to use transcription services. There are two main kinds of transcription services on the market: those that use human transcriptionists -- professionals who can transcribe audio accurately and much more quickly than you could -- and those that use speech-to-text software.

The different approaches have their own benefits and caveats. It’s important to familiarise yourself with them all in order to choose the right path for your transcript needs. 

We’ll take you through the pros and cons of each of these to help you to make an informed decision and save yourself time and effort with the right transcription services for you.

Transcribe it Yourself

It is possible to transcribe audio and video files yourself -- it just takes time. 

Most professional transcribers type at up to 130 words per minute and still take approximately three to four hours to transcribe one hour of audio. Transcription can be tedious (you have to repeatedly pause and rewind the audio file to make sure you note everything down), and professionals are much more skilled at it. The average person types at 41 wpm, roughly a third of that speed, so transcribing an hour of audio yourself will likely take much longer than four hours.
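To put a rough number on that, here is a minimal Python sketch using the figures above. It assumes that transcription time scales in inverse proportion to typing speed, which is a simplification (it ignores the extra pausing and rewinding a beginner does), and the function and constant names are ours, purely for illustration.

```python
# Figures quoted in this article.
PRO_WPM = 130                           # top typing speed of a professional transcriber
AVERAGE_WPM = 41                        # typing speed of the average person
PRO_HOURS_PER_AUDIO_HOUR = (3.0, 4.0)   # professional time per hour of audio


def estimate_diy_hours(audio_hours, typing_wpm=AVERAGE_WPM):
    """Return a (low, high) estimate of hours needed to transcribe the audio yourself.

    Assumption: time scales inversely with typing speed relative to a professional.
    """
    slowdown = PRO_WPM / typing_wpm
    low, high = PRO_HOURS_PER_AUDIO_HOUR
    return (audio_hours * low * slowdown, audio_hours * high * slowdown)


low, high = estimate_diy_hours(1.0)
print(f"One hour of audio: roughly {low:.1f} to {high:.1f} hours of typing")
```

Under that assumption, one hour of audio works out to somewhere around 9.5 to 12.7 hours of work for an average typist. Treat it as a ballpark figure rather than a promise.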

If you have the time for it, transcribing an audio recording yourself is much cheaper. However, you need to think about the value of your own time. If you want to save time and energy, and ensure you get an accurate transcription, it is best to opt for professional services. 

Download Automated Speech Recognition (ASR) software

In the digital age, we have the luxury of a great many digital resources to help with our transcription needs. Automated Speech Recognition (ASR) solutions parse audio files far faster than any human listener and convert even lengthy audio or video files to text in minutes. This makes for a potentially much faster speech-to-text conversion (certainly when compared to transcribing by yourself).

What’s more, there are many ASR services that offer free trials, making the cost considerations virtually nil. Even if you have transcription needs outside of your free trial, ASR services tend to be pretty cheap. You can usually expect to pay 7-10 pence per minute for ASR services. 

While ASR software can save you some time and money, it comes with other costs.

What the algorithms used by ASR services offer in speed, they lack in accuracy. Even under ideal circumstances (no background chatter or lapses in audio quality), ASR solutions will struggle to produce accuracy over 80%. They can struggle with nuances of accent and delivery which can lead to inaccuracies in the transcription. They can also struggle with homophones and words that are pronounced similarly and this can lead to further lapses in clarity.

ASR may represent a saving in terms of cost and labour time but for some users this may represent a degree of false economy. The time saving you make with the software solution’s rapid transcription of your MP3 or MP4 file may be offset by the time you need to spend reviewing your document and making amendments where the software has erred. 

Unless you have experience with the editing process, it’s entirely possible that mistakes will slip through the cracks, and you may need to take several passes at the transcript to ensure that it’s perfect. Lastly, you don’t have any control over the type of transcript you receive.

Use a professional transcription service

If you are loath to entrust your transcription to an automated service, a trained professional can help to give you a little extra assurance.

The difference between an ASR service and a professional transcription service is that the latter puts you much more squarely in the driver’s seat. Professional transcription services give you control over the level of detail and the turnaround time of your transcription, meaning they can easily be matched to your needs and budget while also offering a superior level of quality across the board when compared to ASR solutions.

The main caveat of a professional transcription service is that, as in any field, professionals need to be paid. However, users who can afford to wait a few days for their transcripts will get extremely reasonable rates. You can expect to pay anywhere between 50p and £2 a minute for professional transcription services. Why the disparity? We’ll explain...

Professional transcription services are much more flexible and in-depth, affording you a range of services that best suit your needs. This means the pricing may vary depending on the kind of transcript you get. Some of these offerings include:

    • Full Verbatim: Full verbatim transcriptions give you transcription in its rawest form. They capture everything, not just the spoken content itself but nuances of speech like “ums”, “likes”, “ahs” and “you know?s” as well as laughter and other idiosyncrasies. Indeed, you can even request notes on these for a small extra charge. This is often more detail than is required, and can be costly, but it is nice to have the option.
    • Verbatim: Verbatim (also called “intelligent verbatim” or “clean verbatim”) offers a more concise version of the speech, sifting out the content of the speech from the idiosyncratic tics. This gives a cleaner and more polished version of the raw transcription (that is far easier to read) while still offering word-for-word accuracy.
    • Detailed Notes: Even when you have a verbatim transcript you may still want to separate the wheat from the chaff and get quick access to the main talking points of the speech. Detailed notes remove everything that is off topic and give you a leaner and more concise version of the audio or video file without missing out on the important details.

Making the best choice for your transcription needs

Now that you know the pros and cons of all three ways to turn audio files into text, you’re hopefully better able to make an informed choice as to which option is best for you.

Transcribing files yourself can be a tortuously long and involved process, but it won’t cost you a penny. ASR services can be fast and affordable but they may let you down on accuracy. Professional transcription services will give you outstanding quality but you can expect to pay a little more for them.
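If you want to see what those differences mean for your own recording, the short Python sketch below applies the per-minute rates quoted in this article (7-10 pence for ASR, 50p to £2 for professional services). The function and variable names are ours, and real providers may charge differently, so treat the output as a rough guide only.

```python
# Rates in pence per minute of audio, taken from the figures quoted in this article.
RATES_PENCE_PER_MINUTE = {
    "asr": (7, 10),
    "professional": (50, 200),
}


def cost_range_pounds(option, audio_minutes):
    """Return the (low, high) cost in pounds for the given option and audio length."""
    low, high = RATES_PENCE_PER_MINUTE[option]
    return (low * audio_minutes / 100, high * audio_minutes / 100)


# Compare both paid options for a one-hour recording.
for option in RATES_PENCE_PER_MINUTE:
    low, high = cost_range_pounds(option, 60)
    print(f"{option}: £{low:.2f} to £{high:.2f} for 60 minutes of audio")
```

For a one-hour file, that is roughly £4.20 to £6.00 for ASR against £30 to £120 for a professional service, before you count the time you might spend correcting an automated transcript.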

It all depends on your needs, your priorities and your budget!


You’ve been reading about turning audio files into text, but if you have questions about transcription service rates, who is transcribing your audio or what kind of transcript is right for you, our Ultimate Guide to Transcription Services can help! 
