How to Use Speech to Text Converter

Last updated 4 months ago

Vovsoft Speech to Text Converter requires the IBM Cloud Speech to Text API, which can convert up to 500 minutes of audio per month for free.

Please be aware: IBM Cloud now requires credit card information for newly opened accounts.

How to get API Key and API URL

In order to get your API key and API URL, please follow these steps:

  1. Go to https://cloud.ibm.com/registration and create your IBM Cloud account for free.
    Please Note: If you get the error message "Your account cannot be created at this time", try registering with a Gmail address. IBM Cloud appears to reject some email providers, so if the problem persists, use another email address.
  2. Go to https://cloud.ibm.com/catalog/services/speech-to-text and create your Speech to Text instance on the Lite plan.
  3. Go to https://cloud.ibm.com/resources and, under the Services and software tab, click your Speech to Text instance. Your credentials (API key and URL) are displayed on the Manage or Service credentials page.

Enter your API key and URL in the Settings panel of "Vovsoft Speech to Text Converter". The software is then ready to convert audio to text.
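
For readers curious about what these two credentials are for, the following is a minimal Python sketch of a request to the IBM Speech to Text REST endpoint. It is an illustration only, not the code Vovsoft uses internally. The function name build_recognize_request and the example values are hypothetical. The sketch assumes the service's /v1/recognize endpoint with HTTP Basic authentication under the user name "apikey". It builds the request without sending it and trims stray whitespace and a trailing slash, which are easy to pick up when copying credentials.

```python
import base64
import urllib.request


def build_recognize_request(api_key, api_url, audio_bytes,
                            content_type="audio/wav",
                            model="en-US_BroadbandModel"):
    """Build (but do not send) a POST request to the /v1/recognize endpoint."""
    # Whitespace or a trailing slash picked up while copying would corrupt the URL.
    base = api_url.strip().rstrip("/")
    endpoint = f"{base}/v1/recognize?model={model}"

    # The API key goes into HTTP Basic auth with the literal user name "apikey".
    credentials = f"apikey:{api_key.strip()}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        endpoint,
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": content_type,
        },
    )
```

To actually send the request, pass the returned object to urllib.request.urlopen with the API key and URL from your own IBM Cloud instance. The service replies with a JSON document that contains the transcript.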

Broadband models vs Narrowband models

For most languages, the IBM Cloud service supports both broadband and narrowband models:

  • Broadband models are for audio that is sampled at greater than or equal to 16 kHz.
  • Narrowband models are for audio that is sampled at 8 kHz. Use narrowband models for offline decoding of telephone speech, which is the typical use for this sampling rate.

Choosing the correct model is important. Use the model that matches the sampling rate (and language) of your audio. The service automatically adjusts the sampling rate of your audio to match the model that you specify.
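
The rule above can be sketched in a few lines of Python. The helper names choose_model and wav_sample_rate are hypothetical, and the model identifiers assume IBM's naming pattern of a language code followed by _BroadbandModel or _NarrowbandModel (for example en-US_BroadbandModel). Treating rates between 8 kHz and 16 kHz as narrowband is an assumption of this sketch; the text above only covers 8 kHz and 16 kHz or more.

```python
import wave

BROADBAND_MIN_HZ = 16_000   # broadband models: 16 kHz or higher
NARROWBAND_MIN_HZ = 8_000   # narrowband models: 8 kHz telephone-quality audio


def choose_model(sample_rate_hz, language="en-US"):
    """Return the model name that matches the audio's sampling rate."""
    if sample_rate_hz >= BROADBAND_MIN_HZ:
        return f"{language}_BroadbandModel"
    if sample_rate_hz >= NARROWBAND_MIN_HZ:
        # Assumption: rates between 8 kHz and 16 kHz are handled as narrowband.
        return f"{language}_NarrowbandModel"
    raise ValueError(f"{sample_rate_hz} Hz is below the 8 kHz narrowband rate")


def wav_sample_rate(path):
    """Read the sampling rate (in Hz) from the header of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

For example, choose_model(wav_sample_rate("interview.wav")) would pick the broadband model for a 48 kHz recording and the narrowband model for an 8 kHz phone recording, where interview.wav stands in for your own file.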

Approximate Conversion Time

Approximate conversion times are listed in the table below. Actual times vary with the content and quality of the file, the language model, the load on the AI servers, and your computer's upload speed.

Audio Length    Audio Quality    Language Model          Approximate Conversion Time
5 minutes       48 kHz Stereo    English (Broadband)     1 minute and 20 seconds
5 minutes       8 kHz Mono       English (Narrowband)    1 minute and 30 seconds
30 minutes      48 kHz Stereo    English (Broadband)     9 minutes
30 minutes      8 kHz Mono       English (Narrowband)    10 minutes
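
In all four rows the conversion takes roughly 0.3 times the length of the audio. The Python sketch below turns the table into a rough estimator for other lengths. The function names are hypothetical, and assuming that conversion time scales linearly with audio length is a simplification: real times depend on the factors listed above.

```python
# (audio length in seconds, conversion time in seconds), copied from the table
MEASUREMENTS = {
    "broadband": [(5 * 60, 80), (30 * 60, 9 * 60)],
    "narrowband": [(5 * 60, 90), (30 * 60, 10 * 60)],
}


def conversion_ratio(model_kind):
    """Total conversion time divided by total audio time for one model kind."""
    rows = MEASUREMENTS[model_kind]
    total_audio = sum(audio for audio, _ in rows)
    total_conversion = sum(conversion for _, conversion in rows)
    return total_conversion / total_audio


def estimate_conversion_seconds(audio_seconds, model_kind="broadband"):
    """Rough estimate, assuming time scales linearly with audio length."""
    return audio_seconds * conversion_ratio(model_kind)
```

Calling estimate_conversion_seconds(15 * 60) gives a ballpark figure for a 15-minute broadband file. Treat the result as an order-of-magnitude guide, not a guarantee.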

Common Errors

HTTP/1.1 503 Service Unavailable

This error usually means that your URL is wrong. Please enter the exact "API Key" and "API URL" that IBM Cloud supplied for your instance.

Error reading data: (12152)

Your audio is too long. Please try converting a shorter audio file.
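
One way to get shorter audio is to split the recording into several files and convert them one at a time. The Python sketch below does this for uncompressed WAV files using only the standard library. The function name split_wav and the output naming scheme are hypothetical, and the chunk length is up to you; this article does not state a maximum length. Other formats such as MP3 would need a different tool.

```python
import wave


def split_wav(path, chunk_seconds, out_prefix):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Returns the list of chunk file paths in playback order.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small")

        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source; the
                # frame count in the header is corrected after writing.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

For example, split_wav("lecture.wav", 600, "lecture_part") would write lecture_part_000.wav, lecture_part_001.wav and so on, each up to ten minutes long. Cuts fall at fixed intervals, so a word at a boundary may be split between two files.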


Responses (5)

G Sreenivasa Rao
Mar 14, 2022 at 12:58 am (PST)
Is there any way I can use your software without providing that key? Is there any way I can register on IBM Cloud without providing credit card details? It is not allowing me to as of now. Thank you.

Oct 19, 2021 at 01:41 am (PST)
How long will it take to convert 30 mins of audio to text?

    Vovsoft Support
    Oct 20, 2021 at 01:49 pm (PST)
    Hello Sam. We updated the blog post and included approximate conversion times.

Oct 9, 2021 at 07:23 am (PST)
I have copied & pasted the exact API key & URL & it still gives me the HTTP/1.1 503 Service Unavailable error.

Apr 6, 2021 at 02:33 pm (PST)
Creating an account was a bit of a challenge, but it's the best speech2text I've ever tried.
