
How to Use Speech to Text Converter

Last updated 1 day ago

Vovsoft Speech to Text Converter supports offline and online speech engines.

  • Vosk (Offline)
  • Continuous Dictation (Offline)
  • OpenAI (Online)
  • Deepgram (Online)
  • Microsoft Azure (Online)
  • IBM Cloud (Online)

How to use Vosk (Offline - File to Text)

Vosk is an offline speech recognition toolkit that supports 20+ languages. It performs speech-to-text conversion entirely on your own computer, so your files are never sent over the internet.

Vosk requires a "models" directory containing the language data. Vovsoft Speech to Text Converter embeds lightweight English and French models. If you need another language or are not satisfied with the results, follow these steps:

  1. Go to https://alphacephei.com/vosk/models and download any model file for free.
  2. Extract the model archive into the models folder, keeping the model's folder structure intact. Example: C:\Program Files (x86)\VOVSOFT\Speech to Text Converter\vosk\models
  3. Restart Vovsoft Speech to Text Converter.
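If a model you extracted does not appear in the program, a common culprit is the folder structure: many archive tools extract into an extra folder of the same name, leaving the model one level too deep. The Python sketch below (standard library only) reports the status of each folder under models. It assumes that a usable Vosk model folder contains am and conf subfolders, as the models on the download page typically do; treat this as a heuristic, not a rule enforced by Vosk or Vovsoft. The function names are hypothetical.

```python
from pathlib import Path


def looks_like_vosk_model(folder):
    """Heuristic (assumption): a usable model folder has "am" and "conf" subfolders."""
    return (folder / "am").is_dir() and (folder / "conf").is_dir()


def check_vosk_models(models_dir):
    """Return {folder name: "ok" | "nested" | "invalid"} for each folder in models_dir."""
    report = {}
    for entry in sorted(Path(models_dir).iterdir()):
        if not entry.is_dir():
            continue  # ignore stray files such as a leftover .zip archive
        if looks_like_vosk_model(entry):
            report[entry.name] = "ok"
            continue
        # An archive extracted into an extra folder leaves the model one level too deep.
        subfolders = [child for child in entry.iterdir() if child.is_dir()]
        if len(subfolders) == 1 and looks_like_vosk_model(subfolders[0]):
            report[entry.name] = "nested"
        else:
            report[entry.name] = "invalid"
    return report
```

Call it with the models path from step 2. A "nested" entry means the inner folder should be moved up one level; then restart the program so it reads the model at launch.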

Please note that .NET Framework 4.8 must be installed for this feature.

How to use Continuous Dictation (Offline - Microphone to Text)

Continuous Dictation requires the "Microsoft Speech Platform", which is preinstalled on most systems. This feature supports English, French, German, Japanese, Simplified Chinese, Spanish, and Traditional Chinese.

How to change speech recognition settings:

  1. Press the Windows key.
  2. Type Control Panel and press Enter.
  3. Find and click Speech Recognition.

If no speech recognizer is installed on your system, follow these steps:

  1. Download and install Microsoft Speech Platform: https://www.microsoft.com/en-us/download/details.aspx?id=27225
  2. Download and install Speech Recognition (SR) languages: https://www.microsoft.com/en-us/download/details.aspx?id=27224
  3. Connect a microphone (if one is not already connected).
  4. Restart your computer.

Online Speech to Text API Services

If you want to perform speech-to-text conversion on cloud servers instead of your own computer and take advantage of the latest AI advancements, you will need credentials from at least one of these providers:

| API Provider | Pricing | Free Tier | Credit Card |
|---|---|---|---|
| OpenAI | $0.0060 per minute | No | Required |
| Deepgram | $0.0044 per minute | $200 free credit | Not required |
| IBM Cloud | $0.0100 per minute | 500 minutes per month | Required |
| Microsoft Azure | $0.0167 per minute | 300 minutes per month | Required |
⚠️ IBM Cloud, Microsoft Azure, and OpenAI may require a valid credit card for registration and may not be available in some countries such as China and Taiwan.
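To compare providers, multiply the audio length in minutes by the per-minute price. The short Python sketch below does that with the prices from the table; it ignores free tiers and credits, and prices may have changed since this post was written, so check each provider's pricing page before relying on the numbers.

```python
# Per-minute prices (US dollars) copied from the table above; they may be outdated.
PRICE_PER_MINUTE = {
    "OpenAI": 0.0060,
    "Deepgram": 0.0044,
    "IBM Cloud": 0.0100,
    "Microsoft Azure": 0.0167,
}


def estimate_cost(provider, audio_minutes):
    """Pay-as-you-go cost in US dollars, ignoring any free minutes or credit."""
    return audio_minutes * PRICE_PER_MINUTE[provider]


for provider in PRICE_PER_MINUTE:
    print(f"{provider}: ${estimate_cost(provider, 30):.3f} for 30 minutes of audio")
```

For a 30-minute recording this gives about $0.18 with OpenAI, $0.13 with Deepgram, $0.30 with IBM Cloud and $0.50 with Microsoft Azure, before any free minutes or credit are applied.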

How to get OpenAI API Key

To get your OpenAI API key, follow these steps:

  1. Go to https://platform.openai.com/signup and create your OpenAI account for free.
  2. Go to https://platform.openai.com/account/api-keys and create your API key.
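The key is sent as a bearer token with every request. As an illustration, the Python sketch below builds (but does not send) a request to OpenAI's audio transcription endpoint, https://api.openai.com/v1/audio/transcriptions, as a multipart form carrying the model name whisper-1 and the audio file. This is our assumption of how such a request can be made with the standard library only; it is not necessarily how Vovsoft Speech to Text Converter talks to OpenAI, and build_openai_request is a hypothetical name.

```python
import urllib.request
import uuid

OPENAI_TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_openai_request(api_key, audio_bytes, filename="audio.wav", model="whisper-1"):
    """Build (without sending) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex  # random marker that separates the form fields
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        OPENAI_TRANSCRIPTION_URL,
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # the API key created in step 2
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the request to urllib.request.urlopen sends it; the JSON response's text field should hold the transcript. Keep the key private, since anyone holding it can use your account.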

How to get Deepgram API Key

To get your Deepgram API key, follow these steps:

  1. Go to https://console.deepgram.com/signup and create your Deepgram account for free.
  2. Click API Keys.
  3. Click Create a New API Key.
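Deepgram expects the key in an Authorization header of the form "Token <key>", with the raw audio as the request body. The Python sketch below builds such a request for the pre-recorded endpoint https://api.deepgram.com/v1/listen and shows where the transcript sits in the JSON reply. The endpoint, header format and response layout reflect Deepgram's public API as we understand it; build_deepgram_request and extract_transcript are hypothetical helpers, not part of Vovsoft's software.

```python
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_deepgram_request(api_key, audio_bytes, content_type="audio/wav", language="en"):
    """Build (without sending) a pre-recorded transcription request."""
    query = urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=audio_bytes,  # raw audio bytes, not a multipart form
        method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": content_type},
    )


def extract_transcript(response_json):
    """Return the first transcript from a parsed Deepgram reply (assumed layout)."""
    return response_json["results"]["channels"][0]["alternatives"][0]["transcript"]
```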

How to get Microsoft Azure API Key and API Region

To get your Microsoft Azure API key and API region, follow these steps:

  1. Go to https://portal.azure.com and create your Microsoft Azure account for free.
  2. Click Create a resource.
  3. Choose Cognitive Services and create a Speech service.
  4. After you have created your Speech service, your credentials are on its Keys and Endpoint page: KEY1 or KEY2 (either key works) and the Location/Region field.

Credentials screen of Microsoft Azure
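The key and region work together: the region selects the host name and the key goes in the Ocp-Apim-Subscription-Key header. The Python sketch below builds (but does not send) a request for Azure's REST API for short audio, which, as far as we know, accepts only about 60 seconds of audio per request, so longer files need other Azure APIs. The endpoint pattern is an assumption based on Azure's public documentation, and build_azure_request is a hypothetical name; Vovsoft's software may use a different Azure API.

```python
import urllib.parse
import urllib.request


def build_azure_request(key, region, wav_bytes, language="en-US", sample_rate=16000):
    """Build (without sending) a short-audio recognition request for one region."""
    host = f"https://{region}.stt.speech.microsoft.com"  # region from the Keys and Endpoint page
    path = "/speech/recognition/conversation/cognitiveservices/v1"
    query = urllib.parse.urlencode({"language": language})
    return urllib.request.Request(
        f"{host}{path}?{query}",
        data=wav_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,  # KEY1 or KEY2
            "Content-Type": f"audio/wav; codecs=audio/pcm; samplerate={sample_rate}",
        },
    )
```

A successful reply should be JSON with a DisplayText field containing the recognized text.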

Supported Languages in Microsoft Azure

Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, Georgian, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Marathi, Mongolian, Nepali, Norwegian Bokmål, Pashto, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Sinhala, Slovak, Slovenian, Somali, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Uzbek, Vietnamese, Welsh, and Zulu are supported in Microsoft Azure Cognitive Services.

How to get IBM Cloud API Key and API URL

To get your IBM Cloud API key and API URL, follow these steps:

  1. Go to https://cloud.ibm.com/registration and create your IBM Cloud account for free.
    Note: If you get the error message "Your account cannot be created at this time", try registering with a different email address, such as a Gmail address. IBM Cloud appears to reject addresses from some email providers.
  2. Go to https://cloud.ibm.com/catalog/services/speech-to-text and create your Speech to Text Lite Plan instance.
  3. Go to https://cloud.ibm.com/resources and, under AI / Machine Learning, click your Speech to Text instance. Your credentials (API key and URL) are shown on the Manage or Service credentials page.

Credentials screen of IBM Cloud

Enter your API key and URL into the Settings panel inside "Vovsoft Speech to Text Converter". The software is now ready to convert audio to text.
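IBM Cloud uses HTTP basic authentication with the literal user name apikey and your API key as the password, and the service URL is specific to your instance, which is why it must be copied exactly. The Python sketch below builds (but does not send) a request to the /v1/recognize method under that URL and joins the transcript pieces from the JSON response. It illustrates the public IBM API under our assumptions; it is not Vovsoft's code, and build_ibm_request and join_transcripts are hypothetical names.

```python
import base64
import urllib.parse
import urllib.request


def build_ibm_request(api_key, service_url, audio_bytes,
                      model="en-US_BroadbandModel", content_type="audio/wav"):
    """Build (without sending) a recognize request for your instance's URL."""
    credentials = f"apikey:{api_key}".encode("utf-8")  # user name is literally "apikey"
    token = base64.b64encode(credentials).decode("ascii")
    base = service_url.rstrip("/")  # tolerate a trailing slash in the pasted URL
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"{base}/v1/recognize?{query}",
        data=audio_bytes,
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": content_type},
    )


def join_transcripts(response_json):
    """Concatenate the best alternative of each result (assumed response layout)."""
    return " ".join(
        result["alternatives"][0]["transcript"].strip()
        for result in response_json.get("results", [])
    )
```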

Supported Languages and Models in IBM Cloud

English, Arabic, Chinese (Mandarin), Czech, Dutch, French, German, Hindi (Indian), Italian, Japanese, Korean, Portuguese (Brazilian) and Spanish are supported by the IBM Cloud API.

For most languages, the IBM Cloud service supports broadband, narrowband, telephony and multimedia models:

  • Broadband models are for audio that is sampled at 16 kHz or higher.
  • Narrowband models are for audio that is sampled at 8 kHz. Use narrowband models for offline decoding of telephone speech, which is the typical use for this sampling rate.
  • Telephony models are intended specifically for audio that is communicated over a telephone. Like previous-generation narrowband models, telephony models are intended for audio that has a minimum sampling rate of 8 kHz.
  • Multimedia models are intended for audio that is extracted from sources with a higher sampling rate, such as video. Use a multimedia model for any audio other than telephonic audio. Like previous-generation broadband models, multimedia models are intended for audio that has a minimum sampling rate of 16 kHz.

Choosing the correct model is important. Use the model that matches the sampling rate (and language) of your audio. The service automatically adjusts the sampling rate of your audio to match the model that you specify. More information: https://cloud.ibm.com/apidocs/speech-to-text
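For WAV files the rule of thumb above can be automated: read the sampling rate, then pick a narrowband or telephony model for telephone-rate audio and a broadband or multimedia model for 16 kHz and above. The sketch uses Python's standard wave module. The model IDs it builds (such as en-US_BroadbandModel and en-US_Multimedia) follow IBM's naming scheme as we understand it, and not every language offers every model, so check the list in IBM's documentation; suggest_ibm_model is a hypothetical helper.

```python
import wave


def suggest_ibm_model(wav_file, language="en-US", next_generation=True):
    """Return (model ID, sampling rate) for a WAV file path or file object.

    Assumed IDs: previous generation "<lang>_BroadbandModel" / "<lang>_NarrowbandModel",
    next generation "<lang>_Multimedia" / "<lang>_Telephony".
    """
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
    if rate < 8000:
        raise ValueError(f"{rate} Hz is below the 8 kHz minimum of every model")
    if rate >= 16000:
        suffix = "Multimedia" if next_generation else "BroadbandModel"
    else:  # 8 kHz up to (but not including) 16 kHz: telephone-quality audio
        suffix = "Telephony" if next_generation else "NarrowbandModel"
    return f"{language}_{suffix}", rate
```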

Maximum File Size

  • OpenAI Whisper's file size limit is 26,214,400 bytes (25MB).
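The 25 MB limit is easy to hit with uncompressed audio, because WAV size grows with sample rate, channel count and bit depth. The Python sketch below checks a file against the limit and estimates how many minutes of 16-bit PCM audio fit. It ignores the small WAV header and assumes uncompressed PCM; compressed formats such as MP3 fit far more audio into the same size.

```python
import os

# Limit quoted above: 25 MB = 25 * 1024 * 1024 = 26,214,400 bytes.
OPENAI_MAX_BYTES = 25 * 1024 * 1024


def fits_openai_limit(path):
    """True if the file at `path` is no larger than the upload limit."""
    return os.path.getsize(path) <= OPENAI_MAX_BYTES


def max_pcm_minutes(sample_rate, channels, bits_per_sample=16, limit=OPENAI_MAX_BYTES):
    """Minutes of uncompressed PCM audio that fit in `limit` bytes (header ignored)."""
    bytes_per_second = sample_rate * channels * bits_per_sample // 8
    return limit / bytes_per_second / 60
```

At 48 kHz stereo (192,000 bytes per second) only about 2.3 minutes fit, while 16 kHz mono (32,000 bytes per second) allows about 13.7 minutes.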

Approximate Conversion Time

Approximate conversion times are listed in the table below. Actual times vary depending on the content and quality of the file, the language model, the load on the AI servers and your upload speed.

| Audio Length | Audio Quality | Language Model | Approximate Conversion Time |
|---|---|---|---|
| 5 minutes | 48 kHz Stereo | English (Broadband) | 1 minute 20 seconds |
| 5 minutes | 8 kHz Mono | English (Narrowband) | 1 minute 30 seconds |
| 30 minutes | 48 kHz Stereo | English (Broadband) | 9 minutes |
| 30 minutes | 8 kHz Mono | English (Narrowband) | 10 minutes |
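The table suggests a simple rule of thumb: conversion took roughly 0.27 to 0.33 times the length of the audio. The Python sketch below turns the measured rows into an estimate by scaling linearly with the slowest ratio observed for each model. It is only a rough planning guess built from four measurements, not a guarantee; estimate_conversion_seconds is a hypothetical name.

```python
# Rows from the table above as (audio seconds, conversion seconds), per model.
MEASURED = {
    "Broadband": [(5 * 60, 80), (30 * 60, 9 * 60)],
    "Narrowband": [(5 * 60, 90), (30 * 60, 10 * 60)],
}


def estimate_conversion_seconds(audio_seconds, model="Broadband"):
    """Rough estimate: scale linearly by the slowest ratio measured for `model`."""
    slowest_ratio = max(converted / audio for audio, converted in MEASURED[model])
    return audio_seconds * slowest_ratio
```

For a one-hour recording this estimates about 18 minutes with the broadband model and about 20 minutes with the narrowband model.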

Common Errors

HTTP/1.1 503 Service Unavailable
Your API URL is most likely wrong. Enter the exact API key and API URL supplied by IBM Cloud.

Error sending data: (12030) The connection with the server was terminated abnormally
A firewall, proxy or antivirus program is interfering with the connection. Try disabling them temporarily or use a different internet connection.

Error reading data: (12152) The server returned an invalid or unrecognized response
Your audio is too long. Try converting a shorter audio file.

"Please wait" hangs, nothing happens
Your audio file is too large. Try uploading a smaller file; converting stereo to mono halves the size of uncompressed (WAV) audio and may help.
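Converting stereo to mono can be done with any audio editor, or in code. The Python sketch below handles 16-bit PCM WAV files with the standard wave and array modules, averaging the left and right samples of each frame. It is a minimal example that assumes 16-bit stereo input; other formats are rejected rather than converted, and stereo_to_mono is a hypothetical name.

```python
import array
import sys
import wave


def stereo_to_mono(src, dst):
    """Downmix a 16-bit PCM stereo WAV (path or file object) to mono."""
    with wave.open(src, "rb") as wav_in:
        if wav_in.getnchannels() != 2 or wav_in.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = wav_in.getframerate()
        samples = array.array("h", wav_in.readframes(wav_in.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Interleaved layout is L, R, L, R, ...; average each left/right pair.
    mono = array.array(
        "h", ((left + right) // 2 for left, right in zip(samples[0::2], samples[1::2]))
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing
    with wave.open(dst, "wb") as wav_out:
        wav_out.setnchannels(1)
        wav_out.setsampwidth(2)
        wav_out.setframerate(rate)
        wav_out.writeframes(mono.tobytes())
```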


Fatih Ramazan Çıkan
Software development enthusiast | Electronics engineer


Comments (14)

Jun 3, 2024 at 01:27 am (PST)
I don't have a choice of language with the demo program. Difficult to interpret a speech in French.
I don't know if the program is right for me.
thanks anyway

Jun 2, 2024 at 08:20 am (PST)
i downloaded and extracted the vosk files in the model folder but they are not shown why??
Jun 2, 2024 at 08:22 am (PST)
Settings page does not open any help?
Vovsoft Support
Jun 2, 2024 at 03:58 pm (PST)
Hello Raffaele,
1) Please make sure that the folder structure is the same as the original folder structure. Then, restart Vovsoft Speech to Text Converter. It should read the model at application launch.
2) The Vosk tab doesn't have any additional Settings, we may hide the "Settings" button in the next update.

mekhled alotaibi
Feb 17, 2024 at 06:45 pm (PST)

can you download Arabic translation i don't understand English and English is not my first language please

why there's no opition forarabic translations i cant use the app without Arabic translation

Oct 27, 2023 at 08:35 am (PST)
please provide us urdu language converter and give short video that help us how to use this software. thank you

Sep 15, 2023 at 06:54 pm (PST)
trying now!

Jan 16, 2023 at 10:08 am (PST)
It indeed a great app

Jul 28, 2022 at 10:26 am (PST)
I think that is a very good software. It allows to have a windows interface to run the ibm Watson speech to text. Because without this good software, I could not use the ibm speech to text because I don't understand a interface api with of lines of command to enter to run the function. Thank you!

G Sreenivasa Rao
Mar 14, 2022 at 12:58 am (PST)
Is there anyway I can use your sofware without providing that key? Is there any way I can register without providing the credit card details on IBM Cloud as it is not allowing me as of now. Thank you

Oct 19, 2021 at 01:41 am (PST)
How long it will take to convert 30 mins of audio to text?
Vovsoft Support
Oct 20, 2021 at 01:49 pm (PST)
Hello Sam. We updated the blog post and included approximate conversion times.

Oct 9, 2021 at 07:23 am (PST)
I have copied & pasted the exact API key & URL & it still gives me the HTTP/1.1 503 Service Unavailable error.

Apr 6, 2021 at 02:33 pm (PST)
Creating an account took a bit of a challenge, but it's the best speech2text I've ever tried.
