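IBM Watson API Walkthrough 1: Speech To Text

Preparation

1. Register a Bluemix account

You need an IBM Bluemix account. If you already have one, log in directly; if not, click Register to sign up.
When registering, leave the country or region at its default value, otherwise errors may occur.

2. Install watson-developer-cloud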

pip install --upgrade watson-developer-cloud

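3. Create the application service

  1. After logging in to Bluemix, click the catalog menu in the upper-left corner, click "Services", then click "Watson". A list of services appears.
  2. Click Speech To Text.
  3. Click Create; this creates the corresponding service credentials.
  4. Once inside, click Service Credentials on the left to see the service that was created.
  5. Click View Credentials to display the service URL, username, and password. These are needed when calling the corresponding API.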
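Since the URL, username, and password are needed on every call, one option is to keep them out of the source file and read them from environment variables. The following is a minimal sketch for Python 3, not part of the Watson SDK: the function `load_watson_credentials` and the variable names `WATSON_STT_URL`, `WATSON_STT_USERNAME`, and `WATSON_STT_PASSWORD` are hypothetical choices made for this illustration.

```python
import os


def load_watson_credentials(env=None):
    """Read the service URL, username and password from environment variables.

    The variable names below are hypothetical; choose any names you like.
    """
    env = os.environ if env is None else env
    names = {
        "url": "WATSON_STT_URL",
        "username": "WATSON_STT_USERNAME",
        "password": "WATSON_STT_PASSWORD",
    }
    # Fail early with a clear message if anything is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# A plain dict stands in for os.environ here; the values are placeholders.
demo_env = {
    "WATSON_STT_URL": "https://stream.watsonplatform.net/speech-to-text/api",
    "WATSON_STT_USERNAME": "your-username",
    "WATSON_STT_PASSWORD": "your-password",
}
credentials = load_watson_credentials(demo_env)
print(credentials["username"])
```

The returned username and password could then be passed to the `SpeechToTextV1` constructor in place of literal strings.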

Speech To Text API

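1. Basic concepts

The Speech to Text service converts the human voice into written text. It can be used anywhere a bridge is needed between the spoken word and its written form, including voice control of embedded systems, transcription of meetings and conference calls, and dictation of emails and notes. The service is simple to use. It uses machine intelligence to combine information about grammar and language structure with knowledge of the composition of the audio signal, which produces a more accurate transcript.

2. Currently supported languages

English (US), English (UK), Japanese, Modern Standard Arabic (broadband model only), Mandarin Chinese, Portuguese (Brazil), Spanish, French (broadband model only)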

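Simple examples (Python implementation)

1. The models method

Retrieves a list of all models available for the service. The information includes each model's name and its minimum sampling rate in hertz, among other details.

All models are as follows: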
ar-AR_BroadbandModel
en-UK_BroadbandModel
en-UK_NarrowbandModel
en-US_BroadbandModel
en-US_NarrowbandModel
es-ES_BroadbandModel
es-ES_NarrowbandModel
fr-FR_BroadbandModel
ja-JP_BroadbandModel
ja-JP_NarrowbandModel
pt-BR_BroadbandModel
pt-BR_NarrowbandModel
zh-CN_BroadbandModel
zh-CN_NarrowbandModel
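The names in this list all follow the same pattern: a language identifier, an underscore, then Broadband or Narrowband followed by Model. A small sketch of splitting a name along that pattern is shown here; `split_model_name` is a hypothetical helper written for this post, not an SDK function.

```python
def split_model_name(name):
    """Split e.g. 'zh-CN_BroadbandModel' into ('zh-CN', 'Broadband')."""
    language, _, band_part = name.partition("_")
    if not band_part.endswith("Model"):
        raise ValueError("unexpected model name: " + name)
    # Drop the trailing 'Model' to leave just the band.
    return language, band_part[: -len("Model")]


model_names = ["en-US_NarrowbandModel", "fr-FR_BroadbandModel", "zh-CN_BroadbandModel"]
for model_name in model_names:
    print(split_model_name(model_name))
```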
The implementation code is as follows:

# encoding: UTF-8
import json
from os.path import join,dirname
from watson_developer_cloud import SpeechToTextV1
speech_to_text = SpeechToTextV1(
    username='6da692a0-6e00-4caf-9e4d-038c4d519cd6',
    password='6girOpb1GJPN',
    x_watson_learning_opt_out=False
)
print (json.dumps(speech_to_text.models(),indent=2))

Output:

{
  "models": [
    {
      "description": "French broadband model.", 
      "language": "fr-FR", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/fr-FR_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "fr-FR_BroadbandModel"
    }, 
    {
      "description": "US English narrowband model.", 
      "language": "en-US", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_NarrowbandModel", 
      "rate": 8000, 
      "supported_features": {
        "custom_language_model": true, 
        "speaker_labels": true
      }, 
      "name": "en-US_NarrowbandModel"
    }, 
    {
      "description": "Brazilian Portuguese broadband model.", 
      "language": "pt-BR", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/pt-BR_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "pt-BR_BroadbandModel"
    }, 
    {
      "description": "Japanese narrowband model.", 
      "language": "ja-JP", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/ja-JP_NarrowbandModel", 
      "rate": 8000, 
      "supported_features": {
        "custom_language_model": true, 
        "speaker_labels": true
      }, 
      "name": "ja-JP_NarrowbandModel"
    }, 
    {
      "description": "Mandarin broadband model.", 
      "language": "zh-CN", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/zh-CN_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "zh-CN_BroadbandModel"
    }, 
    {
      "description": "Japanese broadband model.", 
      "language": "ja-JP", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/ja-JP_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": true, 
        "speaker_labels": true
      }, 
      "name": "ja-JP_BroadbandModel"
    }, 
    {
      "description": "Brazilian Portuguese narrowband model.", 
      "language": "pt-BR", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/pt-BR_NarrowbandModel", 
      "rate": 8000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "pt-BR_NarrowbandModel"
    }, 
    {
      "description": "Spanish broadband model.", 
      "language": "es-ES", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/es-ES_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": true
      }, 
      "name": "es-ES_BroadbandModel"
    }, 
    {
      "description": "Modern Standard Arabic broadband model.", 
      "language": "ar-AR", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/ar-AR_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "ar-AR_BroadbandModel"
    }, 
    {
      "description": "Mandarin narrowband model.", 
      "language": "zh-CN", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/zh-CN_NarrowbandModel", 
      "rate": 8000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "zh-CN_NarrowbandModel"
    }, 
    {
      "description": "UK English broadband model.", 
      "language": "en-UK", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-UK_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "en-UK_BroadbandModel"
    }, 
    {
      "description": "Spanish narrowband model.", 
      "language": "es-ES", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/es-ES_NarrowbandModel", 
      "rate": 8000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": true
      }, 
      "name": "es-ES_NarrowbandModel"
    }, 
    {
      "description": "US English broadband model.", 
      "language": "en-US", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-US_BroadbandModel", 
      "rate": 16000, 
      "supported_features": {
        "custom_language_model": true, 
        "speaker_labels": true
      }, 
      "name": "en-US_BroadbandModel"
    }, 
    {
      "description": "UK English narrowband model.", 
      "language": "en-UK", 
      "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/en-UK_NarrowbandModel", 
      "rate": 8000, 
      "supported_features": {
        "custom_language_model": false, 
        "speaker_labels": false
      }, 
      "name": "en-UK_NarrowbandModel"
    }
  ]
}

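name: the name of the model, used as an identifier in calls to the service (for example, en-US_BroadbandModel).
language: the language identifier of the model (for example, en-US).
description: a brief description of the model.
url: the URL of the model.
rate: the sampling rate in hertz used by the model (the minimum acceptable rate for audio).
supported_features: a SupportedFeatures object that describes the additional service features the model supports.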
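Because the response is plain JSON, these fields can be used to pick a model programmatically. The sketch below, written for Python 3, filters a response of the shape shown in the output by language and by an entry in supported_features. `find_models` is a hypothetical helper, and the sample data is a trimmed copy of three entries from that output rather than a live API response.

```python
def find_models(models_response, language=None, feature=None):
    """Return the sorted names of models matching a language and/or feature."""
    names = []
    for model in models_response.get("models", []):
        if language is not None and model["language"] != language:
            continue
        if feature is not None and not model["supported_features"].get(feature, False):
            continue
        names.append(model["name"])
    return sorted(names)


# Three entries copied from the models output, trimmed to the fields used here.
sample_response = {
    "models": [
        {"name": "en-US_NarrowbandModel", "language": "en-US", "rate": 8000,
         "supported_features": {"custom_language_model": True, "speaker_labels": True}},
        {"name": "es-ES_BroadbandModel", "language": "es-ES", "rate": 16000,
         "supported_features": {"custom_language_model": False, "speaker_labels": True}},
        {"name": "zh-CN_BroadbandModel", "language": "zh-CN", "rate": 16000,
         "supported_features": {"custom_language_model": False, "speaker_labels": False}},
    ]
}
print(find_models(sample_response, feature="speaker_labels"))
print(find_models(sample_response, language="zh-CN"))
```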

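2. The get_model() method

Retrieves information about a single specified model available for the service. The information includes the model's name and its minimum sampling rate in hertz, among other details.
The implementation code is as follows: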

# encoding: UTF-8
import json
from os.path import join,dirname
from watson_developer_cloud import SpeechToTextV1
speech_to_text = SpeechToTextV1(
    username='6da692a0-6e00-4caf-9e4d-038c4d519cd6',
    password='6girOpb1GJPN',
    x_watson_learning_opt_out=False
)
print(json.dumps(speech_to_text.get_model('zh-CN_BroadbandModel'), indent=2))

Output:

{
  "name": "zh-CN_BroadbandModel", 
  "language": "zh-CN", 
  "sessions": "https://stream.watsonplatform.net/speech-to-text/api/v1/sessions?model=zh-CN_BroadbandModel", 
  "url": "https://stream.watsonplatform.net/speech-to-text/api/v1/models/zh-CN_BroadbandModel", 
  "rate": 16000, 
  "supported_features": {
    "custom_language_model": false, 
    "speaker_labels": false
  }, 
  "description": "Mandarin broadband model."
}

This shows the information for the zh-CN_BroadbandModel model.
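Since rate is the minimum acceptable sampling rate for a model, it can be worth comparing it with a recording's own rate before sending the audio. The following Python 3 sketch reads the rate from a WAV header with the standard-library `wave` module and compares it with the rate value of a model. The helpers `wav_sample_rate` and `meets_model_rate` are hypothetical names for this illustration, and the WAV data is generated in memory so that no real recording is needed.

```python
import io
import wave


def wav_sample_rate(file_obj):
    """Return the sampling rate in hertz stored in a WAV file's header."""
    reader = wave.open(file_obj, "rb")
    try:
        return reader.getframerate()
    finally:
        reader.close()


def meets_model_rate(model_info, audio_rate):
    """True if the audio rate is at least the rate reported for the model."""
    return audio_rate >= model_info["rate"]


# Build a short silent 8 kHz mono WAV in memory so the example needs no file.
buffer = io.BytesIO()
writer = wave.open(buffer, "wb")
writer.setnchannels(1)
writer.setsampwidth(2)      # 16-bit samples
writer.setframerate(8000)
writer.writeframes(b"\x00\x00" * 800)
writer.close()
buffer.seek(0)

audio_rate = wav_sample_rate(buffer)
broadband = {"name": "zh-CN_BroadbandModel", "rate": 16000}
narrowband = {"name": "zh-CN_NarrowbandModel", "rate": 8000}
print(audio_rate, meets_model_rate(broadband, audio_rate), meets_model_rate(narrowband, audio_rate))
```

In this sketch an 8 kHz recording passes the check for the narrowband model only, which is one reason to choose between the two model variants based on the audio at hand.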

3. The recognize() method

Transcribes recorded speech, which is the actual speech-to-text conversion.
The code is as follows:

# encoding: UTF-8
import json
from os.path import join,dirname
from watson_developer_cloud import SpeechToTextV1
speech_to_text = SpeechToTextV1(
    username='6da692a0-6e00-4caf-9e4d-038c4d519cd6',
    password='6girOpb1GJPN',
    x_watson_learning_opt_out=False
)

# The encoding argument of json.dumps exists only in Python 2; omit it on Python 3.
with open('guashi.wav', 'rb') as audio_file:
    print(json.dumps(speech_to_text.recognize(
        audio_file, content_type='audio/wav', timestamps=True,
        model='zh-CN_NarrowbandModel', word_confidence=True
    ), indent=2, encoding='UTF-8', ensure_ascii=False))
The file guashi.wav is a recording of the sentence "我银行卡丢了,我要办理挂失" ("I lost my bank card and I want to report it lost").

Output:
{
  "results": [
    {
      "alternatives": [
        {
          "word_confidence": [
            [
              "我", 
              0.259
            ], 
            [
              "银行卡", 
              0.685
            ], 
            [
              "丢了", 
              0.561
            ], 
            [
              "我要", 
              0.211
            ], 
            [
              "办理", 
              1.0
            ], 
            [
              "挂失", 
              1.0
            ]
          ], 
          "confidence": 0.668, 
          "transcript": "我 银行卡 丢了 我要 办理 挂失 ", 
          "timestamps": [
            [
              "我", 
              1.01, 
              1.29
            ], 
            [
              "银行卡", 
              1.29, 
              2.03
            ], 
            [
              "丢了", 
              2.03, 
              2.49
            ], 
            [
              "我要", 
              2.49, 
              2.89
            ], 
            [
              "办理", 
              2.89, 
              3.22
            ], 
            [
              "挂失", 
              3.22, 
              3.9
            ]
          ]
        }
      ], 
      "final": true
    }
  ], 
  "result_index": 0, 
  "warnings": [
    "Unknown arguments: continuous."
  ]
}

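Where:
alternatives: a list of alternative transcripts for the input audio. Each entry can also carry the additional output that was requested, such as word confidence and timestamps.
word_confidence: the confidence score for each word of the transcript, as a list of lists. Each inner list holds two elements: the word and its confidence score in the range 0 to 1.
confidence: the service's confidence score for the transcript, in the range 0 to 1.
transcript: the transcribed text.
timestamps: the time alignment for each word of the transcript, as a list of lists. Each inner list holds three elements: the word, its start time, and its end time in seconds.
final: a value of true means the service guarantees that it will send no further updates to the current or any previous results; false means the service may still send updates to the results.
warnings: an array of warning messages about invalid parameters included in the request. Each warning includes a descriptive message and a list of the invalid argument strings.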
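To show how these fields fit together, here is a minimal Python 3 sketch that pulls the transcript, the low-confidence words, and the per-word durations out of a response of this shape. `summarize_result` and its `threshold` parameter are hypothetical, and the sample data is a trimmed copy of the output (first three words only), not a fresh API response.

```python
def summarize_result(response, threshold=0.5):
    """Collect the transcript, low-confidence words and word durations."""
    # Use the first alternative of the first result.
    best = response["results"][0]["alternatives"][0]
    low_confidence = [word for word, score in best.get("word_confidence", [])
                      if score < threshold]
    # Each timestamps entry is [word, start, end]; duration is end minus start.
    durations = [(word, round(end - start, 2))
                 for word, start, end in best.get("timestamps", [])]
    return {
        "transcript": best["transcript"].strip(),
        "low_confidence": low_confidence,
        "durations": durations,
    }


# Trimmed copy of the recognize output (first three words only).
sample = {
    "results": [{
        "alternatives": [{
            "word_confidence": [["我", 0.259], ["银行卡", 0.685], ["丢了", 0.561]],
            "confidence": 0.668,
            "transcript": "我 银行卡 丢了 ",
            "timestamps": [["我", 1.01, 1.29], ["银行卡", 1.29, 2.03], ["丢了", 2.03, 2.49]],
        }],
        "final": True,
    }],
    "result_index": 0,
}
summary = summarize_result(sample)
print(summary["transcript"])
print(summary["low_confidence"])
print(summary["durations"])
```

Words that fall below the threshold are natural candidates for manual review or for asking the speaker to repeat.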
