Playing with Google Speech API

My use case: transcribing 1-hour user interview audio into text.

Result: still looking for a solution


I used the Ruby gem google-api-client to access the API.

I set GOOGLE_ACCOUNT_TYPE, GOOGLE_CLIENT_ID, GOOGLE_CLIENT_EMAIL, GOOGLE_PRIVATE_KEY env variables, based on the Service Account key I got from the Google Developer Console.
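Before making a request, it can help to confirm that all four variables are actually set, since a missing one tends to surface later as a vague authentication error. Here is a minimal sketch; the helper name missing_google_env is hypothetical and not part of any Google gem.

```ruby
# Names of the environment variables listed above.
REQUIRED_GOOGLE_ENV = %w[
  GOOGLE_ACCOUNT_TYPE
  GOOGLE_CLIENT_ID
  GOOGLE_CLIENT_EMAIL
  GOOGLE_PRIVATE_KEY
].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank. Accepts any hash-like object so it can be tried without
# touching the real environment.
def missing_google_env(env = ENV)
  REQUIRED_GOOGLE_ENV.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_google_env
warn "Missing Google credentials: #{missing.join(', ')}" unless missing.empty?
```

Running this at the top of the script gives a readable warning up front instead of a failure deep inside the auth library.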

I used this code to test an API request:

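require 'googleauth'
require 'google/apis/speech_v1beta1'

audio_file_path = 'brooklyn.wav'
# Service class from the gem's speech_v1beta1 module
speech_service = Google::Apis::SpeechV1beta1::SpeechService.new

# The scope here is an assumption: cloud-platform is the general Google Cloud scope
speech_service.authorization = Google::Auth.get_application_default(
  ['https://www.googleapis.com/auth/cloud-platform']
)

request = Google::Apis::SpeechV1beta1::AsyncRecognizeRequest.new
request.audio = { content: File.binread(audio_file_path) } # raw audio bytes
request.config = {
  encoding: "LINEAR16", # or "FLAC"
  sample_rate: 16000 # must match the file, e.g. 44100
}

# Make the Async request
response = speech_service.async_recognize_speech request

# Then, get the result of the Async job, looked up by its operation name
status = speech_service.get_operation response.name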

Once the job has finished, status should hold the transcription response from the Google Speech API, which contains the transcribed text of the uploaded audio snippet.
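Because the request is asynchronous, the operation may not be finished the first time it is fetched. The sketch below polls until it is done and then pulls out the top transcript for each result. Both helper names (wait_for_operation, transcripts) are hypothetical, and the response shape (a hash with "results", each holding "alternatives" with a "transcript") is an assumption based on how the beta API returned results, so treat it as a starting point, not a definitive implementation.

```ruby
# Hypothetical helper: calls the given block until the returned operation
# reports done, waiting `delay` seconds between attempts.
# Returns the finished operation, or nil if it never completes.
def wait_for_operation(max_attempts: 10, delay: 5)
  max_attempts.times do
    operation = yield
    return operation if operation.done
    sleep delay
  end
  nil
end

# Hypothetical helper: collects the first (most likely) transcript of each
# result. Assumes a hash shaped like
#   { 'results' => [{ 'alternatives' => [{ 'transcript' => '...' }] }] }
def transcripts(response)
  Array(response && response['results']).map do |result|
    best = Array(result['alternatives']).first
    best && best['transcript']
  end.compact
end

# Usage with the service from the post (not run here):
#   status = wait_for_operation { speech_service.get_operation(response.name) }
#   puts transcripts(status.response) if status
```

Passing the fetch as a block keeps the polling logic independent of the Google client, so it can be exercised with a stub object.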

The Google Speech API is currently in beta, so I expect it to have a narrow use case. And yes, the sample sound of the Brooklyn Bridge works well: a short, clear, concise snippet of audio. However, an open-ended ~40-minute conversation submitted to the Speech API returned an array of possible one-word transcriptions, each sorta funny, but ultimately abysmally inaccurate.


A speech API. Amazing!

Not a transcription API, oh well.

About Afomi

Afomi is the digital sandbox of Ryan Wold, who is always evolving this to better share inspirations and aspirations.

About Ryan

Ryan is a systems-thinking Product Developer and Designer who practices agile, test-driven, and lean continuous software delivery, while solving problems with people.