Processing audio or speech data can be challenging in any environment. In this project, Servian built a simple pipeline on Google Cloud Platform (GCP) that turns audio into a store of text files.

Servian was engaged to help create a speech-to-text pipeline for a full-service, audio-on-demand company. The client helps creators and brands produce, host, share, track and monetise content. The speech-to-text pipeline that Servian built on GCP allowed our client to expand both capability and capacity.

problems & pain points

A large and growing library of audio assets can be challenging to manage. Servian provided a solution that uses the Google Speech API to generate transcripts, which can feed automatic summaries or provide more detailed insights. These transcripts were useful both for tracking content for advertising and for the audio creators themselves.

One challenge in this project was implementing the pipeline on a Google Cloud account owned entirely by Servian's client, with all IP belonging to them. It was also important to test statistically significant data samples periodically and to develop an API endpoint.

outcomes and results

The work undertaken by Servian on GCP gave our client a fully implemented pipeline that takes as inputs the URL of an audio file, the expected languages and the locale of the audio file. The system also calls back a supplied URL once transcription of an audio file is complete.

The API endpoint returned the job ID on success, or an error code indicating the cause of the error, which supported quality control among other things. With this fully functional endpoint on GCP, the client had a complete speech-to-text pipeline. Turning speech into structured text files opens the way to further processing, natural language algorithms and many other possibilities on Google Cloud Platform.
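As a rough illustration of the endpoint contract described above, the following Python sketch validates a request and returns either a job ID or an error code. It is not the client's implementation: the field names (audio_url, languages, locale, callback_url), the error code strings and the submit function are all hypothetical, and the actual transcription and callback steps are left out.

```python
import uuid
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class TranscriptionRequest:
    # Hypothetical field names; the case study only lists the inputs.
    audio_url: str        # URL of the audio file to transcribe
    languages: List[str]  # expected languages, e.g. ["en"]
    locale: str           # locale of the audio file, e.g. "en-AU"
    callback_url: str     # URL to call once transcription is complete


def _is_http_url(value: str) -> bool:
    """Return True if value looks like an absolute http(s) URL."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def submit(request: TranscriptionRequest) -> dict:
    """Validate a request and return a job ID or an error code.

    The error codes are illustrative assumptions, not the real ones.
    """
    if not _is_http_url(request.audio_url):
        return {"error": "INVALID_AUDIO_URL"}
    if not request.languages:
        return {"error": "MISSING_LANGUAGES"}
    if not request.locale:
        return {"error": "MISSING_LOCALE"}
    if not _is_http_url(request.callback_url):
        return {"error": "INVALID_CALLBACK_URL"}
    # A real pipeline would enqueue the transcription job here and
    # later notify callback_url; this sketch only issues the job ID.
    return {"job_id": uuid.uuid4().hex}
```

Returning a distinct error code per failed check is one way a caller could tell a bad audio URL apart from a missing locale, which is the kind of quality control the endpoint is described as supporting.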

platform used

To achieve the outcomes required, the following GCP products were used: