Software development: transcribe speech to text using public APIs

Cancelled · Posted 7 years ago · Paid on delivery

1. Task

You are asked to write a program that:

- takes an audio file as input,

- chops it into clips at sentence boundaries,

- sends these audio clips one by one to three different public speech recognition services,

- saves each audio clip, together with the text transcribed by each of the above services, into a MySQL database with the following fields:

timestamp, length, audio clip, Google, Baidu, iFlyTek, flag

where:

timestamp (4-byte time): start time of the audio clip in the original audio file

length (4-byte integer): audio clip length in milliseconds

audio clip (binary): 16-bit, 16 kHz, single-channel PCM

Google transcription (text): UTF-8

Baidu transcription (text): UTF-8

iFlyTek transcription (text): UTF-8

flag (integer): 0 if all three transcriptions are the same, 1 if exactly two match, 2 if all three differ.
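The flag and length fields can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `match_flag` and `clip_length_ms` are hypothetical, and transcriptions are compared as exact strings, since the posting does not say whether whitespace or punctuation should be normalized first.

```python
def match_flag(google: str, baidu: str, iflytek: str) -> int:
    """Return 0 if all three transcriptions agree, 1 if exactly two agree, 2 if all differ.

    Assumption: exact string comparison, with no normalization applied.
    """
    distinct = len({google, baidu, iflytek})
    return distinct - 1  # 1 distinct text -> 0, 2 -> 1, 3 -> 2


def clip_length_ms(pcm: bytes, sample_rate: int = 16000, bytes_per_sample: int = 2) -> int:
    """Length in milliseconds of a single-channel PCM buffer (16-bit, 16 kHz by default)."""
    samples = len(pcm) // bytes_per_sample
    return samples * 1000 // sample_rate
```

At 16-bit, 16 kHz mono, one millisecond of audio is 32 bytes, so a 7-second clip is 224,000 bytes.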

2. Audio Source

The audio may be in mp3, m4a, aac, ogg, or wma format. It is extracted from YouTube videos; our target content is educational lectures.

One example is this YouTube video: [login to view URL]

You can extract the audio with [login to view URL]

The downloadable mp3 result is at [login to view URL]

You can use this for any YouTube content.

3. Audio Segmentation

If you view the audio file in a waveform tool (there are many available), you can see the separation between silence and voice. Some silences are merely word boundaries, or even just syllable boundaries. The rule we ask you to implement is: cut when either the silence is long enough, or the "sentence" is already 7 seconds long. In the latter case, chop at the locally longest silence gap.
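One possible reading of this rule, sketched in Python. It assumes the silent stretches have already been detected and are given as `(start_ms, end_ms)` pairs. The name `choose_cuts`, the 400 ms "long enough" threshold, and cutting at the midpoint of a gap are all assumptions for illustration; the posting only fixes the 7-second limit.

```python
from typing import List, Tuple

Gap = Tuple[int, int]  # (start_ms, end_ms) of one silent stretch


def _mid(gap: Gap) -> int:
    return (gap[0] + gap[1]) // 2


def choose_cuts(gaps: List[Gap], total_ms: int,
                min_silence_ms: int = 400,      # assumed "long enough" threshold
                max_sentence_ms: int = 7000) -> List[int]:
    """Return cut positions in ms, each placed at the midpoint of a silence gap."""
    cuts: List[int] = []
    seg_start = 0
    pending: List[Gap] = []  # short gaps seen since the last cut

    def force_cuts(limit_ms: int) -> None:
        # While the running sentence is too long, cut at its longest short gap.
        nonlocal seg_start, pending
        while pending and limit_ms - seg_start >= max_sentence_ms:
            longest = max(pending, key=lambda g: g[1] - g[0])
            seg_start = _mid(longest)
            cuts.append(seg_start)
            pending = [g for g in pending if _mid(g) > seg_start]

    for gap in sorted(gaps):
        if gap[1] - gap[0] >= min_silence_ms:
            force_cuts(gap[0])        # split an over-long stretch before this pause
            seg_start = _mid(gap)     # a long silence is always a boundary
            cuts.append(seg_start)
            pending = []
        else:
            pending.append(gap)
            force_cuts(_mid(gap))
    force_cuts(total_ms)              # handle the tail of the file
    return cuts
```

For example, with gaps `[(1000, 1100), (3000, 3300), (6900, 7000), (9000, 9600)]` in a 12-second file, the 600 ms pause is a boundary on its own, and the 9 seconds before it are split at the 300 ms gap, the longest one in that stretch.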

I see sentence boundary identification as the most challenging part for those not familiar with audio signal processing, which is why I outlined the logic above. The next question is how to actually calculate "silence".

Please follow the methods listed on this page: [login to view URL]
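The linked page is not reproduced here, so the following is not necessarily one of its methods. It is a minimal sketch of one common approach, short-time RMS energy thresholding, written in Python. The function name `find_silence_gaps`, the 20 ms frame size, and the RMS threshold of 500 are assumptions that would need tuning on real lecture audio.

```python
import array
import math
import sys
from typing import List, Tuple


def find_silence_gaps(pcm: bytes, sample_rate: int = 16000,
                      frame_ms: int = 20,
                      rms_threshold: float = 500.0) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each run of frames whose RMS is below the threshold.

    Input is assumed to be 16-bit little-endian single-channel PCM.
    """
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) // 2 * 2])  # drop a trailing odd byte, if any
    if sys.byteorder == "big":
        samples.byteswap()                       # PCM data is little-endian

    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    gaps: List[Tuple[int, int]] = []
    gap_start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        t = i * frame_ms
        if rms < rms_threshold:
            if gap_start is None:
                gap_start = t                    # a silent run begins
        elif gap_start is not None:
            gaps.append((gap_start, t))          # the silent run ends
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, n_frames * frame_ms))
    return gaps
```

A fixed threshold is the simplest choice; a threshold relative to the recording's overall level is likely to be more robust across lectures with different microphone gain.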

As one of the acceptance criteria for this project, we will randomly select 50 audio clips (using a random number generator on the Internet), listen to them, and confirm that the sentence boundary error rate is less than 5%.
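In concrete terms, a rate below 5% on 50 clips means at most 2 clips with a boundary error (2/50 = 4%, while 3/50 = 6%). The Python sketch below illustrates the check. It uses a local pseudo-random generator as a stand-in for the online one mentioned above, counts one error per clip, and uses hypothetical function names; all three are assumptions.

```python
import random
from typing import List, Optional


def sample_clip_ids(total_clips: int, sample_size: int = 50,
                    seed: Optional[int] = None) -> List[int]:
    """Pick clip indices to audit, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_clips), min(sample_size, total_clips)))


def passes_acceptance(boundary_errors: int, sample_size: int = 50,
                      max_error_rate: float = 0.05) -> bool:
    """True if the observed error rate is strictly below the limit."""
    return boundary_errors / sample_size < max_error_rate
```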

4. Speech Recognition

The three speech recognition engines are:

Google:

[login to view URL]

Baidu:

A Python wrapper for the Baidu Yuyin API

[login to view URL]

[login to view URL]

iFlyTek (Xunfei):

Integrate iflytek SDK to Implement Chinese Voice Recognition in AOSP [login to view URL]

Note that integration with all three of the above speech recognition engines is required. That is, you need to do three integrations, each with its own complexities, such as applying for a free account and receiving tokens/keys.

For both Baidu and iFlyTek, you are encouraged to use Google Translate, as much of the content is in Chinese.

Both Google and Baidu offer simple REST APIs, which can be called from essentially any platform and language. The iFlyTek API, however, is really an SDK, and the best example I found is the Android version given above. Taken together, your only choice is an Android application.

5. Implementation

We are open to suggestions, but given the above, we expect a pure Android APK implementation.

I will first push/copy several extracted/converted audio files onto an Android phone or tablet, then run your Android APK and get the results in a corresponding set of files, either in a MySQL database or simply in CSV format. I will then pull/copy these files back to my computer.
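For the CSV option, one possible layout is sketched below in Python, purely to illustrate the file format (the deliverable itself is expected to be an Android app). Since binary audio does not fit well in a CSV cell, this sketch assumes each clip is saved as a separate file and referenced by name. The column names and the `clip_file` convention are assumptions, not part of the posting.

```python
import csv
import io
from typing import Dict, Iterable

# Assumed column names, mirroring the fields listed in the Task section.
FIELDS = ["timestamp_ms", "length_ms", "clip_file", "google", "baidu", "iflytek", "flag"]


def rows_to_csv(rows: Iterable[Dict[str, object]]) -> str:
    """Serialize result rows to CSV text; fields containing commas are quoted."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

When writing the text to disk, encode it as UTF-8 so that Chinese transcriptions survive the round trip.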

You shall provide a way for me to jump to a randomly chosen clip, play its audio, read the transcribed text, and place the clip into, say, the Google web service to see the results.
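Raw PCM has no header, so most players cannot open a stored clip directly. One way to make a clip playable is to wrap it in a WAV container, sketched here with Python's standard `wave` module. The function name `pcm_to_wav` is hypothetical, and the input is assumed to be the 16-bit, 16 kHz, single-channel PCM described in the Task section.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container so ordinary players can open it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # single channel
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The result is the original samples preceded by a 44-byte header, and it can be written to a `.wav` file as-is.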

Android, Audio Services, Software Development, Statistical Analysis, Web Services

Project ID: #11391108

About the project

15 proposals · Remote project · Active 7 years ago

15 freelancers are bidding an average of $582 for this job

Shopify

I want to discuss this project with you further, let me know the best suitable time for you to schedule the meeting. Feel free to message me at any time, I used to be online 14 hrs in a day on this website so probably…

$773 USD in 10 days
(11 reviews)
6.5
rukeysolutions

Do you have any API in mind to implement?

$555 USD in 5 days
(8 reviews)
4.6
sanjidayarathne

I am a person with strong analytical ability in Mathematics/Statistics/Economics/Finance, having a BSc. (specialized in Statistics), MBA (specialized in Finance), and MSc. (specialized in Financial Mathematics). On time d…

$250 USD in 10 days
(11 reviews)
4.0
TenSidesDev

I'm interested, but no project description so I don't know what to write here. Message me back with info if you're up for it. Cheers, Alek

$555 USD in 10 days
(1 review)
3.0
Walkingdreams

Hello, professional developers with similar expertise here. We are posting our bid as an expression of interest and would appreciate further discussion on the private message board. We are waiting for your message to communi…

$526 USD in 10 days
(1 review)
2.4
satish92

A proposal has not yet been provided

$500 USD in 8 days
(0 reviews)
2.4