In part 2 about speech recognition we do the reverse: instead of openly calling Google Search or integrating the speech recognition intent (which prominently shows the Google logo/splash screen), we call the same recognition method in a headless way, making the experience more seamless. To make it meaningful in the aviation context, we will request BA flight information (through the IAG webservices) by voice.
(Please note: the code snippets below are incomplete and only highlight the key methods. The complete source code can be found on GitHub; see the link at the end of the post.)
Calling the speech recognizer
sr = SpeechRecognizer.createSpeechRecognizer(this);
sr.setRecognitionListener(new listener());
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 10);
sr.startListening(intent);
Implementation of the speech recognition class
class listener implements RecognitionListener {
    public void onResults(Bundle results) {
        ArrayList<String> data = results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < data.size(); i++) {
            Log.d(TAG, "result " + data.get(i));
            result.append(i).append(":").append(data.get(i)).append("\n");
        }
        textViewResult.setText(result);
    }
    // The remaining RecognitionListener callbacks (onReadyForSpeech, onError, ...)
    // must be implemented as well; they are omitted here for brevity.
}
With this approach we still get the usual two audible tones signalling the beginning of the speech recognition phase and its end (after a short timeout).
If we want to create a hands-free user experience, where the user does not have to touch the screen at all, we can make use of the play or call button usually found on headsets. We can capture the event that is fired when these hardware buttons are pressed.
Capture hardware button events
@Override
public boolean onKeyDown(int keyCode, KeyEvent event) {
    Log.v(TAG, event.toString());
    if (keyCode == KeyEvent.KEYCODE_HEADSETHOOK || keyCode == KeyEvent.KEYCODE_MEDIA_PLAY) {
        Log.i(TAG, event.toString());
        listenHeadless();
        return true;
    }
    return super.onKeyDown(keyCode, event);
}
Text-To-Speech
The missing link to the hands-in-the-pocket experience is an audio response from the app through the headset. We will add the standard Android Text-To-Speech (TTS) implementation.
// Note: speak() should only be called once the engine has finished initializing;
// pass a TextToSpeech.OnInitListener instead of null to be notified when it is ready.
textToSpeech = new TextToSpeech(getApplicationContext(), null);
...
textToSpeech.setPitch(1.0f);
textToSpeech.setSpeechRate(0.80f);
int speechStatus = textToSpeech.speak(textToSpeak, TextToSpeech.QUEUE_FLUSH, null, null);
if (speechStatus == TextToSpeech.ERROR) {
    Log.e("TTS", "Error in converting Text to Speech!");
}
A remark about text-to-speech engines: the default engines are not very pleasant to listen to. For an application that speaks to users repeatedly or over a long time, I recommend looking at commercial TTS engines (e.g. Nuance, CereProc, ...); check out the sound samples at the end of the post. TTS engines are usually produced per language, so a multilingual application needs to cater for every language it supports.
Adding Webservice
To make this sample app more relevant to the airport business, we use the recognized text to request flight data. For this purpose I will reuse the BA webservice call from an earlier post. Do not forget to add the internet access permission to the app.
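As a minimal sketch, the webservice call boils down to building a query URL from the extracted flight number and date. The host, path, and parameter names below are placeholders, not the actual BA/IAG endpoint; substitute the values from the earlier post.

```java
// Sketch of building the request URL for a flight-status lookup.
// BASE_URL and the parameter names are assumptions -- replace them with the
// real BA/IAG webservice endpoint and credentials from the earlier post.
public class FlightRequestBuilder {

    private static final String BASE_URL =
            "https://api.example.com/flights"; // placeholder endpoint

    /** Builds a query URL for a flight number (e.g. "BA123") and a date (yyyy-MM-dd). */
    public static String buildUrl(String flightNumber, String date) {
        return BASE_URL
                + "?flightNumber=" + flightNumber
                + "&departureDate=" + date;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("BA123", "2019-05-01"));
    }
}
```

The resulting URL can then be fetched with any HTTP client (e.g. HttpURLConnection) on a background thread, since network calls are not allowed on the Android main thread.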
Here we hit the first challenge straight away: we receive only a text version of the spoken request. Since we won't implement semantic or contextual speech recognition, we have to apply regular expressions to find the key elements, such as the flight number and the date. Humans phrase the request in dozens of possible variations, compounded by the varying interpretations of the speech-to-text engine; some of these are listed in the regex test screen (link).
To keep it simple, we let the user request flight information for one particular British Airways flight number between yesterday and tomorrow. We look for a two-letter, one-to-four-digit combination in the text, plus the keyword yesterday/tomorrow/today (with no keyword meaning today).
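The extraction described above can be sketched roughly as follows. The pattern and the class/method names are my own illustration, not the code from the repository; it assumes a two-letter code followed by one to four digits, with an optional space that the speech engine often inserts ("BA 123").

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for the recognized text: finds a flight number and maps
// the day keyword to an offset relative to today.
public class RequestParser {

    // Two letters, optional space, one to four digits (e.g. "BA123" or "ba 123").
    private static final Pattern FLIGHT_PATTERN =
            Pattern.compile("\\b([A-Za-z]{2})\\s?(\\d{1,4})\\b");

    /** Returns the normalized flight number (e.g. "BA123"), or null if none found. */
    public static String extractFlight(String spoken) {
        Matcher m = FLIGHT_PATTERN.matcher(spoken);
        if (m.find()) {
            return (m.group(1) + m.group(2)).toUpperCase();
        }
        return null;
    }

    /** Maps the day keyword to an offset: -1 for yesterday, +1 for tomorrow, 0 otherwise. */
    public static int extractDayOffset(String spoken) {
        String s = spoken.toLowerCase();
        if (s.contains("yesterday")) return -1;
        if (s.contains("tomorrow")) return 1;
        return 0; // "today" or no keyword
    }
}
```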
To push the scenario further, we let the TTS engine read the response to the user after selecting the relevant information from the JSON-formatted webservice response.
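A rough sketch of that last step, under the assumption that the response contains a top-level string field for the flight status (the field name "FlightStatus" is a placeholder for whatever the actual payload uses). In the app itself, a proper JSON parser (e.g. the org.json classes bundled with Android) should be used instead of this simplistic regex lookup.

```java
// Illustrative only: pull one field out of a JSON response and compose the
// sentence handed to TextToSpeech.speak().
public class ResponseToSpeech {

    /** Extracts the string value of a top-level JSON field, or null if absent. */
    public static String extractField(String json, String field) {
        java.util.regex.Matcher m = java.util.regex.Pattern
                .compile("\"" + field + "\"\\s*:\\s*\"([^\"]*)\"")
                .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Builds the sentence that is passed to the TTS engine. */
    public static String toSentence(String flight, String status) {
        return "Flight " + flight + " is " + status + ".";
    }
}
```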