
JP7192561B2 - Audio output device and audio output method

Info

Publication number: JP7192561B2
Application number: JP2019028487A
Authority: JP
Inventors: 和也西村; 義博大栄; 直貴上野山; 博文神丸
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2019-02-20
Filing date: 2019-02-20
Publication date: 2022-12-20
Anticipated expiration: 2039-02-20
Also published as: US11295742B2; US20200265837A1; JP2020134328A

Description

The present invention relates to an audio output device and an audio output method for outputting speech.

In recent years, many vehicles have been equipped with navigation devices that provide travel guidance. Patent Literature 1 discloses a technique that acquires medical information about the driver and, when the driver's hearing is impaired, makes the navigation device's guidance voice louder than usual.

Patent Literature 1: JP 2009-254544 A

When there is loud noise inside or outside the vehicle cabin, occupants may have difficulty hearing the navigation device's voice. In addition, when route-guidance speech includes information about a landmark at a guidance point, the occupant may be unable to see that landmark. In these cases, it is difficult for the occupant to fully understand what was said. It is therefore desirable to output easier-to-understand speech when an occupant has difficulty understanding the navigation device's speech.

The present invention has been made in view of these circumstances, and its object is to provide an audio output device and an audio output method capable of outputting easier-to-understand speech when an occupant has difficulty understanding the content of output speech.

To solve the above problem, an audio output device according to one aspect of the present invention includes: an acquisition unit that acquires an utterance of a vehicle occupant; a determination unit that determines whether the acquired utterance is an ask-back (an utterance asking for the preceding speech to be repeated or clarified, such as "Huh?"); a classification unit that classifies the type of the ask-back when the utterance is determined to be an ask-back; an output unit that outputs speech corresponding to the classified ask-back type, based on the content of the speech that was asked back about; and an image recognition unit that performs image recognition on an image of the vehicle interior to detect an occupant who may be asleep. When the ask-back type indicates that the occupant missed the speech, the output unit re-outputs the speech asked back about at a higher volume if the image recognition unit does not detect a possibly sleeping occupant, and re-outputs it at the same volume if the image recognition unit detects a possibly sleeping occupant.

According to this aspect, the occupant's ask-back is classified by type, and speech corresponding to the classified type is output based on the content of the speech asked back about. Therefore, when the occupant has difficulty understanding the speech of the audio output device and asks back, easier-to-understand speech can be output.

The audio output device may include a specifying unit that specifies the content of the speech asked back about, based on the speech output from the output unit immediately before the ask-back.

When the ask-back type indicates that the occupant did not understand the meaning of the speech content, the output unit may output different speech related to the content of the speech asked back about.

When the ask-back type indicates that the occupant missed the speech, the output unit may re-output the speech asked back about.

When the ask-back type indicates that the occupant could not hear the speech, the output unit may re-output the speech asked back about at a higher volume.

Another aspect of the present invention is an audio output method. This method is executed by a computer and includes: an acquisition step of acquiring an utterance of a vehicle occupant; a determination step of determining whether the acquired utterance is an ask-back; a classification step of classifying the type of the ask-back when the utterance is determined to be an ask-back; an image recognition step of performing image recognition on an image of the vehicle interior to detect an occupant who may be asleep; and an output step of outputting speech corresponding to the classified ask-back type, based on the content of the speech asked back about. In the output step, when the ask-back type indicates that the occupant missed the speech, the speech asked back about is re-output at a higher volume if no possibly sleeping occupant is detected in the image recognition step, and is re-output at the same volume if a possibly sleeping occupant is detected in the image recognition step.

According to the present invention, easier-to-understand speech can be output when an occupant has difficulty understanding the content of output speech.

FIG. 1 is a block diagram of a navigation device according to an embodiment.
FIG. 2 is a flowchart showing audio output processing of the audio output device of FIG. 1.

FIG. 1 is a block diagram of a navigation device 10 according to an embodiment. The navigation device 10 is mounted on a vehicle, which is an automobile. The navigation device 10 includes a microphone 12, a speaker 14, a navigation unit 16, and an audio output device 18.

The microphone 12 is installed in the vehicle cabin, converts sound in the cabin, such as occupants' utterances, into an audio signal, and outputs the audio signal to the audio output device 18. The speaker 14 is installed in the vehicle cabin, converts the audio signal output from the audio output device 18 into sound, and outputs that sound.

The navigation unit 16 sets a guidance route using well-known techniques, displays the guidance route and a map on a display unit (not shown), has the audio output device 18 output travel-guidance speech, and provides travel guidance along the guidance route. The travel-guidance speech includes information on landmarks at points where guidance is to be given, such as intersections. When the vehicle reaches such a point on the guidance route, the navigation unit 16 gives spoken guidance such as "Turn right shortly. Convenience store ABC is the landmark." The navigation unit 16 may also have the audio output device 18 output speech conveying various information that improves the driver's convenience, such as traffic-jam and road-construction information on the route, the weather forecast at the destination, and information on facilities near the current location.

The audio output device 18 includes a processing unit 20 and a storage unit 22. The processing unit 20 includes an acquisition unit 30, a determination unit 32, a classification unit 34, a specifying unit 36, and an output unit 38. In hardware terms, the processing unit 20 can be implemented by the CPU, memory, and other LSIs of any computer; in software terms, it is implemented by a program loaded into memory or the like. The figure depicts the functional blocks realized by their cooperation. Those skilled in the art will therefore understand that these functional blocks can be implemented in various forms: by hardware only, by software only, or by a combination of the two.

The output unit 38 outputs speech through the speaker 14 or the like, based on audio information, such as travel-guidance information, supplied from the navigation unit 16.

The acquisition unit 30 acquires an utterance of a vehicle occupant via the microphone 12. When an occupant cannot hear the travel-guidance speech or other speech of the navigation device 10, or cannot understand its content, the occupant may say something like "Huh?", that is, ask back. The acquisition unit 30 performs speech recognition on the occupant's utterance based on the audio signal output from the microphone 12, acquires the utterance as text data, and supplies the text data to the determination unit 32 and the classification unit 34.

The storage unit 22 stores in advance, as a database, text data for a plurality of ask-backs. The determination unit 32 refers to this database to determine whether the utterance acquired by the acquisition unit 30 is an ask-back, and supplies the determination result to the classification unit 34 and the specifying unit 36. If the text data of the utterance matches ask-back text data in the database, the determination unit 32 determines that the utterance is an ask-back; if it does not match, the determination unit 32 determines that the utterance is not an ask-back. The determination unit 32 may instead determine that the utterance is an ask-back only when its text data matches ask-back text data in the database and the end of the utterance has rising intonation. This can improve determination accuracy.
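A minimal sketch of the database-lookup determination described above, assuming the recognized utterance arrives as a string and that an upstream prosody detector (hypothetical, not described in the source) may report whether the utterance ends with rising intonation. The phrase set and the whitespace/case normalization are illustrative assumptions; the source only says the text must match.

```python
from typing import Optional

# Hypothetical ask-back phrase database (stand-in for the storage unit 22).
ASK_BACK_PHRASES = {
    "where?",
    "which?",
    "huh? what did you say?",
    "what? it's hard to hear",
    "what?",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an assumed normalization step)."""
    return " ".join(text.lower().split())


def is_ask_back(utterance_text: str, rising_ending: Optional[bool] = None) -> bool:
    """Return True if the utterance matches a stored ask-back phrase.

    If rising_ending is supplied, also require rising intonation at the end
    of the utterance, corresponding to the optional stricter check.
    """
    matched = normalize(utterance_text) in ASK_BACK_PHRASES
    if rising_ending is None:
        return matched
    return matched and rising_ending
```

For example, `is_ask_back("Huh? What did you say?")` matches, while `is_ask_back("What?", rising_ending=False)` is rejected under the stricter rule.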

When the utterance is determined to be an ask-back, the specifying unit 36 specifies the content of the speech asked back about, based on the speech output from the output unit 38 immediately before the ask-back, and supplies the specified content to the output unit 38. This makes it easier to correctly identify which speech content the occupant was asking about.
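The "immediately before" rule can be sketched as a unit that simply remembers the last speech output. The class and method names below are hypothetical, and storing only the text (not audio data or timestamps) is a simplifying assumption.

```python
from typing import Optional


class SpecifyingUnit:
    """Hypothetical sketch: remembers the most recent output speech."""

    def __init__(self) -> None:
        self._last_output: Optional[str] = None

    def record_output(self, speech_text: str) -> None:
        # Assumed to be called every time the output unit speaks.
        self._last_output = speech_text

    def specify_target(self) -> Optional[str]:
        # The speech output immediately before the ask-back is its target;
        # None means nothing has been output yet.
        return self._last_output
```

A real device might also discard the stored speech after some time, so that an ask-back long after the last guidance is not tied to stale speech; the source does not address this.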

When the utterance is determined to be an ask-back, the classification unit 34 classifies the type of the ask-back and supplies the classified type to the output unit 38. The ask-back types include a first type indicating that the occupant did not understand the meaning of the speech content, a second type indicating that the occupant missed the speech, and a third type indicating that the occupant could not hear the speech.

The storage unit 22 also stores in advance, as a database, the correspondence between each item of ask-back text data and an ask-back type. Each ask-back is associated with exactly one type. For example, ask-backs such as "Where?" and "Which?" are associated with the first type. Ask-backs such as "Huh? What did you say?" are associated with the second type. Ask-backs such as "What? It's hard to hear" are associated with the third type. An ask-back such as "What?", which could fit any of the first to third types and is therefore hard to classify, is associated with the second type. Ask-backs that fit neither the first type nor the third type are also associated with the second type.

The classification unit 34 refers to the database in the storage unit 22, identifies the type of the stored ask-back text data that matches the text data of the utterance, and takes that type as the ask-back type.
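The phrase-to-type database and the lookup can be sketched as a dictionary keyed by normalized text. The enum names, the example phrases, and the normalization are illustrative assumptions; the fallback to the second type mirrors the rule that ask-backs fitting neither the first nor the third type are treated as the second type.

```python
from enum import Enum


class AskBackType(Enum):
    NOT_UNDERSTOOD = 1  # first type: meaning of the content not understood
    MISSED = 2          # second type: occupant missed the speech
    INAUDIBLE = 3       # third type: occupant could not hear the speech


# Hypothetical phrase-to-type table (stand-in for the storage unit 22).
ASK_BACK_TYPES = {
    "where?": AskBackType.NOT_UNDERSTOOD,
    "which?": AskBackType.NOT_UNDERSTOOD,
    "huh? what did you say?": AskBackType.MISSED,
    "what? it's hard to hear": AskBackType.INAUDIBLE,
    "what?": AskBackType.MISSED,  # ambiguous, so mapped to the second type
}


def classify(utterance_text: str) -> AskBackType:
    key = " ".join(utterance_text.lower().split())
    # Anything not stored as first or third type falls back to the second type.
    return ASK_BACK_TYPES.get(key, AskBackType.MISSED)
```

Because each phrase maps to exactly one type, classification is a single lookup, which is consistent with the simplicity argument made later for this embodiment.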

The output unit 38 outputs, through the speaker 14 or the like, speech corresponding to the ask-back type classified by the classification unit 34, based on the content of the speech asked back about, as supplied from the specifying unit 36.

When the ask-back type is the first type, the output unit 38 outputs different speech related to the content of the speech asked back about. The storage unit 22 stores in advance, as a database, one or more items of alternative speech data associated with each speech content that may be asked back about. For example, if the speech asked back about is "Convenience store ABC is the landmark," it is associated with speech data that rephrases it, such as "The red sign is the landmark" and "The red building is the landmark." In this example, the sign and building of convenience store ABC are assumed to be red. If the occupant could not understand "Convenience store ABC is the landmark" because, for example, the occupant cannot see the letters "Convenience store ABC" and does not know the color of its sign or building, the occupant then hears related speech such as "The red sign is the landmark" and may be able to understand it.

When the ask-back type is the second type, the output unit 38 re-outputs the speech asked back about. Thus, an occupant who missed the speech can hear it again and more easily grasp its content. Because the speech is also re-output for ask-backs that are hard to classify and for ask-backs that fit neither the first nor the third type, the occupant may be able to understand the speech content even when the intent of the ask-back is hard to determine.

When the ask-back type is the third type, the output unit 38 re-outputs the speech asked back about at a higher volume. Thus, an occupant who could not hear the speech because of ambient noise, the occupant's own hearing loss, or the like can hear it again at a more audible volume and more easily grasp its content.
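The three output behaviors can be sketched as a function returning the speech text and playback volume. The rephrasing table, the integer volume scale, the step size `VOLUME_STEP`, and the choice of the first stored alternative are all assumptions; the source gives no volume values and does not say which alternative is chosen. Falling back to a plain replay when no related speech is stored is likewise an assumed behavior.

```python
from enum import Enum
from typing import Dict, List, Tuple


class AskBackType(Enum):
    NOT_UNDERSTOOD = 1
    MISSED = 2
    INAUDIBLE = 3


# Hypothetical rephrasing table: target speech -> related alternative speech.
REPHRASINGS: Dict[str, List[str]] = {
    "Convenience store ABC is the landmark.": [
        "The red sign is the landmark.",
        "The red building is the landmark.",
    ],
}

VOLUME_STEP = 3  # assumed increase; the source only says "higher volume"


def respond(target: str, ask_type: AskBackType, volume: int) -> Tuple[str, int]:
    """Return the (speech text, volume) to output for one ask-back."""
    if ask_type is AskBackType.NOT_UNDERSTOOD:
        alternatives = REPHRASINGS.get(target)
        if alternatives:
            return alternatives[0], volume  # related speech, same volume
        return target, volume  # assumed fallback: no related speech stored
    if ask_type is AskBackType.MISSED:
        return target, volume  # replay unchanged
    return target, volume + VOLUME_STEP  # INAUDIBLE: replay louder
```

Here `respond("Convenience store ABC is the landmark.", AskBackType.INAUDIBLE, 10)` replays the same text at volume 13.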

Next, the overall operation of the audio output device 18 configured as above will be described. FIG. 2 is a flowchart showing the audio output processing of the audio output device 18 of FIG. 1. The processing in FIG. 2 is executed repeatedly.

If the acquisition unit 30 has not acquired an utterance of the occupant (N in S10), the process waits at step S10. If an utterance is acquired (Y in S10) but it is not an ask-back (N in S12), the process returns to step S10. If the utterance is an ask-back (Y in S12), the specifying unit 36 specifies the content of the speech asked back about (S14), the classification unit 34 classifies the ask-back type (S16), and the output unit 38 checks the ask-back type (S18).

If the ask-back type is the first type, the output unit 38 outputs different speech related to the content of the speech asked back about (S20), and the process ends. If the ask-back type is the second type, the output unit 38 re-outputs the speech asked back about (S22), and the process ends. If the ask-back type is the third type, the output unit 38 re-outputs the speech asked back about at a higher volume (S24), and the process ends.
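One pass of the FIG. 2 flow (S10 to S24) can be sketched as a single function, with the step numbers noted in comments. The tables, the integer type codes, the volume step `LOUDER_STEP`, and passing the last output speech in as an argument are simplifying assumptions; in the device these roles belong to the separate units 30 to 38.

```python
from typing import Optional, Tuple

# Hypothetical ask-back table: phrase -> type (1, 2 or 3).
ASK_BACK_TYPES = {"where?": 1, "what?": 2, "what? it's hard to hear": 3}
# Hypothetical related-speech table used for the first type.
RELATED_SPEECH = {
    "Convenience store ABC is the landmark.": "The red sign is the landmark.",
}
LOUDER_STEP = 3  # assumed volume increase


def process_once(
    utterance: Optional[str], last_output: str, volume: int
) -> Optional[Tuple[str, int]]:
    """Run one pass of the flow; return (speech, volume) or None if nothing is output."""
    if utterance is None:  # S10: N, keep waiting
        return None
    key = " ".join(utterance.lower().split())
    if key not in ASK_BACK_TYPES:  # S12: N, back to S10
        return None
    target = last_output  # S14: specify the speech asked back about
    ask_type = ASK_BACK_TYPES[key]  # S16: classify; S18: check the type
    if ask_type == 1:  # S20: related speech
        return RELATED_SPEECH.get(target, target), volume
    if ask_type == 2:  # S22: replay
        return target, volume
    return target, volume + LOUDER_STEP  # S24: replay louder
```

The caller would invoke this repeatedly, matching the statement that the FIG. 2 processing is executed repeatedly.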

According to this embodiment, when the occupant has difficulty understanding the speech of the audio output device 18 and asks back, easier-to-understand speech can be output. In addition, because ask-backs are classified by type and the output speech is chosen according to the classified type, the configuration can be simpler than generating output speech separately for each individual ask-back. This suppresses cost increases and provides a navigation device 10 suited to in-vehicle use.

The present invention has been described above based on an embodiment. Those skilled in the art will understand that the embodiment is merely an example, that various modifications are possible in the combination of its components and processes, and that such modifications are also within the scope of the present invention.

For example, the audio output device 18 may include an image recognition unit that performs image recognition on images of the vehicle interior captured by an in-cabin camera to detect an occupant who may be asleep. Well-known techniques can be used for the image recognition. When the ask-back type is the second type, the output unit 38 may re-output the speech asked back about at a higher volume if the image recognition unit does not detect a possibly sleeping occupant. This makes it easier for the occupant to grasp the missed content than re-outputting the speech at the same volume. On the other hand, when the ask-back type is the second type and a possibly sleeping occupant is detected, the output unit 38 may keep the volume unchanged when re-outputting the speech asked back about. This avoids disturbing the sleeping occupant.
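The volume decision for the second type can be sketched as a small function, assuming the image recognition unit reduces its result to a boolean `sleeper_detected`. The function name, the integer volume scale, and the default step are hypothetical; the source specifies only "higher volume" versus "volume maintained".

```python
def replay_volume_for_missed(
    current_volume: int, sleeper_detected: bool, step: int = 3
) -> int:
    """Volume for re-outputting missed speech (second type).

    The step size is an assumption; the source only says "higher volume".
    """
    if sleeper_detected:
        return current_volume  # keep the volume so a sleeping occupant is not disturbed
    return current_volume + step  # nobody appears asleep: replay louder
```

This is the behavior that claims 1 and 5 make mandatory for the missed-speech type.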

In the embodiment, the database is referred to in order to determine whether an utterance is an ask-back and to classify the ask-back type. Alternatively, the determination unit 32 and the classification unit 34 may perform intent understanding on the utterance content and, according to the result, determine whether the utterance is an ask-back and classify its type. Well-known techniques can be used for intent understanding. This modification increases the freedom in configuring the audio output device 18.

Description of reference signs: 10: navigation device; 18: audio output device; 30: acquisition unit; 32: determination unit; 34: classification unit; 36: specifying unit; 38: output unit.

Claims

1. An audio output device comprising:
an acquisition unit that acquires an utterance of an occupant of a vehicle;
a determination unit that determines whether the acquired utterance is an ask-back;
a classification unit that classifies the type of the ask-back when the utterance is determined to be an ask-back;
an output unit that outputs speech corresponding to the classified ask-back type, based on the content of the speech asked back about; and
an image recognition unit that performs image recognition on an image of the vehicle interior to detect an occupant who may be asleep,
wherein, when the ask-back type is a type indicating that the speech was missed, the output unit re-outputs the speech asked back about at a higher volume if the image recognition unit does not detect a possibly sleeping occupant, and re-outputs the speech asked back about at the same volume if the image recognition unit detects a possibly sleeping occupant.

2. The audio output device according to claim 1, further comprising a specifying unit that specifies the content of the speech asked back about, based on the speech output from the output unit immediately before the ask-back.

3. The audio output device according to claim 1 or 2, wherein, when the ask-back type is a type indicating that the meaning of the speech content was not understood, the output unit outputs different speech related to the content of the speech asked back about.

4. The audio output device according to any one of claims 1 to 3, wherein, when the ask-back type is a type indicating that the speech could not be heard, the output unit re-outputs the speech asked back about at a higher volume.

5. A computer-implemented audio output method comprising:
an acquisition step of acquiring an utterance of an occupant of a vehicle;
a determination step of determining whether the acquired utterance is an ask-back;
a classification step of classifying the type of the ask-back when the utterance is determined to be an ask-back;
an image recognition step of performing image recognition on an image of the vehicle interior to detect an occupant who may be asleep; and
an output step of outputting speech corresponding to the classified ask-back type, based on the content of the speech asked back about,
wherein, in the output step, when the ask-back type is a type indicating that the speech was missed, the speech asked back about is re-output at a higher volume if no possibly sleeping occupant is detected in the image recognition step, and is re-output at the same volume if a possibly sleeping occupant is detected in the image recognition step.