CN113424255B

CN113424255B - Directing the vehicle client device to use the features on the device

Info

Publication number: CN113424255B
Application number: CN201980091340.0A
Authority: CN
Inventors: 维克拉姆·阿加尔瓦尔; 维诺德·克里希南
Original assignee: Google LLC
Current assignee: Google LLC
Priority date: 2019-02-12
Filing date: 2019-02-12
Publication date: 2024-10-01
Anticipated expiration: 2039-02-12
Also published as: KR20210110676A; JP2022519478A; US12175980B2; EP4339940A3; CN113424255A; US11727934B2; EP4339940A2; WO2020167294A1; JP2022185077A; US20220246152A1; JP7155439B2; US11315559B2; US20230343335A1; US20200342863A1; EP3891731B1; JP7412499B2; EP3891731A1; CN119252239A

Abstract

Implementations set forth herein relate to ensuring useful responsiveness of any version of a vehicle computing device that is still in operation while that version is being phased out. Due to hardware limitations, certain features of an updated computing device may not be available to previous versions of the computing device. Implementations set forth herein eliminate crashes and wasteful data transfers caused by previous versions of computing devices that have not been upgraded or cannot be upgraded. The server device may respond to a particular intent request provided to the vehicle computing device even though the intent request is associated with an action that the particular version of the vehicle computing device cannot perform. In response, the server device may choose to provide speech-to-text data and/or natural language understanding data so that the vehicle computing device can continue to utilize resources at the server device.

Description

Direct the vehicle client device to use the features on the device

Background Art

Humans may engage in human-computer conversations with interactive software applications, referred to herein as “automated assistants” (also referred to as “digital agents,” “chat programs,” “interactive personal assistants,” “intelligent personal assistants,” “assistant applications,” “conversational agents,” etc.). For example, humans (who may be referred to as “users” when they interact with the automated assistant) may provide commands and/or requests to the automated assistant using spoken natural language input (i.e., utterances), which may in some cases be converted to text and then processed, and/or by providing textual (e.g., typed) natural language input.

The automated assistant may be installed at a variety of different devices, such as mobile phones, smart home devices, and/or vehicles. Unlike mobile phones and other computing devices, a vehicle is often used by its owner for an extended period of time (e.g., ten or more years) before the owner ultimately decides to purchase a replacement vehicle. During this period of ownership, the software installed at the vehicle may be subject to updates. For example, updates may be provided to the vehicle computing device to allow it to respond to commands that updated smart home devices and/or updated mobile phones can process. However, a user may choose not to install certain updates, resulting in incompatibility between the vehicle computing device and a remote server device with which the vehicle computing device interacts in responding to commands. In addition, after a period of time (e.g., three or more years), updates may no longer be provided to the vehicle computing device due to, for example, the end of a support life, the inability of the hardware of the vehicle computing device to execute new updates, and/or other factors. This may also result in incompatibility between the vehicle computing device and the remote server device. When the vehicle computing device and the remote server device become incompatible, the server device may respond to requests from the vehicle computing device with responses that the vehicle computing device can no longer interpret. This may result in the vehicle computing device failing to respond appropriately to various commands, and in the vehicle computing device wastefully transmitting various data to the server device and/or the server device wastefully transmitting various data to the vehicle computing device (because some server device responses will no longer be interpretable by the vehicle computing device). Some techniques attempt to address this issue by making the server device compatible with the most recent updates as well as with previous versions of the vehicle computing device. However, providing such backward compatibility indefinitely may require significant storage, memory, and/or processor usage at the server device.

Summary of the Invention

The implementations described herein relate to techniques for processing spoken utterances received at a vehicle computing device that, although it includes operable software, corresponds to a version being gradually phased out by a server device and/or any other supporting system. The gradual phase-out can result in a spectrum of support for various server operations across hardware and/or software versions. These server operations may include, but are not limited to, speech-to-text processing, natural language understanding (e.g., intent identification and/or slot value identification), action generation, and/or action execution. As a hardware and/or software version becomes obsolete relative to newly released hardware and/or software, the server device may gradually phase out its execution of one or more of these operations for that version over time. As a result, particular operations for a version undergoing phase-out (such as generating actions and/or slot values) may be performed exclusively locally. In this way, a computing device that is typically operated for longer than most other computing devices (e.g., a vehicle computing device) can still receive a certain amount of support from the server device over a longer period of time, even though it does not correspond to the latest version.

For example, the spoken utterance may include natural language content that the user typically uses to control another of their devices (such as a smart home device and/or a mobile phone), and the natural language content may specify a requested intent. When the vehicle computing device corresponds to the current version supported by the server device, the vehicle computing device may transmit the spoken utterance to the server device. In response, the vehicle computing device may receive from the server device, in one or more transmissions, data specifying: text converted from the spoken utterance, one or more intents determined based on the text, one or more slot values for the one or more respective intents, and/or one or more actions for the one or more intents. In some implementations, a slot for an intent and/or action may refer to a required or optional parameter that is referenced when performing an operation according to the intent and/or the action. In addition, a slot value may be the value assigned to a specific slot of an intent and/or action. For example, an intent such as a message intent may include a slot for specifying a recipient of the message, and the slot value may be a name (e.g., John Smith) and/or a phone number (e.g., 555-555-1234).
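The relationship between an intent, its slots, and slot values can be illustrated with a minimal sketch. The `Slot` and `Intent` classes and their field names below are hypothetical and are not taken from the disclosure; only the `MESSAGE_INTENT` label and the recipient example come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Slot:
    """A required or optional parameter of an intent or action."""
    name: str
    required: bool = True
    value: Optional[str] = None  # the slot value, once one is assigned


@dataclass
class Intent:
    """An intent plus its named slots (hypothetical structure)."""
    name: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def assign(self, slot_name: str, value: str) -> None:
        """Assign a slot value to one of the intent's slots."""
        self.slots[slot_name].value = value

    def missing_required(self) -> List[str]:
        """Names of required slots that still have no value."""
        return [s.name for s in self.slots.values()
                if s.required and s.value is None]


# A message intent with a required recipient slot and an optional body slot.
message = Intent("MESSAGE_INTENT", {
    "recipient": Slot("recipient"),
    "body": Slot("body", required=False),
})
unfilled_before = message.missing_required()
message.assign("recipient", "John Smith")
unfilled_after = message.missing_required()
```

In this sketch an intent is ready to be turned into an action once `missing_required()` returns an empty list; whether a real system gates action generation this way is an assumption.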

As described in detail herein, the specific data provided by the server device when the vehicle computing device corresponds to the current version can depend on the type of spoken utterance, which can be derived from the text of the spoken utterance and/or the intent determined based on the text. In addition, which data (if any) is provided in response to a particular type of spoken utterance can change dynamically over time once the version of the vehicle computing device is no longer the current version fully supported by the server device. For example, if the vehicle computing device corresponds to a version that is not the latest version, the server device can nonetheless at least perform speech-to-text processing of the audio data corresponding to the spoken utterance to generate text data. The text data can optionally be provided to, and used by, the vehicle computing device. However, the server device may provide no further data and may instead instruct the vehicle computing device to rely on its own local engines (e.g., a local NLU engine, a local action engine) to satisfy the spoken utterance, as opposed to exclusively issuing an error to the user (e.g., "Please update your device"). In this way, although the vehicle computing device may have the ability to locally convert spoken utterances into text data, the vehicle computing device may still leverage the processing power of the server device to obtain better speech-to-text conversion even though the server device no longer fully supports the version of the vehicle computing device.
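A minimal sketch of this version-dependent behavior follows. It assumes that versions are plain integers, that the audio is a stand-in byte string, and that the response is a small record; `ServerResponse`, `speech_to_text`, `understand`, and `respond` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LATEST_VERSION = 3  # assumption: versions are plain integers


@dataclass
class ServerResponse:
    text: Optional[str] = None       # speech-to-text result
    intent: Optional[str] = None     # NLU result, if the server provides one
    action: Optional[str] = None     # server-selected action, if any
    use_local_engines: bool = False  # ask the client to finish locally


def speech_to_text(audio: bytes) -> str:
    """Stand-in for server-side speech recognition (the audio is fake here)."""
    return audio.decode("utf-8")


def understand(text: str) -> Tuple[str, Optional[str]]:
    """Toy NLU: maps a keyword to a placeholder intent and action."""
    if "send" in text.lower():
        return "MESSAGE_INTENT", "ACTION_N"
    return "UNKNOWN_INTENT", None


def respond(audio: bytes, client_version: int) -> ServerResponse:
    text = speech_to_text(audio)  # performed regardless of version
    if client_version >= LATEST_VERSION:
        intent, action = understand(text)
        return ServerResponse(text=text, intent=intent, action=action)
    # Older version: return the transcript only and defer to the client's
    # local engines, rather than answering with nothing but an error.
    return ServerResponse(text=text, use_local_engines=True)


full = respond(b"send a message", client_version=3)
partial = respond(b"send a message", client_version=1)
```

Here `full` carries text, intent, and action, while `partial` carries only the transcript and the flag telling the client to rely on its local engines.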

In some implementations, the server device may transmit the text data to the vehicle computing device along with instructions for the vehicle computing device to no longer transmit similar intent requests to the server device. For example, the vehicle may include a subsystem that can be controlled via an automated assistant by a user operating a second version of the vehicle computing device, but cannot be controlled via an automated assistant by a user operating a first version of the vehicle computing device. The subsystem may be, but is not limited to, a network of fluid sensors connected to the vehicle computing device. The user may request that the vehicle computing device fulfill a specific intent associated with a network of three fluid sensors by providing a spoken utterance such as "Assistant, send the body shop my fluid sensor readings". The server device may process audio data corresponding to the spoken utterance and determine, based on the audio data and the version information, that the action corresponding to the intent (e.g., retrieving data from four fluid sensors) is supported only by a later version (e.g., the second version) of the vehicle computing device. For example, an action supported by the latest version of the vehicle computing device may include an action syntax for sampling output signals from four different fluid sensors. Thus, if a vehicle computing device having only three fluid sensors received the aforementioned action, the syntax of the action could cause the vehicle computing device to fail or otherwise perform inconsequential tasks in furtherance of executing an incorrect action. In response to receiving the audio data and version information from the vehicle computing device, the server device may generate instructions and/or data, to be provided to the vehicle computing device, that cause the vehicle computing device to no longer transmit to the server device audio data corresponding to similar types of intent requests. Instead, using these instructions, when the vehicle computing device determines that the user is making a similar type of intent request, the vehicle computing device may locally generate action data for the requested intent. This may conserve computing resources and network resources because the server device will no longer process audio data for intent requests corresponding to actions that certain vehicle computing devices cannot perform due to hardware and/or software version limitations.
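On the client side, the effect of such an instruction can be sketched as a small routing table. The `VehicleClient` class, its method names, and the `FLUID_SENSOR_INTENT` label are hypothetical; the sketch also assumes the vehicle computing device can classify the intent type locally before deciding where to send the audio.

```python
from typing import Set


class VehicleClient:
    """Tracks which intent types the server has asked the client to handle locally."""

    def __init__(self) -> None:
        self.local_only_intents: Set[str] = set()

    def apply_server_instruction(self, intent_type: str) -> None:
        """Record a server instruction to stop sending this intent type."""
        self.local_only_intents.add(intent_type)

    def route(self, locally_detected_intent: str) -> str:
        """Return 'local' or 'server' for a locally detected intent type."""
        if locally_detected_intent in self.local_only_intents:
            return "local"   # generate action data on-device, send no audio
        return "server"      # still rely on the server for this intent type


client = VehicleClient()
first_route = client.route("FLUID_SENSOR_INTENT")
client.apply_server_instruction("FLUID_SENSOR_INTENT")
later_route = client.route("FLUID_SENSOR_INTENT")
other_route = client.route("MESSAGE_INTENT")
```

Only the intent type named in the instruction is rerouted; other intent types continue to reach the server, which matches the partial, per-intent phase-out described above.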

In some implementations, audio data corresponding to a spoken utterance from a user may be processed at both the vehicle computing device and the server device. However, when the vehicle computing device is able to determine the intent that the user is requesting, and the server device has previously requested that the vehicle computing device handle such intents, the vehicle computing device may respond to the intent request itself. In addition, the vehicle computing device may generate actions for the intent without further assistance from the server device, even when a network connection allowing the vehicle computing device to communicate with the server device is available. In some implementations, when the vehicle computing device has not received an instruction to stop involving the server device in the execution of certain intents, the vehicle computing device may still utilize certain capabilities of the server device, even though the server device does not support all features of the particular version of the vehicle computing device's hardware and/or software.

As an example, a user may provide a spoken utterance in furtherance of an intent that uses information about a particular subsystem of a vehicle. The spoken utterance may be "Assistant, when do I need to buy new tires?". When the vehicle computing device has a network connection with a server device, audio data corresponding to the spoken utterance may be transmitted to the server device for processing. In response to receiving the audio data, the server device may determine the intent requested by the user and the version associated with the vehicle computing device. Based on the version of the vehicle computing device, the server device may determine that certain requests corresponding to the tire tread sensor and to that version of the computing device are no longer supported by the server device, and the server device will therefore not generate intent data in response to determining the intent and version. By contrast, if the user had the latest version of the computing device, the server device would generate data characterizing the intent, as well as actions and/or slot values, so that the automated assistant at the vehicle computing device could provide an estimate of when the user should change their tires based on the sensor output of the user's tire sensor (e.g., "You should change your tires in about 2 months, or in about 1600 miles").

However, because the vehicle computing device does not correspond to a fully supported version, the server device may provide natural language content and/or other data based on the server device processing the audio data. For example, the server device may provide derived request data to the vehicle computing device, and the derived request data may at least characterize the natural language content of the spoken utterance (e.g., "Assistant, when do I need to buy new tires?") and/or provide a message requesting that the user update their device. In response to receiving the derived request data, the vehicle computing device may provide a message to assist the user (e.g., the message to the user may include: "Please update your vehicle computing device to receive access to all Assistant actions"). In addition, the vehicle computing device may locally generate a suitable action to perform in furtherance of fulfilling the intent requested by the user. For example, the locally determined action may correspond to searching the Internet (e.g., "WEB_SEARCH_ACTION()"), whereas the action that would be determined via the latest version of the vehicle computing device may correspond to at least retrieving and presenting data from a subsystem of the vehicle (e.g., "TIRE_LIFETIME_ESTIMATE_ACTION()").

As an example, the vehicle computing device may receive the derived request data and, because the vehicle computing device has a network connection, perform an action (e.g., “WEB_SEARCH_ACTION()”) that runs an internet search using the phrase “When do I need to buy new tires?” Based on the internet search, the automated assistant may provide natural language output via an interface of the vehicle computing device, rather than providing an error or exclusively indicating that the vehicle computing device cannot perform the requested action. For example, based on the internet search, the automated assistant may provide audible natural language output such as “Some sources say you should change your tires every 5 years or every 50,000 miles.” In this way, devices such as vehicles, which a user may keep using for many years, may retain particular functionality even though certain versions of these devices are partially or completely unsupported by a manufacturer or other service provider (e.g., an entity that maintains the corresponding supporting server devices).
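The local fallback described in this example can be sketched as a small selection function. The two action names come from the text; the `choose_action` function, its keyword check, and the `query` parameter are assumptions made for illustration and do not represent the actual local action engine.

```python
from typing import Dict, Set, Tuple


def choose_action(text: str, supported: Set[str]) -> Tuple[str, Dict[str, str]]:
    """Pick the subsystem-backed action if this device supports it,
    otherwise fall back to a web search on the transcribed text."""
    if "tires" in text.lower() and "TIRE_LIFETIME_ESTIMATE_ACTION" in supported:
        return "TIRE_LIFETIME_ESTIMATE_ACTION", {}
    return "WEB_SEARCH_ACTION", {"query": text}


utterance = "When do I need to buy new tires?"

# Latest version: the tire subsystem action is available.
latest = choose_action(
    utterance, {"WEB_SEARCH_ACTION", "TIRE_LIFETIME_ESTIMATE_ACTION"})

# Phased-out version: only the generic web search is available.
older = choose_action(utterance, {"WEB_SEARCH_ACTION"})
```

Either way the user receives a response: the older device answers from a web search instead of reporting only that the request cannot be fulfilled.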

The above description is provided as an overview of some implementations of the present disclosure. These and other implementations are described in more detail below.

Other implementations may include a non-transitory computer-readable storage medium storing instructions executable by one or more processors (e.g., a central processing unit (CPU), a graphics processing unit (GPU), and/or a tensor processing unit (TPU)) to perform a method such as one or more of the methods described above and/or elsewhere herein. Yet other implementations may include a system of one or more computers and/or one or more robots that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described above and/or elsewhere herein.

It should be understood that all combinations of the foregoing concepts and additional concepts described in more detail herein are contemplated as being part of the subject matter disclosed herein. For example, all combinations of the claimed subject matter appearing at the end of this disclosure are considered to be part of the subject matter disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A, FIG. 1B, and FIG. 1C illustrate views of a server device responding to various computing devices even as the server device actively phases out certain versions of the computing devices.

FIG. 2 illustrates a system for ensuring useful responsiveness of any computing device version still in use by users while support for computing device versions is phased out.

FIG. 3 illustrates a method for ensuring useful responsiveness of any computing device version still in operation while support for certain computing device versions is phased out.

FIG. 4 illustrates a method for ensuring useful responsiveness to any request that a computing device has previously been responsive to while support for particular computing device requests is phased out.

FIG. 5 is a block diagram of an example computer system.

DETAILED DESCRIPTION

FIG. 1A, FIG. 1B, and FIG. 1C illustrate views 100, 130, and 150 of a server device 114 responding to various vehicle computing devices, even though the server device 114 is actively phasing out support for certain versions of the vehicle computing devices. Specifically, FIG. 1A illustrates a vehicle computing device 106 that is fully supported by a remote computing device 112, such as the server device 114. In other words, the version of one or more software and/or hardware components of the vehicle computing device 106 may correspond to the latest available version. As an example, the vehicle computing device 106 may correspond to a third version, and other previously available versions may include a first version and a second version. Even though the third version is the latest version, the server device 114 may provide a certain amount of support to each version in order to eliminate wasteful transmissions and/or operations at other vehicle computing devices operating according to the first version or the second version.

As shown in FIG. 1A, a user may provide a spoken utterance 102 such as “Assistant, send my location and arrival time to my destination’s phone number.” The spoken utterance may correspond to an action that is executable exclusively at vehicle computing devices operating according to the third version. Thus, since the vehicle computing device 106 of the vehicle 104 corresponds to the third version, the server device 114 may operate to fully respond to the request from the user. In some implementations, the request from the user may correspond to an intent that is the same for all versions (e.g., the first version, the second version, and the third version), but the specific action for the requested intent may differ depending on the version associated with the vehicle computing device that receives the request. In other words, the identifier for the intent may be common to all versions of the software and/or hardware modules. For example, the intent requested via the spoken utterance 102 may be identified as MESSAGE_INTENT(). However, the actions that may be performed according to the intent may be different for each version, and/or the syntax of each action may be different for each version. For example, a MESSAGE_DESTINATION() action may be available to all versions of the vehicle computing device; however, the specific syntax of the MESSAGE_DESTINATION() action may be different for the third version, at least relative to the first version. As an example, the action syntax of the third version may reference properties, values, APIs, and/or any other data that is not referenced by the action syntax of the same action in the first version.
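One way to picture a single action whose syntax differs by version is a table keyed by action name and version. This is only a sketch under stated assumptions: the parameter names (`recipient`, `location_api`, `eta_api`), the `ACTION_SYNTAX` table, and the `render_action` helper are hypothetical, and the disclosure does not specify how action syntax is actually represented.

```python
from typing import Dict, List, Tuple

# Hypothetical per-version parameter lists for the same destination action.
ACTION_SYNTAX: Dict[Tuple[str, int], List[str]] = {
    ("MESSAGE_DESTINATION", 1): ["recipient"],
    ("MESSAGE_DESTINATION", 3): ["recipient", "location_api", "eta_api"],
}


def render_action(action: str, version: int, values: Dict[str, str]) -> str:
    """Render an action call using the syntax defined for the given version."""
    params = ACTION_SYNTAX[(action, version)]
    missing = [p for p in params if p not in values]
    if missing:
        # A device that cannot supply these values cannot run this syntax.
        raise ValueError(f"version {version} syntax needs: {missing}")
    args = ", ".join(f"{p}={values[p]!r}" for p in params)
    return f"{action}({args})"


v1_call = render_action("MESSAGE_DESTINATION", 1, {"recipient": "555-555-1234"})
```

Rendering the third-version syntax with only a recipient raises `ValueError`, which mirrors why sending a newer action syntax to an older device is wasteful: the device lacks the data that syntax references.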

In some implementations, the vehicle computing device 106 may rely on the server device 114 to select a particular action for execution at the vehicle computing device 106. The vehicle computing device 106 may determine locally whether to rely on the server device 114 to select the action or to determine the action locally. For example, because the vehicle computing device 106 corresponds to the third and latest version, the vehicle computing device 106 may determine that the server device 114 can provide action data and/or slot value data when the user requests a particular intent. In response to the user providing a request for a message intent, the vehicle computing device 106 may perform speech-to-text processing on the spoken utterance 102, and/or may provide audio data corresponding to the spoken utterance 102 to the server device 114 for processing. The audio data may be transmitted over a network 108, such as the Internet, via a communication protocol such as cellular, 4G, 5G, LTE, Bluetooth, Wi-Fi, one or more long-range protocols, and/or any other wired or wireless communication protocol.

In response to the server device 114 receiving the audio data, the server device 114 and/or another device in communication with the server device 114 can perform speech-to-text processing on the audio data. The text data resulting from the speech-to-text processing can characterize the natural language content of the spoken utterance 102. In some instances, the user can provide a spoken utterance 102 that includes one or more natural language terms that specifically identify a particular intent. For example, a message intent can be identified using natural language terms such as send, message, draft, and/or any other natural language term that can be associated with an electronic message. The text data can be stored at the server device 114 as assistant data 118, and can be further processed via the assistant interaction engine 122 to generate natural language understanding (NLU) data corresponding to the text data. In some implementations, by generating the NLU data, the assistant interaction engine 122 can identify the intent requested by the user via the spoken utterance 102.

In some implementations, when the vehicle computing device 106 receives a spoken utterance from the user, the vehicle computing device 106 may transmit audio data corresponding to the spoken utterance, along with version data, to the server device 114. In response to receiving the version data, the version engine 120 of the server device 114 may determine the version corresponding to the vehicle computing device 106. In some implementations, the version may be based on one or more hardware components of the vehicle computing device 106, one or more software modules operating at the vehicle computing device 106, an automated assistant accessible via the vehicle computing device 106, and/or any other characteristic that may be associated with the vehicle computing device 106.
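How a version engine might map reported version data to a version can be sketched as a table lookup. The `VersionData` fields, the `VERSION_TABLE` entries, and the choice to treat unknown combinations as the oldest version are all assumptions made for illustration; the disclosure does not describe the lookup mechanism.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

OLDEST_VERSION = 1  # assumption: unknown devices get the most conservative support


@dataclass(frozen=True)
class VersionData:
    """Version data sent by the client alongside the audio (hypothetical fields)."""
    hardware_model: str
    software_build: int


# Hypothetical mapping from (hardware, software) to a support version.
VERSION_TABLE: Dict[Tuple[str, int], int] = {
    ("head-unit-a", 100): 1,
    ("head-unit-a", 200): 2,
    ("head-unit-b", 300): 3,
}


def resolve_version(data: VersionData) -> int:
    """Return the version for this client, defaulting to the oldest one."""
    key = (data.hardware_model, data.software_build)
    return VERSION_TABLE.get(key, OLDEST_VERSION)


known = resolve_version(VersionData("head-unit-b", 300))
unknown = resolve_version(VersionData("head-unit-z", 999))
```

Defaulting unknown devices to the oldest version is a design choice of this sketch: it keeps the server from sending action syntax that an unrecognized device might not be able to interpret.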

Based on identifying the requested intent and the version corresponding to the vehicle computing device 106, the server device 114 may generate a suitable action for the vehicle computing device 106 to perform. Alternatively or additionally, the server device 114 may also identify one or more slot values for the intent, which may be used when generating the identified action. The server device 114 may use the version data 116 to identify one or more actions that can be performed using the vehicle computing device 106. For example, the third version, to which the vehicle computing device 106 corresponds, may support retrieving application data from different applications in order to generate slot values for a message. Thus, since the user requested that their location and arrival time be sent to a phone number associated with their destination, the server device may, with the user's permission, access other application data identifying the current location (e.g., "SLOT_VALUE_X"), the estimated arrival time (e.g., "SLOT_VALUE_Y"), and the phone number associated with the user's destination (e.g., "SLOT_VALUE_Z"). The action corresponding to this type of message may be identified as, for example, a multi-source message action (e.g., "ACTION_N"). This particular action may be selected by the server device 114 for execution at the vehicle computing device 106 at least when the vehicle computing device 106 corresponds to the third version.

Thus, in response to the vehicle computing device 106 receiving the spoken utterance 102 and providing the audio data and the version data to the server device 114, the server device 114 may provide intent data 124 to the vehicle computing device 106. The intent data 124 may identify the intent (e.g., "INTENT()"), the action selected by the server (e.g., "ACTION_N"), and any slot values identified by the server device 114 based on the spoken utterance 102 and/or any other suitable data source. However, when the vehicle computing device does not correspond to the third version, the server device 114 and/or the vehicle computing device may process a similar spoken utterance differently, but in a manner that still ensures that the automated assistant or another application is responsive to the spoken utterance.

In various implementations, the vehicle computing device 106 is integrated with the vehicle and directly drives the vehicle's speaker(s), which are also integrated with the vehicle. The vehicle computing device 106 may be original equipment of the vehicle, or may be an aftermarket-installed accessory. The vehicle computing device 106 is integrated in that it directly drives the vehicle's speaker(s) and/or cannot be removed from the vehicle without the use of specialized tools and/or a significant amount of time and/or expertise. For example, the vehicle computing device 106 may be connected to the vehicle's controller area network (CAN) bus and/or may be powered via a vehicle-specific connector (e.g., not a 12V vehicle receptacle and not an easily accessible auxiliary standard plug).

FIG. 1B shows a view 130 in which a user provides a spoken utterance 132 to a vehicle computing device 140 corresponding to a second version. The vehicle computing device 140, and/or any other vehicle computing device discussed herein, may include one or more memory devices and one or more processors. In response to receiving the spoken utterance 132, the vehicle computing device 140 may communicate with the remote computing device 112 via a network 126, such as the Internet. The spoken utterance 132 may be similar to the spoken utterance 102 of FIG. 1A, at least for the purpose of illustrating how the server device 114 may handle similar spoken utterances differently for different versions.

For example, because the vehicle computing device 140 corresponds to the second version and the third version is the latest version, the server device 114 may respond to the vehicle computing device 140 differently, at least relative to the response shown in FIG. 1A. The vehicle computing device 140 may transmit audio data characterizing the spoken utterance 132, together with version data characterizing the vehicle computing device 140 as corresponding to the second version. In some implementations, the vehicle computing device 140 may locally determine the extent to which the server device 114 can assist the vehicle computing device 140 in responding to the particular intent request embodied in the spoken utterance 132. Additionally or alternatively, based at least on the version data 116 accessible to the server device 114, the server device 114 may determine the extent to which it can assist the vehicle computing device in responding to the particular intent request. The vehicle computing device 140 may provide the audio data and the version data to the server device 114 in response to determining that the spoken utterance 132 embodies an intent request corresponding to MESSAGE_INTENT and that the second version of the vehicle computing device 140 is at least partially supported with respect to MESSAGE_INTENT.

As an example, the server device 114 may receive the audio data and the version data, and the version engine 120 may determine the extent to which the server device 114 can support the vehicle computing device 140 having the second version. The version engine 120 may compare the version data from the vehicle computing device 140 with the version data 116 in order to generate data characterizing the extent to which the server device 114 can support the vehicle computing device 140. Alternatively or additionally, the server device 114 may access text data characterizing the natural language content of the spoken utterance 132 in order to generate natural language understanding (NLU) data for the spoken utterance 132. The auxiliary interaction engine 122 may use the text data to generate NLU data that characterizes MESSAGE_INTENT and, optionally, one or more slot values for fulfilling the intent request. In some implementations, the intent and/or the one or more slot values may be generated based on the assistant data 118 accessible to the server device 114.

Based on the server device 114 determining that the vehicle computing device 140 corresponds to the second version, the server device 114 may transmit data 128 to the vehicle computing device 140, and the data 128 may identify the intent (e.g., INTENT()) and speech-to-text data (e.g., STT_DATA("Assistant, send…")), which may characterize the natural language content of the spoken utterance 132. By providing the speech-to-text data, computing resources may be preserved at the vehicle computing device 140 if any local processing of the spoken utterance 132 is still in progress, even though the vehicle computing device 140 corresponds to the second version rather than the third version. Furthermore, by allowing the server device 114 to bypass transmitting action data corresponding to the third version, network bandwidth may be preserved, because less data is transmitted via the network 126. Moreover, if the vehicle computing device 140 were to receive action data corresponding to the third version and attempt to process it, such processing would be inconsequential, at least with respect to furthering fulfillment of the message intent.

When the vehicle computing device 140 receives the data 128, the vehicle computing device 140 may locally select an appropriate action for the intent. For example, rather than identifying and/or generating the multi-source message action, the vehicle computing device 140 may identify an action supported by one or more applications accessible via the vehicle computing device 140. For example, the vehicle computing device 140 may generate a LOCATION_MESSAGE action that can be executed by a map application with help from the automated assistant. The slot values of the LOCATION_MESSAGE action may include, for example, a current location and an estimated arrival time, which may be generated by the map application. However, because the vehicle computing device 140 corresponds to the second version, the vehicle computing device 140 may not be able to support an application that can provide a phone number for the mapped destination. Therefore, in response to the spoken utterance 132 and based on the data 128, the automated assistant may provide a response 142 such as "Ok, I'm sending a message with your location and estimated arrival time. Who would you like to send this message to?" In response, and in order to fill the remaining slot value, the user may provide another spoken utterance 132, such as "Albert Smith." When all slot values for the LOCATION_MESSAGE action have been determined, the action may be executed at the vehicle computing device 140. In this way, even though the vehicle computing device 140 corresponds to the second version, which is outdated relative to the third version, the automated assistant and/or the vehicle computing device 140 may remain responsive to the requested intent without wasting computing resources.
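
The local slot-filling behavior described for the second version can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the slot names (`current_location`, `estimated_arrival`, `recipient`), the `REQUIRED_SLOTS` and `PROMPTS` tables, and the `next_step` helper are hypothetical names introduced for the example. It shows a client that prompts for the first missing slot and executes the action only once every required slot has a value.

```python
# Hypothetical slot schema for the locally selected action; the source only
# names the LOCATION_MESSAGE action, not its slot identifiers.
REQUIRED_SLOTS = {
    "LOCATION_MESSAGE": ("current_location", "estimated_arrival", "recipient"),
}

# Prompt text for slots that the local applications cannot fill on their own.
PROMPTS = {
    "recipient": "Who would you like to send this message to?",
}


def missing_slots(action, slots):
    """Return the required slot names that do not yet have a value."""
    return [name for name in REQUIRED_SLOTS[action] if not slots.get(name)]


def next_step(action, slots):
    """Decide whether to ask the user for a slot value or execute the action."""
    missing = missing_slots(action, slots)
    if missing:
        name = missing[0]
        return ("ask", PROMPTS.get(name, f"Please provide {name}."))
    return ("execute", action)


# The map application supplies two slots; the recipient must come from the user.
slots = {"current_location": "Main St & 3rd Ave", "estimated_arrival": "5:40 PM"}
first = next_step("LOCATION_MESSAGE", slots)
slots["recipient"] = "Albert Smith"
second = next_step("LOCATION_MESSAGE", slots)
```

Here `first` is a request to ask the user for the recipient, and `second` signals that the action can be executed, mirroring the exchange that ends with "Albert Smith".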

FIG. 1C shows a view 150 of a user providing a spoken utterance 152 to a vehicle computing device 156 of a vehicle 154, where the vehicle computing device 156 corresponds to a first version supported by the server device 114. Specifically, FIG. 1C shows a scenario in which the server device 114 causes the vehicle computing device 156 to be responsive to the user even though the requested intent is associated with actions of the first and latest versions. In addition, FIG. 1C shows how the server device 114 may provide instruction data that causes the vehicle computing device 156 to subsequently bypass transmitting certain data corresponding to certain intent requests.

According to FIG. 1C, the user may provide a spoken utterance 152 such as "Assistant, send my location and estimated arrival time to my destination's phone number." As discussed with reference to FIG. 1A and FIG. 1B, such a spoken utterance may be fully supported or partially supported by the server device 114. However, in some cases, the server device 114 may use the version engine 120 and the version data 116 to determine that the server device 114 no longer supports one or more of the requested intents. As an example, in response to receiving audio data characterizing the spoken utterance 152, the server device 114 may determine that the vehicle computing device 156 corresponds to the first version, and identify one or more intents requested by the user via the spoken utterance 152. If an identified intent corresponds to an intent that is no longer supported by the latest version (e.g., the third version), the server device 114 may further determine whether the vehicle computing device 156 has received guidance data limiting its ability to submit such requests to the server device 114. Alternatively or additionally, the vehicle computing device 156 may transmit, via the network 158, audio data, version data, and/or data characterizing limitations of the vehicle computing device 156 with respect to particular intents. In response to receiving the limitation data, the server device 114 may determine the extent to which the server device 114 should and/or will respond to the data from the vehicle computing device 156.

For example, in response to receiving the audio data, the version data, and the data characterizing the limitations, the server device 114 may determine that the vehicle computing device 156 is operating according to the first version, and that the vehicle computing device 156 has been instructed to no longer request intent data, action data, and/or slot data for one or more intents and/or one or more actions. The server device 114 may process the audio data to generate natural language understanding data, which may characterize the one or more intents requested by the user. If the one or more characterized intents correspond to one or more intents identified by the limitation data, the server device 114 may bypass generating intent data, action data, and/or slot data for the vehicle computing device 156, at least with respect to the one or more characterized intents. Furthermore, rather than using resources to generate intent data, action data, and/or slot data, the server device 114 may generate speech-to-text data. Depending on the processing power of the vehicle computing device 156 and/or how many applications are running concurrently at the vehicle computing device 156, providing speech-to-text data from the server device 114 may preserve processing bandwidth at the vehicle computing device 156.

In some implementations, if the server device 114 determines that the version corresponding to the vehicle computing device 156 is being phased out, but that the vehicle computing device 156 has not yet been provided with instruction data, the server device 114 may transmit data 160 to the vehicle computing device 156, and the data 160 may include speech-to-text data and/or instruction data characterizing the limitations of the particular version associated with the vehicle computing device 156. In response to receiving the instruction data, the vehicle computing device 156 may update its configuration so that it subsequently bypasses relying on certain intent data, action data, and/or slot data, and/or bypasses querying the server device 114 for certain intent data, action data, and/or slot data.
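
As a rough sketch of how a client might apply such instruction data, the example below keeps a set of restricted intents in its configuration and consults it before asking the server for intent, action, or slot data. The `ClientConfig` class, the `restricted_intents` field, and the dictionary shape of the instruction data are all assumptions made for illustration; the source does not specify a data format.

```python
from dataclasses import dataclass, field


@dataclass
class ClientConfig:
    """Hypothetical on-device configuration for a vehicle computing device."""

    version: int
    # Intents for which the device should no longer query the server for
    # intent data, action data, or slot data.
    restricted_intents: set = field(default_factory=set)

    def apply_instruction_data(self, instruction: dict) -> None:
        """Merge server-provided limitations into the local configuration."""
        self.restricted_intents.update(instruction.get("restricted_intents", []))

    def should_request_intent_data(self, intent: str) -> bool:
        """Return True if the server may still be queried for this intent."""
        return intent not in self.restricted_intents


config = ClientConfig(version=1)
before = config.should_request_intent_data("MESSAGE_INTENT")

# Assumed shape of the instruction data accompanying the speech-to-text data.
config.apply_instruction_data({"restricted_intents": ["MESSAGE_INTENT"]})
after = config.should_request_intent_data("MESSAGE_INTENT")
```

After the update, the device would still be free to send audio for speech-to-text conversion; only the intent, action, and slot queries for the listed intents are skipped.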

In cases where the server device 114 acknowledges the limitations of the version of the vehicle computing device 156 and provides speech-to-text data, the vehicle computing device 156 may use that speech-to-text data, or locally generated speech-to-text data, to further a response to the user and/or otherwise fulfill the requested intent. For example, the vehicle computing device 156 may determine, based on the speech-to-text data, that the user has requested a particular intent. Furthermore, because the server device 114 has not selected a particular action for the intent, the vehicle computing device 156 may locally generate an action, such as a NEW_MESSAGE action. The vehicle computing device 156 may also determine whether slot values have been assigned to the one or more slots required by the NEW_MESSAGE action. For example, if a "MESSAGE_BODY()" slot value is missing, or otherwise cannot be identified by the vehicle computing device 156 from the speech-to-text data, the vehicle computing device 156 may provide a query 162 to the user in the form of a natural language output such as "OK, what would you like the message to say?" Subsequently, the user may provide another spoken utterance 152, such as "I'm on my way." This interaction between the user and the vehicle computing device 156 may continue until all required slots have been assigned slot values and/or until the action has been performed via the vehicle computing device 156. In this way, even though the vehicle computing device 156 corresponds to the first version, or to a version that is not the latest version, the vehicle computing device 156 may remain responsive to the user and make use of at least some functionality of the server device 114. This allows the vehicle computing device 156 to remain responsive to the user over a longer period of use, while also eliminating wasteful network communications and/or inefficient use of computing resources (e.g., memory and/or processing bandwidth).

FIG. 2 illustrates a system 200 for ensuring useful responsiveness of any version of a computing device (e.g., a vehicle computing device and/or any other client device) that is still in operation while versions of the computing device are being phased out. The automated assistant 204 may operate as part of an automated assistant application provided at one or more computing devices, such as the client device 218 and/or the server device 202. A user may interact with the automated assistant 204 via one or more assistant interfaces 220, which may include one or more of the following: a microphone, a camera, a touch screen display, a user interface, and/or any other device capable of providing an interface between a user and an application. For example, a user may initialize the automated assistant 204 by providing spoken, textual, or graphical input to the assistant interface to cause the automated assistant 204 to perform a function (e.g., provide data, control a peripheral device, access an agent, etc.). The client device 218 may include a display device, which may be a display panel including a touch interface for receiving touch inputs and/or gestures that allow a user to control applications of the client device 218 and/or the server device 202 via the touch interface.

In some implementations, the client device 218 may lack a display device but include an audio interface (e.g., a speaker and/or a microphone), thereby providing audible user interface output without providing graphical user interface output, as well as providing a user interface, such as a microphone, for receiving spoken natural language input from a user. For example, in some implementations, the client device 218 may include one or more tactile input interfaces, such as one or more buttons, and omit a display panel that would be supplied with graphics data from a graphics processing unit (GPU). In this way, a significant amount of energy and processing resources may be saved compared to a computing device that includes a display panel and a GPU.

The client device 218 may communicate with the server device 202 via a network 240, such as the Internet. The client device 218 may offload computing tasks to the server device 202 in order to preserve computing resources at the client device 218. For example, the server device 202 may host the automated assistant 204, and the client device 218 may transmit input received at the one or more assistant interfaces 220 to the server device 202. However, in some implementations, the automated assistant 204 may be hosted at the client device 218. In various implementations, all or fewer than all aspects of the automated assistant 204 may be implemented on the remote computing device 242 and/or the client device 218. In some of those implementations, aspects of the automated assistant 204 are implemented via the local automated assistant 222 of the client device 218 and interface with the server device 202, which may implement other aspects of the automated assistant 204. The server device 202 may optionally serve multiple users and their associated assistant applications via multiple threads. In implementations in which all or fewer than all aspects of the automated assistant 204 are implemented via the local automated assistant 222 of the client device 218, the local automated assistant 222 may be an application separate from the operating system of the client device 218 (e.g., installed "on top of" the operating system), or may alternatively be implemented directly by the operating system of the client device 218 (e.g., considered an application of, but integrated with, the operating system).

In some implementations, the automated assistant 204 and/or the automated assistant 222 may include an input processing engine 206, which may employ multiple different modules to process inputs and/or outputs for the client device 218. For example, the input processing engine 206 may include a speech processing engine 208, which may process audio data received at the assistant interface 220 to identify the text contained in the audio data. The audio data may be transmitted from, for example, the client device 218 to the server device 202 in order to preserve computing resources at the client device 218.

The process for converting the audio data into text may include a speech recognition algorithm, which may employ neural networks and/or statistical models to identify groups of audio data corresponding to words or phrases. The text converted from the audio data may be parsed by a natural language understanding (NLU)/intent engine 210 and made available to the automated assistant 204 as data that can be used to identify one or more intents requested by the user. In some implementations, output data provided by the NLU/intent engine 210 may be provided to the action engine 214 to determine whether the user has provided input corresponding to a particular action and/or routine that can be performed by the automated assistant 204 and/or by an application or agent that can be accessed by the automated assistant 204. For example, the assistant data 216 may be stored as client data 230 at the server device 202 and/or the client device 218, and may include data defining one or more actions that can be performed by the automated assistant 204 and/or the automated assistant 222, as well as any slot values and/or other parameters involved in performing those actions.

When the input processing engine 206 has determined that the user has requested a particular intent, routine, and/or action to be fulfilled and/or otherwise performed, the slot engine 212 may determine one or more slot values for the slots of the particular intent and/or action, and the action engine 214 may then provide output to the user based on the particular intent, action, routine, and/or the one or more slot values. For example, in some implementations, in response to user input, such as a gesture directed at the assistant interface 220 of the client device 218 of the vehicle 238, the automated assistant 222 may cause data characterizing the gesture to be transmitted to the server device 202, thereby allowing the server device to determine the intent and/or action that the user intends the automated assistant 204 and/or the automated assistant 222 to perform.

In some implementations, the client device 218 may be, but is not limited to, a desktop computer, a laptop computer, a portable computing device such as a cellular phone, a tablet computing device, a smart home device, and/or any other device that can communicate directly or indirectly with a server device. The client device 218 may correspond to a particular version, which may change over time. For example, the client device 218 may correspond to the latest version for receiving support from the server device 202, or the client device 218 may correspond to a version that is not the latest version for receiving support from the server device 202. The version may correspond to one or more physical hardware components of the client device 218, an operating system of the client device 218, one or more applications available at the client device 218, the automated assistant 222, and/or any other device or module that may be associated with the client device 218.

In order for the server device 202 to phase out support for particular versions, the server device 202 may include a version engine 234, which the server device 202 may use to determine limitations on supporting a particular version. For example, using the version engine 234 and/or the server data 236, the server device 202 may determine that the client device 218 corresponds to a version that is fully supported by the server device 202. In other words, a version may be fully supported when there are no limitations on responding to certain intent requests from the client device 218, at least relative to other versions.

As an example, when the client device 218 corresponds to a fully supported version, and a user provides a spoken utterance to the assistant interface 220 of the client device 218 for a particular intent, the server device 202 may be responsible for characterizing the intent, generating a suitable action, and optionally generating slot values that can be used by the client device 218 to fulfill the particular intent. Alternatively, when the client device 218 corresponds to a partially supported version, and the user provides a spoken utterance to the assistant interface 220 for the particular intent, the server device 202 may be responsible for characterizing the intent and optionally generating slot values. The resulting data characterizing the intent and/or the slot values may then be provided back to the client device 218, which may employ the action engine 228 to generate an action that corresponds to the particular intent and is also supported by the version of the client device 218.

In some implementations, the version engine 234 may determine that the client device 218 corresponds to a version that is no longer supported by the server device 202. For example, in response to receiving a spoken utterance from the user at the assistant interface 220, the client device 218 may transmit audio data corresponding to the spoken utterance, together with version data 232, to the server device 202. The version data 232 stored at the client device 218 may indicate one or more versions corresponding to the client device 218, and/or one or more limitations that have been set forth by one or more server devices. A limitation may include, but is not limited to, an instruction for the client device 218 not to rely on the server device 202 to provide intent data, action data, slot data, and/or any other data corresponding to certain requests from the user. Using the version data, the server device 202 may determine whether the client device 218 corresponds to a version that the server device 202 fully supports, partially supports, or no longer supports.
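
The three support levels can be illustrated with a small server-side sketch. It is a simplification under stated assumptions: the `Support` enum, the `SUPPORT_BY_VERSION` table, the `Response` dataclass, and the `build_response` function are hypothetical names, and the mapping of version numbers to support levels simply mirrors the first, second, and third versions used in the examples of FIGS. 1A to 1C. A fully supported version receives intent, slot values, and a server-selected action; a partially supported version receives intent and slot values only; a version that is no longer supported receives speech-to-text data only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Support(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


# Hypothetical version table mirroring the three versions in the examples.
SUPPORT_BY_VERSION = {3: Support.FULL, 2: Support.PARTIAL, 1: Support.NONE}


@dataclass
class Response:
    """Data returned to the client; unset fields are left for local handling."""

    stt_text: str
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)
    action: Optional[str] = None


def build_response(version, stt_text, intent, slots, action):
    """Trim the server response to what the client's version is supported for."""
    # Unknown versions are treated as unsupported (an assumption of this sketch).
    support = SUPPORT_BY_VERSION.get(version, Support.NONE)
    if support is Support.FULL:
        return Response(stt_text, intent, dict(slots), action)
    if support is Support.PARTIAL:
        # The client selects an action locally from the intent and slot values.
        return Response(stt_text, intent, dict(slots))
    # No intent, slot, or action support: speech-to-text only.
    return Response(stt_text)


args = ("Assistant, send my location", "MESSAGE_INTENT", {"eta": "5:40 PM"}, "ACTION_N")
full = build_response(3, *args)
partial = build_response(2, *args)
unsupported = build_response(1, *args)
```

Omitting the unused fields, rather than sending them and letting the client discard them, reflects the bandwidth and processing savings the description attributes to version-aware responses.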

When the client device 218 corresponds to a version that is no longer supported by the server device 202, the server device 202 may process the audio data received from the client device 218 and provide text data characterizing the natural language content of the spoken utterance back to the client device 218. Thus, the client device 218 will not be provided with intent, action, and/or slot value support, but the speech-to-text service may still be available to the client device 218 whenever a network connection between the client device 218 and the remote computing device 242 is available. This may eliminate wasteful processing at the client device 218, which may have less memory than the server device 202. For example, when the client device 218 corresponds to a considerably older version of hardware and therefore includes only 200 MB of RAM, whereas the server device 202 includes 8 GB of RAM, the server device 202 will be able to complete speech-to-text processing of the audio data in less time than the client device 218 would need. As a result, while the client device 218 is still employing the speech-to-text engine 224 to convert the audio data into text data, the client device 218 may receive the resulting text data from the server device 202. The client device 218 may then choose to terminate the ongoing local speech-to-text processing, at least because the text data generated by the server device 202 eliminates the need for the client device 218 to generate such data.
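
One way to picture the termination of local speech-to-text processing is as a race between two concurrent tasks, sketched below with Python's `asyncio`. The `local_stt` and `server_stt` coroutines are stand-ins that merely sleep for different durations; they are assumptions for illustration and do not represent the speech-to-text engine 224 or any real recognition service. The sketch also does not handle a failed server request, which a real client would need to do before discarding its local result.

```python
import asyncio


async def local_stt(audio: bytes) -> str:
    """Stand-in for slower on-device recognition on a memory-limited client."""
    await asyncio.sleep(0.5)
    return "local transcript"


async def server_stt(audio: bytes) -> str:
    """Stand-in for a faster round trip to the server's speech-to-text."""
    await asyncio.sleep(0.01)
    return "server transcript"


async def transcribe(audio: bytes, local=local_stt, server=server_stt):
    """Run both recognizers and keep whichever finishes first."""
    tasks = {
        asyncio.create_task(local(audio)): "local",
        asyncio.create_task(server(audio)): "server",
    }
    done, pending = await asyncio.wait(
        list(tasks), return_when=asyncio.FIRST_COMPLETED
    )
    # Terminate the slower recognizer so it stops consuming resources.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    winner = done.pop()
    return tasks[winner], winner.result()


source, text = asyncio.run(transcribe(b"\x00\x01"))
```

With the stand-in delays used here, the server result arrives first and the local task is cancelled, which corresponds to the client terminating its ongoing local processing once the server's text data is received.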

In some implementations, a client device 218 corresponding to a version that is partially supported by the server device 202 may process audio data for a spoken utterance locally, and also transmit the audio data to the server device 202 for processing. Local processing of the audio data may be performed by the NLU/intent engine 226, which is capable of generating natural language understanding (NLU) data from text data characterizing the natural language content of the spoken utterance. The natural language understanding data may characterize one or more requested intents. If the server device 202 generates the natural language understanding data in a shorter period of time than the client device 218, the client device 218 may rely on the server device 202 for the natural language understanding data. Thereafter, however, the client device 218 may cause the action engine 228 to process the natural language understanding data from the server device 202, at least in order to generate an appropriate action that can be performed by the version of the client device 218. The version data may characterize one or more characteristics of one or more hardware components, one or more software components, and/or any other feature of the client device. For example, in some implementations, the version data may characterize one or more operating specifications of the client device. Alternatively or additionally, the version data may characterize one or more operating specifications of an application and/or operating system of the client device 218. Alternatively or additionally, the version data may be specified by a manufacturer of the client device 218, a manufacturer of a component of the client device, and/or a software manufacturer of an application or operating system of the client device 218. In some implementations, the version data may characterize an account and/or subscription tier corresponding to one or more users. For example, the version data may characterize one or more devices as corresponding to one or more account tiers (e.g., a full membership and/or full service tier), and other version data may characterize one or more other devices as corresponding to one or more other account tiers (e.g., a limited membership and/or limited service tier).

FIG. 3 illustrates a method 300 for ensuring useful responsiveness of any computing device version still in operation while phasing out support for certain computing device versions. The method 300 may be performed by one or more computing devices, applications, and/or any other device or module that may be associated with an automated assistant. The method 300 may include an operation 302 of determining whether a spoken utterance is detected at a computing device. The computing device may be, but is not limited to, a vehicle computing device, a mobile computing device, a desktop computing device, a cellular device, a server device, and/or any other apparatus capable of operating as a computing device. The spoken utterance may be a spoken natural language input from a user of the computing device. The spoken utterance may be, for example, "Assistant, stream music".

From operation 302, the method 300 may proceed to operation 304, which may include determining a version corresponding to the computing device. Alternatively, if no spoken utterance is detected, operation 302 may be repeated until a spoken utterance is detected at the computing device. With respect to operation 304, the version corresponding to the computing device may be a version for the entire computing device, one or more hardware components of the computing device, one or more software components of the computing device, and/or any other feature of the computing device that may be characterized by a version. For example, the computing device may operate a local automated assistant, which may correspond to a particular version. Thus, the local automated assistant may respond differently depending on its version. Similarly, the entire computing device, which may optionally be integrated into a vehicle, may respond differently depending on the particular version corresponding to the entire computing device.

From operation 304, the method 300 may proceed to operation 306, which may include determining whether the version of the computing device is fully supported by the server device. Operation 306 may be performed by the server device, which may receive version data from the computing device, or otherwise access version data, in response to determining that a spoken utterance was detected at the computing device. When the version corresponding to the computing device is determined to be fully supported by the server device, the method 300 may proceed to operation 308. Alternatively, when the version corresponding to the computing device is determined not to be fully supported by the server device, the method 300 may proceed to operation 310.

Operation 308 may include generating action data based on the natural language content of the spoken utterance. The action data may characterize one or more actions that can be performed by the particular version of the computing device and that are also currently supported by the server device. For example, for at least the aforementioned example (e.g., "Assistant, stream music"), the action data may characterize a STREAM_MUSIC action, which may correspond to a PLAY_MUSIC intent.

Operation 310 may include determining whether the version corresponding to the computing device is at least partially supported by the server device. When the version corresponding to the computing device is determined to be at least partially supported by the server device, the method 300 may proceed to operation 312. Operation 312 may include generating intent/NLU data and/or slot data based on the natural language content. The intent/NLU data may characterize the intent requested by the user via the spoken utterance, and the slot data may characterize one or more slot values that may be used when the computing device performs one or more actions. For example, if the user has specified an artist to play in their spoken utterance, the slot data may characterize the artist name. In this way, although the server device does not fully support the version corresponding to the computing device, the server device may still provide useful information to the computing device in order to eliminate waste of computing resources at the computing device.

When the version corresponding to the computing device is not at least partially supported by the server device, the method 300 may proceed to operation 314. Operation 314 may include generating text data based on the natural language content of the spoken utterance. Because the computing device may have limited processing bandwidth and/or memory, at least relative to the server device, computing resources of the computing device can still be conserved by relying on the server device for at least speech-to-text processing. It should be noted that when the version corresponding to the computing device is fully supported by the server device, operation 308, operation 312, and/or operation 314 may be performed by the server device for that particular version. Likewise, when the version corresponding to the computing device is determined to be at least partially supported by the server device, operation 312 and/or operation 314 may be performed by the server device for that particular version.
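
The tiered behavior of operations 306 through 314 might be sketched on the server side as follows. This is only an illustrative sketch: the version strings, the `SUPPORT_BY_VERSION` table, and the shape of the response dictionary are assumptions and do not appear in the description.

```python
from enum import Enum


class Support(Enum):
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


# Hypothetical table mapping device versions to support tiers.
SUPPORT_BY_VERSION = {"3.0": Support.FULL, "2.0": Support.PARTIAL}


def support_for(version: str) -> Support:
    # Unknown or retired versions fall through to NONE (operations 306 and 310).
    return SUPPORT_BY_VERSION.get(version, Support.NONE)


def build_response(version: str, text: str, intent: str, slots: dict, action: str) -> dict:
    """Assemble the data that would be returned to the device."""
    tier = support_for(version)
    response = {"text": text}  # operation 314: speech-to-text result, always included
    if tier in (Support.FULL, Support.PARTIAL):
        response["intent"] = intent  # operation 312: intent/NLU data
        response["slots"] = slots    # operation 312: slot data
    if tier is Support.FULL:
        response["action"] = action  # operation 308: action data
    return response
```

Under these assumptions, a retired version receives only the transcription, a partially supported version also receives intent and slot data, and only a fully supported version receives action data.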

From operation 314, the method 300 may proceed to operation 316, in which the generated data is provided to the computing device. The generated data may correspond to action data, intent/NLU data, slot data, and/or text data, depending on the extent to which the server device supports the version corresponding to the computing device. Therefore, although the server device may not fully or partially support a particular version, the server device can still provide some amount of data that will help the computing device process the spoken utterance from the user. When the server device no longer supports the version corresponding to the computing device, the computing device may use the speech-to-text data provided by the server device to locally select one or more actions for fulfilling the request from the user. For example, with respect to the aforementioned example, when the server device does not fully or partially support the version corresponding to the computing device, the server device may provide text data characterizing the natural language content of the spoken utterance (e.g., "STT_DATA('Assistant, stream music')"). In response, the computing device may determine the intent requested by the user (e.g., "PLAY_MUSIC") and select an action that is available via the computing device (e.g., "SHUFFLE_MUSIC()"). In this way, wasteful communications between the server device and the computing device can be eliminated while still providing responsiveness to the user. Such benefits can be realized even when the user's computing device does not correspond to the latest version, or corresponds to a version that is not even partially supported by the server device. It should be noted that support by the server device may refer to intent-specific support, and does not include intent-agnostic support such as speech-to-text processing, which may be performed by the server device regardless of the natural language content of the spoken utterance from the user.
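
The device-side handling of server-provided text data might look like the following sketch. The keyword table `INTENT_KEYWORDS` and the action table `LOCAL_ACTIONS` are hypothetical; the description does not specify how the device maps text to an intent, and simple phrase matching is used here purely for illustration.

```python
from typing import Optional

# Hypothetical phrase table used by the device to recognize intents locally.
INTENT_KEYWORDS = {"PLAY_MUSIC": ("stream music", "play music")}

# Hypothetical table of actions this device version can still perform per intent.
LOCAL_ACTIONS = {"PLAY_MUSIC": "SHUFFLE_MUSIC"}


def resolve_intent(text: str) -> Optional[str]:
    """Map transcribed text to an intent using simple phrase matching."""
    lowered = text.lower()
    for intent, phrases in INTENT_KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            return intent
    return None


def select_local_action(text: str) -> Optional[str]:
    """Choose a locally available action for the text returned by the server."""
    intent = resolve_intent(text)
    return LOCAL_ACTIONS.get(intent) if intent else None
```

With these tables, the text "Assistant, stream music" resolves to the PLAY_MUSIC intent and the device selects SHUFFLE_MUSIC without asking the server for intent or action data.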

FIG. 4 illustrates a method 400 for ensuring useful responsiveness to any request to which a computing device has previously been responsive, while phasing out support for particular requests. The method 400 may be performed by one or more computing devices, applications, and/or any other device or module that may be associated with an automated assistant. The method 400 may include an operation 402 of determining whether a spoken utterance is detected at a computing device. The computing device may be, but is not limited to, a vehicle computing device, a mobile computing device, a desktop computing device, a cellular device, a server device, and/or any other apparatus capable of operating as a computing device. The spoken utterance may be a spoken natural language input from a user of the computing device. As an example, the spoken utterance may include a request for the computing device to control a peripheral device such as an IoT device (e.g., "Assistant, turn down the backlight of my smart TV").

From operation 402, the method 400 may proceed to operation 404, which may include determining an intent requested via the spoken utterance. The intent may characterize the request from the user and/or may be associated with one or more different actions that can be performed by different respective computing devices. For example, the spoken utterance "Assistant, turn down the backlight of my smart TV" may correspond to an intent such as an ADJUST_TV_SETTING intent. At least some versions corresponding to computing devices may support an action for the intent such as a BACKLIGHT_ADJUST action, whereas other versions will not support the BACKLIGHT_ADJUST action but can instead perform a SETTING_ADJUST action.

From operation 404, the method 400 may proceed to operation 406, which may include generating and/or receiving text or data based on the natural language content. In some implementations, both the computing device and the server device may perform speech-to-text processing in response to the computing device detecting the spoken utterance. Specifically, the computing device may generate audio data in response to receiving the spoken utterance, provide the audio data to the server device with permission from the user, and also process the audio data locally. If the server device does not respond to the provided audio data, the computing device may process the audio data locally in order to generate text data characterizing the natural language content of the spoken utterance. Contextual data received from the server device and/or from the computing device may be further processed in order to identify one or more intents that are being requested.

From operation 406, the method 400 may proceed to operation 408, which may include determining whether the server device fully supports the one or more identified intents. In some implementations, operation 408 may include determining whether the computing device has been instructed to no longer request support for the one or more intents that have been identified from the natural language content of the spoken utterance. When the computing device determines that the one or more intents are fully supported by the server device, or that the computing device has not been completely blocked from requesting support for the one or more intents, the method 400 may proceed from operation 408 to operation 410. Alternatively, when the computing device determines that the one or more intents are not fully supported by the server device, or that the computing device has been completely blocked from requesting support for the one or more intents, the method 400 may proceed from operation 408 to operation 412.

Operation 410 may include requesting action data based on the natural language content and/or the one or more intents identified from the natural language content. For example, the computing device may identify the ADJUST_TV_SETTING intent and, in response to determining that the intent is fully supported by the server device given the version corresponding to the computing device, may request action data from the server device. For example, the action data may characterize a BACKLIGHT_ADJUST action, which may be executable by the latest version but not necessarily by other versions operating on other computing devices.

From operation 408, the method 400 may proceed to operation 412, which may include determining whether the one or more identified intents are partially supported by the server device. In some implementations, operation 412 may include determining whether the computing device has been instructed to request only particular support for the one or more intents that have been identified from the natural language content of the spoken utterance. When the computing device determines that the one or more intents are at least partially supported by the server device, or that the computing device is not limited in the type of support it may request for the one or more intents, the method 400 may proceed from operation 412 to operation 414. Alternatively, when the computing device determines that the one or more intents are not at least partially supported by the server device, or that the computing device has been restricted from requesting support for the one or more intents, the method 400 may proceed from operation 412 to operation 416.

Operation 414 may include requesting intent/NLU data and/or slot data from the server device. The intent/NLU data and/or slot data may be used by the computing device to identify a particular function that remains executable via the version corresponding to the computing device. For example, the server device and/or the computing device may identify the intent ADJUST_TV_SETTING, and the server device may identify slot data for an action that is to be selected by the computing device. For example, the slot data may include a slot value that is based at least on the natural language content of the spoken utterance, such as "backlight".

When the intent is determined to be no longer supported by the server device, at least for the version corresponding to the computing device, the method 400 may proceed from operation 412 to operation 416. Operation 416 may include locally selecting an action for fulfilling the requested intent. For example, the computing device may determine that the server device previously provided an instruction that completely restricts the ability of the computing device to request support for the ADJUST_TV_SETTING intent. Based on this determination, the computing device may locally select the action SETTING_ADJUST(), which, when executed, may cause a menu to appear at the TV so that the user can adjust the settings of the TV.

In some implementations, the method 400 may proceed from operation 410 to operation 414 so that the computing device can make use of support from the server device. The method 400 may proceed from operation 414 and/or operation 416 to operation 418. Operation 418 may include performing an action in furtherance of fulfilling the requested intent. When the action is identified locally at the computing device, performing may include executing the locally selected action. When the action is identified by the server device, performing may include executing the remotely selected action. In this way, regardless of whether the version corresponding to the computing device is the latest version, the computing device will still respond to the spoken utterance in a manner that is useful to the user, while also eliminating waste of computing resources. For example, the remotely selected action may cause the BACKLIGHT_ADJUST action to be performed. Alternatively, the locally selected action may cause the SETTING_ACTION_ADJUST action to be performed, thereby causing a menu to appear at the TV or another device. When the version is at least partially supported by the server device for the particular intent and the server device provides slot data, the locally selected action may cause the menu to appear and also cause a submenu identifying the backlight adjustment portion of the menu to appear. In this manner, partial support still allows the computing device to make use of computational processes of the server device even though the computing device is not operating according to the latest version.
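
The device-side branching of operations 408 through 418 might be sketched as follows. The support levels ("full", "partial", "none"), the `LOCAL_FALLBACK` table, and the function names are hypothetical; the action names are taken from the example above.

```python
from typing import Optional, Tuple

# Hypothetical table of actions this device version can select on its own.
LOCAL_FALLBACK = {"ADJUST_TV_SETTING": "SETTING_ADJUST"}


def plan_request(intent: str, support: dict) -> str:
    """Decide what kind of help to request from the server for an intent."""
    level = support.get(intent, "none")
    if level == "full":
        return "request_action_data"    # operation 410
    if level == "partial":
        return "request_nlu_and_slots"  # operation 414
    return "select_local_action"        # operation 416


def perform(intent: str, support: dict,
            server_action: Optional[str] = None,
            server_slots: Optional[dict] = None) -> Tuple[str, dict]:
    """Return the (action, slots) pair the device would execute in operation 418."""
    plan = plan_request(intent, support)
    if plan == "request_action_data" and server_action:
        return server_action, {}  # remotely selected action
    local_action = LOCAL_FALLBACK[intent]
    if plan == "request_nlu_and_slots" and server_slots:
        return local_action, server_slots  # local action refined by server slot data
    return local_action, {}  # purely local selection
```

In the partial case, a slot value such as "backlight" lets the locally selected action open the relevant submenu rather than only the top-level settings menu.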

FIG. 5 is a block diagram of an example computer system 510. The computer system 510 typically includes at least one processor 514 that communicates with a number of peripheral devices via a bus subsystem 512. These peripheral devices may include a storage subsystem 524 (including, for example, a memory 525 and a file storage subsystem 526), user interface output devices 520, user interface input devices 522, and a network interface subsystem 516. The input and output devices allow a user to interact with the computer system 510. The network interface subsystem 516 provides an interface to external networks and is coupled to corresponding interface devices in other computer systems.

The user interface input devices 522 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into a display, audio input devices such as speech recognition systems, microphones, and/or other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into the computer system 510 or onto a communication network.

The user interface output devices 520 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide a non-visual display, such as via audio output devices. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from the computer system 510 to the user or to another machine or computer system.

The storage subsystem 524 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 524 may include logic for performing selected aspects of the method 300, the method 400, and/or for implementing the server device 114, the vehicle computing device 106, the vehicle 104, the vehicle computing device 140, the vehicle 134, the remote computing device 112, the vehicle computing device 156, the vehicle 154, the server device 202, the client device 218, and/or any other apparatus, module, and/or engine discussed herein.

These software modules are generally executed by the processor 514 alone or in combination with other processors. The memory 525 used in the storage subsystem 524 may include a number of memories, including a main random access memory (RAM) 530 for storing instructions and data during program execution and a read-only memory (ROM) 532 in which fixed instructions are stored. The file storage subsystem 526 may provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. Modules implementing the functionality of certain implementations may be stored by the file storage subsystem 526 in the storage subsystem 524, or in other machines accessible by the processor(s) 514.

The bus subsystem 512 provides a mechanism for letting the various components and subsystems of the computer system 510 communicate with each other as intended. Although the bus subsystem 512 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple buses.

The computer system 510 may be of varying types, including a workstation, a server, a computing cluster, a blade server, a server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of the computer system 510 depicted in FIG. 5 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of the computer system 510 are possible, having more or fewer components than the computer system depicted in FIG. 5.

In situations in which the systems described herein collect personal information about users (or, as often referred to herein, "participants"), or may make use of personal information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, preferences, or current geographic location), or to control whether and/or how to receive content from a content server that may be more relevant to the user. Also, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where geographic location information is obtained (such as to a city, ZIP code, or state level), so that a particular geographic location of the user cannot be determined. Thus, the user may have control over how information about the user is collected and/or used.

In some implementations, a method implemented by one or more processors is set forth as including operations such as determining, based on audio data processed at a server device, that a user has provided a spoken utterance to an automated assistant interface of a vehicle computing device that is connected to a vehicle. The method may also include, in response to determining that the user has provided the spoken utterance, accessing version information associated with the vehicle computing device, wherein the version information indicates that the vehicle computing device corresponds to a particular version. The method may also include characterizing, based on processing the audio data, natural language content of the spoken utterance provided by the user. The method may also include determining, based on characterizing the natural language content of the spoken utterance, whether at least a portion of the natural language content is fully supported by the server device for the particular version. The method may also include, in response to determining that at least a portion of the natural language content is not fully supported by the server device for the particular version, and based at least on the version information: providing, to the vehicle computing device, text data characterizing the natural language content of the spoken utterance provided by the user, and causing the vehicle computing device to locally perform an action that is locally generated by the vehicle computing device based on the natural language content.

In some implementations, determining whether at least the portion of the natural language content is fully supported by the server device for the particular version includes: determining whether at least the portion of the natural language content includes one or more natural language terms corresponding to one or more intents that are indicated, at the server device, as not being fully supported by the server device. In some implementations, the one or more intents are indicated at the server device as being fully supported for other vehicle computing device versions that do not include the particular version of the vehicle computing device. In some implementations, the text data characterizes the portion of the natural language content that includes the one or more natural language terms. In some implementations, the method may also include, in response to determining that at least the portion of the natural language content is fully supported by the server device for the particular version: generating, at the server device, action data that identifies an intent requested by the user and another action that is supported by the particular version of the vehicle computing device, and providing the action data to the vehicle computing device. In some implementations, providing the action data to the vehicle computing device causes the vehicle computing device to perform the other action using the provided action data.

In some implementations, the method may further include, in response to determining that at least the portion of the natural language content is not fully supported by the server device for the particular version, at least with respect to the version information: generating instruction data characterizing a limitation of the vehicle computing device with respect to the requested intent, and providing the instruction data to the vehicle computing device, wherein the instruction data causes the vehicle computing device, in response to a subsequent user input, to bypass requesting that the server device generate a specific action corresponding to another instance of the requested intent. In some implementations, the method may further include, in response to determining that at least the portion of the natural language content is not fully supported by the server device for the particular version, and after providing the instruction data to the vehicle computing device: determining that another spoken utterance associated with the requested intent is received at the automated assistant interface of the vehicle computing device, and providing other text data to the vehicle computing device, wherein the other text data characterizes other natural language content of the other spoken utterance and omits data characterizing a specific action for the requested intent.
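The effect of the instruction data on later requests can be illustrated with a short sketch. The class name `VehicleClient`, the `limited_intent` field, and the request flags are hypothetical, and the sketch assumes the vehicle computing device can form a local guess of the intent before contacting the server.

```python
class VehicleClient:
    """Hypothetical client that remembers which intents the server has flagged as limited."""

    def __init__(self) -> None:
        self.limited_intents = set()

    def apply_instruction_data(self, instruction: dict) -> None:
        # Record the limitation that the server reported for this device version.
        self.limited_intents.add(instruction["limited_intent"])

    def build_request(self, guessed_intent: str) -> dict:
        # For a flagged intent, skip asking the server to generate an action
        # and request only the text of the utterance.
        want_action = guessed_intent not in self.limited_intents
        return {"want_text": True, "want_action": want_action}
```

Once the instruction data for a given intent has been applied, subsequent requests for that intent ask for text only, which avoids transferring action data that the device could not use.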

In some implementations, the method may also include, in response to determining that at least the portion of the natural language content is partially supported by the server device for the particular version, at least with respect to the version information: providing natural language understanding (NLU) data to the vehicle computing device, the NLU data characterizing a particular intent requested by the user via the spoken utterance. In some implementations, the method may also include, in response to determining that at least the portion of the natural language content is partially supported by the server device for the particular version, at least with respect to the version information: providing, to the vehicle computing device, slot data characterizing one or more slot values for the particular intent, and causing the vehicle computing device to perform the action using the one or more slot values, wherein the action is locally identified by the vehicle computing device based on the particular intent and the one or more slot values. In some implementations, the spoken utterance is associated with a hardware subsystem of the vehicle, and the spoken utterance is received at the automated assistant interface while the user is riding in and/or driving the vehicle. In some implementations, the method may further include, after determining that the user provided the spoken utterance, determining that another spoken utterance is received at a separate vehicle computing device operating according to a currently supported version; and providing, based on other natural language content of the other spoken utterance, NLU data characterizing a requested intent, slot data characterizing one or more slot values for the requested intent, and action data characterizing a separate action to be performed by the separate vehicle computing device.
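For the partially supported case, the local identification of an action from an intent and its slot values might look like the following minimal sketch. The intent names, slot keys, and action strings in `LOCAL_ACTIONS` are hypothetical placeholders, and the fallback value `"noop"` is an assumption made for the example.

```python
from typing import Callable, Dict

# Hypothetical on-device mapping from an intent to a locally defined action.
LOCAL_ACTIONS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "set_climate": lambda slots: f"hvac.set_temperature({slots['temperature']})",
    "navigate": lambda slots: f"nav.route_to({slots['destination']!r})",
}


def resolve_locally(nlu_data: Dict[str, str], slot_data: Dict[str, str]) -> str:
    """Identify the action on the device from server-provided NLU data and slot data."""
    handler = LOCAL_ACTIONS.get(nlu_data["intent"])
    if handler is None:
        # The device has no local action for this intent.
        return "noop"
    return handler(slot_data)
```

Here the server contributes only the intent and slot values, while the choice of action stays with the vehicle computing device.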

In some implementations, the method may further include, prior to determining that the user provided the spoken utterance, determining that the particular version, which was previously fully supported by the server device in communication with the vehicle computing device, is fully supported by the server device. In some implementations, determining whether at least the portion of the natural language content is fully supported by the particular version includes determining an extent to which the particular version is supported by the server device.

In yet other implementations, a method implemented by one or more processors is set forth as including operations such as determining that natural language content embodied in a spoken utterance received at a first vehicle computing device corresponds to a first intent request. The method may also include determining, based on determining that the natural language content corresponds to the first intent request, an extent to which the first intent request is supported by a server device for a version corresponding to the first vehicle computing device. The method may also include generating, based on the extent to which the first intent request is supported by the server device, first data characterizing the intent requested by the user. The method may also include determining that other natural language content embodied in another spoken utterance received at a second vehicle computing device corresponds to a second intent request. The method may also include determining, based on determining that the other natural language content includes the second intent request, another extent to which the second intent request is supported by the server device for another version corresponding to the second vehicle computing device, wherein the version is different from the other version. The method may also include generating, based on the other extent to which the second intent request is supported by the server device, second data characterizing the other natural language content of the other spoken utterance. The method may also include providing the first data to the first vehicle computing device in furtherance of causing the first vehicle computing device to fulfill the first intent request, and providing the second data to the second vehicle computing device in furtherance of causing the second vehicle computing device to fulfill the second intent request. In some implementations, the other version corresponding to the second vehicle computing device was initially supported by the server device after a time at which the version corresponding to the first vehicle computing device was initially supported by the server device.

In some implementations, the first intent request and the second intent request correspond to a type of vehicle hardware device, and the first data further characterizes an action corresponding to the intent and an operation capable of being performed by the type of vehicle hardware device. In some implementations, in response to the second vehicle computing device receiving the second data, the second data causes the type of vehicle hardware device to perform the operation and/or a different operation. In some implementations, the type of vehicle hardware device includes one or more sensors, one or more other computing devices, and/or one or more electromechanical devices.

In other implementations, a method implemented by one or more processors is set forth as including operations such as receiving, via an interface in communication with a vehicle computing device of a vehicle, data characterizing a spoken utterance provided by a user to the interface, wherein the spoken utterance corresponds to a request for the vehicle computing device to fulfill a particular intent. The method may also include determining, based at least on the data characterizing the spoken utterance provided by the user, natural language content of the spoken utterance, wherein the natural language content includes one or more terms corresponding to the particular intent. The method may also include determining, based at least on the one or more terms of the natural language content and a version corresponding to the vehicle computing device, an extent to which the one or more terms are supported by a server device for the version. The method may also include, in response to determining that the one or more terms of the natural language content are fully supported by the server device for the version: providing a request for action data to the server device in communication with the vehicle computing device, receiving the action data from the server device after the server device receives the request, and causing the vehicle computing device to perform an action characterized by the action data. The method may also include, in response to determining that the one or more terms of the natural language content are partially supported by the server device for the version: providing a different request for at least intent data from the server device, receiving the intent data from the server device in response to the server device receiving the different request, wherein the intent data characterizes the particular intent, and causing, based on the intent data, the vehicle computing device to perform a different action associated with the particular intent.

In some implementations, the action corresponds to an application and the different action corresponds to a different application. In some implementations, the action and the different action both correspond to a hardware subsystem of the vehicle. In some implementations, the method may also include, in response to determining that the one or more terms of the natural language content are no longer supported by the server device for the version: providing a separate request for at least text data from the server device, receiving the text data from the server device in response to the server device receiving the separate request, wherein the text data characterizes the natural language content of the spoken utterance, and causing, based on the text data, the vehicle computing device to select a particular action for execution at the vehicle computing device.
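The three request paths described above (action data, intent data, or text data) can be summarized in a short sketch. The `Support` enum, the `SUPPORT_TABLE` contents, and the request labels are hypothetical; the sketch also assumes that terms absent from the table carry no intent and that the weakest level among recognized terms governs the request.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Support(Enum):
    NONE = 0
    PARTIAL = 1
    FULL = 2


# Hypothetical table: support level of an intent-bearing term, per device version.
SUPPORT_TABLE: Dict[Tuple[str, str], Support] = {
    ("v3", "navigate"): Support.FULL,
    ("v2", "navigate"): Support.PARTIAL,
    ("v1", "navigate"): Support.NONE,
}

REQUEST_FOR = {
    Support.FULL: "action_data",    # server generates the action
    Support.PARTIAL: "intent_data", # device maps the intent to a different action
    Support.NONE: "text_data",      # device selects an action from the text alone
}


def support_level(version: str, terms: List[str]) -> Support:
    """Return the weakest support level among the recognized terms."""
    known = [SUPPORT_TABLE[(version, t)] for t in terms if (version, t) in SUPPORT_TABLE]
    if not known:
        return Support.NONE
    return min(known, key=lambda level: level.value)


def request_kind(version: str, terms: List[str]) -> str:
    """Choose which kind of data to request from the server."""
    return REQUEST_FOR[support_level(version, terms)]
```

Under these assumptions, the same utterance yields a different request depending only on the version of the vehicle computing device that received it.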

While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the functions and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is therefore to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.

Claims

1. A method implemented by one or more processors, the method comprising:

determining, based on audio data processed at a server device, that a user has provided a spoken utterance to an automated assistant interface of a vehicle computing device connected to a vehicle;

responsive to determining that the user has provided the spoken utterance, accessing version information associated with the vehicle computing device, wherein the version information indicates that the vehicle computing device corresponds to a particular version;

characterizing natural language content of the spoken utterance provided by the user based on processing the audio data;

determining, based on characterizing the natural language content of the spoken utterance, whether at least a portion of the natural language content is fully supported by the server device for the particular version; and

in response to determining that at least a portion of the natural language content is not fully supported by the server device for the particular version, based at least on the version information:

providing text data to the vehicle computing device, the text data representing the natural language content of the spoken utterance provided by the user, and

causing the vehicle computing device to locally perform an action, the action being locally generated by the vehicle computing device based on the natural language content;

generating instruction data characterizing limitations of the vehicle computing device with respect to the requested intent, and

providing the instruction data to the vehicle computing device, wherein the instruction data causes the vehicle computing device, in response to a subsequent user input, to bypass requesting that the server device generate a specific action corresponding to another instance of the requested intent.

2. The method of claim 1, wherein determining whether at least the portion of the natural language content is fully supported by the server device for the particular version comprises:

determining whether at least the portion of the natural language content includes one or more natural language terms corresponding to one or more intents, the one or more intents being indicated at the server device as not fully supported by the server device,

wherein the one or more intents are indicated at the server device as being fully supported for other vehicle computing device versions that do not include the particular version of the vehicle computing device.

3. The method of claim 2, wherein the text data characterizes the portion of the natural language content that includes the one or more natural language terms.

4. The method according to claim 1, further comprising:

in response to determining that at least the portion of the natural language content is fully supported by the server device for the particular version:

generating, at the server device, action data identifying an intent requested by the user and another action supported by the particular version of the vehicle computing device, and

providing the action data to the vehicle computing device,

wherein providing the action data to the vehicle computing device causes the vehicle computing device to perform the other action using the provided action data.

5. The method according to claim 1, further comprising:

in response to determining that at least the portion of the natural language content is not fully supported by the server device for the particular version, and after providing the instruction data to the vehicle computing device:

determining that another spoken utterance associated with the requested intent is received at the automated assistant interface of the vehicle computing device, and

providing other text data to the vehicle computing device, wherein the other text data characterizes other natural language content of the other spoken utterance and omits data characterizing a specific action for the requested intent.

6. The method according to claim 1, further comprising:

in response to determining that at least the portion of the natural language content is partially supported by the server device for the particular version, at least with respect to the version information:

providing natural language understanding (NLU) data to the vehicle computing device, the NLU data characterizing a particular intent requested by the user via the spoken utterance.

7. The method according to claim 6, further comprising:

providing slot data to the vehicle computing device, the slot data representing one or more slot values for the particular intent, and

causing the vehicle computing device to perform the action using the one or more slot values, wherein the action is locally identified by the vehicle computing device based on the particular intent and the one or more slot values.

8. The method of claim 1, wherein the spoken utterance is associated with a hardware subsystem of the vehicle and is received at the automated assistant interface while the user is riding in and/or driving the vehicle.

9. The method according to claim 1, further comprising:

after determining that the user provided the spoken utterance, determining that another spoken utterance was received at a separate vehicle computing device operating according to a currently supported version; and

providing, based on other natural language content of the other spoken utterance, NLU data characterizing a requested intent, slot data characterizing one or more slot values for the requested intent, and action data characterizing a separate action to be performed by the separate vehicle computing device.

10. The method according to any one of claims 1 to 9, further comprising:

prior to determining that the user provided the spoken utterance, determining that the particular version, which was previously fully supported by the server device in communication with the vehicle computing device, is fully supported by the server device.

11. The method of claim 10, wherein determining whether at least the portion of the natural language content is fully supported by the particular version comprises determining an extent to which the particular version is supported by the server device.

12. A computer program comprising instructions which, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 11.

13. A computer-readable storage medium comprising instructions which, when executed by one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 11.

14. A system comprising one or more processors for executing the method according to any one of claims 1 to 11.