TW202324041A

TW202324041A - User interactions with remote devices

Info

Publication number: TW202324041A
Application number: TW111138424A
Authority: TW
Inventors: 喬納森齊斯; 雷歐尼德辛恩伯雷特
Original assignee: 美商高通公司
Priority date: 2021-10-12
Filing date: 2022-10-11
Publication date: 2023-06-16
Also published as: WO2023064719A1; CN118103799A; KR20240072170A; US20230116190A1; JP2024540828A; EP4416577A1; US11960652B2; US20240256052A1

Abstract

Systems, methods, and non-transitory media are provided for presenting information associated with at least one input option. An example method can include receiving data identifying one or more input options associated with a first device in a scene; determining, including using at least one memory, information relevant to at least one of the scene, the first device, and a user associated with a second device; and, based on the one or more input options and the information, outputting user guidance data corresponding to an input option for which relevant context information has been determined.

Description

User interactions with remote devices

The present disclosure generally relates to interactions with remote devices. For example, aspects of the present disclosure include filtering and/or suggesting virtual content for user interactions with remote devices.

Extended reality technologies can be used to present virtual content to users, and/or can combine real environments from the physical world with virtual environments to provide users with extended reality experiences. The term extended reality can encompass virtual reality, augmented reality, mixed reality, and the like. Each of these forms of extended reality allows users to experience or interact with immersive virtual environments or content. For example, an extended reality experience can allow a user to interact with a real or physical environment that is enhanced or augmented with virtual content.

Extended reality technologies can be implemented to enhance user experiences in a wide range of contexts, such as entertainment, healthcare, retail, education, social media, and so forth.

Disclosed are systems, apparatuses, methods, and computer-readable media for determining user interaction data for remote device interactions (e.g., interactions between an extended reality (XR) device and one or more remote devices, such as Internet-of-Things devices). According to at least one example, a method is provided for presenting information associated with at least one input option. The method includes: receiving data identifying one or more input options associated with a device in a scene; determining, including using at least one memory, information associated with at least one of the scene, the device, and a user associated with an electronic device; and outputting, based on the one or more input options and the information, user guidance data corresponding to an input option for which relevant contextual information has been determined.
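The three steps of the method (receive input options, determine contextual information, output guidance for the options that the context makes relevant) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `InputOption` and `Context` types, their fields, and the relevance rule (the option belongs to the device the user is looking at and uses a modality the user can currently provide) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class InputOption:
    """One way of providing input to a remote device (hypothetical type)."""
    device_id: str
    name: str       # e.g. "power_toggle"
    modality: str   # e.g. "gesture", "voice", "button"


@dataclass
class Context:
    """Contextual information about the scene, device, and user (hypothetical)."""
    gazed_device_id: Optional[str] = None
    available_modalities: Set[str] = field(default_factory=set)


def select_guidance(options: List[InputOption], context: Context) -> List[str]:
    """Return guidance strings only for options the context makes relevant."""
    guidance = []
    for opt in options:
        # Assumed rule 1: only the device the user is attending to is relevant.
        if opt.device_id != context.gazed_device_id:
            continue
        # Assumed rule 2: skip modalities the user cannot provide right now.
        if opt.modality not in context.available_modalities:
            continue
        guidance.append(f"{opt.name}: use {opt.modality}")
    return guidance
```

In this sketch, options for devices outside the user's attention produce no guidance at all, which is one simple way to keep the presented content small.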

In another example, an apparatus for presenting information associated with at least one input option is provided that includes at least one memory and at least one processor (e.g., implemented in circuitry) coupled to the at least one memory. The at least one processor is configured to and can: receive data identifying one or more input options associated with a device in a scene; determine, including using the at least one memory, information associated with at least one of the scene, the device, and a user associated with an electronic device; and output, based on the one or more input options and the information, user guidance data corresponding to an input option for which relevant contextual information has been determined.

In another example, a non-transitory computer-readable medium is provided that has stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: receive data identifying one or more input options associated with a device in a scene; determine, including using at least one memory, information associated with at least one of the scene, the device, and a user associated with an electronic device; and output, based on the one or more input options and the information, user guidance data corresponding to an input option for which relevant contextual information has been determined.

In another example, an apparatus for presenting information associated with at least one input option is provided. The apparatus includes means for: receiving data identifying one or more input options associated with a device in a scene; determining, including using at least one memory, information associated with at least one of the scene, the device, and a user associated with an electronic device; and outputting, based on the one or more input options and the information, user guidance data corresponding to an input option for which relevant contextual information has been determined.

In some aspects, the method, non-transitory computer-readable medium, and apparatuses described above can include: predicting a user interaction with the device based on the information; and presenting, based on the one or more input options and the predicted user interaction, the user guidance data corresponding to the input option.

In some examples, the user guidance data can include at least one of a user input element associated with the input option, a virtual overlay on a physical object associated with the input option, and/or a prompt indicating how to provide an input associated with the input option.

In some examples, the device can include a connected device having network communication capabilities, and the method, non-transitory computer-readable medium, and apparatuses described above can include: determining, based on the information and the one or more input options, a gesture representing the predicted user interaction; and presenting the user guidance data. In some examples, the predicted user interaction can include a predicted user input to the device. In some cases, the user guidance data can include an indication of the gesture which, when detected, invokes an actual user input at the device.

In some aspects, presenting the user guidance data can include rendering a virtual overlay at a display associated with the electronic device, the virtual overlay being configured to appear located on a surface of the device. In some examples, the virtual overlay can include a user interface element associated with the input option. In some cases, the user interface element can include at least one of a virtual user input object associated with the input option and a visual indication of a physical control object on the device that is configured to receive an input corresponding to the input option.

In some examples, the information includes at least one of an eye gaze of the user and a pose of the user, and the method, non-transitory computer-readable medium, and apparatuses described above can include: predicting the user interaction with the device based on at least one of the eye gaze of the user and the pose of the user; after presenting the user guidance data, detecting an actual user input associated with the input option, the actual user input representing the predicted user interaction; and transmitting, to the device, a command corresponding to the actual user input associated with the input option.
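A minimal Python sketch of this flow follows: a gaze direction is used to predict which device the user intends to interact with, and a command is built only once the detected input matches the guided input option. The angular threshold, the dictionary layout of the input option, and the JSON command format are illustrative assumptions; the source does not specify any of them.

```python
import json
import math


def gaze_angle_deg(gaze_dir, to_device):
    """Angle in degrees between the gaze direction and the direction to a device."""
    dot = sum(g * d for g, d in zip(gaze_dir, to_device))
    norm = math.sqrt(sum(g * g for g in gaze_dir)) * math.sqrt(
        sum(d * d for d in to_device)
    )
    if norm == 0.0:
        return 180.0  # degenerate vector: treat as "not looking at it"
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def predict_target(user_pos, gaze_dir, devices, max_angle_deg=10.0):
    """Return the id of the device nearest the gaze ray, or None.

    `devices` maps a device id to its 3D position. The 10-degree default
    threshold is an assumed value, not one taken from the source.
    """
    best_id, best_angle = None, max_angle_deg
    for dev_id, pos in devices.items():
        to_dev = tuple(p - u for p, u in zip(pos, user_pos))
        angle = gaze_angle_deg(gaze_dir, to_dev)
        if angle <= best_angle:
            best_id, best_angle = dev_id, angle
    return best_id


def build_command(device_id, input_option, detected_input):
    """Build a command only if the detected input matches the guided gesture."""
    if detected_input != input_option["gesture"]:
        return None
    return json.dumps(
        {"device": device_id, "action": input_option["action"]}, sort_keys=True
    )
```

The command is deliberately created after the actual input is detected, so a prediction alone never triggers an operation on the remote device.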

In some examples, outputting the user guidance data corresponding to the input option can include displaying the user guidance data. In some examples, outputting the user guidance data corresponding to the input option can include outputting audio data representing the user guidance data.

In some examples, outputting the user guidance data corresponding to the input option can include displaying the user guidance data and outputting audio data associated with the displayed user guidance data.

In some aspects, the method, non-transitory computer-readable medium, and apparatuses described above can include receiving, from the device, the data identifying the one or more input options associated with the device. In some aspects, the method, non-transitory computer-readable medium, and apparatuses described above can include receiving, from a server, the data identifying the one or more input options associated with the device.

In some cases, the device has no external user interface for receiving one or more user inputs.

In some aspects, the method, non-transitory computer-readable medium, and apparatuses described above can include refraining, based on the information, from presenting additional user guidance data associated with the device.

In some aspects, the method, non-transitory computer-readable medium, and apparatuses described above can include: after presenting the user guidance data, obtaining a user input associated with the input option; and transmitting, to the device, an instruction corresponding to the user input. In some cases, the instruction can be configured to control one or more operations of the device.

In some aspects, the apparatus includes a camera, a mobile device (e.g., a mobile telephone or "smartphone" or other mobile device), a wearable device, an extended reality device (e.g., a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device), a personal computer, a laptop computer, a server computer, or any other device. In some aspects, the apparatus includes one or more cameras for capturing one or more images. In some aspects, the apparatus also includes a display for displaying one or more images, notifications, and/or other displayable data. In some aspects, the apparatuses described above can include one or more sensors.

This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim.

The foregoing, together with other features and embodiments, will become more apparent upon reference to the following specification, claims, and accompanying drawings.

Certain aspects and embodiments of this disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination, as would be apparent to those skilled in the art. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the disclosure. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.

The ensuing description provides exemplary embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the disclosure as set forth in the appended claims.

Users often interact with different devices that can provide certain functionality of interest to the users. For example, a user can interact with smart devices (e.g., Internet-of-Things (IoT) or other network-connected devices), mobile devices, control devices (e.g., remote controls for televisions, appliances, speakers, etc.), system consoles, appliances, and so on. In various illustrative examples, a user can interact with a network-connected television to manage viewing content or change a power setting of the television, with a network-connected light bulb to control the light it emits or its operation, with a network router to configure the router's operations and settings, with a network-connected thermostat to control its temperature or configuration settings, and so on.

In some cases, a device can include a hardware user interface or can present a graphical user interface (e.g., by displaying the user interface using a display or other technology) that a user can use to interact with the device. However, in some cases, it can be difficult for a user to interact with a device through the device's user interface. In one example, the user interface may be out of the user's reach, which can prevent the user from interacting with the user interface associated with the device (or make that interaction difficult). As another example, the content displayed by the user interface may be in a language the user does not understand, may be too small for the user to make out, or may otherwise be inconvenient for the user to work with. In some examples, the input options associated with the device (e.g., supported inputs, supported input methods, etc.) may not be obvious to or easily understood by the user (e.g., gestures or voice commands as opposed to user interface inputs), which can make it difficult for the user to interact with the device. In other examples, the user interface may have no accessibility settings (or insufficient accessibility settings) to assist users with visual, speech, and/or hearing impairments. In some cases, the device may not have outward/external controls that are visible or otherwise accessible to the user, and/or may not include a display that presents a user interface the user can use to interact with the device. Moreover, in many cases, a scene can include multiple remote devices, such as connected light bulbs, televisions, connected plugs, connected speakers, and the like. When multiple devices are present in a scene, a user device may have difficulty interacting with, and/or managing interactions with, a particular remote device among the multiple devices in the scene.

For example, a room (e.g., a kitchen, bedroom, office, living room, etc.) may have multiple devices with connectivity and/or interaction capabilities. To initiate and/or manage communications and/or interactions with one of the multiple devices, the user device may have difficulty identifying a particular device from among the multiple devices, managing and/or simplifying user interactions and/or related data for that particular device, managing relevant content for that particular device, and so on. In some cases, the user device can obtain interaction data (e.g., input options/capabilities, inputs, outputs, graphical user interfaces, user interaction assistance data, etc.) from multiple devices in the scene. If the user device does not have sufficient knowledge and/or understanding of the scene and/or the current context, and/or has not received clear instructions from the user, the user device can become overloaded with data (e.g., interaction data, device data, etc.). In some examples, the user device may have difficulty managing content for, and/or interactions with, one or more of the multiple devices in the scene. However, the systems and techniques described herein can allow the user device to constrain, limit, filter, declutter, etc., the virtual content associated with remote devices in the scene based on contextual information. In some cases, using the systems and techniques described herein, the user device can refine the content that it processes and/or presents for communicating/interacting with remote devices in the scene, and/or can present the content in the manner best adapted/suited to the context (e.g., large, small, overlaid, world-locked, head- or device-locked, etc.). In some examples, the user device can use the contextual information to understand how to interact with a particular remote device in the scene, what interaction data and/or virtual content is relevant to that particular remote device, how to manage interactions with any remote device in the scene, and so on.
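One way to picture the filtering and decluttering described above is a relevance score computed per item of interaction data, with only the highest-scoring items presented. The Python sketch below is a hypothetical illustration: the scoring weights, the dictionary keys (`device`, `label`, `distance_m`, `focused_device`, `recently_used`), and the `max_items` cap are all assumptions, not details from the source.

```python
def declutter(items, context, max_items=2):
    """Keep only the most context-relevant items of interaction data.

    `items` is a list of dicts with "device", "label", and optional
    "distance_m" keys; `context` may carry "focused_device" and
    "recently_used". All keys and weights are illustrative assumptions.
    """
    scored = []
    for item in items:
        score = 0.0
        if item["device"] == context.get("focused_device"):
            score += 2.0  # the user is attending to this device
        if item["device"] in context.get("recently_used", ()):
            score += 1.0  # the user interacted with it recently
        score -= 0.1 * item.get("distance_m", 0.0)  # farther devices matter less
        if score > 0.0:
            scored.append((score, item["label"]))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:max_items]]
```

With a scheme like this, a device that is neither in focus nor recently used contributes nothing to the display, however much interaction data it advertises.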

As further described herein, in many cases, the systems and techniques described herein can allow a device to unify and/or simplify user interactions with remote devices in a scene. In such cases, the device can better manage, unify, simplify, and/or facilitate communications and/or interactions with the remote devices in the scene. In some cases, the device can facilitate and/or support user interactions with one or more remote devices even in more challenging scenarios and/or conditions. To illustrate, a user and/or a device associated with the user may have difficulty interacting with a device. For example, a user may struggle to interact with a television remote control in poor lighting conditions, when the remote control is out of the user's reach, or when the user has difficulty seeing/understanding the buttons of the remote control (e.g., because of an impairment of the user, the size of the buttons, the language of the button labels, etc.). As another example, a user may struggle to interact with a network router or a connected device that has no external controls (e.g., a network-connected thermostat, light bulb, speaker, camera, appliance, switch, etc.), particularly if the user cannot access a user interface for interacting with the network router or IoT device. As yet another example, a user may have difficulty interacting with a console (such as a vehicle or elevator console) if the console (or certain controls of the console) is out of the user's reach, or if the user does not know which control to use for a desired operation/interaction.

As described in more detail herein, systems, apparatuses, methods (also referred to as processes), and computer-readable media (collectively referred to herein as "systems and techniques") are described for improving, unifying, simplifying, and/or facilitating user interactions with remote devices. In some examples, an electronic device can unify, simplify, and/or facilitate interactions with other devices, such as connected devices (e.g., network-connected devices), mobile devices, devices that lack outward/external controls, devices that lack a display and/or user interface, devices with certain characteristics that present one or more challenges to a user wishing to interact with them (e.g., devices with an interface in a different language, devices that the user does not recognize/understand, devices with limited accessibility options, etc.), devices with controls/interfaces that are out of the user's reach, and/or any other device. In some examples, an electronic device configured to facilitate interactions with other devices can include a smartphone, a smart wearable device (e.g., a smart watch, smart earbuds, etc.), an extended reality (XR) system or device (e.g., smart glasses, a head-mounted display (HMD), etc.), among others. While examples are described herein using an XR system as an example of an electronic device that can implement the techniques described herein, other electronic devices (e.g., mobile devices, smart wearable devices, etc.) can be used to perform the techniques.

In general, an XR system or device can provide virtual content to a user and/or can combine a real-world or physical environment with a virtual environment (made up of virtual content) to provide the user with an XR experience. The real-world environment can include real-world objects (also referred to as physical objects), such as books, people, vehicles, buildings, tables, chairs, and/or other real-world or physical objects. An XR system or device can facilitate interaction with different types of XR environments (e.g., a user can use the XR system or device to interact with an XR environment). XR systems can include virtual reality (VR) systems that facilitate interaction with VR environments, augmented reality (AR) systems that facilitate interaction with AR environments, mixed reality (MR) systems that facilitate interaction with MR environments, and/or other XR systems. As used herein, the terms XR system and XR device are used interchangeably. Examples of XR systems or devices include HMDs, smart glasses (e.g., network-connected glasses that can communicate using a communication network), and the like.

AR is a technology that provides virtual or computer-generated content (referred to as AR content) over the user's view of a physical, real-world scene or environment. AR content can include virtual content such as video, images, graphic content, location data (e.g., global positioning system (GPS) data or other location data), sounds, any combination thereof, and/or other augmented content. An AR system or device is intended to enhance (or augment), rather than replace, a person's current perception of reality. For example, a user can see a real stationary or moving physical object through an AR device display (e.g., the lenses of AR glasses), but the user's visual perception of the physical object can be enhanced or augmented by a virtual image of that object (e.g., a real-world car replaced by a virtual image of a DeLorean), by AR content added to the physical object (e.g., virtual wings added to a live animal), by AR content displayed relative to the physical object (e.g., informational virtual content displayed near a sign on a building, a virtual coffee cup virtually anchored to (e.g., placed on top of) a real-world table in one or more images, etc.), and/or by displaying other types of AR content. Various types of AR systems can be used for gaming, entertainment, and/or other applications.

In some cases, two types of AR systems that can be used to provide AR content include video see-through (also referred to as video pass-through) displays and optical see-through displays. Video see-through and optical see-through displays can be used to enhance a user's visual perception of real-world or physical objects. In a video see-through system, a live video of a real-world scene is displayed (e.g., including one or more objects augmented or enhanced on the live video). A video see-through system can be implemented using a mobile device (e.g., video on a mobile phone display), an HMD, or another suitable device that can display video and computer-generated objects over the video.

An optical see-through system with AR features can display AR content directly onto the view of the real-world scene (e.g., without displaying video content of the real-world scene). For example, the user can view physical objects in the real-world scene through a display (e.g., glasses or lenses), and the AR system can display AR content on the display (e.g., projected or otherwise displayed) to provide the user with an enhanced visual perception of one or more real-world objects. Examples of optical see-through AR systems or devices are AR glasses, an HMD, another AR headset, or other similar devices, which can include a lens or glass in front of each eye (or a single lens or glass over both eyes) to allow the user to see a real-world scene with physical objects directly, while also allowing an enhanced image of an object or additional AR content to be projected onto the display to augment the user's visual perception of the real-world scene.

VR provides a fully immersive experience in a three-dimensional, computer-generated VR environment or in a video depicting a virtual version of a real-world environment. The VR environment can be interacted with in a seemingly real or physical way. As a user experiencing a VR environment moves in the real world, the images rendered in the virtual environment also change, giving the user the perception of moving within the VR environment. For example, the user can turn left or right, look up or down, and/or move forward or backward, thereby changing the user's point of view of the VR environment. The VR content presented to the user can change accordingly, so that the user's experience is as seamless as in the real world. In some cases, VR content can include VR video, which can be captured and rendered at very high quality, potentially providing a truly immersive virtual reality experience. Virtual reality applications can include gaming, training, education, sports video, online shopping, among others. VR content can be rendered and displayed using a VR system or device, such as a VR HMD or other VR headset, which fully covers the user's eyes during a VR experience.

MR technologies can combine aspects of VR and AR to provide an immersive experience for a user. For example, in an MR environment, real-world and computer-generated objects can interact (e.g., a real person can interact with a virtual person as if the virtual person were a real person).

In some cases, an XR system can track parts of the user (e.g., a hand and/or fingertips of the user) to allow the user to interact with items of virtual content. An XR system, such as smart glasses or an HMD, can implement cameras and/or one or more sensors to track the position of the XR system and of other objects within the physical environment in which the XR system is located. The XR system can use such tracking information to provide the user of the XR system with a realistic XR experience. For example, an XR system can allow a user to experience or interact with immersive virtual environments or content. To provide a realistic XR experience, some XR systems or devices can integrate virtual content with the physical world. In some cases, an XR system or device can match the relative poses and movements of objects and devices. For example, an XR system can use tracking information to calculate the relative poses of devices, objects, and/or maps of the real-world environment in order to match the relative positions and movements of the devices, objects, and/or other parts of the real-world environment. Using the poses and movements of one or more devices, objects, and/or other parts of the real-world environment, the XR system can anchor content to the real-world environment in a manner that appears realistic to the user of the XR system. The relative pose information can be used to match virtual content with the user's perceived motion and with the spatio-temporal state of the devices, objects, and other parts of the real-world environment.
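To make the idea of anchoring content with relative pose information concrete, the following Python sketch expresses a world-locked anchor in the XR device's own frame, using a simplified two-dimensional pose (position plus yaw). The 2D simplification, the x-forward/y-left frame convention, and the function name `world_to_device` are assumptions chosen for brevity; a real system would track full six-degree-of-freedom poses.

```python
import math


def world_to_device(anchor_xy, device_xy, device_yaw_rad):
    """Express a world-space anchor in the device's local frame.

    Assumed convention: the device's local x axis points forward and its
    local y axis points left; yaw is measured counter-clockwise from the
    world x axis.
    """
    dx = anchor_xy[0] - device_xy[0]
    dy = anchor_xy[1] - device_xy[1]
    cos_y, sin_y = math.cos(device_yaw_rad), math.sin(device_yaw_rad)
    # Apply the inverse (transpose) of the device's rotation to the offset.
    local_x = cos_y * dx + sin_y * dy
    local_y = -sin_y * dx + cos_y * dy
    return local_x, local_y
```

Recomputing this transform whenever the tracked device pose changes keeps the anchor fixed in the world while its rendered position moves on the display, which is the behaviour usually meant by world-locked content.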

In some examples, an XR system can be used to provide user guidance data for interacting with one or more other devices (such as one or more connected devices, remote controls, consoles, mobile devices, etc.). According to the systems and techniques described herein, an XR system can be utilized to enable more intuitive and natural content and/or interactions with other devices. In some examples, the XR system can detect other devices in a scene and facilitate and/or manage interactions with those devices. In some cases, the XR system can have a previously established connection (e.g., a pairing, etc.) with the other devices, which can allow the XR system to detect the other devices in the scene. In other examples, the XR system can maintain a map of the scene that includes the other devices located in the scene, and the XR system can use the map to detect the other devices when the XR system is in the scene. In some examples, the XR system can use one or more sensors, such as image sensors, audio sensors, radar sensors, LIDAR sensors, etc., to sense the other devices in the scene. In some cases, the XR system can use contextual information associated with the scene to determine that other devices are in the scene. For example, the XR system can include contextual information about the scene and/or the other devices that indicates to the XR system that the other devices are present in the scene. When the XR system is in the scene, the XR system can determine, based on the contextual information, that the other devices are in the scene. The XR system can obtain information from the other devices detected in the scene in order to facilitate, manage, or otherwise support interactions with those devices.

For example, the XR system can obtain, from another device detected in the scene, input data indicating or identifying one or more input options of the other device and/or contextual information associated with the scene, the other device, and/or the XR system. The XR system can process the input data and/or contextual information and present, to a user using (e.g., wearing) the XR system, a user interface, virtual content, and/or input options for interacting with the device (e.g., controlling it, accessing content, accessing state information, accessing outputs, etc.). In some cases, the XR system can also be used to control other devices in the scene based on the input data and/or contextual information, as further described herein.

In some examples, the XR system can obtain or receive the input data indicating the one or more input options from the other device, from a server (e.g., a cloud-based server involved in the operation of the other device), and/or from another source. The input data can indicate some or all of the input options available for interacting with the other device, such as input types (e.g., gesture-based, voice-based, touch-based, etc.), the functions triggered by particular inputs (e.g., swiping right can cause a thermostat to raise the temperature), and other information. In some examples, if a device does not transmit any input data to the XR system (e.g., after the XR system sends a request to the device), the XR system can determine or infer that the device does not have the ability to communicate with the XR system (e.g., via a wireless network). In such examples, the XR system can present instructions or other information (e.g., by highlighting a particular button on the device) to help the user determine how to interact with the device.

The XR system can use contextual information to determine the content, input options, user interface, and/or modality to output to the user for interacting with the other device. In some examples, the XR system can obtain or receive the contextual information locally (e.g., using one or more sensors of the XR system, such as one or more cameras, one or more inertial measurement units (IMUs), etc.) and/or from one or more remote sources (e.g., a server, the cloud, the Internet, other devices, one or more sensors, etc.). The contextual information can relate to the other device (with which the user can interact using the XR system), the scene or environment in which the XR system and/or the other device is located, the user of the XR system attempting to interact with the other device, and/or other context at any given point in time. For example, the contextual information can include an expected user interaction with the other device, one or more actions of the user in the scene, characteristics of or personal information associated with the user, historical information associated with the user and the other device (e.g., the user's past use of the device), the user interface capabilities of the other device (e.g., whether it has outward-facing/external controls that are visible or otherwise accessible to the user), information associated with the other device (e.g., how far the other device is from the XR system), information associated with the scene (e.g., lighting, noise, etc.), and/or other information.

The contextual information can provide the XR system with awareness of the situation/context associated with the user, the XR system, the other device(s), the scene, etc. Using the contextual information and the data identifying the input options associated with the other device, the XR system can output (e.g., render, provide, generate, etc.) user interaction data corresponding to one or more input options that enable the user to interact with the other device. In some cases, the XR system can present visual content/data corresponding to the one or more input options. For example, the user interaction data can include one or more user interface elements associated with an input option, a prompt indicating how to provide an input associated with the input option, and/or other data. In some cases, the XR system can alternatively or additionally output non-visual data corresponding to the one or more input options. For example, the XR system can output haptic and/or audio information, such as audio prompts or instructions corresponding to the one or more input options.

The contextual information can enable the XR system to output content (e.g., visual virtual content, audio content, haptic feedback, etc.) that is contextually appropriate given the situation/context associated with the user, the XR system, the other device, and the scene, and/or that otherwise facilitates interaction with the other device. In some examples, the XR system can use the contextual information to simplify the user interaction and/or the associated data/content. For example, in some cases, the other device can have multiple input functions/capabilities. The XR system can use the contextual information to filter the input options associated with the device down to those relevant to the XR system and/or the current context. The XR system can then output the filtered/reduced set of input options to simplify the user interaction, the user interaction data/content, etc.
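The context-based filtering described above can be sketched as follows. This is only an illustrative sketch: the `InputOption` and `Context` types, their fields, and the threshold values are hypothetical and do not come from the disclosure, which does not specify a data format or a filtering rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputOption:
    """One way of interacting with a remote device (hypothetical schema)."""
    name: str
    modality: str  # e.g. "gesture", "voice", "touch"


@dataclass(frozen=True)
class Context:
    """A few illustrative context signals; a real system would use many more."""
    noise_level_db: float   # ambient noise in the scene
    hands_available: bool   # whether the user's hands are free
    distance_m: float       # distance between the XR system and the device


def filter_input_options(options, ctx, max_noise_db=70.0, max_reach_m=0.8):
    """Return only the options that look usable in the current context.

    The thresholds are assumed values chosen for illustration.
    """
    usable = []
    for opt in options:
        if opt.modality == "voice" and ctx.noise_level_db > max_noise_db:
            continue  # voice input is unreliable in a noisy scene
        if opt.modality == "gesture" and not ctx.hands_available:
            continue  # gestures need at least one free hand
        if opt.modality == "touch" and ctx.distance_m > max_reach_m:
            continue  # the device is out of arm's reach
        usable.append(opt)
    return usable
```

With a thermostat that reports a swipe gesture, a voice command, and a physical button, a noisy scene in which the user stands three meters away would leave only the gesture option to present.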

As noted above, the XR system can use the input data and contextual information to render a user interface and/or input options for the user to interact with the other device from the XR system. In some cases, the XR system can communicate with the other device to provide inputs/commands to the other device based on user interactions with the user interface rendered by the XR system. For example, the XR system can display the user interface so that it appears to the user as an overlay on the other device or on a portion of the other device (e.g., a control, surface, display, panel, etc.) to facilitate user interactions with the other device. The XR system can detect and translate user interactions with the user interface to generate control commands and/or user interaction instructions. For example, the XR system can detect user gestures and/or inputs provided via a device (e.g., via the XR system and/or a controller), and interpret/translate such user gestures and/or inputs as interactions with the user interface. The XR system can then generate commands/instructions for controlling the other device based on the interpreted/translated user gestures and/or inputs. The XR system can use the interactions with the overlaid user interface to control the other device based on user inputs and/or to access data/outputs from the other device. In some cases, to facilitate and/or improve user interactions with the other device, the XR system can localize and/or map the other device so that the XR system can accurately render the user interface relative to the other device or a portion of the other device.

In some examples, the XR system can present a user interface overlaid on the scene in a world-locked or screen-locked manner to provide user interaction guidance data and/or input options to the user via the user interface. In some cases, the overlay can include one or more graphical user interface elements with guidance information indicating to the user how to interact with the one or more graphical user interface elements.

Further details regarding the creation of virtual private spaces are provided herein with reference to the various figures. FIG. 1 is a diagram illustrating an example extended reality (XR) system 100, in accordance with some aspects of the present disclosure. The XR system 100 can run (or execute) XR applications and implement XR operations. In some examples, the XR system 100 can perform tracking and localization, mapping of the real world (e.g., a scene), and positioning and rendering of virtual content on a display 109 as part of an XR experience. For example, the XR system 100 can generate a map (e.g., a three-dimensional (3D) map) of a scene in the real world, track a pose (e.g., location and position) of the XR system 100 relative to the scene (e.g., relative to the 3D map of the scene), position and/or anchor virtual content at a specific location(s) on the map of the scene, and render the virtual content on the display 109 such that the virtual content appears to be at a location in the scene corresponding to the specific location on the map of the scene where the virtual content is positioned and/or anchored. The display 109 can include glass, a screen, one or more lenses, a projector, and/or another display mechanism that allows the user to see the real-world environment and also allows XR virtual content to be displayed thereon.

In this illustrative example, the XR system 100 includes one or more image sensors 102, an accelerometer 104, a gyroscope 106, a storage device 107, computing elements 110, an XR engine 120, an input options engine 122, a context management engine 123, an image processing engine 124, and a rendering engine 126. It should be noted that the elements 102-126 shown in FIG. 1 are non-limiting examples provided for illustration and explanation purposes, and other examples can include more, fewer, or different elements than those shown in FIG. 1. For example, in some cases, the XR system 100 can include one or more other sensors (e.g., one or more inertial measurement units (IMUs) other than the accelerometer 104 and gyroscope 106, radars, light detection and ranging (LIDAR) sensors, audio sensors, etc.), one or more display devices, one or more other processing engines, one or more other hardware elements, and/or one or more other software and/or hardware elements that are not shown in FIG. 1. An example architecture and example hardware elements that can be implemented by the XR system 100 are further described below with reference to FIG. 9.

Moreover, for simplicity and explanation purposes, the one or more image sensors 102 will be referred to herein as the image sensor 102 (i.e., in the singular). However, one of ordinary skill in the art will recognize that the XR system 100 can include a single image sensor or multiple image sensors. Also, references to any of the elements (e.g., 102-126) of the XR system 100 in the singular or plural form should not be interpreted as limiting the number of such elements implemented by the XR system 100 to one or more than one. For example, references to the accelerometer 104 in the singular should not be interpreted as limiting the number of accelerometers implemented by the XR system 100 to one. One of ordinary skill in the art will recognize that, for any of the elements 102-126 shown in FIG. 1, the XR system 100 can include only one of such element(s) or more than one of such element(s).

The XR system 100 includes or is in communication (wired or wirelessly) with an input device 108. The input device 108 can include any suitable input device, such as a touchscreen, a pen or other pointer device, a keyboard, a mouse, a button or key, a microphone for receiving voice commands, a gesture input device for receiving gesture commands, any combination thereof, and/or another input device. In some cases, the image sensor 102 can capture images that can be processed for interpreting gesture commands.

The XR system 100 can be part of, or implemented by, a single computing device or multiple computing devices. In some examples, the XR system 100 can be part of an electronic device (or devices) such as an extended reality head-mounted display (HMD) device, extended reality glasses (e.g., augmented reality or AR glasses), a camera system (e.g., a digital camera, an IP camera, a video camera, a security camera, etc.), a telephone system (e.g., a smartphone, a cellular telephone, a conferencing system, etc.), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a smart television, a display device, a gaming console, a video streaming device, an Internet-of-Things (IoT) device, and/or any other suitable electronic device(s).

In some implementations, the one or more image sensors 102, the accelerometer 104, the gyroscope 106, the storage device 107, the computing elements 110, the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and the rendering engine 126 can be part of the same computing device. For example, in some cases, the one or more image sensors 102, the accelerometer 104, the gyroscope 106, the storage device 107, the computing elements 110, the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and the rendering engine 126 can be integrated into an HMD, extended reality glasses, a smartphone, a laptop computer, a tablet computer, a gaming system, and/or any other computing device. However, in some implementations, the one or more image sensors 102, the accelerometer 104, the gyroscope 106, the storage device 107, the computing elements 110, the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and the rendering engine 126 can be part of two or more separate computing devices. For example, in some cases, some of the elements 102-126 can be part of, or implemented by, one computing device and the remaining elements can be part of, or implemented by, one or more other computing devices.

The storage device 107 can be any storage device(s) for storing data. Moreover, the storage device 107 can store data from any of the elements of the XR system 100. For example, the storage device 107 can store data from the image sensor 102 (e.g., image or video data), data from the accelerometer 104 (e.g., measurements), data from the gyroscope 106 (e.g., measurements), data from the computing elements 110 (e.g., processing parameters, preferences, virtual content, rendered content, scene maps, tracking and localization data, object detection data, privacy data, XR application data, face recognition data, occlusion data, etc.), data from the XR engine 120, data from the input options engine 122, data from the context management engine 123, data from the image processing engine 124, and/or data from the rendering engine 126 (e.g., output frames). In some examples, the storage device 107 can include a buffer for storing frames for processing by the computing elements 110.

The one or more computing elements 110 can include a central processing unit (CPU) 112, a graphics processing unit (GPU) 114, a digital signal processor (DSP) 116, and/or an image signal processor (ISP) 118. The computing elements 110 can perform various operations such as image enhancement, computer vision, graphics rendering, extended reality (e.g., tracking, localization, pose estimation, mapping, content anchoring, content rendering, etc.), image/video processing, sensor processing, recognition (e.g., text recognition, face recognition, object recognition, feature recognition, tracking or pattern recognition, scene recognition, occlusion detection, etc.), machine learning, filtering, and any of the various operations described herein. In this example, the computing elements 110 implement the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and the rendering engine 126. In other examples, the computing elements 110 can also implement one or more other processing engines.

The image sensor 102 can include any image and/or video sensor or capturing device. In some examples, the image sensor 102 can be part of a multiple-camera assembly, such as a dual-camera assembly, a three-camera assembly, a four-camera assembly, or an assembly with another number of cameras. In some examples, the image sensor 102 can include any combination of one or more visible-light cameras (e.g., configured to capture monochrome or color images, such as red-green-blue or RGB images), one or more infrared (IR) cameras and/or near-infrared (NIR) cameras, one or more depth sensors, and/or other types of image sensor(s) or camera(s).

The image sensor 102 can capture image and/or video content (e.g., raw image and/or video data), which can then be processed by the computing elements 110, the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and/or the rendering engine 126, as described herein. For example, the image sensor 102 can capture image data and can generate frames based on the image data, and/or can provide the image data or frames to the computing elements 110, the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and/or the rendering engine 126 for processing. A frame can include a video frame of a video sequence or a still image. A frame can include a pixel array representing a scene. For example, a frame can be a red-green-blue (RGB) frame having red, green, and blue color components per pixel; a luma, chroma-red, chroma-blue (YCbCr) frame having a luma component and two chroma (color) components (chroma-red and chroma-blue) per pixel; or any other suitable type of color or monochrome picture.
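To make the relationship between the two frame formats concrete, the sketch below converts RGB pixels to YCbCr. The disclosure does not name a conversion; the full-range BT.601 coefficients used here (the ones commonly used for JPEG) are an assumption, and the function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Uses the BT.601 coefficients as applied in JPEG/JFIF (an assumed choice;
    other standards use different coefficients and value ranges).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b                 # luma
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b      # chroma-blue
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b      # chroma-red

    def clamp(v):
        # Round to the nearest integer and keep the result in the 8-bit range.
        return max(0, min(255, round(v)))

    return clamp(y), clamp(cb), clamp(cr)


def frame_to_ycbcr(frame):
    """Convert a frame given as rows of (R, G, B) tuples to rows of (Y, Cb, Cr)."""
    return [[rgb_to_ycbcr(*pixel) for pixel in row] for row in frame]
```

Neutral pixels (black, gray, white) map to chroma values of 128, which is why a monochrome picture needs only the luma plane.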

In some cases, the image sensor 102 (and/or other cameras of the XR system 100) can be configured to also capture depth information. For example, in some implementations, the image sensor 102 (and/or other cameras) can include an RGB-depth (RGB-D) camera. In some examples, the XR system 100 can include one or more depth sensors (not shown) that are separate from the image sensor 102 (and/or other cameras) and that can capture depth information. For instance, such a depth sensor can obtain depth information independently from the image sensor 102. In some examples, a depth sensor can be physically installed in the same general location as the image sensor 102, but can operate at a different frequency or frame rate from the image sensor 102. In some examples, a depth sensor can take the form of a light source that can project a structured or textured light pattern, which can include one or more narrow bands of light, onto one or more objects in a scene. Depth information can then be obtained by exploiting geometrical distortions of the projected pattern caused by the surface shape of the object. In one example, depth information can be obtained from stereo sensors, such as a combination of an infrared structured light projector and an infrared camera registered to a camera (e.g., an RGB camera).

The XR system 100 also includes one or more sensors other than the image sensor 102. The one or more sensors can include one or more accelerometers (e.g., accelerometer 104), one or more gyroscopes (e.g., gyroscope 106), and/or other sensors. The one or more sensors can provide velocity, orientation, and/or other position-related information to the computing elements 110. For example, the accelerometer 104 can detect acceleration of the XR system 100 and can generate acceleration measurements based on the detected acceleration. In some cases, the accelerometer 104 can provide one or more translational vectors (e.g., up/down, left/right, forward/back) that can be used for determining a position or pose of the XR system 100. The gyroscope 106 can detect and measure the orientation and angular velocity of the XR system 100. For example, the gyroscope 106 can be used to measure the pitch, roll, and yaw of the XR system 100. In some cases, the gyroscope 106 can provide one or more rotational vectors (e.g., pitch, yaw, roll). In some examples, the image sensor 102 and/or the XR engine 120 can use measurements obtained by the accelerometer 104 (e.g., one or more translational vectors) and/or the gyroscope 106 (e.g., one or more rotational vectors) to calculate the pose of the XR system 100. As previously noted, in other examples, the XR system 100 can also include other sensors, such as an inertial measurement unit (IMU), a magnetometer, a gaze and/or eye tracking sensor (e.g., an eye tracking camera), a machine vision sensor, a smart scene sensor, a speech recognition sensor, an impact sensor, a shock sensor, a position sensor, a tilt sensor, etc.

In some cases, the one or more sensors can include at least one IMU. An IMU is an electronic device that measures the specific force, angular rate, and/or orientation of the XR system 100, using a combination of one or more accelerometers, one or more gyroscopes, and/or one or more magnetometers. In some examples, the one or more sensors can output measured information associated with the capture of an image captured by the image sensor 102 (and/or other cameras of the XR system 100) and/or depth information obtained using one or more depth sensors of the XR system 100.

The output of the one or more sensors (e.g., the accelerometer 104, the gyroscope 106, one or more other types of IMUs, and/or other sensors) can be used by the XR engine 120 to determine a pose of the XR system 100 (also referred to as the head pose) and/or the pose of the image sensor 102 (or other camera of the XR system 100). In some cases, the pose of the XR system 100 and the pose of the image sensor 102 (or other camera) can be the same. The pose of the image sensor 102 refers to the position and orientation of the image sensor 102 relative to a frame of reference (e.g., with respect to the object 202). In some implementations, the camera pose can be determined for 6 degrees of freedom (6DOF), which refers to three translational components (e.g., which can be given by X (horizontal), Y (vertical), and Z (depth) coordinates relative to a frame of reference, such as the image plane) and three angular components (e.g., roll, pitch, and yaw relative to the same frame of reference).
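A 6DOF pose of the kind described above can be represented as three translational and three angular components. The sketch below is one possible representation, not the disclosure's: the `Pose6DOF` class is hypothetical, and the choices of roll about X, pitch about Y, yaw about Z, and the composition order R = Rz(yaw) Ry(pitch) Rx(roll) are assumed conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DOF:
    """Position (x, y, z) plus orientation (roll, pitch, yaw) in radians."""
    x: float
    y: float
    z: float
    roll: float   # rotation about the X axis (assumed convention)
    pitch: float  # rotation about the Y axis (assumed convention)
    yaw: float    # rotation about the Z axis (assumed convention)

    def rotation_matrix(self):
        """Return the 3x3 matrix R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def transform_point(self, p):
        """Map a point from the sensor frame into the frame of reference."""
        rot = self.rotation_matrix()
        t = (self.x, self.y, self.z)
        return tuple(
            sum(rot[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
        )
```

Euler angles are compact and match the roll/pitch/yaw wording, but they suffer from gimbal lock near a pitch of 90 degrees; tracking systems often store orientation as a quaternion or rotation matrix instead.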

In some cases, a device tracker (not shown) can use the measurements from the one or more sensors and image data from the image sensor 102 to track a pose (e.g., a 6DOF pose) of the XR system 100. For example, the device tracker can fuse visual data (e.g., obtained using a visual tracking solution) from the image data with inertial data from the measurements to determine the position and motion of the XR system 100 relative to the real world (e.g., the scene) and a map of the real world. As described below, in some examples, when tracking the pose of the XR system 100, the device tracker can generate a three-dimensional (3D) map of the scene (e.g., the real world) and/or generate updates to a 3D map of the scene. The 3D map updates can include, for example and without limitation, new or updated features and/or feature or landmark points associated with the scene and/or the 3D map of the scene, localization updates identifying or updating the position of the XR system 100 within the scene and the 3D map of the scene, etc. The 3D map can provide a digital representation of a scene in the real world. In some examples, the 3D map can anchor location-based objects and/or content to real-world coordinates and/or objects. The XR system 100 can use a mapped scene (e.g., a scene in the real world represented by, and/or associated with, a 3D map) to merge the real and virtual worlds and/or merge virtual content or objects with the real environment.

In some aspects, the pose of the image sensor 102 and/or of the XR system 100 as a whole can be determined and/or tracked by the computing elements 110 using a visual tracking solution based on images captured by the image sensor 102 (and/or other cameras of the XR system 100). For instance, in some examples, the computing elements 110 can perform tracking using computer vision-based tracking, model-based tracking, and/or simultaneous localization and mapping (SLAM) techniques. For instance, the computing elements 110 can perform SLAM or can be in communication (wired or wireless) with a SLAM engine (not shown). SLAM refers to a class of techniques in which a map of an environment (e.g., a map of an environment being modeled by the XR system 100) is created while simultaneously tracking the pose of a camera (e.g., the image sensor 102) and/or the XR system 100 relative to that map. The map can be referred to as a SLAM map and can be three-dimensional (3D). The SLAM techniques can be performed using color or grayscale image data captured by the image sensor 102 (and/or other cameras of the XR system 100), and can be used to generate estimates of 6DOF pose measurements of the image sensor 102 and/or the XR system 100. Such a SLAM technique configured to perform 6DOF tracking can be referred to as 6DOF SLAM. In some cases, the output of the one or more sensors (e.g., the accelerometer 104, the gyroscope 106, one or more IMUs, and/or other sensors) can be used to estimate, correct, and/or otherwise adjust the estimated pose.

In some cases, 6DOF SLAM (e.g., 6DOF tracking) may associate features observed in certain input images from the image sensor 102 (and/or other cameras) with the SLAM map. For example, 6DOF SLAM may use feature point associations from an input image to determine the pose (position and orientation) of the image sensor 102 and/or the XR system 100 for the input image. 6DOF mapping may also be performed to update the SLAM map. In some cases, the SLAM map maintained using 6DOF SLAM may contain 3D feature points triangulated from two or more images. For example, keyframes may be selected from input images or a video stream to represent an observed scene. For each keyframe, a respective 6DOF camera pose associated with the image may be determined. The pose of the image sensor 102 and/or the XR system 100 may be determined by projecting features from the 3D SLAM map into an image or video frame and updating the camera pose from verified 2D-3D correspondences.
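The projection and verification step described above can be sketched as follows. This is a minimal illustration only, assuming a simple pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and map points already expressed in camera coordinates; the description does not specify a camera model, and the function names are not taken from it.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_point(point_cam: Point3D, fx: float, fy: float,
                  cx: float, cy: float) -> Point2D:
    """Project a 3D point in camera coordinates (Z forward) to pixel coordinates.

    Assumes an ideal pinhole camera without lens distortion.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(observed: Point2D, projected: Point2D) -> float:
    """Euclidean pixel distance between an observed and a projected feature."""
    dx = observed[0] - projected[0]
    dy = observed[1] - projected[1]
    return (dx * dx + dy * dy) ** 0.5
```

A 2D-3D correspondence could then be treated as verified when its reprojection error falls below a chosen pixel threshold; the threshold value is a tuning choice that the description leaves open.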

In one illustrative example, the computing elements 110 may extract feature points from every input image or from each keyframe. A feature point (also referred to as a registration point), as used herein, is a distinctive or identifiable part of an image, such as a part of a hand, an edge of a table, and so on. Features extracted from a captured image may represent distinct feature points along three-dimensional space (e.g., coordinates on the X, Y, and Z axes), and every feature point may have an associated feature location. The feature points in keyframes either match (are the same as or correspond to) or fail to match the feature points of previously captured input images or keyframes. Feature detection may be used to detect the feature points. Feature detection may include an image processing operation that examines one or more pixels of an image to determine whether a feature exists at a particular pixel. Feature detection may be used to process an entire captured image or certain portions of an image. For each image or keyframe, once features have been detected, a local image patch around each feature may be extracted. Features may be extracted using any suitable technique, such as Scale Invariant Feature Transform (SIFT) (which localizes features and generates their descriptions), Speeded Up Robust Features (SURF), Gradient Location-Orientation Histogram (GLOH), Normalized Cross Correlation (NCC), or another suitable technique.
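Of the techniques listed above, normalized cross correlation is simple enough to illustrate directly. The sketch below compares two image patches that have been flattened into equal-length sequences of intensity values; the flattening and the function name `ncc` are assumptions made for illustration, not details from the description.

```python
import math
from typing import Sequence


def ncc(patch_a: Sequence[float], patch_b: Sequence[float]) -> float:
    """Zero-mean normalized cross correlation of two equal-size patches.

    Returns a value in [-1, 1]; values near 1 suggest the patches match.
    Returns 0.0 when either patch has no intensity variation.
    """
    if len(patch_a) != len(patch_b) or len(patch_a) == 0:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0.0:
        return 0.0
    return numerator / denominator
```

Because the score is normalized by each patch's own variation, a patch and a uniformly brighter copy of it still score close to 1, which is one reason this measure is commonly used for matching patches across frames.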

In some cases, the XR system 100 may also track the hand and/or fingers of the user to allow the user to interact with and/or control virtual content in a virtual environment (e.g., virtual content displayed within a virtual private space). For example, the XR system 100 may track a pose and/or movement of the hand and/or fingertips of the user to identify or interpret user interactions with the virtual environment. The user interactions may include, for example and without limitation, moving an item of virtual content, resizing an item of virtual content and/or changing the location of a virtual private space, selecting an input interface element in a virtual user interface (e.g., a virtual representation of a mobile phone, a virtual keyboard, and/or another virtual interface), providing an input through a virtual user interface, and so on.

FIG. 2 is a diagram illustrating example landmark points of a hand 200 that may be used to track the position of the hand 200 and interactions by the hand 200 with a virtual environment, such as virtual content displayed within a virtual private space as described herein. The landmark points illustrated in FIG. 2 correspond to different parts of the hand 200, including a landmark point 235 on the palm of the hand 200, landmark points on the thumb 230 of the hand 200, landmark points on the index finger 232 of the hand 200, landmark points on the middle finger 234 of the hand 200, landmark points on the ring finger 236 of the hand 200, and landmark points on the little finger 238 of the hand 200. The palm of the hand 200 can move in three translational directions (e.g., measured in the X, Y, and Z directions relative to a plane, such as an image plane) and in three rotational directions (e.g., measured in yaw, pitch, and roll relative to the plane), and thus provides six degrees of freedom (6DOF) that may be used for registration and/or tracking. The 6DOF movement of the palm is illustrated as a square in FIG. 2, as indicated in the legend 240.

The different joints of the fingers of the hand 200 allow for different degrees of movement, as illustrated in the legend 240. As illustrated by the diamond shapes (e.g., diamond 233) in FIG. 2, the base of each finger (corresponding to the metacarpophalangeal (MCP) joint between the proximal phalanx and the metacarpal) has two degrees of freedom (2DOF), corresponding to flexion and extension as well as abduction and adduction. As illustrated by the circle shapes (e.g., circle 231) in FIG. 2, each of the upper joints of each finger (corresponding to the interphalangeal joints between the distal, middle, and proximal phalanges) has one degree of freedom (1DOF), corresponding to flexion and extension. As a result, the hand 200 provides 26 degrees of freedom (26DOF) from which to track the hand 200 and interactions by the hand 200 with virtual content rendered by the XR system 100.
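The 26DOF total can be checked with a short tally. The sketch below assumes two upper (interphalangeal) joints per finger, including the thumb, which is the count that reproduces the stated total; the constant names are illustrative only.

```python
PALM_DOF = 6               # three translations plus three rotations
MCP_DOF = 2                # flexion/extension and abduction/adduction
INTERPHALANGEAL_DOF = 1    # flexion/extension only
FINGERS = 5                # thumb, index, middle, ring, little
# Assumption: two interphalangeal joints are modeled per finger.
UPPER_JOINTS_PER_FINGER = 2


def hand_degrees_of_freedom() -> int:
    """Total degrees of freedom of the hand model described above."""
    per_finger = MCP_DOF + UPPER_JOINTS_PER_FINGER * INTERPHALANGEAL_DOF
    return PALM_DOF + FINGERS * per_finger
```

Each finger contributes 2 + 2 × 1 = 4 degrees of freedom, so the five fingers contribute 20, and the palm's 6 bring the total to 26.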

The XR system 100 may use one or more of the landmark points on the hand 200 to track the hand 200 (e.g., track a pose and/or movement of the hand 200) and to track interactions with a virtual environment rendered by the XR system 100. As noted above, as a result of detecting one or more landmark points on the hand 200, a pose of the landmarks (and thus of the hand and fingers), in relative physical position with respect to the XR system 100, may be established. For example, the landmark points on the palm of the hand 200 (e.g., the landmark point 235) may be detected in an image, and the locations of the landmark points may be determined with respect to the image sensor 102 of the XR system 100. A point of a virtual content item rendered by the XR system 100 (e.g., a center point, such as a center of mass or other center point) may be translated to a position on a display of the XR system 100 (e.g., the display 109 of FIG. 1), or to a rendering on the display, relative to the location determined for the landmark points on the palm of the hand 200.

As described below, the XR system 100 may also register the virtual content and/or the hand 200 to points in the real world (as detected in one or more images) and/or to other parts of the user. For example, in some implementations, in addition to determining a physical pose of the hand 200 with respect to the XR system 100 and/or an item of virtual content, the XR system 100 may determine the locations of other landmarks, such as distinctive points (referred to as feature points) on walls, one or more corners of objects, features on a floor, points on a human face, points on nearby devices, and so on. In some cases, the XR system 100 may place the virtual content at a certain position with respect to feature points detected in the environment, which may correspond to, for example, objects and/or people detected in the environment.

In some examples, the pose of the XR system 100 (and/or of the user's head) may be determined using, for example, image data from the image sensor 102 and/or measurements from one or more sensors, such as the accelerometer 104, the gyroscope 106, and/or one or more other sensors (e.g., one or more magnetometers, one or more inertial measurement units (IMUs), etc.). The head pose may be used to determine the position of the virtual content, the hand 200, and/or objects and/or people in the environment.

The operations of the XR engine 120, the input options engine 122, the context management engine 123, the image processing engine 124, and the rendering engine 126 (and any other image processing engines) may be implemented by any of the computing elements 110. In one illustrative example, the operations of the rendering engine 126 may be implemented by the GPU 114, and the operations of the XR engine 120, the input options engine 122, the context management engine 123, and the image processing engine 124 may be implemented by the CPU 112, the DSP 116, and/or the ISP 118. In some cases, the computing elements 110 may include other electronic circuits or hardware, computer software, firmware, or any combination thereof, to perform any of the various operations described herein.

In some examples, the XR engine 120 may perform XR operations to generate an XR experience based on data from the image sensor 102, the accelerometer 104, the gyroscope 106, and/or one or more other sensors on the XR system 100, such as one or more IMUs, radars, etc. In some examples, the XR engine 120 may perform tracking, localization, pose estimation, mapping, content anchoring operations, and/or any other XR operations/functionalities. An XR experience may include use of the XR system 100 to present XR content (e.g., virtual reality content, augmented reality content, mixed reality content, etc.) to a user during a virtual session. In some examples, the XR content and experience may be provided by the XR system 100 through an XR application (e.g., executed or implemented by the XR engine 120) that provides a specific XR experience such as, for example, an XR gaming experience, an XR classroom experience, an XR shopping experience, an XR entertainment experience, an XR activity (e.g., an operation, a troubleshooting activity, etc.), among others. During the XR experience, the user may view and/or interact with virtual content using the XR system 100. In some cases, the user may view and/or interact with the virtual content while also being able to view and/or interact with the physical environment around the user, allowing the user to have an immersive experience between the physical environment and virtual content mixed or integrated with the physical environment.

The XR engine 120, the input options engine 122, and the context management engine 123 may perform various operations to determine (and manage) how, where, and/or when to render certain virtual content relative to one or more other devices. For example, the XR engine 120, the input options engine 122, and the context management engine 123 may facilitate interactions with other devices, such as connected devices (e.g., network-connected cameras, speakers, light bulbs, hubs, locks, plugs, thermostats, displays, alarm systems, televisions (TVs), gadgets, appliances, etc.), mobile devices, devices that lack outward-facing/external controls, devices that lack a display and/or user interface, devices with certain characteristics that pose one or more challenges for a user who wishes to interact with them (e.g., controls/interfaces in a language the user does not understand, controls/interfaces the user does not recognize/understand, limited accessibility options, input options the user cannot easily recognize/understand, etc.), devices with controls/interfaces that are out of reach of the user, and/or any other devices. For example, the input options engine 122 may obtain or receive input data indicating or identifying one or more input options of another device. The input options engine 122 may send the input data to the XR engine 120. The context management engine 123 may obtain or receive information (e.g., context information) related to the other device (with which the user may interact using the XR system 100), the scene or environment in which the XR system and/or the other device are located, the user of the XR system who is attempting to interact with the other device, and/or other context. The context management engine 123 may send the context information to the XR engine 120.

Using the input data indicating one or more input options of the device and/or using the context information, the XR engine 120 may cause the rendering engine 126 to present relevant information to the user. For example, the XR engine 120 may cause the rendering engine 126 to output guidance data corresponding to an input option for which relevant context information has been determined. The guidance data may inform the user of what input options are available for the other device, how to provide such inputs, and so on. In some examples, the guidance data may filter out input options (and/or associated information) that are likely less relevant or unavailable given the current context (e.g., in view of information related to the XR system 100, the scene, the other device, and/or the user associated with the XR system 100). In some examples, using the input data indicating one or more input options for the device and/or using the context information, the XR engine 120 may cause the rendering engine 126 to present a user interface and/or input options for interacting with the device (e.g., controlling it, accessing content, accessing state information, accessing outputs, etc.).

For example, based on the context information and the input data identifying the input options associated with the other device, the XR engine 120 may cause the rendering engine 126 to present user interaction data corresponding to one or more input options that enable the user to interact with the other device. For example, the user interaction data may include one or more user interface elements associated with an input option (e.g., selectable control options, etc.), a hint indicating how to provide an input associated with an input option (e.g., a highlight, an arrow, text, etc.), and/or other data. The context information may enable the XR engine 120 to present content (e.g., virtual content, audio content, user interface content, etc.) that is contextually appropriate given the situation/context associated with the user, the XR system 100, the other device, and the scene, and/or that otherwise facilitates interaction with the other device. In some cases, the XR engine 120 may cause the rendering engine 126 to present no content, or to reduce or filter the amount of virtual content to be displayed (e.g., to display a subset of the user interface options), based on the context.
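The context-based filtering described above can be sketched roughly as follows. The data model is hypothetical: the description does not define how input options or context are represented, so the `InputOption` and `Context` types, the notion of "blocked" modalities, and the `max_options` cap are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InputOption:
    name: str       # e.g., "raise temperature"
    modality: str   # e.g., "touch", "voice", "gesture"


@dataclass(frozen=True)
class Context:
    # Modalities judged unusable in the current situation (assumed
    # representation, e.g., "voice" in a very noisy scene).
    blocked_modalities: FrozenSet[str]
    # Upper bound on how many options to present at once.
    max_options: int


def select_options(options: List[InputOption], context: Context) -> List[InputOption]:
    """Keep only options usable in this context, capped at max_options."""
    usable = [o for o in options if o.modality not in context.blocked_modalities]
    return usable[: context.max_options]
```

An empty result corresponds to the case in which no content is presented, and a shortened list corresponds to displaying only a subset of the user interface options.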

In some cases, the XR engine 120 may leverage its XR capabilities to facilitate interactions with other devices. For example, the XR system 100 may have AR capabilities, such as the ability to display virtual content on a display of the XR system 100 while also allowing the user to view the real-world environment through the display. The XR engine 120 may use the AR capabilities to render a user interface through which the user can interact with the other device, either directly or by providing inputs to the XR engine 120 (e.g., using the input device 108). For example, the rendering engine 126 may render the user interface on the display such that the user interface appears to the user as an overlay on the other device or on a portion of the other device (e.g., a control, a surface, a display, a panel, etc.) to facilitate user interactions with the other device. The XR engine 120 (or another component of the XR system 100) may use interactions with the overlaid user interface to control the other device based on user inputs and/or to access data/outputs from the other device. In some cases, to facilitate and/or improve user interactions with the other device, the XR system 100 may localize and/or map the other device. For example, the XR system 100 may localize the other device and use the localization information to render the user interface overlaid on the other device or on a portion of the other device.

The XR system 100 may register or anchor an item of virtual content to (e.g., position it relative to) feature points detected in the scene. For example, the input options engine 122, the context management engine 123, and/or the image processing engine 124 may coordinate with the XR engine 120 and/or the rendering engine 126 to anchor the virtual content of a user interface to feature points of the surface on which the virtual content is to be displayed.

In some examples, the XR system 100 may communicate with one or more other devices to provide inputs/commands to the other devices based on user interactions with the user interface rendered by the rendering engine 126. For example, the XR engine 120 may cause the XR system 100 to send a command to the device (using a transmitter or transceiver) based on a user input received via the rendered user interface and/or input options. The command may cause the device to perform one or more functions based on the user input.

In other examples, the XR engine 120 may leverage one or more interaction modalities (e.g., visual, voice/audio, gesture-based, motion-based, etc.) to facilitate user interactions with other devices. For example, the XR engine 120 may use hand tracking and/or gesture recognition capabilities to allow the user to interact with (e.g., control, access, etc.) other devices using gestures and other interactions. As another example, the XR engine 120 may use speech recognition to allow the user to interact with other devices using voice commands.

As noted above, the input data may indicate some or all of the input options available for interacting with the other device. For example, the input options may include the types of input supported by the device (e.g., gesture-based, voice-based, touch-based, etc.), the functions the device can perform in response to particular inputs (e.g., a swipe to the right causes a thermostat to raise the temperature), and other information.
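One way to picture such input data is as a small table that maps each supported input to the function it triggers. The sketch below is purely illustrative: the dictionary layout and every name in it are assumptions, and only the swipe-right example comes from the description above.

```python
from typing import Dict, Optional

# Hypothetical input data reported for a thermostat.
THERMOSTAT_INPUT_DATA: Dict[str, Dict[str, str]] = {
    "touch": {
        "swipe_right": "raise_temperature",   # example given in the description
        "swipe_left": "lower_temperature",    # assumed counterpart, not from the source
    },
    "voice": {
        "say_status": "report_temperature",   # assumed, for illustration only
    },
}


def function_for(input_data: Dict[str, Dict[str, str]],
                 input_type: str, user_input: str) -> Optional[str]:
    """Return the device function for a user input, or None if unsupported."""
    return input_data.get(input_type, {}).get(user_input)
```

Returning `None` for an unknown input type or gesture lets a caller distinguish "not supported by this device" from a recognized input without raising an error.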

In some examples, the input options engine 122 may obtain or receive the input data indicating one or more input options of the other device from the other device itself, from a server (e.g., a cloud-based server involved in the operation of the other device), and/or from another source. For example, the input options engine 122 may send (or cause a transmitter or transceiver to send) a request for the input options of a device. In one example, the XR system 100 may detect or sense the device, such as based on a previous network pairing with the device, based on a periodic beacon signal transmitted (e.g., broadcast) by the device to indicate its presence, based on detecting the device in one or more images provided by the image sensor 102, and so on. In response to detecting or sensing the device, the XR system 100 may request the input data from the device. In response to the request, the device may reply with input data indicating any input options associated with the device. In another example, the input options engine 122 may request the input options of a given device from a server associated with the device (e.g., a Google™ server associated with a Google Home™ device). The server may reply with input data indicating the input options associated with the device.

In some cases, the input options engine 122 may determine one or more input options of the other device by processing one or more images of the other device captured by the image sensor 102. For example, using an elevator console as an illustrative example of a user interface of a device (the elevator), the input options engine 122 may receive an image of the elevator console from the image sensor 102. Using machine learning (e.g., one or more neural network-based object detectors or classifiers), computer vision (e.g., a computer vision-based object detector or classifier), or other image analysis techniques, the input options engine 122 may determine that the console includes fifteen numbers corresponding to floors of the building, a door-open button, a door-close button, an emergency button, and/or other physical or virtual buttons.

In some examples, if the device does not transmit any input data to the XR system (e.g., after the XR system has sent a request to the device), the XR system may determine or infer that the device lacks the capability to communicate with the XR system (e.g., via a wireless network). In such an example, the XR system may present instructions or other information (e.g., highlighting a particular button on the device) to help the user determine how to interact with the device. For example, in the elevator example above, the elevator may not have the capability to communicate with the XR system (e.g., via a communication network). In such an example, rather than sending one or more commands to control the elevator based on received user input, the XR system may present virtual content that helps the user interact with the elevator.
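The fallback behavior described above can be sketched as a simple decision. The sketch assumes a hypothetical `request_options` callable that returns the device's reported input options, or `None` when the device does not reply; the mode labels and the returned dictionary shape are likewise illustrative rather than taken from the description.

```python
from typing import Callable, Dict, List, Optional


def plan_interaction(request_options: Callable[[], Optional[List[str]]],
                     detected_controls: List[str]) -> Dict[str, object]:
    """Choose between remote control and on-device guidance.

    request_options: asks the device for its input options (hypothetical).
    detected_controls: controls recognized in camera images, e.g., buttons.
    """
    reported = request_options()
    if reported:
        # The device replied, so commands can be sent to it.
        return {"mode": "remote_control", "options": reported}
    # No reply (or an empty reply): infer the device cannot communicate
    # and fall back to guiding the user on its physical controls.
    return {"mode": "guidance_only", "options": detected_controls}
```

For the elevator example, `request_options` would return `None`, and the detected buttons would drive highlights or instructions instead of commands.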

The context management engine 123 may send context information to the XR engine 120. The context information provides the XR engine 120 with contextual awareness of the situation/context associated with the user, the XR system 100, the other device(s), the scene, and so on. The XR engine 120 may use the context information to manage, modulate, and/or determine the content, input options, user interface, and/or modalities presented to the user for interacting with the other device. For example, as noted above, the XR engine 120 may use the context information to present content (e.g., virtual content, audio content, user interface content, etc.) that is contextually relevant to the situation/context associated with the user, the XR system 100, the other device, and the scene, and/or that otherwise facilitates interaction with the other device.

The context information may relate to the other device (with which the user may interact using the XR system 100), the scene or environment in which the XR system 100 and/or the other device are located, the user of the XR system 100 who is attempting to interact with the other device, and/or other context at any given point in time. In some examples, the context information may include an intended interaction of the user with the other device, such as an intent to interact with the device, an intent to interact with a particular input option of the device's user interface (e.g., a particular user interface control element), and/or other intended user interactions. The context management engine 123 may estimate an intended user interaction based on, for example, eye gaze, a particular gesture being performed, the user holding the other device, the user walking toward the device, and/or other information. For example, the context management engine 123 may determine that the user is gazing at a thermostat and, based on the determined gaze, determine that the user intends to interact with the thermostat. In another example, the context information may include one or more actions of the user in the scene. For example, the one or more actions may include the user walking toward the device, walking toward a door in the scene, sitting in a chair where the user usually watches television, and so on. Other examples of context information include characteristics associated with the user (e.g., quality of vision, spoken language(s), etc.), historical information associated with the user and the other device (e.g., the user's past usage of the device, the user's level of experience with the device or similar devices, etc.), user interface capabilities of the other device (e.g., whether it has outward-facing/external controls that are visible or otherwise accessible to the user), information associated with the other device (e.g., how far the other device is from the XR system 100), information associated with the scene (e.g., lighting, noise level such as ambient sound, objects or other obstructions between the XR system 100 and the device, whether any other users are in the scene, etc.), and/or other information.

In some examples, the context management engine 123 may determine, obtain, or receive context information locally. For example, the context management engine 123 may obtain sensor information from one or more sensors of the XR system 100 (e.g., the image sensor 102, the accelerometer 104, the gyroscope 106, and/or other sensors of the XR system 100). The context management engine 123 may process the sensor information to determine context information, such as one or more intended interactions of the user with the other device, one or more actions of the user in the scene, user interface capabilities of the other device, information associated with the other device (e.g., the distance of the device from the XR system 100 and thus from the user), information associated with the scene (e.g., lighting, noise, objects or obstructions between the XR system 100 and the device, etc.), and/or other context information. In one illustrative example, the context management engine 123 may receive an image from the image sensor 102 indicating that the user is looking at an oven, and may also receive sensor data from the accelerometer 104 and/or the gyroscope 106 indicating that the user is walking toward the oven. Based on the image and the sensor data, the context management engine 123 may determine that the user intends to interact with the oven.
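The oven example can be sketched as a rule that combines a gaze cue with a motion cue. This is only one possible heuristic: requiring both cues, representing motion as a series of distance estimates, and the `min_approach_m` threshold are assumptions made for illustration, and the description does not commit to any particular rule.

```python
from typing import Sequence


def intends_to_interact(gaze_on_device: bool,
                        distances_m: Sequence[float],
                        min_approach_m: float = 0.3) -> bool:
    """Estimate interaction intent from gaze and approach.

    gaze_on_device: whether image analysis indicates the user is looking
        at the device.
    distances_m: successive user-to-device distance estimates in meters,
        oldest first (hypothetically derived from motion sensor data).
    min_approach_m: how much closer the user must have moved (assumed).
    """
    approaching = (
        len(distances_m) >= 2
        and (distances_m[0] - distances_m[-1]) >= min_approach_m
    )
    return gaze_on_device and approaching
```

A system that weighs cues individually, as in the thermostat example where gaze alone suffices, could replace the final conjunction with a score or with per-device rules.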

In some examples, the context management engine 123 may obtain or receive context information from one or more remote sources (e.g., a server, the cloud, the Internet, another device, one or more sensors, etc.). For example, the context management engine 123 may access a user profile stored on a network-based or cloud-based system associated with the user, where the profile indicates characteristics associated with the user (e.g., the user's quality of vision, the language(s) the user speaks, etc.) and/or historical information associated with the user and the other device (e.g., the user's level of experience with one or more devices, how long the user has owned one or more devices, etc.). In some cases, the user profile may be stored locally on the XR system 100.

As noted above, the context information provides context awareness to the XR engine 120 so that the XR system 100 can present content that is contextually relevant to the particular situation in which the XR system 100 is being used. For example, the XR engine 120 can use context information that includes characteristics of the user, of other devices, etc., when deciding what AR content to present to the user to guide and/or assist user interaction with other devices and/or how to present the AR content. In one example, the XR engine 120 can take into account the language the user understands/speaks to ensure that the presented AR content is in that language. As another example, the XR engine 120 can tailor user interface guidance to the user's expected knowledge of a device. In one illustrative example, if the user is estimated to be a novice with the device (e.g., based on the amount of time the user has owned the device, the number of times the user has used the device, and/or other factors), the XR engine 120 can output additional support for the user (e.g., by presenting instructions on which inputs can be used to control the device). In another example, if the user is estimated to be an expert with the device (e.g., the user is estimated to have at least a threshold amount of experience/familiarity), the XR engine 120 can present no additional support to the user, or can reduce/minimize such support. As another example, if the user is wearing one or more items that may affect interaction with other devices, the XR engine 120 can adjust which user interfaces, controls, and/or guidance are output to the user based on what the user is wearing. In one illustrative example, the XR engine 120 can adjust the user interfaces, controls, and/or guidance it provides if the user is wearing gloves that may hinder the user's ability to select/touch controls, if the user is wearing glasses (or is not wearing glasses) or sunglasses that may hinder visibility, if the user has a medical device that limits the user's movement and ability to interact with certain controls, etc.
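The tailoring described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `UserContext` fields, the `EXPERT_THRESHOLD` value, and the `plan_guidance` function are hypothetical names introduced here and are not part of the described XR engine 120.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Hypothetical subset of user-related context information."""
    language: str          # language the user understands/speaks, e.g. "en"
    uses_of_device: int    # number of times the user has used the device
    wearing_gloves: bool   # gloves may hinder selecting/touching controls


# Assumed threshold amount of experience above which the user counts as expert.
EXPERT_THRESHOLD = 10


def plan_guidance(ctx: UserContext) -> dict:
    """Choose guidance language, level of detail, and offered input options."""
    is_expert = ctx.uses_of_device >= EXPERT_THRESHOLD
    input_options = ["touch", "voice", "gesture"]
    if ctx.wearing_gloves:
        # Do not suggest touch input when gloves may prevent it.
        input_options.remove("touch")
    return {
        "language": ctx.language,
        "detail": "minimal" if is_expert else "step_by_step",
        "input_options": input_options,
    }
```

With these assumptions, a first-time user wearing gloves would receive step-by-step guidance limited to voice and gesture inputs, while an experienced user would receive minimal guidance.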

As another example, the XR engine 120 can use context information associated with the scene (e.g., environmental factors) when deciding how to present AR content and/or what AR content to present to the user. For example, the XR engine 120 can take into account lighting conditions, such as low-light or bright-light conditions (which can suggest the use of audio cues and/or visual cues), ambient sound (which can suggest the use of visual cues rather than audio cues), the presence of other people (which can suggest a need for discretion or privacy, such as not providing any audio output, not presenting input options that involve hand or gesture movements, etc.), and so on.
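One way to picture how environmental factors might map to cue types is the following Python sketch. The thresholds (`LOW_LIGHT_LUX`, `BRIGHT_LIGHT_LUX`, `NOISY_DB`) and the `choose_cues` function are assumptions made for illustration only; the description does not specify numeric limits or a particular decision rule.

```python
# Hypothetical thresholds; the description gives no numeric values.
LOW_LIGHT_LUX = 50.0
BRIGHT_LIGHT_LUX = 10000.0
NOISY_DB = 70.0


def choose_cues(lux: float, noise_db: float, others_present: bool) -> list:
    """Return the cue types to use, always including visual cues."""
    # Very dim or very bright scenes can make overlaid content hard to see.
    poor_visibility = lux < LOW_LIGHT_LUX or lux > BRIGHT_LIGHT_LUX
    # Audio is avoided in loud scenes and when other people are present.
    audio_allowed = noise_db < NOISY_DB and not others_present
    cues = ["visual"]
    if poor_visibility and audio_allowed:
        cues.append("audio")
    return cues
```

In this sketch, audio is added only when lighting makes visual cues harder to see and neither noise nor the presence of other people argues against it.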

In some examples, the XR engine 120 can determine which of multiple devices the user intends to interact with. In some cases, the XR engine 120 can cause the rendering engine 126 to present a user interface that allows the user to interact with the particular device the user wants to interact with. In some cases, the XR engine 120 can output guidance (e.g., as visual content or audio content) on how to interact with and/or control the particular device. For example, in a kitchen with a smart home assistant playing music, a connected refrigerator, and a connected stove, the user may be interested in changing the temperature of the stove. Based on the context information received from the context management engine 123, the XR engine 120 can determine that the user intends to interact with the connected stove rather than with the smart home assistant or the connected refrigerator. For example, based on the context information, the XR engine 120 can determine the user's eye gaze, pose, and/or movement, and from these determine that the user intends to interact with the connected stove. In some examples, the XR engine 120 can determine multiple possible gestures that the user can perform to interact with the connected stove. For example, the XR engine 120 can determine that the user can use a knob-turning gesture to change the temperature of the stove. The XR engine 120 can provide an output (e.g., via a visual cue, an audio cue, etc.) informing the user that the knob-turning gesture can be used to change the temperature of the stove. In some cases, the XR engine 120 can detect the knob-turning gesture from the user and transmit the associated input to the connected stove. In some cases, the connected stove can detect the knob-turning gesture directly.
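As a rough illustration of selecting the intended device from eye gaze, the Python sketch below picks the device whose direction is closest to the gaze direction within an angular tolerance. This is a swapped-in, simplified heuristic and not the method of the XR engine 120; the 2D coordinates, the 15-degree tolerance, and the function names are assumptions, and a device located exactly at the user's position is not handled.

```python
import math


def angle_between(a, b) -> float:
    """Angle in radians between two non-zero 2D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def pick_target(user_pos, gaze_dir, devices, max_angle_rad=math.radians(15)):
    """Return the name of the device best aligned with the gaze, or None."""
    best_name, best_angle = None, max_angle_rad
    for name, pos in devices.items():
        to_device = tuple(p - u for p, u in zip(pos, user_pos))
        angle = angle_between(gaze_dir, to_device)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name


# Hypothetical kitchen layout in metres, with the user at the origin.
devices = {
    "stove": (2.0, 0.0),
    "refrigerator": (0.0, 2.0),
    "assistant": (-2.0, 0.0),
}
target = pick_target((0.0, 0.0), (1.0, 0.1), devices)
```

A fuller implementation would presumably combine gaze with pose and movement over time, as the text describes; the sketch uses a single gaze sample only.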

The image processing engine 124 can perform one or more image processing operations related to the virtual user interface content being presented. For example, the image processing engine 124 can perform image processing operations based on data from the image sensor 102. In some cases, the image processing engine 124 can perform image processing operations such as filtering, demosaicing, scaling, color correction, color conversion, segmentation, noise reduction filtering, spatial filtering, artifact correction, and the like. The rendering engine 126 can obtain image data generated and/or processed by the computing components 110, the image sensor 102, the XR engine 120, the input options engine 122, the context management engine 123, and/or the image processing engine 124, and can render video and/or image frames for presentation on a display device.

Although the XR system 100 is shown as including certain components, one of ordinary skill will appreciate that the XR system 100 can include more or fewer components than those shown in FIG. 1. For example, in some cases, the XR system 100 can also include one or more memory devices (e.g., RAM, ROM, cache, and/or the like), one or more network interfaces (e.g., wired and/or wireless communication interfaces and the like), one or more display devices, and/or other hardware or processing devices not shown in FIG. 1. An illustrative example of a computing system and hardware components that can be implemented with the XR system 100 is described below with reference to FIG. 9.

FIG. 3 is a diagram illustrating an example of an extended reality system 300 worn by a user 301. In some cases, the extended reality system 300 is similar to the XR system 100 of FIG. 1 and can perform similar operations. The extended reality system 300 can include any suitable type of XR device or system, such as AR or MR glasses, an AR, VR, or MR HMD, or another XR device. For purposes of illustration, some of the examples described below are described using AR. However, the aspects described below can be applied to other types of XR, such as VR and MR. The extended reality system 300 shown in FIG. 3 can include an optical see-through AR device that allows the user 301 to view the real world while wearing the extended reality system 300.

For example, the user 301 can view an object 303 in the real-world environment on a plane 304 at a distance from the user 301. As shown in FIG. 3, the extended reality system 300 has an image sensor 302 and a display 309. As noted above, the display 309 can include glass, a screen, a lens, and/or another display mechanism that allows the user 301 to see the real-world environment and also allows AR content to be displayed on it. AR content (e.g., images, video, graphics, virtual or AR objects, or other AR content) can be projected or otherwise displayed on the display 309. In one example, the AR content can include an augmented version of the object 303. In another example, the AR content can include additional AR content related to the object 303 and/or related to one or more other objects in the real-world environment. Although one image sensor 302 and one display 309 are illustrated in FIG. 3, in some implementations the extended reality system 300 can include multiple cameras and/or multiple displays (e.g., a display for the right eye and a display for the left eye).

As described above with reference to FIG. 1, the XR engine 120 can use context data and input data indicating one or more input options of a device to cause the rendering engine 126 to present a user interface and/or input options for interacting with the device (e.g., controlling it, accessing content, accessing state information, accessing output, etc.). In one illustrative example, the XR system 100 can help a user interact with a console (such as an elevator console, a console of a vehicle, etc.). FIGS. 4A, 4B, and 4C are diagrams illustrating an example of a user using an XR system 400 to interact with a console 410 of an elevator. For example, when a user wearing the XR system 400 enters the elevator, the XR system 400 can determine the user's eye gaze (e.g., via gaze scanning, based on an eye gaze camera, etc.) and/or one or more gestures, such as a gesture performed using the user's hand 405. Based on the eye gaze and/or the gesture(s), the XR system 400 can determine that the user cannot find, or is having difficulty finding, the floor number to press on the elevator console 410. For example, the XR system 400 can determine that the user's eye gaze is moving back and forth (as if the user were searching for the correct number).

Based on context information obtained by the XR system 400, the XR system 400 can determine that the user is searching for the control button associated with a particular floor. For example, the context information can include check-in information at a hotel (e.g., obtained by the XR system 400 from a server associated with the hotel), a voice command provided by the user to the XR system 400, user preferences/input (e.g., the user can enter a floor number into the XR system 100), recognized audio detected from the user (e.g., the XR system 400 recognizes the user saying the words "floor 16", such as by using always-on audio), a detected room number (e.g., detected from one or more images captured by the image sensor 102), etc.

The XR system 400 can present a cue on the elevator console associated with the control button for the particular floor, to guide the user to the correct button. For example, the XR system 400 can determine that the user is searching for the control button corresponding to floor 16 (associated with the button having the number "16" in FIGS. 4A and 4B). As shown in FIG. 4B, the XR system 400 can display virtual content 412 that highlights the control button corresponding to floor 16. The virtual content 412 appears as if overlaid on the actual control button corresponding to floor 16, helping the user easily identify the control button the user is searching for. As shown in FIG. 4C, the XR system 400 can display the virtual content 412 highlighting the control button corresponding to floor 16, and can also present text 414 ("The button for floor 16 is here") and an arrow icon providing further information to help the user identify the correct control button. Although FIGS. 4B and 4C illustrate highlighting and text as examples of visual cues, the cues can additionally or alternatively include other types of visual cues, audio cues indicating to the user the identity or location of the correct button (e.g., guiding the user's hand based on hand tracking), and/or other types of cues. This can be especially useful for users with disabilities or users who have difficulty remembering/learning the correct button.

In some cases, after multiple trips in the elevator, the XR system 400 can learn that the user can find the correct button quickly and/or without assistance. In response, the XR system 400 can determine to stop providing virtual content to help identify the control button. In other cases, the XR system 400 can continue to provide such assistance to the user (e.g., when context information about the user's characteristics indicates that the user has a disability). In some cases, the XR system 400 can similarly assist the user in interacting with other consoles, devices, industrial machinery, and the like.
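The idea of withdrawing assistance once the user no longer needs it can be sketched in Python as follows. The class name `AssistancePolicy`, the three-trip window, and the two-second limit are hypothetical choices for illustration; the description does not state how the XR system 400 learns that the user can find the button unaided.

```python
from collections import deque


class AssistancePolicy:
    """Decide whether to keep showing button-finding assistance."""

    def __init__(self, window=3, fast_seconds=2.0, always_assist=False):
        # Keep only the most recent `window` trips.
        self.recent = deque(maxlen=window)
        self.fast_seconds = fast_seconds
        # Set when user characteristics indicate assistance should continue.
        self.always_assist = always_assist

    def record_trip(self, seconds_to_find: float) -> None:
        """Record how long the user took to find the button on one trip."""
        self.recent.append(seconds_to_find)

    def should_assist(self) -> bool:
        if self.always_assist:
            return True
        window_full = len(self.recent) == self.recent.maxlen
        all_fast = all(t <= self.fast_seconds for t in self.recent)
        # Stop assisting only after a full window of consistently fast trips.
        return not (window_full and all_fast)
```

Because only recent trips are kept, a single slow trip brings the assistance back, which seems a reasonable default when the goal is to avoid leaving the user without help.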

In another illustrative example, the XR system 100 can help a user interact with a remote control that can be used to control a device. FIGS. 5A and 5B are diagrams illustrating an example of a user using an XR system 500 to interact with a remote control 510 that can control a television 511. The XR system 500 can determine, based on various factors, that the user may have difficulty using the remote control 510. For example, the XR system 500 can determine (e.g., based on one or more images of the room obtained using the image sensor 102, based on an ambient light sensor of the XR system 500, etc.) that the room in which the television 511 and the user are located has poor lighting conditions. In addition, based on context information obtained by the XR system 500 that indicates characteristics of the user, the XR system 500 can determine that the user has poor near vision. The XR system 500 can additionally or alternatively determine that the labels or buttons on the remote control 510 are difficult to read, are in a language the user does not understand, and/or otherwise make the remote control 510 difficult for the user to interact with. Based on this context information, the XR system 500 can determine that the user may have difficulty seeing the controls/labels of the remote control 510. In some cases, the XR system 500 can additionally or alternatively detect that the user is having difficulty interacting with the remote control. For example, the XR system 500 can use eye tracking to determine that the user is squinting and/or scanning the remote control for the correct button.

The XR system 500 can obtain further context information indicating that the user wants to switch to a particular channel (channel 34) that is showing an event of interest to the user. For example, the XR system 500 can determine that the user has scheduled a sports game on the user's digital calendar. In another example, the XR system 500 can determine that the user always watches a particular channel at a particular time in the evening. Using the contextual knowledge that the remote control 510 may be difficult to interact with and that the user wants to switch to the particular channel (e.g., because the user frequently watches events on the particular channel at the current time), the XR system 500 can display virtual data 512 that highlights the correct "3" and "4" buttons on the remote control 510, helping the user identify and select the buttons to switch to channel 34. In some examples, the XR system 500 can highlight the buttons in sequence, button "3" followed by button "4", so that the user knows to press button "3" before pressing button "4".
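The sequential highlighting of channel digits can be sketched in a few lines of Python. The function name `highlight_sequence` and the step/digit tuple format are assumptions introduced for illustration, not part of the described XR system 500.

```python
def highlight_sequence(channel: int) -> list:
    """Return (step, button_label) pairs in the order buttons should be pressed."""
    if channel < 0:
        raise ValueError("channel must be non-negative")
    # Each decimal digit of the channel number maps to one remote button.
    return [(step, digit) for step, digit in enumerate(str(channel), start=1)]


# For channel 34, highlight button "3" first, then button "4".
steps = highlight_sequence(34)
```

A renderer could then highlight one button per step and advance once a press of that button is detected.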

In some cases, in addition to or as an alternative to highlighting or emphasizing the numbers as described above, the XR system 500 can use audio to confirm the option determined by the XR system 500 (e.g., the event and/or the associated buttons/channel). For example, the XR system 500 can provide an audio prompt asking the user to confirm that the user wishes to switch to the particular channel showing the event. The XR system 500 can receive a confirmation from the user (e.g., based on user input provided via an input device such as the input device 108) and continue with the assistance. In one example, in response to receiving the confirmation, the XR system 500 can highlight the correct buttons (e.g., the "3" and "4" buttons) on the remote control 510 corresponding to the channel (e.g., channel 34). In another example, in response to receiving the confirmation, the XR system 500 can automatically send a command to the television 511 and/or the remote control 510 that causes the television 511 to change to the channel (e.g., channel 34).

In another illustrative example, the XR system 100 can help a user interact with a thermostat. FIGS. 6A and 6B are diagrams illustrating an example of a user using an XR system 600 to interact with a thermostat 610. For example, the thermostat 610 can be configured to interpret one or more gestures using gesture recognition and can perform one or more functions based on a detected gesture. However, the user may not know the correct gesture commands that can be used to cause the thermostat 610 to perform certain functions. Similar to what is described above, the XR system 600 can use obtained context information to determine that the user is having difficulty interacting with the thermostat 610. For example, the XR system 600 can use eye tracking to determine that the user is staring at the thermostat 610, and/or can process one or more images to determine that the user is performing gestures (e.g., using a hand 605) but the thermostat is not performing any function based on the gestures. The XR system 600 can use any other context information to determine that the user is having difficulty interacting with the thermostat 610.

In some cases, the user can issue a voice command or other input that identifies the interaction the user wishes to have with the thermostat 610. For example, the user can say "set the temperature to 68 degrees", and the XR system 600 can recognize the voice command. The XR system 600 can determine the gesture commands that the thermostat 610 is configured to interpret, such as from the thermostat 610 or from a server associated with the thermostat 610 (e.g., a Nest™ server associated with a Nest™ thermostat).

As shown in FIG. 6B, in response to recognizing the voice command, receiving another input indicating the setting desired by the user, and/or determining that the user is having difficulty interacting with the thermostat 610, the XR system 600 can present the user with a set of gesture commands (including gesture command 612, gesture command 614, and gesture command 616) that can be applied (e.g., with or without a voice command) to cause the thermostat 610 to perform the desired function. In some cases, as shown in FIG. 6B, the gesture commands can have corresponding numbers indicating the order in which the gesture commands 612, 614, and 616 should be performed to cause the thermostat 610 to perform the desired function. For example, the user can perform gesture command 612 to cause the thermostat 610 to enter a temperature adjustment mode. The user can then perform gesture command 614 to cause the thermostat 610 to increase the temperature. For example, each time the user performs the "thumbs up" gesture command 614, the thermostat 610 can increase the temperature by one degree Fahrenheit. The user can then perform gesture command 616 to cause the thermostat 610 to exit the temperature adjustment mode.
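The ordered gesture sequence for the thermostat example can be sketched in Python as below. The gesture labels `enter_adjust_mode` and `exit_adjust_mode` are hypothetical stand-ins for gesture commands 612 and 616, whose actual form is not given in the text. Only the one-degree-per-"thumbs up" increase is taken from the description, so the sketch does not handle lowering the temperature.

```python
def gesture_plan(current_f: int, target_f: int) -> list:
    """List the gestures, in order, needed to raise the setpoint to target_f."""
    delta = target_f - current_f
    if delta < 0:
        # The description only covers an increase gesture.
        raise ValueError("only temperature increases are covered in this sketch")
    if delta == 0:
        return []  # Already at the desired temperature; nothing to perform.
    # One "thumbs up" per degree Fahrenheit, wrapped by mode entry and exit.
    return ["enter_adjust_mode"] + ["thumbs_up"] * delta + ["exit_adjust_mode"]


# Raising the setpoint from 65 to 68 degrees Fahrenheit.
plan = gesture_plan(65, 68)
```

The position of each gesture in the returned list corresponds to the number that would be displayed next to it to indicate the order of execution.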

In another illustrative example, the XR system 100 can assist a user in interacting with a digital picture frame. FIGS. 7A and 7B are diagrams illustrating an example of a user 701 using an XR system 700 that can determine whether to provide user interface input options for interacting with a digital picture frame 710. For example, the digital picture frame 710 can be configured to display metadata related to the content it displays (e.g., a description of the displayed art, background, people, etc.). For example, as shown in FIG. 7B, the digital picture frame 710 can display, next to a photo of two people dancing, metadata 712 that reads "This is Nancy and Bob on their wedding day".

The user 701 may not be interested in the metadata 712 if, for example, the user 701 is not gazing at the digital picture frame 710 or is walking past the digital picture frame 710 in a hurry. The XR system 700 can obtain context information, such as a detected eye gaze indicating that the user is not looking at the digital picture frame 710, sensed motion indicating that the user is walking, the calendar of the user 701 (e.g., indicating that the user has an upcoming appointment at a different location), preferences of the user 701, user history data, user communications, and/or other context information. Based on the context information obtained by the XR system 700, the XR system 700 can determine that the user is walking past the digital picture frame 710, that the user is not looking at the digital picture frame 710, and/or that the user is otherwise unlikely to be interested in the metadata 712. The XR system 700 can then send a command to the digital picture frame 710 to prevent it from presenting the metadata 712 (which could otherwise distract the user).

FIG. 8A is a diagram illustrating an example of a user 802 using an XR system 800 to interact with one or more devices when multiple devices are present in a scene. In this example, the scene includes a digital picture frame 810, a remote control 812, and a digital thermostat 814. The user 802 can use the XR system 800 to interact with any of the devices in the scene, including the digital picture frame 810, the remote control 812, and/or the digital thermostat 814. The XR system 800 can receive input options and/or related data from the digital picture frame 810, the remote control 812, and the digital thermostat 814, and can present user guidance data associated with the input options corresponding to the digital picture frame 810, the remote control 812, and/or the digital thermostat 814.

In some cases, when a scene includes multiple remote devices as shown in FIG. 8A, it can be difficult for the XR system 800 to determine which of the multiple devices in the scene the user 802 wishes to interact with and/or for which device the XR system 800 should present guidance data. In some examples, the XR system 800 can receive input options from the digital picture frame 810, the remote control 812, and the digital thermostat 814. The input options can include information about the types of input available/accepted at the digital picture frame 810, the remote control 812, and the digital thermostat 814. However, when receiving such data from multiple devices in the scene, the XR system 800 can become overloaded with data. Data overload can make it difficult for the XR system 800 to present relevant information to the user 802, to present information to the user without significant clutter, to manage the data and/or interactions with the devices, etc.

For example, referring to FIG. 8B, the XR system 800 can display data 820 related to the digital picture frame 810, data 822 related to the remote control 812, and data 824 related to the digital thermostat 814. The data 820, 822, and 824 can become overwhelming for the XR system 800 and/or the user 802. For example, as shown in FIG. 8B, when presented by the XR system 800, the data 820, 822, and 824 can become cluttered, and the information rendered by the XR system 800 can become overloaded and difficult to parse, manage, understand, etc. In some examples, the XR system 800 can filter the data from the digital picture frame 810, the remote control 812, and the digital thermostat 814 to simplify and/or consolidate the data presented by the XR system 800 for those devices. In some cases, the XR system 800 can limit the presented data to data corresponding to a particular relevant device.

To illustrate, referring to FIG. 8C, the XR system 800 can predict that the user wishes to interact with, or will interact with, the remote control 812. The XR system 800 can use this information to filter out the data 820 associated with the digital picture frame 810 and the data 824 associated with the digital thermostat 814, simplifying the data presented by the XR system 800. The XR system 800 can present the data 822 associated with the remote control 812, which is predicted to be relevant to the user 802 and/or the current context. In some examples, the data 822 can include input options the user can use to interact with the remote control 812 and/or an indication of input options that informs the user how to interact with the remote control 812. In some cases, the data 822 can include user guidance data to facilitate user interaction with the remote control 812. In some examples, the presentation shown in FIG. 8C can declutter and/or simplify the data presented by the XR system 800.
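The filtering step can be pictured with the short Python sketch below, which keeps guidance data only for the devices predicted to be most relevant. The relevance scores, the `filter_guidance` function, and the example input-option strings are all hypothetical; the description does not say how relevance is computed or represented.

```python
def filter_guidance(device_data: dict, relevance: dict, top_k: int = 1) -> dict:
    """Keep data only for the top_k devices with the highest relevance score."""
    # Devices without a score are treated as least relevant.
    ranked = sorted(device_data, key=lambda d: relevance.get(d, 0.0), reverse=True)
    return {name: device_data[name] for name in ranked[:top_k]}


# Hypothetical input options reported by each device in the scene.
device_data = {
    "digital_picture_frame": ["swipe to change photo"],
    "remote_control": ["press channel digits", "press volume up/down"],
    "digital_thermostat": ["thumbs up to raise temperature"],
}
# Hypothetical relevance predicted from context (gaze, schedule, history).
relevance = {
    "digital_picture_frame": 0.2,
    "remote_control": 0.9,
    "digital_thermostat": 0.1,
}
shown = filter_guidance(device_data, relevance)
```

Raising `top_k` would trade a less cluttered display for broader coverage when the prediction is uncertain.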

In some cases, the XR system 800 can use context information to predict which of the digital picture frame 810, the remote control 812, and the digital thermostat 814 the user 802 will interact with or is relevant to the user 802, and/or which data from the digital picture frame 810, the remote control 812, and/or the digital thermostat 814 to present to the user. In some cases, the XR system 800 can present simplified information related to the digital picture frame 810, the remote control 812, and the digital thermostat 814, which the XR system 800 and the user 802 can use to filter out less relevant information and reduce the amount of information presented to the user 802 to the information most relevant to the user 802.

For example, referring to FIG. 8D, the XR system 800 can present a menu 840 of input options. The menu 840 can indicate the various devices available for user interaction and give the user 802 the ability to select a particular device of interest. If the user selects a particular device, the XR system 800 can present data corresponding to that device, such as its input options. For example, if the user 802 selects the digital picture frame 810 from the menu 840, the XR system 800 can present data related to the digital picture frame 810 and exclude other data related to the remote control 812 and/or the thermostat 814.

In another illustrative example, the XR system 100 can assist a user in using the controls of a car. For example, a user driving a vehicle may pull over to the side of the road. The XR system 100 can detect that the user is pulling over (e.g., based on motion information, voice data from the user, data from the vehicle, etc.). Based on determining that the user has pulled the vehicle over to the side of the road, the XR system 100 can determine that the user should activate the hazard lights on the vehicle. The user may scan the vehicle's dashboard looking for the hazard light button. The XR system 100 can use eye tracking, image analysis, and/or other techniques to determine that the user cannot find, or is having difficulty finding, the hazard light button. The XR system 100 can use this context information to determine that the user needs help finding the hazard light button. The XR system 100 can locate/identify the hazard light button and overlay AR content around it to guide/assist the user in locating the button.

Although certain illustrative examples are described above in which the XR system 100 (and other XR systems) uses input data and context information to determine the input options to present to a user, the XR system 100 can perform any other function based on the input data and context information to assist a user of the XR system 100 in interacting with one or more other devices.

FIG. 9 is a flowchart illustrating an example of a process 900 for presenting information associated with at least one input option using one or more of the techniques described herein. At block 902, the process 900 can include receiving data identifying one or more input options associated with a device in a scene. For example, an XR system (e.g., the XR system 100) can receive data identifying what input options are available at one or more remote devices in the scene.

At block 904, the process 900 can include determining (including using at least one memory) information related to at least one of the scene, the device, and a user associated with an electronic device (e.g., the XR system 100). In some examples, the information can include context information. The context information can provide, for example, information about the scene, the user, the device, and/or the electronic device.

At block 906, the process 900 can include outputting, based on the one or more input options and the information, user guidance data corresponding to an input option for which relevant context information has been determined. In some examples, the user guidance data can include at least one of a user input element associated with the input option, a virtual overlay on a physical object associated with the input option, and a cue indicating how to provide an input associated with the input option.
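Blocks 902, 904, and 906 can be sketched as three small functions. This is a hedged illustration rather than the disclosed implementation: every function and type name, the thermostat temperatures, the option names, and the `RELEVANCE_RULES` table are hypothetical stand-ins for whatever context determination an actual XR system would perform.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Guidance:
    device: str
    input_option: str
    cue: str  # textual cue indicating how to provide the input


def receive_input_options(advertised: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Block 902: receive data identifying input options of devices in the scene."""
    return {device: list(options) for device, options in advertised.items()}


def determine_information(gaze_target: str, room_temp_c: float,
                          preferred_temp_c: float) -> dict:
    """Block 904: determine information about the scene, the device, and/or the user."""
    return {
        "gaze_target": gaze_target,
        "room_temp_c": room_temp_c,
        "preferred_temp_c": preferred_temp_c,
    }


# Hypothetical rules: an input option is treated as relevant when its rule holds.
RELEVANCE_RULES: Dict[str, Callable[[dict], bool]] = {
    "raise temperature": lambda info: info["room_temp_c"] < info["preferred_temp_c"],
    "lower temperature": lambda info: info["room_temp_c"] > info["preferred_temp_c"],
}


def output_guidance(input_options: Dict[str, List[str]], info: dict) -> List[Guidance]:
    """Block 906: output guidance only for options with relevant context."""
    device = info["gaze_target"]
    guidance = []
    for option in input_options.get(device, []):
        rule = RELEVANCE_RULES.get(option)
        if rule is not None and rule(info):
            guidance.append(Guidance(device, option, f"Swipe to {option} on the {device}"))
    return guidance


options = receive_input_options({
    "digital thermostat": ["raise temperature", "lower temperature"],
    "digital picture frame": ["next photo"],
})
info = determine_information("digital thermostat", room_temp_c=17.0, preferred_temp_c=21.0)
guidance = output_guidance(options, info)
print(guidance)
```

In this sketch the cue is plain text; a user input element or a virtual overlay on a physical object would be alternative forms of the same guidance data.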

In some aspects, the process 900 can include predicting a user interaction with the device based on the information, and presenting, based on the one or more input options and the predicted user interaction, the user guidance data corresponding to the input option.

In some examples, the device can include a connected device having network communication capabilities, and the process 900 can include determining, based on the information and the one or more input options, a gesture representing the predicted user interaction, and presenting the user guidance data. In some examples, the predicted user interaction can include a predicted user input to the device. In some cases, the user guidance data can include an indication of the gesture which, when detected, invokes the actual user input at the device.

In some examples, the information includes at least one of an eye gaze of the user and a pose of the user, and the process 900 can include: predicting the user interaction with the device based on the at least one of the eye gaze of the user and the pose of the user; after presenting the user guidance data, detecting an actual user input associated with the input option, the actual user input representing the predicted user interaction; and transmitting, to the device, a command corresponding to the actual user input associated with the input option.
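One way to picture the gaze-based prediction and the subsequent command transmission is the sketch below. It assumes, purely for illustration, that the gaze is a ray from the eye position, that device positions are known 3D points, that the predicted device is the one closest to the gaze ray within an assumed 10-degree cone, and that the gesture-to-command table and the `send` callback are hypothetical placeholders for the actual input detection and transport.

```python
import math
from typing import Callable, Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def angle_between_deg(u: Vec3, v: Vec3) -> float:
    """Angle between two 3D vectors in degrees (180 if either vector is zero)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 180.0
    cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def predict_device(eye: Vec3, gaze_dir: Vec3, devices: Dict[str, Vec3],
                   max_angle_deg: float = 10.0) -> Optional[str]:
    """Predict the device the user will interact with from the eye gaze."""
    best, best_angle = None, max_angle_deg
    for name, position in devices.items():
        to_device = tuple(p - e for p, e in zip(position, eye))
        angle = angle_between_deg(gaze_dir, to_device)
        if angle <= best_angle:
            best, best_angle = name, angle
    return best


def dispatch(device: Optional[str], gesture: str,
             gesture_commands: Dict[Tuple[str, str], str],
             send: Callable[[str, str], None]) -> bool:
    """Transmit the command matching the detected input, if one exists."""
    if device is None:
        return False
    command = gesture_commands.get((device, gesture))
    if command is None:
        return False
    send(device, command)
    return True


devices = {
    "digital thermostat": (0.1, 1.6, 3.0),
    "digital picture frame": (2.0, 1.6, 3.0),
}
predicted = predict_device((0.0, 1.6, 0.0), (0.0, 0.0, 1.0), devices)

sent = []  # stands in for a network transmission to the device
commands = {("digital thermostat", "swipe up"): "raise temperature"}
handled = dispatch(predicted, "swipe up", commands, lambda d, c: sent.append((d, c)))
print(predicted, handled, sent)
```

A pose-based predictor could replace `predict_device` without changing `dispatch`, since the two steps only share the predicted device name.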

In some aspects, the process 900 can include receiving, from the device, the data identifying the one or more input options associated with the device. In some aspects, the process 900 can include receiving, from a server, the data identifying the one or more input options associated with the device.

In some cases, the device has no external user interface for receiving one or more user inputs. In some aspects, the process 900 can include refraining, based on the information, from presenting additional user guidance data associated with the device.

In some aspects, the process 900 can include, after presenting the user guidance data, obtaining a user input associated with the input option, and transmitting, to the device, an instruction corresponding to the user input. In some cases, the instruction can be configured to control one or more operations of the device.

In some examples, the processes described herein (e.g., the process 900 and/or other processes described herein) can be performed by a computing device or apparatus. In one example, the process 900 can be performed by the XR system 100 of FIG. 1. In another example, the process 900 can be performed by a computing device having the computing system 1000 shown in FIG. 10. For example, a computing device having the computing architecture shown in FIG. 10 can include the components of the XR system 100 of FIG. 1 and can implement the operations of FIG. 9.

The computing device can include any suitable device, such as a mobile device (e.g., a mobile phone), a desktop computing device, a tablet computing device, a wearable device (e.g., a VR headset, an AR headset, AR glasses, a network-connected watch or smartwatch, or other wearable device), a server computer, an autonomous vehicle or a computing device of an autonomous vehicle, a robotic device, a television, and/or any other computing device with the resource capabilities to perform the processes described herein, including the process 900. In some cases, the computing device or apparatus can include various components, such as one or more input devices, one or more output devices, one or more processors, one or more microprocessors, one or more microcomputers, one or more cameras, one or more sensors, and/or other component(s) configured to carry out the steps of the processes described herein. In some examples, the computing device can include a display, a network interface configured to transmit and/or receive data, any combination thereof, and/or other component(s). The network interface can be configured to transmit and/or receive Internet Protocol (IP) based data or other types of data.

The components of the computing device can be implemented in circuitry. For example, the components can include and/or can be implemented using electronic circuits or other electronic hardware, which can include one or more programmable electronic circuits (e.g., microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), central processing units (CPUs), and/or other suitable electronic circuits), and/or can include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein.

The process 900 is illustrated as a logical flow diagram, the operations of which represent a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.

Additionally, the process 900 and/or other processes described herein can be performed under the control of one or more computer systems configured with executable instructions, and can be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code can be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium can be non-transitory.

FIG. 10 is a diagram illustrating an example of a system for implementing certain aspects of the present technology. In particular, FIG. 10 illustrates an example of a computing system 1000, which can be, for example, any computing device making up an internal computing system, a remote computing system, a camera, or any component thereof, in which the components of the system communicate with each other using a connection 1005. The connection 1005 can be a physical connection using a bus, such as in a chipset architecture, or a direct connection into the processor 1010. The connection 1005 can also be a virtual connection, a networked connection, or a logical connection.

In some embodiments, the computing system 1000 is a distributed system in which the functions described in this disclosure can be distributed within a data center, multiple data centers, a peer network, etc. In some embodiments, one or more of the described system components represents many such components, each performing some or all of the function for which the component is described. In some embodiments, the components can be physical or virtual devices.

The example system 1000 includes at least one processing unit (CPU or processor) 1010 and the connection 1005, which couples various system components, including system memory 1015 such as read-only memory (ROM) 1020 and random access memory (RAM) 1025, to the processor 1010. The computing system 1000 can include a cache 1012 of high-speed memory connected directly with, in close proximity to, or integrated as part of the processor 1010.

The processor 1010 can include any general-purpose processor and a hardware service or software service, such as the services 1032, 1034, and 1036 stored in the storage device 1030, configured to control the processor 1010, as well as a special-purpose processor in which software instructions are incorporated into the actual processor design. The processor 1010 can essentially be a completely self-contained computing system containing multiple cores or processors, a bus, a memory controller, a cache, etc. A multi-core processor can be symmetric or asymmetric.

To enable user interaction, the computing system 1000 includes an input device 1045, which can represent any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, speech, etc. The computing system 1000 can also include an output device 1035, which can be one or more of a number of output mechanisms. In some instances, multimodal systems can enable a user to provide multiple types of input/output to communicate with the computing system 1000. The computing system 1000 can include a communication interface 1040, which can generally govern and manage the user input and system output.

The communication interface can perform or facilitate receipt and/or transmission of wired or wireless communications using wired and/or wireless transceivers, including those making use of an audio jack/plug, a microphone jack/plug, a universal serial bus (USB) port/plug, an Apple® Lightning® port/plug, an Ethernet port/plug, a fiber optic port/plug, a proprietary wired port/plug, BLUETOOTH® wireless signal transfer, BLUETOOTH® low energy (BLE) wireless signal transfer, IBEACON® wireless signal transfer, radio-frequency identification (RFID) wireless signal transfer, near-field communication (NFC) wireless signal transfer, dedicated short-range communication (DSRC) wireless signal transfer, 802.11 Wi-Fi wireless signal transfer, wireless local area network (WLAN) signal transfer, visible light communication (VLC), Worldwide Interoperability for Microwave Access (WiMAX), infrared (IR) communication wireless signal transfer, public switched telephone network (PSTN) signal transfer, integrated services digital network (ISDN) signal transfer, 3G/4G/5G/LTE cellular data network wireless signal transfer, ad-hoc network signal transfer, radio wave signal transfer, microwave signal transfer, infrared signal transfer, visible light signal transfer, ultraviolet light signal transfer, wireless signal transfer along the electromagnetic spectrum, or some combination thereof.

The communication interface 1040 can also include one or more global navigation satellite system (GNSS) receivers or transceivers that are used to determine a location of the computing system 1000 based on receipt of one or more signals from one or more satellites associated with one or more GNSS systems. GNSS systems include, but are not limited to, the US-based Global Positioning System (GPS), the Russia-based Global Navigation Satellite System (GLONASS), the China-based BeiDou Navigation Satellite System (BDS), and the Europe-based Galileo GNSS. There is no restriction on operating on any particular hardware arrangement, and therefore the basic features described here can easily be substituted for improved hardware or firmware arrangements as they are developed.

The storage device 1030 can be a non-volatile and/or non-transitory and/or computer-readable memory device, and can be a hard disk or another type of computer-readable medium that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, solid-state memory devices, digital versatile disks, cartridges, a floppy disk, a flexible disk, a hard disk, magnetic tape, a magnetic strip/stripe, any other magnetic storage medium, flash memory, memristor memory, any other solid-state memory, a compact disc read-only memory (CD-ROM) optical disc, a rewritable compact disc (CD) optical disc, a digital video disk (DVD) optical disc, a Blu-ray disc (BDD) optical disc, a holographic optical disk, another optical medium, a secure digital (SD) card, a micro secure digital (microSD) card, a Memory Stick® card, a smartcard chip, an EMV (Europay, Mastercard, and Visa) chip, a subscriber identity module (SIM) card, a mini/micro/nano/pico SIM card, another integrated circuit (IC) chip/card, random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash EPROM (FLASHEPROM), cache memory (L1/L2/L3/L4/L5/L#), resistive random-access memory (RRAM/ReRAM), phase change memory (PCM), spin transfer torque RAM (STT-RAM), another memory chip or cartridge, and/or a combination thereof.

The storage device 1030 can include software services, servers, services, etc., that, when the code that defines such software is executed by the processor 1010, cause the system to perform a function. In some embodiments, a hardware service that performs a particular function can include a software component stored in a computer-readable medium in connection with the necessary hardware components, such as the processor 1010, the connection 1005, the output device 1035, etc., to carry out the function. The term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. A computer-readable medium can include a non-transitory medium in which data can be stored and that does not include carrier waves and/or transitory electronic signals propagating wirelessly or over wired connections. Examples of a non-transitory medium can include, but are not limited to, a magnetic disk or tape, optical storage media such as a compact disc (CD) or digital versatile disc (DVD), flash memory, memory, or memory devices. A computer-readable medium can have stored thereon code and/or machine-executable instructions that can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment can be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. can be passed, forwarded, or transmitted via any suitable means, including memory sharing, message passing, token passing, network transmission, or the like.

In some embodiments, the computer-readable storage devices, media, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

Specific details are provided in the description above to give a thorough understanding of the embodiments and examples provided herein. However, one of ordinary skill in the art will understand that the embodiments can be practiced without these specific details. For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks, including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software. Components other than those shown in the figures and/or described herein can also be used. For example, circuits, systems, networks, processes, and other components may be shown as components in block diagram form in order not to obscure the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Individual embodiments may be described above as a process or method that is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations can be rearranged. A process is terminated when its operations are completed, but it could have additional steps not included in a figure. A process can correspond to a method, a function, a procedure, a subroutine, a subprogram, etc. When a process corresponds to a function, its termination can correspond to a return of the function to the calling function or the main function.

Processes and methods according to the above-described examples can be implemented using computer-executable instructions that are stored in or otherwise available from computer-readable media. Such instructions can include, for example, instructions and data that cause or otherwise configure a general-purpose computer, a special-purpose computer, or a processing device to perform a certain function or group of functions. Portions of the computer resources used can be accessible over a network. The computer-executable instructions can be, for example, binaries or intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that can be used to store instructions, information used, and/or information created during methods according to the described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.

Devices implementing processes and methods according to these disclosures can include hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and can take any of a variety of form factors. When implemented in software, firmware, middleware, or microcode, the program code or code segments to perform the necessary tasks (e.g., a computer program product) can be stored in a computer-readable or machine-readable medium. A processor or processors can perform the necessary tasks. Typical examples of form factors include laptops, smartphones, mobile phones, tablet devices or other small form factor personal computers, personal digital assistants, rackmount devices, standalone devices, and so on. The functionality described herein can also be embodied in peripherals or add-in cards. By way of further example, such functionality can also be implemented on a circuit board among different chips, or in different processes executing in a single device.

The instructions, the media for conveying such instructions, the computing resources for executing them, and other structures for supporting such computing resources are example means for providing the functions described in this disclosure.

In the foregoing description, aspects of the application are described with reference to specific embodiments thereof, but those skilled in the art will recognize that the application is not limited thereto. Thus, while illustrative embodiments of the application have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art. The various features and aspects of the above-described application can be used individually or jointly. Further, the embodiments can be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive. For purposes of illustration, the methods were described in a particular order. It should be appreciated that, in alternative embodiments, the methods may be performed in a different order than that described.

One of ordinary skill in the art will appreciate that the less than ("<") and greater than (">") symbols or terminology used herein can be replaced with less than or equal to ("≤") and greater than or equal to ("≥") symbols, respectively, without departing from the scope of this description.

Where components are described as being "configured to" perform certain operations, such configuration can be accomplished, for example, by designing electronic circuits or other hardware to perform the operation, by programming programmable electronic circuits (e.g., microprocessors or other suitable electronic circuits) to perform the operation, or any combination thereof.

The phrase "coupled to" refers to any component that is physically connected to another component either directly or indirectly, and/or any component that is in communication with another component (e.g., connected to the other component over a wired or wireless connection and/or other suitable communication interface) either directly or indirectly.

Claim language or other language reciting "at least one of" a set and/or "one or more" of a set indicates that one member of the set or multiple members of the set (in any combination) satisfy the claim. For example, claim language reciting "at least one of A and B" or "at least one of A or B" means A, B, or A and B. In another example, claim language reciting "at least one of A, B, and C" or "at least one of A, B, or C" means A, B, C, or A and B, or A and C, or B and C, or A and B and C. The language "at least one of" a set and/or "one or more" of a set does not limit the set to the items listed in the set. For example, claim language reciting "at least one of A and B" or "at least one of A or B" can mean A, B, or A and B, and can additionally include items not listed in the set of A and B.
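The combinations covered by "at least one of A, B, and C" are the non-empty subsets of the listed members, which the short sketch below enumerates. The function name `at_least_one_of` is an assumption made for illustration, and the sketch covers only the listed members; as noted above, the claim language can additionally include items not listed in the set.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def at_least_one_of(members: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return every non-empty combination of the listed members."""
    return [
        combo
        for size in range(1, len(members) + 1)
        for combo in combinations(members, size)
    ]


for combo in at_least_one_of(["A", "B", "C"]):
    print(" and ".join(combo))
```

For three listed members this yields seven combinations, in the same order as the enumeration in the text.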

The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the examples disclosed herein can be implemented as electronic hardware, computer software, firmware, or combinations thereof. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans can implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices, such as general-purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses, including application in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods, algorithms, and/or operations described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM) (for example, synchronous dynamic random access memory (SDRAM)), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. Additionally or alternatively, the techniques may be realized at least in part by a computer-readable communication medium, such as a propagated signal or wave, that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer.

The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in this disclosure. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures, any combination of the foregoing structures, or any other structure or apparatus suitable for implementing the techniques described herein.

Illustrative aspects of the present disclosure include:

Aspect 1: An apparatus for outputting information associated with at least one input option, comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: receive data identifying one or more input options associated with a device in a scene; determine, including using the at least one memory, information related to at least one of the scene, the device, and a user associated with the apparatus; and based on the one or more input options and the information, output user guidance data corresponding to an input option for which relevant contextual information has been determined.

Aspect 2: The apparatus of Aspect 1, wherein, to receive the data identifying the one or more input options, the at least one processor is configured to: perform object recognition in the scene to identify one or more input options for operating an object.

Aspect 3: The apparatus of Aspect 2, wherein the object is at least one of within a threshold proximity to the user and within a field of view (FOV) of the user.

Aspect 4: The apparatus of any of Aspects 1 to 3, wherein the at least one processor is configured to: detect one or more additional devices in the scene; determine, based on the information, a confidence value predicting whether the user will interact with the device or the one or more additional devices; and in response to the determined confidence value exceeding a threshold, predict a user interaction with the device.

Aspect 5: The apparatus of Aspect 4, wherein the at least one processor is configured to: in response to predicting the user interaction with the device, filter content associated with the one or more additional devices; and output content related to the device.

Aspect 6: The apparatus of any of Aspects 4 to 5, wherein the at least one processor is configured to identify which of the device and the one or more additional devices the user is predicted to interact with.

Aspect 7: The apparatus of any of Aspects 4 to 6, wherein the at least one processor is configured to simplify presentation of user interface content to avoid at least one of content overload, clutter, and content confusion.

Aspect 8: The apparatus of any of Aspects 1 to 7, wherein the device comprises a connected device having network communication capabilities, and the at least one processor is configured to: determine, based on the information and the one or more input options, a gesture representing a predicted user interaction, the predicted user interaction comprising a predicted user input to the device; and present the user guidance data, wherein the user guidance data includes an indication of the gesture that, when detected, invokes an actual user input at the device.

Aspect 9: The apparatus of any of Aspects 1 to 8, wherein the user guidance data includes at least one of a user input element associated with the input option, a virtual overlay on a physical object associated with the input option, and a prompt indicating how to provide an input associated with the input option.

Aspect 10: The apparatus of Aspect 7, wherein the at least one processor is configured to: predict, based on the information, a user interaction with the device; and present, based on the one or more input options and the predicted user interaction, the user guidance data corresponding to the input option.

Aspect 11: The apparatus of any of Aspects 1 to 10, wherein, to present the user guidance data, the at least one processor is configured to: render, at a display associated with the apparatus, a virtual overlay configured to appear to be located on a surface of the device, the virtual overlay including a user interface element associated with the input option, wherein the user interface element includes at least one of a virtual user input object associated with the input option and a visual indication of a physical control object on the device configured to receive an input corresponding to the input option.

Aspect 12: The apparatus of any of Aspects 1 to 11, wherein the information includes at least one of an eye gaze of the user and a pose of the user, and the at least one processor is configured to: predict a user interaction with the device based on at least one of the eye gaze of the user and the pose of the user; after presenting the user guidance data, detect an actual user input associated with the input option, the actual user input representing the predicted user interaction; and transmit, to the device, a command corresponding to the actual user input associated with the input option.

Aspect 13: The apparatus of any of Aspects 1 to 12, wherein, to output the user guidance data corresponding to the input option, the at least one processor is configured to: display the user guidance data.

Aspect 14: The apparatus of any of Aspects 1 to 13, wherein, to output the user guidance data corresponding to the input option, the at least one processor is configured to: output audio data representing the user guidance data.

Aspect 15: The apparatus of any of Aspects 1 to 14, wherein, to output the user guidance data corresponding to the input option, the at least one processor is configured to: display the user guidance data; and output audio data associated with the displayed user guidance data.

Aspect 16: The apparatus of any of Aspects 1 to 15, wherein the at least one processor is configured to: receive, from the device, the data identifying the one or more input options associated with the device.

Aspect 17: The apparatus of any of Aspects 1 to 16, wherein the at least one processor is configured to: receive, from a server, the data identifying the one or more input options associated with the device.

Aspect 18: The apparatus of any of Aspects 1 to 17, wherein the device does not have an external user interface for receiving one or more user inputs.

Aspect 19: The apparatus of any of Aspects 1 to 18, wherein the at least one processor is configured to: refrain, based on the information, from presenting additional user guidance data associated with the device.

Aspect 20: The apparatus of any of Aspects 1 to 19, wherein the apparatus is an extended reality device.

Aspect 21: The apparatus of any of Aspects 1 to 20, further comprising a display.

Aspect 22: The apparatus of Aspect 21, wherein the display is configured to display at least the user guidance data.

Aspect 23: The apparatus of any of Aspects 1 to 22, wherein the at least one processor is configured to: after presenting the user guidance data, obtain a user input associated with the input option; and transmit, to the device, an instruction corresponding to the user input, the instruction being configured to control one or more operations of the device.

Aspect 24: The apparatus of any of Aspects 1 to 23, wherein the information related to at least one of the scene, the device, and the user includes at least one of a predicted user interaction with the device, one or more actions of the user in the scene, a characteristic associated with the user, historical information associated with the user and the device, a user interface capability of the device, information associated with the device, and information associated with the scene.

Aspect 25: The apparatus of any of Aspects 1 to 24, wherein the at least one processor is configured to: detect one or more additional devices in the scene; determine, based on contextual information, a confidence value indicating a likelihood of the user interacting with the device or the one or more additional devices; and in response to the determined confidence value exceeding a threshold, predict the user interaction.

Aspect 26: The apparatus of any of Aspects 1 to 25, wherein the at least one processor is further configured to: receive, from the user, a confirmation of user interaction data; and in response to the confirmation from the user, interact with the device.

Aspect 27: The apparatus of Aspect 26, wherein the confirmation is an audio confirmation.

Aspect 28: The apparatus of any of Aspects 26 to 27, wherein the confirmation is a user input received at the apparatus.

Aspect 29: A method for outputting information associated with at least one input option, comprising: receiving data identifying one or more input options associated with a device in a scene; determining, including using at least one memory, information related to at least one of the scene, the device, and a user associated with an electronic device; and based on the one or more input options and the information, outputting user guidance data corresponding to an input option for which relevant contextual information has been determined.

Aspect 30: The method of Aspect 29, wherein receiving the data identifying the one or more input options comprises: performing object recognition in the scene to identify one or more input options for operating an object.

Aspect 31: The method of Aspect 30, wherein the object is at least one of within a threshold proximity to the user and within a field of view (FOV) of the user.

Aspect 32: The method of any of Aspects 29 to 31, further comprising: detecting one or more additional devices in the scene; determining, based on the information, a confidence value predicting whether the user will interact with the device or the one or more additional devices; and in response to the determined confidence value exceeding a threshold, predicting a user interaction with the device.

Aspect 33: The method of any of Aspects 29 to 32, further comprising: in response to predicting the user interaction with the device, filtering content associated with the one or more additional devices; and outputting content related to the device.

Aspect 34: The method of any of Aspects 29 to 33, further comprising: identifying which of the device and the one or more additional devices the user is predicted to interact with.

Aspect 35: The method of Aspect 34, further comprising: simplifying presentation of user interface content to avoid at least one of content overload, clutter, and content confusion.

Aspect 36: The method of any of Aspects 29 to 35, wherein the user guidance data includes at least one of a user input element associated with the input option, a virtual overlay on a physical object associated with the input option, and a prompt indicating how to provide an input associated with the input option.

Aspect 37: The method of any of Aspects 29 to 36, further comprising: predicting, based on the information, a user interaction with the device; and presenting, based on the one or more input options and the predicted user interaction, the user guidance data corresponding to the input option.

Aspect 38: The method of Aspect 37, wherein the device comprises a connected device having network communication capabilities, the method further comprising: determining, based on the information and the one or more input options, a gesture representing the predicted user interaction, the predicted user interaction comprising a predicted user input to the device; and presenting the user guidance data, wherein the user guidance data includes an indication of the gesture that, when detected, invokes an actual user input at the device.

Aspect 39: The method of any of Aspects 29 to 38, wherein presenting the user guidance data comprises: rendering, at a display associated with the electronic device, a virtual overlay configured to appear to be located on a surface of the device, the virtual overlay including a user interface element associated with the input option, wherein the user interface element includes at least one of a virtual user input object associated with the input option and a visual indication of a physical control object on the device configured to receive an input corresponding to the input option.

Aspect 40: The method of any of Aspects 29 to 39, wherein the information includes at least one of an eye gaze of the user and a pose of the user, the method further comprising: predicting a user interaction with the device based on at least one of the eye gaze of the user and the pose of the user; after presenting the user guidance data, detecting an actual user input associated with the input option, the actual user input representing the predicted user interaction; and transmitting, to the device, a command corresponding to the actual user input associated with the input option.

Aspect 41: The method of any of Aspects 29 to 40, wherein outputting the user guidance data corresponding to the input option comprises: displaying the user guidance data.

Aspect 42: The method of any of Aspects 29 to 41, wherein outputting the user guidance data corresponding to the input option comprises: outputting audio data representing the user guidance data.

Aspect 43: The method of any of Aspects 29 to 42, wherein outputting the user guidance data corresponding to the input option comprises: displaying the user guidance data; and outputting audio data associated with the displayed user guidance data.

Aspect 44: The method of any of Aspects 29 to 43, further comprising: receiving, from the device, the data identifying the one or more input options associated with the device.

Aspect 45: The method of any of Aspects 29 to 44, further comprising: receiving, from a server, the data identifying the one or more input options associated with the device.

Aspect 46: The method of any of Aspects 29 to 45, wherein the device does not have an external user interface for receiving one or more user inputs.

Aspect 47: The method of any of Aspects 29 to 46, further comprising: refraining, based on the information, from presenting additional user guidance data associated with the device.

Aspect 48: The method of any of Aspects 29 to 47, further comprising: after presenting the user guidance data, obtaining a user input associated with the input option; and transmitting, to the device, an instruction corresponding to the user input, the instruction being configured to control one or more operations of the device.

Aspect 49: The method of any of Aspects 29 to 48, wherein the information related to at least one of the scene, the device, and the user includes at least one of a predicted user interaction with the device, one or more actions of the user in the scene, a characteristic associated with the user, historical information associated with the user and the device, a user interface capability of the device, information associated with the device, and information associated with the scene.

Aspect 50: The method of any of Aspects 29 to 49, further comprising: detecting one or more additional devices in the scene; determining, based on contextual information, a confidence value indicating a likelihood of the user interacting with the device or the one or more additional devices; and in response to the determined confidence value exceeding a threshold, predicting the user interaction.

Aspect 51: The method of any of Aspects 29 to 50, further comprising: receiving, from the user, a confirmation of user interaction data; and in response to the confirmation from the user, interacting with the device.

Aspect 52: The method of Aspect 51, wherein the confirmation is an audio confirmation.

Aspect 53: The method of any of Aspects 51 to 52, wherein the confirmation is a user input received at the apparatus.

Aspect 54: A non-transitory computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause the one or more processors to perform a method according to any of Aspects 29 to 53.

Aspect 55: An apparatus comprising means for performing a method according to any of Aspects 29 to 53.
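As a rough illustration of the confidence-based prediction and content filtering recited in Aspects 4, 5, and 25, the following Python sketch scores each detected device and outputs guidance only for the device whose confidence value exceeds a threshold. The `DetectedDevice` fields, the weighted-sum scoring, the weights, and the 0.7 threshold are all assumptions made for illustration; the disclosure does not specify how the confidence value is computed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DetectedDevice:
    """Hypothetical record for a device detected in the scene."""
    name: str
    gaze_alignment: float  # assumed feature in [0, 1]: how closely eye gaze points at the device
    proximity: float       # assumed feature in [0, 1]: 1.0 means very close to the user
    guidance: str          # user guidance data to output for this device


def confidence(device: DetectedDevice, w_gaze: float = 0.6, w_prox: float = 0.4) -> float:
    """Assumed scoring rule: a weighted sum of the two context features."""
    return w_gaze * device.gaze_alignment + w_prox * device.proximity


def select_guidance(devices: List[DetectedDevice], threshold: float = 0.7) -> Optional[str]:
    """Return guidance for the most likely device, or None if no device is confident enough.

    Guidance for every other device is filtered out simply by not being returned.
    """
    if not devices:
        return None
    best = max(devices, key=confidence)
    if confidence(best) > threshold:
        return best.guidance
    return None


scene = [
    DetectedDevice("digital photo frame", 0.1, 0.3, "Swipe to change the photo"),
    DetectedDevice("remote control", 0.9, 0.8, "Press the power button"),
    DetectedDevice("digital thermostat", 0.2, 0.2, "Pinch to adjust the temperature"),
]
print(select_guidance(scene))
```

Returning nothing when no confidence value exceeds the threshold mirrors the aspects that refrain from presenting guidance, which keeps the display free of content for devices the user is unlikely to use.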

0~24: buttons
100: XR system
102: image sensor
104: accelerometer
106: gyroscope
107: storage device
108: input device
109: display
110: computing components
112: CPU
114: graphics processing unit (GPU)
116: DSP
118: ISP
120: XR engine
122: input options engine
123: context management engine
124: image processing engine
126: rendering engine
200: hand
230: thumb
231: circle
232: index finger
233: diamond
234: middle finger
235: landmark point
236: ring finger
238: little finger
240: legend
300: extended reality system
301: user
302: image sensor
303: object
304: plane
309: display
400: XR system
405: hand
410: console
412: virtual content
414: text
500: XR system
510: remote control
511: television
512: virtual data
600: XR system
605: hand
610: thermostat
612: gesture command
614: gesture command
616: gesture command
700: XR system
701: user
710: digital photo frame
712: metadata
800: XR system
802: user
810: digital photo frame
812: remote control
814: digital thermostat
820: data
822: data
824: data
840: menu
900: process
902: block
904: block
906: block
1000: computing system
1005: connection
1010: processor
1012: cache
1015: system memory
1020: read-only memory (ROM)
1025: random access memory (RAM)
1030: storage device
1032: service
1034: service
1035: output device
1036: service
1040: communication interface
1045: input device

Illustrative embodiments of the present disclosure are described in detail below with reference to the following figures:

FIG. 1 is a block diagram illustrating an example extended reality (XR) system, according to some examples of the present disclosure;

FIG. 2 is a diagram illustrating example landmark points of a hand that can be used to track the position of the hand and interactions of the hand with a virtual environment, according to some examples of the present disclosure;

FIG. 3 is a diagram illustrating an example of an XR system worn by a user, according to some examples of the present disclosure;

FIG. 4A, FIG. 4B, and FIG. 4C are diagrams illustrating examples of a user using an XR system to interact with a console of an elevator, according to some examples of the present disclosure;

FIG. 5A and FIG. 5B are diagrams illustrating examples of a user using an XR system to interact with a remote control that can control a television, according to some examples of the present disclosure;

FIG. 6A and FIG. 6B are diagrams illustrating examples of a user using an XR system to interact with a thermostat, according to some examples of the present disclosure;

FIG. 7A and FIG. 7B are diagrams illustrating examples of a user using an XR system that can determine whether to provide user interface input options for interacting with a picture frame, according to some examples of the present disclosure;

FIG. 8A to FIG. 8D are diagrams illustrating examples of a user using an extended reality system to interact with one or more devices when multiple devices are present in a scene, according to some examples of the present disclosure;

FIG. 9 is a flowchart illustrating an example of a process for presenting information associated with at least one input option, according to some examples of the present disclosure; and

FIG. 10 illustrates an example computing system, according to some examples of the present disclosure.

Domestic deposit information (please note in the order of depository institution, date, and number): None. Foreign deposit information (please note in the order of depository country, institution, date, and number): None.

800: XR system

802: user

810: digital photo frame

812: remote control

814: digital thermostat

Claims

An apparatus for outputting information associated with at least one input option, comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: receive data identifying one or more input options associated with a device in a scene; determine, including using the at least one memory, information related to at least one of the scene, the device, and a user associated with the apparatus; and based on the one or more input options and the information, output user guidance data corresponding to an input option for which relevant contextual information has been determined.

The apparatus of claim 1, wherein the user guidance data includes at least one of a user input element associated with the input option, a virtual overlay on a physical object associated with the input option, and a prompt indicating how to provide an input associated with the input option.

The apparatus of claim 1, wherein the at least one processor is configured to: predict, based on the information, a user interaction with the device; and present, based on the one or more input options and the predicted user interaction, the user guidance data corresponding to the input option.

The apparatus of claim 1, wherein the device comprises a connected device having network communication capabilities, and the at least one processor is configured to: determine, based on the information and the one or more input options, a gesture representing a predicted user interaction, the predicted user interaction comprising a predicted user input to the device; and present the user guidance data, wherein the user guidance data includes an indication of the gesture that, when detected, invokes an actual user input at the device.

The apparatus of claim 1, wherein, to present the user guidance data, the at least one processor is configured to: render, at a display associated with the apparatus, a virtual overlay configured to appear to be located on a surface of the device, the virtual overlay including a user interface element associated with the input option, wherein the user interface element includes at least one of a virtual user input object associated with the input option and a visual indication of a physical control object on the device configured to receive an input corresponding to the input option.

The apparatus of claim 1, wherein the information includes at least one of an eye gaze of the user and a pose of the user, and the at least one processor is configured to: predict a user interaction with the device based on at least one of the eye gaze of the user and the pose of the user; after presenting the user guidance data, detect an actual user input associated with the input option, the actual user input representing the predicted user interaction; and transmit, to the device, a command corresponding to the actual user input associated with the input option.

The apparatus of claim 1, wherein, to output the user guidance data corresponding to the input option, the at least one processor is configured to: display the user guidance data.

The apparatus of claim 1, wherein, to output the user guidance data corresponding to the input option, the at least one processor is configured to: output audio data representing the user guidance data.

The apparatus of claim 1, wherein, to output the user guidance data corresponding to the input option, the at least one processor is configured to: display the user guidance data; and output audio data associated with the displayed user guidance data.

The apparatus according to claim 1, wherein the at least one processor is configured to: receive, from the device, the data identifying the one or more input options associated with the device.

The apparatus according to claim 1, wherein the at least one processor is configured to: receive, from a server, the data identifying the one or more input options associated with the device.

The apparatus according to claim 1, wherein the device does not have an external user interface for receiving one or more user inputs.

The apparatus according to claim 1, wherein the at least one processor is configured to: suppress, based on the information, rendering of additional user guidance data associated with the device.

The apparatus according to claim 1, wherein the apparatus is an extended reality device.

The apparatus according to claim 1, further comprising a display.

The apparatus according to claim 15, wherein the display is configured to display at least the user guidance data.

The apparatus according to claim 1, wherein the at least one processor is configured to: obtain a user input associated with the input option after presenting the user guidance data; and transmit, to the device, a command corresponding to the user input, the command being configured to control one or more operations of the device.

A method for outputting information associated with at least one input option, comprising the steps of: receiving data identifying one or more input options associated with a device in a scene; determining, using at least one memory, information associated with at least one of the scene, the device, and a user associated with an electronic device; and outputting, based on the one or more input options and the information, user guidance data corresponding to an input option for which relevant contextual information has been determined.

The method according to claim 18, wherein the user guidance data includes at least one of a user input element associated with the input option, a virtual overlay on a physical object associated with the input option, and a prompt indicating how to provide an input associated with the input option.

The method according to claim 18, further comprising the following steps: predicting a user interaction with the device based on the information; and presenting, based on the one or more input options and the predicted user interaction, the user guidance data corresponding to the input option.

The method according to claim 20, wherein the device comprises a connected device having network communication capabilities, the method further comprising the following steps: determining, based on the information and the one or more input options, a gesture representing a predicted user interaction comprising a predicted user input to the device; and presenting the user guidance data, wherein the user guidance data includes an indication of the gesture that, when detected, causes an actual user input at the device.

The method according to claim 18, wherein the step of presenting the user guidance data comprises the following steps: rendering, at a display associated with the electronic device, a virtual overlay configured to appear to be located on a surface of the device, the virtual overlay including a user interface element associated with the input option, wherein the user interface element includes at least one of a virtual user input object associated with the input option and a visual indication of a physical control object on the device configured to receive the input corresponding to the input option.

The method according to claim 18, wherein the information includes at least one of an eye gaze of the user and a gesture of the user, the method further comprising the following steps: predicting a user interaction with the device based on at least one of the eye gaze of the user and the gesture of the user; after presenting the user guidance data, detecting an actual user input associated with the input option, the actual user input representing the predicted user interaction; and transmitting, to the device, a command corresponding to the actual user input associated with the input option.

The method according to claim 18, wherein the step of outputting the user guidance data corresponding to the input option comprises the following step: displaying the user guidance data.

The method according to claim 18, wherein the step of outputting the user guidance data corresponding to the input option comprises the following step: outputting audio data representing the user guidance data.

The method according to claim 18, wherein the step of outputting the user guidance data corresponding to the input option comprises the following steps: displaying the user guidance data; and outputting audio data associated with the displayed user guidance data.

The method according to claim 18, further comprising the following step: receiving, from the device, the data identifying the one or more input options associated with the device.

The method according to claim 18, further comprising the following step: receiving, from a server, the data identifying the one or more input options associated with the device.

The method according to claim 18, wherein the device does not have an external user interface for receiving one or more user inputs.

The method according to claim 18, further comprising the following step: suppressing, based on the information, rendering of additional user guidance data associated with the device.

The method according to claim 18, further comprising the following steps: obtaining a user input associated with the input option after presenting the user guidance data; and transmitting, to the device, a command corresponding to the user input, the command being configured to control one or more operations of the device.
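
For illustration only, the flow recited in the method claims (receiving input options, determining contextual information, outputting guidance only for the options with relevant context, then sending a command after an actual user input) might be sketched as follows. This is a minimal sketch and not the claimed implementation: the names `InputOption`, `Context`, `is_relevant`, `select_guidance`, and `build_command`, the field layout, and the gaze-based and power-state relevance rules are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class InputOption:
    """One input option reported for the remote device (hypothetical shape)."""
    name: str     # e.g. "volume_up"
    control: str  # control on the device that accepts this input


@dataclass(frozen=True)
class Context:
    """Information about the scene, the device, and the user (hypothetical)."""
    gaze_target: Optional[str] = None  # control the user is looking at, if any
    device_powered_on: bool = True


def is_relevant(option: InputOption, ctx: Context) -> bool:
    """Assumed relevance rule, chosen only to make the sketch concrete."""
    if not ctx.device_powered_on:
        # Assumption: while the device is off, only the power option is useful.
        return option.name == "power"
    # Assumption: otherwise an option is relevant when the user gazes at its control.
    return ctx.gaze_target == option.control


def select_guidance(options: List[InputOption], ctx: Context) -> List[str]:
    """Return guidance text for relevant options; guidance for the rest is suppressed."""
    return [
        f"Press '{opt.control}' to use '{opt.name}'"
        for opt in options
        if is_relevant(opt, ctx)
    ]


def build_command(option: InputOption) -> Dict[str, str]:
    """Build the command to transmit to the device after an actual user input."""
    return {"command": option.name, "target": option.control}
```

With two hypothetical options, `InputOption("power", "power_button")` and `InputOption("volume_up", "volume_knob")`, calling `select_guidance` with `Context(gaze_target="volume_knob")` yields guidance for the volume option only, and guidance for the power option is suppressed. A real system would replace `is_relevant` with whatever prediction of user interaction it actually uses, such as gesture or eye-gaze models.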