TW201514830A

TW201514830A - Interactive operation method of electronic apparatus

Info

Publication number: TW201514830A
Application number: TW102136408A
Authority: TW
Inventors: Bai-Ruei Huang; Chang-Hong Lin; Chia-Han Lee
Original assignee: Univ Nat Taiwan Science Tech
Priority date: 2013-10-08
Filing date: 2013-10-08
Publication date: 2015-04-16
Also published as: TWI499966B; US9256324B2; US20150097812A1

Abstract

An interactive operation method of an electronic apparatus is provided. An image sequence is captured by an image capturing unit. An image pre-processing is executed for an image of the image sequence. A fingertip candidate region is obtained from the image. Whether the fingertip candidate region is connected with a hand region is detected. If the fingertip candidate region is connected with the hand region, whether a click event is generated is detected. When the click event is generated, a corresponding function is executed.

Description

Interactive method of operation

本發明是有關於一種互動式控制機制，且特別是有關於一種基於手勢辨識的互動式操作方法。 The invention relates to an interactive control mechanism, and in particular to an interactive operation method based on gesture recognition.

隨著近年來電子裝置的發展，智慧型手機、平板電腦等電子裝置變得越來越普及。人們開始關注電子裝置可以提供給使用者的品質與能力等議題。例如，電子裝置可以提供一種人機介面(Human Machine Interface，HMI)來幫助使用者與電子裝置進行互動。而人機介面的設計取決於使用者的需求與習慣。例如，滑鼠、鍵盤和遙控器等實體控制器是目前最常見的人機介面。 With the development of electronic devices in recent years, electronic devices such as smart phones and tablet computers have become more and more popular. People began to pay attention to issues such as the quality and ability that electronic devices can provide to users. For example, the electronic device can provide a Human Machine Interface (HMI) to assist the user in interacting with the electronic device. The design of the human-machine interface depends on the needs and habits of the users. For example, physical controllers such as mice, keyboards, and remote controls are the most common human-machine interfaces available today.

使用者透過這些實體控制器來操作電腦或電視等電子裝置。隨著人機介面的發展，目前更從實體控制器慢慢朝向虛擬控制器來發展。而虛擬控制器不僅為使用者提供了全新的體驗，並且亦具備了多項好處。首先，虛擬控制器提供了多樣化的輸入方式，即，使用者能夠根據使用需求在實體控制器與虛擬控制器之間進行切換，以選擇適當的輸入方式。另外，虛擬控制器還可以根據使用者的需求來改變其大小和外觀，並且，虛擬控制器不佔用實體空間。 Users use these physical controllers to operate electronic devices such as computers or televisions. With the development of the human-machine interface, it is now moving from the physical controller to the virtual controller. The virtual controller not only provides users with a new experience, but also has many benefits. First, the virtual controller provides a variety of input methods, that is, the user can switch between the physical controller and the virtual controller according to the usage requirements to select an appropriate input method. In addition, the virtual controller can also be based on the user The need to change its size and appearance, and the virtual controller does not take up physical space.

擴增實境(Augmented Reality，簡稱AR)是一種實時地計算攝影機影像的位置及角度並加上相應圖像的技術，這種技術的目標是在螢幕上把虛擬世界套在現實世界並進行互動。大多數現有的擴增實境系統通常是基於網路攝影機。然而，這些類型的攝影機只能擷取二維(two dimensional，2D)資料。利用有限的二維數據來完成三維(three dimensional，3D)空間的定位不僅需要非常先進的算法，所獲得的結果也尚不精準，導致虛擬控制器無法解釋使用者的意圖和命令。 Augmented Reality (AR) is a technique for calculating the position and angle of camera images in real time and adding corresponding images. The goal of this technology is to put the virtual world on the screen and interact with it in the real world. . Most existing augmented reality systems are usually based on web cameras. However, these types of cameras can only capture two dimensional (2D) data. The use of limited two-dimensional data to complete three-dimensional (3D) spatial positioning requires not only very advanced algorithms, but also the results obtained are not accurate, resulting in the virtual controller unable to explain the user's intentions and commands.

本發明提供一種互動式操作方法，利用偵測使用者的手指，藉此透過手指傳遞指令讓使用者與電子裝置產生互動。 The present invention provides an interactive operation method for detecting a user's finger, thereby transmitting an instruction through a finger to allow the user to interact with the electronic device.

本發明的電子裝置的互動式操作方法，包括：透過取像單元擷取影像序列；對影像序列中的影像執行影像前處理；自影像獲得指尖候選區域；判斷指尖候選區域是否與手部區域相連；若指尖候選區域與手部區域相連，以指尖候選區域作為目標指尖區域；藉由持續追蹤目標指尖區域，判斷是否發生點選事件；以及在點選事件發生時，執行在電子裝置中所對應的功能。 The interactive operation method of the electronic device of the present invention comprises: capturing an image sequence through the image capturing unit; performing image preprocessing on the image in the image sequence; obtaining a fingertip candidate region from the image; and determining whether the fingertip candidate region is with the hand The area is connected; if the fingertip candidate area is connected to the hand area, the fingertip candidate area is used as the target fingertip area; by continuously tracking the target fingertip area, it is judged whether a click event occurs; and when the click event occurs, execution is performed The function corresponding to the electronic device.

在本發明的一實施例中，上述判斷指尖候選區域是否與手部區域相連的步驟包括：獲得指尖候選區域的中心點作為參考點；自參考點的四個方向分別獲得第一側點、第二側點、第三側點與第四側點，其中第一側點、第二側點、第三側點與第四側點分別位於指尖候選區域的外側；自影像的深度資訊中獲得第一側點、第二側點、第三側點與第四側點的第一深度值、第二深度值、第三深度值及第四深度值；以及判斷第一深度值、第二深度值、第三深度值及第四深度值是否大於0；在第一深度值、第二深度值、第三深度值及第四深度值僅其中之一大於0的情況下，判定指尖候選區域與手部區域相連；以及在其他情況下，判定指尖候選區域未與手部區域相連。 In an embodiment of the invention, the step of determining whether the fingertip candidate region is connected to the hand region comprises: obtaining a center point of the fingertip candidate region as a reference point; obtaining the first side point from the four directions of the reference point respectively Second side point, third side point and a four-sided point, wherein the first side point, the second side point, the third side point, and the fourth side point are respectively located outside the fingertip candidate area; the first side point and the second side point are obtained from the depth information of the image, a first depth value, a second depth value, a third depth value, and a fourth depth value of the third side point and the fourth side point; and determining the first depth value, the second depth value, the third depth value, and the fourth depth Whether the value is greater than 0; determining that the fingertip candidate region is connected to the hand region in a case where only one of the first depth value, the second depth value, the third depth value, and the fourth depth value is greater than 0; and in other cases Next, it is determined that the fingertip candidate area is not connected to the hand area.

在本發明的一實施例中，上述方法更包括：自目前所接收的影像的目標指尖區域獲得第一追蹤點；判斷第一追蹤點對應於顯示單元中的顯示位置是否位於功能項目的位置；以及若第一追蹤點所對應的顯示位置位於功能項目的位置，自前次所接收的影像的目標指尖區域取出第二追蹤點，並且比對第一追蹤點與第二追蹤點，以判斷是否發生點選事件。 In an embodiment of the invention, the method further includes: receiving from the present The target fingertip area of the image obtains the first tracking point; determines whether the first tracking point corresponds to the position of the display position in the display unit is located; and if the display position corresponding to the first tracking point is located at the position of the function item, The second tracking point is taken out from the target fingertip area of the previously received image, and the first tracking point and the second tracking point are compared to determine whether a click event occurs.

在本發明的一實施例中，上述比對第一追蹤點與第二追蹤點，以判斷是否發生點選事件的步驟包括：比對第一追蹤點與第二追蹤點之間的垂直軸位移量與水平軸位移量；依據深度資訊，比對第一追蹤點與第二追蹤點之間的深度變化量；倘若垂直軸位移量小於第一預設值、水平軸位移量小於第二預設值、且深度變化量小於第三預設值，則判定發生點選事件；以及倘若垂直軸位移量大於或等於第一預設值、水平軸位移量大於或等於第二預設值、及深度變化量大於或等於第三預設值其中至少一條件成立，則判定未發生點選事件。 In an embodiment of the invention, the comparing the first tracking point with the second tracking The step of determining whether a click event occurs includes: comparing a vertical axis displacement and a horizontal axis displacement between the first tracking point and the second tracking point; and comparing the first tracking point with the second according to the depth information Tracking the amount of depth change between the points; if the vertical axis displacement is less than the first preset value, the horizontal axis displacement is less than the second preset value, and the depth change is less than the third preset value, determining that a click event occurs; And if the vertical axis displacement is greater than or equal to the first preset value, the horizontal axis displacement is greater than or equal to the second preset value, and the depth variation is greater than or equal to the third preset value, wherein at least one condition is satisfied, the determination does not occur point Select the event.

在本發明的一實施例中，上述判斷是否發生點選事件的步驟更包括：基於第一追蹤點，於目前所接收的影像的目標指尖區域中取出第一計算點與第二計算點，其中第一追蹤點位於第一計算點與第二計算點之間；依據深度資訊，計算第一計算點與第二計算點的深度差值；在垂直軸位移量小於第一預設值、水平軸位移量小於第二預設值、且深度變化量小於第三預設值的情況下，若深度差值大於或等於第四預設值，則判定發生點選事件；若深度差值小於第四預設值，則判定未發生點選事件。 In an embodiment of the invention, the step of determining whether a click event occurs The step further includes: taking the first calculation point and the second calculation point in the target fingertip area of the currently received image based on the first tracking point, wherein the first tracking point is located between the first calculation point and the second calculation point Calculating the depth difference between the first calculation point and the second calculation point according to the depth information; the displacement on the vertical axis is less than the first preset value, the displacement on the horizontal axis is less than the second preset value, and the depth variation is less than the third In the case of the preset value, if the depth difference is greater than or equal to the fourth preset value, it is determined that a click event occurs; if the depth difference is less than the fourth preset value, it is determined that the click event does not occur.

在本發明的一實施例中，上述方法更包括：於顯示單元中顯示擴增實境互動介面；在擴增實境互動介面中顯示所接收的影像；於影像中獲得目標人臉區域時，在擴增實境互動介面顯示第一虛擬層，其中第一虛擬層包括功能項目；當功能項目被觸發時，在擴增實境互動介面顯示第二虛擬層，其中第二虛擬層包括虛擬控制介面。 In an embodiment of the invention, the method further includes: in the display unit Displaying the augmented reality interactive interface; displaying the received image in the augmented reality interactive interface; displaying the first virtual layer in the augmented reality interactive interface when the target face region is obtained in the image, wherein the first virtual layer The function item is included; when the function item is triggered, the second virtual layer is displayed in the augmented reality interaction interface, wherein the second virtual layer includes a virtual control interface.

在本發明的一實施例中，上述執行影像前處理的步驟包括：執行背景去除程序。上述背景去除程序包括：偵測影像中的多個待處理物件；以及依據影像的深度資訊，自上述待處理物件濾除一個或多個不重要物件，其中上述不重要物件的深度值大於預設深度值。即，將深度值小於預設深度值的待處理物件保留。 In an embodiment of the invention, the step of performing the pre-image processing package Include: Perform a background removal procedure. The background removal process includes: detecting a plurality of objects to be processed in the image; and filtering one or more unimportant objects from the object to be processed according to the depth information of the image, wherein the depth value of the unimportant object is greater than a preset Depth value. That is, the object to be processed whose depth value is smaller than the preset depth value is retained.

在本發明的一實施例中，上述在執行背景去除程序的步驟之後，更包括執行人臉姿態估測(face pose estimation)程序。上述人臉姿態估測程序包括：在剩餘的待處理物件中執行臉部偵測程序，而獲得多個人臉區域；依據影像的深度資訊，自所獲得的人臉區域中取出目標人臉區域，其中目標人臉區域的深度值為這些人臉區域的深度值中的最小值(即，離取像單元最近者)；以及保留深度值為最小值的目標人臉區域所在的其中一待處理物件，而濾除其他的待處理物件。 In an embodiment of the invention, after the step of executing the background removal procedure, the method further includes performing a face pose estimation procedure. Above The face pose estimation program includes: performing a face detection process on the remaining objects to be processed, and obtaining a plurality of face regions; and extracting a target face region from the obtained face region according to the depth information of the image, wherein The depth value of the target face region is the minimum value of the depth values of the face regions (ie, the closest to the image capturing unit); and one of the objects to be processed in which the target face region with the minimum depth value is the minimum value, Filter out other objects to be processed.

在本發明的一實施例中，上述在執行人臉姿態估測程序的步驟之後更包括執行手部偵測程序。上述手部偵側程序包括：利用膚色偵測演算法，獲得手部區域。 In an embodiment of the invention, the above-described method for performing a face pose estimation program The step further includes performing a hand detection process. The above hand detection program includes: using a skin color detection algorithm to obtain a hand region.

在本發明的一實施例中，上述執行手部偵測程序的步驟之後，更包括執行指尖偵測程序，以獲得指尖候選區域。 In an embodiment of the invention, after the step of executing the hand detection program, the method further includes performing a fingertip detection procedure to obtain a fingertip candidate area.

基於上述，藉由偵測使用者在三維空間中進行操作的手指，並在偵測到手指觸發了點選事件時，執行在電子裝置中所對應的功能。據此，使用者能夠透過手指來傳遞指令而與電子裝置產生互動。 Based on the above, the function corresponding to the electronic device is executed by detecting the finger operated by the user in the three-dimensional space and detecting that the pointing event is triggered by the finger. Accordingly, the user can interact with the electronic device by transmitting an instruction through the finger.

為讓本發明的上述特徵和優點能更明顯易懂，下文特舉實施例，並配合所附圖式作詳細說明如下。 The above described features and advantages of the invention will be apparent from the following description.

100‧‧‧電子裝置 100‧‧‧Electronic devices

110‧‧‧取像單元 110‧‧‧Image capture unit

120‧‧‧處理單元 120‧‧‧Processing unit

130‧‧‧顯示單元 130‧‧‧Display unit

140‧‧‧儲存單元 140‧‧‧ storage unit

300‧‧‧影像處理模組 300‧‧‧Image Processing Module

310‧‧‧背景去除模組 310‧‧‧Background removal module

320‧‧‧人臉姿態估測模組 320‧‧‧Face posture estimation module

330‧‧‧手部偵測模組 330‧‧‧Hand Detection Module

340‧‧‧指尖偵測模組 340‧‧‧ fingertip detection module

350‧‧‧指尖追蹤模組 350‧‧‧ fingertip tracking module

360‧‧‧點選事件辨識模組 360‧‧‧Click event identification module

400‧‧‧影像 400‧‧‧ images

401‧‧‧手部區域 401‧‧‧Hand area

501、60、61‧‧‧候選指尖區域 501, 60, 61‧‧‧ candidate fingertip areas

601、611、621、631、P1‧‧‧第一側點 601, 611, 621, 631, P1‧‧‧ first side points

602、612、622、632、P2‧‧‧第二側點 602, 612, 622, 632, P2‧‧‧ second side points

603、613、623、633、P3‧‧‧第三側點 603, 613, 623, 633, P3‧‧‧ third side points

604、614、624、634、P4‧‧‧第四側點 604, 614, 624, 634, P4‧‧‧ fourth side point

701‧‧‧第一計算點 701‧‧‧ first calculation point

702‧‧‧第一追蹤點 702‧‧‧First Tracking Point

703‧‧‧第二計算點 703‧‧‧ second calculation point

710‧‧‧目標指尖區域 710‧‧‧ Target fingertip area

800‧‧‧擴增實境互動介面 800‧‧‧Augmented Reality Interactive Interface

810‧‧‧目標人臉區域 810‧‧‧ Target face area

820、830、860‧‧‧功能項目 820, 830, 860‧‧‧ functional items

840、850‧‧‧虛擬控制介面 840, 850‧‧‧ virtual control interface

R、R1、R2‧‧‧參考點 R, R1, R2‧‧‧ reference points

θ 1、θ 2‧‧‧角度 θ 1 , θ 2‧‧‧ angle

S205~S235‧‧‧互動操作方法的各步驟 Steps of the S205~S235‧‧‧ interactive operation method

圖1是依照本發明一實施例的互動式操作系統的方塊圖。 1 is a block diagram of an interactive operating system in accordance with an embodiment of the present invention.

圖2是依照本發明一實施例的互動式操作方法的流程圖。 2 is a flow chart of an interactive method of operation in accordance with an embodiment of the present invention.

圖3是依照本發明一實施例的影像處理模組的示意圖。 FIG. 3 is a schematic diagram of an image processing module according to an embodiment of the invention.

圖4是依照本發明一實施例的具有手部區域之影像的示意圖。 4 is a schematic illustration of an image with a hand region in accordance with an embodiment of the present invention.

圖5是依照本發明一實施例的手部區域的示意圖。 Figure 5 is a schematic illustration of a hand region in accordance with an embodiment of the present invention.

圖6A及圖6B是依照本發明一實施例的判斷目標指尖區域方式的示意圖。 6A and 6B are schematic diagrams showing the manner of determining a target fingertip region according to an embodiment of the invention.

圖7是依照本發明一實施例的用以判斷點擊事件的示意圖。 FIG. 7 is a schematic diagram of determining a click event according to an embodiment of the invention.

圖8A及圖8B是依照本發明的結合擴增實境互動介面的操作方法的示意圖。 8A and 8B are schematic diagrams of an operational method of incorporating an augmented reality interaction interface in accordance with the present invention.

隨著科技的發展，人們開始注重電子裝置提供給使用者的功能與設備本身之品質等議題。本發明提出一種電子裝置的互動操作方法，可讓使用者利用其手指在三維空間中傳遞指令來控制電子裝置。為了使本發明之內容更為明瞭，以下特舉實施例作為本發明確實能夠據以實施的範例。 With the development of technology, people began to pay attention to the functions of electronic devices provided to users and the quality of the devices themselves. The invention provides an interactive operation method of an electronic device, which allows a user to use his finger to transmit instructions in a three-dimensional space to control an electronic device. In order to clarify the content of the present invention, the following specific examples are given as examples in which the present invention can be implemented.

圖1是依照本發明一實施例的互動式操作系統的方塊圖。請參照圖1，互動式操作系統100包括取像單元110、處理單元120、顯示單元130以及儲存單元140。處理單元120耦接至取像單元110、顯示單元130以及儲存單元140。利用取像單元110擷取使用者的影像，而由處理單元120來辨識影像中使用者的動作，藉此來執行在電子裝置100中對應的功能。各構件詳細說明請參照下述。 1 is a block diagram of an interactive operating system in accordance with an embodiment of the present invention. Referring to FIG. 1 , the interactive operating system 100 includes an image capturing unit 110 , a processing unit 120 , a display unit 130 , and a storage unit 140 . The processing unit 120 is coupled to the image capturing unit 110, the display unit 130, and the storage unit 140. The image capturing unit 110 captures the image of the user, and the processing unit 120 recognizes the motion of the user in the image, thereby executing the function corresponding to the electronic device 100. For details of each component, please refer to the following.

取像單元110用以擷取影像。例如，取像單元110可以是深度攝影機(depth camera)或立體攝影機，或任何具有電荷耦合元件(Charge coupled device，CCD)鏡頭、互補式金氧半電晶體(Complementary metal oxide semiconductor transistors，CMOS)鏡頭、或紅外線鏡頭的攝影機、照相機。而取像單元110所面向的方向可以是便於拍攝到使用者的方向。 The image capturing unit 110 is configured to capture an image. For example, the image capturing unit 110 may be A depth camera or stereo camera, or any camera or camera having a charge coupled device (CCD) lens, a complementary metal oxide semiconductor transistor (CMOS) lens, or an infrared lens . The direction in which the image capturing unit 110 faces may be a direction that facilitates photographing to the user.

處理單元120用以分析取像單元110所擷取的影像。處理單元120例如是中央處理單元(Central Processing Unit，CPU)，或是其他可程式化之一般用途或特殊用途的微處理器(Microprocessor)、數位訊號處理器(Digital Signal Processor，DSP)、可程式化控制器、特殊應用積體電路(Application Specific Integrated Circuits，ASIC)、可程式化邏輯裝置(Programmable Logic Device，PLD)、場可程式閘陣列(Field-programmable Gate Array，FPGA)或其他類似裝置或這些裝置的組合。 The processing unit 120 is configured to analyze the image captured by the image capturing unit 110. The processing unit 120 is, for example, a central processing unit (CPU), or other programmable general purpose or special purpose microprocessor (Microprocessor), digital signal processor (DSP), programmable Controllers, Application Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Field-programmable Gate Arrays (FPGAs), or other similar devices or A combination of these devices.

顯示單元130可為任一類型的顯示器，例如液晶顯示器(Liquid Crystal Display，LCD)或發光二極管(Light Emitting Diode，LED)等平面顯示器、或者為投影顯示器、亦或是軟性顯示器(soft display)等。 The display unit 130 can be any type of display, such as a liquid crystal display (LCD) or a light emitting diode (LED), or a flat display, or a projection display, or a soft display. .

儲存單元140例如是任意型式的固定式或可移動式隨機存取記憶體(Random Access Memory，RAM)、唯讀記憶體(Read-Only Memory，ROM)、快閃記憶體(Flash memory)、硬碟或其他類似裝置或這些裝置的組合，而用以記錄可由處理單元120執行的多個模組，進而來實現互動式操作方法。 The storage unit 140 is, for example, any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory (Flash memory), hard Discs or other similar devices or combinations of these devices are used to record a plurality of modules executable by processing unit 120 to implement an interactive method of operation.

本實施例是以程式碼來實現。例如，儲存單元140中儲存有多個程式碼片段，上述程式碼片段在被安裝後，會由處理單元120來執行。據此，互動式操作系統100可以在複雜的自然環境下準確地偵測使用者之手指，並藉由手指傳遞指令讓使用者與機器產生互動。底下即進一步說明藉由偵測手指來控制互動式操作系統100的步驟。 This embodiment is implemented by a code. For example, the storage unit 140 stores There are a plurality of code segments, and the above code segments are executed by the processing unit 120 after being installed. Accordingly, the interactive operating system 100 can accurately detect the user's finger in a complex natural environment, and pass the finger to the user to interact with the machine. The steps of controlling the interactive operating system 100 by detecting a finger are further explained below.

圖2是依照本發明一實施例的互動式操作方法的流程圖。請同時參照圖1及圖2，在步驟S205中，透過取像單元110擷取影像序列。例如，取像單元110每隔一取樣頻率即擷取一張影像。接著，在步驟S210中，處理單元120對影像序列中的影像執行影像前處理。上述影像前處理包括背景去除程序、人臉姿態估測程序、手部偵測程序等。在獲得手部區域之後，在步驟S215中，處理單元120自影像中獲得指尖候選區域。也就是說在手部區域中找出可能為指尖的區域。 2 is a flow chart of an interactive method of operation in accordance with an embodiment of the present invention. Referring to FIG. 1 and FIG. 2 simultaneously, in step S205, the image sequence is captured by the image capturing unit 110. For example, the image capturing unit 110 captures an image every other sampling frequency. Next, in step S210, the processing unit 120 performs image pre-processing on the image in the image sequence. The above image pre-processing includes a background removal program, a face posture estimation program, a hand detection program, and the like. After obtaining the hand region, in step S215, the processing unit 120 obtains a fingertip candidate region from the image. This means finding the area that may be the fingertip in the hand area.

而後，在步驟S220中，處理單元120判斷指尖候選區域是否與手部區域相連。就一般人體結構而言可以知道，指尖的其中一側會與其他手部相連。因此，在此藉由判斷指尖候選區域是否與手部區域相連來找出正確的指尖區域。若指尖候選區域與手部區域相連，在步驟S220中，處理單元120便以此指尖候選區域作為目標指尖區域。若指尖候選區域並未與手部區域相連，表示目前所接收的影像中並未有指尖的出現，返回執行步驟S205，繼而繼續自取像單元110接收下一張影像。 Then, in step S220, the processing unit 120 determines that the fingertip candidate area is No connection to the hand area. As far as the general human body structure is concerned, one side of the fingertip is connected to other hands. Therefore, the correct fingertip area is found here by judging whether the fingertip candidate area is connected to the hand area. If the fingertip candidate area is connected to the hand area, in step S220, the processing unit 120 uses the fingertip candidate area as the target fingertip area. If the fingertip candidate area is not connected to the hand area, it indicates that there is no fingertip in the currently received image, and the process returns to step S205, and then continues to receive the next image from the image capturing unit 110.

在獲得目標指尖區域之後，在步驟S230中，處理單元120會藉由持續追蹤影像序列中的目標指尖區域，判斷是否發生點選事件。也就是偵測使用者是否在三維空間中執行一特定手勢來準備驅動電子裝置100中的功能。當偵測到特定手勢，即，辨識到點選事件發生時，在步驟S235中，處理單元120執行對應的功能。另一方面，若未偵測到點選事件的發生，則繼續執行步驟S230。舉例來說，在顯示單元130中顯示一畫面，使用者對著畫面中的一功能項目而在三維空間對應的位置處執行點選的手勢時，處理單元120透過分析取像單元110所取得的影像而得知在此處發生點選事件，進而處理單元120便可去執行此功能項目。 After obtaining the target fingertip region, in step S230, the processing unit 120 determines whether a click event occurs by continuously tracking the target fingertip region in the image sequence. That is, it is detected whether the user performs a specific gesture in the three-dimensional space to prepare to drive the functions in the electronic device 100. When a specific gesture is detected, that is, when a click event is recognized, the processing unit 120 performs a corresponding function in step S235. On the other hand, if the occurrence of the click event is not detected, the process proceeds to step S230. For example, when the display unit 130 displays a screen and the user performs a click gesture at a position corresponding to the three-dimensional space with respect to a function item in the screen, the processing unit 120 obtains the image acquisition unit 110 by analyzing the image capturing unit 110. The image knows that a click event occurs here, and the processing unit 120 can execute the function item.

底下再舉一例來詳細說明上述方法的實現過程。然，底下實施例僅為其中一種舉例說明，並不以此為限。 An example will be given below to explain the implementation process of the above method. However, the following embodiments are merely illustrative of one of them, and are not limited thereto.

圖3是依照本發明一實施例的影像處理模組的示意圖。在本實施例中，影像處理模組300是由程式碼片段所組成的電腦軟體，其儲存於儲存單元140中，以供處理單元120執行之。然，在其他實施例中，影像處理模組300亦可以為由各種晶片所組成的硬體，且耦接至處理單元120，而由處理單元120驅動執行之，在此並不限制影像處理模組300的實現方式。影像處理模組300包括背景去除模組310、人臉姿態估測(face pose estimation)模組320、手部偵測模組330、指尖偵測模組340、指尖追蹤模組350以及點選事件辨識模組360。 FIG. 3 is a schematic diagram of an image processing module according to an embodiment of the invention. In this embodiment, the image processing module 300 is a computer software composed of code segments stored in the storage unit 140 for execution by the processing unit 120. In other embodiments, the image processing module 300 may also be a hardware composed of various types of chips, and coupled to the processing unit 120, and driven by the processing unit 120, and does not limit the image processing mode. The implementation of group 300. The image processing module 300 includes a background removal module 310, a face pose estimation module 320, a hand detection module 330, a fingertip detection module 340, a fingertip tracking module 350, and a point. Event identification module 360 is selected.

背景去除模組310執行背景去除程序，藉以將背景去除，而保留最有可能存在使用者的區域。在此，可利用背景相減(background subtraction)法來獲得人體存在的區域。例如，事先建立一張不存在使用者的背景影像於電子裝置100中，之後，將取像單元110所擷取到的影像與背景影像執行相減的動作，如此便可獲得兩張影像之間具有差異性的區域。 The background removal module 310 performs a background removal process to remove the background. Keep the area most likely to exist for the user. Here, a background subtraction method can be used to obtain a region in which a human body exists. For example, a background image in which the user does not exist is created in the electronic device 100 in advance, and then the image captured by the image capturing unit 110 is subtracted from the background image, so that two images can be obtained. Areas with differences.

人臉姿態估測模組320執行人臉姿態估測程序，其是用以對背景去除模組310所輸出的影像進行臉部偵測程序以及臉部追蹤等處理。在獲得去除背景的影像之後，利用人臉姿態估測模組320來判斷影像中是否有存在人臉。例如，利用基於Haar-like特徵的自適應增強(Adaptive Boosting，AdaBoost)學習演算法(learning algorithm)來辨識影像中的人臉而獲得臉部區域。 The face pose estimation module 320 performs a face pose estimation program, which is used to The image output by the background removal module 310 is subjected to a face detection process and a face tracking process. After obtaining the image with the background removed, the face pose estimation module 320 is used to determine whether there is a face in the image. For example, a Haar-like feature based Adaptive Boosting (AdaBoost) learning algorithm is used to identify a face in an image to obtain a face region.

在偵測到臉部區域之後，人臉姿態估測模組320還會進一步使用連續自適性均值平移(Continuously Adaptive Mean-Shift，Camshift)演算法來持續地追蹤人臉位置。Camshift演算法會依據移動物件(例如臉部區域)的顏色來進行追蹤。因此，不管使用者如何移動、轉動其頭部，透過Camshift演算法仍然可以得到使用者的人臉位置。 After detecting the face region, the face pose estimation module 320 further uses the Continuously Adaptive Mean-Shift (Camshift) algorithm to continuously track the face position. The Camshift algorithm tracks the color of a moving object, such as a face area. Therefore, regardless of how the user moves and turns his head, the face position of the user can still be obtained through the Camshift algorithm.

手部偵測模組330執行手部偵測程序，其是利用膚色偵測(skin color detection)演算法，在影像中偵測手部區域。而偵測手部區域的動作例如可以進一步劃分為三部分，即，身體遮罩、膚色偵測以及影像增強。為了找出人臉和人體，在偵測到臉部區域之後，手部偵測模組330利用一個身體遮罩來覆蓋住影像中的臉部區域和人體區域。由於身體的深度值與人臉的深度值大致相同，因此上述身體遮罩的尺寸及覆蓋區域可根據所偵測到之臉部區域的深度值來自動變更。 The hand detection module 330 performs a hand detection process that detects a hand region in the image using a skin color detection algorithm. The action of detecting the hand area can be further divided into three parts, that is, body mask, skin color detection, and image enhancement. In order to find the face and the human body, after detecting the face area, the hand detection module 330 uses a body mask to cover the face area in the image and Human body area. Since the depth value of the body is substantially the same as the depth value of the face, the size and coverage area of the body mask can be automatically changed according to the depth value of the detected face region.

在獲得人體區域之後，手部偵測模組330進一步執行膚色偵測演算法。舉例來說，首先，利用下述方程式(1)，將影像由RGB色彩空間轉換為YCbCr色彩空間。在YCbCr色彩空間，Y代表影像亮度(luminance)，Cb與Cr分別代表影像的色度(chrominance)。 After obtaining the body area, the hand detection module 330 further performs a skin color detection algorithm. For example, first, the image is converted from the RGB color space to the YCbCr color space using Equation (1) below. In the YCbCr color space, Y represents image luminance, and Cb and Cr represent chrominance of the image, respectively.

在轉換色彩空間之後，利用底下方程式(2)將像素分為膚色或非膚色。即，像素的Cr值位於133~173之間，且Cb值位於77~127之間，即判定此像素為膚色像素。而不符合上述條件的像素一律視為非膚色像素。 After converting the color space, use the bottom program (2) to divide the pixels into skin color or non-skin. That is, the pixel has a Cr value between 133 and 173, and the Cb value is between 77 and 127, that is, the pixel is determined to be a skin color pixel. Pixels that do not meet the above conditions are treated as non-skinning pixels.

在執行膚色偵測演算法之後，還可進一步執行影像增強演算法以移除雜訊。例如，利用形態學(morphology)中的閉合運算(closing operation)或斷開運算(opening operation)來移除雜訊。之後，再使用高斯模糊濾波器(Gaussian Blur filter)來移除剩餘雜訊並使形狀平滑化。 After performing the skin tone detection algorithm, an image enhancement algorithm can be further performed to remove the noise. For example, a closing operation or an opening operation in a morphology is used to remove noise. Then, a Gaussian Blur filter is used to remove the remaining noise and smooth the shape.

舉例來說，圖4是依照本發明一實施例的具有手部區域之影像的示意圖。請參照圖4，影像400在經由上述背景去除模組310、人臉姿態估測模組320及手部偵測模組330的運算之後，便可獲得手部區域401。 For example, Figure 4 is a hand region in accordance with an embodiment of the present invention. A schematic representation of the image. Referring to FIG. 4 , after the image 400 is processed by the background removal module 310 , the face posture estimation module 320 , and the hand detection module 330 , the hand region 401 can be obtained.

指尖偵測模組340執行指尖偵測程序，以在手部區域中找出正確的指尖區域。具體而言，在獲得手部區域之後，利用所定義的指尖屬性(property)來偵測手部區域中的指尖，並且濾除非指尖區域。在此，可針對真正的手指來定義其所具備之特性，而事先定義出指尖屬性。指尖屬性包括：手部與指尖之間的關係像是樹狀一般具有分枝(branch)；以及指尖的其中一側會與手部相連。 The fingertip detection module 340 performs a fingertip detection procedure to find in the hand area Get the correct fingertip area. Specifically, after the hand region is obtained, the fingertips in the hand region are detected using the defined fingertip property, and the fingertip region is filtered out. Here, the characteristics of the finger can be defined for the real finger, and the fingertip property is defined in advance. Fingertip attributes include: the relationship between the hand and the fingertip is like a tree with branches; and one side of the fingertip is connected to the hand.

指尖偵測模組340可利用形態學的斷開運算(opening operation)先進行侵蝕(erosion)再進行膨脹(dilation)。也就是說，先對影像執行侵蝕來縮小區域，再對影像做膨脹來擴張區域；或者，重複侵蝕直到消除掉所有不想要的點或線等雜訊，再用膨脹來恢復原圖形。經過這樣的程序之後就可以把雜訊點移除。在此，為了略過手掌部分，以6×6的十字(cross-shaped)結構元素(structuring element)來進行斷開運算。在執行斷開運算之後會獲得一第一影像，將原始影像減去此第一影像之後便可獲得候選指尖區域。 Fingertip detection module 340 can utilize morphological disconnection operations (opening Operation) first erosion and then dilation. That is to say, the image is first eroded to narrow the area, and then the image is expanded to expand the area; or, the erosion is repeated until all unwanted points or lines of noise are removed, and then the original image is restored by expansion. After such a procedure, the noise points can be removed. Here, in order to skip the palm portion, a disconnection operation is performed with a 6×6 cross-shaped structuring element. After the disconnection operation is performed, a first image is obtained, and the original image is subtracted from the original image to obtain a candidate fingertip region.

之後，指尖偵測模組340還可進一步以3×3的方形(square) 結構元素來執行斷開運算，以使得候選指尖區域的形狀更為平滑化，且可移除雜訊。而為了後續的運算，利用一邊界框(bounding box)來表示候選指尖區域。舉例來說，圖5是依照本發明一實施例的手部區域的示意圖。請參照圖5，以圖4的手部區域401而言，在偵測到可能為手指的輪廓之後，以邊界框來框住所偵測到之可能為手指的輪廓，而獲得候選指尖區域501。 Thereafter, the fingertip detection module 340 can further further have a square of 3×3. The structural elements are used to perform a break operation to make the shape of the candidate fingertip region smoother and to remove noise. For subsequent operations, a bounding box is used to represent the candidate fingertip regions. For example, Figure 5 is a schematic illustration of a hand region in accordance with an embodiment of the present invention. Please refer to FIG. 5, in the case of the hand area 401 of FIG. After the contour of the finger may be reached, the detected contour of the finger may be framed by a bounding box to obtain the candidate fingertip region 501.

而即便上述運算即可獲得正確的指尖區域，然，為了避免仍存在有非指尖區域，還可進一步利用影像的深度資訊來濾除掉非指尖區域。針對影像來產生一深度圖(depth map)，而深度圖的目的是要確定的像素是否屬於前景。例如，在深度圖中，屬於背景的像素的深度值為0，被手部偵測模組330所建立的身體遮罩所覆蓋的像素的深度值為0，且未被手部偵側模組330所建立之身體遮罩所覆蓋的像素的深度值為大於0的數值。也就是說，在深度圖中，屬於背景物體的像素的深度值會歸零，而屬於前景物體的像素則具有大於0的深度值。 Even if the above operation can obtain the correct fingertip area, in order to avoid the presence of non-fingertip areas, the depth information of the image can be further utilized to filter out the non-fingertip area. A depth map is generated for the image, and the purpose of the depth map is to determine whether the pixel belongs to the foreground. For example, in the depth map, the depth of the pixel belonging to the background is 0, and the depth of the pixel covered by the body mask established by the hand detection module 330 is 0, and the hand detection module is not The depth of the pixel covered by the body mask established by 330 is a value greater than zero. That is, in the depth map, the depth values of the pixels belonging to the background object are zeroed, and the pixels belonging to the foreground object have depth values greater than zero.

在獲得深度資訊之後，指尖偵測模組340便可利用深度值來排除非指尖區域。底下搭配圖5來進行說明。首先，獲得指尖候選區域501的中心點作為參考點R，然後自參考點R的四個方向(上、下、左、右)分別獲得第一側點P1、第二側點P2、第三側點P3與第四側點P4。在此，第一側點P1、第二側點P2、第三側點P3與第四側點P4分別位於指尖候選區域501的外側。 After obtaining the depth information, the fingertip detection module 340 can use the depth value to exclude the non-fingertip region. The description will be made below with reference to FIG. 5. First, the center point of the fingertip candidate area 501 is obtained as the reference point R, and then the first side point P1, the second side point P2, and the third are respectively obtained from the four directions (up, down, left, and right) of the reference point R. Side point P3 and fourth side point P4. Here, the first side point P1, the second side point P2, the third side point P3, and the fourth side point P4 are located outside the fingertip candidate area 501, respectively.

假設H和W分別代表指尖候選區域501的高度和寬度。而第一側點P1與第二側點P2是由參考點R開始分別往上及往下取0.75×H所獲得的點。另外，第三側點P3與第四側點P4則是由參考點R開始分別往左及往右取0.9×W所獲得的點。上述0.75×H與0.9×W僅為舉例說明，在其他實施例中只要第一側點P1、第二側點 P2、第三側點P3與第四側點P4分別位於指尖候選區域501的外側即可。 It is assumed that H and W represent the height and width of the fingertip candidate region 501, respectively. The first side point P1 and the second side point point P2 are points obtained by taking the reference point R upward and downward and taking 0.75×H, respectively. Further, the third side point P3 and the fourth side point point P4 are points obtained by taking 0.9×W from the reference point R to the left and right, respectively. The above 0.75×H and 0.9×W are merely illustrative. In other embodiments, only the first side point P1 and the second side point are provided. P2, the third side point P3, and the fourth side point P4 may be located outside the fingertip candidate area 501, respectively.

在獲得四個側點之後，便自深度資訊(上述深度圖)中分別獲得第一側點P1、第二側點P2、第三側點P3與第四側點P4的第一深度值、第二深度值、第三深度值及第四深度值。接著，判斷第一深度值、第二深度值、第三深度值及第四深度值是否大於0。若第一深度值、第二深度值、第三深度值及第四深度值僅其中之一大於0，即判定指尖候選區域501與手部區域相連。進一步地說，經過處理過後的深度圖中，作為背景物體的像素，其深度值會歸零，而僅剩作為前景物體的像素具有大於0的深度數值。由於第一側點P1、第二側點P2、第三側點P3與第四側點P4已經過處理，倘若於第一、二、三、四側點中僅有一點屬於前景物體，即，第一至第四深度值中僅其中一個深度值大於0，則判定指尖候選區域501與手部區域相連。而在其他情況下，則判定指尖候選區域501未與手部區域相連。 After obtaining the four side points, the first depth value of the first side point P1, the second side point P2, the third side point P3, and the fourth side point P4 are obtained from the depth information (the above depth map), respectively. Two depth values, a third depth value, and a fourth depth value. Next, it is determined whether the first depth value, the second depth value, the third depth value, and the fourth depth value are greater than zero. If only one of the first depth value, the second depth value, the third depth value, and the fourth depth value is greater than 0, it is determined that the fingertip candidate region 501 is connected to the hand region. Further, in the processed depth map, the depth value of the pixel as the background object is zeroed, and only the pixel as the foreground object has a depth value greater than zero. Since the first side point P1, the second side point P2, the third side point P3, and the fourth side point P4 have been processed, if only one of the first, second, third, and fourth side points belongs to the foreground object, that is, If only one of the first to fourth depth values is greater than 0, it is determined that the fingertip candidate region 501 is connected to the hand region. In other cases, it is determined that the fingertip candidate area 501 is not connected to the hand area.

另外，由於使用者的手指不可能皆為直立，在其他情況下亦可為歪斜。故，在以中心點獲得四個側點之後，還可進一步順時針旋轉或逆時針旋轉此四個側點，之後再依據旋轉後的這四個側點的深度值來判斷指尖候選區域是否與手部區域相連。舉例來說，圖6A及圖6B是依照本發明一實施例的判斷目標指尖區域方式的示意圖。圖6A為順時針旋轉，圖6B為逆時針旋轉。 In addition, since the user's fingers may not be erect, in other cases they may be skewed. Therefore, after obtaining four side points at the center point, the four side points may be further rotated clockwise or counterclockwise, and then the depth of the four side points after the rotation is used to determine whether the fingertip candidate area is Connected to the hand area. For example, FIG. 6A and FIG. 6B are schematic diagrams showing the manner of determining a target fingertip region according to an embodiment of the invention. Figure 6A is a clockwise rotation and Figure 6B is a counterclockwise rotation.

在圖6A中，找出指尖候選區域60的中心點作為參考點R1，自參考點R1開始往上、下、左、右四個方向，在指尖候選區域 60的外側分別獲得第一側點601、第二側點602、第三側點603與第四側點604。接著，將第一側點601、第二側點602、第三側點603與第四側點604順時針選轉一角度θ 1後，重新獲得第一側點611、第二側點612、第三側點613與第四側點614。之後，倘若在第一側點611、第二側點612、第三側點613與第四側點614中只有一個側點的深度值大於0，即判定指尖候選區域60與手部區域相連。 In FIG. 6A, the center point of the fingertip candidate region 60 is found as the reference point R1, starting from the reference point R1, going up, down, left, and right, in the fingertip candidate area. The first side point 601, the second side point 602, the third side point 603, and the fourth side point 604 are respectively obtained on the outer side of the 60. Then, after the first side point 601, the second side point 602, the third side point 603, and the fourth side point 604 are clockwise rotated by an angle θ 1 , the first side point 611 and the second side point 612 are regained. The third side point 613 and the fourth side point 614. Thereafter, if the depth value of only one of the first side point 611, the second side point 612, the third side point 613, and the fourth side point 614 is greater than 0, it is determined that the fingertip candidate area 60 is connected to the hand area. .

同樣地，在圖6B中，找出指尖候選區域61的中心點作為參考點R2，自參考點R2開始往上、下、左、右四個方向，在指尖候選區域61的外側分別獲得第一側點621、第二側點622、第三側點623與第四側點624。接著，將第一側點621、第二側點622、第三側點623與第四側點624逆時針選轉一角度θ 2後，重新獲得第一側點631、第二側點632、第三側點633與第四側點634。之後，倘若在第一側點631、第二側點632、第三側點633與第四側點634中只有一個側點的深度值大於0，即判定指尖候選區域61與手部區域相連。 Similarly, in FIG. 6B, the center point of the fingertip candidate region 61 is found as the reference point R2, starting from the reference point R2 in the four directions of up, down, left, and right, respectively, on the outer side of the fingertip candidate region 61. The first side point 621, the second side point 622, the third side point 623, and the fourth side point 624. Then, after the first side point 621, the second side point 622, the third side point 623, and the fourth side point 624 are rotated counterclockwise by an angle θ 2 , the first side point 631 and the second side point 632 are regained. The third side point 633 and the fourth side point 634. Thereafter, if the depth value of only one of the first side point 631, the second side point 632, the third side point 633, and the fourth side point 634 is greater than 0, it is determined that the fingertip candidate area 61 is connected to the hand area. .

指尖追蹤模組350執行指尖追蹤程序，以追蹤使用者的手指。例如，首先指尖追蹤模組350利用角偵測(corner detection)在目標指尖區域中有效地找到多個良好的特徵點。而為了準確地分析使用者指尖的移動，取上述多個特徵點的質心來作為追蹤點。上述角偵測是一種在電腦視覺(computer vision)系統中用來擷取特徵並推斷影像內容的方法。角偵測一搬是用在移動偵測(motion detection)、影像追蹤(image registration)、視訊追蹤(video tracking)、物件識別(object recognition)等。 The fingertip tracking module 350 performs a fingertip tracking program to track the user's fingers. For example, first, the fingertip tracking module 350 effectively finds a plurality of good feature points in the target fingertip region using corner detection. In order to accurately analyze the movement of the user's fingertip, the centroid of the plurality of feature points is taken as a tracking point. The above angle detection is a method for capturing features and inferring image content in a computer vision system. Angle detection is used for motion detection, image registration, video tracking (video) Tracking), object recognition, and the like.

接著，首先指尖追蹤模組350執行一種連續影像的動態追蹤演算法，例如光流法(optical flow)。在此，採用Lucas-Kanade追蹤法來估計光流的變化，並使用影像金字塔(image pyramid)概念來擴展Lucas-Kanade追蹤法。影像金字塔(image pyramid)概念可以分析更快速的移動並獲得更準確的偏移(offset)。 Next, the fingertip tracking module 350 first performs a dynamic tracking algorithm of a continuous image, such as an optical flow. Here, the Lucas-Kanade tracking method is used to estimate the change of optical flow, and the image pyramid concept is used to extend the Lucas-Kanade tracking method. The image pyramid concept can analyze faster movements and achieve more accurate offsets.

點選事件辨識模組360執行點選事件辨識程序，藉此判斷使用者是否觸發了特定功能。一般而言，使用者若要在三維空間中以手指按下顯示單元130中的功能項目，大致可分為底下兩種行為。第一種行為，使用者的手指在上、下、左、右等方向並不會有大幅度的移動，而是往前方移動。第二種行為，指尖的頂端像素的深度值與其底端像素的深度值之間的差值會大於一個門檻值。 The click event recognition module 360 executes a click event recognition program to determine whether the user has triggered a specific function. In general, if a user wants to press a function item in the display unit 130 with a finger in a three-dimensional space, the user can roughly divide into two behaviors. In the first behavior, the user's fingers do not move in the direction of up, down, left, or right, but move forward. In the second behavior, the difference between the depth value of the top pixel of the fingertip and the depth value of the bottom pixel is greater than a threshold.

具體而言，點選事件辨識模組360自目前所接收的影像的目標指尖區域獲得第一追蹤點。接著，判斷第一追蹤點對應於顯示單元130中的顯示位置是否位於功能項目的位置。若第一追蹤點所對應的顯示位置位於功能項目的位置，則點選事件辨識模組360自前次所接收的影像的目標指尖區域取出第二追蹤點，接著，比對第一追蹤點與的第二追蹤點，以判斷是否發生點選事件。例如，比對第一追蹤點與該第二追蹤點之間的垂直軸位移量或水平軸位移量，並且依據深度資訊，比對第一追蹤點與第二追蹤點之間的深度變化量。 Specifically, the click event recognition module 360 obtains the first tracking point from the target fingertip area of the currently received image. Next, it is determined whether the first tracking point corresponds to whether the display position in the display unit 130 is located at the position of the function item. If the display position corresponding to the first tracking point is located at the location of the function item, the click event recognition module 360 takes out the second tracking point from the target fingertip area of the previously received image, and then compares the first tracking point with The second tracking point to determine if a click event occurred. For example, the vertical axis displacement amount or the horizontal axis displacement amount between the first tracking point and the second tracking point is compared, and the depth change amount between the first tracking point and the second tracking point is compared according to the depth information.

倘若前後兩張影像各自的追蹤點間的垂直軸位移量小於第一預設值與水平軸位移量皆小於第二預設值、並且前後兩張影像各自的追蹤點間的深度變化量小於第三預設值，則點選事件辨識模組360判定發生點選事件。另一方面，倘若垂直軸位移量大於或等於第一預設值、水平軸位移量大於或等於第二預設值、及深度變化量大於或等於第三預設值其中至少一條件成立，則點選事件辨識模組360判定未發生點選事件。例如，藉由底下方程式(3)來判斷是否發生點選事件。 If the vertical axis shift between the tracking points of the two images is smaller than the first If the preset value and the horizontal axis displacement are both smaller than the second preset value, and the depth change between the tracking points of the two images is smaller than the third preset value, the click event recognition module 360 determines that the click is selected. event. On the other hand, if the vertical axis displacement is greater than or equal to the first preset value, the horizontal axis displacement is greater than or equal to the second preset value, and the depth variation is greater than or equal to the third preset value, at least one of the conditions is true, then The click event identification module 360 determines that no click event has occurred. For example, it is determined by the bottom program (3) whether or not a click event occurs.

其中，(X_old,Y_old)為前次接收影像的第二追蹤點的座標，(X_new,Y_new)為目前接收影像的第一追蹤點的座標，|X_old-X_new|為水平軸位移量，而|Y_old-Y_new|為垂直軸位移量。另外，d_old為前次接收影像的第二追蹤點的深度值，d_new為目前接收影像的第一追蹤點的深度值，而|d_old-d_new|為深度變化量。而上述第一預設值、第二預設值及第三預設值在此分別舉例為10個像素、10個像素與0.5公分。 Where (X_old, Y_old) is the coordinate of the second tracking point of the previous received image, (X_new, Y_new) is the coordinate of the first tracking point of the currently received image, and |X_old-X_new| is the horizontal axis displacement, and | Y_old-Y_new| is the vertical axis displacement. In addition, d_old is the depth value of the second tracking point of the previous received image, d_new is the depth value of the first tracking point of the currently received image, and |d_old-d_new| is the depth variation amount. The first preset value, the second preset value, and the third preset value are respectively exemplified by 10 pixels, 10 pixels, and 0.5 cm.

另外，判斷是否發生點選事件的步驟中還可進一步依據目前影像的兩個計算點來判斷是否發生點選事件。舉例來說，圖7是依照本發明一實施例的用以判斷點擊事件的示意圖。圖7所示為目前接收的影像中的手部區域，在獲得目標指尖區域710後，由指尖追蹤模組350來獲得第一追蹤點702。之後，在第一追蹤點702的上方及下方分別取出兩個計算點，即，第一計算點701與第二計算點703。在此，第一追蹤點702位於第一計算點701與第二計算點703之間。接著，點選事件辨識模組360依據深度資訊，計算第一計算點701與第二計算點703的深度差值。若深度差值大於或等於第四預設值，則判定發生點選事件。若深度差值小於第四預設值，則判定未發生點選事件。例如，藉由底下方程式(4)來判斷是否發生點選事件。 In addition, the step of determining whether a click event occurs may further determine whether a click event occurs according to two calculation points of the current image. For example, FIG. 7 is a schematic diagram for determining a click event according to an embodiment of the invention. FIG. 7 shows the hand region in the currently received image. After the target fingertip region 710 is obtained, the first tracking point 702 is obtained by the fingertip tracking module 350. After that, at the first tracking point 702 Two calculation points, namely, a first calculation point 701 and a second calculation point 703, are taken out respectively. Here, the first tracking point 702 is located between the first calculation point 701 and the second calculation point 703. Then, the point event identification module 360 calculates the depth difference between the first calculation point 701 and the second calculation point 703 according to the depth information. If the depth difference is greater than or equal to the fourth preset value, it is determined that a click event occurs. If the depth difference is less than the fourth preset value, it is determined that no click event has occurred. For example, it is determined by the bottom program (4) whether or not a click event occurs.

其中，d_down表示位於第一追蹤點702的下方的第二計算點703的深度值，而d_up表示位於第一追蹤點702的上方的第一計算點701的深度值。而上述第四預設值在此舉例為1.2公分。 Wherein, d_down represents a depth value of the second calculation point 703 located below the first tracking point 702, and d_up represents a depth value of the first calculation point 701 located above the first tracking point 702. The fourth preset value is exemplified here as 1.2 cm.

另外，值得一提的是，亦可結合上述方程式(3)、(4)來判斷是否發生點選事件。例如，在前後兩張影像各自的追蹤點之間的垂直軸位移量小於第一預設值、水平軸位移量小於第二預設值、且深度變化量小於第三預設值的情況下，若目前影像中的第一計算點701與第二計算點703之間的深度差值大於或等於第四預設值，才判定發生點選事件。也就是說，在垂直軸位移量(|Y_old-Y_new|)小於10個像素、水平軸位移量(|X_old-X_new|)小於10個像素、且深度變化量(|d_old-d_new|)小於0.5公分的情況下，若深度差值(d_down-d_up)大於或等於1.2公分，方可判定發生點選事件。 In addition, it is worth mentioning that the above equations (3) and (4) can also be combined to determine whether a click event occurs. For example, if the vertical axis displacement between the tracking points of the two images before and after is smaller than the first preset value, the horizontal axis displacement is less than the second preset value, and the depth change amount is less than the third preset value, If the depth difference between the first calculation point 701 and the second calculation point 703 in the current image is greater than or equal to the fourth preset value, it is determined that the click event occurs. That is, the vertical axis displacement (|Y_old-Y_new|) is less than 10 pixels, the horizontal axis displacement (|X_old-X_new|) is less than 10 pixels, and the depth variation (|d_old-d_new|) is less than 0.5. In the case of centimeters, if the depth difference (d_down-d_up) is greater than or equal to 1.2 cm, the click event can be determined.

點選事件辨識模組360在判定發生點選事件後，處理單元 120便會去執行對應的功能。 The click event identification module 360 determines the processing unit after determining the occurrence of the click event. 120 will go to perform the corresponding function.

上述實施方式不僅可用於一個使用者，當取像單元110的取像範圍內存在有多數人的情況下，在執行適當的處理之後，便可針對單一使用者來進行處理。具體而言，在影像中存在有多數人的情況下，為了區別出不重要(uninterested)的物件而獲得場景中可能的人體物件，背景去除模組310可進一步偵測影像中的多個待處理物件(例如多個人像區域)，並且依據影像的深度資訊，自這些待處理物件中濾除掉不重要物件。在此，可設定一深度門檻值來濾除不重要物件(例如距離過遠的使用者)。舉例來說，假設深度門檻值設為150公分，且假設影像中包括三個待測物件A、B、C，其對應的深度值分別為160公分、110公分以及140公分。據此，背景去除模組310便會將深度值大於預設深度值的待測物件A濾除，而保留深度值小於預設深度值的待測物件B、C。 The above embodiment can be applied not only to one user but also to a single user after performing appropriate processing when there is a majority in the image capturing range of the image capturing unit 110. Specifically, in the case that there are a majority of people in the image, in order to distinguish the uninterested objects to obtain possible human objects in the scene, the background removal module 310 can further detect multiple to-be-processed images. An object (such as a plurality of portrait areas), and according to the depth information of the image, the unimportant objects are filtered out from the objects to be processed. Here, a depth threshold can be set to filter out unimportant objects (eg, users who are too far away). For example, suppose the depth threshold is set to 150 cm, and it is assumed that the image includes three objects A, B, and C to be tested, and the corresponding depth values are 160 cm, 110 cm, and 140 cm, respectively. According to this, the background removal module 310 filters the object A to be tested whose depth value is greater than the preset depth value, and retains the objects B and C whose depth value is smaller than the preset depth value.

之後，在人臉姿態估測模組320執行人臉姿態估測程序時，人臉姿態估測模組320在剩餘的待處理物件B、C中執行臉部偵測程序，而獲得多個人臉區域，並且依據影像的深度資訊，自這些人臉區域中取出目標人臉區域。在此，目標人臉區域的深度值為這些人臉區域的深度值中的最小值。即，越靠近取像單元110的使用者，其所獲得之人臉區域的深度值會越小。接著，人臉姿態估測模組320保留深度值為最小值的目標人臉區域所在的待處理物件B，而濾除其他的待處理物件C。也就是說，保留距離取像單元110最近的使用者對應於影像中的待處理物件。在此，取像單元110設置於顯示單元130的附近。 Then, when the face pose estimation module 320 executes the face pose estimation program, the face pose estimation module 320 performs a face detection process on the remaining objects B and C to obtain multiple faces. The area, and the target face area is taken out from these face areas according to the depth information of the image. Here, the depth value of the target face region is the minimum value among the depth values of these face regions. That is, the closer to the user of the image capturing unit 110, the smaller the depth value of the face region obtained. Then, the face pose estimation module 320 retains the object B to be processed in which the target face region whose depth value is the minimum value, and filters out other objects to be processed C. That is to say, the user who is the closest to the image capturing unit 110 corresponds to the object to be processed in the image. Here, the image capturing unit 110 is disposed on the display Near the unit 130.

此外，值得一提的是，上述實施例更可與擴增實境互動介面相結合，藉由手指的偵測使得手指可在三維空間中透過擴增實境互動介面與電腦互動。 In addition, it is worth mentioning that the above embodiment can be combined with the augmented reality interactive interface, and the finger can be detected by the finger to interact with the computer through the augmented reality interactive interface in the three-dimensional space.

舉例來說，圖8A及圖8B是依照本發明的結合擴增實境互動介面的操作方法的示意圖。底下搭配圖1來進行說明。在圖8A中，於顯示單元130中顯示擴增實境互動介面800。在擴增實境互動介面800中會顯示取像單元110目前所接收的影像。例如，取像單元110面向使用者來進行拍攝且設置於顯示單元130所在位置的附近(例如設置於顯示單元130的上方)，當在取像單元110的取像範圍內存在有使用者時，則在圖8A的擴增實境互動介面800所呈現的即時影像中亦會同步顯示此一使用者。據此，使用者可以透過觀看擴增實境互動介面800中對應的人像，來對應地操作擴增實境互動介面800。在此，關於使用者的手指辨識與點選事件的判斷可參照上述圖3所示的影像處理模組300的相關說明，在此省略不提。 For example, Figures 8A and 8B are schematic illustrations of methods of operation in conjunction with an augmented reality interactive interface in accordance with the present invention. The description will be made below with reference to Fig. 1. In FIG. 8A, an augmented reality interactive interface 800 is displayed in display unit 130. The image currently received by the image capturing unit 110 is displayed in the augmented reality interactive interface 800. For example, the image capturing unit 110 is photographed toward the user and disposed in the vicinity of the position where the display unit 130 is located (for example, disposed above the display unit 130). When there is a user in the image capturing range of the image capturing unit 110, Then, the user is simultaneously displayed in the live image presented by the augmented reality interactive interface 800 of FIG. 8A. Accordingly, the user can operate the augmented reality interactive interface 800 correspondingly by viewing the corresponding portrait in the augmented reality interactive interface 800. Here, regarding the determination of the user's finger recognition and the click event, reference may be made to the related description of the image processing module 300 shown in FIG. 3, which is omitted here.

當在影像中獲得目標人臉區域810時，如圖8A所示，擴增實境互動介面800中會顯示第一虛擬層，其中第一虛擬層包括至少一個功能項目。在本實施例中，第一虛擬層包括兩個功能項目820、830。其中，功能項目830是用以開啟第二虛擬層，功能項目820則是用來退出第一虛擬層。 When the target face region 810 is obtained in the image, as shown in FIG. 8A, the first virtual layer is displayed in the augmented reality interaction interface 800, wherein the first virtual layer includes at least one function item. In this embodiment, the first virtual layer includes two functional items 820, 830. The function item 830 is used to open the second virtual layer, and the function item 820 is used to exit the first virtual layer.

當功能項目830被觸發時，如圖8B所示，在擴增實境互動介面800顯示第二虛擬層，其中第二虛擬層包括至少一個虛擬控制介面。在圖8B中，第二虛擬層包括兩個虛擬控制介面840、850以及功能項目860。上述虛擬控制介面840例如為一選單，虛擬控制介面850為一虛擬鍵盤，功能項目860用來退出第二虛擬層，或直接關閉擴增實境互動介面800。而上述虛擬控制介面840、850所顯示的位置僅為舉例說明，並不以此為限。 When the function item 830 is triggered, as shown in FIG. 8B, the second virtual layer is displayed in the augmented reality interaction interface 800, wherein the second virtual layer includes at least one virtual control interface. In FIG. 8B, the second virtual layer includes two virtual control interfaces 840, 850 and a function item 860. The virtual control interface 840 is, for example, a menu, the virtual control interface 850 is a virtual keyboard, and the function item 860 is used to exit the second virtual layer, or directly close the augmented reality interaction interface 800. The positions displayed by the virtual control interfaces 840 and 850 are merely illustrative and not limited thereto.

據此，當使用者透過觀看對應於擴增實境互動介面800的人像，而在三維空間中來操作電子裝置100時，透過影像處理模組300根據使用者手指的運動軌跡，判斷其行為是否滿足所定義的點擊事件的條件(如上述方程式(3)、(4))，藉此可判斷使用者是否有意圖操作虛擬控制介面840或虛擬控制介面850、或者點擊功能項目860。 According to this, when the user operates the electronic device 100 in a three-dimensional space by viewing the portrait corresponding to the augmented reality interaction interface 800, the image processing module 300 determines whether the behavior is based on the motion trajectory of the user's finger. The conditions of the defined click event (such as equations (3), (4) above) are satisfied, thereby determining whether the user intends to operate the virtual control interface 840 or the virtual control interface 850, or click on the function item 860.

例如，使用者將其手指在三維空間進行移動，並透過擴增實境互動介面800而得知在影像中之人像的手指位置已對應至功能選項860的位置，此時使用者停止移動其手指，並在三維空間中執行點選手勢。進而，透過影像處理模組300便能夠判定發生點選事件，據此，圖8B的擴增實境互動介面800會返回至如圖8A所示的第一虛擬層或直接關閉擴增實境互動介面800(視功能項目860所設定的執行功能為準)。使用者亦可以上述相似的方式來操作虛擬控制介面840或虛擬控制介面850。 For example, the user moves his finger in a three-dimensional space and learns through the augmented reality interactive interface 800 that the finger position of the portrait in the image has corresponded to the position of the function option 860, at which time the user stops moving his finger. And perform a click gesture in 3D space. Furthermore, the image processing module 300 can determine that a click event occurs, and accordingly, the augmented reality interaction interface 800 of FIG. 8B returns to the first virtual layer as shown in FIG. 8A or directly closes the augmented reality interaction. Interface 800 (subject to the execution function set by function item 860). The user can also operate the virtual control interface 840 or the virtual control interface 850 in a similar manner as described above.

綜上所述，上述實施例提出一種互動式方法，藉由分析取像單元所獲得的影像來得知使用者的手指在三維空間中所傳遞的指令，而讓使用者與電子裝置產生互動。據此，使用者無需配戴或穿著任何輔助裝備，例如手上不標記任何顏色記號(color marker)亦不穿戴任何感應手套(data glove)，即可利用手指來與電子裝置進行互動。此外，透過上述實施方式，亦無需事先設定體感設備的位置和限制使用者所處的環境，而可在自然環境下即時運作。並且，還可進一步結合擴增實境互動介面，藉此更方便讓使用者與電子裝置進行互動操作。 In summary, the above embodiment provides an interactive method for understanding the instructions transmitted by the user's finger in the three-dimensional space by analyzing the image obtained by the image capturing unit, and allowing the user to interact with the electronic device. According to this, the user does not need to wear or wear Any auxiliary equipment, such as a hand without marking any color markers or wearing any data glove, can use your fingers to interact with the electronic device. In addition, through the above embodiments, it is also possible to operate in a natural environment without setting the position of the somatosensory device and limiting the environment in which the user is located. Moreover, the augmented reality interactive interface can be further combined, thereby facilitating the user to interact with the electronic device.

雖然本發明已以實施例揭露如上，然其並非用以限定本發明，任何所屬技術領域中具有通常知識者，在不脫離本發明的精神和範圍內，當可作些許的更動與潤飾，故本發明的保護範圍當視後附的申請專利範圍所界定者為準。 Although the present invention has been disclosed in the above embodiments, it is not intended to limit the present invention, and any one of ordinary skill in the art can make some changes and refinements without departing from the spirit and scope of the present invention. The scope of the invention is defined by the scope of the appended claims.

Claims

An interactive operation method of an electronic device includes: capturing an image sequence through an image capturing unit; performing an image pre-processing on an image in the image sequence; obtaining a fingertip candidate region from the image; determining the fingertip Whether the candidate region is connected to the one-hand region; if the fingertip candidate region is connected to the hand region, the fingertip candidate region is used as a target fingertip region; and by continuously tracking the target fingertip region, determining whether a point selection occurs An event; and when the click event occurs, performing a function corresponding to the electronic device.

The method of claim 1, wherein the determining whether the fingertip candidate region is connected to the hand region comprises: obtaining a center point of the fingertip candidate region as a reference point; from the reference point The first direction, the second side point, the third side point, and the fourth side point are respectively obtained in the four directions, wherein the first side point, the second side point, the third side point and the The fourth side points are respectively located outside the fingertip candidate area; the first side point, the second side point, the third side point, and the first side point are obtained from a depth information of the image. a depth value, a second depth value, a third depth value, and a fourth depth value; and determining whether the first depth value, the second depth value, the third depth value, and the fourth depth value are greater than 0; In a case where only one of the first depth value, the second depth value, the third depth value, and the fourth depth value is greater than 0, determining that the fingertip candidate region is connected to the hand region; and In this case, it is determined that the fingertip candidate area is not connected to the hand area.

The method of claim 1, further comprising: obtaining a first tracking point from the target fingertip region of the image currently received; determining that the first tracking point corresponds to a display position in a display unit Whether it is located at a location of a function item; and if the display position corresponding to the first tracking point is located at the location of the function item, comprising: taking a second tracking point from the target fingertip area of the image received last time; And comparing the first tracking point with the second tracking point to determine whether the clicking event occurs.

The method of claim 3, wherein the step of comparing the first tracking point with the second tracking point to determine whether the click event occurs comprises: comparing the first tracking point with the second a vertical axis displacement between the tracking points; a horizontal axis displacement between the first tracking point and the second tracking point; comparing the first tracking point with the second tracking according to a depth information a depth variation between points; If the vertical axis displacement is less than a first preset value, the horizontal axis displacement is less than a second preset value, and the depth change is less than a third preset value, determining that the click event occurs; and if The vertical axis displacement is greater than or equal to the first preset value, the horizontal axis displacement is greater than or equal to the second preset value, and the depth variation is greater than or equal to the third preset value, wherein at least one condition is met. Then it is determined that the click event does not occur.

The method of claim 4, wherein the step of determining whether the click event occurs, further comprising: extracting, based on the first tracking point, a target in the target fingertip region of the currently received image a calculation point and a second calculation point, wherein the first tracking point is located between the first calculation point and the second calculation point; and calculating, according to the depth information, one of the first calculation point and the second calculation point a depth difference; if the vertical axis displacement is less than the first preset value, the horizontal axis displacement is less than the second preset value, and the depth change is less than the third preset value, if the depth is If the difference is greater than or equal to a fourth preset value, it is determined that the click event occurs; if the depth difference is less than the fourth preset value, it is determined that the click event does not occur.

The method of claim 1, further comprising displaying an augmented reality interactive interface in a display unit; displaying the received image in the augmented reality interactive interface; obtaining a image in the image a first virtual layer is displayed in the augmented reality interaction interface, wherein the first virtual layer includes a function item; When the functional item is triggered, a second virtual layer is displayed in the augmented reality interaction interface, wherein the second virtual layer includes a virtual control interface.

The method of claim 1, wherein the performing the image pre-processing comprises: performing a background removal process, comprising: detecting a plurality of objects to be processed in the image; and determining a depth information according to the image Extracting one or more unimportant objects from the objects to be processed, wherein a depth value of the or the unimportant objects is greater than a predetermined depth value.

The method of claim 7, wherein after the step of performing the background removal procedure, the method further comprises: performing a face pose estimation procedure, comprising: performing a face detection on the remaining objects to be processed Measure the program to obtain a plurality of face regions; and according to the depth information of the image, extract a target face region from the face regions, wherein the depth value of the target face region is the depth of the face regions The minimum value of the value; and retaining one of the objects to be processed in the target face region where the depth value is a small value, and filtering out other objects to be processed.

The method of claim 8, wherein after the step of performing the face pose estimation procedure, the method further comprises: performing a hand detection procedure, including: The hand region is obtained using a skin tone detection algorithm.

The method of claim 9, wherein after performing the step of the hand detection procedure, the method further comprises: performing a fingertip detection procedure to obtain the fingertip candidate area.