JP2013500544A

JP2013500544A - Improved audio / video method and system

Info

Publication number: JP2013500544A
Application number: JP2012521853A
Authority: JP
Inventors: John D. Lord
Original assignee: Digimarc Corporation
Priority date: 2009-07-24
Filing date: 2010-07-23
Publication date: 2013-01-07
Also published as: EP2457181A4; EP2457181A1; CN102473172A; WO2011011737A1; US20110069229A1; US8773589B2; US20150003802A1; US9940969B2; KR20120053006A

Abstract

Audio and/or video data is structurally and persistently associated with auxiliary sensor data (e.g., relating to acceleration, orientation, or tilt) through use of a single data object, such as a modified MPEG file or data stream. In this form, different rendering devices can use the co-conveyed sensor data to alter the audio or video content. For example, associating accelerometer data with video data lets some users view a shake-stabilized version of the video while others view the video with such motion artifacts left intact. Similarly, co-conveying camera parameters such as focal plane distance with the audio/video content allows the volume to be reduced when the camera captures audio/video from a distant subject.

[Selected drawing] FIG. 2

Description

Related application data

In the United States, this application is a non-provisional of, and claims the benefit of priority to, US Provisional Patent Application No. 61/228,336, filed July 24, 2009.

Introduction

For mobile phones and other devices equipped with video/image sensors (e.g., cameras), it would be desirable to have an encoding format that carries additional data streams beyond the image/audio of existing single-object arrangements (e.g., MPEG). The extra data streams that could be conveyed in such an arrangement include 2D/3D accelerometer/tilt, 2D/3D compass (magnetometer), lens zoom, aperture, focal length, and depth of field.

Conveying these data together with the image information lets the imagery be processed conveniently in accordance with the auxiliary information. Moreover, if the auxiliary information is persistently associated with the image information, a different device can use it to process the video (and/or audio) in a different way, at a different time.

Such auxiliary information can be used by the device that captured the imagery and/or by other devices, systems, or applications (e.g., a video player application or a social media website such as YouTube). The processing can occur at the time of capture (e.g., in manual or automated live production), when an entertainment work is produced from the video content, during post-production (e.g., conversion into a different format or for consumption in a different environment), or when the content is ultimately presented to a viewer.

To illustrate, one embodiment of the present technology is a method in which both audio data and camera data (e.g., focal plane distance) are recovered from a single data object (e.g., an MPEG file or stream). The audio data is processed in accordance with the focal plane data to yield altered audio data for presentation to a user. This processing can include controlling the volume so that sound captured when the focal plane corresponds to a distant subject is attenuated relative to sound captured when the focal plane corresponds to a nearby subject (i.e., the nearer subject sounds louder). The user can enable or disable this effect according to preference.

Another embodiment is a method in which both video data and sensor data (e.g., accelerometer data) are recovered from a single data object. The video data is processed in accordance with the accelerometer data to compensate, or not compensate, for camera motion (e.g., shake), again based on preference data specified by the user.

More generally, certain aspects of the present technology involve a single data object that includes both (1) audio and/or video information and (2) camera and/or sensor data. Particular embodiments concern methods and apparatus that create such a single object from separate data sources and/or recover the individual data from such a single object. Other methods and arrangements use the sensor and/or camera data to alter the audio and/or video information.

The foregoing and other features and advantages of the present technology will be more readily apparent from the following detailed description, which proceeds with reference to the accompanying drawings.

FIG. 1 shows a prior art arrangement in which audio and video data are conveyed between systems using MPEG encoding.

FIG. 2 shows an arrangement according to one aspect of the present technology.

FIG. 3 shows how different receiving units can use a common data stream originating from a mobile phone.

FIG. 4 is a block diagram showing aspects of a mobile phone.

Detailed description

FIG. 1 shows a prior art arrangement. An originating device, such as a video camera, includes a camera system (including a 2D image sensor) for capturing imagery and a microphone for capturing audio. The captured data is described (i.e., encoded) in accordance with a standard such as MPEG. The resulting video and audio data streams are combined in a well-known manner (e.g., as an MPEG data stream) that is compatible with all manner of other processing and recording devices. A receiving unit decodes the conveyed video and supplies it to a display screen. Likewise, the receiving unit decodes the conveyed audio and supplies it to a speaker.

Newer devices include many sensors beyond the prior art image sensor and microphone. For example, even inexpensive smartphones include components such as accelerometers (to sense gestures, tilt, etc.) and magnetometers (e.g., to sense compass direction). In addition, optical capture parameters, such as lens zoom and aperture size, can be captured as data and used later in processing the image (and/or audio) data.

FIG. 2 shows an illustrative implementation using aspects of the present technology. The originating unit (e.g., a mobile phone) provides data including 3D acceleration information (typically from three MEMS accelerometers in an orthogonal arrangement), 3D position coordinates (e.g., from GPS or otherwise), time stamp data, 3D orientation information (typically derived from a magnetometer or Hall-effect compass), and tilt sensor data (which may comprise integrated accelerometer data or a gyroscope device). Camera data can include information about focus, zoom, aperture size, depth of field, exposure time, ISO setting, lens focal length, depth of focus, etc. Such parameters permit recomputation of how the image relates to the spatial domain. Other system-specific timing information can be included, such as the relative lag between each transducer and the camera/video frames. The auxiliary data can include, for example, the frame identification number of the associated video frame, or another means of synchronization tied to MPEG I-frames.

The just-detailed information is co-conveyed with the audio and/or video information captured by the originating unit, e.g., in a single data stream such as MPEG, or stored in a single file such as an MPEG data file. (Such unitary arrangements are collectively referred to as single data objects, and have the benefit of persistently associating these disparate types of data in a structured fashion.)

FIG. 2 also shows a receiving unit. The receiving unit takes some or all of the just-detailed data and uses it in producing the output presented to the user.

Consider a few examples of how such output can relate to the co-conveyed auxiliary data. One is shake compensation (motion stabilization). Many cameras sense shaking of the camera (e.g., with one or more accelerometers) and process the image data to remove this effect before it is output from the camera. An arrangement employing the present technology senses the shaking as before. However, rather than removing the effect in the camera, the associated shake data is conveyed with the captured image data. Devices and applications that receive this data can elect to apply an anti-shake algorithm (of the sort used in prior art cameras), or they can present the imagery in its raw, unprocessed form, shake included. Whether to apply the auxiliary data to remove shake artifacts can be decided case by case, either automatically (e.g., a particular video player may always present video in shake-stabilized fashion) or by user choice (e.g., as indicated by operation of a user interface control, or by reference to stored preference data).
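The receiver-side choice described above can be sketched as follows. This is only an illustration, not the method of the patent: the function names `stabilize` and `present` are hypothetical, frames are modeled as plain lists of pixel rows, and the per-frame shake is assumed to have already been converted from accelerometer data into a whole-pixel displacement of the image content.

```python
def stabilize(frame, dx, dy, fill=0):
    """Undo a content displacement of (dx, dy) pixels in a 2-D frame.

    frame is a list of equal-length rows. Pixels that would have to come
    from outside the frame are set to `fill`.
    """
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x + dx, y + dy  # where this pixel ended up after the shake
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = frame[sy][sx]
    return out


def present(frame, shake_px, stabilize_enabled):
    """Return the frame to display, honoring the viewer's preference."""
    if not stabilize_enabled:
        return frame  # raw form, shake artifacts left intact
    dx, dy = shake_px
    return stabilize(frame, dx, dy)
```

Because the shake data travels with the video rather than being consumed in the camera, the same frame can be passed through `present` with either setting on different receiving units.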

Another example is audio processing. Most audio capture devices include some form of automatic gain control (AGC) that seeks to keep the audio level relatively constant. Audio sensed by the microphone is amplified if it is weak and attenuated if it is strong. This is generally desirable, e.g., for listener comfort. In accordance with an aspect of the present technology, AGC is applied at the time of capture as before. At the time of presentation, however, an application can control the volume in accordance with the distance of the subject from the camera. That is, the audio output can be controlled in accordance with auxiliary data indicating the position of the camera's focal plane. If the camera is focused on a subject close to the camera (e.g., an interview with a person a few feet away), the rendering system can output the audio at a first level. In contrast, if the camera is focused on a distant subject, the rendering device can reduce the audio to a lower, second level, giving the audience an acoustic effect that reinforces the distant visual effect. This is termed "dimensional" audio herein. (Again, whether such processing is employed can be selectively controlled, automatically or manually.)
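A minimal sketch of "dimensional" audio follows. The text only requires that audio for a distant focal plane be rendered at a lower level than audio for a near one; the inverse-distance curve, the 1 m reference distance, the -20 dB floor, and the function names are all assumptions made for illustration.

```python
import math


def dimensional_gain(focus_distance_m, near_m=1.0, max_attenuation_db=-20.0):
    """Linear gain for audio captured while focused at focus_distance_m.

    Assumed curve: unity gain up to near_m, then about -6 dB per doubling
    of distance, never dropping below max_attenuation_db.
    """
    if focus_distance_m <= near_m:
        return 1.0
    gain_db = -20.0 * math.log10(focus_distance_m / near_m)
    gain_db = max(gain_db, max_attenuation_db)
    return 10.0 ** (gain_db / 20.0)


def apply_dimensional_audio(samples, focus_distance_m, enabled=True):
    """Scale audio samples by the distance-dependent gain, if enabled."""
    if not enabled:
        return list(samples)  # conventional (AGC-only) audio
    gain = dimensional_gain(focus_distance_m)
    return [s * gain for s in samples]
```

The `enabled` flag stands in for the automatic or manual control mentioned in the text: a player that does not want the effect simply passes the AGC-leveled audio through unchanged.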

Similarly, sound effects can be added, or tailored, based on motion of the camera as indicated by co-conveyed accelerometer or magnetometer data. If camera shake or jitter is sensed, a low-frequency continuous tone can be overlaid on the existing audio data. If the data indicates the camera is panning up or down, a whistle of rising or falling frequency (respectively) can be added, and so on.

FIG. 3 illustrates the versatility of this arrangement. A single data feed is output from the originating unit. This feed is distributed to a variety of receiving units, either "live" or via intermediate storage and/or production. Different receiving units render the data differently, e.g., some with a motion-stabilized display and some without, some with dimensional audio and some with conventional (AGC) audio. (A receiving unit can also be a mobile phone, and one of the receiving units can be the originating unit. A block diagram of an exemplary mobile phone is shown in FIG. 4.)

Although the receiving unit in FIG. 2 is shown as including only a display and a speaker as output devices, more or fewer output devices can of course be used. In one arrangement, one or more haptic output devices are included, e.g., so that motion information sensed at the originating unit can be rendered as motion (vibration) at the receiving unit. Haptic technology and output devices are known from patents of Immersion Corp., such as patents 7,425,675 and 7,561,142 and published application 2009/0096632.

Encoding of the disparate data streams can use the basic approach of known protocols such as MPEG-2/H.262 and H.264/MPEG-4. These protocols can be amended to add other extensions, in a manner similar to (audio) MPEG multi-channel (ISO 14496-3). Another approach is to use one or more of the channels available among the six LPCM audio channels in MPEG-2 (these are losslessly encoded and can be run at lower sample rates and bit rates).

To illustrate, data for each of the spatial transducers, transducer axes, camera settings, optical parameters, etc., is encoded into a digital stream and inserted into one of the audio channels of an MPEG-2 stream at the time of its encoding. (Current encoding allows selection of the number of channels and their bit rates, so one or more channels can be made available.) An audio encoding method is chosen that implements a sufficient number of additional channels beyond the number required for the original audio from the camera and microphone(s). The transducer and other data form the data stream for these additional audio channel(s). This keeps the transducer data synchronized with the audio/video stream, which is desirable for subsequent processing of the audio/video data. (In another arrangement, synchronization is not maintained in the protocol but is established later by reference to synchronization signals digitally watermarked into one or more of the auxiliary data streams, together with the audio and/or video data. See, e.g., patents 6,836,295 and 6,785,401.)

Because spatial and accelerometer data do not need a very high bit rate (as discussed further below), these data streams can be combined serially into a smaller number of audio channels (even a single audio channel). As a rough example, allowing 32 bits per sample for each of 3 axes on 4 transducers (12 values in all), plus 32 camera settings of 32 bits each, yields a total of 176 bytes per image frame. If the image frame rate is 60 frames per second, the auxiliary data rate is 10,560 bytes per second, which is well within even the lowest-rate audio channel (44.1 kHz at 8 bits). Since some auxiliary data need not be sent with every frame, channel usage can be reduced further.
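The arithmetic in the rough example above can be checked with a few lines of Python. The constant names are illustrative; the figures themselves come from the text.

```python
BITS_PER_VALUE = 32      # bits per transducer sample or camera setting
TRANSDUCERS = 4
AXES_PER_TRANSDUCER = 3
CAMERA_SETTINGS = 32
FRAMES_PER_SECOND = 60

SAMPLE_RATE_HZ = 44_100  # lowest-rate audio channel considered in the text
BITS_PER_AUDIO_SAMPLE = 8

sensor_bytes = TRANSDUCERS * AXES_PER_TRANSDUCER * BITS_PER_VALUE // 8  # 12 values
camera_bytes = CAMERA_SETTINGS * BITS_PER_VALUE // 8
bytes_per_frame = sensor_bytes + camera_bytes

aux_bytes_per_second = bytes_per_frame * FRAMES_PER_SECOND
channel_bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_AUDIO_SAMPLE // 8
utilization = aux_bytes_per_second / channel_bytes_per_second

print(bytes_per_frame, aux_bytes_per_second, channel_bytes_per_second)
print(f"channel utilization: {utilization:.1%}")
```

The 12 transducer values contribute 48 bytes and the camera settings 128 bytes per frame, so the auxiliary stream occupies a little under a quarter of one 8-bit, 44.1 kHz channel even when every value is sent with every frame.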

Because of its lower data rate, the auxiliary data can be encoded at a lower bit rate than conventional audio. The spatial coordinates of the camera may have the lowest rate. Auxiliary data describing the optical capture system (lens, optics, exposure, etc.) can be sent routinely, but need only be sent when it changes (typically in association with the frame in which the change first appears). In one particular implementation, such data is sent aligned with the I-frames of the MPEG video, for image post-processing in edited streams. Zoom, aperture, orientation, depth of field, focal length, bearing, 3D GPS, time, and the like are needed only once per frame. Acceleration information is typically collected at a faster rate to preserve the camera's motion in space (or is integrated to obtain velocity and the associated spatial position). It can be collected more often than once per frame yet transmitted only at frame intervals, or it can be transmitted more or less frequently. (That is, accelerometer and position data need not be limited to the video frame rate.)

Given the relatively low data rates, data compression is not necessary (though it can of course be used). For slowly changing data, some implementations ordinarily send differential updates and, less frequently, send full coordinates for resynchronization (analogous to I-frames versus B- and P-frames in MPEG). The different data types (e.g., differential or full) can be indicated by tags (e.g., in XML style) in the associated data packets or fields.
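The differential-update scheme can be sketched as below. The record layout (a `"full"` or `"diff"` tag paired with a coordinate tuple) and the `keyframe_interval` parameter are assumptions for illustration; the text specifies only that differential updates are sent ordinarily, with full coordinates sent less often for resynchronization.

```python
def encode_updates(positions, keyframe_interval=30):
    """Turn a list of coordinate tuples into tagged full/differential records."""
    records = []
    prev = None
    for i, pos in enumerate(positions):
        if prev is None or i % keyframe_interval == 0:
            # Full coordinates: a resynchronization point, like an I-frame.
            records.append(("full", tuple(pos)))
        else:
            # Differential update relative to the previous sample.
            records.append(("diff", tuple(c - p for c, p in zip(pos, prev))))
        prev = pos
    return records


def decode_updates(records):
    """Rebuild absolute coordinates from tagged records."""
    positions = []
    current = None
    for kind, values in records:
        if kind == "full":
            current = tuple(values)
        elif current is None:
            raise ValueError("differential record before any full record")
        else:
            current = tuple(c + d for c, d in zip(current, values))
        positions.append(current)
    return positions
```

A receiver that joins mid-stream, or that loses a record, can recover at the next `"full"` record, which is the role the text assigns to the periodic full coordinates.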

Acceleration data can be integrated locally at the originating unit (first- and second-order integrals). These integrals are included in the auxiliary data stream because they offer higher precision than is needed to locate each frame. Similarly, spatial positioning can be computed more accurately in the camera by tracking and combining acceleration, orientation, compass, and position. Integrating the acceleration data effectively reduces its bandwidth.
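As an illustration of the first- and second-order integrals, the sketch below integrates one axis of uniformly sampled acceleration with the trapezoidal rule. The function name, the choice of integration rule, and the zero initial conditions are assumptions; a real device would also need to handle sensor bias and drift, which are ignored here.

```python
def integrate_acceleration(accels, dt, v0=0.0, x0=0.0):
    """Return (velocities, positions) for one axis of acceleration samples.

    accels holds samples taken every dt seconds. Velocity is the first
    integral and position the second, both by the trapezoidal rule.
    """
    v, x = v0, x0
    velocities, positions = [v], [x]
    for a_prev, a_next in zip(accels, accels[1:]):
        v_new = v + 0.5 * (a_prev + a_next) * dt
        x = x + 0.5 * (v + v_new) * dt
        v = v_new
        velocities.append(v)
        positions.append(x)
    return velocities, positions
```

Sending only the integrated velocity or position at frame intervals, rather than every raw accelerometer sample, is one way to read the statement that integration reduces the bandwidth of the data.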

In some embodiments, the auxiliary data is grouped into a stream together with synchronization and identification fields/tags, so that the video can be analyzed and modified. The precision of the various data can also be specified (tagged) in the stream.

In another arrangement, each sample group of spatial information is delimited, or formed into a packet, with a time stamp corresponding to that data.

In one particular arrangement, the auxiliary data is encoded and multiplexed between, and/or mixed into, existing audio channels, e.g., in a 5.1 audio implementation. For monaural audio, such as from mobile phones and many other cameras, the extra channel capacity is essentially free to use. In another implementation, the auxiliary data is encoded onto one or more carriers near the bottom or top of the human audible range, e.g., below 300 Hz or above 15 kHz. (MP3 or other encoding can be adapted to preserve such data bands.) At the time of playback, these data bands can optionally be filtered out of, or removed from, the audio rendered for human consumption.

In still another arrangement, the auxiliary data is conveyed steganographically, as subtle changes to the audio (or video) across part or all of the audio (or video) range. Such digital watermarking technologies are detailed, e.g., in patents 6,061,793 and 6,590,996.

As will be evident to those skilled in the art, a decoder arrangement complementary to the encoding arrangement is provided, e.g., at the receiving unit(s), to extract the auxiliary data.

Still another arrangement employs a new standard encoding format in which the transducer data and camera data are encoded within a dedicated channel, synchronized with the video frame encoding (although, again, it need not be limited to the frame rate).

The data field tags and/or packet header fields in the stream can be made extensible, so that additional data types can be included in the future. In one particular arrangement, the packet header is kept short, serving simply to identify a few standard groups/packets of transducer data, whether the group is a relative or absolute coordinate group, and/or whether it contains camera information. A number of additional header bit combinations are optionally reserved for packets that use extensible content. Each data element or group within an (extensible) packet can then be delimited sequentially or hierarchically (as in XML). Extensibility can be useful for all data groups, but desirably does not form a dominant part of the bandwidth requirement.
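One way to picture a short, extensible packet header is sketched below using Python's standard `struct` module. The layout is hypothetical and not taken from the patent or from any standard: an 8-byte header holding a flags byte, a reserved byte, a payload length, and a millisecond time stamp, followed by 32-bit float values. The flag names are likewise invented for illustration.

```python
import struct

# Hypothetical flag bits (assumed, not defined by the source text).
FLAG_ABSOLUTE = 0x01   # coordinates are absolute; otherwise relative
FLAG_CAMERA = 0x02     # packet carries camera settings
FLAG_EXTENDED = 0x80   # payload uses extensible, tagged content

# Little-endian: flags (B), reserved (B), payload length (H), timestamp ms (I).
HEADER = struct.Struct("<BBHI")


def pack_packet(flags, timestamp_ms, values):
    """Serialize one time-stamped group of 32-bit float values."""
    payload = struct.pack("<%df" % len(values), *values)
    return HEADER.pack(flags, 0, len(payload), timestamp_ms) + payload


def unpack_packet(data):
    """Parse a packet produced by pack_packet."""
    flags, _reserved, length, timestamp_ms = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    values = struct.unpack("<%df" % (length // 4), payload)
    return flags, timestamp_ms, values
```

Keeping the fixed header to a few bytes and reserving bit combinations such as `FLAG_EXTENDED` for tagged content mirrors the goal stated above: extensibility is available, but it does not dominate the bandwidth of ordinary packets.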

The encoded stream (or corresponding file structure) is tagged near its start, so that players aware of the protocol can enable the correct parsing options or choose to ignore the stream, and do not attempt to play the auxiliary data as sound (if it is encoded in an audio channel).

For playback on legacy devices (e.g., legacy video/audio players), the auxiliary data can be stripped out in advance or, if the player is aware of the encoding protocol, the player can choose to ignore the data.

A few examples illustrating use of the co-conveyed auxiliary data were given above. The number of such applications is virtually unlimited. A few others are briefly noted below:
• Combine with spatial information from mapping (e.g., akin to the augmented reality system Layar.eu, but working with recorded video rather than live video).
• Build 3D imagery (using camera motion, and even shake, in combination with optical path settings to provide stereoscopic view information, and then interpolating the images).
• Track camera spatial position for mapping.
• User tagging of image content for location search or social applications.
• Enable post-production or live playback tagging of subjects in the field of view (e.g., a previously recorded vacation video clip could show, in real time, where friends are while they visit the same location, by means of a real-time coordinate feed from their mobile phones).
• Enable video search for possible views of an event (e.g., in forensic crime applications: a fast search for whether anyone happened to capture a view of event X in the background, rather than manually reviewing each frame to identify what it shows).

The combined stream of audio/video data and auxiliary data can be used in live broadcast or recorded. When live, it allows the originating unit to capture an event (e.g., a sporting event), with the processing of the auxiliary data (e.g., processing of the audio/video in conjunction with the auxiliary data) performed either by the viewer (on a home PC or television) or by an intermediate service (a broadcast studio or cloud processing).

Multiple camera views can be combined to build a more complex spatial view. A viewer can then observe the event with post-processing effects that allow viewing from different angles. With attached spatial tag data, or through use of object recognition, an observer/viewer can track an individual performer, athlete, car, or horse, with multiple image sources, each carrying spatial data in its stream, being selected, interpolated, or blended to follow the marked object or person. Additional markers and tags can be inserted into the scene at particular spatial locations.

Location (GPS)/acceleration monitoring tags can be made for a variety of sporting events, including those involving cars, horses, and football (e.g., worn on the players' uniforms). Auxiliary data transponders (reporting position, acceleration, etc.) can also be built into game balls, pucks, or other moving or stationary fixtures, with such auxiliary data co-conveyed with the audio/video of the event (e.g., taken from the sidelines). This gives the objects being viewed known positions relative to the camera.

Other supplementary explanation

The reader is presumed to be familiar with the various protocols and data transmission standards/specifications relevant to the foregoing discussion. The various standards documents detailing such specifications (e.g., ISO/IEC 14496-3, ISO/IEC 13818-1, ISO/IEC 14496, ITU H.262, ITU H.222, RFC 3640, etc.) are incorporated herein by reference.

Having described and illustrated the principles of this technology with reference to illustrative examples, it will be recognized that the technology is not so limited.

For example, while particular reference was made to exemplary protocols and standards (e.g., MPEG-2), others can of course be adapted for the purposes detailed. These include MPEG-4, MPEG-7, MPEG-21, etc.

Reference was made to an embodiment in which audio is AGC-processed at the time of capture, and the audio level is thereafter controllably varied in accordance with auxiliary data. In another arrangement, the audio is not AGC-processed at the time of capture. Instead, it is encoded in its originally captured form. The audio level, or other effects, can still be controlled in accordance with the auxiliary data.
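As a minimal sketch of controlling audio level from auxiliary data, the following assumes the auxiliary data supplies a subject distance in meters for each block of audio (for example, derived from focal-plane distance). The function names, the reference distance, the gain floor, and the inverse-distance gain law are all illustrative assumptions, not details taken from this disclosure.

```python
def distance_gain(subject_distance_m, reference_m=1.0, min_gain=0.05):
    """Return a linear gain in (0, 1] that falls off with subject distance.

    Assumed law: full volume at or inside reference_m, then reference/distance,
    never dropping below min_gain.
    """
    if subject_distance_m <= reference_m:
        return 1.0
    return max(min_gain, reference_m / subject_distance_m)


def apply_gain(samples, subject_distance_m):
    """Scale one block of audio samples by the distance-derived gain."""
    gain = distance_gain(subject_distance_m)
    return [s * gain for s in samples]
```

Because the gain is applied at the receiving side, a presentation device could equally ignore the auxiliary data and play the audio as captured.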

While the detailed embodiments use the auxiliary data in processing the co-conveyed audio and/or video data, this is not essential. In other implementations, the auxiliary data can be used by the receiving unit for a purpose independent of the audio/video (e.g., identifying the capture location on a displayed map).

It will be understood that, instead of encoding the sensor data (or camera data) itself, other information based on such data can be encoded. For example, data representing instantaneous location information can be processed on the cell phone to yield motion vector data. This motion vector data (and other such post-processed data) can be encoded in a single data object together with the associated audio and/or video. While motion is related to position by differentiation, myriad other types of processing can also be applied, e.g., integration, filtering, etc. Similarly, different types of auxiliary data can be combined or otherwise jointly processed (e.g., a derivative of the position data can produce one estimate of motion, an integral of accelerometer data can produce a second estimate of motion, and the two estimates can then be averaged). In some arrangements, the original sensor data is stored, and a tag to this original data is included in the encoded data stream for subsequent re-analysis, if and when needed.
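The parenthetical example (differentiate position, integrate acceleration, then average) can be sketched as follows. This is a one-dimensional illustration under stated assumptions: samples are evenly spaced by `dt` seconds, the derivative is a simple finite difference, the integral is a rectangular running sum, and all function names are hypothetical.

```python
def velocity_from_positions(positions, dt):
    """First motion estimate: finite-difference derivative of position samples."""
    return [(b - a) / dt for a, b in zip(positions, positions[1:])]


def velocity_from_acceleration(accels, dt, v0=0.0):
    """Second motion estimate: running (rectangular) integral of acceleration."""
    v, out = v0, []
    for a in accels:
        v += a * dt
        out.append(v)
    return out


def fused_velocity(positions, accels, dt, v0=0.0):
    """Average the two estimates sample by sample (zip stops at the shorter list)."""
    v_pos = velocity_from_positions(positions, dt)
    v_acc = velocity_from_acceleration(accels, dt, v0)
    return [(p + q) / 2.0 for p, q in zip(v_pos, v_acc)]
```

A plain average weights both sensors equally; a real system might weight them by their expected noise, which is outside the scope of this sketch.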

Likewise, the detailed types of auxiliary data are illustrative, not limiting. For example, in another arrangement, an athlete's smartphone may be equipped with an oxygen sensor or a heart rate monitor, and this information can be stored in a data structure that also includes related audio or video information. (In some embodiments, such sensors can be coupled to the phone by Bluetooth or other short-range connection technology.) By such an arrangement, a bicycle race video may be rendered with graphical annotations showing each competitor's varying heart rate and blood oxygen level.

While the detailed arrangements particularly considered that the auxiliary (e.g., sensor) data can usefully contribute to processing the image (video) data, other processing arrangements are also useful. For example, the image data can be used to help process the sensor data, or to provide synchronization between imagery and sets of sensor data. A video stream may include data synchronization tags inserted in a data channel, to which much larger or alternative data can be synchronized. (These tags may need to differ from video frame/time stamps, since it may not be possible to compute the event identifier of a sensor measurement from a frame number or time. Video post-processing can also lose the original frame information.) Likewise, if the captured auxiliary data is too large to be conveniently included within the single data object (e.g., in the audio or metadata channel), an identification number associated with that data can be inserted into the single data object instead.

Consider, for example, capturing a still photo and transducer data at some time during capture of motion video. A tag for the image can be inserted into the video stream auxiliary channel to link to the photo. (The photo can carry a tag linking back to the originating video frame/time.) The photo capture time may be recorded at a finer resolution than the frame rate.
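One way to picture a capture time recorded more finely than the frame rate is to store both the frame index and the offset within that frame. The sketch below assumes a constant frame rate and timestamps in seconds on a shared clock; the function name and return shape are hypothetical.

```python
def locate_in_video(capture_time_s, video_start_s, fps):
    """Map a photo capture time to (frame index, seconds past that frame's start)."""
    elapsed = capture_time_s - video_start_s
    if elapsed < 0:
        raise ValueError("photo was captured before the video started")
    frame_index = int(elapsed * fps)           # frame being captured at that instant
    offset_s = elapsed - frame_index / fps     # sub-frame remainder
    return frame_index, offset_s
```

For instance, with `fps=4` and a photo taken 0.625 s after the video starts, the photo falls in frame 2, an eighth of a second after that frame begins.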

Consider, too, collecting a high-speed sample sequence of positional data (e.g., location, orientation, tilt) with a smartphone, at a sample rate greater than the video frame rate, while the phone is capturing video. The resulting auxiliary data may be too voluminous to include in the data object with the video. (Or the positional data may be captured by a different process.) An identifier for this positional data is assigned to a tag, and the tag is inserted into the video stream (perhaps at the start and stop points of the data capture). The data can be processed to various ends, e.g., synthesizing a 3D model of the imaged subject, or quantifying user motions associated with holding/using the phone.
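A minimal sketch of this tagging idea follows. It assumes the video stream's auxiliary channel can be modeled as a Python list, that the bulky samples live in a separate store keyed by identifier, and that a random UUID is an acceptable identifier; the `SyncTag` type and `tag_capture` function are hypothetical names, not part of any MPEG API.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SyncTag:
    data_id: str   # identifier of the externally stored sample sequence
    kind: str      # "start" or "end" of the data capture
    time_s: float  # capture-clock time at which the tag applies


def tag_capture(stream, store, samples, start_s, end_s):
    """Store bulky samples out of band; put only small tags in the stream."""
    data_id = uuid.uuid4().hex
    store[data_id] = list(samples)
    stream.append(SyncTag(data_id, "start", start_s))
    stream.append(SyncTag(data_id, "end", end_s))
    return data_id
```

A receiver that encounters a tag can look up `store[tag.data_id]` when it needs the full-rate positional data, and ignore the tag otherwise.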

While the detailed arrangements particularly considered a single image sensor or a single audio sensor, the technology is also well suited for use with multiple sensors.

In one such arrangement, multiple sensors capture audio or video, and/or auxiliary data. The captured information can be fed to a cloud resource or other processing system, where the multiple streams can be analyzed, combined, or selected among for rendering. For example, the relative locations of multiple cameras can be computed from associated positional data (e.g., encoded with the video). When one camera is within the field of view of another, its presence or location may be indicated in a user interface (e.g., by a highlighted rectangle in the viewing frame), signaling to the viewer that another viewpoint is available. When video journalists capture an event or interview, the spatial location information can indicate where better audio or video may be available in a video stream other than the one being viewed.
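The field-of-view check can be sketched with simple plane geometry. The sketch assumes camera positions have already been converted to local planar coordinates (x, y) in meters, that heading is an angle in degrees measured counterclockwise from the +x axis, and that only the horizontal field of view matters; the function names are hypothetical.

```python
import math


def bearing_deg(from_xy, to_xy):
    """Angle in degrees, in [0, 360), of the ray from one point to another."""
    dx, dy = to_xy[0] - from_xy[0], to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def in_field_of_view(cam_xy, cam_heading_deg, fov_deg, other_xy):
    """True if other_xy lies within the camera's horizontal field of view."""
    # Wrap the angular difference into [-180, 180) before comparing.
    diff = (bearing_deg(cam_xy, other_xy) - cam_heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0
```

A user interface could run this test for every other camera's reported position and highlight those that return `True`.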

Multiple viewpoints of an event can be analyzed and combined to render a 3D model of the event. The captured position (motion) data allows camera motion to be compensated for, and each image to be transformed/re-mapped based on camera orientation, focal length, field of view, etc.

If multiple audio sensors are deployed in a space, each also collecting auxiliary information (e.g., its instantaneous position), a rich data set results from which various effects can be generated. Consider filming a motion picture in which each performer carries such a microphone. The resulting audio/auxiliary data streams can be rendered differently for different consumers. One consumer may want to hear the audio as if at the center of the action. (For example, an average of the microphone positions can be computed, and the sound field contribution of each microphone at that location can be computed and rendered, optionally with stereo directionality modeled on known models of human binaural hearing.) Another consumer may want to follow a particular actress and hear the audio as she would have heard it. Again, such a system can determine the net sound field at the actress's location as she moves about. (The actress's voice is always primary; others are heard according to their distance from her.) If the rendering environment includes multiple speakers, e.g., front right, front left, and back center (as in multi-channel surround sound), the data can be processed to take into account not just volume but also 360 degrees of directionality when rendered to the consumer. The consumer can pick essentially any virtual listening point in the original performance, static or moving.
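A rough sketch of a virtual listening point follows. It assumes planar microphone positions in meters, one audio sample per microphone for the instant being mixed, and an inverse-distance weighting with a minimum distance to avoid division by zero. It deliberately omits the binaural and multi-speaker directionality modeling mentioned in the text, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Average position of the microphones (the 'center of the action')."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def mix_at(listen_xy, mic_positions, mic_samples, min_dist=1.0):
    """Mix one sample per microphone, weighting each by 1/distance to the listener."""
    weights = [1.0 / max(min_dist, math.dist(listen_xy, p)) for p in mic_positions]
    return sum(w * s for w, s in zip(weights, mic_samples))
```

Passing `centroid(mic_positions)` as `listen_xy` gives the center-of-action mix; passing one performer's own position makes that performer's microphone dominate, with the others attenuated by distance.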

In still another embodiment, the location of the consumer can be sensed (e.g., using a carried smartphone), and the audio sources can be rendered by a processor (e.g., in the consumer's smartphone) in accordance with the consumer's position and movement. For example, in an audio rendering of a cocktail party (recorded using the arrangement of the previous paragraph), the consumer can move within a physical space to eavesdrop on conversations of particular interest. This can be done without multi-channel surround sound (e.g., with simple earphones plugged into the consumer's smartphone), or a multi-channel speaker arrangement can be employed.

The arrangement just noted has particular applicability in computer-based games, where a player can interact with the system, or with other real/virtual players, in both time and location.

The assignee's Patent No. 7,197,160 details how positional information (e.g., latitude/longitude) can be steganographically encoded into imagery and audio.

While reference has been made to cell phones, it will be recognized that this technology finds utility with all manner of devices, both phones and non-phones, both portable and fixed.

(Particularly contemplated cell phones include the Apple iPhone 4 and cell phones following Google's Android specification, e.g., the HTC Evo 4G and the Motorola Droid X. Details of the iPhone, including its touch interface, are provided in Apple's published patent application 2008/0174570.)

The basic design of cell phones and other devices that can serve as originating and receiving units is familiar to the artisan. Generally speaking, each includes one or more processors, one or more memories (e.g., RAM), storage (e.g., a disk or flash memory), a user interface (which may include, e.g., a keypad, a TFT LCD or OLED display screen, touch or other gesture sensors, and software instructions for providing a graphical user interface), interconnections between these elements (e.g., buses), and an interface for communicating with other devices (which may be wireless, such as GSM, CDMA, W-CDMA, CDMA2000, TDMA, EV-DO, HSDPA, WiFi, WiMax, or Bluetooth, and/or wired, such as through an Ethernet local area network or a T-1 internet connection). Originating devices typically include a camera and/or microphone, together with one or more other components/systems for providing the auxiliary data noted above.

Software instructions for implementing the detailed functionality can be readily authored by artisans from the descriptions provided herein, e.g., written in C, C++, Visual Basic, Java, Python, Tcl, Perl, Scheme, Ruby, etc. Cell phones and other devices according to the present technology can include software modules for performing the different functions and acts.

Commonly, each device includes operating system software that provides interfaces to hardware resources and general-purpose functions, and also includes application software that can be selectively invoked to perform particular tasks desired by a user. Known audio and video codecs, browser software, communications software, and media processing software can be adapted for many of the uses detailed herein. Software and hardware configuration data/instructions are commonly stored as instructions in one or more data structures conveyed by tangible media, such as magnetic or optical discs, memory cards, ROM, etc., which may be accessed across a network. Some embodiments may be implemented as embedded systems, i.e., special-purpose computer systems in which the operating system software and the application software are indistinguishable to the user (as is commonly the case in basic cell phones). The functionality detailed in this specification can be implemented in operating system software, application software, and/or as embedded system software.

Different functions can be implemented on different devices. Description of an operation as being performed by a particular device (e.g., an originating cell phone encoding lens data into the audio channel of an MPEG stream) is not limiting but exemplary; performance of the operation by another device (e.g., a subsequent device that receives or knows the lens data apart from the MPEG stream) is also expressly contemplated.

(In like fashion, description of data being stored on a particular device is also exemplary; data can be stored anywhere: on a local device, on a remote device, in the cloud, distributed, etc.)

Operations need not be performed exclusively by specifically identifiable hardware. Rather, some operations can be referred out to other services (e.g., cloud computing), which attend to their execution by still further, generally anonymous, systems. Such distributed systems can be large scale (e.g., involving computing resources around the globe), or local (e.g., as when a portable device identifies nearby devices through Bluetooth communication and involves one or more of them in a task, such as contributing data about local geography).

While this disclosure has detailed particular orderings of acts and particular combinations of elements in the illustrative embodiments, it will be recognized that other contemplated methods may re-order acts (possibly omitting some and adding others), and other contemplated combinations may omit some elements and add others, etc.

Although disclosed as complete systems, sub-combinations of the detailed arrangements are also separately contemplated.

It will be recognized that the detailed processing of content signals (e.g., image signals, audio signals, transducer signals, etc.) includes the transformation of these signals into various physical forms. Images and video (forms of electromagnetic waves traveling through physical space and depicting physical objects) may be captured from physical objects using cameras or other capture equipment, or generated by a computing device. Similarly, audio pressure waves traveling through a physical medium may be captured using an audio transducer (e.g., a microphone) and converted to an electronic signal (digital or analog form). While these signals are typically processed in electronic and digital form to implement the components and processes described above, they may also be captured, processed, transferred, and stored in other physical forms, including electronic, optical, magnetic, and electromagnetic wave forms. The content signals are transformed in various ways and for various purposes during processing, producing various data structure representations of the signals and related information. In turn, the data structure signals in memory are transformed for manipulation during searching, sorting, reading, writing, and retrieval. The signals are also transformed for capture, transfer, storage, and output via a display or audio transducer (e.g., speakers).

The features and arrangements detailed in this specification can be used in conjunction with those detailed in co-pending application Ser. No. 12/484,115, filed June 12, 2009 (published as US Patent Application Publication No. 2010/0048242). For example, the object recognition architectures and technologies detailed in application Ser. No. 12/484,115 can be used in implementations of the present technology. It should be understood that the methods, elements, and concepts disclosed in the present application are intended to be combined with those detailed in the 12/484,115 application (and vice versa). Implementation of all such combinations is straightforward to the artisan from the provided teachings.

In certain embodiments, the originating device captures biometric information from one or more persons (e.g., fingerprint, heart rate, retinal pattern, respiration rate, etc.). This data can again be used by the receiving unit, e.g., in determining how (or whether) the audio/video data should be rendered.

The detailed technology is particularly useful with user generated content (UGC) sites, such as YouTube. The auxiliary data detailed herein can be received and stored by the UGC site, and later provided to other users for their use in the manners described. Alternatively, the UGC site can use the auxiliary data to process the audio/video data, and then provide only the processed audio/video data to users. (That is, the "receiving unit" shown in FIG. 2 may in fact be an intermediate processing unit that provides audio and/or video to still further units.)

Repeated reference was made to GPS data. This should be understood as shorthand for any location-related information; it need not be derived from the Global Positioning System constellation of satellites. For example, another technology suitable for generating location data relies on radio signals that are commonly exchanged between devices (e.g., WiFi, cellular, etc.). Given several communicating devices, the signals themselves, and the imperfect digital clock signals that control them, form a reference system from which both highly accurate time and position can be extracted. Such technology is detailed in International Publication No. WO 08/073347. The artisan will be familiar with several other location-estimating techniques, including those based on time-of-arrival techniques, those based on the locations of broadcast radio and television towers (as offered by Rosum), and those based on WiFi nodes (as offered by Skyhook Wireless and employed in the iPhone).

While positional information commonly comprises latitude and longitude data, it may alternatively comprise more, less, or different data. For example, it may include orientation information, such as a compass direction provided by a magnetometer, or tilt information provided by gyroscopic or other sensors. It may also include elevation information, such as that provided by a digital altimeter system.

Digimarc has various other patent documents relevant to the present subject matter. See, e.g., application Ser. No. 12/835,527, filed July 13, 2010; International Publication No. WO 2010/022185; and Patent No. 6,947,571.

It is impossible to expressly catalog the myriad variations and combinations of the technology described herein. Applicant recognizes and intends that the concepts of this specification can be combined, substituted, and interchanged, both among and between themselves, as well as with those known from the cited prior art. Moreover, it will be recognized that the detailed technology can be included with other technologies, current and upcoming, to advantageous effect.

To provide a comprehensive disclosure without unduly lengthening this specification, applicant incorporates by reference the documents and patent disclosures referenced above. (Such documents are incorporated in their entireties, even if cited above in connection with only specific ones of their teachings.) These references disclose technologies and teachings that can be incorporated into the arrangements detailed herein, and into which the technologies and teachings detailed herein can be incorporated.

Claims

A method comprising:
Receiving video information, and transforming the video information for representation in a video portion of a single data object;
Receiving audio information, and transforming the audio information for representation in an audio portion of the single data object;
Receiving sensor information including at least one parameter relating to acceleration, orientation, or tilt, and transforming the sensor information for representation in the single data object; and
Transmitting the single data object to a data receiver, or storing the single data object on a computer readable medium, whereby the sensor information is structurally associated with the audio and video information by the single data object, and is thereby adapted for use by a processor in modifying the audio or video information.

The method of claim 1, comprising modifying the audio or video information using a processor according to at least a portion of the sensor information.

The method of claim 1, comprising converting the sensor information for representation within the video portion of the single data object.

The method of claim 1, wherein the single data object comprises an MPEG data stream or an MPEG data file.

The method of claim 1, comprising converting the sensor information to a frequency range near or at the bottom of the range that a person can hear.

The method according to claim 1, comprising expressing the sensor information as a signal hidden by steganography in the audio information or video information.

The method of claim 1, wherein the sensor data includes acceleration data.

The method of claim 1, wherein the sensor data includes orientation data.

The method of claim 1, wherein the sensor data includes tilt data.

The method of claim 1, further comprising receiving camera data and transforming the camera data for representation in the audio portion of the single data object, wherein the camera data includes at least one parameter related to focus, zoom, aperture size, depth of field, exposure time, ISO setting, and/or depth of focus.

A method comprising:
Receiving a single data object;
Recovering audio data from the audio portion of the single data object; and
Recovering sensor data from the audio portion of the single data object, the sensor data including at least one parameter related to acceleration, orientation, or tilt;
wherein at least one of the recovering steps is performed by a hardware processor.

The method of claim 11, wherein the step of recovering sensor data includes the step of subjecting the audio data to a steganographic decoding process to extract the sensor data from the audio data.

The method of claim 11, comprising modifying the recovered audio data according to at least a portion of the sensor data.

The method of claim 11, further comprising recovering video data from a video portion of the single data object and modifying the recovered video according to at least a portion of the sensor data.

A method comprising:
Receiving a single data object;
Recovering audio data from the audio portion of the single data object; and
Recovering camera data from the audio portion of the single data object, the camera data including at least one parameter related to focus, zoom, aperture size, depth of field, exposure time, ISO setting, and/or depth of focus;
wherein at least one of the recovering steps is performed by a hardware processor.

The method of claim 15, comprising modifying the recovered audio data according to at least a portion of the sensor data.

The method of claim 15, further comprising recovering video data from a video portion of the single data object and modifying the recovered video according to at least a portion of the sensor data.

A method comprising:
Receiving a single data object;
Recovering both video data and sensor data from the single data object, the sensor data including at least one parameter related to acceleration, orientation, or tilt; and
Processing the video data according to at least a portion of the sensor data to obtain modified video data for presentation to a user;
wherein at least one of the steps is performed by a hardware processor.

The method of claim 18, comprising:
Obtaining user selection data;
Processing the video data using at least a portion of the sensor data according to the user selection data; and
Presenting the processed video to a user.

The method of claim 19, comprising compensating for vibration of the image by using the sensor data.

The method of claim 19, comprising obtaining the user selection data from the user via a user interface.

A computer readable storage medium containing non-transitory software instructions that cause a processor programmed with the instructions to perform:
Recovering both video data and sensor data from a received single data object, wherein the sensor data includes at least one parameter related to acceleration, orientation, or tilt; and
Processing the video data according to at least a portion of the sensor data to produce modified video data for presentation to a user.

A method comprising:
Receiving a single data object;
Recovering both video data and camera data from the single data object, the camera data including at least one parameter related to focus, zoom, aperture size, depth of field, exposure time, ISO setting, and/or lens focal distance; and
Processing the video data according to at least a portion of the camera data to obtain modified video data for presentation to a user;
wherein at least one of the steps is performed by a hardware processor.

A method comprising:
receiving a single data object;
recovering both audio data and sensor data from the single data object, wherein the sensor data includes at least one parameter related to acceleration, orientation, or tilt; and
processing the audio data according to at least a portion of the sensor data to obtain modified audio data for presentation to a user;
wherein at least one of the steps is performed by a hardware processor.

A method comprising:
receiving a single data object;
recovering both audio data and camera data from the single data object, wherein the camera data includes at least one parameter related to focus, zoom, aperture size, depth of field, exposure time, ISO setting, and/or lens focus distance; and
processing the audio data according to at least a portion of the camera data to obtain modified audio data for presentation to a user;
wherein at least one of the steps is performed by a hardware processor.

The method of claim 25, wherein the step of processing includes a sub-step of modifying audio amplitude according to camera focus data to obtain dimensional audio.
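
One way to picture the amplitude modification is a gain that falls off with the focus distance reported by the camera. The inverse-distance law, the 1 m reference distance, and the function names below are assumptions chosen for this sketch. The claim does not prescribe a particular gain curve.

```python
def focus_gain(focus_distance_m, reference_m=1.0):
    """Gain in (0, 1]: unity up to reference_m, then falling as 1/distance."""
    return reference_m / max(focus_distance_m, reference_m)


def apply_focus_gain(samples, focus_distance_m, reference_m=1.0):
    """Scale audio samples by the gain implied by the camera focus distance."""
    gain = focus_gain(focus_distance_m, reference_m)
    return [s * gain for s in samples]
```

With these assumptions, a subject focused at 4 m is rendered at one quarter of the amplitude used for a subject at 1 m or closer, which matches the idea of reducing volume for distant subjects.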

A method comprising:
collecting sensor information from sensors carried by a movable object in a sporting event;
collecting video information from the sporting event using a camera located remotely from the movable object;
creating a single data object that includes both data corresponding to the collected sensor information and data corresponding to the collected video information; and
storing the single data object in a computer-readable storage medium or transmitting the single data object to a data receiver.
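
The creation step of the preceding method can be sketched as follows, assuming that both the remote camera and the object-borne sensor timestamp their output on a shared clock. The JSON layout, the field names, and the nearest-timestamp pairing rule are hypothetical choices made for this sketch, not features recited in the claim.

```python
import bisect
import json


def nearest_sample(sensor_times, t):
    """Index of the sensor sample whose timestamp is closest to t.

    sensor_times must be sorted in ascending order and non-empty.
    """
    i = bisect.bisect_left(sensor_times, t)
    if i == 0:
        return 0
    if i == len(sensor_times):
        return i - 1
    return i if sensor_times[i] - t < t - sensor_times[i - 1] else i - 1


def build_data_object(frames, sensor_samples):
    """Pair each (time, frame_id) with the nearest (time, reading) sample
    and serialize the result as one JSON document."""
    times = [t for t, _ in sensor_samples]
    records = []
    for t, frame_id in frames:
        _, reading = sensor_samples[nearest_sample(times, t)]
        records.append({"t": t, "frame": frame_id, "sensor": reading})
    return json.dumps({"records": records})


# Two video frames from the camera and two readings from the moving object.
frames = [(0.0, "f0"), (0.04, "f1")]
sensor_samples = [(0.0, {"accel": 1.0}), (0.05, {"accel": 3.0})]
data_object = build_data_object(frames, sensor_samples)
```

The resulting string can be written to storage or sent to a receiver as a single unit. A real system would carry encoded video rather than the frame identifiers used here.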

The method of claim 27, comprising collecting the sensor information from a sensor carried by a puck, ball, horse, competitor, or car.

The method of claim 27, wherein the sensor data includes at least one of acceleration data, orientation data, position data, and/or tilt data.

A mobile phone comprising a processor and at least a first sensor and a second sensor, wherein the processor is configured to create a single data object including information sensed by the first sensor and information sensed by the second sensor, and wherein the first sensor comprises an image sensor or an audio sensor and the second sensor comprises an acceleration sensor, an orientation sensor, or a tilt sensor.

The mobile phone of claim 30, wherein the second sensor comprises an acceleration sensor.

The mobile phone of claim 30, wherein the second sensor comprises an orientation sensor.

The mobile phone of claim 30, wherein the second sensor comprises a tilt sensor.