CN111784615A - Method and device for multimedia information processing - Google Patents

Method and device for multimedia information processing

Info

Publication number
CN111784615A
Authority
CN
China
Prior art keywords
video
information
video information
image
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010738347.1A
Other languages
Chinese (zh)
Inventor
肖宇
李艳丽
雷娟
张文波
高波
熊君君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd
Priority to CN202010738347.1A
Publication of CN111784615A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 - Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/60 - Image enhancement or restoration using machine learning, e.g. neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/73 - Deblurring; Sharpening
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 - Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 - Indicating arrangements
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/76 - Television signal recording
    • H04N5/91 - Television signal processing therefor
    • H04N5/911 - Television signal processing therefor for the suppression of noise
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00 - Indexing scheme for image data processing or generation, in general
    • G06T2200/24 - Indexing scheme for image data processing or generation, in general involving graphical user interfaces [GUIs]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Studio Devices (AREA)
  • User Interface Of Digital Computer (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention provides a method for processing video information executed by an electronic device, comprising: obtaining, by a first multimedia capture device, first video information corresponding to a first focus area; obtaining, by a second multimedia capture device, second video information corresponding to a second focus area; and displaying the first video information and the second video information on a display of the electronic device.

Description

Method and device for multimedia information processing

This application is a divisional application of the invention patent application filed on March 25, 2016, with application number 201610179848.4, entitled "Method and Device for Multimedia Information Processing".

Technical Field

The present invention relates to the field of multimedia information processing, and in particular to a method for multimedia information processing and a device for multimedia information processing.

Background Art

With improving living standards, terminal devices equipped with capture apparatuses have become increasingly popular, and obtaining high-quality images and videos has become an important factor in the competitiveness of terminal devices. Most existing capture enhancements concentrate on improving image quality and involve video quality far less. Compared with image enhancement, video enhancement is relatively difficult to achieve because it is constrained by spatio-temporal consistency and limited processing time.

In existing terminal devices with a shooting function, such as mobile phones, the quality of the captured videos and images often fails to meet users' needs. Although the quality of images captured by some mid-to-high-end terminal devices has been greatly improved, video quality still leaves much room for improvement, especially for videos captured in low-light environments. Video processing is much more difficult than image processing, for two main reasons. 1) Images and video frames have different processing-time budgets: in a 30 fps video, for example, each frame must be processed within 1/30 of a second, whereas an image may be processed for longer. An image enhancement method can therefore flexibly and automatically adjust parameters such as exposure and white balance according to the scene, and can even capture multiple images and synthesize them into one, which makes the brightness and color of a captured image more accurate than those of a video. 2) Video must maintain spatio-temporal consistency, whereas images have no such constraint. To guarantee this consistency, the video acquisition parameters of adjacent frames, including white balance, exposure, and focus, must transition smoothly. Consequently, when the scene changes abruptly, for example when the lighting changes from indoor to outdoor, the terminal adjusts the video acquisition parameters with a lag, while image acquisition parameters are set according to the brightness and color of the current scene; this also makes the brightness and color of a captured image more accurate than those of a video.

In the prior art, enhancement techniques for images or videos mainly rely on corresponding algorithms, that is, they enhance an image or video based on its own information. The enhancement effect is unsatisfactory, with problems such as distortion and insufficient sharpness in the enhanced images and videos.

Summary of the Invention

In view of the limitations of multimedia information enhancement in the prior art, the present invention proposes a method for multimedia information processing, comprising:

acquiring a first type of multimedia information and a second type of multimedia information respectively captured by two multimedia capture devices; and

processing the second type of multimedia information according to the first type of multimedia information.

The multimedia information includes at least one of image information, video information, and audio information.

Preferably, the first type of multimedia information is image information and the second type of multimedia information is video information; or the first type of multimedia information is video information and the second type of multimedia information is image information.

Preferably, processing the second type of multimedia information according to the first type of multimedia information specifically includes:

determining the indicators needing enhancement for the captured second type of multimedia information; and

performing enhancement processing on the determined indicators of the captured second type of multimedia information according to the captured first type of multimedia information.

The indicators include at least one of the following:

resolution, color, brightness, noise, and blur.

Preferably, the indicators needing enhancement for the captured second type of multimedia information are determined by at least one of the following:

determining the matching indicators needing enhancement according to a detected enhancement-on trigger operation;

determining the matching indicators needing enhancement according to a preset; and

adaptively determining the indicators needing enhancement according to an adaptive parameter matching method.

Preferably, the adaptive parameter matching method is determined by one or more of: a device-related state, enhancement-on history data, the capture environment, the capture parameters, and the content of the multimedia information captured in real time by the multimedia capture device;

the device-related state includes at least one of the following: device battery state, device storage state, and device motion state while capturing multimedia information; and

the content of the multimedia information captured in real time by the multimedia capture device includes at least one of the following: scene brightness, semantic content, and sharpness of salient objects.

Optionally, the method further includes:

if at least two indicators needing enhancement are determined, determining the enhancement order of those indicators;

wherein performing enhancement processing on the determined indicators of the captured second type of multimedia information according to the captured first type of multimedia information specifically includes:

according to the captured first type of multimedia information, sequentially enhancing, in the determined enhancement order, the indicators needing enhancement for the captured second type of multimedia information.

Preferably, the enhancement order of the indicators needing enhancement is determined by at least one of the following:

an enhancement-order-setting trigger operation; a preset; and an adaptive enhancement-order-setting method.

Preferably, the adaptive enhancement-order-setting method is determined by one or more of: a device-related state, enhancement-setting history information, the capture environment, the capture parameters, the content of the multimedia information captured in real time by the multimedia capture device, and the influence relationships among the indicators;

wherein the content of the multimedia information captured in real time by the multimedia capture device includes at least one of scene brightness and semantic content.

Optionally, the method further includes: designating a primary capture device and an auxiliary capture device among the two multimedia capture devices;

when video information is processed according to image information, the video information is captured by the primary capture device and the image information is captured by the auxiliary capture device; and

when image information is processed according to video information, the image information is captured by the primary capture device and the video information is captured by the auxiliary capture device.

Preferably, the primary and auxiliary capture devices among the two multimedia capture devices are designated in at least one of the following ways:

designating the primary and auxiliary capture devices according to a detected setting trigger operation;

designating the primary and auxiliary capture devices according to a preset; and

adaptively designating the primary and auxiliary capture devices according to an adaptive device-setting method.

Preferably, the adaptive device-setting method is determined by one or more of: a device-related state, device-setting history data, and the content of the multimedia information captured in real time by the multimedia capture device;

the device-related state includes the device battery state and/or storage state; and

the content of the multimedia information captured in real time by the multimedia capture device includes at least one of the following: picture aspect distribution, position information of a target object in the picture, and picture quality information.

Optionally, the method further includes:

setting capture parameters and enhancement strategy parameters for the multimedia information;

wherein acquiring the first type of multimedia information and the second type of multimedia information respectively captured by the two multimedia capture devices specifically includes:

acquiring the first type of multimedia information and the second type of multimedia information respectively captured by the two multimedia capture devices based on the capture parameters;

and wherein processing the second type of multimedia information according to the first type of multimedia information specifically includes:

performing corresponding enhancement processing on the second type of multimedia information according to the first type of multimedia information, in accordance with the enhancement strategy parameters;

wherein the capture parameters specifically include at least one of white balance, exposure time, sensitivity, high dynamic range, resolution, focus area, and video frame capture frequency.

Preferably, the capture parameters and enhancement strategy parameters of the multimedia information are set in any of the following ways:

setting the capture parameters and enhancement strategy parameters according to a detected parameter-setting operation;

setting the capture parameters and enhancement strategy parameters according to preset parameters; and

adaptively setting the capture parameters and enhancement strategy parameters according to an adaptive parameter-setting method.

Preferably, the adaptive parameter-setting method is determined by at least one of: a device-related state, parameter history data, the capture environment, and the content of the multimedia information captured in real time by the multimedia capture device;

the device-related state includes at least one of the following: device battery state, device storage state, and device motion state while capturing multimedia information; and

the content of the multimedia information captured in real time by the multimedia capture device includes at least one of the following: scene brightness, semantic content, sharpness of salient objects, resolution, and exposure time.

Preferably, when the first type of multimedia information is image information and the second type of multimedia information is video information, acquiring the first type of multimedia information and the second type of multimedia information respectively captured by the two multimedia capture devices specifically includes:

acquiring video information captured by one multimedia capture device, and key frame image information corresponding to the video information captured simultaneously by the other multimedia capture device according to a key frame capture frequency;

wherein processing the captured second type of multimedia information according to the captured first type of multimedia information specifically includes:

performing enhancement processing on the indicators needing enhancement of the captured video information according to the captured key frame image information.

Optionally, the method further includes:

setting the key frame capture frequency;

wherein the key frame capture frequency is set in at least one of the following ways:

setting the key frame capture frequency according to a preset frequency setting; and

adaptively setting the key frame capture frequency according to an adaptive frequency-setting method.

Preferably, the adaptive frequency-setting method is determined by one or more of: a device-related state, capture frequency history data, the capture environment, the capture parameters, and the content of the multimedia information captured in real time by the multimedia capture device;

wherein the device-related state includes at least one of the following: device battery state, device storage state, and device motion state while capturing multimedia information; and

the content of the multimedia information captured in real time by the multimedia capture device includes at least one of scene brightness and semantic content.

Preferably, performing enhancement processing on the indicators needing enhancement of the captured video information according to the captured key frame image information specifically includes:

dividing the captured video information into several video segments according to the captured key frame image information, and using the key frame image information on both sides of each video segment to enhance the indicators needing enhancement of that segment.
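As a rough, hypothetical sketch of the segment-wise enhancement described above (the patent does not specify an algorithm, and the function name is illustrative), the snippet below splits the frame index range at the key-frame positions and, for each in-between frame, linearly interpolates an enhancement gain between the gains measured at the key frames on either side; frames are reduced to scalar gains for brevity.

```python
from bisect import bisect_right

def interpolated_gains(keyframes, kf_gains, num_frames):
    """Per-frame enhancement gain, linearly interpolated between the
    gains measured at the key frames on both sides of each frame.
    Frames outside the first/last key frame reuse the nearest gain."""
    gains = []
    for i in range(num_frames):
        if i <= keyframes[0]:
            gains.append(kf_gains[0])
        elif i >= keyframes[-1]:
            gains.append(kf_gains[-1])
        else:
            j = bisect_right(keyframes, i)        # right-hand key frame
            left, right = keyframes[j - 1], keyframes[j]
            t = (i - left) / (right - left)       # position inside segment
            gains.append((1 - t) * kf_gains[j - 1] + t * kf_gains[j])
    return gains
```

For key frames at indices 0 and 4 with gains 1.0 and 2.0, the five frames receive gains 1.0, 1.25, 1.5, 1.75, 2.0, which keeps the enhancement temporally smooth across the segment.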

Preferably, when the indicators needing enhancement include at least one of resolution, color, and brightness, the enhancement processing includes an enhancement method based on multi-view reconstruction and/or an enhancement method that builds an enhancement model based on machine learning.

Preferably, the enhancement method based on multi-view reconstruction specifically includes:

establishing a matching relationship between the video pixels of the captured video information and the image pixels of the key frame image information, and replacing the matched video pixels with the image pixels.
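A real multi-view reconstruction would match pixels through feature correspondence and camera geometry; the toy sketch below (function names are illustrative, not from the patent) aligns a small grayscale key-frame image to a video frame by minimizing the mean absolute pixel difference over candidate shifts, then replaces the matched video pixels with image pixels.

```python
def best_offset(video_frame, key_image, search=2):
    """Brute-force alignment: the (dy, dx) shift of the key image that
    minimizes the mean absolute pixel difference against the video
    frame. Shifts with less than half the pixels overlapping are
    skipped so tiny overlaps cannot win by accident."""
    h, w = len(video_frame), len(video_frame[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost, n = 0, 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        cost += abs(video_frame[y][x] - key_image[yy][xx])
                        n += 1
            if n >= (h * w) // 2 and cost / n < best_cost:
                best_cost, best = cost / n, (dy, dx)
    return best

def replace_matched_pixels(video_frame, key_image, offset):
    """Replace each video pixel that has a matched image pixel with
    that (higher-quality) image pixel; unmatched pixels are kept."""
    dy, dx = offset
    h, w = len(video_frame), len(video_frame[0])
    out = [row[:] for row in video_frame]
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                out[y][x] = key_image[yy][xx]
    return out
```

Exhaustive search is only workable on tiny grids; a practical implementation would use feature matching or optical flow to build the pixel correspondence.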

Preferably, building an enhancement model based on machine learning specifically includes:

extracting video pixels at the positions of the key frame images in the captured video information;

building, by machine learning, a mapping enhancement model from the video pixels to the image pixels of the key frame image information; and

at the positions of the non-key-frame images in the captured video information, converting the video pixels through the mapping enhancement model.
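As a stand-in for the machine-learned mapping model (the patent leaves the learner unspecified; these helper names are assumptions), the sketch below fits a linear mapping image = a * video + b by least squares from pixel pairs extracted at a key-frame position, then applies it to the pixels of a non-key frame.

```python
def fit_linear_mapping(video_pixels, image_pixels):
    """Least-squares fit of image = a * video + b from pixel pairs
    sampled at a key-frame position (a stand-in for model training)."""
    n = len(video_pixels)
    mx = sum(video_pixels) / n
    my = sum(image_pixels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(video_pixels, image_pixels))
    var = sum((x - mx) ** 2 for x in video_pixels)
    a = cov / var
    return a, my - a * mx

def apply_mapping(frame_pixels, mapping):
    """Convert a non-key frame's video pixels with the learned mapping."""
    a, b = mapping
    return [a * p + b for p in frame_pixels]
```

A deployed system would replace the linear fit with a richer model (e.g. a small neural network over local patches), but the train-at-key-frames, apply-at-other-frames structure is the same.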

Preferably, when the indicators needing enhancement include noise, the enhancement processing includes a method based on dictionary reconstruction and/or a method based on deep learning.

Preferably, when the indicators needing enhancement include blur, the enhancement processing includes a method based on blur kernel estimation and/or a method based on deep learning.

Optionally, the method further includes:

when a blurred frame to be processed is detected in the captured video information, determining that the blur indicator of the blurred frame to be processed is to be enhanced;

wherein the blurred frame to be processed is detected by at least one of the following kinds of information:

the device motion state while capturing the video frame; the focus information while capturing the video frame; and the result of classifying the captured video information with a classifier.
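A minimal, hypothetical detector combining the three cues above; the threshold values and the classifier interface are assumptions for illustration, not values from the patent.

```python
def is_blur_frame(motion_magnitude, focus_score, classifier=None,
                  motion_thresh=0.8, focus_thresh=0.3):
    """Flag a frame as a blur candidate when any cue fires: strong
    device motion during capture, a poor focus score, or an optional
    classifier voting 'blurred'. Thresholds here are placeholders."""
    if motion_magnitude > motion_thresh:
        return True
    if focus_score < focus_thresh:
        return True
    if classifier is not None and classifier(motion_magnitude, focus_score):
        return True
    return False
```

Any single cue is enough to queue the frame for blur enhancement, which matches the "at least one of the following" phrasing above.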

Preferably, when the first type of multimedia information is image information and the second type of multimedia information is video information, processing the second type of multimedia information according to the first type of multimedia information specifically includes:

storing the captured video information according to the captured image information, where the stored content includes at least one of the following:

the video information obtained by enhancing the captured video information according to the captured image information;

the captured video information and image information;

the captured video information and the enhancement model used when enhancing the video information; and

the video information obtained by enhancing the captured video information according to the captured image information, together with the captured image information.

Optionally, the method further includes:

in response to a received playback trigger operation, playing the video information in a playback mode matching the stored content, where the playback mode includes at least one of the following:

when the enhanced video information is stored, directly playing the enhanced video information;

when the captured video information and image information are stored, enhancing the captured video information according to the captured image information before playback;

when the captured video information and the enhancement model are stored, enhancing the captured video information through the enhancement model before playback; and

when the enhanced video information and the captured image information are stored, playing the enhanced video information and the captured image information in association.

Preferably, when the first type of multimedia information is video information and the second type of multimedia information is image information, acquiring the first type of multimedia information and the second type of multimedia information respectively captured by the two multimedia capture devices specifically includes:

acquiring image information captured by one multimedia capture device, and a video segment corresponding to the image information captured by the other multimedia capture device according to a set video frame capture frequency;

wherein processing the captured second type of multimedia information according to the captured first type of multimedia information specifically includes:

performing enhancement processing on the indicators needing enhancement of the captured image information according to the captured video segment.

Preferably, when it is detected that the multimedia capture device capturing the image information enters a preview state, or when it is detected that the multimedia capture device capturing the image information starts to capture the image information, the other multimedia capture device captures the video segment corresponding to the image information according to the set video frame capture frequency; and

when it is detected that the number of video frames in the captured video segment reaches a corresponding upper limit, the other multimedia capture device stops capturing video information.

Preferably, performing enhancement processing on the indicators needing enhancement of the captured image information according to the captured video segment specifically includes:

determining a video key frame in the captured video segment; and

enhancing the captured image information according to the video key frame in a manner based on blur kernel estimation.
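Blur kernel estimation in 1-D, as an illustrative toy version of the step above: the sharp video key frame and the blurred capture of the same content are used to fit the weight c of a symmetric 3-tap kernel [c, 1 - 2c, c] by least squares, followed by one crude inverse-filter step. A production implementation would estimate a 2-D kernel and use proper deconvolution; the function names here are assumptions.

```python
def estimate_blur_kernel(sharp, blurred):
    """Least-squares estimate of c in the 3-tap kernel [c, 1 - 2c, c],
    using interior samples of a sharp key frame and the blurred signal.
    From b[i] = c*s[i-1] + (1-2c)*s[i] + c*s[i+1] it follows that
    b[i] - s[i] = c * (s[i-1] - 2*s[i] + s[i+1])."""
    num = den = 0.0
    for i in range(1, len(sharp) - 1):
        d = sharp[i - 1] - 2 * sharp[i] + sharp[i + 1]
        e = blurred[i] - sharp[i]
        num += d * e
        den += d * d
    return num / den

def deblur_step(blurred, c):
    """One crude inverse-filter (Jacobi-style) step with the estimated
    kernel; boundaries are left untouched."""
    out = list(blurred)
    for i in range(1, len(blurred) - 1):
        out[i] = (blurred[i] - c * (blurred[i - 1] + blurred[i + 1])) / (1 - 2 * c)
    return out
```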

Preferably, the video key frame is determined by an adaptive key frame determination method;

wherein the adaptive key frame determination method is determined by one or more of: degree of picture blur, content similarity, and video frame quality.

Optionally, the method further includes:

performing sharpness analysis on the captured image information; and

if the image information is a blurred image, performing enhancement processing on the indicators needing enhancement of the captured image information according to the captured video segment, where the indicators needing enhancement include blur.

Preferably, when the first type of multimedia information is video information and the second type of multimedia information is image information, processing the second type of multimedia information according to the first type of multimedia information specifically includes:

storing the captured image information according to the captured video information, wherein the stored content includes at least one of the following:

image information obtained by enhancing the captured image information according to the captured video information;

the captured video information and the captured image information;

the captured image information together with the video key frames, in the captured video information, that are used to enhance the image information;

the captured image information together with the enhancement model used when enhancing the image information;

image information obtained by enhancing the captured image information according to the captured video information, together with the captured video information.

Optionally, the method further includes:

in response to a received display trigger operation, displaying the image information in a display mode that matches the stored content, wherein the display mode includes at least one of the following:

when the enhanced image information is stored, directly displaying the enhanced image information;

when the captured video information and image information are stored, enhancing the captured image information according to the captured video information and then displaying it;

when the captured image information and the video key frames used for enhancement are stored, determining an enhancement model from the video key frames, enhancing the captured image information with the enhancement model, and then displaying it;

when the captured image information and the enhancement model are stored, enhancing the captured image information with the enhancement model and then displaying it;

when the enhanced image information and the captured video information are stored, displaying the enhanced image information and the captured video information in association with each other.
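The five display branches above amount to a dispatch on what was stored. A hedged sketch follows; the storage-key names and the two `enhance_*` stand-ins are illustrative assumptions, not part of this disclosure:

```python
def enhance_with_video(image, video):
    # Stand-in for the real video-based enhancement (assumption).
    return image + "+enhanced_by_video"

def build_model_from_key_frames(key_frames):
    # Stand-in: derive an enhancement model from key frames (assumption).
    return lambda img: img + "+enhanced_by_key_frames"

def display(stored):
    """Choose the display path matching the stored content, following the
    five cases described above. 'stored' is a dict whose keys indicate
    what was saved."""
    if "enhanced_image" in stored and "video" not in stored:
        return stored["enhanced_image"]                      # case 1: show directly
    if "image" in stored and "video" in stored:
        return enhance_with_video(stored["image"], stored["video"])  # case 2
    if "image" in stored and "key_frames" in stored:
        model = build_model_from_key_frames(stored["key_frames"])    # case 3
        return model(stored["image"])
    if "image" in stored and "model" in stored:
        return stored["model"](stored["image"])              # case 4: apply saved model
    if "enhanced_image" in stored and "video" in stored:
        return (stored["enhanced_image"], stored["video"])   # case 5: show in association
    raise ValueError("unrecognized stored content")
```

Storing only the key frames or the model (cases 3 and 4) trades display-time computation for storage space, while case 1 trades the opposite way.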

Preferably, the first type of multimedia information and the second type of multimedia information are video information focused on different focus areas, wherein the focus areas include a global area and/or local areas.

Processing the second type of multimedia information according to the first type of multimedia information specifically includes:

jointly playing the captured video information focused on one focus area together with the captured video information focused on another focus area.

The focus areas are determined in at least one of the following ways:

when it is detected that the user has selected one local area, determining the selected local area as one focus area and the global area as the other focus area;

when it is detected that the user has selected two local areas, determining the two selected local areas as the focus areas.

Preferably, the local area selected by the user is detected through a focus object selected by the user.

Preferably, the global area and/or the local areas can be played jointly in a split-screen layout.
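The focus-area selection rules above reduce to a simple function of how many local areas the user picked. A minimal sketch (the region labels are illustrative placeholders):

```python
def determine_focus_areas(selected_local_areas):
    """Map user selections to the two focus areas per the rules above:
    one selected local area -> (that local area, global area);
    two selected local areas -> both local areas."""
    if len(selected_local_areas) == 1:
        return (selected_local_areas[0], "global")
    if len(selected_local_areas) == 2:
        return tuple(selected_local_areas)
    raise ValueError("expected one or two selected local areas")
```

Each returned focus area would then be assigned to one of the two cameras for capture and to one pane of the split-screen layout for joint playback.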

Preferably, processing the second type of multimedia information according to the first type of multimedia information specifically includes:

storing, according to the captured video information focused on one focus area, the captured video information focused on another focus area, wherein the stored content includes at least one of the following:

the two pieces of captured video information focused on different focus areas;

composite video information obtained by compositing the captured video information focused on one focus area with the captured video information focused on another focus area;

video content of interest determined from the two pieces of video information focused on different focus areas;

the captured video information focused on the global area together with the position information of the local area within that video information.

Preferably, the method further includes: in response to a received play trigger operation, playing the video information in a play mode that matches the stored content, wherein the play mode includes at least one of the following:

when the two pieces of captured video information focused on different focus areas are stored, playing the two pieces of video information separately or jointly;

when the composite video information is stored, playing the composite video;

when the video content of interest determined from the two pieces of video information focused on different focus areas is stored, playing the video content of interest;

when the video information of the global area and the position information of the local area within it are stored, determining the video information of the local area from the position information, and playing the video information of the global area and the video information of the local area separately or jointly.

Preferably, the second type of multimedia information is video information, and the first type of multimedia information is audio information corresponding to the video information.

Processing the second type of multimedia information according to the first type of multimedia information specifically includes:

determining a target object from the captured video information;

performing highlighting processing on the video information and/or audio information corresponding to the target object.

The target object is determined from the captured video information in at least one of the following ways:

determining the target object according to a detected target-object designation operation;

determining the target object according to the number and positions of the multiple objects in the captured video information.

Preferably, performing highlighting processing on the audio information corresponding to the target object specifically includes:

detecting the captured video information to determine the number of objects in the video information and the position and orientation of each object;

determining the audio information corresponding to each object according to its position and orientation;

determining the audio information corresponding to the target object, and highlighting it.
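The three steps above (detect objects, associate per-object audio by position/orientation, highlight the target's audio) can be sketched as a gain adjustment over direction-tagged audio tracks. The gain values and the nearest-azimuth association rule are illustrative assumptions:

```python
def highlight_target_audio(objects, audio_tracks, target_id,
                           boost=2.0, attenuate=0.5):
    """objects: list of (object_id, azimuth_degrees) from video detection.
    audio_tracks: list of (azimuth_degrees, samples), e.g. from a
    microphone array's direction-of-arrival separation.

    Each object is associated with the nearest-azimuth track; the target's
    track is boosted and all other tracks are attenuated."""
    def nearest_track(azimuth):
        return min(range(len(audio_tracks)),
                   key=lambda i: abs(audio_tracks[i][0] - azimuth))

    association = {oid: nearest_track(az) for oid, az in objects}
    target_track = association[target_id]
    out = []
    for i, (_az, samples) in enumerate(audio_tracks):
        gain = boost if i == target_track else attenuate
        out.append([s * gain for s in samples])
    return out
```

Video-side highlighting (e.g. enlarging or outlining the target) would use the same object-to-position association.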

Preferably, processing the second type of multimedia information according to the first type of multimedia information specifically includes:

storing the captured video information according to the captured audio information, wherein the stored content includes at least one of the following:

the captured video information and audio information;

the video information and audio information corresponding to the target object.

Preferably, the method further includes: in response to a received play trigger operation, playing the video information and audio information in a play mode that matches the stored content, wherein the play mode includes at least one of the following:

when the captured video information and audio information are stored, playing the captured video information and audio information in association with each other;

when the captured video information and audio information are stored, playing the target object in the captured video information in association with its corresponding audio information;

when the captured video information and audio information are stored, playing each object in the captured video information in association with its corresponding audio information;

when the video information and audio information corresponding to the target object are stored, playing the video information and audio information corresponding to the target object in association with each other.

The present invention further provides an apparatus for multimedia enhancement processing, including:

a multimedia information acquisition module, configured to acquire the first type of multimedia information and the second type of multimedia information respectively captured by two multimedia capture devices;

a processing module, configured to process the second type of multimedia information according to the first type of multimedia information.
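The two-module apparatus above can be sketched as follows; the class names, constructor parameters, and stubbed capture devices are illustrative assumptions, not from this disclosure:

```python
class MultimediaAcquisitionModule:
    """Acquires the first and second types of multimedia information from
    two capture devices, stubbed here as zero-argument callables."""
    def __init__(self, first_device, second_device):
        self.first_device = first_device
        self.second_device = second_device

    def acquire(self):
        # Both devices capture simultaneously in the real apparatus.
        return self.first_device(), self.second_device()

class ProcessingModule:
    """Processes the second type of information according to the first,
    with the concrete enhancement supplied as a function."""
    def __init__(self, enhance_fn):
        self.enhance_fn = enhance_fn

    def process(self, first_info, second_info):
        return self.enhance_fn(second_info, first_info)
```

In the image-enhances-video mode, for example, `first_device` would be the still-image camera and `enhance_fn` the image-guided video enhancement.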

In the embodiments of the present invention, the first type of multimedia information and the second type of multimedia information respectively captured by two multimedia capture devices are acquired, and the second type of multimedia information is processed according to the first type; that is, the association between the two types of multimedia information is used to process the second type based on the first. In the prior art, each type of multimedia information (such as pictures and videos) is generally processed separately using only an enhancement algorithm and the information itself; the association between two types of multimedia information captured at the same time is neither considered nor exploited for enhancement, so problems such as distortion and low definition of images or video frames arise. In the present invention, two types of multimedia information are captured at the same time and one type is enhanced according to the other. Because the individual characteristics of, and the association between, the two types of multimedia information are fully considered during enhancement, the limitations of enhancing each type separately with only an enhancement algorithm and its own information are overcome, which greatly improves the quality of the enhanced multimedia information and preserves its fidelity and definition.

Additional aspects and advantages of the present invention will be set forth in part in the following description; they will become apparent from that description or may be learned by practice of the invention.

Description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flowchart of a multimedia information processing method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of brightness adjustment with a conversion curve in a prior-art video brightness and color enhancement method;

Fig. 3 is a schematic diagram of denoising a video frame in the prior art using blur kernel estimation on a blurred image;

Fig. 4 is a schematic diagram of the execution steps of a joint video enhancement mode according to a specific embodiment of the present invention;

Fig. 5 is a schematic diagram of video capture on a handheld smart terminal according to a specific embodiment of the present invention;

Fig. 6 is a schematic diagram of the joint video enhancement mode on a handheld smart terminal according to a specific embodiment of the present invention;

Fig. 7 is a schematic diagram of the joint video enhancement mode on a monitoring terminal according to a specific embodiment of the present invention;

Fig. 8 is a first schematic diagram of an image deblurring enhancement mode according to a specific embodiment of the present invention;

Fig. 9 is a second schematic diagram of the image deblurring enhancement mode according to a specific embodiment of the present invention;

Fig. 10 is a schematic diagram of the left-right split-screen video layout of a multi-focus-area joint play mode according to a specific embodiment of the present invention;

Fig. 11 is a schematic diagram of the top-bottom split-screen video layout of the multi-focus-area joint play mode according to a specific embodiment of the present invention;

Fig. 12 is a schematic diagram of the large/small-screen video layout of the multi-focus-area joint play mode according to a specific embodiment of the present invention;

Fig. 13 is a schematic diagram of the global-area video layout of the multi-focus-area joint play mode according to a specific embodiment of the present invention;

Fig. 14 is a schematic diagram of switching between the large and small screens in the large/small-screen video layout of the multi-focus-area joint play mode according to a specific embodiment of the present invention;

Fig. 15 is a schematic diagram of large/small-screen recording and playback in the large/small-screen video layout of the multi-focus-area joint play mode according to a specific embodiment of the present invention;

Fig. 16 is a schematic diagram of audio and video highlighting in a target-object highlight play mode according to a specific embodiment of the present invention;

Fig. 17 is a schematic structural diagram of a multimedia information processing apparatus according to an embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, where identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and are not to be construed as limiting it.

Terms such as "module" and "system" used in this application are intended to include computer-related entities, such as, but not limited to, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a module may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. Both an application running on a computing device and the computing device itself may be modules. One or more modules may reside within one process and/or thread of execution, and a module may be localized on one computer and/or distributed between two or more computers.

Fig. 1 shows a schematic flowchart of a multimedia information processing method according to an embodiment of the present invention.

Step S110: acquire the first type of multimedia information and the second type of multimedia information respectively captured by two multimedia capture devices. Step S120: process the second type of multimedia information according to the first type of multimedia information.

It should be noted that the two multimedia capture devices may be placed on the same terminal device, which may be a mobile phone, a tablet, a monitoring device, or the like. The two multimedia capture devices capture multimedia information simultaneously: one captures the first type of multimedia information, and the other captures the second type of multimedia information.

It can be seen from the above that the first type and the second type of multimedia information are interrelated. In the prior art, multimedia information (such as pictures and videos) is generally processed separately using only an enhancement algorithm and the information itself; the association between two types of multimedia information captured at the same time is neither considered nor exploited for enhancement, so problems such as distortion and low definition of images or video frames arise. In the present invention, two types of multimedia information are captured at the same time, and the second type is enhanced according to the first type. Because the individual characteristics of, and the association between, the two types of multimedia information are fully considered during enhancement, the limitations of enhancing each type separately with only an enhancement algorithm and its own information are overcome, which greatly improves the quality of the enhanced multimedia information and preserves its fidelity and definition.

Specifically, the multimedia information includes at least one of image information, video information, and audio information.

To upgrade camera configurations, binocular (dual) cameras have become a focus of development and promotion for major manufacturers. Compared with a monocular camera, a binocular camera has some natural advantages: it provides two sets of capture parameters, its two cameras can be set to different shooting modes, and three-dimensional depth information can be obtained to improve the accuracy of segmentation, recognition, tracking, and positioning. Most existing terminal devices with binocular cameras use the depth information only to provide more operation modes for image shooting, such as merging the images shot by the left and right cameras into one high-resolution image, segmenting a target region using the depth information, or refocusing after taking a full-depth-of-field photo. The inventors have found that the prior art does not make full use of the two cameras' dual sets of capture parameters to enhance image and video quality; how to make full use of the advantages of a binocular camera to improve existing video and image quality is a problem that still needs to be solved.

It should be noted that, in the present invention, the multimedia capture devices may be the binocular camera of a terminal device, or may be implemented in other ways, which is not limited here. The two cameras may be placed side by side to simulate human binocular vision, or may be arranged in other ways, which is also not specifically limited here.

When the first type of multimedia information is image information, the second type of multimedia information may be video information. Specifically, one camera of the terminal device captures video information while the other camera simultaneously captures corresponding image information, and the captured images are used to enhance the video; this enhancement processing mode of the terminal device may be called the image-enhances-video mode.

Alternatively, when the first type of multimedia information is video information, the second type of multimedia information may be image information. Specifically, one camera of the terminal device captures image information while the other camera simultaneously captures corresponding video information, and the captured video is used to enhance the image; this enhancement processing mode may be called the video-enhances-image mode.

Since the images or videos captured by a camera have many quality indicators, such as brightness and resolution, one or more indicators of the video or image can be enhanced when enhancing video with images or images with video. Accordingly, an embodiment of the present invention proposes that processing the second type of multimedia information according to the first type of multimedia information specifically includes:

determining the indicator to be enhanced that corresponds to the captured second type of multimedia information, and enhancing the determined indicator of the captured second type of multimedia information according to the captured first type of multimedia information.

The above indicators include at least one of the following: resolution, color, brightness, noise, and blur.

It should be noted that, in the detailed description of the following embodiments, the ways of processing the second type of multimedia information according to the first type are, specifically: the image-enhances-video mode, in which image information is used to process video information; the video-enhances-image mode, in which video information is used to process image information; the multi-focus-area joint play mode, in which video information is used to process video information; and the target-object highlight play mode, in which audio information is used to process video information. Embodiments 1 to 8 are specific embodiments of the image-enhances-video mode; Embodiment 9 is a specific embodiment of the video-enhances-image mode; Embodiment 10 is a specific embodiment of the multi-focus-area joint play mode; Embodiment 11 is a specific embodiment of the target-object highlight play mode.

In Embodiments 1 to 8 of the present invention, for the captured video and images, the images are used to enhance the video, and the corresponding determined indicators include the following five: resolution, brightness, color, noise, and blur. A high-resolution image is shot and the video resolution is enhanced to obtain a high-resolution video; a high-quality image is shot and the video brightness is adjusted to raise the brightness of video shot in low-light environments; a high-quality image is shot and the video color is adjusted to improve the color contrast and RGB color distribution of video shot under non-ideal conditions; a low-noise image is shot and the video is denoised to obtain a low-noise, high-quality video; a sharp image is shot and the video frames are deblurred using the image to obtain video with improved definition. In Embodiment 9 of the present invention, the video is used to enhance the image, and the corresponding determined indicator includes blur: when a long-exposure, high-brightness image is shot and turns out blurred, short-exposure video frames are used to enhance the blurred image, obtaining a bright and sharp image.

In the image-enhances-video flow, the main/auxiliary cameras, the indicators to be enhanced, the capture parameters, and the enhancement strategy parameters are first set; the two cameras are then started, with the main camera shooting video and the auxiliary camera shooting images; meanwhile, according to the enhancement strategy parameters, the indicators to be enhanced in the captured video are enhanced based on the captured images. The terminal device may store the captured data as required and subsequently play or display the stored data.
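The flow just described is, in outline, a small pipeline: configure, capture on both cameras, then apply one enhancement per selected indicator. A hedged sketch, in which the camera callables and per-indicator strategy functions are illustrative stand-ins:

```python
def image_enhances_video_flow(main_camera, aux_camera, indicators,
                              capture_params, strategy_params, store=None):
    """Outline of the image-enhances-video flow described above.
    main_camera/aux_camera: callables taking capture parameters;
    strategy_params: maps each indicator name to an enhancement function
    (video, images) -> video. All hooks are assumptions for illustration."""
    video = main_camera(capture_params)    # main camera shoots video
    images = aux_camera(capture_params)    # auxiliary camera shoots images
    for indicator in indicators:           # e.g. "resolution", "brightness"
        video = strategy_params[indicator](video, images)
    if store is not None:
        store(video, images)               # optional storage for later playback
    return video
```

The video-enhances-image flow below is the same pipeline with the roles of the two cameras, and of the video and image data, swapped.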

In the video-enhances-image flow, the main/auxiliary cameras, the indicators to be enhanced, the capture parameters, and the enhancement strategy parameters are first set; the two cameras are then started, with the main camera shooting images and the auxiliary camera shooting video; meanwhile, according to the enhancement strategy parameters, the indicators to be enhanced in the captured images are enhanced based on the captured video.

It should be noted that, in the detailed description of the following embodiments, in the step of enhancing the determined indicators of the captured second type of multimedia information according to the captured first type, different determined indicators can be enhanced, and different indicators correspond to different enhancement modes. When a single one of the above indicators is enhanced, the enhancement modes include, but are not limited to: resolution enhancement mode, color enhancement mode, brightness enhancement mode, denoising enhancement mode, and deblurring enhancement mode. At least two of the above indicators may also be enhanced together, which may be called a joint enhancement mode.

Embodiment 1: the enhancement processing mode is the resolution enhancement mode of the image-enhances-video mode

In terms of video resolution enhancement, the maximum video resolution of existing terminal devices is generally smaller than the maximum image resolution; for example, on one class of mobile terminal the maximum image resolution is 5312*2988 while the maximum video resolution is 3840*2160. Limited by the CPU and memory of the terminal device, the video resolution must be lowered relative to the image resolution so that users can view what they shoot in real time; otherwise the video cannot be processed in real time. One way to raise video resolution is to upscale each video frame by interpolation, but the frame details obtained by such methods become blurred. Another way is to enhance low-resolution video with high-resolution images: a mapping model is trained with a large number of high-resolution images and corresponding low-resolution videos, and the model is used to enhance the video resolution. That is, pixel blocks are extracted from the images and videos to build a training database, a mapping model is learned from the database, and the model transfers high-frequency detail to the low-resolution video to obtain high-resolution video. Such methods require training data to be collected in advance: a small amount of training data yields weak generalization and poor enhancement, while a large amount requires large storage space. For video resolution enhancement, upscaling using only the information in each frame merely changes the image size; it neither provides users with richer detail nor truly raises the resolution. In addition, high-resolution video occupies more memory; how to raise video resolution without incurring too much memory consumption is also a problem the prior art has not considered.
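The mapping-model idea described above (pair low-resolution patches with their high-resolution counterparts, then transfer high-frequency detail by lookup) can be illustrated with a nearest-neighbour patch substitution. This is a toy sketch of the principle only, using flat lists as "patches" and L1 distance; it is not the disclosed method:

```python
def build_patch_database(low_res_patches, high_res_patches):
    """Pair each low-resolution patch with its high-resolution counterpart,
    as extracted from corresponding training images/videos."""
    return list(zip(low_res_patches, high_res_patches))

def enhance_frame(frame_patches, database):
    """For each low-resolution patch of a video frame, substitute the
    high-resolution patch whose low-resolution counterpart is nearest
    under L1 distance (a stand-in for the learned mapping model)."""
    out = []
    for patch in frame_patches:
        _lo, hi = min(database,
                      key=lambda pair: sum(abs(a - b)
                                           for a, b in zip(pair[0], patch)))
        out.append(hi)
    return out
```

The database size versus quality trade-off noted above is visible even here: more patch pairs give better matches but cost more storage and search time.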

In the embodiments of the present invention, the multimedia information includes image information and video information. The multimedia acquisition device may be a binocular camera.

First, the resolution enhancement mode of image-enhanced video is enabled. Then the two cameras of the binocular camera are started to capture image information and video information respectively. Next, the acquisition parameters and key frames of the cameras are set adaptively, and the resolution of the video information is enhanced according to those acquisition parameters and key frames. Finally, the enhanced result is compressed, transmitted, and played.

Step 1: enable the resolution enhancement mode of image-enhanced video.

The indicator to be enhanced for the captured second type of multimedia information is determined by at least one of the following:

determining a matching indicator to be enhanced according to a detected enhancement-enabling trigger operation;

determining a matching indicator to be enhanced according to a preset configuration;

adaptively determining the indicator to be enhanced according to an adaptive parameter matching method.

The adaptive parameter matching method is determined by one or more of: the device-related state, enhancement-enabling history data, the acquisition environment, the acquisition parameters, and the content of the multimedia information captured in real time by the multimedia acquisition device.

The device-related state includes at least one of: device battery state, device storage state, and device motion state while capturing multimedia information. The content of the multimedia information captured in real time by the multimedia acquisition device includes at least one of: scene brightness, semantic content, and sharpness of salient objects. Preferably, if at least two indicators to be enhanced are determined, an enhancement order for them is determined, and the indicators of the captured second type of multimedia information are enhanced in that order according to the captured first type of multimedia information.

Specifically, the enhancement order of the indicators to be enhanced is determined by at least one of the following:

an enhancement-order-setting trigger operation; a preset configuration; an adaptive enhancement-order-setting method.

The adaptive enhancement-order-setting method is determined by one or more of: the device-related state, enhancement-setting history data, the acquisition environment, the acquisition parameters, the content of the multimedia information captured in real time by the multimedia acquisition device, and the mutual influence between the indicators.

The content of the multimedia information captured in real time by the multimedia acquisition device includes at least one of scene brightness and semantic content.

In Embodiment 1 of the present invention, specifically, the user can enable the resolution enhancement mode of image-enhanced video through an enhancement-enabling trigger operation such as voice, a key press, a gesture, a biometric feature, or an external controller. The terminal device can also enable the mode through a preset configuration (such as a system default setting), or according to an adaptive parameter matching method, for example adaptively enabling it based on the device-related state, enhancement-mode enabling history data, the acquisition environment, the acquisition parameters, and the content of the multimedia information captured in real time by the camera. The mode can also be enabled by any combination of user-triggered settings and system default settings; for example, the user starts the image-enhanced video mode with a key press, and the terminal device then adaptively enables the resolution enhancement mode within that mode according to the enhancement-mode enabling history data.

Regarding voice activation, the terminal device presets a certain voice phrase as the enabling command, for example "start enhanced video resolution". When the terminal device receives this voice command from the user, it performs speech recognition on it and determines that the resolution enhancement mode of the image-enhanced video mode should be enabled.

Regarding key activation, the terminal device presets a certain key as the enabling command. The key may be a hardware key; for example, a long press of the volume key indicates enhancing video resolution, and after receiving the user's long-press volume key event the terminal device confirms that the resolution enhancement mode of the image-enhanced video mode should be enabled. The key may also be a virtual key, such as a virtual control button on the screen: the terminal device displays the mode's button on the interactive interface and, after receiving the event of the user tapping the virtual key, confirms that the mode should be enabled. When enabling via a key, different characteristics of the user's press, such as pressure, speed, duration, and frequency, can also represent different meanings; for example, a hard, quick tap on the virtual control indicates enabling the mode.

Regarding gesture activation, the terminal device presets a certain gesture as the enabling command. Gestures include screen gestures, such as double-tapping or long-pressing the screen; when enabling via a screen gesture, the pressure, speed, duration, and frequency of the user's gesture can represent different meanings. For example, any one of a light press (pressure below a first predetermined value), a hard press (pressure greater than or equal to the first predetermined value), a long press (press duration exceeding a second predetermined value), or a quick double tap may indicate enabling. Gestures also include mid-air gestures, such as shaking, flipping, or tilting the terminal; different directions, angles, speeds, and strengths of these motions can represent different meanings, and any one of shaking up and down, shaking left and right, drawing a circle in the air, and so on may indicate enabling the mode. The above gestures may be used singly or in any combination, such as long-pressing the screen while shaking the terminal device.

Regarding biometric activation, biometric features include but are not limited to handwriting features and fingerprint features. For example, if the fingerprint detected by the terminal device matches a pre-registered user fingerprint, the device confirms that the resolution enhancement mode of the image-enhanced video mode should be enabled.

Regarding system default enabling, the terminal device sets the resolution enhancement mode of the image-enhanced video mode to the on or off state by default, without user interaction.

Regarding adaptive enabling according to the device-related state of the terminal device, the device-related state includes battery level, storage (such as memory), motion state, and so on. A first and a second predetermined battery level can be set, where the first predetermined level, for example 20%, is less than the second, for example 80%: when the terminal's battery level is below the first predetermined level the video resolution enhancement mode is disabled, and when it is above the second predetermined level the mode is enabled. Alternatively, only a single enabling level is set: when the terminal's battery level exceeds it the video resolution enhancement mode is enabled by default, and otherwise it is disabled.
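The two-threshold battery policy above can be sketched as a simple hysteresis rule. The threshold values and the assumption that the previous state is kept between the two thresholds are illustrative, not mandated by the text:

```python
LOW_BATTERY = 20   # first predetermined level (%): below this, disable
HIGH_BATTERY = 80  # second predetermined level (%): above this, enable

def resolution_enhancement_enabled(battery_pct: float, currently_on: bool) -> bool:
    """Decide the video resolution enhancement mode from the battery level.

    Between the two thresholds the previous state is kept, so the mode
    does not flicker on and off as the battery drains past a single cutoff.
    """
    if battery_pct < LOW_BATTERY:
        return False
    if battery_pct > HIGH_BATTERY:
        return True
    return currently_on

print(resolution_enhancement_enabled(10, True))   # False: battery too low
print(resolution_enhancement_enabled(90, False))  # True: battery high enough
```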

Regarding adaptive enabling according to enhancement-enabling history data, the device counts how many times the video resolution enhancement mode was enabled during the most recent several captures, for example the last 10. If the count exceeds a certain threshold, for example 5, the terminal device automatically enables the resolution enhancement mode; otherwise it disables it. Alternatively, whether to enable for the current shoot is determined from the setting used in the previous shoot.
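The history-vote rule above amounts to counting past enablings over a sliding window; a minimal sketch (window size, threshold, and function name are illustrative):

```python
from collections import deque

def enable_from_history(history, window=10, threshold=5):
    """Enable the mode if it was enabled more than `threshold` times
    in the last `window` captures (each entry: True = enabled then)."""
    recent = deque(history, maxlen=window)  # keeps only the newest `window` entries
    return sum(recent) > threshold

print(enable_from_history([True] * 7 + [False] * 3))  # True: 7 of last 10
```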

Regarding the acquisition environment, it is measured by sensors; for example, information such as ambient brightness is obtained through a brightness sensor. The mode can be enabled adaptively according to ambient brightness: for example, when the average ambient brightness is below a set threshold the mode is enabled, and otherwise it is disabled.

Regarding the acquisition parameters, they specifically include at least one of white balance, exposure time, sensitivity, high dynamic range, resolution, focus area, and video frame acquisition frequency. The mode can be enabled adaptively according to the acquisition parameters; for example, when the video's exposure time is too long (above a set threshold) the mode is enabled, and otherwise it is disabled.

Regarding adaptive enabling according to content captured in real time, the content captured in real time specifically includes scene brightness, semantic content, sharpness of salient objects, and so on. The mode can be enabled adaptively according to scene brightness, for example enabled when the average scene brightness is below a set threshold and disabled otherwise. It can be enabled adaptively according to the semantic content of the scene, for example enabled when target objects such as vehicles or people are detected in the scene and disabled otherwise. The signal-to-noise ratio of a salient region of the scene, such as a license-plate region, can also be measured: if the salient region's signal-to-noise ratio is below a given threshold, the terminal device automatically enables the resolution enhancement mode.
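The salient-region SNR test can be sketched as follows. The mean-over-standard-deviation SNR estimate, the threshold value, and the synthetic "license-plate crop" are all illustrative assumptions, not details specified by the patent:

```python
import numpy as np

def region_snr(region: np.ndarray) -> float:
    """Crude SNR estimate of an image region: mean intensity over its std."""
    std = region.std()
    return float(region.mean() / std) if std > 0 else float("inf")

def should_enhance(region: np.ndarray, snr_threshold: float = 5.0) -> bool:
    """Enable resolution enhancement when the salient region is too noisy."""
    return region_snr(region) < snr_threshold

# Synthetic noisy crop standing in for a detected license-plate region.
rng = np.random.default_rng(0)
noisy_plate = rng.normal(100, 40, size=(32, 64))
print(should_enhance(noisy_plate))
```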

Step 2: use one camera of the binocular camera to capture images and the other camera to capture video.

This step includes setting the main and auxiliary cameras, setting the camera parameters and enhancement strategy parameters, and setting the selection of key frames.

Set the main acquisition device and the auxiliary acquisition device among the two multimedia acquisition devices;

when the video information is processed according to the image information, the video information is captured by the main acquisition device and the image information by the auxiliary acquisition device;

when the image information is processed according to the video information, the image information is captured by the main acquisition device and the video information by the auxiliary acquisition device.

The main and auxiliary acquisition devices among the two multimedia acquisition devices are set in at least one of the following ways:

setting the main and auxiliary acquisition devices according to a detected setting trigger operation;

setting the main and auxiliary acquisition devices according to a preset configuration;

adaptively setting the main and auxiliary acquisition devices according to an adaptive device-setting method.

The adaptive device-setting method is determined by one or more of: the device-related state, device-setting history data, and the content of the multimedia information captured in real time by the multimedia acquisition device.

The device-related state includes the device battery state and/or storage state; the content of the multimedia information captured in real time by the multimedia acquisition device includes at least one of: frame-proportion distribution, the position of the target object in the frame, and frame quality information.

Step 2.1: set the main and auxiliary cameras.

Let the main camera capture the video information and the auxiliary camera capture the image information. The terminal device can set the main and auxiliary cameras in one of the following three ways: first, by a preset configuration of the terminal device (such as a default setting); second, by the terminal device receiving a setting trigger operation sent by the user through at least one of keys, gestures, an external controller, and so on; third, by the terminal device adaptively deciding which camera is the main camera and which is the auxiliary camera according to the device-related state, device-setting history data, content captured in real time, and so on.

Regarding system default settings, the terminal device defaults one camera to be the main camera and the other to be the auxiliary camera; for example, the camera on one side facing the scene is the main camera and the camera on the other side is the auxiliary camera.

Regarding key settings, the terminal device is preset to control the main and auxiliary cameras through keys. The keys may be hardware keys; for example, the volume "+" key sets the camera on one side as the main camera, and the volume "-" key sets the camera on the other side as the main camera. The keys may also be virtual keys, such as virtual controls or menus on the screen.

Regarding gesture settings, the terminal device presets a certain gesture to switch the main and auxiliary cameras; for example, drawing a circle clockwise sets one side as the main camera, and drawing a circle counterclockwise sets the other side as the auxiliary camera.

Regarding external controllers, external controllers include but are not limited to styluses, remote controls, smart glasses, smart head-mounted devices, and similar devices. These devices can access the terminal device through one of the following technologies: Wi-Fi, NFC, Bluetooth, and a data network. Such a device is equipped with a control area, such as keys or a touch screen, to control which of the main and auxiliary cameras is started; for example, the up key on a remote control indicates that the camera on one side is the main camera, and the down key indicates that the camera on the other side is the main camera.

Regarding adaptive setting according to device-setting history data, the device counts how the main and auxiliary cameras were set during the most recent several captures, for example the last 9; if the camera on one side was set as the main camera more often, the terminal device automatically sets that camera as the main camera at startup. Alternatively, the main and auxiliary cameras for the current shoot are determined from the settings used in the previous shoot.

Regarding the terminal device adaptively setting the cameras according to content captured in real time, the terminal device selects the main camera according to the content captured by the two cameras; for example, the captured content is scored and the camera with the higher score becomes the main camera. Scoring parameters include but are not limited to frame-proportion distribution, the position of the target object in the frame, and frame quality information. The terminal device can adaptively adjust the main camera in real time according to the pictures being shot, and splice the video clips shot by the different main cameras in shooting-time order to obtain a complete, uninterrupted video sequence.
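The score-and-pick selection above can be sketched as a weighted sum over per-frame features. The particular features, weights, and function names are illustrative assumptions; the patent only requires that the higher-scoring camera become the main camera:

```python
def frame_score(composition, target_centering, quality,
                weights=(0.3, 0.3, 0.4)):
    """Weighted score of one camera's current frame; each feature in [0, 1]."""
    w1, w2, w3 = weights
    return w1 * composition + w2 * target_centering + w3 * quality

def pick_main_camera(features_a, features_b):
    """Return 'A' or 'B' for whichever camera's frame scores higher."""
    return "A" if frame_score(*features_a) >= frame_score(*features_b) else "B"

# Camera A: good quality but target off-centre; camera B: well-centred target.
print(pick_main_camera((0.5, 0.2, 0.9), (0.6, 0.9, 0.7)))  # B
```

Re-evaluating this choice periodically during capture, and concatenating the winning camera's clips by timestamp, yields the uninterrupted spliced sequence described above.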

If the resolution enhancement mode of image-enhanced video is enabled while a video is being shot, the camera currently capturing the video information can be set as the main camera, with the other camera as the auxiliary camera capturing image information; the cameras can also be set adaptively according to the content captured in real time, as described above.

Step 2.2: set the acquisition parameters and enhancement strategy parameters of the cameras.

Specifically, set the acquisition parameters and enhancement strategy parameters of the multimedia information;

obtain the first type of multimedia information and the second type of multimedia information captured by the two multimedia acquisition devices based on the acquisition parameters, and, according to the enhancement strategy parameters, perform the corresponding enhancement processing on the second type of multimedia information based on the first type;

where the acquisition parameters specifically include at least one of white balance, exposure time, sensitivity, high dynamic range, resolution, focus area, and video frame acquisition frequency.

The acquisition parameters and enhancement strategy parameters of the multimedia information are set in any of the following ways:

setting the acquisition parameters and enhancement strategy parameters according to a detected parameter-setting operation;

setting the acquisition parameters and enhancement strategy parameters according to a preset parameter configuration;

adaptively setting the acquisition parameters and enhancement strategy parameters according to an adaptive parameter-setting method.

The adaptive parameter-setting method is determined by at least one of: the device-related state, parameter history data, the acquisition environment, and the content of the multimedia information captured in real time by the multimedia acquisition device.

The device-related state includes at least one of: device battery state, device storage state, and device motion state while capturing multimedia information; the content of the multimedia information captured in real time includes at least one of: scene brightness, semantic content, sharpness of salient objects, resolution, and exposure time.

Specifically, set the acquisition parameters of the binocular camera device and the internal enhancement strategy parameters used during enhancement processing. The camera's acquisition parameters are the camera parameters that need to be set during capture; the enhancement strategy parameters are the internal parameters of the selected video enhancement method. The setting methods include but are not limited to the following four. The first is a preset parameter configuration (such as a system-default fixed value), that is, defining a parameter as a fixed value; for example, in a machine-learning-based brightness enhancement method, the pixel-block size is set to a fixed value such as 5 pixels. The second is the terminal device receiving a parameter-setting operation sent by the user through at least one of voice, keys, an external controller, and so on. The third is the terminal device adaptively setting the acquisition parameters and enhancement strategy parameters according to the device-related state, parameter history data, the acquisition environment, or content captured in real time. The fourth is adaptive setting combined with user adjustment; for example, the terminal device fixes a default set of parameter values and the user adjusts some of them through keys.

Regarding voice settings under user interaction, for example the terminal device presets the voice command "capture high-dynamic-range images"; if the terminal device receives this command, it performs speech recognition on it and determines that capture of high-dynamic-range images should be enabled. Or the terminal device presets the voice command "set image white balance to fluorescent", in which case the image white balance is set to fluorescent. Or the terminal device presets the voice command "increase image exposure value", in which case the image's exposure value is increased.

Regarding key settings under user interaction, the keys may be hardware keys; for example, the "+" key increases the exposure value, the "-" key decreases it, and "HOME" triggers capture of high-dynamic-range images. The keys may also be virtual keys, such as sliders, buttons, and menus on the screen: virtual keys are laid out on the interactive interface, and after the terminal device detects the event of the user tapping a virtual key it confirms the change of the setting parameter. Characteristics of the user's press, such as pressure, speed, duration, and frequency, can also represent different meanings; for example, a light press decreases the exposure value and a hard press increases it.

Regarding external-controller settings under user interaction, external controllers include but are not limited to styluses, remote controls, smart watches, smart glasses, smart head-mounted devices, smart clothing, and remote devices. These controllers access the terminal device through at least one of the following technologies: Wi-Fi, infrared, Bluetooth, and a network. The controller is equipped with a control area such as keys or a touch screen to control the terminal device; for example, a stylus makes mid-air gestures, with upward set to increase the exposure value and downward to decrease it, and when the terminal device recognizes these operations it adjusts the parameter accordingly. A remote control has keys for adjusting white balance, exposure, video capture frequency, and so on; when the controller detects that the user has pressed a key, it sends the command to the terminal device to adjust the parameter.

Regarding adaptive settings according to scene brightness and semantic content, the white balance can be adjusted according to the type of scene; for example, if the terminal device recognizes that the scene is daytime it sets the white balance to daylight, and if the scene is a yellowish night scene it sets the white balance to tungsten. The focus can be adjusted according to objects of interest in the scene, for example by locating a salient region or a human-body region and focusing on it. The exposure can be adjusted according to the scene's lighting; for example, if the average scene brightness is detected to be below a given threshold the terminal device increases the exposure, and otherwise decreases it. The number of shots captured for high dynamic range can be adjusted adaptively according to the brightness variance of the scene; for example, if the brightness variance is above a first given threshold the number of shots is increased, and if it is below a second given threshold the number is decreased. The pixel-block size used in brightness enhancement can be set according to the size of the captured image, for example defining the block size as the image size multiplied by a scale factor.
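The exposure and HDR rules above can be sketched as a small decision function over scene statistics. The threshold values and the step-based return convention are illustrative assumptions:

```python
def adapt_capture(mean_brightness, brightness_var,
                  low_mean=60, var_hi=2000, var_lo=500):
    """Map simple scene statistics to capture adjustments.

    Returns (exposure_step, hdr_step): raise exposure when the scene is
    dark, and add/remove HDR bracket shots when the brightness variance
    is high/low, following the rules in the text above.
    """
    exposure_step = +1 if mean_brightness < low_mean else -1
    if brightness_var > var_hi:
        hdr_step = +1          # high variance: take more HDR shots
    elif brightness_var < var_lo:
        hdr_step = -1          # low variance: fewer shots suffice
    else:
        hdr_step = 0
    return exposure_step, hdr_step

print(adapt_capture(40, 2500))  # dark, high-contrast scene -> (1, 1)
```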

Regarding adaptive settings according to the device-related state, the device-related state includes battery level, storage (such as memory), and so on. The exposure value can be controlled according to the battery level to reduce computation: when the battery level is below a first predetermined level, for example 50%, the exposure value is decreased, and when it is below a second predetermined level, for example 5%, no high-dynamic-range setting is performed, where the first predetermined level is greater than the second. The parameters of deblurring can also be controlled according to the battery level to reduce computation: below the first predetermined level, for example 50%, the size of the blur kernel is reduced, and below the second predetermined level, for example 5%, the blur-kernel size is fixed at its minimum. The battery level can control the parameters of machine-learning-based color enhancement to reduce computation: below a first predetermined level, for example 50%, the number of words emphasized in the word dictionary is reduced, and below a second predetermined level, for example 5%, the word count is fixed at its minimum. The battery level can also determine the parameters of brightness enhancement to reduce computation: below a first predetermined level, for example 50%, the number of pixels to be sampled in the machine-learning-based method is reduced, and below a second predetermined level, for example 20%, the model-mapping method in the machine-learning approach is replaced with a Gaussian mixture model method. The video frame capture setting can be adjusted according to memory: if the remaining memory is larger than a first predetermined space, for example 1 GB, the capture is automatically adjusted to a specified high setting, for example 3640*1920; conversely, if the remaining memory is smaller than a second predetermined space, for example 300 MB, the capture is adjusted to a specified low setting, for example 1920*1080.
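The memory-based capture rule above is a two-threshold switch; a minimal sketch (threshold spaces and the "keep current setting in between" behaviour are illustrative assumptions):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def capture_setting(free_bytes,
                    high=(3640, 1920), low=(1920, 1080),
                    high_space=1 * GIB, low_space=300 * MIB):
    """Pick the capture setting from remaining memory.

    Returns the high setting when free memory exceeds `high_space`,
    the low setting when it is below `low_space`, and None (meaning
    keep the current setting) in between.
    """
    if free_bytes > high_space:
        return high
    if free_bytes < low_space:
        return low
    return None

print(capture_setting(2 * GIB))    # (3640, 1920)
print(capture_setting(100 * MIB))  # (1920, 1080)
```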

Regarding adaptive settings according to parameter history data, for example the exposure is adjusted according to the history of exposure values set by the user. The setting methods include but are not limited to the following: compute the exposure values and mean image brightness in the parameter history data, regress a mapping table with the least-squares method, and adjust the exposure value according to that table. As another example, the high-dynamic-range setting is adjusted according to the user's preferences; the setting methods include but are not limited to the following: count, over the most recent N captures, for example 10, how many times high dynamic range was set during brightness enhancement, and if the count is greater than N/2 then set high dynamic range preferentially. Alternatively, the parameter values for the current shoot are set to the parameter values of the previous shoot.
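The least-squares mapping from mean image brightness to a preferred exposure value can be sketched with a one-dimensional linear fit; the sample history below is fabricated for illustration, and a real implementation might fit a lookup table rather than a single line:

```python
import numpy as np

# Fabricated history: (mean image brightness, exposure value the user chose).
brightness = np.array([30, 60, 90, 120, 150], dtype=float)
exposure = np.array([2.0, 1.5, 1.0, 0.5, 0.0])

# Least-squares linear fit: exposure ~= slope * brightness + intercept.
slope, intercept = np.polyfit(brightness, exposure, deg=1)

def suggest_exposure(mean_brightness: float) -> float:
    """Exposure value predicted from the regressed mapping."""
    return slope * mean_brightness + intercept

print(round(suggest_exposure(75), 2))
```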

Step 2.3: adaptively set the key-frame capture frequency.

When the first type of multimedia information is image information and the second type is video information, the video information captured by one multimedia capture device is obtained, together with the corresponding key-frame image information captured simultaneously by another multimedia capture device according to the key-frame capture frequency; the indicators of the captured video that need enhancement are then enhanced according to the captured key-frame image information.

Preferably, the method further includes setting the key-frame capture frequency, where the setting is done in at least one of the following ways:

setting the key-frame capture frequency according to a preset frequency setting;

adaptively setting the key-frame capture frequency according to an adaptive frequency-setting scheme.

The adaptive frequency-setting scheme is determined by one or more of the following: the device-related state, capture-frequency history data, the capture environment, the capture parameters, and the content of the multimedia information captured in real time by the multimedia capture device.

The device-related state includes at least one of: the device battery state, the device storage state, and the device motion state while capturing multimedia information. The content of the multimedia information captured in real time includes at least one of scene brightness and semantic content.

Regarding the preset frequency setting, key frames can be selected at a fixed system frequency; e.g., the capture frequency is fixed at a given rate such as once per second. If the video is captured at 30 frames per second, this means one image is captured every 30 frames.

Regarding adaptive setting according to content captured in real time, the key-frame capture frequency can be selected adaptively according to changes in scene brightness: the mean brightness of the scene is measured in real time, and if the difference between the mean brightness of adjacent frames exceeds a first threshold, e.g., 50, key-frame selection is triggered. Key frames can also be selected adaptively according to semantic content: overall scene descriptors are extracted, including but not limited to color histograms, gradient histograms, texture histograms, and features learned by a neural network; the descriptor difference between adjacent frames is computed, and if it exceeds a second threshold, key-frame selection is triggered. Scene brightness or content can also be combined with a given frequency: the scene is still sampled at a fixed rate, but the rate is adapted to scene changes. If the scene brightness or content changes frequently, the key-frame capture frequency is increased; conversely, if the scene brightness or content remains essentially unchanged, the frequency is reduced.
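The brightness-difference trigger can be sketched as follows. The threshold of 50 comes from the text; representing a frame as a flat list of gray values is an assumption made for illustration.

```python
def mean_brightness(frame):
    """Mean gray value of a frame given as a flat list of pixel intensities."""
    return sum(frame) / len(frame)

def should_capture_keyframe(prev_frame, cur_frame, threshold=50):
    """Trigger key-frame selection when adjacent frames differ in mean brightness."""
    diff = abs(mean_brightness(cur_frame) - mean_brightness(prev_frame))
    return diff > threshold

dark = [20] * 4    # a dark frame
light = [90] * 4   # a much brighter frame
```

The same skeleton applies to the semantic variant: replace `mean_brightness` with a descriptor (color/gradient/texture histogram) and the absolute difference with a histogram distance.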

Regarding selecting the key-frame capture frequency according to the device-related state, the device-related state includes battery level, storage (e.g., memory), device motion state, and so on. The frequency can be set according to battery level: for example, when the battery is below a first predetermined level, e.g., 50%, the key-frame capture frequency is reduced, and when it is above a second predetermined level, e.g., 80%, the frequency is increased, where the first predetermined level is less than the second. The frequency can be set according to memory: for example, when free memory is below a first predetermined value, e.g., 500 MB, the key-frame capture frequency is reduced, and when it is above a second predetermined value, e.g., 700 MB, the frequency is increased. The capture frequency can also be adjusted according to the motion state of the terminal device, judged from the device's internal sensors: if the motion amplitude exceeds a threshold, the key-frame capture frequency is increased to guarantee that enough high-quality key-frame images are obtained.
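The device-state rules above can be combined into one interval policy, sketched below. The doubling/halving factors and the combination order are assumptions; the battery, memory, and motion thresholds follow the examples in the text.

```python
def keyframe_interval(base_interval, battery_pct, free_mem_mb, motion_amp,
                      motion_th=1.0):
    """Scale the key-frame capture interval (frames between key frames).

    Low battery or low memory lengthens the interval (fewer key frames);
    high battery or ample memory shortens it; strong device motion also
    shortens it so enough high-quality key frames are captured.
    """
    interval = base_interval
    if battery_pct < 50 or free_mem_mb < 500:
        interval *= 2            # capture key frames less often
    elif battery_pct > 80 or free_mem_mb > 700:
        interval /= 2            # capture key frames more often
    if motion_amp > motion_th:   # strong motion: densify key frames
        interval /= 2
    return interval
```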

Step 3: enhance the resolution of the video information.

The captured video information is divided into several video segments according to the captured key-frame image information; the key-frame images on both sides of each segment are used to enhance that segment's indicators that need enhancement.
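The division into key-frame-bounded segments can be sketched as below; frames are identified by index, and `keyframe_idxs` (a hypothetical name) marks where key-frame images were captured.

```python
def split_into_segments(n_frames, keyframe_idxs):
    """Split frame indices [0, n_frames) into segments bounded by key frames.

    Each segment is a (start, end) half-open range; the key frames on the
    two sides of a segment are then used to enhance it.
    """
    bounds = sorted(set([0] + list(keyframe_idxs) + [n_frames]))
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]
```

For a 90-frame video with key frames at indices 30 and 60, this yields the three segments (0, 30), (30, 60), and (60, 90).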

Specifically, when the indicators to be enhanced include at least one of resolution, color, and brightness, the enhancement processing includes an enhancement method based on multi-view reconstruction and/or an enhancement method that builds an enhancement model through machine learning.

The multi-view-reconstruction-based enhancement specifically includes: establishing a matching relationship between the video pixels of the captured video information and the image pixels of the key-frame image information, and replacing the matched video pixels with the image pixels.

The machine-learning-based model-building approach specifically includes: extracting video pixels at the positions of the key-frame images in the captured video; building a mapping enhancement model from the video pixels to the image pixels of the key-frame image information by machine learning; and, at the positions of non-key-frame images in the captured video, converting the video pixels through the mapping enhancement model.
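A minimal sketch of such a mapping enhancement model: fit a gain/offset map from video-pixel intensities to the co-located key-frame image-pixel intensities, then apply it to frames where no key-frame image exists. The linear per-channel form is an assumption for illustration; the patent leaves the learner unspecified.

```python
def fit_pixel_map(video_px, image_px):
    """Least-squares gain/offset mapping: video intensity -> image intensity."""
    n = len(video_px)
    mx = sum(video_px) / n
    my = sum(image_px) / n
    var = sum((x - mx) ** 2 for x in video_px)
    gain = sum((x - mx) * (y - my) for x, y in zip(video_px, image_px)) / var
    offset = my - gain * mx
    return gain, offset

def apply_pixel_map(model, frame):
    """Convert the pixels of a non-key frame through the learned mapping."""
    gain, offset = model
    return [gain * px + offset for px in frame]

# Fit at a key frame, then apply within the same video segment.
model = fit_pixel_map([10, 20, 30], [25, 45, 65])  # image = 2 * video + 5
```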

Step 4: storage of the enhanced video.

When the first type of multimedia information is image information and the second type is video information, processing the second type according to the first specifically includes storing the captured video information according to the captured image information, where the stored content includes at least one of the following:

the video information obtained by enhancing the captured video according to the captured image information;

the captured video information and image information;

the captured video information and the enhancement model used when enhancing the video;

the video information obtained by enhancing the captured video according to the captured image information, together with the captured image information.

In this embodiment the terminal device generates four types of data: the captured original video information, the key-frame image information, the resolution-enhancing mapping enhancement model, and the enhanced video information.

The first option is to store the enhanced video information directly after step 3 without saving the key-frame image information; i.e., the enhancement is completed before storage. During storage, the video area shows the picture from the original video frames with a buffering icon displayed over it, indicating that enhancement is in progress; once buffering completes, storage is finished.

The second option is to save the original video information together with the learned enhancement models, without saving the key-frame image information; the enhancement of step 3 is performed when the user opens the video. Since each video segment has its own enhancement model, all enhancement models are stored separately and a mapping table between enhancement models and video frame numbers is built.

The third option is to store the original video information and the captured key-frame image information first. This option does not require processing the original video before storage, so storage can complete as soon as shooting finishes. The terminal device schedules processing time automatically according to how busy the processor is, learns the enhancement model, applies the step-3 enhancement to the original video, and deletes the key-frame image information once enhancement is done. This storage option is also a high-definition video compression-storage method proposed by the present invention: to save video storage space, low-resolution video information plus high-resolution image information can be stored instead of storing high-resolution video directly. The low-resolution video and high-resolution images can be obtained by the two cameras of the present invention capturing video and images simultaneously, or the high-resolution images can be obtained by extracting key frames directly from a high-resolution video, which is then compressed into low-resolution video. From the low-resolution video and the associated high-resolution images, a resolution-enhancement method can recover high-definition video.

The fourth option is to store the enhanced video information after the step-3 enhancement while also keeping the key-frame image information. The enhanced video can be obtained by any of the first three options; the key-frame images can be saved in the video sequence alongside the video, or saved in the photo list with an association established between the video information and the image information. Keeping the key-frame images provides the user with high-definition images for other operations.

The storage option is set through at least one of the following:

determining the storage option according to a detected setting operation;

determining the storage option according to a preset (e.g., the system default);

adaptively determining the storage option according to an adaptive storage-setting scheme.

Specifically, the adaptive storage-setting scheme is determined by at least one of the device-related state and the storage-setting history data.

The setting operation is performed through the user's voice, keys, gestures, or manipulation of an external controller. The device-related state includes device information such as storage space and battery level.

That is, regarding how to set the storage option, the present invention provides three setting methods, and the terminal device can select the storage option by any one of them: first, the terminal device's default setting; second, the terminal device accepts the user setting or changing the storage option through voice, keys, an external controller, or any combination of these; third, the terminal device adaptively sets the storage option according to storage space, battery level, or history data.

Regarding the system default, the terminal device sets one of the four storage options as the default and uses it to store videos and images until it receives an instruction to change the storage option.

Regarding voice setting under user interaction, for example the terminal device predefines the voice command "store the enhanced video"; when this command is received, speech recognition is performed on it and the storage option is set to storing the enhanced video information.

Regarding key setting under user interaction, the keys may be hardware keys, e.g., the volume up/down keys to cycle through the four storage options and the Home key to confirm the currently selected option as final. The keys may also be virtual keys, such as on-screen buttons, menus, or a virtual keyboard on the interactive interface; after the terminal device detects that the user has tapped a virtual key, it confirms the selected storage option.

Regarding gesture setting under user interaction, the terminal device predefines certain gestures for selecting the storage option. Gestures include screen gestures, e.g., swiping the screen left-to-right or right-to-left to switch the storage option. They also include mid-air gestures, such as shaking or tilting the terminal, where different directions carry different meanings, e.g., shaking up and down to switch the storage option, or tilting left and right to switch it. These may be single gestures or any combination of gestures, e.g., sliding the right hand horizontally to select an option while shaking up and down to confirm the current selection as final.

Regarding setting through an external controller, external controllers include but are not limited to a stylus, a remote control, a smart watch, smart glasses, a smart helmet, smart clothing, or a remote device. These controllers communicate with the interactive interface via Wi-Fi, infrared, Bluetooth, and/or a network. For example, certain buttons on a remote control correspond to different storage options; when the user's button press is detected, it is sent to the interactive control system and the storage option is set.

Regarding adaptive setting according to storage space, different storage options can be selected depending on the available space: if the remaining storage space is below a threshold, e.g., below 50% of the terminal device's storage, the third (compressed) storage option is set; conversely, if the remaining space is above the threshold, e.g., above 50% of the device's storage, the storage option is not affected by storage space.

Regarding adaptive setting according to battery level, the storage option can be controlled by the battery. When the battery is below a threshold, e.g., below 50%, a low-power option is selected, i.e., the third option (directly storing the original video information and the key-frame image information) or the second option (the original video and the learned models), so the video is not enhanced before storage; when the battery is below a second predetermined level, e.g., below 15%, the option with the lowest power consumption is selected, i.e., the third option (original video information and key-frame image information); if the battery is above the threshold, e.g., above 50%, the storage option is not affected by battery level.
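The space- and battery-driven selection can be combined into one policy, sketched below. The option numbering follows the four storage options above; the precedence of battery over storage space, and choosing option 3 rather than option 2 at moderate battery, are assumptions, since the text allows either.

```python
def choose_storage_option(free_space_pct, battery_pct):
    """Pick one of the four storage options from device state.

    1 = enhanced video only            2 = original video + models
    3 = original video + key frames    4 = enhanced video + key frames
    Thresholds follow the examples in the text.
    """
    if battery_pct < 15:
        return 3   # lowest power: store raw video + key frames, no enhancement
    if battery_pct < 50:
        return 3   # low power: defer enhancement (option 2 would also qualify)
    if free_space_pct < 50:
        return 3   # low space: the compressed-storage option
    return 1       # otherwise: store the enhanced video directly
```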

Regarding adaptive setting according to storage-setting history data, for example the storage options the user has chosen in the past are analyzed to infer the user's preference, and the preferred storage option is set.

Step 5: video playback.

In response to a received playback trigger operation, the video information is played in a playback mode matching the stored content, where the playback mode includes at least one of the following:

when the enhanced video information is stored, playing the enhanced video information directly;

when the captured video and image information are stored, enhancing the captured video according to the captured images and then playing it;

when the captured video information and the enhancement model are stored, enhancing the captured video through the enhancement model and then playing it;

when the enhanced video information and the captured image information are stored, playing the enhanced video information and the captured image information in association.
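The storage-to-playback matching above amounts to a small dispatch table, sketched here; the string keys and action descriptions are hypothetical labels for the four stored-content cases.

```python
def playback_action(stored_content):
    """Map stored content to the matching playback behaviour."""
    actions = {
        "enhanced_video":        "play directly",
        "video+images":          "enhance with images, then play",
        "video+model":           "enhance with model, then play",
        "enhanced_video+images": "play with linked key-frame images",
    }
    return actions[stored_content]
```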

For example, the terminal device plays the stored video according to a detected playback trigger operation.

Scheme 1: the corresponding storage option is storing the enhanced video information. The terminal device detects the user's open operation and directly opens the enhanced video. When a play operation is detected, e.g., the user tapping the play button, the video plays immediately, so the user can view the enhanced result directly.

Scheme 2: the corresponding storage option is storing the original video information and the enhancement model. The terminal device detects the user's open operation and opens the combination of original video and enhancement model. When a play operation is detected, e.g., the user tapping the play button, the terminal device begins enhancing the original video with the enhancement model, completes the operation within the allowed buffering time, and then plays the enhanced video.

Scheme 3: the corresponding storage option is storing the original video information and the key-frame image information. The terminal device detects the user's open operation and opens the combination of original video and captured key-frame images. When a play operation is detected, e.g., the user tapping the play button: if the terminal device has already completed the enhancement, the video can be played directly and the enhanced video viewed; if the device has only done part of the work in the background and has not completed the enhancement step, buffering time is needed after the play operation to finish the video enhancement, after which the enhanced video is played and the user sees the enhanced video information.

Scheme 4: the corresponding storage option is storing the enhanced video information and the key-frame image information. The terminal device detects the user's open operation and opens the combination of enhanced video and key-frame images. Playback of the enhanced video includes not only the first three schemes; the key-frame images enable additional playback modes. Once the association between the video information and the image sequence is established, the image sequence can link to the video and the video can link to the image sequence; jumping between image and video is done by long-pressing the key-frame image sequence or the video play button, and can also be done through voice, gestures, and other settings.

Scheme 5: the terminal device detects the user's open operation and opens the combination of enhanced video and key frames. The position of each key frame in the video is recorded at storage time; the user can tap an image in the key-frame sequence to jump to the corresponding video position and play the video from there. For example (but not limited to this), long-pressing the video brings up the key-frame sequence, and tapping an image in the sequence starts playback. When viewing the video, the user is presented with a collection of images; the user can open the image sequence to browse it and then tap an image to play the video.

Scheme 6: the terminal device detects the user's open operation, e.g., the user tapping play; a buffering indicator is shown and resolution conversion is performed. The conversion can include, but is not limited to, the following methods. First, convert the whole video segment to high resolution and then play from the beginning. Second, start playback once part of the video is buffered and convert while playing; limited by the phone's processing power, playback may pause to finish buffering. Third, tap one of the key-frame images and convert only the video segment starting from the position corresponding to that image, buffering by either of the first two methods and playing from that position. Fourth, the user can choose to view the low-resolution video and, if interested in it, switch to high-resolution playback via a button or other operation, after which any of the first three playback methods can be used. The compressed video can also be shared to reduce the phone's energy consumption, with multiple playback choices available on the recipient's side.

Embodiment 2: the enhancement processing mode is the color-enhancement mode within the image-enhanced-video mode.

In terms of video brightness and color enhancement, videos currently shot by terminal devices in low light are generally dark and of poor quality. The prior art mostly processes the video with prior models. For example, a non-linear mapping table is set up: as shown in FIG. 2, a histogram-equalization mapping table is computed for each frame from its brightness information, a conversion curve is obtained by weighting the preset mapping table and the computed one, and the pixel brightness in the video frame is adjusted according to this curve to enhance the frame. Another approach preprocesses the video with gamma correction to obtain its transmission parameters, then uses an improved image-degradation restoration model to process the original video with those parameters into the final enhanced result. These methods enhance the video using only its own information, and the enhancement strength must be tuned through manually set parameters. For brightness and color adjustment, applying preset mapping curves to every frame can change brightness and color, but the change is constrained by the preset parameters, and whether the adjustment trend suits every scene is questionable. Under different lighting conditions the parameters need to be adjusted adaptively, and preset mapping curves may produce unrealistic results, e.g., a video shot at night adjusted to be too bright, or color distortion in some image regions.

The basic flow of this embodiment is as follows: first, enable the color-enhancement mode within the image-enhanced-video mode and start two cameras to capture image information and video information respectively; next, set the main and auxiliary cameras, set the cameras' capture parameters and enhancement-strategy parameters, and set the selection of key-frame image information, while color-enhancing the video according to the enhancement-strategy parameters and the key-frame images; finally, compress, transmit, and play the captured result.

Step 1: enable the color-enhancement mode within the image-enhanced-video mode.

In this embodiment, step 1 is enabled in a way similar to Embodiment 1; the difference lies in the wording of the enabling instruction, e.g., the voice command is "start enhancing video color", the hardware-key command is a long press of the Home key, the virtual key is a button for enhancing video color, and the gesture command is shaking the terminal, which will not be repeated here.

Step 2: use one camera of the binocular pair to capture image information and the other to capture video information.

This step mainly includes setting the capture parameters and enhancement-strategy parameters, setting the main and auxiliary cameras, setting the parameters for capturing images and video, and selecting key-frame image information.

Step 2.1: set the main and auxiliary cameras.

In this embodiment, the main and auxiliary cameras can be set in a way similar to step 2.1 of Embodiment 1, which will not be repeated here.

Step 2.2: set the camera's capture parameters and enhancement-strategy parameters.

In this embodiment, the capture parameters and enhancement-strategy parameters can be set in a way similar to step 2.2 of Embodiment 1, which will not be repeated here.

步骤2.3,自适应设置关键帧采集频率。Step 2.3, adaptively set the key frame collection frequency.

在本实施例中,关键帧的选取方式可以采用实施例一步骤2.3中类似的 关键帧选取方式,在此不再赘述。In this embodiment, the key frame selection method can be similar to the key frame selection method in step 2.3 of the first embodiment, which is not repeated here.

Step 3: Perform color enhancement on the video information.

The video is divided into segments bounded by key frames, and each segment is enhanced using the key-frame images on either side of it. Color enhancement methods include, but are not limited to, the following two: a method based on multi-view reconstruction and a method based on machine learning.
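The segmentation described above can be sketched as follows. This is an illustrative Python sketch, not from the patent: the function name and the segment representation (start, exclusive end, bounding key frames) are hypothetical.

```python
# Split a video's frame indices into segments bounded by key frames, and
# record which key frames are available to enhance each segment.

def split_into_segments(num_frames, keyframe_indices):
    """Return (start, end, bounding_keyframes) triples; end is exclusive."""
    bounds = sorted(set(keyframe_indices))
    edges = [0] + bounds + [num_frames]
    segments = []
    for i in range(len(edges) - 1):
        start, end = edges[i], edges[i + 1]
        if start >= end:
            continue  # skip empty segments (e.g. a key frame at index 0)
        left = edges[i] if edges[i] in bounds else None
        right = edges[i + 1] if edges[i + 1] in bounds else None
        segments.append((start, end, [k for k in (left, right) if k is not None]))
    return segments
```

A segment at the very start or end of the video has only one bounding key frame; interior segments are enhanced using the key frames on both sides.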

For color enhancement based on multi-view reconstruction, a multi-view reconstruction method establishes a matching relationship between video pixels and image pixels, and the image pixels are used to replace and regenerate the video pixels.

For color enhancement based on machine learning, a mapping model from video pixels to image pixels is learned at the key frames, and that mapping model is applied at the non-key frames to transform the video pixels.
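A minimal sketch of such a learned mapping, assuming the key-frame image has already been aligned to the video key frame (the patent obtains this alignment via multi-view matching): fit one polynomial per color channel from video pixel values to image pixel values, then apply it to non-key frames. Function names and the polynomial form are illustrative assumptions, not the patent's method.

```python
import numpy as np

def fit_color_mapping(video_kf, image_kf, degree=2):
    """Fit one polynomial per color channel (video value -> image value)."""
    coeffs = []
    for c in range(3):
        x = video_kf[..., c].ravel().astype(np.float64)
        y = image_kf[..., c].ravel().astype(np.float64)
        coeffs.append(np.polyfit(x, y, degree))
    return coeffs

def apply_color_mapping(frame, coeffs):
    """Apply the fitted per-channel mapping to a (non-key) video frame."""
    out = np.empty(frame.shape, dtype=np.float64)
    for c in range(3):
        out[..., c] = np.polyval(coeffs[c], frame[..., c].astype(np.float64))
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

In practice one model is fitted per key frame and applied only to the video segment that key frame bounds.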

Step 4: Storage of the video.

In this embodiment, the terminal device generates four types of data: the captured original video information, the key-frame image information, the color enhancement model, and the enhanced video information. Storage methods for the different data types include, but are not limited to, the following four.

The four storage methods in this embodiment are the same as the four storage methods introduced in step 4 of Embodiment 1 and are not repeated here.

The storage method is set in a manner similar to that of step 4 of Embodiment 1; the difference is that this enhancement mode does not change the video resolution. Of the four storage methods, the first, storing the enhanced video information, occupies the least space, so when the free storage space falls below a certain threshold, for example below 50% of the terminal device's storage capacity, the first storage method is selected. The other setting methods are the same and are not repeated here.

Step 5: Playback of the video.

After the data captured by the binocular camera is compressed, stored, and transmitted, the high-quality video information is decompressed and played back during the playback stage. Depending on the storage method, the playback method includes, but is not limited to, one of the first five methods introduced in step 5 of Embodiment 1; the five playback modes are the same as the first five described there and are not repeated here.

Embodiment 3: the enhancement processing mode is the brightness enhancement mode within the image-enhanced-video mode

The basic flow of this embodiment is as follows: first, the brightness enhancement mode within the image-enhanced-video mode is enabled, and two cameras are started to capture image information and video information respectively; next, the cameras' capture parameters and enhancement-strategy parameters are set and the selection of key-frame image information is configured, while the video is brightness-enhanced according to the enhancement-strategy parameters and the key-frame image information; finally, the captured results are compressed, transmitted, and played back.

Step 1: Enable the brightness enhancement mode within the image-enhanced-video mode.

In this embodiment, step 1 uses an activation method similar to that of Embodiment 1; the difference lies in the description of the activation command. For example, the voice command is "start enhanced video brightness", the key-press command is a long press of the End key, the virtual key is a button for enhancing video brightness, and the gesture command is shaking the terminal, among others; these are not repeated here.

Step 2: Use one camera of the binocular camera to capture image information and the other camera to capture video information.

This step mainly includes setting the capture parameters and enhancement-strategy parameters, designating the primary and secondary cameras, setting the parameters for capturing images and video, and selecting the key-frame image information.

Step 2.1: Designate the primary and secondary cameras.

In this embodiment, the primary and secondary cameras may be designated in a manner similar to that of step 2.1 of Embodiment 1, which is not repeated here.

Step 2.2: Set the camera parameters and enhancement-strategy parameters.

In this embodiment, the capture parameters and enhancement-strategy parameters may be set in a manner similar to that of step 2.2 of Embodiment 1, which is not repeated here.

Step 2.3: Select key frames.

In this embodiment, key frames may be selected in a manner similar to that of step 2.3 of Embodiment 1, which is not repeated here.

Step 3: Perform brightness enhancement on the video.

Brightness enhancement is performed in the L channel of the Lab color model or the V channel of the HSV color model: the image or video is first converted to the corresponding color space, and the L or V channel component is then extracted and enhanced independently. The video is divided into segments bounded by key frames, and each segment is enhanced using the key-frame images on either side of it. Enhancement methods include, but are not limited to, the following two: a method based on multi-view reconstruction and a method based on machine learning.

For brightness enhancement based on multi-view reconstruction, a multi-view reconstruction method establishes a matching relationship between video pixels and image pixels, and the brightness of the image pixels is used to replace and regenerate the brightness of the video pixels.

For brightness enhancement based on machine learning, a mapping model from video pixel brightness to image pixel brightness is learned at the key frames, and that mapping model is applied at the non-key frames to transform the video brightness.
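One simple brightness transfer of this kind, sketched under assumptions not stated in the patent: take V = max(R, G, B) as the HSV value channel, match the video frame's V histogram to the key-frame image's V histogram, and rescale the RGB pixels by the ratio of new to old V. Names and the quantile-mapping approach are illustrative.

```python
import numpy as np

def v_channel(rgb):
    """V channel of HSV: per-pixel maximum over R, G, B."""
    return rgb.astype(np.float64).max(axis=-1)

def match_brightness(frame_rgb, keyframe_rgb):
    """Match the frame's V histogram to the key frame's, keep hue/saturation."""
    v_src = v_channel(frame_rgb)
    v_ref = v_channel(keyframe_rgb)
    # Classic histogram matching via sorted-value (quantile) mapping.
    src_flat = v_src.ravel()
    order = np.argsort(src_flat)
    ref_sorted = np.sort(v_ref.ravel())
    idx = np.linspace(0, ref_sorted.size - 1, src_flat.size).astype(int)
    matched = np.empty_like(src_flat)
    matched[order] = ref_sorted[idx]
    v_new = matched.reshape(v_src.shape)
    # Rescale RGB by the per-pixel brightness gain.
    gain = np.where(v_src > 0, v_new / np.maximum(v_src, 1e-6), 1.0)
    out = frame_rgb.astype(np.float64) * gain[..., None]
    return np.clip(out, 0, 255).astype(np.uint8)
```

Scaling all three channels by the same gain changes V while leaving hue and saturation essentially unchanged, which is the point of enhancing the value channel independently.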

Step 4: Storage of the video.

In this embodiment, the terminal device generates four types of data: the captured original video information, the key-frame image information, the brightness enhancement model, and the enhanced video information. Storage methods for the different data types include, but are not limited to, the following four.

The four storage methods in this embodiment are the same as the four storage methods introduced in step 4 of Embodiment 2 and are not repeated here.

Step 5: Playback of the video.

In this embodiment, the video may be decompressed and played back in the same manner as in step 5 of Embodiment 2, which is not repeated here.

Embodiment 4: the enhancement processing mode is the denoising enhancement mode within the image-enhanced-video mode

Regarding video denoising and deblurring: because the exposure time of each video frame is short, video contains more noise than still images. Most existing techniques apply image denoising methods to individual video frames to achieve video denoising. Assuming the noise is Gaussian, it is removed with filtering methods, or alternatively with dictionary-learning methods. Video blur can arise from several factors: 1) hand shake when shooting video with a handheld phone; 2) noise-reduction processing that blurs some regions; and 3) defocus blur caused by inaccurate focusing, among others. Most existing techniques estimate the blur kernel of a blurred image and deconvolve the blurred image with the estimated kernel to obtain a sharp image, as shown in FIG. 3.

Under the same shooting conditions, noise in video frames is more noticeable than in image information. This embodiment denoises low-quality video segments using high-quality key-frame image information with similar content, thereby improving video quality. The specific implementation steps are as follows:

Step 1: Enable the denoising enhancement mode within the image-enhanced-video mode.

In this embodiment, step 1 uses an activation method similar to that of Embodiment 1; the differences lie in some command descriptions and threshold settings, described below.

When the user enables the video denoising mode, different command descriptions are used: for example, the voice command is "start video denoising", the key-press command is a long press of the Home key, the virtual key is a video-denoising button, and the gesture command is shaking the terminal, among others; these are not repeated here.

When the terminal device adaptively enables the mode based on device state and mode-activation history, different threshold settings are used, for example a battery-level threshold or an activation-count threshold. The other descriptions are the same and are not repeated here.

Three methods are given for the terminal device to adaptively enable the video denoising enhancement mode by adaptive mode matching. In the first, the terminal device detects the shooting environment with existing methods and enables the denoising enhancement mode if the environment is detected as low-light, for example a night scene. In the second, the terminal device inspects the camera's shooting parameters and enables the denoising enhancement mode if the sensitivity exceeds a certain threshold. The third may be a combination of the first two, enabling the denoising enhancement mode only when both conditions hold simultaneously: the light intensity of the shooting environment is below a certain threshold and the sensitivity is above a certain threshold.

When the terminal device adaptively enables the video denoising mode based on content captured in real time, the signal-to-noise ratio of each captured frame is computed, and the video denoising mode is enabled when the signal-to-noise ratio falls below a certain threshold.
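The activation rules above can be collected into one decision function. This is a hypothetical sketch: the threshold values, mode names, and function signature are all illustrative placeholders, not values from the patent.

```python
# Assumed placeholder thresholds for the adaptive activation checks.
LUX_THRESHOLD = 10.0    # below this, the scene counts as low light
ISO_THRESHOLD = 1600    # above this, sensor noise is expected to dominate
SNR_THRESHOLD = 20.0    # dB; below this, the captured frames count as noisy

def should_enable_denoising(lux, iso, frame_snr_db, mode="combined"):
    """Decide whether to enable the denoising enhancement mode."""
    if mode == "environment":          # method 1: low-light detection
        return lux < LUX_THRESHOLD
    if mode == "parameters":           # method 2: sensitivity check
        return iso > ISO_THRESHOLD
    if mode == "combined":             # method 3: both conditions at once
        return lux < LUX_THRESHOLD and iso > ISO_THRESHOLD
    if mode == "content":              # real-time SNR check
        return frame_snr_db < SNR_THRESHOLD
    raise ValueError(f"unknown mode: {mode}")
```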

Step 2: Use one camera of the binocular camera to capture images and the other camera to capture video.

This step mainly includes setting the capture parameters and enhancement-strategy parameters, designating the primary and secondary cameras, setting the parameters for capturing images and video, and selecting the key-frame image information.

Step 2.1: Designate the primary and secondary cameras.

In this embodiment, the primary and secondary cameras may be designated in a manner similar to that of step 2.1 of Embodiment 1, which is not repeated here.

Step 2.2: Set the cameras' capture parameters and enhancement-strategy parameters.

In this embodiment, the capture parameters and enhancement-strategy parameters may be set in a manner similar to that of step 2.2 of Embodiment 1, which is not repeated here.

In addition to the aforementioned parameter-setting methods, this embodiment proposes a new parameter-setting method for the denoising mode. In this embodiment, one of the binocular cameras captures image information, the other captures video information, and the video information is denoised. The settings here mainly concern the resolution, exposure time, and sensitivity of image capture. For reasons of both energy saving and algorithm design, the resolution at which the secondary camera captures image information should match the resolution setting of the video information; if the lowest available image resolution is higher than the current video resolution, the lowest image resolution is used for capture. Battery level permitting, the exposure time can be adjusted according to the terminal device's motion state. If the device's internal sensors detect that it is being held steady, the exposure time is increased, for example to the maximum of the exposure-time range, brightening the image and video information and reducing noise interference; if the sensors detect shaking or another motion trend, the exposure time is reduced appropriately to avoid blurring the image information, which would degrade the video denoising result.
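The exposure rule just described reduces to a small piece of decision logic. The sketch below is illustrative only; the thresholds, the exposure range, and the motion-amplitude scale are assumed values, not from the patent.

```python
def choose_exposure_ms(motion_amplitude, battery_pct,
                       exposure_range_ms=(10, 100), current_ms=33,
                       motion_threshold=0.2, battery_threshold=20):
    """Pick an exposure time from the device's motion state and battery."""
    lo, hi = exposure_range_ms
    if battery_pct < battery_threshold:
        return current_ms            # power saving: do not adjust
    if motion_amplitude < motion_threshold:
        return hi                    # steady: maximum exposure, less noise
    return lo                        # shaking: short exposure, less blur
```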

Step 2.3: Select the key-frame image information.

In this embodiment, key frames may be selected in a manner similar to that of step 2.3 of Embodiment 1, which is not repeated here.

For the denoising mode, this embodiment gives a new way of selecting key-frame image information. The lower the light intensity, the higher the key-frame capture frequency; conversely, the higher the light intensity, the lower the key-frame capture frequency. The terminal device's own sensors detect its motion state, and if the motion amplitude exceeds a certain threshold, the capture frequency is increased to ensure that enough high-quality key-frame images are obtained. If the terminal device moves while a key frame is being captured, another key-frame image is captured when the motion ends as a backup for the previous key frame; if the previous key-frame image turns out to be blurred, the backup key-frame image information can be used to denoise the video segment, guaranteeing the denoising result.
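The adaptive frequency rule above can be sketched as a function from light level and motion amplitude to a key-frame interval. The base interval, reference lux value, and scaling are hypothetical choices for illustration.

```python
def keyframe_interval_frames(lux, motion_amplitude,
                             base_interval=60, min_interval=10,
                             lux_ref=100.0, motion_threshold=0.2):
    """Frames between key frames: fewer frames (higher frequency) in low
    light or strong motion."""
    # Lower light -> smaller interval between key frames.
    interval = int(base_interval * min(1.0, lux / lux_ref))
    if motion_amplitude > motion_threshold:
        interval //= 2               # strong motion: capture twice as often
    return max(min_interval, interval)
```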

Step 3: Use the key-frame sequence to remove noise from the video.

When the metrics to be enhanced include noise, the enhancement processing includes a dictionary-reconstruction-based method and/or a deep-learning-based method.

When a blurred frame to be processed is detected in the original video information, it is determined that deblurring enhancement processing is to be performed on that frame;

where the blurred frame to be processed is detected through at least one of the following:

the motion state of the multimedia capture device; frames in the original video information for which focusing failed; and the result of classifying the original video information with a classifier.

Specifically, each high-quality key frame corresponds to a video subsequence. Suppose key frame 1 corresponds to a subsequence of 60 frames; the video is then denoised using one or more adjacent key-frame images, for example using key frame 1 to denoise those 60 frames. The denoising methods include, but are not limited to, one of the following: a dictionary-reconstruction-based method, such as the NLM (non-local means) algorithm, or a deep-learning-based method, such as a convolutional neural network (CNN).

Compared with traditional denoising methods, this embodiment denoises the noisy video using a reference image, namely the key-frame image, and the existing methods are improved to exploit this property. The two methods are described in detail below.

1) The dictionary-reconstruction-based method, i.e., improved NLM

If the resolution of the input key-frame image differs from that of the video frames, the key-frame image and the video are first brought to a common scale by scaling the video, the key-frame image, or both, so that they have the same size. Stereo matching is then used to compute the disparity between the key-frame image and each frame of the video segment, aligning the pixels of each video frame with the pixels of the key-frame image, i.e., establishing the positional correspondence of consistent pixels between the images. If the input key-frame image and the video frames already have the same resolution, stereo matching is applied directly to compute the disparity for alignment. Once the positional correspondence between key-frame pixels and video-frame pixels is obtained, similar patches are searched for between the images. For each video frame, a patch p of size a*a is taken centered on a pixel of that frame, where a may be preset or adapted to the image size or other factors. Using the correspondence between the video frame and the key-frame image, the position of that pixel in the key-frame image is found, and a neighborhood of size b*b is taken centered on that position, where b likewise may be preset or adapted to the image size or other factors. Centered on each pixel of that neighborhood, a patch q of size a*a is taken, yielding b*b candidate patches in total. The distance between patch p and each of the b*b patches is then computed; the distance formula may be, but is not limited to, the sum of squared differences of pixel values at corresponding positions. For example, with a = 8, the patch distance is the sum of squared differences between the 64 pixels of the low-quality patch and the 64 pixels of the key-frame patch. Two patches are considered similar if their distance is below a certain threshold, and all similar patches below the threshold form the set Ω. The weight w of each similar patch is then computed from its distance (the larger the distance, the smaller the weight, and vice versa), using for example, but not limited to, a Gaussian function. Finally, the high-noise patch is reconstructed from these similar patches by weighted averaging; the original patch may also be assigned a weight so that it contributes a certain proportion to the reconstruction, preserving consistency between the reconstructed patch and the original. Re-estimating every pixel of a video frame with this patch-based reconstruction denoises that frame, and denoising every frame of the video segment denoises the entire segment.

Because of the disparity between the key-frame image and the video frames, some video-frame pixels have no corresponding pixel in the key frame, so no similar patches can be found for them in the high-quality image for reconstruction. For such pixels, the original NLM method can be applied, searching for similar patches within the pixel's own video frame and reconstructing them for denoising. To save processing time, the relationship between video frames can also be exploited: if the content changes little between the frames of a segment, pixels in multiple video frames can be reconstructed at once, i.e., the same pixel position in different frames can be reconstructed from a common set of similar patches, denoising multiple pixels simultaneously.
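A greatly simplified single-channel sketch of the improved NLM above, under assumptions the patent does not make: the key-frame image is taken as already aligned to the noisy frame (zero disparity), a and b are small fixed values, and only the patch centers are re-estimated rather than whole patches. It is meant to show the weighting scheme, not to be a production denoiser.

```python
import numpy as np

def nlm_with_reference(noisy, reference, a=3, b=7, h=10.0):
    """Denoise `noisy` using patches from the aligned `reference` image.

    For each pixel, compare its a*a patch against the b*b candidate patches
    around the same position in the reference, weight candidates with a
    Gaussian of the patch distance, and output the weighted average of the
    reference patch centers."""
    pad_a, pad_b = a // 2, b // 2
    pad = pad_a + pad_b
    ref = np.pad(reference.astype(np.float64), pad, mode="reflect")
    nsy = np.pad(noisy.astype(np.float64), pad, mode="reflect")
    out = np.zeros(noisy.shape, dtype=np.float64)
    rows, cols = noisy.shape
    for y in range(rows):
        for x in range(cols):
            cy, cx = y + pad, x + pad
            p = nsy[cy - pad_a:cy + pad_a + 1, cx - pad_a:cx + pad_a + 1]
            num = den = 0.0
            for dy in range(-pad_b, pad_b + 1):
                for dx in range(-pad_b, pad_b + 1):
                    qy, qx = cy + dy, cx + dx
                    q = ref[qy - pad_a:qy + pad_a + 1, qx - pad_a:qx + pad_a + 1]
                    d2 = ((p - q) ** 2).mean()      # patch distance (SSD/mean)
                    w = np.exp(-d2 / (h * h))       # Gaussian weight
                    num += w * ref[qy, qx]
                    den += w
            out[y, x] = num / den
    return out
```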

2) The deep-learning-based method

If the resolution of the input key-frame image differs from that of the video frames, the same steps as in the first method are taken to unify their scales. Stereo matching is used to compute the disparity between the key-frame image and each frame of the video segment, aligning the video-frame pixels with the key-frame pixels. Based on the alignment, the key-frame image and the video frame are cropped into an image pair of the same size and content: one noise-free image cropped from the key frame and one noisy image cropped from the video frame. Pixels at the same position in the two images depict the same content, but one comes from the high-quality key-frame image (the noise-free pixels) and the other from the low-quality video frame (the noisy pixels). From multiple such image pairs, many pairs of corresponding patches can be extracted. Based on these patches, an existing deep-learning method is used to train a convolutional network with a denoising function, which then denoises the patches of the video frames. The model may be an initial model trained offline on a large number of samples and pre-installed on the terminal device, then refined with the obtained image pairs before denoising the video frames; alternatively, a model may be trained online to denoise the video frames.
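The training data described above reduces to pairs of corresponding patches. The sketch below shows only that extraction step, with an assumed patch size and stride; the CNN training itself is delegated to whatever deep-learning framework is available and is not shown.

```python
import numpy as np

def extract_patch_pairs(clean, noisy, patch=8, stride=8):
    """Extract (clean, noisy) patch pairs on a regular grid from an aligned
    key-frame crop and video-frame crop of the same size."""
    assert clean.shape == noisy.shape, "pairs must be aligned and same size"
    pairs = []
    rows, cols = clean.shape[:2]
    for y in range(0, rows - patch + 1, stride):
        for x in range(0, cols - patch + 1, stride):
            pairs.append((clean[y:y + patch, x:x + patch],
                          noisy[y:y + patch, x:x + patch]))
    return pairs
```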

Step 4: Storage of the video.

In this embodiment, the terminal device generates four types of data: the captured original video information, the key-frame image information, the denoising enhancement model, and the enhanced video information. Storage methods for the different data types include, but are not limited to, the following four.

The four storage methods in this embodiment are the same as the four storage methods introduced in step 4 of Embodiment 2 and are not repeated here.

The storage method is set in the same manner as in step 4 of Embodiment 2 and is not repeated here.

Step 5: Playback of the video.

After the data captured by the binocular camera is compressed, stored, and transmitted, the high-quality video is decompressed and viewed during the playback stage. Depending on the storage method, the playback method includes, but is not limited to, one of the five methods introduced in step 5 of Embodiment 2; the five playback modes are the same as described there and are not repeated here.

Embodiment 5: the enhancement processing mode is the deblurring enhancement mode within the image-enhanced-video mode

When shooting video, several factors can blur the picture, chiefly: 1) hand shake when shooting with a handheld terminal device; 2) blur caused by moving objects; and 3) blur of the target region caused by focusing failure. In this embodiment, one camera of the binocular camera captures images and the other captures video, and the images are used to deblur the video, yielding a high-quality video. The specific steps are as follows:

Step 1: The terminal device determines that the binocular camera is to be used for image-assisted video deblurring, and enables the deblurring enhancement mode within the image-enhanced-video mode.

In this embodiment, step 1 uses an activation method similar to that of Embodiment 1; the differences lie in some command descriptions and threshold settings, described below:

When the user enables the deblurring enhancement mode, different command descriptions are used: for example, the voice command is "start video deblurring", the key-press command is a long press of the Home key, the virtual key is a video-deblurring button, and the gesture command is shaking the terminal, among others; these are not repeated here.

When the terminal device adaptively enables the mode based on device state and mode-activation history, different threshold settings are used, for example a battery-level threshold or an activation-count threshold. The other descriptions are the same and are not repeated here.

Three methods are given for the terminal device to adaptively enable the video deblurring enhancement mode according to the capture environment (i.e., the shooting environment) and parameters. In the first, the terminal device detects the motion trend of the shooting terminal with existing methods and enables the video deblurring enhancement mode if the terminal is in motion, for example shaking of the shooting terminal caused by the hand holding it. In the second, the terminal device inspects the shooting parameters and enables the video deblurring enhancement mode if the video-capture exposure time exceeds a certain threshold, for example 300 ms. The third may be a combination of the first two, enabling the video deblurring enhancement mode only when both conditions hold simultaneously: the shooting terminal is in motion and the exposure time exceeds a certain threshold.

When the terminal device adaptively enables the video deblurring enhancement mode based on content captured in real time, a blurriness index of each captured frame is computed, and the mode is enabled when the index falls below a certain threshold.
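One common blurriness index of this kind, offered here as an assumed example rather than the patent's metric, is the variance of a Laplacian response: a low value indicates little high-frequency detail, i.e., a blurred frame. The threshold is a placeholder.

```python
import numpy as np

def laplacian_sharpness(gray):
    """Variance of a 4-neighbour Laplacian response; low means blurred."""
    g = gray.astype(np.float64)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return lap.var()

def is_blurred(gray, threshold=100.0):
    """Flag a frame as blurred when the sharpness index is below threshold."""
    return laplacian_sharpness(gray) < threshold
```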

步骤2:使用双目摄像头的一个摄像头拍摄图像,另一个摄像头拍摄视 频。Step 2: Use one camera of the binocular camera to capture images and the other camera to capture video.

该步骤主要包括采集参数及增强策略参数的设置,主、辅摄像头的设置, 采集图像和视频的参数设置,以及关键帧图像信息的选取。This step mainly includes the setting of acquisition parameters and enhancement strategy parameters, the settings of main and auxiliary cameras, the parameter settings of acquired images and videos, and the selection of key frame image information.

Step 2.1: Configure the primary and secondary cameras.

In this embodiment, the primary and secondary cameras can be configured in a manner similar to step 2.1 of the first embodiment, which is not repeated here.

Step 2.2: Set the cameras' acquisition parameters and enhancement-strategy parameters.

In this embodiment, the acquisition parameters and enhancement-strategy parameters can be set in a manner similar to step 2.2 of the first embodiment, which is not repeated here.

Beyond the parameter-setting methods described above, this embodiment proposes a new setting method tailored to the deblurring mode. Here one camera of the binocular pair captures images while the other captures video, and the images are used to deblur the video. The settings mainly concern the resolution, exposure time, and sensitivity of image capture. For reasons of both energy saving and algorithm design, the resolution of the images captured by the secondary camera should match the video resolution; if the camera's minimum image resolution is higher than the current video resolution, images are captured at that minimum resolution. If the scene brightness is normal or above a certain threshold, the video and image exposure times are shortened and the sensitivity is raised moderately, reducing the probability of blur. If the device's sensors detect shake or another motion trend, the exposure time is likewise reduced to keep the captured reference images sharp, since a blurred reference image would degrade the video deblurring result.

Step 2.3: Select key-frame image information.

In this embodiment, key-frame image information can be selected in a manner similar to step 2.3 of the first embodiment, which is not repeated here.

For the deblurring mode, this embodiment also provides a new key-frame selection method. When the exposure time grows and the probability of video blur rises, the key-frame capture frequency is increased; conversely, when the exposure time shortens, the frequency is reduced. The device's own sensors monitor its motion state: if the device is moving while a key frame is captured, another key frame is captured once the motion ends and kept as a backup for the previous one. If the original key frame turns out to be blurred, the backup key frame is used to deblur the video segment, preserving the deblurring quality.
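The two rules above can be sketched as follows; the function names, the interval values, and the halving policy are hypothetical illustrations of the exposure-dependent sampling rate and the backup-key-frame fallback, not values from the patent.

```python
def keyframe_interval(exposure_ms: float,
                      base_interval: int = 30,
                      long_exposure_ms: float = 300.0) -> int:
    """Sample key frames more often when long exposures raise blur risk.

    Returns the number of video frames between consecutive key frames.
    """
    if exposure_ms > long_exposure_ms:
        return max(1, base_interval // 2)  # higher capture frequency
    return base_interval


def pick_reference(primary_blurred: bool, primary, backup):
    """Fall back to the backup key frame captured after motion ended."""
    if primary_blurred and backup is not None:
        return backup
    return primary
```

A usage sketch: if the exposure rises past the threshold mid-recording, the interval returned by `keyframe_interval` halves, and any key frame flagged as blurred is replaced by its post-motion backup before deblurring.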

Step 3: Use the key-frame sequence to deblur the video.

When the quality indicators to be enhanced include blur, the enhancement may be based on blur-kernel estimation and/or on deep learning.

When a blurred frame to be processed is detected in the captured video information, enhancement processing is applied to the blur indicator of that frame.

A blurred frame to be processed is detected using at least one of the following kinds of information:

the device motion state when the video frames were captured; the focus information when the video frames were captured; the result of classifying the captured video information with a classifier.

Specifically, the first task is to determine which video frames are blurred. One of the following three schemes (among others) may be used:

First, monitor the device's own motion state with its built-in sensors; if the motion amplitude exceeds a threshold, the video frames captured during that period are marked as blurred frames to be processed. Second, detect frames where focusing failed: if the user specified a focus region but the focus drifted during capture, i.e., the focus missed the target region, that region is treated as a blurred region to be processed. Third, use machine learning: train a classifier on a large set of blurred and sharp images, then classify the video frames with it; frames classified as blurred are the blurred frames to be processed.
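As a simple stand-in for the blur checks above, a sharpness statistic such as the variance of the Laplacian can flag likely-blurred frames. This metric and the threshold are illustrative assumptions; the patent does not specify the blur index or the classifier's internals.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 3x3 Laplacian response; low values suggest blur."""
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    # Valid-mode correlation with the Laplacian kernel, written with shifts.
    for i in range(3):
        for j in range(3):
            out += k[i, j] * gray[i:i + h - 2, j:j + w - 2]
    return float(out.var())


def is_blurred(gray: np.ndarray, threshold: float) -> bool:
    """A frame whose high-frequency energy is below the threshold is blurred."""
    return laplacian_variance(gray) < threshold
```

A frame with strong edges (high Laplacian variance) passes; a flat or smeared frame falls below the threshold and would be routed to the deblurring step.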

If blurred frames are detected, they are deblurred. First, sharp key-frame images whose content is similar to those frames are located. Deblurring methods include, but are not limited to, the following two: one based on blur-kernel estimation and one based on deep learning.

Unlike traditional deblurring methods, this embodiment uses a reference image, namely the key-frame image information, to deblur the blurred video information, and the existing methods are adapted to exploit this reference. The two methods are described in detail below.

1) Blur-kernel estimation

If the resolution of the input key-frame image differs from that of the video frames, their scales are first unified using the same steps as in the denoising method. Stereo matching is then used to compute the disparity between the key-frame image and each frame of the video segment, aligning the pixels of the video frames with those of the key-frame image. Based on the alignment, the key-frame image and the blurred video frame are cropped into image pairs of identical size and identical content: a clean image cropped from the key frame, and a blurred image cropped from the video frame. Either one large image pair or several small image pairs can be cropped. A blur kernel is estimated for each pair by least-squares optimization or another optimization method; multiple pairs yield multiple kernels. If a single kernel is obtained, it is used to deblur the frame; if several kernels are obtained, a weighted average of them gives one averaged kernel, which is then used to deblur the frame.
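A minimal sketch of the least-squares kernel estimate for one aligned clean/blurred pair is shown below. The frequency-domain formulation and the Tikhonov regularization weight `eps` are assumptions for illustration; the text only requires least squares or another optimization method.

```python
import numpy as np


def estimate_blur_kernel(sharp: np.ndarray, blurred: np.ndarray,
                         eps: float = 1e-3) -> np.ndarray:
    """Least-squares blur-kernel estimate in the frequency domain.

    Minimizing ||k * sharp - blurred||^2 per frequency gives
    K = conj(S) * B / (|S|^2 + eps), where S, B are the FFTs of the
    clean and blurred crops and eps regularizes weak frequencies.
    Assumes the pair is pre-aligned and equally sized (circular blur model).
    """
    S = np.fft.fft2(sharp)
    B = np.fft.fft2(blurred)
    K = np.conj(S) * B / (np.abs(S) ** 2 + eps)
    return np.real(np.fft.ifft2(K))
```

With several crops this routine would run once per pair, and the resulting kernels could then be combined by weighted averaging as described above.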

Considering the continuity of video frames, if several consecutive frames all require deblurring, this embodiment offers two treatments. The first performs the deblurring operation above on each frame individually. The second selects a few non-consecutive frames to estimate kernels, leaving several blurred frames between each selected pair: if the kernels estimated for the two nearest selected frames are similar, the intervening frames can be assumed to suffer from a similar kernel and are deblurred with the same one. If the two kernels are not sufficiently similar, one or more frames between them are selected and their kernels recomputed. To minimize computation time and the number of kernel estimates, a bisection scheme can be used (though the method is not limited to it): kernels are estimated at the endpoints of an interval of frames; if the two kernels are similar, all frames in the interval are deblurred with a single shared kernel, and otherwise the interval is split in half and the procedure continues recursively until every video frame has been deblurred.
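The bisection scheme just described can be sketched as a recursion over a frame interval. The function names and the callback interface (`estimate_kernel`, `kernels_similar`, `deblur`) are hypothetical; any concrete kernel estimator and similarity test could be plugged in.

```python
def deblur_range(frames, lo, hi, estimate_kernel, kernels_similar, deblur):
    """Bisect [lo, hi]: if the kernels at both ends are similar, deblur
    every frame in between with the shared endpoint kernel; otherwise
    split the interval and recurse."""
    k_lo, k_hi = estimate_kernel(frames[lo]), estimate_kernel(frames[hi])
    if kernels_similar(k_lo, k_hi):
        return [deblur(f, k_lo) for f in frames[lo:hi + 1]]
    if hi - lo <= 1:  # adjacent frames with different kernels
        return [deblur(frames[lo], k_lo), deblur(frames[hi], k_hi)]
    mid = (lo + hi) // 2
    left = deblur_range(frames, lo, mid, estimate_kernel, kernels_similar, deblur)
    right = deblur_range(frames, mid, hi, estimate_kernel, kernels_similar, deblur)
    return left + right[1:]  # drop the duplicated mid frame
```

In the worst case this degenerates to estimating a kernel per frame, but when long runs share a similar kernel it cuts the number of estimates roughly logarithmically.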

2) Deep-learning-based method

Preprocessing follows the same steps as the first method: the key-frame image and the video are brought to a common scale, and stereo matching aligns the pixels of the video frames with those of the key-frame image. Based on the alignment, the key-frame image and video frame are cropped into pairs of identical size and identical content: a clean image cropped from the key frame and a blurred image cropped from the video frame. From multiple image pairs, or by extracting corresponding pixel patches from them, many patch pairs can be obtained. Using these patch or image pairs, an existing deep-learning method trains a convolutional network with deblurring capability, which is then applied to the video frames. The model can be trained offline on a large sample set to obtain an initial model pre-installed on the terminal device, then fine-tuned with the captured image pairs before deblurring the video frames; alternatively, a model can be trained online to deblur them.
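The patch-pair extraction step can be sketched as below; the function name, patch size, and stride are assumptions, and the network training itself (omitted) would use an existing deep-learning framework as the text indicates.

```python
import numpy as np


def extract_patch_pairs(clean: np.ndarray, blurred: np.ndarray,
                        patch: int = 8, stride: int = 8):
    """Slice two pre-aligned, equally sized images into corresponding
    (clean, blurred) patch pairs for fine-tuning a deblurring network."""
    assert clean.shape == blurred.shape, "pairs must be aligned and equal size"
    pairs = []
    h, w = clean.shape[:2]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            pairs.append((clean[y:y + patch, x:x + patch],
                          blurred[y:y + patch, x:x + patch]))
    return pairs
```

The resulting (clean, blurred) pairs would serve as training targets and inputs when fine-tuning the pre-installed model on-device.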

Step 4: Store the video.

In this embodiment the terminal device generates four kinds of data: the original captured video information, the key-frame image information, the deblurring enhancement model, and the enhanced video information. Storage methods for the different data types include, but are not limited to, four schemes.

The four storage schemes are the same as those introduced in step 4 of the second embodiment and are not repeated here.

The method for choosing the storage scheme is likewise the same as in step 4 of the second embodiment and is not repeated here.

Step 5: Play the video.

After the data captured by the binocular camera has been compressed, stored, and transmitted, the high-quality video is decompressed and viewed during playback. Depending on the storage scheme, the playback mode includes, but is not limited to, one of the five modes introduced in step 5 of the second embodiment; they are the same as described there and are not repeated here.

Embodiment 6: multi-mode joint image-based video enhancement (joint enhancement mode)

The method further includes: determining a joint enhancement mode combining at least two of the resolution, color, brightness, denoising, and deblurring enhancement modes, and determining the corresponding order in which those modes are applied.

The joint enhancement mode and its mode order are determined by at least one of the following:

the system default settings; an adaptive mode-setting method; a mode-setting trigger operation.

The adaptive mode-setting method is determined from one or more of the following kinds of information: the device state, the mode-setting history, the content captured in real time by the multimedia capture device, and the influence relationships among the enhancement modes.

The content captured in real time by the multimedia capture device includes at least one of scene brightness and semantic content.

The mode-setting trigger operation is realized through user interaction by voice, key presses, gestures, an external controller, and so on.

As mentioned above, the image-based video enhancement modes cover resolution, color, brightness, denoising, and deblurring. This embodiment uses images to jointly enhance video across these five modes. The basic flow is: first, enable multi-mode joint video enhancement; next, set the primary and secondary cameras to capture image information and video information respectively, configure the cameras' acquisition parameters and enhancement-strategy parameters, select the key-frame image information, and select the modes to enhance; then enhance the video according to the strategy parameters, key-frame images, and selected modes; finally, compress, transmit, and play the result.

Step 1: Enable multi-mode joint video enhancement.

In this embodiment, the mode is enabled in a manner similar to step 1 of the first embodiment; the difference lies in the wording of the enabling command. For example, the voice command is "start multi-mode joint video enhancement", the hardware key command is pressing F1, the virtual key is a multi-mode joint-enhancement button, and the gesture command is drawing a circle on the screen. Details are not repeated here.

Step 2: Use one camera of the binocular camera to capture images and the other to capture video.

This step includes configuring the primary and secondary cameras, setting the cameras' acquisition parameters and enhancement-strategy parameters, selecting key-frame image information, and selecting the modes to enhance and their order.

Step 2.1: Configure the primary and secondary cameras.

In this embodiment, this can be done in a manner similar to step 2.1 of the first embodiment and is not repeated here.

Step 2.2: Set the cameras' acquisition parameters and enhancement-strategy parameters.

In this embodiment, this can be done in a manner similar to step 2.2 of the first embodiment and is not repeated here.

Step 2.3: Select key-frame image information.

In this embodiment, key-frame image information can be selected in a manner similar to step 2.3 of the first embodiment and is not repeated here.

Step 2.4: Select the modes to enhance jointly and their order.

Under the system default settings, the terminal device enables certain modes and fixes their order by default; for example, at startup only color and brightness enhancement are enabled, with color enhancement applied before brightness enhancement.

For voice control under user interaction, the user may, for example, issue the preset command "multi-mode joint enhancement, enable video color enhancement"; on receiving it, the device performs speech recognition on the voice command and enables color enhancement. If the user then says "multi-mode joint enhancement, disable video color enhancement", the device disables it. The mode order follows the order in which the voice commands are issued.

For key-press control under user interaction, the keys can be hardware keys, e.g., F1 toggles color enhancement, F2 brightness enhancement, F3 resolution enhancement, F4 deblurring, and F5 denoising. They can also be virtual keys such as on-screen buttons, menus, or a virtual keyboard on the interactive interface; when the system detects a tap on such a key, it toggles the corresponding mode. A press can additionally carry meaning through characteristics such as pressure, speed, duration, and frequency; for example, a light press disables a mode while a firm press enables it. The mode order follows the order in which the keys are pressed.

For gesture control under user interaction, the system associates preset gestures with enabling or disabling particular modes. Gestures include screen gestures, e.g., swiping left-to-right toggles color enhancement and swiping right-to-left toggles brightness enhancement. They also include mid-air gestures such as shaking, flipping, or tilting the terminal, where different directions, angles, speeds, and strengths carry different meanings (shaking up-down, shaking left-right, drawing a circle in the air, etc.). A gesture can be a single one, e.g., a horizontal left-hand swipe toggles color enhancement, or any combination of gestures, e.g., a horizontal right-hand swipe plus a circle drawn in the air toggles resolution enhancement. The mode order follows the order of the user's gestures.

For settings via an external controller, the controllers include but are not limited to a stylus, remote control, smart watch, smart glasses, smart head-mounted device, smart clothing, or a remote device. These controllers communicate with the terminal over Wi-Fi, infrared, Bluetooth, and/or a network; for example, keys on a remote control map to the different enhancement modes, and when the terminal detects that the user pressed a key, it toggles that mode. The mode order follows the order in which the controller's commands are issued.

For adaptive joint enhancement based on content captured in real time, the captured content includes scene brightness, moving objects, semantic content, and so on. Modes can be toggled by scene brightness: if the scene is detected to be dark, brightness, color, and resolution enhancement and denoising are enabled, and when the light brightens they are disabled. Modes can be toggled by moving objects: deblurring is enabled or disabled depending on whether motion blur is detected, and if a moving object is smaller than a threshold, e.g., its length is less than 1/10 of the image length, resolution enhancement is enabled automatically. Modes can be toggled by semantic content: when the video scene switches from indoors to outdoors, color enhancement is enabled to track the white-balance change; and resolution enhancement is enabled or disabled according to whether vehicles, people, or text are detected in the scene.

As for ordering the adaptively chosen modes, if the terminal device selects several modes for enhancement, they must be prioritized. The principle is to rank first the mode the current shooting environment needs most, i.e., the one whose enhancement most improves video quality. For example, when shooting at night with insufficient light and ignoring motion, brightness enhancement has the highest priority, followed by denoising, then color enhancement, then deblurring, and finally resolution enhancement. In a moving shooting environment with normal lighting, deblurring has the highest priority and the other modes are ordered by other criteria. In a more complex environment with both insufficient light and device motion, the user may order the modes manually, or the modes the user cares about can be ordered from the user's history, with the most-used modes ranked first.

For adaptive settings based on the device state, the state includes battery level, memory, and so on. Modes can be selected and ordered by their power consumption; suppose the order is resolution > deblurring > denoising > color enhancement > brightness enhancement. If the battery is below a first threshold, e.g., 50%, resolution enhancement is skipped; below a second threshold, e.g., 40%, deblurring is skipped; below a third threshold, e.g., 30%, denoising is skipped; below a fourth threshold, e.g., 20%, color enhancement is skipped; and below a fifth threshold, e.g., 10%, brightness enhancement is skipped. Memory can be handled the same way, ordering the modes by their buffer footprint (again resolution > deblurring > denoising > color enhancement > brightness enhancement): below a first threshold, e.g., 500 MB, no resolution enhancement; below 400 MB, no deblurring; below 300 MB, no denoising; below 200 MB, no color enhancement; and below 100 MB, no brightness enhancement.
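The threshold ladder above can be sketched as a filter over the ranked mode list. The defaults mirror the battery example in the text; the function name and parameterization are assumptions, and the same function serves for the memory thresholds by swapping in the megabyte values.

```python
def select_modes(resource_level: float,
                 ranked_modes=None, thresholds=None):
    """Drop enhancement modes, costliest first, as the resource level
    (battery percentage or free memory) falls below per-mode thresholds.

    ranked_modes[i] is kept only while resource_level >= thresholds[i].
    """
    ranked_modes = ranked_modes or ["resolution", "deblur", "denoise",
                                    "color", "brightness"]
    thresholds = thresholds or [50, 40, 30, 20, 10]  # battery % example
    return [m for m, t in zip(ranked_modes, thresholds)
            if resource_level >= t]
```

For instance, at 45% battery the costliest mode (resolution) is dropped while the other four survive; below 10% no enhancement runs at all.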

For adaptive settings based on the mode-setting history, the number of times the user has selected each enhancement mode is recorded and the modes are prioritized by that count. For example, if the resulting order is resolution > deblurring > denoising > color enhancement > brightness enhancement, then at the next startup resolution is enhanced first, followed by denoising, deblurring, color enhancement, and brightness enhancement. Alternatively, the enhancement modes used in the previous shoot determine the modes to enhance in the current one.

In addition, the five enhancement modes influence one another, as shown in Table 1. In the table, "X" means the two modes are unrelated, and "O" means that enhancing mode A affects the result of mode B. Given these correlations, when one mode is enhanced a related mode can selectively be skipped or added: for example, if resolution is enhanced the video frames become comparatively sharp and deblurring may be skipped, whereas brightening a night scene makes noise correspondingly more visible, so denoising becomes necessary.

Table 1. Influence relationships among the enhancement modes

[Table 1 is reproduced in the original publication as two images: Figure BDA0002605785760000441 and Figure BDA0002605785760000451.]
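The full contents of Table 1 are only available as images, but the two influence relationships spelled out in the text can be encoded and applied as a plan adjustment. Everything here is a hypothetical encoding of those two examples, not a transcription of the table.

```python
# Hypothetical encoding of the two influences named in the text:
# enhancing the key mode changes whether the listed mode is needed.
INFLUENCES = {
    "resolution": ["deblur"],   # sharper frames may not need deblurring
    "brightness": ["denoise"],  # brightening night scenes amplifies noise
}


def adjust_plan(selected):
    """Apply the influence relationships: add denoising after brightness
    enhancement, and skip deblurring when resolution enhancement runs."""
    plan = list(selected)
    if "brightness" in plan and "denoise" not in plan:
        plan.append("denoise")
    if "resolution" in plan and "deblur" in plan:
        plan.remove("deblur")
    return plan
```

A fuller implementation would read the complete `INFLUENCES` table (the "O" entries of Table 1) and decide per pair whether the influenced mode should be added or dropped.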

Step 3: Apply multi-mode joint enhancement to the video information.

After the modes to enhance are determined in step 2.4, this step enhances them one at a time, taking into account the mode order, i.e., the enhancement order determined for the modes selected in step 2.4. The selected modes are then processed one by one, in that order, using the enhancement methods of the preceding embodiments one to five.
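The ordered, one-at-a-time application can be sketched as a trivial pipeline; the function name and the callback table mapping each mode to its enhancer (the methods of embodiments one to five) are assumptions.

```python
def enhance(frame, ordered_modes, enhancers):
    """Apply each selected enhancement to the frame in the configured order.

    ordered_modes: mode names in the order fixed in step 2.4.
    enhancers: mapping from mode name to a frame -> frame function.
    """
    for mode in ordered_modes:
        frame = enhancers[mode](frame)
    return frame
```

Because the stages compose, the order matters: with toy enhancers `brightness = x + 1` and `color = x * 2`, applying color first to 3 yields 7 while applying brightness first yields 8, which is why step 2.4 fixes the order explicitly.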

Step 4: Store the video.

In this embodiment, if the enhancement modes include resolution enhancement, the video is stored in the same way as in step 4 of the first embodiment, and the storage scheme is set as in step 5 of the first embodiment; neither is repeated here. Otherwise, the video is stored in the same way as in step 4 of the second embodiment, with the storage scheme set as in step 5 of the second embodiment; these are likewise not repeated here.

Step 5: Play the video.

In this embodiment, the playback mode corresponds to the storage mode. If the enhancement modes include resolution enhancement, the video is played in the same way as in step 5 of the first embodiment; if they do not, it is played in the same way as in step 5 of the second embodiment. Details are not repeated here.

A specific example of the joint video enhancement mode is described in detail below. The mode comprises an enabling step, a mode-enhancement step, a parameter-setting step, a storage step, and a playback step; the 14 sub-steps in Figure 4 cover these five steps.

Step 1): Start the terminal device's camera and enter the shooting interface. Via voice control, the user says "enable video enhancement"; on receiving the command, the terminal device enters video enhancement mode.

Step 2): Once video enhancement mode is on, "Video Enhancement Mode" appears in red at the top of the shooting interface and a "Video Enhancement" button appears below it. The user taps the button to enter the enhancement-mode selection interface.

Step 3): In the mode selection interface, the "Video Enhancement" label turns yellow and six options appear, corresponding to "Brightness", "Color", "Denoise", "Deblur", "Resolution", and "Auto". The first five correspond to the individual enhancement modes; the user may check one or more of them, or choose the last option, "Auto", to let the terminal device adaptively select the modes to enhance based on the shooting environment and other factors. After checking the options, the user taps the "Enhancement Mode" button again to enter the camera setting interface.

Step 4): The capture interface displays the previews of both cameras. The user can inspect the scene by switching between cameras, and can freely designate which camera acts as the main camera that captures the original video, the other serving as the auxiliary camera that captures key frame photos.

Step 5): After selecting a camera, the user clicks the setting button to configure its acquisition parameters, then switches cameras to configure the other one. Acquisition parameters include, but are not limited to, exposure time and light sensitivity. Once the acquisition parameters are set, the user clicks the "Record" button to proceed to the next step.

Step 6): On the recording interface, recording begins: the main camera shoots the video while the auxiliary camera captures key frame photos. When shooting is finished, the user presses the "Stop" button; different interactions lead to different storage and playback modes. A single click on "Stop" jumps to step 7) in the figure and directly stores the enhanced video; a long press on "Stop" jumps to step 10) and stores the original video information and key frame image information.

Step 7): The video is enhanced. The image box in the lower right corner of the interface displays the original video information together with a buffering indicator that shows the progress of the enhancement. When the enhancement is complete, the buffering icon disappears and the image box displays the enhanced video information. After storage is complete, the interface returns to normal shooting.

Step 8): The captured video has been enhanced and stored in the terminal device. Before the next capture, the image box in the lower right corner shows the most recently captured video; the user can click this box to view the enhanced video information.

Step 9): On the video playback interface, clicking the play button plays the video.

Step 10): The original video information and key frame image information are stored directly, and the image box in the lower right corner displays the original video information. The background selectively performs video enhancement according to processor usage: if the processor is idle, the video is enhanced. The user clicks the lower-right image box to view the video. Upon receiving the click, the terminal device first checks whether video enhancement has finished: if the background has completed the enhancement, it jumps to step 11); otherwise it jumps to step 12).

Step 11): Video enhancement has been completed; the enhanced video playback interface is displayed, and clicking the play button plays the video.

Step 12): Video enhancement is not yet complete; the terminal device continues enhancing the video. The background image may display the original video information, and a buffering indicator shows the enhancement progress. When the enhancement finishes, the buffering icon disappears automatically and the flow jumps to step 13).

Step 13): The enhanced video playback interface is displayed; clicking the play button proceeds to step 14).

Step 14): The video is played.

Embodiment 7: Multi-mode joint image-enhanced video mode in a handheld terminal

The execution flow of the joint video enhancement mode in a handheld smart terminal is described in detail below with a specific embodiment.

Step 1): The user picks up the handheld smart terminal and issues the voice command "enable video enhancement". The terminal starts both cameras; by default, camera A on one side captures images and camera B on the other side captures video, as shown in Figure 5.

Step 2): As shown in Figure 6(a), while previewing the current scene, the handheld smart terminal detects that it is daytime and sets the white balance to daylight; it detects that the scene brightness is high and adaptively lowers the image exposure. By default, the terminal sets the video capture frequency to 30 fps with a frame size of 640*480, the image size to 1920*1080, and the key frame capture frequency to once per minute.

Step 3): In response to the user's operations on the video enhancement home screen, the handheld smart terminal handles the following events: the user opens the settings function, selects the white balance setting on the touch screen as shown in Figure 6(b), adjusts the white balance with a slider, adjusts the exposure, changes the video capture frequency to 25 fps, and sets the key frame capture frequency to twice per minute.

Step 4): When the user taps the Capture icon on the interaction panel, as shown in Figure 6(c), the smart terminal starts video capture; the default mode combination at the start is brightness enhancement plus color enhancement.

Step 5): When a fast-moving child appears in the scene and the video frames become blurred, the smart terminal adaptively activates the deblurring mode and raises the key frame capture frequency to 4 times per minute.

Step 6): As the child moves out of the frame and the blur in the video frames disappears, the smart terminal adaptively deactivates the deblurring mode and restores the key frame capture frequency to twice per minute.

Step 7): As the outdoor sunlight grows brighter, the handheld smart terminal detects the increased brightness and adaptively reduces the exposure.

Step 8): When the user walks indoors and the light dims, the handheld smart terminal detects the reduced brightness and adaptively increases the exposure.

Step 9): The user issues the voice command "start high dynamic range image", and the handheld smart terminal switches from normal brightness capture to high dynamic range capture.

Step 10): When the light weakens again and heavy noise appears in the video, the handheld smart terminal detects the increased noise and adaptively activates the denoising mode.

Step 11): When the battery level falls below 30%, the handheld smart terminal adaptively turns off high dynamic range shooting; when it falls below 10%, the terminal also turns off the color enhancement mode; when it falls below 5%, the system turns off all enhancement modes.
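The staged power policy of this step can be sketched as a small threshold table. The thresholds and the HDR/color mode names come from this embodiment; the function itself is only an illustrative sketch, not the patented implementation.

```python
def active_enhancements(battery_pct, requested):
    """Return the enhancement modes that stay on at a given battery level.

    Mirrors the step above: below 30% HDR is dropped, below 10% color
    enhancement is also dropped, below 5% everything is turned off.
    """
    if battery_pct < 5:
        return set()
    dropped = set()
    if battery_pct < 30:
        dropped.add("hdr")
    if battery_pct < 10:
        dropped.add("color")
    return set(requested) - dropped

modes = {"hdr", "color", "brightness", "denoise"}
print(sorted(active_enhancements(50, modes)))  # all requested modes kept
print(sorted(active_enhancements(20, modes)))  # HDR dropped
print(sorted(active_enhancements(7, modes)))   # HDR and color dropped
print(sorted(active_enhancements(3, modes)))   # everything off
```

The same cascade generalizes to other resources (e.g. the storage-driven adjustments in later embodiments) by swapping the threshold variable.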

Step 12): The user issues the voice command "disable video enhancement"; the handheld smart terminal recognizes the voice command, turns off video enhancement, and compresses and stores the enhanced video.

Step 13): Thereafter, whenever the user opens the enhanced video, the handheld smart terminal recognizes it and plays it back for viewing.

Embodiment 8: Multi-mode joint image-enhanced video mode in a monitoring terminal

In outdoor environments, current monitoring equipment often suffers from low capture resolution, poor capture brightness, sensitivity to light, and heavy noise on cloudy and rainy days. This embodiment provides a scheme that replaces a monocular surveillance camera with a binocular surveillance camera and performs video quality enhancement in the background, improving resolution, color, and brightness and performing denoising and deblurring.

A dual-camera monitoring smart terminal device is installed on a highway, as shown in Figure 7(a), to monitor passing vehicles. One camera captures video with a fixed size of 480*320, while the other captures images with a fixed size of 1920*1080; both data streams are transmitted over the network to the monitoring backend. The backend comprises a processor, a display screen, and a set of control panels, as shown in Figure 7(b). The backend processor processes the two data streams in real time; the display screen shows the current monitoring video, the captured key frames, and a virtual control panel; the hardware control panel, including devices such as a mouse and keyboard, is used to set parameters, the enhancement mode combination, and key frame selection. A video quality enhancement method in the monitoring scenario is described below with an embodiment.

First, the operator presses the F1 key to start "video enhancement", and the following steps are performed according to the scene:

Step 1): By default, the key frame capture mode is adjusted adaptively by the system. If the operator makes no setting, the flow jumps to step 2); otherwise, the operator sets the key frame capture mode to N frames per second through the virtual control panel.

Step 2): The default mode combination is resolution enhancement plus deblurring enhancement. If the operator makes no setting, the flow jumps to step 3); otherwise, the operator selects and combines the five enhancement modes through the virtual control panel.

Step 3): The display screen shows, in real time, the originally captured video, the enhanced video, and a group of recently captured key frames. There are three display effects: the original video as shown in Figure 7(c), color enhancement plus denoising as shown in Figure 7(d), and brightness enhancement as shown in Figure 7(e). The operator can choose any of these display modes through the menu.

Step 4): When the monitoring terminal detects a speeding vehicle in the scene, it can adaptively raise the key frame capture rate to obtain more high-definition images, adjust the image exposure to increase or decrease the scene brightness, set the focus area to the vehicle's license plate, and select a white balance to correct color cast.

Step 5): When the scene is at night or on a cloudy or rainy day, the monitoring terminal adaptively activates the denoising and brightness enhancement modes according to the time and brightness.

Step 6): Every 6 hours, the monitoring terminal compresses and stores in a database the video, key frames, enhancement mode combination, and setting parameters collected during the preceding 6 hours.

Step 7): To retrieve previously stored data for viewing, the user queries a piece of data from the database, and the smart terminal displays on the screen the video and key frames before and after enhancement.

Embodiment 9: The enhancement processing mode is the deblurring enhancement mode within the video-enhanced-image mode

When the first type of multimedia information is video information and the second type of multimedia information is image information, the image information captured by one multimedia capture device is obtained, together with the video clip, corresponding to the image information, captured by another multimedia capture device at the set video frame capture frequency.

Processing the captured second type of multimedia information according to the captured first type of multimedia information specifically includes:

performing enhancement processing, according to the captured video clip, on the index of the captured image information that needs to be enhanced.

Preferably, the method further includes: when it is detected that the multimedia capture device that captures image information enters a preview state, or when it is detected that this device starts to capture image information, the other multimedia capture device captures the video clip corresponding to the image information at the set video frame capture frequency;

when it is detected that the number of video frames in the captured video clip reaches the corresponding upper limit, the other multimedia capture device stops capturing video information.

Performing enhancement processing, according to the captured video clip, on the index of the captured image information that needs to be enhanced specifically includes:

determining video key frames in the captured video clip;

performing enhancement processing on the captured image information according to the video key frames, based on blur kernel estimation.

Preferably, the video key frames are determined by an adaptive key frame determination method;

wherein the adaptive key frame determination method uses one or more of the following pieces of information: degree of picture blur, content similarity, and video frame quality.
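The three signals named above can be combined into a single ranking score. The sketch below assumes each frame carries normalized `blur`, `similarity`, and `quality` values in [0, 1]; the weights are illustrative placeholders, not values from the patent.

```python
def select_key_frames(frames, k):
    """Rank candidate frames by a combined score and keep the top k.

    Each frame is a dict with the three signals named in the text:
    'blur' (lower is sharper), 'similarity' to the target image,
    and 'quality'. The weights below are assumed for illustration.
    """
    def score(f):
        return 0.4 * (1.0 - f["blur"]) + 0.3 * f["similarity"] + 0.3 * f["quality"]

    return sorted(frames, key=score, reverse=True)[:k]

frames = [
    {"id": 0, "blur": 0.8, "similarity": 0.9, "quality": 0.5},
    {"id": 1, "blur": 0.1, "similarity": 0.7, "quality": 0.9},
    {"id": 2, "blur": 0.3, "similarity": 0.4, "quality": 0.6},
]
best = select_key_frames(frames, 2)
print([f["id"] for f in best])  # → [1, 2]
```

Using any single signal, as the text also permits, amounts to zeroing two of the weights.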

Preferably, the method further includes: performing sharpness analysis on the captured image information; if the image information is a blurred image, performing enhancement processing, according to the captured video clip, on the index of the captured image information that needs to be enhanced, wherein the index to be enhanced includes deblurring.

When the first type of multimedia information is video information and the second type is image information, the captured image information is stored according to the captured video information, wherein the stored content includes at least one of the following:

the image information obtained by enhancing the captured image information according to the captured video information;

the captured video information and the captured image information;

the captured image information together with the video key frames, from the captured video information, that are used to enhance the image information;

the captured image information together with the enhancement model used when enhancing the image information;

the image information obtained by enhancing the captured image information according to the captured video information, together with the captured video information.

Optionally, in response to a received display trigger operation, the image information is displayed in a display mode that matches the stored content, wherein the display mode includes at least one of the following:

when the enhanced image information is stored, the enhanced image information is displayed directly;

when the captured video information and image information are stored, the captured image information is enhanced according to the captured video information and then displayed;

when the captured image information and the video key frames used for enhancement are stored, the enhancement model is determined from the video key frames, and the captured image information is enhanced with that model and then displayed;

when the captured image information and the enhancement model are stored, the captured image information is enhanced with the enhancement model and then displayed;

when the enhanced image information and the captured video information are stored, the enhanced image information and the captured video information are displayed in association.
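The five storage cases above map one-to-one onto display paths, which can be expressed as a small dispatch function. The tag names below are hypothetical labels invented for this sketch; only the case-to-action mapping follows the text.

```python
def display_action(stored):
    """Map the stored-content combination to the matching display path.

    'stored' is a set of tags (hypothetical names) for what was saved,
    covering the five storage cases listed above.
    """
    if "enhanced_image" in stored and "video" in stored:
        return "show enhanced image in association with the video"
    if "enhanced_image" in stored:
        return "show enhanced image directly"
    if "image" in stored and "video" in stored:
        return "enhance image with video, then show"
    if "image" in stored and "key_frames" in stored:
        return "derive model from key frames, enhance, then show"
    if "image" in stored and "model" in stored:
        return "enhance image with stored model, then show"
    raise ValueError("unsupported storage combination")

print(display_action({"enhanced_image"}))
print(display_action({"image", "key_frames"}))
```

Ordering the checks from most to least specific keeps the combined case ("enhanced_image" plus "video") from being shadowed by the single-item case.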

In this embodiment, one camera of a binocular pair captures an image while the other captures video, so as to obtain a high-quality image. To take a bright, sharp picture, users often use a long exposure; without a tripod, a handheld smart terminal is prone to shake, and the captured picture becomes blurred. While the photo is being taken, the other camera is started to shoot a short video. Each video frame has a short exposure time, so edge and texture information is well preserved; moreover, unlike a single image, video is dynamic, and this property can be exploited to enhance the static image. A motion blur kernel can be estimated from the video frames and the photo, and the image can then be deblurred to obtain a bright, sharp picture. The main steps are as follows.

Step 1: Enable the deblurring enhancement mode within the video-enhanced-image mode.

In this embodiment, step 1 uses an enabling method similar to that of Embodiment 1; the differences lie in certain command descriptions and threshold settings, which are described below.

Different command descriptions are used when the user enables the image deblurring enhancement mode. For example, the voice command is "start image deblurring"; the physical-key command is a long press of the Home key; the virtual-key command is the image deblurring button; the gesture command is shaking the terminal; and so on, which are not repeated here.

Different threshold settings, such as the battery level threshold and the count threshold, are used when the terminal device adaptively enables the mode according to the device state and the mode enabling history. The rest of the description is the same and is not repeated here.

Three methods are given for the terminal device to adaptively enable the image deblurring mode according to the shooting environment and acquisition parameters. In the first, the smart terminal uses an existing method to detect the motion trend of the terminal device: if the shooting terminal is in motion, e.g. shaking caused by the trembling of the hand holding it, the image deblurring enhancement mode is enabled. In the second, the terminal device inspects the shooting parameters: if the exposure time of the image capture exceeds a certain threshold, for example 300 ms, the image deblurring enhancement mode is enabled. The third is a combination of the first two: the mode is enabled only when both conditions hold, i.e. the shooting terminal is in motion and the exposure time exceeds the threshold.

When the terminal device adaptively enables the image deblurring mode according to content captured in real time, it computes a blurriness index of the captured image; if the index falls below a certain threshold, the image deblurring mode is enabled for subsequent image capture.
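The text leaves the blurriness index unspecified. One common choice, shown here as an assumed stand-in, is the variance of a Laplacian response, which is high for sharp images and drops as blur increases; the threshold value below is a placeholder to be tuned per device.

```python
def laplacian_variance(img):
    """Variance of a 4-neighbour Laplacian over the image interior.

    img is a list of rows of grayscale values. Sharp images have many
    strong edge responses, hence a high variance; blurred ones do not.
    """
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                   + img[y][x + 1] - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

BLUR_THRESHOLD = 100.0  # assumed value; tune per device

def needs_deblurring(img):
    # Low sharpness index => treat the image as blurred.
    return laplacian_variance(img) < BLUR_THRESHOLD

sharp = [[0, 0, 255, 255]] * 4      # hard vertical edge
blurred = [[0, 85, 170, 255]] * 4   # same edge, smoothed
print(laplacian_variance(sharp) > laplacian_variance(blurred))  # → True
```

On a real device this computation would run on the camera preview buffer rather than a Python list, but the decision rule is the same.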

Step 2: Use one camera of the binocular pair to capture the image and the other to capture the video.

This step mainly covers the setting of acquisition parameters and enhancement strategy parameters, the designation of the main and auxiliary cameras, the parameter settings for image and video capture, and the selection of video key frames.

Step 2.1: Designate the main and auxiliary cameras.

In this embodiment, the main and auxiliary cameras can be designated in a way similar to step 2.1 of Embodiment 1, which is not repeated here. Here the main camera captures the image and the auxiliary camera captures the video.

Step 2.2: Set the camera acquisition parameters and enhancement strategy parameters.

The parameters in this embodiment are those required for the binocular cameras to capture the image and the video clip. In addition to the camera acquisition parameters mentioned in Embodiment 1, they include the video frame capture frequency, i.e. the number of video frames captured per second, as well as the parameters of the image deblurring algorithm.

The acquisition parameters and enhancement strategy parameters can be set in a way similar to Embodiment 1, except that a setting for the video frame capture frequency and adaptive parameter adjustment for image deblurring are added; the newly added setting methods are introduced below.

Regarding the system default settings of the terminal device: the video frame capture frequency is set to a default value and remains at that value until an instruction to modify it is received; the other aspects are the same and are not repeated here.

Regarding user interaction settings: the user can set the video frame capture frequency (the number of frames in the video clip) by voice, slider, button, text input, and so on. The acquisition parameters are constrained by the value ranges of the terminal device itself, so manual settings must also fall within a certain range; otherwise the terminal device issues a warning. The details are not repeated here.

Regarding adaptive settings according to the environment: for example, if the terminal device detects that the shooting terminal is in motion, it increases the video frame capture frequency of the auxiliary camera.

Regarding adaptive settings according to battery level: for example, the video frame capture frequency is controlled by the battery level; if the level falls below a certain threshold, e.g. 50%, the number of video frames captured per second is reduced, and below 5% the frame rate is fixed at its minimum. Regarding adaptive settings according to storage space: the video frame capture frequency can be adjusted according to the remaining storage space. If the remaining space exceeds a certain threshold, for example 50% of the terminal device's total storage or 500 MB, a high video frame capture frequency, e.g. 30 fps, is automatically selected; otherwise a low frequency, e.g. 25 fps, is used. Likewise, if the remaining space exceeds a threshold such as 30% of total storage or 300 MB, the video frame capture frequency of the auxiliary camera is increased; otherwise it is decreased.
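The battery and storage rules above combine naturally into one frame-rate selector. The 50%/5% battery thresholds, the 50% free-space threshold, and the 30/25 fps rates come from the text; the 10 fps reduction step and the 5 fps minimum are assumed values for illustration.

```python
def capture_fps(battery_pct, free_ratio, base_fps=30, low_fps=25, min_fps=5):
    """Pick the auxiliary camera's video frame capture frequency.

    free_ratio is the fraction of storage still free. Storage above 50%
    free selects the high rate, otherwise the low rate; battery below
    50% steps the rate down, and below 5% pins it at the minimum.
    """
    fps = base_fps if free_ratio > 0.5 else low_fps
    if battery_pct < 5:
        return min_fps
    if battery_pct < 50:
        fps = max(min_fps, fps - 10)  # assumed reduction step
    return fps

print(capture_fps(80, 0.7))  # plenty of power and space → 30
print(capture_fps(80, 0.3))  # low storage → 25
print(capture_fps(40, 0.7))  # low battery → 20
print(capture_fps(3, 0.7))   # critical battery → 5
```

Applying the battery rule after the storage rule makes power the dominant constraint, matching the "below 5%, fix at minimum" clause.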

Regarding adaptive settings based on historical data: for example, the video frame capture frequency can be set according to user preference with, but not limited to, the following method: count the video frame capture frequencies set during the last N image deblurring sessions, and let the terminal device recommend their average as the new setting.

Regarding adaptively setting acquisition parameters according to the captured content: in this embodiment, one binocular camera captures the image and the other captures the video used to deblur it, so the resolution, exposure time, and sensitivity of video and image capture are the main parameters to set. For both energy saving and algorithm design, the resolution of the video captured by the auxiliary camera should match the image resolution; if the video's maximum resolution is lower than the current image resolution, the video is captured at its maximum resolution. If the ambient brightness is normal or above a certain threshold, the image and video exposure times are shortened and the sensitivity is raised appropriately to reduce the probability of blur; if the sensors detect that the terminal device is shaking or otherwise moving, the exposure time is reduced appropriately to keep the image and video from blurring and degrading the final result.

Step 2.3: Capture of the video clip.

This step describes when the auxiliary camera starts shooting video. One of, but not limited to, the following two approaches can be used: shooting before the image is captured, i.e. while the preview interface is shown; or starting video capture at the same moment photo capture starts.

When shooting during preview, in order to save video storage space and to guarantee a strong correlation between the video frames and the photo content, only some frames of the video shot on the preview interface are retained. The number of retained frames can be set by the user by voice, slider, text input, and so on, or adjusted adaptively according to the total number of video frames, e.g. set to 10% of the total; the larger the total frame count, the more frames this part can keep, and vice versa. A buffer sequence is set up to store these frames, with its maximum length set to the clip frame count configured by the terminal device. When the sequence is full and a new frame arrives, the earliest-shot frame is removed: if the frames are stored in shooting order, the first frame is removed to make room for the new one. The sequence is updated in this way so that only the latest clip is kept. Video shooting continues when photo capture starts and stops under either of two conditions: first, when the number of newly shot frames plus the buffered frames reaches the upper limit on the total video frame count; second, when photo capture completes before that limit is reached, in which case shooting also stops and the total video frames obtained are stored.
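The rolling preview buffer described above — fixed capacity, oldest frame evicted on overflow — is exactly the behaviour of a bounded deque. A minimal sketch, with the class and method names invented for illustration:

```python
from collections import deque

class PreviewFrameBuffer:
    """Keep only the newest N preview frames, dropping the oldest."""

    def __init__(self, max_frames):
        # A deque with maxlen evicts the oldest entry automatically
        # when a new frame arrives at full capacity.
        self.frames = deque(maxlen=max_frames)

    def add(self, frame):
        self.frames.append(frame)

    def snapshot(self):
        """Frames currently held, in shooting order."""
        return list(self.frames)

buf = PreviewFrameBuffer(max_frames=3)
for frame_id in range(5):   # frames 0..4 arrive in shooting order
    buf.add(frame_id)
print(buf.snapshot())       # → [2, 3, 4]: frames 0 and 1 were dropped
```

When photo capture starts, the clip is completed by appending new frames until the total reaches the configured upper limit or the photo finishes, per the two stop conditions above.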

When photo capture starts, video recording starts at the same time and stops when the number of video frames reaches the limit set by the terminal device; if the photo completes before that limit is reached, recording also stops, and the resulting clip is stored.

Step 3: use the video to remove blur from the image.

After obtaining the image and the video clip, the terminal device analyzes the sharpness of the image. The image can be classified using the sensor parameters recorded during the shot, the presence of moving objects in the video clip, or an existing classifier from the literature. If the image is judged sharp, no deblurring is performed; otherwise the deblurring steps below are carried out.
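The text leaves the sharpness classifier open. One common heuristic (an assumption here, not the patent's method) is the variance of a Laplacian response, which is low for blurry or featureless images and high for sharp ones:

```python
def laplacian_variance(img):
    """Variance of a 4-neighbour Laplacian over a grayscale image.

    Low values suggest a blurry picture; a threshold would separate the
    "sharp, skip deblurring" case from the "blurry, deblur" case.
    """
    h, w = len(img), len(img[0])
    vals = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            lap = (img[r - 1][c] + img[r + 1][c] + img[r][c - 1]
                   + img[r][c + 1] - 4 * img[r][c])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

flat = [[10] * 5 for _ in range(5)]                               # no detail
edgy = [[(r + c) % 2 * 255 for c in range(5)] for r in range(5)]  # checkerboard
print(laplacian_variance(flat) < laplacian_variance(edgy))  # True
```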

To save processing time, a few frames of the video clip, i.e. video key frames, can be selected for deblurring the image. The number of key frames can be adapted to the degree of blur, or set to a fixed value or a fixed ratio, for example one fifth of the total number of frames. Key frames can be chosen by content similarity, selecting the frames most similar to the image; by frame quality, selecting the highest-quality frames; or by a combined criterion, for example ranking the frames by quality and then setting the number of key frames according to the degree of blur.
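The combined criterion (rank by quality, pick a count that grows with the degree of blur) can be sketched as follows; the quality scores and the blur level are hypothetical placeholders for whatever upstream analysis provides them:

```python
def select_key_frames(frames, quality, blur_level, max_ratio=0.2):
    """Rank frames by a quality score and pick a count that grows with blur.

    `quality` maps a frame to a score; `blur_level` in [0, 1] is assumed to
    come from an upstream sharpness analysis. `max_ratio` caps the selection
    at a fraction of the clip (0.2 echoes the text's one-fifth example).
    """
    count = max(1, int(len(frames) * max_ratio * blur_level))
    ranked = sorted(frames, key=quality, reverse=True)
    return ranked[:count]

frames = [("f%d" % i, q) for i, q in enumerate([0.3, 0.9, 0.5, 0.8, 0.1,
                                                0.7, 0.6, 0.2, 0.95, 0.4])]
keys = select_key_frames(frames, quality=lambda f: f[1], blur_level=1.0)
print([name for name, _ in keys])  # the two highest-quality frames
```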

After the key frames are obtained, their scale is first unified with that of the image: the key frames are rescaled, or the image is rescaled, or both, so that the image and the video frames have the same size. Stereo matching is then used to compute the disparity between each key frame and the image, the video frames are aligned with the image, and the corresponding regions are located. Each video frame, i.e. a non-blurred image, and the corresponding region of the blurred image form a blurred-to-sharp correspondence; from each pair of corresponding regions, a blur kernel can be estimated with existing kernel-estimation methods from the literature. All estimated kernels are combined into a final kernel by weighted averaging, where the weights can be distributed evenly; assigned by content similarity, with higher similarity receiving higher weight and lower similarity lower weight; or assigned by video frame quality, with higher quality receiving higher weight. Finally, the captured image is deblurred with this kernel. Alternatively, a deep-learning method can use these image pairs to learn a deblurring model and deblur the image.
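The weighted averaging of the per-region blur kernels can be sketched as follows. The kernels and weights here are illustrative; in practice each kernel would come from an existing estimation method and each weight from content similarity or frame quality, as the text describes:

```python
def combine_kernels(kernels, weights):
    """Weighted average of per-region blur kernels (all the same shape).

    Weights are normalized so the result is a proper average and the
    combined kernel keeps the same total energy as its inputs.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    rows, cols = len(kernels[0]), len(kernels[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for k, w in zip(kernels, norm):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * k[r][c]
    return out

k1 = [[0.0, 1.0], [0.0, 0.0]]  # kernel from a region judged more similar
k2 = [[0.0, 0.0], [1.0, 0.0]]  # kernel from a less similar region
final = combine_kernels([k1, k2], weights=[3.0, 1.0])
print(final)  # [[0.0, 0.75], [0.25, 0.0]]
```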

Besides computing the blur kernel from the content relationship between the image and the video frames, existing algorithms can estimate, from consecutive video frames, the motion of moving objects or of the terminal device itself; this estimate can be used to correct the blur kernel, improving its accuracy and achieving a better deblurring result.

Step 4: storage of the image.

In this embodiment the terminal device generates four kinds of data: the original image, the video clip, the deblurring model, and the deblurred high-quality image. The storage scheme includes, but is not limited to, one of the following five.

The first is to store the enhanced image directly without saving the video clip, i.e. enhancement is completed before storage. During storage, the original image is shown in the photo frame with a buffering icon on top, indicating that enhancement is in progress; when buffering finishes, storage is complete.

The second is to store the original image together with the learned deblurring model.

The third is to store the original image together with the selected video key frames, i.e. the frames used to learn the deblurring model. This reduces the processing time before storage, since only the key frames need to be found. After storage, the terminal device schedules the computation of the deblurring model according to how busy its processor is; once the model has been computed, or the image has been deblurred directly, the stored key frames can be deleted.

The fourth is to store the original image and the saved video clip directly, so that all enhancement steps are performed by the terminal device after storage.

The fifth is to save the video key frames alongside the enhanced image. The enhanced image can be obtained by any of the first four schemes; playing back the saved key-frame sequence gives the effect of an animated picture.

Regarding how to set the storage scheme, the present invention provides three setting methods, and the terminal device can select the scheme according to one of the following three. The first is the terminal device's default setting; the second is that the terminal device lets the user change the scheme by voice, buttons, an external controller, or any combination of these; the third is that the terminal device sets the scheme adaptively according to storage space, battery level, or historical data.

Regarding the system default, the terminal device sets one of the five schemes as the default and uses it to store video and images until it receives an instruction to change the scheme.

Regarding voice settings under user interaction, the user may, for example, predefine the voice command "store the enhanced image"; when this command is received, speech recognition is performed on it and the storage scheme is set to the first one, i.e. storing the enhanced image. Other user interaction methods follow the same settings as step 4 of embodiment one and are not repeated here.

Regarding adaptation to storage space, different schemes can be selected according to the remaining space. If the remaining space is below a first threshold, for example below 10% of the device's storage, the first scheme is used; if it is below a second threshold, for example below 40%, one of the first three schemes or the fifth can be used; if the remaining space is above a threshold, for example above 50%, the scheme is not constrained by storage space.

Regarding adaptation to battery level, the storage scheme can be controlled by the remaining charge. When the charge is below a first predetermined level, for example below 50%, a low-power scheme is chosen, i.e. the second or third scheme, which stores the original image with the learned model or the video key frames and performs no enhancement. When the charge is below a second predetermined level lower than the first, for example below 15%, the fourth scheme with the lowest power consumption is chosen, storing only the original image and the video clip. If the charge is above a threshold, for example above 50%, the scheme is not constrained by battery level.
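The storage-space and battery rules above can be combined into one selection routine. The thresholds (10%, 40%, 50%, 15%) are the example values from the text; returning a set of candidate schemes, from which the device picks one, is an illustrative design choice, not the patent's specification:

```python
def pick_storage_mode(free_space_pct, battery_pct):
    """Return the set of allowed storage schemes (1-5) under the example
    thresholds in the text; the device would then choose one of them."""
    allowed = {1, 2, 3, 4, 5}
    # Storage-space rules: scarcer space forces schemes that store less.
    if free_space_pct < 10:
        allowed &= {1}
    elif free_space_pct < 40:
        allowed &= {1, 2, 3, 5}
    # Battery rules: lower charge forces lower-power schemes.
    if battery_pct < 15:
        allowed &= {4}
    elif battery_pct < 50:
        allowed &= {2, 3}
    return allowed

print(pick_storage_mode(free_space_pct=30, battery_pct=80))  # {1, 2, 3, 5}
print(pick_storage_mode(free_space_pct=60, battery_pct=40))  # {2, 3}
```

Note that very low space combined with very low battery makes the example rules conflict (empty set); a real device would need a tie-breaking policy, which the text does not specify.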

Regarding adaptation based on storage-scheme history, the user's preference can, for example, be analyzed from the schemes the user has set in the past, and the preferred scheme is then selected.

Step 5: display of the image.

The terminal device displays the stored image according to a detected display operation from the user.

When displaying a stored image, different display modes correspond to the different storage schemes, and the present invention provides several display modes. The terminal device may choose, but is not limited to, one of the following.

First: for the first storage scheme, when the terminal device detects the user's open operation, it directly shows the deblurred image. When a view operation is detected, for example a tap on the view button, the image is displayed immediately, so the user can directly see the deblurred result.

Second: for the second storage scheme, on detecting the user's open operation, the terminal device opens the combination of the original image and the deblurring model. When a view operation is detected, for example a tap on the view button, the device starts deblurring the image with the model, completes the operation within the allowed buffering time, and then displays the enhanced image.

Third: for the third and fourth storage schemes, the terminal device detects the user's open operation, for example a tap on the view button. If deblurring has already been completed, the image is displayed directly and the user sees the enhanced result. If the device has only done part of the work in the background and has not finished the deblurring step, some buffering time is needed after the view operation to finish the processing; once it completes, the image is displayed and the user sees the deblurred image.

Fourth: for the fifth storage scheme, on detecting the user's open operation, the terminal device opens the combination of the deblurred image and the video key frames. The deblurred image is displayed as in the first display mode; the difference is the additional animated picture. Playing the key-frame sequence presents the effect of an animated image, which can be viewed by long-pressing the photo, although control is not limited to long-pressing: voice, gestures and other settings can also trigger the animation.

The image deblurring enhancement mode is described in detail below with a specific example.

1) The steps for enabling the interaction are shown in Figure 8:

Step 1): turn on the camera of the terminal device, enter the shooting interface, and enable the video-enhanced image mode by long-pressing the screen.

Step 2): after the video-enhanced image mode is enabled, the shooting interface shows the capture views of both cameras. By switching between the cameras to inspect the scene, the user can freely set which camera is the main camera that takes the photo; the other camera acts as the auxiliary camera that records the video.

Step 3): after the cameras are chosen, capture parameters can be set for each camera separately. To capture a bright image, the exposure time can be increased and the sensitivity lowered, improving picture quality. After the parameters are set, proceed to the next step.

Step 4): finish the interactive parameter setting and enter the shooting interface; the screen shows the view within the main camera's field of view.

2) The enhancement, storage and playback steps are as follows:

Step 1): in the shooting interface, when the user sees a picture of interest, pressing the shutter button takes the photo; different operations select different storage and playback modes. A single tap jumps to step 2) in Figure 9, storing the enhanced photo directly; a long press jumps to step 5), storing the original image and the video clip.

Step 2): the image is enhanced. The original image is shown in the upper-left image frame with a buffering indicator that reports the progress of enhancement; when enhancement completes, the buffering icon disappears and the enhanced image is shown in the frame.

Step 3): the captured photo has been enhanced and stored on the terminal device. Before the next photo is taken, the upper-left image frame shows the most recently captured photo, and tapping the frame displays the enhanced image.

Step 4): after the image frame is tapped, the enhanced image is displayed.

Step 5): the original image and the video clip are stored directly, and the original image is shown in the upper-left image frame. The background process selectively performs image enhancement according to processor usage: if the processor has spare capacity, the image is enhanced. Tapping the upper-left image frame views the image. On receiving the tap, the terminal device first checks whether enhancement has finished: if the background has completed the enhancement step, jump to step 6); otherwise jump to step 7).

Step 6): image enhancement has finished; the enhanced image is displayed.

Step 7): image enhancement has not finished; the terminal device continues enhancing the image. The background can show the original image while a buffering indicator reports the progress. When enhancement completes, the buffering icon disappears automatically; jump to step 8).

Step 8): display the enhanced image.

Embodiment 10: multi-focus area joint playback mode

The purpose of this embodiment is to help the user shoot a video with multiple focus areas. The multiple focus areas may be the global area plus a local area the user is interested in, or two local areas of interest. For example, when filming family or friends in a dance performance, the user may want to capture both the whole scene and a close-up of the family member or friend, or close-ups of several friends at once. Current video shooting not only requires the user to zoom in and out manually and frequently, which easily makes the captured video blurry or shaky, but also shows, at any one moment, either the global view or a single magnified local area: it cannot simultaneously capture the global image together with a sharp local one, or several local areas of interest at once. In the present invention, two cameras are set to different focus areas: one camera focuses on the global area and the other on a local area of interest, or one camera focuses on one local area of interest and the other on another. The two cameras then shoot simultaneously, yielding a multi-focus video containing both the global view and the local area of interest, or two sharp local-area videos.

It should be noted that, in the detailed description of the following embodiments, the way the second type of multimedia information is processed according to the first type is, specifically, a multi-focus joint playback mode in which captured video information focused on one focus area is used to jointly play captured video information focused on another focus area.

In this embodiment, the first type and second type of multimedia information are video information focused on different focus areas, where a focus area includes a global area and/or a local area.

Specifically, according to the captured video information focused on one focus area, joint playback processing is performed on the captured video information focused on another focus area.

The focus areas are determined in at least one of the following ways:

when it is detected that the user has selected one local area, the selected local area is determined to be a focus area and the other focus area is the global area;

when it is detected that the user has selected two local areas, the two selected local areas are determined to be the focus areas.

Preferably, the local area selected by the user is detected through a focus object selected by the user.

The global area and/or the local areas can be played jointly in a split-screen layout.

(1) Enabling the multi-focus area joint playback mode

There are two ways: either the user actively enables multi-focus video, or the terminal device, based on the content being captured, asks the user whether multi-focus video shooting should be enabled. As described earlier, the user can start the multi-focus video shooting mode by voice, buttons, gestures, biometrics, an external controller, or any combination of these interaction methods.

Regarding voice activation, the user can, for example, predefine the voice command "start multi-focus video shooting"; if the terminal device receives this voice command from the user, it performs content recognition on the command and determines that multi-focus video shooting should now be enabled.

Regarding activation by button, the button may be a hardware key, such as the volume key or the Home key: the user enables the multi-focus shooting mode by long-pressing it, and on receiving this long-press event the terminal confirms that it should switch to the multi-focus video shooting mode. The button may also be virtual, such as an on-screen control button or menu item: the terminal can show a multi-focus shooting virtual button on the video preview interface, and on receiving the user's tap on it, confirms that it should switch to the multi-focus video shooting interface. When activating by button, characteristics of the press such as pressure, speed, duration and frequency can carry different meanings: for example, a light press changes the focus target person, a hard press changes the magnification of the focused person, and a long press enables shooting with multiple focus target persons, and so on.

Regarding activation by gesture, gestures include screen gestures such as double-tapping or long-pressing the screen. When activating by screen gesture, the pressure, speed, duration and frequency of the gesture can carry different meanings: a light press changes the focus target person, a hard press changes the magnification of the focused person, a long press enables shooting with multiple focus target persons, and so on. Gestures also include air gestures, such as shaking, flipping or tilting the terminal; the direction, angle, speed and force of the motion can carry different meanings, for example shaking up and down changes the focus target person, shaking left and right changes the shooting parameters, tilting left switches the display mode, and tilting right switches the storage mode. A gesture can be used alone or in any combination; for example, long-pressing the screen while shaking the terminal enables multi-focus video shooting and allows the focus target person to be changed in real time.

Regarding activation by biometrics, these include but are not limited to handwriting, fingerprint and voiceprint features. For example, while the terminal shows the video shooting preview interface, if the fingerprint or voiceprint detected by the fingerprint or voiceprint detector matches a pre-registered user, the terminal switches to prompting the user to enable the multi-focus video shooting mode.

Regarding activation by an external controller, the controller may be a stylus, microphone or similar device associated with the terminal. For example, when the terminal detects that the stylus is taken out and quickly reinserted, that a preset stylus button is pressed, or that the user makes a preset air gesture with the stylus, it confirms that it should switch to the multi-focus video shooting mode. The external controller may also be a smart watch, smart glasses or similar; other devices may be mobile phones, other accessories or attachments, or standalone devices. Such a wearable device can reach the terminal via WiFi and/or NFC and/or Bluetooth and/or a data network; through at least one of buttons, gestures, or biometrics it confirms that the user wants to switch to the multi-focus video shooting mode and notifies the terminal.

(2) Determining the multi-focus areas

After the terminal device enables the multi-focus video shooting mode, the user can manually designate several areas on the shooting preview interface. If the user designates only one area, the multi-focus areas are the whole image captured in the preview interface together with the designated area.

If the user designates two or more areas, the multi-focus video alternately focuses on each designated area during shooting, yielding a video composed of the multiple areas of interest the user designated.

Besides the manual designation described above, the terminal device can determine the multi-focus areas automatically from objects (such as people) in the current scene. For example, it can detect the number of people in the scene, take the area containing the most people as an area of interest, and take the whole scene as the global area.

The focus object can also be selected by the user, and the area containing the selected object is confirmed as a focus area. When the user enables the multi-focus video shooting mode, the focus person can be chosen in several ways. For example, on entering the preview mode of multi-focus video shooting, face detection is run automatically over the global area, and the user fixes the focus area by tapping or dragging a detected face region. On entering shooting mode, face tracking and recognition can follow the focused person of interest in real time; one of the two cameras shoots the global-area video while the other shoots the tracked person of interest. When the user wants to change the target person, double-tapping the screen starts face detection in the global-area video, and the user can pick one of the detected faces or manually designate an area of interest. The bounding box of the area of interest can also be adjusted dynamically, for example enlarging the face region to the face-and-shoulders region, the person's upper body, or the person's whole body.
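The dynamic enlargement of a detected face box to the shoulders or upper body can be sketched as a simple geometric expansion. The scale factors and the clamping policy below are illustrative assumptions, not values from the patent:

```python
def expand_box(box, scale_w, scale_h, frame_w, frame_h):
    """Expand a face box (x, y, w, h), growing downward toward the body
    and staying centered horizontally, clamped to the frame bounds."""
    x, y, w, h = box
    new_w, new_h = w * scale_w, h * scale_h
    new_x = x - (new_w - w) / 2               # keep horizontally centered
    new_x = max(0, min(new_x, frame_w - new_w))
    new_h = min(new_h, frame_h - y)           # do not run past the frame
    return (new_x, y, new_w, new_h)

face = (100, 50, 40, 40)
shoulders = expand_box(face, 2.0, 1.5, frame_w=640, frame_h=480)
upper_body = expand_box(face, 2.5, 4.0, frame_w=640, frame_h=480)
print(shoulders)   # (80.0, 50, 80.0, 60.0)
print(upper_body)  # (70.0, 50, 100.0, 160.0)
```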

(3) User interaction

Different interactions are provided according to whether the handheld terminal device is in landscape or portrait orientation while shooting multi-focus video. If the gravity sensor detects that the device is shooting in landscape, the side-by-side layout shown in Figure 10 is used: the left side shows the "global-area video" and the right side the "locally focused target-person video", or the reverse. Depending on the number of focus persons, the "locally focused target-person video" area on the left or right can show several target persons. If the device is detected shooting in portrait, the top-and-bottom layout shown in Figure 11 is used: the top shows the "global-area video" and the bottom the "locally focused target-person video", or the reverse; again, the "locally focused target-person video" area above or below can show several target persons. When the user cares more about the global area and wants to store the target person of interest for later playback, the large/small-screen playback mode shown in Figure 12 can be chosen: the global video occupies almost the whole screen, while the locally focused target-person video occupies a small region of the screen, which may be the lower-right, lower-left, upper-right or upper-left corner, or a position designated by the user.

(4) Multi-focus video storage

Preferably, processing the second type of multimedia information according to the first type of multimedia information specifically includes:

according to the captured video information focused on one focal area, storing the captured video information focused on another focal area, where the stored content includes at least one of the following:

1) the two captured video streams focused on different focal areas;

2) composite video information obtained by compositing the captured video information focused on one focal area with the captured video information focused on another focal area;

3) the video content of interest identified in the two captured video streams focused on different focal areas;

4) the captured video information focused on the global area, together with the position information of the local area within that global-area video.

Based on the multi-focus shooting mode, this embodiment provides the following four storage modes:

Mode 1: store the multi-focus videos captured by the two cameras separately, obtaining two video files. If one camera uses global focus and the other local focus, one stored video corresponds to the globally focused video and the other to the locally focused video; if both cameras use local focus, the two videos correspond to the locally focused videos captured by the two cameras respectively.

Mode 2: this storage mode provides a what-you-see-is-what-you-get composite storage method. The stored content is identical to the picture presented on the terminal device's display: every frame of the stored video presents the pictures captured by the two cameras simultaneously, for example in any of the three screen layouts shown in Figures 10-12. For the large/small-screen layout of Figure 12, each frame of the stored video is a large/small-screen picture whose content matches what the screen showed at the corresponding moment, with the large and small screens corresponding to the content captured by the two cameras respectively.

Mode 3: this storage mode provides a user-interest-driven merged storage method, aimed at layouts in which the screen is divided into a main screen and a sub-screen, such as the large/small-screen layout shown in Figure 3: the large screen is the main screen, the content displayed on it represents the field of view the user is currently interested in, and the final stored video is the content displayed on the main screen, so every frame of the video directly presents the area the user was interested in at that moment.

Mode 4: this storage mode targets the combined global-focus and local-focus shooting mode. The globally shot video is stored, together with the positions of the four corner points of the bounding box of the local area within the global view, obtained by real-time tracking, which determine the local region-of-interest object. As shown in Figure 13, the global-area video is saved, and the positions of the rectangle tracked in real time within the global area, e.g. the four corner points of the yellow region in Figure 13, are saved as well. Using this rectangle size as the reference, the content of the locally focused area captured by the other camera is saved.
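Mode 4's per-frame metadata (the four tracked corner points alongside the global video) could be serialised as a simple sidecar record. The sketch below is not from the patent; the record layout, field names and JSON encoding are illustrative assumptions.

```python
import json

def make_mode4_record(frame_index, corners):
    """One per-frame metadata record for storage mode 4: the four corner
    points of the ROI tracked in real time within the global-area video.
    `corners` is a list of four (x, y) pixel positions (assumed layout)."""
    assert len(corners) == 4, "mode 4 stores exactly four corner points"
    return {"frame": frame_index,
            "roi_corners": [list(p) for p in corners]}

# One record per frame would be appended to a sidecar metadata file
# stored next to the global-area video file.
record = make_mode4_record(0, [(100, 80), (220, 80), (220, 200), (100, 200)])
line = json.dumps(record)
```

The locally focused video from the second camera can then be saved at the size implied by these corner points, as the text describes.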

Regarding how the storage mode is set, the present invention provides three setting methods, and the terminal device can select the storage mode according to one of them. The first is the terminal device's system default setting; the second is that the terminal device accepts user trigger operations such as voice, keys or an external controller, or a combination of these, to change the storage mode; the third is that the terminal device adaptively sets the storage mode based on device-related information such as storage space, or on historical data.

Regarding the system default setting, the terminal device sets one of the four storage modes as the default and uses it to store video until it receives an instruction to change the storage mode.

Regarding the setting method based on user interaction, a setting method similar to step 4 of Embodiment 1 is adopted; the difference lies in the content of the instruction. For example, the voice instruction may be "store videos separately": if the terminal device receives this instruction, it performs speech recognition on the voice command and sets the storage mode to the first storage mode. The other user-interaction setting methods are the same and are not repeated here.

Regarding adaptive setting based on storage space, different storage modes can be selected according to the available space. If the remaining storage space is below a certain threshold, e.g. below 50% of the terminal device's storage space, one of the last three storage modes is set; conversely, if the remaining storage space is above the threshold, e.g. above 50% of the terminal device's storage space, the storage mode is not constrained by storage space.
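The adaptive rule above (fall back to a space-saving mode when free space drops below a threshold) can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation; the function name, mode numbering and 50% default are assumptions taken from the surrounding text.

```python
def choose_storage_mode(free_bytes, total_bytes, default_mode=1, threshold=0.5):
    """Pick a multi-focus storage mode based on remaining space.

    Mode numbering follows the four modes described above (assumed):
      1 - two separate video files (most space-hungry)
      2 - composited what-you-see video
      3 - interest-driven merged video
      4 - global video plus per-frame ROI corner coordinates
    Below the threshold, fall back to one of the space-saving modes (2-4);
    above it, the default mode is kept unchanged.
    """
    if free_bytes < threshold * total_bytes:
        # Low on space: if the default stores two full files, switch to a
        # single-stream mode; otherwise the default is already economical.
        return default_mode if default_mode in (2, 3, 4) else 2
    return default_mode
```

A device would call this before each recording session with the current free/total storage figures.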

Regarding adaptive setting based on historical data, the user's preference can be analysed, e.g. from the storage modes the user has chosen in the past, and the storage mode is set to the user's preferred one.

(5) Multi-focus video playback modes

Preferably, in response to a received playback trigger operation, the video information is played in a playback mode matching the stored content, where the playback mode includes at least one of the following:

1) when the two captured video streams focused on different focal areas are stored, playing the two video streams separately or jointly;

2) when composite video information is stored, playing the composite video;

3) when the identified video content of interest in the two video streams focused on different focal areas is stored, playing the video content of interest;

4) when the video information of the global area and the position information of the local area within it are stored, determining the video information of the local area from the position information, and playing the global-area video and the local-area video separately or jointly.
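For playback mode 4, the local-area video must be recovered from the four stored corner points. A minimal sketch of deriving the crop rectangle from those points (an assumption about how the stored coordinates would be consumed, not the patent's code):

```python
def roi_bounds(corners):
    """Axis-aligned crop rectangle (x, y, w, h) enclosing the four stored
    corner points, used to cut the local region of interest out of a
    global-area frame during joint playback."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y
```

A player would evaluate this per frame and crop (or scale) the global frame to that rectangle for the small-screen view.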

Corresponding to the storage modes above, the present invention provides four playback modes; the terminal device may select one of, but is not limited to, the following four:

First: the terminal device detects the user's open operation on the two independently stored videos. When it detects a play operation, e.g. the user tapping the play button, the videos captured by the two cameras can be played separately. The two saved videos are stored in the terminal device's memory with a time association. When the user chooses to play the captured video, the two independent videos can each be played full-screen, or the terminal device can adaptively associate the two video pictures and play them simultaneously, using any of the three layouts introduced under "interaction modes": top-and-bottom, side-by-side, or large/small screen.

Second: the terminal device detects the user's open operation on the stored composite video. When a play operation is detected, e.g. the user tapping the play button, the composite video is played, and the user sees the multi-focus video captured by both cameras.

Third: the terminal device detects the user's open operation on the stored merged video. When a play operation is detected, e.g. the user tapping the play button, the video merged from the two cameras' clips is played, and the user sees a multi-focus video of the pictures of interest that were presented on the main screen while recording.

Fourth: the terminal device detects the user's open operation on the stored combination of the global-area video and the rectangle coordinates of the region of interest. When a play operation is detected, e.g. the user tapping the play button, the user can play the global-area video and the local-area video of the size determined by the four coordinate points separately, or play the two videos jointly. Joint playback can take two forms. One plays the two video pictures simultaneously in the top-and-bottom, side-by-side or large/small-screen layout; in the large/small-screen layout the large screen displays the global-area video or the locally focused area, the small screen displays the locally focused region of interest or the global-area video, and the user can switch the content of the two screens by tapping the large or small picture. Moreover, the position of the small screen on the large screen can be specified by the user, as shown in Figure 14. If the user does not specify a position, the terminal device can automatically place the small screen at any of the four corners of the screen. When the user wants to move the small screen, the terminal device decides where to place it by detecting the user's gesture or operation.

A specific implementation of the joint multi-focus-area playback mode is described in detail below with an embodiment.

Step 1. Start-up interaction:

1) Turn on the terminal device's camera, enter the video-shooting preview interface, and enable the multi-focus video mode by double-tapping the screen.

2) After the multi-focus video mode is enabled, as shown in Figure 15, the preview interface displays the pictures captured by the two cameras; the global-area picture occupies the full screen, and all face regions in the scene are shown within it. The user taps a face region of interest and interactively drags the detection box to frame the entire region of interest. At this point part of the screen shows the global-area picture and another part shows the local region-of-interest picture. As described above, the two parts can be laid out side by side, top and bottom, or as a large and a small screen; the large/small-screen layout is used as the example here.

3) Once the local area of interest has been selected, the two cameras focus on the global area and the designated local area respectively, and multi-focus video shooting can begin.

Step 2. Multi-focus video shooting, storage and playback proceed as follows:

Step 2.1 Multi-focus video shooting

1) In the multi-focus shooting interface, when the user sees the global-area picture and the local region-of-interest picture, the user presses the shooting button to record, and different operations select different storage and playback modes. Tapping the shooting button in the interface jumps to 2): the picture currently shown on the terminal device's screen is recorded directly. Long-pressing the bounding box of interest in the global area jumps to 5): the global-area video and the four point positions on the bounding box obtained by real-time tracking are stored, and the four point positions determine the size at which the video of the local region of interest, captured in focus by the other camera, is stored. If the global-area video and the local-area video are touched at the same time, jump to 7): the global-area video and the local region of interest are stored separately.

2) The picture captured on the terminal device's screen is recorded directly; the global-area picture occupies the entire screen and the local-area picture sits in a small window whose position the user can move in real time.

Step 2.2 Multi-focus video storage

3) What is currently displayed on the screen is the multi-focus video composed of the global area and the local area, and it is stored in the terminal device. Before the next multi-focus video is shot, the image box in the upper-left corner shows the most recently shot multi-focus video; tapping it allows that video to be viewed.

Step 2.3 Multi-focus video playback

4) After the image box in the upper-left corner is tapped, the most recently shot multi-focus video is displayed on the terminal device; playback at this point shows the same content as was seen while shooting.

5) The global-area video and the four point positions on the bounding box obtained by real-time tracking are stored, together with the video, at the size determined by the four point positions, of the local region of interest captured in focus by the other camera. The user can play the global-area video and the local-area video of the size given by the four coordinate points separately, or play the two videos jointly. In joint large/small-screen playback the two video pictures are played simultaneously: the large screen displays the global-area video or the locally focused area, the small screen displays the locally focused region of interest or the global-area video, and the user can switch the content of the two screens by tapping the large or small picture. The position of the small screen on the large screen can also be specified by the user; if the user does not specify it, the terminal device can automatically place the small screen at any of the four corners of the screen. After the video is stored, the most recently shot multi-focus video is displayed in the image box in the upper-left corner of the screen; to play it, jump to 6).

6) Long-pressing the content in the image box in the upper-left corner of the screen plays the most recently shot multi-focus video.

7) The global-area video and the local region-of-interest video are stored separately, and the terminal device can play the two videos individually. The terminal device can also adaptively associate the two video pictures and play them simultaneously, using any of the three layouts introduced under "interaction modes": top-and-bottom, side-by-side, or large/small screen.

Embodiment 11: the enhancement processing mode is a target-object highlighting playback mode

Target objects include persons and objects of interest. In the following embodiments a person of interest is taken as the example.

In everyday and workplace video shooting, the person of interest often fails to stand out because many people are being filmed: it is hard to locate the speaker of interest in the image, and the voices of different people are often mixed up. Current video shooting methods do not highlight the person of interest, whereas a binocular camera and multiple microphones can locate the depth of each person in the scene and the direction of each sound, providing the necessary conditions for highlighting the person of interest during shooting. In the present invention, through the cooperation of a binocular camera and two or more microphones, the persons in the image are associated with each person's speaking voice while the video is shot, so that only the actions and voice of a particular person of interest in the video can be played, achieving the effect of highlighting the target person of interest. In this way the actions and voice of a particular person can be highlighted within a captured multi-person scene video.

It should be noted that in the detailed description of the following embodiments, the method of processing the second type of multimedia information according to the first type of multimedia information is specifically a target-object highlighting playback mode in which audio-video highlighting is applied to the captured video information according to the captured audio information.

When the second type of multimedia information is video information and the first type of multimedia information is audio information corresponding to that video information, audio-video highlighting is applied to the captured video information according to the captured audio information.

Specifically, a target object is determined from the captured video information, and highlighting is applied to the video information and/or audio information corresponding to the target object.

The target object is determined from the captured video information in at least one of the following ways:

determining the target object according to a detected target-object designation operation;

determining the target object according to the number and positions of the multiple objects in the captured video information.

Preferably, highlighting the audio information corresponding to the target object specifically includes:

detecting the captured video information to determine the number of objects in the video information and the position and orientation information of each object;

determining the audio information corresponding to each object according to the position and orientation information of each object;

determining the audio information corresponding to the target object and applying highlighting to it.

Preferably, the video segment containing the target object is determined in the captured video information, and the audio segment corresponding to that target object is determined in the captured audio information according to the correspondence. The present invention proposes highlighting the actions and voices of one or several persons of interest within a captured multi-person scene video. Through the cooperation of a binocular camera and two or more microphones, the persons appearing in the video image are associated with their respective speaking voices, so that only a particular person or persons of interest in the video are played, or are played with highlighting, achieving the effect of highlighting the persons of interest. The specific scheme is as follows:

First, for the captured video, the terminal detects face regions in the video frames; from the number of detected faces it obtains the total number of target persons in the current scene. Next, from a detected face region it derives the bearing of that person relative to the shooting camera. Then, using the binocular camera and stereo matching, it obtains the person's depth from the camera, thereby obtaining each person's position and orientation relative to the camera coordinate system. Further, using two or more microphones on the phone, it obtains each speaking person's position and orientation relative to the microphone coordinate system. Finally, through the pre-calibrated transformation between the camera coordinate system and the microphone coordinate system, the correspondence between each person in the image and the audio is obtained.
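The final association step above (map face positions through the calibrated camera-to-microphone transform, then match each face to the nearest localised sound source) can be sketched as follows. This is a simplified sketch under stated assumptions: a known rigid transform (R, t), point positions already estimated by stereo matching and microphone localisation, and nearest-neighbour matching; the patent does not prescribe this exact matching rule.

```python
import math

def transform_point(R, t, p):
    # Apply the camera-to-microphone rigid transform: p' = R p + t,
    # with R a 3x3 rotation (nested lists) and t a 3-vector.
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i]
                 for i in range(3))

def associate_faces_with_sources(face_pos_cam, source_pos_mic, R, t):
    """Match each face (camera coordinates, from stereo depth) to the
    index of the nearest sound source (microphone coordinates)."""
    mapping = {}
    for fi, p in enumerate(face_pos_cam):
        q = transform_point(R, t, p)
        mapping[fi] = min(range(len(source_pos_mic)),
                          key=lambda si: math.dist(q, source_pos_mic[si]))
    return mapping

# Toy example with an identity transform (camera and mic frames coincide).
R_id = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t_zero = (0.0, 0.0, 0.0)
pairs = associate_faces_with_sources(
    [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)],   # two detected faces
    [(0.5, 0.0, 1.0), (0.0, 0.0, 1.0)],   # two localised sources
    R_id, t_zero)
```

In practice R and t come from the pre-calibration between the camera and microphone coordinate systems that the text mentions.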

Once the correspondence between each person in the image and the audio has been obtained, when the user taps to play one or several persons of interest in the image, the other regions of the video image are blurred, or the region of interest is enlarged, thereby highlighting the region of the person of interest.
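The visual highlighting step (suppress everything outside the selected person's region) can be illustrated with a toy frame. The sketch below is illustrative only: it dims pixels outside the ROI by halving them, standing in for the blur or zoom the text describes, and operates on a 2-D list of grey levels rather than a real video frame.

```python
def highlight_roi(frame, roi):
    """Keep pixels inside the ROI unchanged and dim (halve) pixels
    outside it.  `frame` is a 2-D list of grey levels indexed
    [row][col]; `roi` is (x, y, w, h) with x = column, y = row."""
    x, y, w, h = roi
    out = []
    for r, row in enumerate(frame):
        out.append([v if x <= c < x + w and y <= r < y + h else v // 2
                    for c, v in enumerate(row)])
    return out
```

A real implementation would instead apply a Gaussian blur outside the ROI, or crop and upscale the ROI, per frame of the decoded video.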

The specific implementation and presentation of this embodiment consists of four parts: enabling the person-of-interest highlighting video shooting mode, determining the person of interest, storing the image and voice of the person of interest, and playing back the image and voice of the person of interest.

(1) Enabling the person-of-interest highlighting video shooting mode

This mode can be enabled in two ways: the user actively enables the person-of-interest highlighting video shooting mode, or the terminal device, based on the content being shot, automatically asks the user whether the mode should be enabled.

1) The user enables it actively, mainly through voice, gesture interaction and the like. For example, when the user says "record person of interest", the terminal device enables the person-of-interest highlighting video shooting mode, activates face detection, and displays all persons in the current scene in the shooting preview interface. The user can tap to select a particular person of interest to record, and can change the recorded person of interest; or all persons detected in the current video can be recorded in real time so that a specific person of interest can be selected during later playback; or the device can switch between modes that record one person, several persons, or all persons.

2) The terminal device automatically detects the video content, mainly based on video understanding techniques. For example, by analysing the content of the video and judging that the scene currently being shot is a multi-person meeting, a speech or a similar occasion, the terminal device automatically asks the user whether the person-of-interest highlighting video shooting mode should be enabled. After the mode is enabled, the user's gesture or voice interaction determines whether a single person of interest is recorded, or all persons in the scene, and can switch between the two.

(2) Determining the person of interest

When the user shoots a person-of-interest highlighting video, the person of interest can be designated either actively by the user through voice, gesture or an external device, or automatically by the terminal device.

1) Ways for the user to actively determine the person of interest include voice, gesture and external-device interaction.

Voice interaction: the user starts recording through voice interaction. While recording, voice can also determine whether a single person or multiple persons are recorded, and switch between the two. For example, when the user says "record one person", the terminal device records only the image and voice of the person of interest designated by the user; when the user says "record all", the terminal device records the images and voices of all persons detected in the scene.

Gesture interaction: the user can tap a detected person to designate the target person of interest for recording; double-tap another person to change the target person of interest; tap the screen to designate recording all persons of interest in the entire scene; or tap several target persons in succession to designate recording multiple persons of interest.

External-device interaction: all of the gesture operations above can also be performed through an external device. For example, through a stylus, earphones or other devices associated with the terminal device, the user can designate one target person of interest, several target persons, or all persons in the scene as target persons.

2) The terminal device automatically determines the person of interest according to the scene currently being shot. When the person-of-interest shooting mode is enabled, the terminal device detects the persons appearing in the image in the preview interface and determines the person the user is interested in from the number of persons and their positions. For example, every person appearing in the scene may be treated as a person of interest, and all persons and their corresponding voices stored; or the person located near the centre of the picture may be taken as the person of interest and marked prominently to inform the photographer that the central person is the current person of interest. If the user wishes to change the person of interest determined by the terminal device, the user can double-tap the person he or she is interested in. The image and voice of the person of interest are associated through the terminal device's binocular camera and multiple microphones.
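The "person nearest the centre of the picture" heuristic above can be sketched directly over the face-detection output. The function name, box format and distance metric are assumptions for illustration; the patent only states the heuristic, not its implementation.

```python
def pick_center_person(faces, frame_w, frame_h):
    """Auto-select the person of interest as the detected face whose
    centre is nearest the frame centre.  `faces` is a list of
    (x, y, w, h) face boxes; returns the index of the chosen face."""
    cx, cy = frame_w / 2, frame_h / 2
    return min(range(len(faces)),
               key=lambda i: ((faces[i][0] + faces[i][2] / 2 - cx) ** 2 +
                              (faces[i][1] + faces[i][3] / 2 - cy) ** 2))
```

The selected index would then be marked prominently in the preview, and a double-tap on another face would override it.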

(3) Storing the image and voice of the person of interest

Preferably, the captured video information is stored according to the captured audio information, where the stored content includes at least one of the following:

1)采集到的视频信息及音频信息;1) Collected video information and audio information;

2)目标对象对应的视频信息及音频信息。2) Video information and audio information corresponding to the target object.

存储方式主要有两种:There are two main storage methods:

第一种：将摄像头及麦克风实时采集得到的内容都存储下来，并且记录了不同时间用户指定的感兴趣人物，以备在播放时适应多种方式。The first method: store everything captured in real time by the camera and microphones, and also record the people of interest specified by the user at different times, so that multiple playback modes can be supported later.

第二种:仅将拍摄时录制的感兴趣人物的图像和声音存储下来。即:仅 录制拍摄时用户指定的某个人物、多个人物或全部场景中人物的图像和声音。The second type: Only the images and sounds of the people of interest recorded at the time of shooting are stored. That is, only the image and sound of a person specified by the user, multiple persons, or all persons in the scene are recorded at the time of shooting.

以上两种方式，主要针对录制场景在当前终端设备摄像头采集区域内，如果在当前终端设备的另一侧，即摄像头拍摄区域背面发出的声音，以另一个文件进行存储。此时可以由用户在播放时选择是否需要去掉来自摄像头拍摄区域背面发出的声音。通过终端设备上的麦克风，可以检测到声音的朝向是来自于摄像头拍摄区域的正面，还是来自于摄像头拍摄区域的背面。如果当前的声音来自于摄像头拍摄区域的背面，则该声音可能并不想被拍摄者记录下来，例如，该声音可能是“开始录了”，或者是当前接听电话时的谈话内容。所以可以将这部分语音内容单独存储。The above two methods mainly apply to sound from within the capture area of the terminal device's camera. Sound emitted from the other side of the device, i.e. from behind the camera's shooting area, is stored in a separate file, and the user can choose during playback whether to discard it. Using the microphones on the terminal device, it can be determined whether a sound comes from in front of or from behind the camera's shooting area. Sound coming from behind the camera is likely something the photographer does not want recorded, for example "recording has started" or the conversation of an ongoing phone call, so this part of the audio can be stored separately.
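The front/back routing described above can be sketched as follows. This is only an illustrative sketch: the patent states that the microphones can detect sound direction but gives no algorithm, so the two-microphone level comparison, the `margin` ratio, and all function names here are assumptions.

```python
def classify_sound_direction(front_level, back_level, margin=1.5):
    """Guess whether a sound comes from in front of or behind the camera.

    Hypothetical heuristic: compare the signal levels of an assumed
    front-facing and back-facing microphone; `margin` is an illustrative
    ratio threshold, not a value from the source.
    """
    if back_level > front_level * margin:
        return "back"
    return "front"


def route_audio(segments):
    """Split captured audio segments into the main recording and a separate
    back-of-camera file, mirroring the storage behaviour described above.

    Each segment is a dict with per-microphone levels, e.g.
    {"front": 0.9, "back": 0.1}.
    """
    main, back = [], []
    for seg in segments:
        if classify_sound_direction(seg["front"], seg["back"]) == "back":
            back.append(seg)  # stored in the separate file
        else:
            main.append(seg)  # stored with the video
    return main, back
```

At playback time the second list can then be merged back in by timestamp or simply discarded, matching the user choice described in the text.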

针对如何设置存储方式，本发明给出三种设置方法，终端设备可以根据以下三种中的一种来对存储方式进行选择。第一种是终端设备默认设置；第二种是终端设备接受用户通过语音、按键或外部控制器等方式以及这些方式的组合来更改存储方式；第三种是终端设备通过存储空间或历史数据自适应设置存储方式。Regarding how to set the storage mode, the present invention provides three setting methods, and the terminal device can select the storage mode according to one of them. The first is the terminal device's default setting; the second is that the terminal device accepts user changes to the storage mode through voice, keys, external controllers, or a combination of these; the third is that the terminal device adaptively sets the storage mode based on storage space or historical data.

关于默认设置,终端设备设置两种存储方式中的一种作为默认值,在终 端设备没有接收到更改存储方式的指令前都使用该默认存储方式对视频进行 存储。Regarding the default setting, the terminal device sets one of the two storage modes as the default value, and the terminal device uses this default storage mode to store the video before the terminal device receives an instruction to change the storage mode.

关于用户交互的设置方式，采用实施例一中步骤4类似的设置方法，区别在于指令描述内容，例如，语音设置中的指令为“感兴趣视频存储”，如果终端设备接收到该指令，则对声控指令进行语音识别，确定设置存储方式为第一种存储方式。其他用户交互的设置方式相同，在此不再赘述。Regarding the user-interaction setting method, a method similar to step 4 of the first embodiment is used; the difference lies in the content of the instruction. For example, the voice instruction may be "store the video of interest": if the terminal device receives this instruction, it performs speech recognition on the voice command and sets the storage mode to the first method. The other user-interaction setting methods are the same and are not repeated here.

关于根据存储空间的自适应设置，根据存储空间，可以选择不同的存储方式，如果剩余存储空间小于某一阈值，例如低于终端设备存储空间的50%，则设置为第二种存储方式；反之，如果剩余存储空间高于某一阈值，例如高于终端设备存储空间的50%，则存储方式不受存储空间影响。Regarding adaptive setting according to storage space, different storage methods can be selected depending on the remaining space. If the remaining storage space is below a threshold, for example below 50% of the terminal device's total storage, the second storage method is used; conversely, if the remaining space is above the threshold, for example above 50% of the terminal device's storage, the storage method is not affected by storage space.
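The precedence of the three setting methods above (user instruction, then space-adaptive rule, then default) can be sketched as a single selection function. The 50% threshold is the example value from the text; the mode numbering (1 = store everything, 2 = persons of interest only) and the function name are illustrative assumptions.

```python
def choose_storage_mode(free_bytes, total_bytes, user_mode=None, default_mode=1):
    """Select the storage mode as described above.

    Mode 1 (assumed numbering): store everything captured.
    Mode 2: store only the person-of-interest content.
    """
    if user_mode is not None:
        # An explicit user setting (voice, key, external controller) wins.
        return user_mode
    if free_bytes < 0.5 * total_bytes:
        # Low remaining space: fall back to the space-saving second mode.
        return 2
    # Otherwise the storage mode is not constrained by space.
    return default_mode
```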

关于根据历史数据的自适应设置,例如根据用户以往设置的存储方式, 来对用户喜好进行分析,设置为用户偏好的存储方式。Regarding the adaptive setting based on historical data, for example, the user's preference is analyzed according to the storage mode previously set by the user, and the user's preferred storage mode is set.
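The history-based setting can be sketched with a simple frequency heuristic. The patent only says the preference is "analyzed" from historical data, so taking the most frequently chosen past mode is an assumption, as are the names below.

```python
from collections import Counter

def preferred_mode(history, fallback=1):
    """Infer the user's preferred storage mode from past choices.

    `history` is a list of previously selected mode numbers; with no
    history, an assumed fallback default is returned.
    """
    if not history:
        return fallback
    # Most frequently chosen mode wins (assumed heuristic).
    return Counter(history).most_common(1)[0][0]
```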

(4)对感兴趣人物图像和语音播放方式(4) Image and voice playback method for people of interest

优选地,响应于接收到的播放触发操作,基于与存储内容相匹配的播放 方式对视频信息及音频信息进行播放;其中,播放方式包括以下至少一项:Preferably, in response to the received playback trigger operation, the video information and the audio information are played based on the playback mode that matches the stored content; wherein, the playback mode includes at least one of the following:

1)当存储采集到的视频信息及音频信息时,将采集到的视频信息及音频 信息相关联地播放;1) when storing the collected video information and audio information, the collected video information and audio information are associated and played;

2)当存储采集到的视频信息及音频信息时,将采集到的视频信息中的目 标对象与相对应的音频信息相关联地播放;2) when storing the collected video information and audio information, the target object in the collected video information is played in association with the corresponding audio information;

3)当存储采集到的视频信息及音频信息时,将采集到的视频信息中的各 个对象与相对应的音频信息相关联地播放;3) when storing the collected video information and audio information, each object in the collected video information is played in association with the corresponding audio information;

4)当存储目标对象对应的视频信息及音频信息时,将目标对象对应的视 频信息及音频信息相关联地播放。4) When storing the video information and audio information corresponding to the target object, play the video information and audio information corresponding to the target object in association.

与上述的存储方式相对应,有对应的两种播放方式,终端设备可以选择 但不限于以下两种中的一种:Corresponding to the above-mentioned storage mode, there are two corresponding playback modes, and the terminal device can choose but not be limited to one of the following two:

第一种：终端设备检测到用户的打开操作，针对第一种存储的完整的视频。检测到用户点击播放的操作，例如，检测到用户点击播放按钮的操作，对视频进行播放。第一种存储方法将场景中所有人物的图像和声音都有记录，且各时间段用户指定的感兴趣人物也都记录了下来。则在播放时，可以：1)按照用户录制时指定的视频内容播放：例如前30秒对人物1感兴趣，则仅播放目标人物1的图像和声音，其他人物和背景图像均被模糊和/或静止；或者将目标人物的图像区域放大，其他区域模糊和/或静止。接下来60秒对人物2感兴趣，则仅播放目标人物2的图像和声音。这里的感兴趣人物的选择是在录制时确定的，并且终端设备记录了用户在哪些时间段对哪个或者哪些人物感兴趣；2)不做处理的播放所录制到的所有图像和声音内容；3)由于记录了场景中所有人物的图像和声音，在播放时，用户可以改变播放的感兴趣人物的顺序，例如前30秒对人物2感兴趣，则仅播放目标人物2的声音和图像，其他人物和背景图像均被模糊和/或静止。接下来60秒对人物1感兴趣，则仅播放目标人物1的图像和声音。The first method: the terminal device detects the user's open operation on a complete video stored with the first storage method. When it detects that the user clicks to play, for example by tapping the play button, it plays the video. The first storage method records the images and sounds of all the people in the scene, together with the people of interest specified by the user during each time period. During playback it is then possible to: 1) play according to the content specified by the user while recording: for example, if the user was interested in person 1 during the first 30 seconds, only the image and sound of person 1 are played, and all other people and the background are blurred and/or frozen; alternatively, the image region of the target person is enlarged while the other regions are blurred and/or frozen. If the user was interested in person 2 for the next 60 seconds, only the image and sound of person 2 are played. The choice of the person of interest here was made during recording, and the terminal device recorded which person or people the user was interested in during which time periods; 2) play everything that was recorded without any processing; 3) since the images and sounds of all the people in the scene were recorded, the user can change the order of the people of interest during playback: for example, being interested in person 2 for the first 30 seconds plays only the sound and image of person 2, with everyone else and the background blurred and/or frozen, and being interested in person 1 for the next 60 seconds plays only the image and sound of person 1.

第二种:终端设备检测到用户的打开操作,针对第二种存储的感兴趣人 物的视频。检测到用户点击播放的操作,例如,检测到用户点击播放按钮的 操作,对视频按照录制时选择的感兴趣人物顺序进行播放,即与拍摄时指定 的感兴趣区域相同的方式进行播放。The second type: The terminal device detects the user's opening operation, and targets the video of the person of interest stored in the second type. The operation of the user clicking to play is detected, for example, the operation of the user clicking the play button is detected, and the video is played in the order of the people of interest selected during recording, that is, in the same way as the area of interest specified during shooting.
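The second playback method replays the `(time period, person of interest)` sequence recorded at shooting time. A minimal sketch of building a per-frame playback plan from that sequence follows; the segment representation `(start, end, person_id)` and all names are assumptions, since the patent does not specify the stored format.

```python
def frames_to_play(interest_segments, frame_times):
    """Decide, for each frame timestamp, which person should be highlighted.

    `interest_segments` is a list of (start, end, person_id) tuples recorded
    while shooting; a frame outside every segment gets None, meaning it is
    played without highlighting. Times are in seconds.
    """
    plan = []
    for t in frame_times:
        person = None
        for start, end, pid in interest_segments:
            if start <= t < end:
                person = pid  # blur/mute everyone except this person
                break
        plan.append(person)
    return plan
```

For the example in the text (person 1 for the first 30 seconds, then person 2 for the next 60), the segments would be `[(0, 30, 1), (30, 90, 2)]`.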

上述两种播放方式主要针对摄像头拍摄的正面区域的图像和声音内容，针对“存储方式”中提到的来自于摄像头拍摄的背面区域的声音内容，可以由用户通过某些特定的语音或者手势指令告诉终端设备是否需要播放。例如，可以通过单击屏幕中的背景区域打开播放来自于摄像头拍摄背面区域的声音内容，播放的时间即按照文件中记录的时间序列与其他播放视频关联起来。也可以通过双击屏幕中的背景区域关闭播放来自于摄像头拍摄背面区域的声音内容。The above two playback methods mainly address the image and sound content from the area in front of the camera. For the sound from behind the camera mentioned under "storage methods", the user can tell the terminal device whether it should be played by using specific voice or gesture commands. For example, tapping the background area on the screen turns on playback of the sound from behind the camera, synchronized with the rest of the video according to the time sequence recorded in the file; double-tapping the background area turns that playback off.

下面以一个实施例详细介绍兴趣人物凸显视频实施例的具体方案。The specific solution of the embodiment of the interesting person highlighting video is described in detail below with an embodiment.

步骤1.开启交互步骤:Step 1. Open interactive steps:

1).开启终端设备的照相机，进入视频拍摄预览界面，通过长按屏幕开启兴趣人物凸显视频拍摄模式。1). Turn on the camera of the terminal device, enter the video shooting preview interface, and turn on the person-of-interest highlight video shooting mode by long-pressing the screen.

2).兴趣人物凸显视频模式开启后，预览界面显示左摄像头采集到的画面，该画面占据整个屏幕。此时启动人脸检测功能，在视频拍摄预览界面显示出目前拍摄场景内的所有人物，将当前视频中检测到的所有人物都实时记录下来，以备后续播放时选择特定的感兴趣人物进行播放。2). After the person-of-interest highlight video mode is turned on, the preview interface displays the image captured by the left camera, which occupies the entire screen. The face detection function is then activated, all the people in the current shooting scene are shown on the video preview interface, and every person detected in the current video is recorded in real time so that a specific person of interest can be selected during later playback.

3).当检测出场景中的人物后，将启动一侧摄像头，配合另一侧摄像头计算得到场景中检测到的人物的深度及方向信息，即可开始进行兴趣人物凸显视频的拍摄。3). Once people are detected in the scene, the camera on one side is activated and, together with the camera on the other side, computes the depth and direction information of the detected people; shooting of the person-of-interest highlight video can then begin.
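The depth computed from the two cameras presumably comes from binocular triangulation. The patent does not give its actual depth method, so the following is only the standard stereo relation Z = f·B/d as an illustrative sketch, with assumed parameter names.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth from binocular disparity: Z = f * B / d.

    disparity_px: pixel disparity of the same point between the two cameras.
    focal_px:     focal length expressed in pixels.
    baseline_m:   distance between the two camera centers, in meters.
    Returns depth in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

For example, with an assumed 1000 px focal length and a 2 cm baseline, a 20 px disparity corresponds to a depth of 1 m; in practice a stereo matcher supplies the disparity per detected face region.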

步骤2.兴趣人物凸显视频拍摄,存储和播放方式如下:Step 2. The way of shooting, storing and playing the video of people of interest highlighting is as follows:

步骤2.1兴趣人物凸显视频拍摄Step 2.1 Interested people highlight video shooting

1).在兴趣人物凸显视频拍摄界面，用户看到画面中检测到的人脸区域为绿色时，表明此时场景中人物所处的角度及位置信息已被估计，用户按下拍摄按钮进行视频拍摄，通过不同的操作方式可以进行不同的存储播放模式。单击界面中的拍摄按钮跳转到2)，将直接记录当前终端设备拍摄到的画面；长按画面中某个目标人物将跳转到5)，存储兴趣人物凸显视频和相对应的各个时间点对应的感兴趣人物。同时，实时检测在当前终端设备摄像头拍摄区域背面发出的声音，如果检测到背面发出声音，跳转到7)，将摄像头采集区域内视频和摄像头背面采集到的音频分别存储。1). In the person-of-interest highlight video shooting interface, when the detected face regions on the screen turn green, the angles and positions of the people in the scene have been estimated, and the user can press the shooting button to start recording; different operations select different storage and playback modes. Clicking the shooting button in the interface jumps to 2), which directly records the picture captured by the terminal device; long-pressing a target person on the screen jumps to 5), which stores the person-of-interest highlight video together with the person of interest corresponding to each point in time. At the same time, sound from behind the camera's shooting area is detected in real time; if such sound is detected, jump to 7), where the video from the camera's capture area and the audio captured from behind the camera are stored separately.

2).直接记录当前屏幕上拍摄到的画面,一侧摄像头拍摄的画面占据整 个终端设备的屏幕。实时显示当前场景中的人物区域。如果在拍摄过程当中 没有指定某个感兴趣的人物,则可以在播放阶段进行感兴趣人物的选择,跳 转到5),播放指定的感兴趣人物的画面和音频。2). Directly record the image captured on the current screen, and the image captured by the camera on one side occupies the entire screen of the terminal device. Real-time display of the character area in the current scene. If a person of interest is not specified during the shooting process, the person of interest can be selected in the playback stage, and jump to 5) to play the picture and audio of the specified person of interest.

步骤2.2兴趣人物凸显视频存储Step 2.2 Interested people highlight video storage

3).当前屏幕上显示的即为兴趣人物凸显视频,存储在终端设备中。在 拍摄下一段兴趣人物凸显视频之前,左上角图像框显示最近时间内拍摄的兴 趣人物凸显视频,可以点击该图像框对最近一次拍摄的兴趣人物凸显视频进 行查看。3). What is displayed on the current screen is the highlighted video of the person of interest, which is stored in the terminal device. Before shooting the next video highlighting people of interest, the image frame in the upper left corner displays the highlight video of people of interest shot recently. You can click on the image frame to view the highlight video of people of interest recently shot.

步骤2.3兴趣人物凸显视频播放Step 2.3 Interested people highlight video playback

4).点击左上角图像框后,在终端设备上显示最近一次拍摄的兴趣人物 凸显视频,此时的播放方式与拍摄时看到的内容相同。如果点击当前播放视 频中某个人物时,则跳转到步骤5进行播放。4). After clicking the image box in the upper left corner, the most recent shooting video of people of interest will be displayed on the terminal device, and the playback method at this time is the same as the content seen when shooting. If you click on a character in the currently playing video, jump to step 5 to play.

5).如果在拍摄阶段没有指定某个或某几个感兴趣的人物，则在播放时可以由用户单击感兴趣的人物区域，此时仅播放该人物对应的图像和音频，其他区域均静止和/或模糊。如果在拍摄阶段指定了某一段时间感兴趣的人物，则将用户指定的时长及感兴趣人物次序记录下来，播放时按照拍摄时指定的感兴趣人物次序及时长进行播放。5). If no person of interest was specified during shooting, the user can tap the region of a person of interest during playback; only the image and audio corresponding to that person are then played, while all other regions remain still and/or blurred. If people of interest were specified for certain time periods during shooting, the user-specified durations and the order of the people of interest are recorded, and playback follows that recorded order and those durations.

6).长按屏幕上左上角图像框内的内容,则播放最近一次拍摄的兴趣人 物凸显视频。6). Press and hold the content in the upper left image box on the screen to play the most recent shooting video highlighting the person of interest.

7).分别存储了兴趣人物凸显视频和来自于摄像头背面区域的音频，终端设备在播放时可分别播放两部分内容。如图16(a)-(c)所示，如果对来自于摄像头背面区域的音频不感兴趣，可以直接将该音频内容删除；如果用户希望保留来自于摄像头背面区域的音频，可以在播放时按照时间序列，将对应的音频和视频关联起来共同播放。7). The person-of-interest highlight video and the audio from behind the camera are stored separately, and the terminal device can play the two parts independently. As shown in Figures 16(a)-(c), if the user is not interested in the audio from behind the camera, that audio content can simply be deleted; if the user wants to keep it, the corresponding audio and video can be associated by the recorded time sequence during playback and played together.

本发明还提供了一种多媒体增强处理的装置,如图17所示,该装置包括: 多媒体信息获取模块1701、处理模块1702。The present invention also provides an apparatus for multimedia enhancement processing. As shown in FIG. 17 , the apparatus includes: a multimedia information acquisition module 1701 and a processing module 1702 .

多媒体信息获取模块1701获取两个多媒体采集设备分别采集的第一类 多媒体信息和第二类多媒体信息;处理模块1702根据第一类多媒体信息对第 二类多媒体信息进行相应处理。The multimedia information acquisition module 1701 acquires the first type of multimedia information and the second type of multimedia information respectively collected by the two multimedia collection devices; the processing module 1702 performs corresponding processing on the second type of multimedia information according to the first type of multimedia information.

本发明的方案中,提供的多媒体信息处理的装置中各模块的具体功能实 现,可以参照图1提供的多媒体信息处理的方法的具体步骤,在此不再详述。In the solution of the present invention, for the specific function realization of each module in the provided multimedia information processing device, reference may be made to the specific steps of the multimedia information processing method provided in FIG. 1 , which will not be described in detail here.

本技术领域技术人员可以理解，本发明包括涉及用于执行本申请中所述操作中的一项或多项的设备。这些设备可以为所需的目的而专门设计和制造，或者也可以包括通用计算机中的已知设备。这些设备具有存储在其内的计算机程序，这些计算机程序选择性地激活或重构。这样的计算机程序可以被存储在设备(例如，计算机)可读介质中或者存储在适于存储电子指令并分别耦联到总线的任何类型的介质中，所述计算机可读介质包括但不限于任何类型的盘(包括软盘、硬盘、光盘、CD-ROM、和磁光盘)、ROM(Read-Only Memory，只读存储器)、RAM(Random Access Memory，随机存储器)、EPROM(Erasable Programmable Read-Only Memory，可擦写可编程只读存储器)、EEPROM(Electrically Erasable Programmable Read-Only Memory，电可擦可编程只读存储器)、闪存、磁性卡片或光线卡片。也就是，可读介质包括由设备(例如，计算机)以能够读的形式存储或传输信息的任何介质。As will be appreciated by those skilled in the art, the present invention includes apparatus for performing one or more of the operations described in this application. These devices may be specially designed and manufactured for the required purposes, or they may include known devices in general-purpose computers. These devices store computer programs that are selectively activated or reconfigured. Such a computer program may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).

本技术领域技术人员可以理解，可以用计算机程序指令来实现这些结构图和/或框图和/或流图中的每个框以及这些结构图和/或框图和/或流图中的框的组合。本技术领域技术人员可以理解，可以将这些计算机程序指令提供给通用计算机、专业计算机或其他可编程数据处理方法的处理器来实现，从而通过计算机或其他可编程数据处理方法的处理器来执行本发明公开的结构图和/或框图和/或流图的框或多个框中指定的方案。Those skilled in the art will understand that computer program instructions can be used to implement each block of these structural diagrams and/or block diagrams and/or flow diagrams, as well as combinations of blocks therein. These computer program instructions can be provided to a general-purpose computer, a special-purpose computer, or a processor of another programmable data-processing method, so that the schemes specified in one or more blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention are executed by that processor.

本技术领域技术人员可以理解,本发明中已经讨论过的各种操作、方法、 流程中的步骤、措施、方案可以被交替、更改、组合或删除。进一步地,具 有本发明中已经讨论过的各种操作、方法、流程中的其他步骤、措施、方案 也可以被交替、更改、重排、分解、组合或删除。进一步地,现有技术中的 具有与本发明中公开的各种操作、方法、流程中的步骤、措施、方案也可以 被交替、更改、重排、分解、组合或删除。Those skilled in the art can understand that the various operations, methods, steps, measures, and solutions discussed in the present invention may be alternated, modified, combined, or deleted. Further, other steps, measures, and solutions in the various operations, methods, and processes that have been discussed in the present invention may also be alternated, modified, rearranged, decomposed, combined, or deleted. Further, steps, measures and solutions in the prior art with various operations, methods, and processes disclosed in the present invention may also be alternated, modified, rearranged, decomposed, combined or deleted.

以上所述仅是本发明的优选实施方式，应当指出，对于本技术领域的普通技术人员来说，在不脱离本发明原理的前提下，还可以作出若干改进和润饰，这些改进和润饰也应视为本发明的保护范围。The above are merely preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (34)

1.一种由电子设备执行的视频信息处理的方法,包括:1. A method for video information processing performed by an electronic device, comprising: 由第一多媒体采集设备获得与第一焦点区域对应的第一视频信息;obtaining first video information corresponding to the first focus area by the first multimedia collection device; 由第二多媒体采集设备获得与第二焦点区域对应的第二视频信息;以及obtaining, by the second multimedia collection device, second video information corresponding to the second focus area; and 在电子设备的显示器上显示第一视频信息和第二视频信息。The first video information and the second video information are displayed on a display of the electronic device. 2.根据权利要求1所述的方法,其中,第二焦点区域位于第一焦点区域内,以及2. The method of claim 1, wherein the second focus area is located within the first focus area, and 其中,第二焦点区域小于第一焦点区域。Wherein, the second focus area is smaller than the first focus area. 3.根据权利要求2所述的方法,其中,所述第一焦点区域为全局区域;第二焦点区域为局部区域。3. The method of claim 2, wherein the first focus area is a global area; the second focus area is a local area. 4.根据权利要求1所述的方法,其中,在电子设备的显示器上显示第一视频信息和第二视频信息,包括:4. The method of claim 1, wherein displaying the first video information and the second video information on a display of the electronic device comprises: 在电子设备的显示器上以分屏布局显示第一视频信息和第二视频信息。The first video information and the second video information are displayed in a split screen layout on a display of the electronic device. 5.根据权利要求1所述的方法,还包括:5. The method of claim 1, further comprising: 获取与第一焦点区域内的第三焦点区域相对应的第三视频信息;以及acquiring third video information corresponding to a third focus area within the first focus area; and 在电子设备的显示器上显示第一视频信息和第二视频信息中的至少一个以及第三视频信息。At least one of the first video information and the second video information and the third video information are displayed on a display of the electronic device. 6.根据权利要求5所述的方法,其中,在电子设备的显示器上显示第一视频信息和第二视频信息中的至少一个以及第三视频信息,包括:6. 
The method according to claim 5, wherein displaying at least one of the first video information and the second video information and the third video information on a display of the electronic device comprises: 在电子设备的显示器上以分屏布局显示第一视频信息和第二视频信息中的至少一个以及第三视频信息。At least one of the first video information and the second video information and the third video information are displayed in a split screen layout on a display of the electronic device. 7.根据权利要求1所述的方法,还包括:在获得第一视频信息和第二视频信息之前,接收进入多视频模式的第一指令。7. The method of claim 1, further comprising receiving a first instruction to enter a multi-video mode prior to obtaining the first video information and the second video information. 8.根据权利要求1所述的方法,还包括:接收第二指令以设置第二焦点区域。8. The method of claim 1, further comprising receiving a second instruction to set the second focus area. 9.根据权利要求1所述的方法,还包括:通过合成第一视频信息和第二视频信息来存储第四视频信息。9. The method of claim 1, further comprising storing the fourth video information by synthesizing the first video information and the second video information. 10.根据权利要求9所述的方法,还包括:10. The method of claim 9, further comprising: 接收第四指令以播放第四视频信息;以及receiving a fourth instruction to play fourth video information; and 播放第四视频信息以作为对接收到的第四指令的响应。The fourth video information is played in response to the received fourth instruction. 11.根据权利要求10所述的方法,还包括显示用于接收播放第四视频信息的指令的用户界面。11. The method of claim 10, further comprising displaying a user interface for receiving an instruction to play the fourth video information. 12.根据权利要求9所述的方法,还包括接收第三指令以存储第四视频信息。12. The method of claim 9, further comprising receiving a third instruction to store fourth video information. 13.根据权利要求9所述的方法,其中,当电子设备的方向是水平方向时,第四视频信息以包括左屏幕和右屏幕的分屏布局被存储。13. The method of claim 9, wherein when the orientation of the electronic device is a horizontal orientation, the fourth video information is stored in a split screen layout including a left screen and a right screen. 14.根据权利要求9所述的方法,其中,当电子设备的方向是垂直方向时,第四视频信息以包括上方屏幕和下方屏幕的分屏布局被存储。14. 
The method of claim 9, wherein when the orientation of the electronic device is a vertical orientation, the fourth video information is stored in a split screen layout including an upper screen and a lower screen. 15.根据权利要求4所述的方法,还包括通过传感器检测电子设备的方向。15. The method of claim 4, further comprising detecting the orientation of the electronic device by a sensor. 16.根据权利要求15所述的方法,其中,当检测到的电子设备的方向是水平方向时,分屏布局包括左屏幕和右屏幕,以及16. The method of claim 15, wherein when the detected orientation of the electronic device is a horizontal orientation, the split screen layout includes a left screen and a right screen, and 其中,当检测到的电子设备的方向为垂直方向时,分屏布局包括上方屏幕和下方屏幕。Wherein, when the detected direction of the electronic device is a vertical direction, the split-screen layout includes an upper screen and a lower screen. 17.根据权利要求1所述的方法,其中,基于第一焦点区域的中心来确定第二焦点区域。17. The method of claim 1, wherein the second focus area is determined based on a center of the first focus area. 18.一种电子设备,包括:18. An electronic device comprising: 至少两个多媒体采集设备,包括第一多媒体采集设备和第二多媒体采集设备;at least two multimedia collection devices, including a first multimedia collection device and a second multimedia collection device; 显示器;monitor; 多媒体信息获取模块,被配置为:A multimedia information acquisition module, configured as: 通过第一多媒体采集设备获得与第一焦点区域对应的第一视频信息;Obtain first video information corresponding to the first focus area through the first multimedia collection device; 通过第二多媒体采集设备获得与第二焦点区域对应的第二视频信息;以及Obtain second video information corresponding to the second focus area by the second multimedia capture device; and 处理模块,被配置为控制在显示器上显示第一视频信息和第二视频信息。The processing module is configured to control the display of the first video information and the second video information on the display. 19.根据权利要求18所述的电子设备,其中,第二焦点区域位于第一焦点区域内,以及19. The electronic device of claim 18, wherein the second focus area is located within the first focus area, and 其中,第二焦点区域小于第一焦点区域。Wherein, the second focus area is smaller than the first focus area. 
20.根据权利要求19所述的电子设备,其中,所述第一焦点区域为全局区域;第二焦点区域为局部区域。20. The electronic device of claim 19, wherein the first focus area is a global area; the second focus area is a local area. 21.根据权利要求18所述的电子设备,其中,所述处理模块还被配置为控制在显示器上以分屏布局显示第一视频信息和第二视频信息。21. The electronic device of claim 18, wherein the processing module is further configured to control the display of the first video information and the second video information in a split screen layout on the display. 22.根据权利要求18所述的电子设备,其中,所述多媒体信息获取模块还被配置为获取与第一焦点区域内的第三焦点区域相对应的第三视频信息,以及22. The electronic device of claim 18, wherein the multimedia information acquisition module is further configured to acquire third video information corresponding to a third focus area within the first focus area, and 其中,所述处理模块还被配置为控制在显示器上显示第一视频信息和第二视频信息中的至少一个以及第三视频信息。Wherein, the processing module is further configured to control to display at least one of the first video information and the second video information and the third video information on the display. 23.根据权利要求22所述的电子设备,其中,所述处理模块还被配置为控制在显示器上以分屏布局显示第一视频信息和第二视频信息中的至少一个以及第三视频信息。23. The electronic device of claim 22, wherein the processing module is further configured to control to display at least one of the first video information and the second video information and the third video information in a split screen layout on the display. 24.根据权利要求18所述的电子设备,其中,所述处理模块还被配置为在获得第一视频信息和第二视频信息之前,接收进入多视频模式的第一指令。24. The electronic device of claim 18, wherein the processing module is further configured to receive a first instruction to enter a multi-video mode before obtaining the first video information and the second video information. 25.根据权利要求18所述的电子设备,其中,所述处理模块还被配置为接收第二指令以设置第二焦点区域。25. The electronic device of claim 18, wherein the processing module is further configured to receive a second instruction to set a second focus area. 26.根据权利要求18所述的电子设备,其中,所述处理模块还被配置为通过合成第一视频信息和第二视频信息来存储第四视频信息。26. 
The electronic device of claim 18, wherein the processing module is further configured to store fourth video information by synthesizing the first video information and the second video information. 27.根据权利要求26所述的电子设备,其中,所述处理模块还被配置为:27. The electronic device of claim 26, wherein the processing module is further configured to: 接收第四指令以播放第四视频信息;以及receiving a fourth instruction to play fourth video information; and 控制在显示器上播放第四视频信息以作为对接收到的第四指令的响应。Controlling to play the fourth video information on the display in response to the received fourth instruction. 28.根据权利要求27所述的电子设备,其中,所述处理模块还被配置为控制在显示器上显示用于接收播放第四视频信息的指令的用户界面。28. The electronic device of claim 27, wherein the processing module is further configured to control displaying on the display a user interface for receiving an instruction to play the fourth video information. 29.根据权利要求26所述的电子设备,其中,所述处理模块还被配置为接收第三指令以存储第四视频信息。29. The electronic device of claim 26, wherein the processing module is further configured to receive a third instruction to store fourth video information. 30.根据权利要求26所述的电子设备,其中,当电子设备的方向是水平方向时,所述处理模块还被配置为以包括左屏幕和右屏幕的分屏布局来存储第四视频信息。30. The electronic device of claim 26, wherein, when the orientation of the electronic device is a horizontal direction, the processing module is further configured to store the fourth video information in a split screen layout including a left screen and a right screen. 31.根据权利要求26所述的电子设备,其中,当电子设备的方向是垂直方向时,所述处理模块还被配置为以包括上方屏幕和下方屏幕的分屏布局来存储第四视频信息。31. The electronic device of claim 26, wherein when the orientation of the electronic device is a vertical direction, the processing module is further configured to store the fourth video information in a split screen layout including an upper screen and a lower screen. 32.根据权利要求21所述的电子设备,还包括被配置为检测电子设备的方向的传感器。32. The electronic device of claim 21, further comprising a sensor configured to detect the orientation of the electronic device. 33.根据权利要求32所述的电子设备,其中,当检测到的电子设备的方向是水平方向时,分屏布局包括左屏幕和右屏幕,以及33. 
The electronic device of claim 32, wherein when the detected orientation of the electronic device is a horizontal orientation, the split screen layout includes a left screen and a right screen, and 其中,当检测到的电子设备的方向为垂直方向时,分屏布局包括上方屏幕和下方屏幕。Wherein, when the detected direction of the electronic device is a vertical direction, the split-screen layout includes an upper screen and a lower screen. 34.根据权利要求18所述的电子设备,其中,所述多媒体信息获取模块还被配置为基于第一焦点区域的中心来确定第二焦点区域。34. The electronic device of claim 18, wherein the multimedia information acquisition module is further configured to determine the second focus area based on the center of the first focus area.
CN202010738347.1A 2016-03-25 2016-03-25 Method and device for multimedia information processing Pending CN111784615A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010738347.1A CN111784615A (en) 2016-03-25 2016-03-25 Method and device for multimedia information processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610179848.4A CN107230187B (en) 2016-03-25 2016-03-25 Method and device for processing multimedia information
CN202010738347.1A CN111784615A (en) 2016-03-25 2016-03-25 Method and device for multimedia information processing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610179848.4A Division CN107230187B (en) 2016-03-25 2016-03-25 Method and device for processing multimedia information

Publications (1)

Publication Number Publication Date
CN111784615A true CN111784615A (en) 2020-10-16

Family

ID=59898097

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610179848.4A Active CN107230187B (en) 2016-03-25 2016-03-25 Method and device for processing multimedia information
CN202010738347.1A Pending CN111784615A (en) 2016-03-25 2016-03-25 Method and device for multimedia information processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201610179848.4A Active CN107230187B (en) 2016-03-25 2016-03-25 Method and device for processing multimedia information

Country Status (4)

Country Link
US (2) US11081137B2 (en)
EP (2) EP3716635A1 (en)
CN (2) CN107230187B (en)
WO (1) WO2017164716A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449237A (en) * 2020-10-31 2022-05-06 华为技术有限公司 Method for resisting distortion and inverse dispersion and related equipment
CN115022526A (en) * 2021-09-29 2022-09-06 荣耀终端有限公司 Panoramic deep image generation method and device
CN115474002A (en) * 2021-04-30 2022-12-13 苹果公司 User interface for altering visual media
US11895391B2 (en) 2018-09-28 2024-02-06 Apple Inc. Capturing and displaying images with multiple focal planes
US11962889B2 (en) 2016-06-12 2024-04-16 Apple Inc. User interface for camera effects
CN118092750A (en) * 2024-04-29 2024-05-28 杭州度康科技有限公司 Double-screen vision-aiding display method, device and equipment suitable for low-vision crowd
US12081862B2 (en) 2020-06-01 2024-09-03 Apple Inc. User interfaces for managing media
US12101567B2 (en) 2021-04-30 2024-09-24 Apple Inc. User interfaces for altering visual media
US12112024B2 (en) 2021-06-01 2024-10-08 Apple Inc. User interfaces for managing media styles
US12155925B2 (en) 2020-09-25 2024-11-26 Apple Inc. User interfaces for media capture and management
US12170834B2 (en) 2018-05-07 2024-12-17 Apple Inc. Creative camera
US12192617B2 (en) 2019-05-06 2025-01-07 Apple Inc. User interfaces for capturing and managing visual media

Families Citing this family (81)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10872113B2 (en) * 2016-07-19 2020-12-22 Hewlett-Packard Development Company, L.P. Image recognition and retrieval
US10867445B1 (en) * 2016-11-16 2020-12-15 Amazon Technologies, Inc. Content segmentation and navigation
WO2018147329A1 (en) * 2017-02-10 2018-08-16 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Free-viewpoint image generation method and free-viewpoint image generation system
US10461898B2 (en) * 2017-06-08 2019-10-29 Bank Of America Corporation Parallel data transmission
US10943680B1 (en) * 2017-09-07 2021-03-09 Massachusetts Mutual Life Insurance Company Intelligent health-based blockchain
WO2019047183A1 (en) * 2017-09-08 2019-03-14 广东欧珀移动通信有限公司 Key display method, apparatus, and terminal
US10602075B2 (en) * 2017-09-12 2020-03-24 Adobe Inc. Automatically determining a set of exposure values for a high dynamic range image capture device
US10257436B1 (en) * 2017-10-11 2019-04-09 Adobe Systems Incorporated Method for using deep learning for facilitating real-time view switching and video editing on computing devices
US10497122B2 (en) 2017-10-11 2019-12-03 Adobe Inc. Image crop suggestion and evaluation using deep-learning
US10516830B2 (en) 2017-10-11 2019-12-24 Adobe Inc. Guided image composition on mobile devices
CN107566738A (en) * 2017-10-13 2018-01-09 维沃移动通信有限公司 A kind of panorama shooting method, mobile terminal and computer-readable recording medium
US10497104B2 (en) 2017-10-24 2019-12-03 Adobe Inc. Empirical exposure normalization
CN107909551A (en) * 2017-10-30 2018-04-13 珠海市魅族科技有限公司 Image processing method, device, computer installation and computer-readable recording medium
WO2019088462A1 (en) * 2017-11-03 2019-05-09 주식회사 딥메디 System and method for generating blood pressure estimation model, and blood pressure estimation system and method
CN107908324B (en) * 2017-11-14 2020-07-14 阿里巴巴(中国)有限公司 Interface display method and device
CN108012190A (en) * 2017-12-07 2018-05-08 北京搜狐新媒体信息技术有限公司 A kind of video merging method and device
KR102459221B1 (en) * 2018-02-20 2022-10-27 삼성전자주식회사 Electronic apparatus, method for processing image thereof and computer-readable recording medium
CN110248197B (en) * 2018-03-07 2021-10-22 杭州海康威视数字技术股份有限公司 Voice enhancement method and device
CN110278366B (en) * 2018-03-14 2020-12-01 虹软科技股份有限公司 A panoramic image blurring method, terminal and computer-readable storage medium
CN108391059A (en) 2018-03-23 2018-08-10 华为技术有限公司 A kind of method and apparatus of image processing
BR112020019324A8 (en) 2018-03-27 2022-08-30 Huawei Tech Co Ltd MOBILE PHOTOGRAPHY TERMINAL
CN108764143B (en) * 2018-05-29 2020-11-24 北京字节跳动网络技术有限公司 Image processing method, image processing device, computer equipment and storage medium
CN108764261A (en) * 2018-05-31 2018-11-06 努比亚技术有限公司 A kind of image processing method, mobile terminal and storage medium
CN109064434B (en) * 2018-06-28 2021-01-08 广州视源电子科技股份有限公司 Image enhancement method and device, storage medium and computer equipment
CN109033772B (en) * 2018-08-09 2020-04-21 北京云测信息技术有限公司 Verification information input method and device
CN109254814A (en) * 2018-08-20 2019-01-22 中国平安人寿保险股份有限公司 Information configuring methods of insuring, device, computer equipment and storage medium neural network based
CN109299315B (en) 2018-09-03 2023-03-28 腾讯科技(深圳)有限公司 Multimedia resource classification method and device, computer equipment and storage medium
DK201870623A1 (en) 2018-09-11 2020-04-15 Apple Inc. User interfaces for simulated depth effects
CN109151573B (en) * 2018-09-30 2021-06-15 Oppo广东移动通信有限公司 Video enhancement control method and device and electronic equipment
CN109547711A (en) * 2018-11-08 2019-03-29 北京微播视界科技有限公司 Image synthesizing method, device, computer equipment and readable storage medium storing program for executing
CN112889069B (en) * 2018-11-08 2024-04-05 Oppo广东移动通信有限公司 Methods, systems, and computer readable media for improving low light image quality
WO2020108009A1 (en) * 2018-11-26 2020-06-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method, system, and computer-readable medium for improving quality of low-light images
CN109525901B (en) * 2018-11-27 2020-08-25 Oppo广东移动通信有限公司 Video processing method and device, electronic equipment and computer readable medium
CN110163066B (en) * 2018-12-07 2022-11-08 腾讯科技(深圳)有限公司 Multimedia data recommendation method, device and storage medium
GB2581557A (en) * 2018-12-28 2020-08-26 Fujitsu Client Computing Ltd Information processing apparatus, information processing system, and computer program product
CN110995549B (en) * 2019-03-19 2021-09-03 上海越安建筑安装工程有限公司 Communication component switch control system
CN109993737A (en) * 2019-03-29 2019-07-09 联想(北京)有限公司 A kind of processing method, equipment and computer readable storage medium
CN110634094A (en) * 2019-04-03 2019-12-31 卜风雷 Regulatory agency based on big data analysis
CN110401628A (en) * 2019-04-03 2019-11-01 山峰 Network data security accesses system
KR102774718B1 (en) * 2019-04-05 2025-03-04 삼성전자주식회사 Electronic device and method for controlling camera using external electronic device
CN110400035A (en) * 2019-04-09 2019-11-01 刘建 Network communicating system, method and storage medium based on parameter acquisition
CN109982106B (en) * 2019-04-29 2021-11-26 百度在线网络技术(北京)有限公司 Video recommendation method, server, client and electronic equipment
CN110198412B (en) * 2019-05-31 2020-09-18 维沃移动通信有限公司 Video recording method and electronic equipment
US11120273B2 (en) * 2019-06-21 2021-09-14 Gfycat, Inc. Adaptive content classification of a video content item
CN110213629B (en) * 2019-06-27 2022-02-11 腾讯科技(深圳)有限公司 Information implantation method, device, server and storage medium
CN110415191A (en) * 2019-07-31 2019-11-05 西安第六镜网络科技有限公司 A kind of image deblurring algorithm based on successive video frames
CN110602424A (en) * 2019-08-28 2019-12-20 维沃移动通信有限公司 Video processing method and electronic equipment
CN112532892B (en) * 2019-09-19 2022-04-12 华为技术有限公司 Image processing method and electronic device
CN110647842B (en) * 2019-09-20 2022-02-15 重庆大学 Double-camera classroom inspection method and system
US11868437B1 (en) * 2019-09-30 2024-01-09 Sighthound, Inc. Training set enhancement for neural networks
US11288515B2 (en) * 2019-11-11 2022-03-29 Samsung Electronics Co., Ltd. Methods and systems for real-time data reduction
CN110996153B (en) 2019-12-06 2021-09-24 深圳创维-Rgb电子有限公司 Scene recognition-based sound and picture quality enhancement method and system and display
CN111161180B (en) * 2019-12-26 2023-09-26 华南理工大学 Deep learning ultrasonic image de-noising method based on migration and structure priori
CN111222557A (en) * 2019-12-31 2020-06-02 Oppo广东移动通信有限公司 Image classification method, device, storage medium and electronic device
CN111191606A (en) * 2019-12-31 2020-05-22 Oppo广东移动通信有限公司 Image processing method and related product
CN112200233A (en) * 2020-01-04 2021-01-08 孟小峰 Intelligent building information monitoring method and system
US11948276B2 (en) 2020-01-16 2024-04-02 Samsung Electronics Co., Ltd. Apparatus and method for enhancing videos
CN111265139A (en) * 2020-03-19 2020-06-12 广东蓝水花智能电子有限公司 A kind of intelligent toilet seat control method
CN111770377B (en) * 2020-04-03 2022-04-01 北京数智鑫正科技有限公司 Compression method for video playing system
CN111709891B (en) * 2020-06-12 2023-11-24 北京小米松果电子有限公司 Training method of image denoising model, image denoising method, device and medium
CN111563561A (en) * 2020-07-13 2020-08-21 支付宝(杭州)信息技术有限公司 Fingerprint image processing method and device
CN112749613B (en) * 2020-08-27 2024-03-26 腾讯科技(深圳)有限公司 Video data processing method, device, computer equipment and storage medium
JP7599891B2 (en) * 2020-10-05 2024-12-16 キヤノン株式会社 Information processing device, information processing method, and program
CN112637641B (en) * 2020-12-01 2023-05-30 深圳市酷开网络科技股份有限公司 Multi-split screen display layout and display content recommendation method, system and storage medium
US20220198062A1 (en) * 2020-12-21 2022-06-23 Gbl Systems Corporation Adversarial image preparation, processing and/or distribution
CN112541878B (en) * 2020-12-24 2024-08-16 上海兆言网络科技有限公司 Method and device for establishing image enhancement model and image enhancement
CN113422995B (en) * 2021-02-04 2023-06-23 郑州大学 AI model-based video processing method, portable electronic device
US11816241B1 (en) * 2021-02-10 2023-11-14 Gen Digital Inc. Systems and methods for protecting user privacy
CN117716700A (en) * 2021-08-23 2024-03-15 三星电子株式会社 Scene automatic focusing method and electronic device
US11657778B2 (en) 2021-09-10 2023-05-23 Dell Products L.P. On demand display source presentation at a partial display area
CN114038488B (en) * 2021-11-02 2023-07-18 维沃移动通信有限公司 Method and device for acquiring and playing multimedia information and electronic equipment
CN116419022A (en) * 2021-12-31 2023-07-11 华为技术有限公司 A method and device for processing multimedia information
US11736781B1 (en) * 2022-04-18 2023-08-22 Cox Communications, Inc. Dynamic splice point adjustment and feedback for video signals
CN114697764B (en) * 2022-06-01 2022-09-02 深圳比特微电子科技有限公司 Method and device for generating video abstract and readable storage medium
CN115115693A (en) * 2022-06-30 2022-09-27 中原工学院 Human body parameter measuring method based on integration of attention mechanism and angular point stereo matching
JP7164750B1 (en) * 2022-07-07 2022-11-01 株式会社あかつき Information processing system, information processing device, program and information processing method
JP7387104B1 (en) * 2022-10-31 2023-11-28 LeapMind株式会社 Imaging device and method of controlling the imaging device
CN115996322B (en) * 2023-03-21 2023-05-30 深圳市安科讯实业有限公司 Image data management method for digital video shooting
US20240333876A1 (en) * 2023-03-31 2024-10-03 Cisco Technology, Inc. Dynamic Energy-Saving Video Transmissions for Sustainable Collaboration
CN116883346B (en) * 2023-07-05 2024-03-15 新疆青卫舜源信息科技有限公司 Target image quality screening method based on artificial intelligence
CN117714787B (en) * 2024-02-05 2024-05-07 哈尔滨学院 Video data processing method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009142332A1 (en) * 2008-05-23 2009-11-26 Advas Co., Ltd. Hybrid video camera system
CN102129345A (en) * 2010-01-19 2011-07-20 Lg电子株式会社 Mobile terminal and control method thereof
CN102215373A (en) * 2010-04-07 2011-10-12 苹果公司 In conference display adjustments
KR20140060682A (en) * 2012-11-12 2014-05-21 엘지전자 주식회사 Multimedia device connected to at least one network interface and method for processing data in multimedia device
US8736680B1 (en) * 2010-05-18 2014-05-27 Enforcement Video, Llc Method and system for split-screen video display
CN103841342A (en) * 2014-03-20 2014-06-04 乐视网信息技术(北京)股份有限公司 Display control method and system
CN104506922A (en) * 2014-12-31 2015-04-08 乐视网信息技术(北京)股份有限公司 Method, device and equipment for broadcasting video signal
US20150179219A1 (en) * 2013-12-20 2015-06-25 Qualcomm Incorporated Selection and tracking of objects for display partitioning and clustering of video frames
CN105005578A (en) * 2015-05-21 2015-10-28 中国电子科技集团公司第十研究所 Multimedia target information visual analysis system
CN105391935A (en) * 2014-09-02 2016-03-09 宏达国际电子股份有限公司 Control method of image acquisition device and electronic device

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU5802694A (en) * 1992-12-15 1994-07-04 Viacom International Method for reducing noise in digital video information
US6888532B2 (en) 2001-11-30 2005-05-03 Palmone, Inc. Automatic orientation-based user interface for an ambiguous handheld device
US7085323B2 (en) * 2002-04-03 2006-08-01 Stmicroelectronics, Inc. Enhanced resolution video construction method and apparatus
AU2003272202A1 (en) * 2002-06-21 2004-01-06 The Trustees Of Columbia University In The City Of New York Systems and methods for de-blurring motion blurred images
US7663662B2 (en) * 2005-02-09 2010-02-16 Flir Systems, Inc. High and low resolution camera systems and methods
US7830409B2 (en) 2005-03-25 2010-11-09 Cherng-Daw Hwang Split screen video in a multimedia communication system
US7760956B2 (en) * 2005-05-12 2010-07-20 Hewlett-Packard Development Company, L.P. System and method for producing a page using frames of a video stream
US20070127909A1 (en) * 2005-08-25 2007-06-07 Craig Mowry System and apparatus for increasing quality and efficiency of film capture and methods of use thereof
CN103607582A (en) * 2005-07-27 2014-02-26 赛达克雷斯特合伙公司 System, apparatus, and method for capturing and screening visual images for multi-dimensional display
WO2007102377A1 (en) * 2006-03-03 2007-09-13 Olympus Corporation Imaging device, processing method for obtaining high-resolution, processing program for obtaining high-resolution, and recording medium
CN101076111B (en) * 2006-11-15 2010-09-01 腾讯科技(深圳)有限公司 Method for acquiring keyframe section positioning information in video fluid
US8331451B2 (en) * 2007-07-18 2012-12-11 Samsung Electronics Co., Ltd. Method and apparatus for enhancing resolution of video image
US8289377B1 (en) 2007-08-31 2012-10-16 DigitalOptics Corporation MEMS Video mode hidden autofocus
CN101127835B (en) * 2007-09-27 2010-12-08 中兴通讯股份有限公司 A pre-processing method and device for improving video image quality of digital video camera
WO2009130561A1 (en) * 2008-04-21 2009-10-29 Nokia Corporation Method and device for video coding and decoding
JP2009290860A (en) 2008-04-28 2009-12-10 Panasonic Corp Image device
JP5594485B2 (en) * 2009-11-02 2014-09-24 日本電気株式会社 Portable communication device
CN101917550B (en) * 2010-07-01 2012-11-14 清华大学 High-spatial and temporal resolution video deblurring method and system
CN101917601B (en) * 2010-08-26 2012-07-25 四川大学 Digital video intelligent monitoring equipment based on dual camera and data processing method
US20120120277A1 (en) 2010-11-16 2012-05-17 Apple Inc. Multi-point Touch Focus
CN102129708A (en) * 2010-12-10 2011-07-20 北京邮电大学 Fast multi-level virtual and real occlusion processing method in augmented reality environment
US9142010B2 (en) 2012-01-04 2015-09-22 Audience, Inc. Image enhancement based on combining images from multiple cameras
CN202600784U (en) * 2012-04-04 2012-12-12 常玉华 Image acquisition enhancement device of binocular vision imitation structure
CN103425380A (en) * 2012-05-14 2013-12-04 联想(北京)有限公司 Information processing method and equipment
CN103514196B (en) * 2012-06-25 2020-07-24 联想(北京)有限公司 Information processing method and electronic equipment
CN103237192B (en) * 2012-08-20 2016-03-30 苏州大学 Intelligent video monitoring system based on multi-camera data fusion
KR101487516B1 (en) 2012-09-28 2015-01-30 주식회사 팬택 Apparatus and method for multi-focus image capture using continuous auto focus
CN103795910B (en) 2012-10-30 2017-11-03 联想(北京)有限公司 A kind of method and device for gathering image
JP2014160982A (en) * 2013-02-20 2014-09-04 Sony Corp Image processor, photography control method, and program
CN103475935A (en) * 2013-09-06 2013-12-25 北京锐安科技有限公司 Method and device for retrieving video segments
US9615012B2 (en) * 2013-09-30 2017-04-04 Google Inc. Using a second camera to adjust settings of first camera
CN103606136B (en) * 2013-12-04 2016-05-25 西安电子科技大学 Based on the video super resolution of key frame and non local constraint
CN103986874A (en) * 2014-05-29 2014-08-13 宇龙计算机通信科技(深圳)有限公司 An image acquisition device, image acquisition method and terminal
US9571725B2 (en) 2014-09-02 2017-02-14 Htc Corporation Electronic device and image capture method thereof
CN204206364U (en) * 2014-11-21 2015-03-11 冠捷显示科技(厦门)有限公司 A kind of Yunnan snub-nosed monkey device based on binocular camera
CN104581380B (en) * 2014-12-30 2018-08-31 联想(北京)有限公司 A kind of method and mobile terminal of information processing


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11962889B2 (en) 2016-06-12 2024-04-16 Apple Inc. User interface for camera effects
US12132981B2 (en) 2016-06-12 2024-10-29 Apple Inc. User interface for camera effects
US12170834B2 (en) 2018-05-07 2024-12-17 Apple Inc. Creative camera
US11895391B2 (en) 2018-09-28 2024-02-06 Apple Inc. Capturing and displaying images with multiple focal planes
US12192617B2 (en) 2019-05-06 2025-01-07 Apple Inc. User interfaces for capturing and managing visual media
US12081862B2 (en) 2020-06-01 2024-09-03 Apple Inc. User interfaces for managing media
US12155925B2 (en) 2020-09-25 2024-11-26 Apple Inc. User interfaces for media capture and management
CN114449237B (en) * 2020-10-31 2023-09-29 华为技术有限公司 Method for anti-distortion and anti-dispersion and related equipment
CN114449237A (en) * 2020-10-31 2022-05-06 华为技术有限公司 Method for resisting distortion and inverse dispersion and related equipment
CN115474002A (en) * 2021-04-30 2022-12-13 苹果公司 User interface for altering visual media
US12101567B2 (en) 2021-04-30 2024-09-24 Apple Inc. User interfaces for altering visual media
US12112024B2 (en) 2021-06-01 2024-10-08 Apple Inc. User interfaces for managing media styles
CN115022526A (en) * 2021-09-29 2022-09-06 荣耀终端有限公司 Panoramic deep image generation method and device
CN118092750A (en) * 2024-04-29 2024-05-28 杭州度康科技有限公司 Double-screen vision-aiding display method, device and equipment suitable for low-vision crowd

Also Published As

Publication number Publication date
US20200227089A1 (en) 2020-07-16
EP3403413A4 (en) 2018-12-12
US11081137B2 (en) 2021-08-03
US20170278546A1 (en) 2017-09-28
EP3403413B1 (en) 2023-04-26
WO2017164716A1 (en) 2017-09-28
CN107230187A (en) 2017-10-03
EP3403413A1 (en) 2018-11-21
EP3716635A1 (en) 2020-09-30
CN107230187B (en) 2022-05-24

Similar Documents

Publication Publication Date Title
CN107230187B (en) Method and device for processing multimedia information
US11995530B2 (en) Systems and methods for providing feedback for artificial intelligence-based image capture devices
CN113747050B (en) Shooting method and equipment
CN112118380B (en) Camera control method, device, equipment and storage medium
CN112425156B (en) Method for selecting images based on continuous shooting and electronic equipment
JP2012205037A (en) Image processor and image processing method
CN116582741B (en) Shooting method and equipment
CN103945121A (en) Information processing method and electronic equipment
CN106375674A (en) Method and apparatus for finding and using video portions associated with adjacent still images
JP2017513075A (en) Method and apparatus for generating an image filter
KR20120068078A (en) Image photographing apparatus and method for connecting audio data to image data thereof
CN112866576A (en) Image preview method, storage medium and display device
WO2023134583A1 (en) Video recording method and apparatus, and electronic device
CN115278043B (en) Target tracking method and related device
CN112887610A (en) Shooting method, shooting device, electronic equipment and storage medium
CN114979455A (en) Shooting method, device and storage medium
CN113726949B (en) Video processing method, electronic device and storage medium
CN106878606A (en) An image generation method based on an electronic device and the electronic device
WO2022062554A1 (en) Multi-lens video recording method and related device
CN111726531B (en) Image shooting method, processing method, device, electronic equipment and storage medium
CN115334241A (en) Focusing control method and device, storage medium and camera equipment
US20250046049A1 (en) Shooting method, apparatus, electronic device and medium
CN117278839A (en) Photography method, electronic device and storage medium
CN114079724A (en) Method and device for taking-off snapshot and storage medium
WO2023077755A1 (en) Pedestrian information determination method and apparatus, and vehicle, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination