US10719940B2 - Target Tracking Method And Device Oriented To Airborne-Based Monitoring Scenarios - Google Patents



Target detection and tracking are two of the core tasks in the field of visual surveillance. ReLU-activated fully-connected layers derive an output of 4-dimensional bounding box data by regression, wherein the 4-dimensional bounding box data consists of: the horizontal coordinate of the upper left corner of the first rectangular bounding box, the vertical coordinate of the upper left corner of the first rectangular bounding box, the length of the first rectangular bounding box, and the width of the first rectangular bounding box. FIG. 3 is a structural diagram illustrating a target tracking device oriented to airborne-based monitoring scenarios according to an exemplary embodiment of the present disclosure. FIG. 4 is a structural diagram illustrating another target tracking device oriented to airborne-based monitoring scenarios according to an exemplary embodiment of the present disclosure. FIG. 1 is a flowchart diagram illustrating a target tracking method oriented to airborne-based monitoring scenarios according to an exemplary embodiment of the present disclosure. Step 101: obtaining a video to be tracked of the target object in real time, and performing frame decoding on the video to extract a first frame and a second frame.
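The 4-dimensional encoding above (upper-left corner plus length and width) can be sketched in a few lines. This is an illustrative helper, not code from the patent; the function names are assumptions.

```python
def bbox_to_corners(box):
    """Convert a (x, y, length, width) box, as in the 4-D regression
    output described above, to (x1, y1, x2, y2) corner form."""
    x, y, length, width = box
    return (x, y, x + length, y + width)

def bbox_center(box):
    """Center point of a (x, y, length, width) box."""
    x, y, length, width = box
    return (x + length / 2.0, y + width / 2.0)
```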



Step 102: cropping the first frame to derive a first interest-region image, and cropping the second frame to derive a target template image and a second interest-region image. The length and width of the third rectangular bounding box are N times those of the second rectangular bounding box, respectively. N may be 2; that is, the length and width of the third rectangular bounding box are 2 times those of the first rectangular bounding box, respectively. Scaling both dimensions to 2 times the original yields a bounding box with an area 4 times that of the original. According to the smoothness assumption of motion, the position of the target object in the first frame must be found within this expanded interest region. Step 103: inputting the target template image and the first interest-region image into a preset appearance tracker network to derive an appearance tracking position.
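The interest-region expansion in Step 102 can be illustrated as scaling a box about its center by a factor N. A minimal sketch, assuming the (x, y, length, width) box format described earlier; with N=2 the area grows 4 times.

```python
def expand_bbox(box, n=2):
    """Scale a (x, y, length, width) box by factor n about its center.
    With n=2 both sides double, so the area grows 4x, matching the
    interest-region expansion described in the text."""
    x, y, length, width = box
    cx, cy = x + length / 2.0, y + width / 2.0
    new_l, new_w = n * length, n * width
    return (cx - new_l / 2.0, cy - new_w / 2.0, new_l, new_w)
```

In practice the expanded box would also be clipped to the frame boundaries before cropping; that step is omitted here for brevity.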



Each layer is ReLU-activated, and the number of channels of the output feature maps is 6, 12, 24, 36, 48, and 64 in sequence, with kernel size 3 for the remaining layers. To preserve the integrity of the spatial position information in the feature maps, the convolutional network does not include any down-sampling pooling layer. As the convolution deepens, feature maps derived from different convolutional layers in the two parallel streams of the twin networks are respectively cascaded and integrated using the hierarchical feature pyramid of the convolutional neural network. This kernel is used to perform a dense, sliding-window cross-correlation calculation on the feature map derived by cascading and integrating the stream corresponding to the first interest-region image, yielding an appearance similarity response map. It can be seen that, in the appearance tracker network, tracking essentially derives the position of the target through a multi-scale dense sliding-window search within the interest region.
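The dense sliding-window cross-correlation can be illustrated with a single-channel NumPy sketch: the template feature map slides over the search-region feature map, and each position's inner product forms the appearance similarity response map. This is a simplified illustration, not the patent's multi-channel implementation.

```python
import numpy as np

def cross_correlation_response(search_feat, template_feat):
    """Dense sliding-window cross-correlation of a template feature map
    over a search-region feature map (single channel for simplicity).
    The peak of the returned response map marks the best appearance
    match within the interest region."""
    sh, sw = search_feat.shape
    th, tw = template_feat.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = search_feat[i:i + th, j:j + tw]
            out[i, j] = np.sum(window * template_feat)
    return out
```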



The search is calculated based on target appearance similarity; that is, the appearance similarity between the target template and the image at the searched position is computed at each sliding-window position. The position where the similarity response is large is highly likely to be where the target is located. Step 104: inputting the first interest-region image and the second interest-region image into a preset motion tracker network to derive a motion tracking position. The network comprises a spotlight filter frame-difference module and a foreground-enhancing and background-suppressing module in sequence, wherein each module is constructed based on a convolutional neural network structure. The modules use ReLU-activated convolutional layers. The number of output feature map channels is 3 in each case, wherein the feature map is the contrast map computed from the input image. The spotlight filter frame-difference module obtains a frame-difference motion response map corresponding to the interest regions of the two frames, comprising the previous frame and the subsequent frame.
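The core of the frame-difference module can be illustrated as a per-pixel absolute difference between the two interest-region images: static background cancels out while moving pixels produce a strong response. A minimal stand-in for the learned module, not the patent's actual network.

```python
import numpy as np

def frame_difference_response(prev_roi, next_roi):
    """Per-pixel absolute frame difference between the previous-frame and
    subsequent-frame interest-region images. Moving pixels yield large
    responses; static background is suppressed toward zero."""
    return np.abs(next_roi.astype(float) - prev_roi.astype(float))
```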



This multi-scale convolution design, derived by cascading and secondarily integrating three convolutional layers with different kernel sizes, aims to filter the motion noise caused by lens motion. Step 105: inputting the appearance tracking position and the motion tracking position into a deep integration network to derive an integrated final tracking position. A 1×1 convolution kernel restores the output to a single channel, thereby learnably integrating the tracking results to derive the final tracking position response map. ReLU-activated fully-connected layers then derive 4-dimensional bounding box data by regression for output. This embodiment combines two parallel tracker network streams in the process of tracking the target object, wherein the target object's appearance and motion information are both used to position and track the target object, and the final tracking position is derived by integrating the two positioning results. FIG. 2 is a flowchart diagram illustrating a target tracking method oriented to airborne-based monitoring scenarios according to another exemplary embodiment of the present disclosure.
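The 1×1-convolution integration step amounts to a learned per-channel weighted sum of the stacked appearance and motion response maps. A sketch under that assumption; the weights here are illustrative placeholders for trained parameters, not values from the patent.

```python
import numpy as np

def integrate_responses(appearance_map, motion_map, w_app=0.6, w_mot=0.4):
    """Stack the two response maps as channels and collapse them with a
    1x1 'convolution', i.e. a per-channel weighted sum, then take the
    peak of the fused map as the integrated tracking position."""
    stacked = np.stack([appearance_map, motion_map], axis=0)  # (2, H, W)
    weights = np.array([w_app, w_mot]).reshape(2, 1, 1)
    fused = (stacked * weights).sum(axis=0)                   # (H, W)
    peak = np.unravel_index(fused.argmax(), fused.shape)
    return fused, peak
```

In the actual device the weights are learned end-to-end and the fused map feeds the fully-connected regression head that outputs the 4-dimensional bounding box.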