R-CNN: Regions with Convolutional Neural Network Features
An implementation of the paper "Rich feature hierarchies for accurate object detection and semantic segmentation" (R-CNN) based on TensorFlow 1.12.
windows10 + python3.6 + tensorflow1.12 + scikit-learn + cv2 + tflearn
i7-6700+gtx1060
Because of hardware limits, the ImageNet dataset used in the original paper is replaced by the much smaller 17flowers dataset, available from https://www.robots.ox.ac.uk/~vgg/data/flowers/17/
1. config.py --- parameters used by the network definition, training and data processing
2. Networks.py --- defines the Alexnet_Net model, the fine-tune model, the SVM model and the bounding-box regression model
4. process_data.py --- processes the training and fine-tuning datasets (selective search, data saving/loading, etc.)
5. train_and_test.py --- training and testing of the individual models; contains the main function
6. selectivesearch.py --- selective search source code
A brief introduction to IoU. Object detection has to localize an object's bounding box: as in the picture below, we not only have to localize the vehicle's bounding box, we also have to recognize that what is inside the box is a vehicle. For the localization accuracy of a bounding box there is an important concept: since the algorithm can never match the human-annotated boxes exactly, we need an evaluation metric for localization accuracy, the IoU.
IoU measures the degree of overlap between two bounding boxes, as illustrated below.
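As a concrete reference, IoU is the area of the intersection divided by the area of the union. A minimal sketch (not part of the repository), assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```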
The R-CNN algorithm, introduced shortly, extracts a large number of rectangles that might contain objects from a single image and assigns each of them a class probability.
As in the picture above, when localizing a vehicle the algorithm ends up with a pile of boxes, and we have to decide which of them are redundant. Non-maximum suppression (NMS) works as follows. Suppose there are six boxes, sorted by the classifier's probability of containing a vehicle, from low to high: A, B, C, D, E, F.
- (1) Starting with F, the box with the highest probability, check whether the IoU of each of A~E with F exceeds a chosen threshold.
- (2) Suppose the overlaps of B and D with F exceed the threshold: discard B and D, and mark F as the first box we keep.
- (3) Among the remaining boxes A, C and E, pick E, the one with the highest probability; check its overlap with A and C, discard any box whose overlap exceeds the threshold, and mark E as the second kept box.
Repeat this process until all kept boxes have been found.
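The following is a minimal greedy NMS sketch (illustrative only, not the repository's code), assuming `boxes` is an (N, 4) numpy array of (x1, y1, x2, y2) corners and `scores` holds the per-box probabilities:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    order = np.argsort(scores)[::-1]              # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the current best box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # drop every box that overlaps the kept one too much
        order = order[1:][iou <= iou_threshold]
    return keep
```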
Combining region proposals with a CNN for object localization: borrowing the sliding-window idea, R-CNN classifies image regions. The paper's approach is: given an input image, first locate a number of candidate object boxes, then use a CNN to extract a feature vector for each candidate (the output of the network with the top layer removed, i.e. of the last fully connected layer), and finally classify the object in each candidate box with an SVM. The whole process breaks down into four steps:
a. find candidate boxes (region proposals);
b. extract a feature vector for each proposal with the CNN;
c. classify the feature vectors with SVMs;
d. correct the proposals with bounding-box regression and remove redundant ones with NMS.
R-CNN takes a model already trained on ImageNet and fine-tunes it on the PASCAL VOC dataset. Because ImageNet contains millions of images, the convolutional network can thoroughly learn low-level features there, and training afterwards on the small dataset then yields good results. Today we call this transfer learning, and it is an indispensable technique.
Many methods can generate candidate regions, for example:
- objectness
- selective search
- category-independent object proposals
- constrained parametric min-cuts(CPMC)
- multi-scale combinatorial grouping
- Ciresan
R-CNN uses the Selective Search algorithm. A simple segmentation algorithm first divides the image into many small regions; neighbouring regions are then merged repeatedly according to similarity (based on RGB, HSV and other colour spaces) and region size (small regions are merged first, which prevents a large region from continuously swallowing small ones and losing the hierarchy), similar in spirit to clustering. This yields a set of region proposals, i.e. candidate boxes (the original paper generates about 2000 per image).
The selective search module:
import skimage.io
import skimage.feature
import skimage.color
import skimage.transform
import skimage.util
import skimage.segmentation
import numpy
# "Selective Search for Object Recognition" by J.R.R. Uijlings et al.
#
# - Modified version with LBP extractor for texture vectorization
def _generate_segments(im_orig, scale, sigma, min_size):
"""
segment smallest regions by the algorithm of Felzenswalb and
Huttenlocher
"""
# open the Image
im_mask = skimage.segmentation.felzenszwalb(
skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma,
min_size=min_size)
# merge mask channel to the image as a 4th channel
im_orig = numpy.append(
im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2)
im_orig[:, :, 3] = im_mask
return im_orig
def _sim_colour(r1, r2):
"""
calculate the sum of histogram intersection of colour
"""
return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])])
def _sim_texture(r1, r2):
"""
calculate the sum of histogram intersection of texture
"""
return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])])
def _sim_size(r1, r2, imsize):
"""
calculate the size similarity over the image
"""
return 1.0 - (r1["size"] + r2["size"]) / imsize
def _sim_fill(r1, r2, imsize):
"""
calculate the fill similarity over the image
"""
bbsize = (
(max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"]))
* (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
)
return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize
def _calc_sim(r1, r2, imsize):
return (_sim_colour(r1, r2) + _sim_texture(r1, r2)
+ _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize))
def _calc_colour_hist(img):
"""
calculate colour histogram for each region
the size of output histogram will be BINS * COLOUR_CHANNELS(3)
number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf]
extract HSV
"""
BINS = 25
hist = numpy.array([])
for colour_channel in (0, 1, 2):
# extracting one colour channel
c = img[:, colour_channel]
# calculate histogram for each colour and join to the result
hist = numpy.concatenate(
[hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]])
# L1 normalize
hist = hist / len(img)
return hist
def _calc_texture_gradient(img):
"""
calculate texture gradient for entire image
The original SelectiveSearch algorithm proposed Gaussian derivative
for 8 orientations, but we use LBP instead.
output will be [height(*)][width(*)]
"""
ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2]))
for colour_channel in (0, 1, 2):
ret[:, :, colour_channel] = skimage.feature.local_binary_pattern(
img[:, :, colour_channel], 8, 1.0)
return ret
def _calc_texture_hist(img):
"""
calculate texture histogram for each region
calculate the histogram of gradient for each colours
the size of output histogram will be
BINS * ORIENTATIONS * COLOUR_CHANNELS(3)
"""
BINS = 10
hist = numpy.array([])
for colour_channel in (0, 1, 2):
# mask by the colour channel
fd = img[:, colour_channel]
# calculate histogram for each orientation and concatenate them all
# and join to the result
hist = numpy.concatenate(
[hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]])
# L1 Normalize
hist = hist / len(img)
return hist
def _extract_regions(img):
R = {}
# get hsv image
hsv = skimage.color.rgb2hsv(img[:, :, :3])
# pass 1: count pixel positions
for y, i in enumerate(img):
for x, (r, g, b, l) in enumerate(i):
# initialize a new region
if l not in R:
R[l] = {
"min_x": 0xffff, "min_y": 0xffff,
"max_x": 0, "max_y": 0, "labels": [l]}
# bounding box
if R[l]["min_x"] > x:
R[l]["min_x"] = x
if R[l]["min_y"] > y:
R[l]["min_y"] = y
if R[l]["max_x"] < x:
R[l]["max_x"] = x
if R[l]["max_y"] < y:
R[l]["max_y"] = y
# pass 2: calculate texture gradient
tex_grad = _calc_texture_gradient(img)
# pass 3: calculate colour histogram of each region
for k, v in R.items():
# colour histogram
masked_pixels = hsv[:, :, :][img[:, :, 3] == k]
R[k]["size"] = len(masked_pixels / 4)
R[k]["hist_c"] = _calc_colour_hist(masked_pixels)
# texture histogram
R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k])
return R
def _extract_neighbours(regions):
def intersect(a, b):
if (a["min_x"] < b["min_x"] < a["max_x"]
and a["min_y"] < b["min_y"] < a["max_y"]) or (
a["min_x"] < b["max_x"] < a["max_x"]
and a["min_y"] < b["max_y"] < a["max_y"]) or (
a["min_x"] < b["min_x"] < a["max_x"]
and a["min_y"] < b["max_y"] < a["max_y"]) or (
a["min_x"] < b["max_x"] < a["max_x"]
and a["min_y"] < b["min_y"] < a["max_y"]):
return True
return False
R = regions.items()
r = [elm for elm in R]
R = r
neighbours = []
for cur, a in enumerate(R[:-1]):
for b in R[cur + 1:]:
if intersect(a[1], b[1]):
neighbours.append((a, b))
return neighbours
def _merge_regions(r1, r2):
new_size = r1["size"] + r2["size"]
rt = {
"min_x": min(r1["min_x"], r2["min_x"]),
"min_y": min(r1["min_y"], r2["min_y"]),
"max_x": max(r1["max_x"], r2["max_x"]),
"max_y": max(r1["max_y"], r2["max_y"]),
"size": new_size,
"hist_c": (
r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
"hist_t": (
r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
"labels": r1["labels"] + r2["labels"]
}
return rt
def selective_search(
im_orig, scale=1.0, sigma=0.8, min_size=50):
'''Selective Search
Parameters
----------
im_orig : ndarray
Input image
scale : int
Free parameter. Higher means larger clusters in felzenszwalb segmentation.
sigma : float
Width of Gaussian kernel for felzenszwalb segmentation.
min_size : int
Minimum component size for felzenszwalb segmentation.
Returns
-------
img : ndarray
image with region label
region label is stored in the 4th value of each pixel [r,g,b,(region)]
regions : array of dict
[
{
'rect': (x, y, width, height),
'labels': [...]
},
...
]
'''
assert im_orig.shape[2] == 3, "3ch image is expected"
# load image and get smallest regions
# region label is stored in the 4th value of each pixel [r,g,b,(region)]
img = _generate_segments(im_orig, scale, sigma, min_size)
if img is None:
return None, {}
imsize = img.shape[0] * img.shape[1]
R = _extract_regions(img)
# extract neighbouring information
neighbours = _extract_neighbours(R)
# calculate initial similarities
S = {}
for (ai, ar), (bi, br) in neighbours:
S[(ai, bi)] = _calc_sim(ar, br, imsize)
# hierarchal search
while S != {}:
# get highest similarity
# i, j = sorted(S.items(), cmp=lambda a, b: cmp(a[1], b[1]))[-1][0]
i, j = sorted(list(S.items()), key = lambda a: a[1])[-1][0]
# merge corresponding regions
t = max(R.keys()) + 1.0
R[t] = _merge_regions(R[i], R[j])
# mark similarities for regions to be removed
key_to_delete = []
for k, v in S.items():
if (i in k) or (j in k):
key_to_delete.append(k)
# remove old similarities of related regions
for k in key_to_delete:
del S[k]
# calculate similarity set with the new region
for k in filter(lambda a: a != (i, j), key_to_delete):
n = k[1] if k[0] in (i, j) else k[0]
S[(t, n)] = _calc_sim(R[t], R[n], imsize)
regions = []
for k, r in R.items():
regions.append({
'rect': (
r['min_x'], r['min_y'],
r['max_x'] - r['min_x'], r['max_y'] - r['min_y']),
'size': r['size'],
'labels': r['labels']
})
return img, regions
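A short usage sketch of the module above (the file path and the scale/sigma/min_size values are illustrative; the values actually used are presumably defined in config.py):

```python
import skimage.io
import selectivesearch

img = skimage.io.imread('some_image.jpg')          # hypothetical image path
_, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)

# de-duplicate and drop very small proposals
candidates = set()
for r in regions:
    x, y, w, h = r['rect']                         # rect is (x, y, width, height)
    if r['rect'] in candidates or r['size'] < 200 or w == 0 or h == 0:
        continue
    candidates.add(r['rect'])
print(len(candidates), 'candidate boxes')
```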
R-CNN extracts a 4096-dimensional feature vector per proposal using an AlexNet model. The original paper's code is built on Caffe; this re-implementation uses TensorFlow. Note that AlexNet expects 227x227 input images, while the proposals produced by Selective Search come in all sizes and shapes. To make them compatible, R-CNN uses the brute-force approach of warping every proposal to 227x227 regardless of its size or aspect ratio.
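For instance, each proposal can be cropped out and warped like this (a sketch using cv2 from the dependency list; the function and variable names are illustrative, not the repository's process_data.py):

```python
import cv2

def warp_proposal(image, rect, size=227):
    """Crop a proposal rect = (x, y, w, h) from the image and warp it to size x size,
    ignoring the original aspect ratio, as R-CNN does."""
    x, y, w, h = rect
    crop = image[y:y + h, x:x + w]
    return cv2.resize(crop, (size, size), interpolation=cv2.INTER_LINEAR)
```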
For the network architecture there are two options: the classic AlexNet, or VGG16. In testing, AlexNet reaches 58.5% accuracy while VGG16 reaches 66%. VGG uses small convolution kernels and small strides; it is more accurate, but its computation cost is about seven times that of AlexNet. For simplicity we use AlexNet and explain it here. Its feature extractor consists of 5 convolutional layers and 2 fully connected layers; in this implementation the flattened p5 layer has 9216 neurons and the fc_8 and fc_10 layers have 4096 neurons each. Once the network is trained, every candidate-box image fed through it yields a 4096-dimensional feature vector. The code is as follows:
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.python.ops import nn_ops
from tflearn.layers.normalization import local_response_normalization
import config as cfg   # import aliases assumed for this excerpt; the repo's Networks.py defines the actual ones

class Alexnet_Net:
'''
Defines the AlexNet network and its parameters; the whole object is later passed to the Solver.
'''
def __init__(self, is_training=True, is_fineturn=False, is_SVM=False):
self.image_size = cfg.Image_size
self.batch_size = cfg.F_batch_size if is_fineturn else cfg.T_batch_size
self.class_num = cfg.F_class_num if is_fineturn else cfg.T_class_num
self.input_data = tf.placeholder(tf.float32,[None, self.image_size, self.image_size,3], name='input')
self.logits = self.build_network(self.input_data, self.class_num, is_svm=is_SVM, is_training=is_training)
if is_training == True :
self.label = tf.placeholder(tf.float32, [None, self.class_num], name='label')
self.loss_layer(self.logits, self.label)
self.accuracy = self.get_accuracy(self.logits, self.label)
self.total_loss = tf.losses.get_total_loss()
tf.summary.scalar('total_loss', self.total_loss)
def build_network(self, input, output, is_svm= False, scope='R-CNN',is_training=True, keep_prob=0.5):
with tf.variable_scope(scope):
with slim.arg_scope([slim.fully_connected, slim.conv2d],
activation_fn=nn_ops.relu,
weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
weights_regularizer=slim.l2_regularizer(0.0005)):
net = slim.conv2d(input, 96, 11, stride=4, scope='conv_1')
net = slim.max_pool2d(net, 3, stride=2, scope='pool_1')
net = local_response_normalization(net)
net = slim.conv2d(net, 256, 5, scope='conv_3')
net = slim.max_pool2d(net, 3, stride=2, scope='pool_2')
net = local_response_normalization(net)
net = slim.conv2d(net, 384, 3, scope='conv_4')
net = slim.max_pool2d(net, 3, stride=2, scope='pool_3')
net = slim.conv2d(net, 384, 3, scope='conv_5')
net = slim.max_pool2d(net, 3, stride=2, scope='pool_4')
net = slim.conv2d(net, 256, 3, scope='conv_6')
net = slim.max_pool2d(net, 3, stride=2, scope='pool_5')
net = local_response_normalization(net)
net = slim.flatten(net, scope='flat_32')
net = slim.fully_connected(net, 4096, activation_fn=self.tanh(), scope='fc_8')
net = slim.dropout(net, keep_prob=keep_prob,is_training=is_training, scope='dropout9')
net = slim.fully_connected(net, 4096, activation_fn=self.tanh(), scope='fc_10')
if is_svm:
return net
net = slim.dropout(net, keep_prob=keep_prob, is_training=is_training, scope='dropout11')
net = slim.fully_connected(net, output, activation_fn=self.softmax(), scope='fc_11')
return net
def loss_layer(self, y_pred, y_true):
with tf.name_scope("Crossentropy"):
y_pred = tf.clip_by_value(y_pred, tf.cast(1e-10, dtype=tf.float32),tf.cast(1. - 1e-10, dtype=tf.float32)) # tf.clip_by_value(A, min, max) clamps every element of A into [min, max]: values below min become min, values above max become max
cross_entropy = - tf.reduce_sum(y_true * tf.log(y_pred),reduction_indices=len(y_pred.get_shape()) - 1)
loss = tf.reduce_mean(cross_entropy)
tf.losses.add_loss(loss)
tf.summary.scalar('loss', loss)
def get_accuracy(self, y_pred, y_true):
y_pred_maxs =(tf.argmax(y_pred,1))
y_true_maxs =(tf.argmax(y_true,1))
num = tf.count_nonzero((y_true_maxs-y_pred_maxs))
result = 1-(num/self.batch_size)
return result
def softmax(self):
def op(inputs):
return tf.nn.softmax(inputs)
return op
def tanh(self):
def op(inputs):
return tf.tanh(inputs)
return op
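A rough usage sketch (the flag combinations are inferred from the constructor above, not copied from train_and_test.py): with is_SVM=True the forward pass stops at the 4096-d fc_10 output that feeds the SVMs and the box regressor, otherwise it ends in the softmax classification layer.

```python
# Fine-tuning run: softmax head with F_class_num outputs (N classes + background)
net = Alexnet_Net(is_training=True, is_fineturn=True, is_SVM=False)

# Feature-extraction run (built in a separate graph/process):
# net = Alexnet_Net(is_training=False, is_SVM=True)   # logits are the 4096-d fc_10 features
```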
Parameter initialization: a difficulty in object detection is that labelled training data is scarce. With so little data, randomly initializing the CNN's parameters and training from scratch is not feasible. The better option is to initialize the parameters by some other means and then fine-tune them with supervision; the paper uses supervised pre-training. Hence the network structure is taken directly from AlexNet, and its pre-trained parameters are used as the initial values before fine-tuning.
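A minimal restore sketch of this idea (the checkpoint path and the decision to skip only fc_11 are assumptions, not the repo's exact Solver code): every layer is restored from the pre-trained checkpoint except the replaced output layer, which keeps its random initialization before fine-tuning starts.

```python
import tensorflow as tf

# Assumes the Alexnet_Net graph with the new (N + 1)-way fc_11 head has already been built.
restore_vars = [v for v in tf.global_variables() if 'fc_11' not in v.name]
saver = tf.train.Saver(restore_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, './output/train_alexnet/model.ckpt')   # hypothetical checkpoint path
```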
def train(self):
for step in range(1, self.max_iter+1):
if self.is_Reg:
input, labels = self.data.get_Reg_batch()
elif self.is_fineturn:
input, labels = self.data.get_fineturn_batch()
else:
input, labels = self.data.get_batch()
feed_dict = {self.net.input_data:input, self.net.label:labels}
if step % self.summary_step == 0 :
summary, loss, _=self.sess.run([self.summary_op,self.net.total_loss,self.train_op], feed_dict=feed_dict)
self.writer.add_summary(summary, step)
print("Data_epoch:"+str(self.data.epoch)+" "*5+"training_step:"+str(step)+" "*5+ "batch_loss:"+str(loss))
else:
self.sess.run([self.train_op], feed_dict=feed_dict)
if step % self.save_step == 0 :
print("saving the model into " + self.ckpt_file)
self.saver.save(self.sess, self.ckpt_file, global_step=self.global_step)
Next we use the candidate boxes found by selective search. Images are read in one by one; selective search generates region proposals for each image, and the IoU between every proposal and the ground truth (fine_turn_list in the code) is computed. When the IoU exceeds a threshold, the proposal is considered to belong to the object class and is given the corresponding label, so every proposal becomes an (image, label) pair; these pairs form the fine-tuning training set. The crops are then resized to the fixed input size and the pre-trained CNN is fine-tuned on them. If there are N object classes to detect, the last layer of the pre-trained CNN is replaced by a layer with N+1 outputs (the extra one for background); this new layer is randomly initialized while the remaining layers keep their parameters, after which training continues with SGD. The initial SGD learning rate is 0.001, and each mini-batch of size 128 contains 32 positive and 96 negative samples (positives and negatives as defined above).
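Conceptually the sample generation looks like the sketch below (not the repository's process_data.py; boxes are (x, y, w, h) and the 0.5 threshold is only illustrative):

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_proposals(proposals, gt_box, gt_label, threshold=0.5):
    """Proposals overlapping the ground truth by at least `threshold` inherit its label,
    all others are labelled 0 (background)."""
    samples = []
    for box in proposals:
        label = gt_label if iou_xywh(box, gt_box) >= threshold else 0
        samples.append((box, label))
    return samples
```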
For the SVM and regression data, the images are again read in one by one and selective search generates proposals together with their box coordinates. For each proposal, the IoU with the ground truth (fine_turn_list in the code) is computed; if the IoU exceeds the threshold, the proposal is assigned the corresponding class label, and its box coordinates are compared with the ground-truth box so that the relative transform (translation and scale) is recorded as label_bbox. Each training sample is therefore a triple (image, label, label_bbox). When training the SVMs, label_bbox is not used; the SVMs only classify. A separate classifier is trained for every class and the trained models are saved for later use. The input to the SVM classifiers is the output of the fully connected layer just before the softmax layer of the AlexNet.
class SVM :
def __init__(self, data):
self.data = data
self.data_save_path = cfg.SVM_and_Reg_save
self.output = cfg.Out_put
def train(self):
svms=[]
data_dirs = os.listdir(self.data_save_path)
for data_dir in data_dirs:
images, labels = self.data.get_SVM_data(data_dir)
clf = svm.LinearSVC()
clf.fit(images, labels)
svms.append(clf)
SVM_model_path = os.path.join(self.output, 'SVM_model')
if not os.path.exists(SVM_model_path):
os.makedirs(SVM_model_path)
joblib.dump(clf, os.path.join(SVM_model_path, str(data_dir)+ '_svm.pkl'))
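At test time the saved per-class models can be reloaded with joblib, roughly as follows (the directory layout mirrors the dump above; the import aliases are assumptions):

```python
import os
from sklearn.externals import joblib   # on newer scikit-learn simply: import joblib
import config as cfg                    # assumed alias for the repo's config.py

svms = []
SVM_model_path = os.path.join(cfg.Out_put, 'SVM_model')   # same directory used in SVM.train()
for name in sorted(os.listdir(SVM_model_path)):
    if name.endswith('_svm.pkl'):
        svms.append(joblib.load(os.path.join(SVM_model_path, name)))
```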
Only candidate bboxes whose IoU with the ground-truth bbox exceeds a threshold are used as positives. For N classes, N separate bounding-box regressors are trained. Given a proposal's centre and width/height, together with the CNN features after pool5, the regressor learns the centre and width/height of the ground-truth box. With proposal P = (P_x, P_y, P_w, P_h) and ground truth G = (G_x, G_y, G_w, G_h), the regression targets are

t_x = (G_x - P_x) / P_w,  t_y = (G_y - P_y) / P_h,  t_w = log(G_w / P_w),  t_h = log(G_h / P_h),

and each target is fit with regularized least squares:

w_* = argmin_{ŵ_*} Σ_i ( t_*^i - ŵ_*ᵀ φ(P^i) )² + λ‖ŵ_*‖²

where w_* (for * ∈ {x, y, w, h}) are the parameters to learn and φ(P) is the CNN feature of proposal P; they can be obtained by gradient descent or in closed form as a least-squares solution.
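A small sketch of computing these targets for one proposal/ground-truth pair (boxes as (x, y, w, h)); it is the inverse of the transform applied in the test code further below:

```python
import numpy as np

def bbox_regression_targets(p, g):
    """p, g: proposal and ground-truth boxes as (x, y, w, h); returns (t_x, t_y, t_w, t_h)."""
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    pcx, pcy = px + pw / 2.0, py + ph / 2.0   # proposal centre
    gcx, gcy = gx + gw / 2.0, gy + gh / 2.0   # ground-truth centre
    return ((gcx - pcx) / pw, (gcy - pcy) / ph,
            np.log(gw / pw), np.log(gh / ph))
```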
class Reg_Net(object):
def __init__(self, is_training=True):
self.output_num = cfg.R_class_num
self.input_data = tf.placeholder(tf.float32, [None, 4096], name='input')
self.logits = self.build_network(self.input_data, self.output_num, is_training=is_training)
if is_training:
self.label = tf.placeholder(tf.float32, [None, self.output_num], name='input')
self.loss_layer(self.logits, self.label)
self.total_loss = tf.losses.get_total_loss()
tf.summary.scalar('total_loss', self.total_loss)
def build_network(self, input_image, output_num, is_training= True, scope='regression_box', keep_prob=0.5):
with tf.variable_scope(scope):
with slim.arg_scope([slim.fully_connected],
activation_fn=self.tanh(),
weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
weights_regularizer=slim.l2_regularizer(0.0005)):
net = slim.fully_connected(input_image, 4096, scope='fc_1')
net = slim.dropout(net, keep_prob=keep_prob, is_training=is_training, scope='dropout11')
net = slim.fully_connected(net, output_num, scope='fc_2')
return net
def loss_layer(self,y_pred, y_true):
no_object_loss = tf.reduce_mean(tf.square((1 - y_true[:, 0]) * y_pred[:, 0]))
object_loss = tf.reduce_mean(tf.square((y_true[:, 0]) * (y_pred[:, 0] - 1)))
loss = (tf.reduce_mean(y_true[:, 0] * (
tf.reduce_sum(tf.square(y_true[:, 1:5] - y_pred[:, 1:5]), 1))) + no_object_loss + object_loss)
tf.losses.add_loss(loss)
tf.summary.scalar('loss', loss)
def tanh(self):
def op(inputs):
return tf.tanh(inputs)
return op
At test time, read an image, generate proposals, and classify each one with the SVMs. For every proposal that is not background, use Reg_box to predict the translation and scale offsets and adjust the proposal accordingly. Finally, for each class, compute IoU and apply non-maximum suppression (NMS): keep the highest-scoring region and discard the regions that overlap it.
if __name__ =='__main__':
Features_solver, svms, Reg_box_solver =get_Solvers()
img_path = './2flowers/jpg/1/image_1283.jpg' # or './17flowers/jpg/16/****.jpg'
imgs, verts = process_data.image_proposal(img_path)
process_data.show_rect(img_path, verts, ' ')
features = Features_solver.predict(imgs)
print(np.shape(features))
results = []
results_old = []
results_label = []
count = 0
for f in features:
for svm in svms:
pred = svm.predict([f.tolist()])
# not background
if pred[0] != 0:
results_old.append(verts[count])
#print(Reg_box_solver.predict([f.tolist()]))
if Reg_box_solver.predict([f.tolist()])[0][0] > 0.5:
px, py, pw, ph = verts[count][0], verts[count][1], verts[count][2], verts[count][3]
old_center_x, old_center_y = px + pw / 2.0, py + ph / 2.0
x_ping, y_ping, w_suo, h_suo = Reg_box_solver.predict([f.tolist()])[0][1], \
Reg_box_solver.predict([f.tolist()])[0][2], \
Reg_box_solver.predict([f.tolist()])[0][3], \
Reg_box_solver.predict([f.tolist()])[0][4]
new__center_x = x_ping * pw + old_center_x
new__center_y = y_ping * ph + old_center_y
new_w = pw * np.exp(w_suo)
new_h = ph * np.exp(h_suo)
new_verts = [new__center_x, new__center_y, new_w, new_h]
results.append(new_verts)
results_label.append(pred[0])
count += 1
average_center_x, average_center_y, average_w,average_h = 0, 0, 0, 0
# average all predicted boxes to obtain one box representing the final predicted location
for vert in results:
average_center_x += vert[0]
average_center_y += vert[1]
average_w += vert[2]
average_h += vert[3]
average_center_x = average_center_x / len(results)
average_center_y = average_center_y / len(results)
average_w = average_w / len(results)
average_h = average_h / len(results)
average_result = [[average_center_x, average_center_y, average_w, average_h]]
result_label = max(results_label, key=results_label.count)
process_data.show_rect(img_path, results_old,' ')
process_data.show_rect(img_path, average_result,flower[result_label])
The test results are as follows: