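Lane evaluation methods used by mainstream lane-detection datasets

TuSimple dataset (point-based evaluation)

Metrics: accuracy of the points on each lane, plus lane-level FP and FN rates (evaluation code)

Basic idea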

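The model outputs a set of predicted lanes $L_{set}$. For the $a$-th lane, sort its points by $y$ in descending order to get $L^{xy}_a=\{(x_0, y_0),\dots,(x_t, y_t)\}$, then apply a secondary sampling step that outputs the $x$ coordinates $L^x_a = \{\dot{x}_0,\dots, \dot{x}_k\}$ at the positions y_sample $=\{Y_0,\dots, Y_k\}$. Doing this for every predicted lane gives $L^x_{set}=\{L^x_0, \dots, L^x_n\}$. Each lane in $L^x_{set}$ is then compared with the $p$-th GT lane $L^{gt}_p$, which yields the accuracies $\{acc^0_{p}, \dots, acc^n_{p}\}$ for that GT lane.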

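Here $n$ is the number of predicted lanes in the current image, and $L^{xy}_a$ is one lane as originally predicted. y_sample is the set of sampling positions along the $y$ axis predefined by the TuSimple dataset, with $k$ sampling positions in total. The secondary sampling can be done by resampling or by discrete sampling, depending on $L^{xy}_a$. For positions inside the y_sample range that the range $\{y_0,\dots,y_t\}$ does not cover, the $x$ coordinate is set to $-2$. Any part of $\{y_0,\dots,y_t\}$ that falls outside the y_sample range is simply ignored.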

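Note: $L^{xy}_a$ is optional. It is only needed when the model does not output lane points at the y_sample positions. If the model outputs $L^x_a$ directly, no secondary sampling is required.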
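The secondary sampling step can be sketched as follows. This is only an illustration under stated assumptions: it uses linear interpolation via `np.interp` as the resampling method, and the function name `resample_lane` and the toy lane are hypothetical, not part of the TuSimple tooling.

```python
import numpy as np

def resample_lane(points, y_samples, invalid=-2.0):
    """Linearly interpolate a predicted lane at the fixed y_sample positions.

    points: (x, y) pairs in any order (hypothetical raw model output).
    y_samples: the predefined y positions (y_sample in the text).
    Positions outside the lane's own y range receive `invalid` (-2).
    """
    pts = np.asarray(points, dtype=float)
    order = np.argsort(pts[:, 1])            # np.interp needs increasing y
    xs, ys = pts[order, 0], pts[order, 1]
    y_samples = np.asarray(y_samples, dtype=float)
    x_out = np.interp(y_samples, ys, xs)
    outside = (y_samples < ys[0]) | (y_samples > ys[-1])
    x_out[outside] = invalid                 # uncovered part of y_sample
    return x_out

lane = [(100.0, 300.0), (120.0, 260.0), (140.0, 220.0)]   # toy lane
y_sample = [200, 220, 240, 260, 280, 300, 320]
lane_x = resample_lane(lane, y_sample)
print(lane_x)
```

In this toy case the positions 200 and 320 lie outside the lane's $y$ range, so they come back as $-2$; the others are interpolated between neighbouring points.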

Computing $acc^a_{p}$ between lane $L^x_a$ and the $p$-th GT lane $L^{gt}_p$:

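import numpy as np

def line_accuracy(pred, gt, thresh):
    # thresh depends on the lane's slope: the larger the slope, the larger thresh.
    # Negative x values mean "no point here"; map them to -100 on both sides.
    pred = np.array([p if p >= 0 else -100 for p in pred])
    gt = np.array([g if g >= 0 else -100 for g in gt])
    return np.sum(np.where(np.abs(pred - gt) < thresh, 1., 0.)) / len(gt)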

A point is correct when the difference between a ground-truth and predicted point is less than a certain threshold.
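How the slope-dependent threshold might be derived can be sketched like this. The assumptions are mine: the slope is fitted as $dx/dy$ over the valid GT points, the base threshold is 20 pixels, and the threshold is scaled by $1/\cos\theta$; the name `lane_threshold` is hypothetical.

```python
import numpy as np

def lane_threshold(x_gt, y_samples, pixel_thresh=20.0):
    """Per-lane pixel threshold that grows with the lane's slope.

    Assumption: slope k = dx/dy from a least-squares line through the
    valid (x >= 0) GT points, theta = arctan(k), thresh = base / cos(theta).
    """
    x_gt = np.asarray(x_gt, dtype=float)
    y = np.asarray(y_samples, dtype=float)
    valid = x_gt >= 0
    if valid.sum() > 1:
        k = np.polyfit(y[valid], x_gt[valid], 1)[0]   # slope dx/dy
        theta = np.arctan(k)
    else:
        theta = 0.0                                   # too few points to fit
    return pixel_thresh / np.cos(theta)

y_sample = [200, 210, 220, 230]
vertical = lane_threshold([300, 300, 300, 300], y_sample)
diagonal = lane_threshold([-2, 310, 320, 330], y_sample)
print(vertical, diagonal)
```

A vertical lane keeps the base threshold, while a lane tilted 45 degrees gets a threshold larger by a factor of $\sqrt{2}$, so the tolerance stays roughly constant when measured perpendicular to the lane.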

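Next, the best match for GT lane $L^{gt}_p$ is chosen from its accuracies $\{acc^0_{p}, \dots, acc^n_{p}\}$ against all predicted lanes. Call it $acc^a_{p}$. If it is at least pt_thresh = 0.85, the GT lane counts as a TP; otherwise FN is incremented. This gives the FN rate and FP rate for a single image. (The FP rate here differs from the usual definition: in the TuSimple script the count of unmatched predictions is divided by the number of predicted lanes.)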

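# pred / gt: predicted and GT lanes of one image; threshs: one pixel threshold per GT lane
fp, fn = 0., 0.
matched = 0.
line_accs = []  # best-match accuracy of each GT lane
for x_gts, thresh in zip(gt, threshs):
    accs = [LaneEval.line_accuracy(np.array(x_preds), np.array(x_gts), thresh) for x_preds in pred]
    max_acc = np.max(accs) if len(accs) > 0 else 0.
    if max_acc < LaneEval.pt_thresh: # pt_thresh = 0.85
        fn += 1
    else:
        matched += 1
    line_accs.append(max_acc)
fp = len(pred) - matched  # predictions that were not counted as a match
if len(gt) > 4 and fn > 0:
    fn -= 1  # with more than 4 GT lanes, one miss is forgiven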

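Finally, the point accuracy for the whole dataset is the mean over all samples of the per-image mean, taken over the GT lanes of that image, of the best-match accuracy $\max_a acc^a_{p}$. The dataset-level lane FP and FN rates are the means over all samples of the per-image FP rate and FN rate.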

All three metrics are reported as the average over all images of the per-image averages.
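Putting the pieces together, the per-image metrics and their dataset-level averages can be sketched as below. This is a simplified illustration, not the official script: it normalizes FP by the number of predictions and FN by the number of GT lanes, and it leaves out the special handling of images with more than four GT lanes. `image_metrics` and the toy data are hypothetical.

```python
import numpy as np

PT_THRESH = 0.85  # pt_thresh from the text

def line_accuracy(pred, gt, thresh):
    pred = np.array([p if p >= 0 else -100 for p in pred])
    gt = np.array([g if g >= 0 else -100 for g in gt])
    return np.sum(np.abs(pred - gt) < thresh) / len(gt)

def image_metrics(pred, gt, threshs):
    """Return (accuracy, FP rate, FN rate) for one image."""
    line_accs, fn, matched = [], 0, 0
    for x_gts, thresh in zip(gt, threshs):
        accs = [line_accuracy(x_preds, x_gts, thresh) for x_preds in pred]
        max_acc = max(accs) if accs else 0.0
        if max_acc < PT_THRESH:
            fn += 1
        else:
            matched += 1
        line_accs.append(max_acc)
    fp = len(pred) - matched
    acc = float(np.mean(line_accs)) if line_accs else 0.0
    fp_rate = fp / len(pred) if pred else 0.0   # assumed normalization
    fn_rate = fn / len(gt) if gt else 0.0       # assumed normalization
    return acc, fp_rate, fn_rate

images = [
    # (predicted lanes, GT lanes, per-GT thresholds); toy values
    ([[101, 111, 121, 131], [300, 310, 320, 330]], [[100, 110, 120, 130]], [20.0]),
    ([[100, 110, 200, 230]], [[100, 110, 120, 130]], [20.0]),
]
per_image = np.array([image_metrics(p, g, t) for p, g, t in images])
dataset_acc, dataset_fp, dataset_fn = per_image.mean(axis=0)
print(dataset_acc, dataset_fp, dataset_fn)
```

In the second toy image only half of the points are within the threshold, so the GT lane is an FN and the single prediction is an FP even though its point accuracy still contributes 0.5 to the accuracy average.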

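CULane and CurveLanes datasets (region-based evaluation)

Metrics: lane-level precision, recall, and F1 score (evaluation code)

Basic idea

For each image, the GT and predicted lanes are widened into line regions 30 pixels wide, and the IoU is computed for every pred/GT pair. The lanes are then matched by best IoU. A matched pair whose IoU exceeds a threshold (0.3 is the loose setting, 0.5 the strict one) counts as a TP, which in turn determines FN and FP. TP, FP, and FN are accumulated over all images in the dataset, and the dataset-level precision, recall, and F1 score are computed from the totals.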

vector<vector<double> > similarity(anno_lanes.size(), vector<double>(detect_lanes.size(), 0));
for(int i=0; i<anno_lanes.size(); i++)
{
    const vector<Point2f> &curr_anno_lane = anno_lanes[i];
    for(int j=0; j<detect_lanes.size(); j++)
    {
        const vector<Point2f> &curr_detect_lane = detect_lanes[j];
        similarity[i][j] = lane_compare->get_lane_similarity(curr_anno_lane, curr_detect_lane);
    }
}
// Similarity of two lanes: IoU of their rasterized line regions
double LaneCompare::get_lane_similarity(const vector<Point2f> &lane1, const vector<Point2f> &lane2)
{
    Mat im1 = Mat::zeros(im_height, im_width, CV_8UC1);
    Mat im2 = Mat::zeros(im_height, im_width, CV_8UC1);
    // draw lines on im1 and im2
    //......
    Scalar color_white = Scalar(1);
    // widen each point set into a 'line region' on the image
    for(int n=0; n<p_interp1.size()-1; n++)
    {
        line(im1, p_interp1[n], p_interp1[n+1], color_white, lane_width);
    }
    for(int n=0; n<p_interp2.size()-1; n++)
    {
        line(im2, p_interp2[n], p_interp2[n+1], color_white, lane_width);
    }
    // compute the IoU of the two regions
    double sum_1 = cv::sum(im1).val[0];
    double sum_2 = cv::sum(im2).val[0];
    double inter_sum = cv::sum(im1.mul(im2)).val[0];
    double union_sum = sum_1 + sum_2 - inter_sum;
    double iou = inter_sum / union_sum;
    return iou;
}
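The same region IoU can be sketched in Python with NumPy alone. This is an approximation for illustration: instead of OpenCV's `line` rasterization it marks every pixel within `lane_width / 2` of the polyline, so pixel counts may differ slightly from the C++ code. `lane_mask`, `lane_iou`, and the toy lanes are hypothetical names and data.

```python
import numpy as np

def lane_mask(points, height, width, lane_width=30):
    """Boolean mask of pixels within lane_width / 2 of the polyline."""
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys], axis=-1).astype(float)      # (H, W, 2) as (x, y)
    mask = np.zeros((height, width), dtype=bool)
    pts = np.asarray(points, dtype=float)
    for a, b in zip(pts[:-1], pts[1:]):
        ab = b - a
        denom = float(ab @ ab)
        if denom == 0.0:
            t = np.zeros((height, width))                # degenerate segment
        else:
            # position of the closest point on segment ab, clipped to [0, 1]
            t = np.clip(((pix - a) @ ab) / denom, 0.0, 1.0)
        closest = a + t[..., None] * ab
        dist = np.linalg.norm(pix - closest, axis=-1)
        mask |= dist <= lane_width / 2.0
    return mask

def lane_iou(lane1, lane2, height, width, lane_width=30):
    m1 = lane_mask(lane1, height, width, lane_width)
    m2 = lane_mask(lane2, height, width, lane_width)
    union = np.logical_or(m1, m2).sum()
    return float(np.logical_and(m1, m2).sum() / union) if union else 0.0

gt_lane = [(50, 0), (50, 99)]       # vertical lane at x = 50
pred_lane = [(65, 0), (65, 99)]     # same lane shifted 15 px to the right
iou = lane_iou(gt_lane, pred_lane, height=100, width=120)
print(iou)
```

In this toy case a 15-pixel shift gives an IoU of roughly 0.35, which would pass the loose threshold of 0.3 but fail the strict threshold of 0.5.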

Accumulate the TP, FP, and FN counts image by image, then compute the dataset-level metrics

long tp = 0, fp = 0, tn = 0, fn = 0;
for (auto result: tuple_lists)
{
    tp += get<1>(result);
    fp += get<2>(result);
    // tn = get<3>(result);
    fn += get<4>(result);
}
counter.setTP(tp);
counter.setFP(fp);
counter.setFN(fn);
double precision = counter.get_precision();
double recall = counter.get_recall();
double F = 2 * precision * recall / (precision + recall);
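The accumulation and the final metrics can be sketched in a few lines of Python. The per-image counts are made-up toy values, and the zero-division guards are an addition of this sketch, not something shown in the excerpt above.

```python
def precision_recall_f1(tp, fp, fn):
    """Dataset-level metrics from accumulated counts, guarding empty cases."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# hypothetical per-image (tp, fp, fn) counts
per_image_counts = [(3, 1, 0), (2, 0, 2), (0, 1, 1)]
tp = sum(c[0] for c in per_image_counts)
fp = sum(c[1] for c in per_image_counts)
fn = sum(c[2] for c in per_image_counts)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, precision, recall, f1)
```

Because the counts are summed before dividing, images with many lanes weigh more than images with few, unlike the TuSimple metrics, which average per-image rates.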

VIL-100 dataset

Combines the point-based and region-based metrics described above.

A paper [2014]

On Performance Evaluation Metrics for Lane Estimation

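For a single lane $L$, the lane position deviation (LPD) is defined as follows: take $N$ equally spaced samples along the $y$ axis between $y_{min}$ and $y_{max}$, measure the deviation $\delta$ of the $x$ coordinate at each sample, and sum the deviations over all samples.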

$\delta_{LPD}=\frac{1}{y_{max}-y_{min}} \sum_{y=y_{min}}^{y_{max}} \delta_{y}$
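A minimal sketch of the LPD formula, assuming that $\delta_y$ is the absolute difference between the estimated and GT $x$ coordinates at each sampled $y$. The function name and the sample values are hypothetical.

```python
import numpy as np

def lane_position_deviation(x_est, x_gt, y_min, y_max):
    """Sum of per-sample |x_est - x_gt|, normalized by (y_max - y_min).

    x_est, x_gt: x coordinates of the estimated and GT lane at the same
    equally spaced y positions between y_min and y_max.
    """
    delta = np.abs(np.asarray(x_est, dtype=float) - np.asarray(x_gt, dtype=float))
    return float(delta.sum() / (y_max - y_min))

# five samples at y = 200, 210, 220, 230, 240 (toy values)
x_gt = [100, 102, 104, 106, 108]
x_est = [101, 101, 106, 106, 105]
lpd = lane_position_deviation(x_est, x_gt, y_min=200, y_max=240)
print(lpd)
```

Here the deviations are 1, 1, 2, 0, and 3 pixels, so the sum of 7 is divided by the 40-pixel span.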

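Closing notes

Lane evaluation is essentially a problem of comparing and matching line segments. It comes down to five steps:

Compare a single GT line with a single predicted line; the comparison can be based on offset, on IoU, or directly on a thresholded indicator function;
Repeat this for every predicted line;
Find the best match for that GT line and decide TP or FN according to some condition;
Repeat the above for every GT line;
Collect all TPs and FNs to obtain TP, FN, and FP for the whole dataset, and from them the final metrics (alternatively, skip the condition and use the best-match value itself as the final metric).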

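However, there is a possible edge case: the best match of GT line $L^{gt}_p$ is predicted lane $L^{pred}_s$, and the best match of the next GT line $L^{gt}_q$ is also $L^{pred}_s$. If both satisfy the condition, should TP be incremented twice?
So the best match also has to be a unique match. Once $L^{pred}_s$ becomes the qualifying best match of $L^{gt}_p$, TP is incremented by 1 and $L^{pred}_s$ should be removed from the candidates at the same time, i.e. $L^{pred}.pop(s)$.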
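The unique-matching rule can be sketched as a greedy loop over the GT lines. This is one possible reading of the idea above, with hypothetical names and a toy similarity matrix. Greedy matching depends on the order of the GT lines; a one-to-one assignment solver such as the Hungarian algorithm would be an order-independent alternative.

```python
def match_unique(score, thresh):
    """Greedy one-to-one matching.

    score[p][s]: similarity (accuracy or IoU) between GT line p and
    predicted line s. A prediction is removed from the candidates once
    it has been used for a TP, mirroring L_pred.pop(s) in the text.
    """
    n_pred = len(score[0]) if score else 0
    free = set(range(n_pred))                 # predictions not yet matched
    tp = fn = 0
    for row in score:
        best = max(free, key=lambda s: row[s], default=None)
        if best is not None and row[best] >= thresh:
            tp += 1
            free.remove(best)                 # each prediction is used once
        else:
            fn += 1
    fp = len(free)                            # predictions never matched
    return tp, fp, fn

# both GT lines prefer prediction 0 (toy similarities)
score = [[0.9, 0.2],
         [0.8, 0.1]]
tp, fp, fn = match_unique(score, thresh=0.5)
print(tp, fp, fn)
```

Without the removal step both GT lines would claim prediction 0 and TP would become 2; with it, the second GT line falls back to prediction 1, fails the threshold, and is counted as an FN.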