(Hailo) YOLOv8 Model Input/Output Data Processing

How to pre-process input images and post-process outputs when running inference with a recent custom YOLO model in an embedded AI environment

Input Data Pre-processing

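Hailo only supports executing .hef files. Pre-processing of the input data and post-processing of the output data for the model being run must be implemented by hand, following that model's reference implementation.
For points to consider about the input data, see: (Hailo) YOLOv8 Model Input Data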

Input Image


1. Convert RGB

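By default, the Ultralytics framework, which supports the latest YOLO models, converts the input image to RGB right before the model's forward pass, regardless of the input image type.
Therefore, in an embedded environment such as Hailo-8, if the image was loaded with OpenCV (which stores color images in BGR order), it must be converted to RGB just before inference.
In OpenCV: cv::cvtColor(src, dst, cv::COLOR_BGR2RGB)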
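For illustration, the sketch below shows what the BGR → RGB conversion does to 8-bit, 3-channel pixel data, without depending on OpenCV. It assumes a tightly packed, interleaved buffer with no row padding (as in a continuous cv::Mat of type CV_8UC3); the function name bgr_to_rgb_inplace is hypothetical. In practice cv::cvtColor is the simpler choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: swap the 1st and 3rd channel of every pixel in an
// interleaved 8-bit, 3-channel buffer, turning B,G,R,B,G,R,... into R,G,B,...
// Assumes the buffer is tightly packed (no padding at the end of each row).
void bgr_to_rgb_inplace(std::vector<std::uint8_t>& pixels) {
    if (pixels.size() % 3 != 0) {
        throw std::invalid_argument("buffer size must be a multiple of 3");
    }
    for (std::size_t i = 0; i < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);  // B <-> R, G stays in place
    }
}
```

Only the channel order changes; pixel values and image size stay the same.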

2. Letter Box

Read the model's input size (e.g. 1280 x 1280) from the custom model compiled into the .hef file, and transform the input image to fit that size.

Creating the letterbox

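The custom YOLOv8 model's input size is 1280x1280 (1:1). During both training and inference, the input image is resized while keeping its aspect ratio, and the remaining area is filled with a background (the letterbox).
The letterbox color is fixed at RGB(114, 114, 114). 114 is roughly the average of the ImageNet per-channel means (about 124, 116, 104), i.e. a neutral gray, and is used as a background that carries nothing for the model to learn.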
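The letterbox geometry (resize factor, resized size, and padding) is worth computing on its own, because the same numbers are needed again when boxes are mapped back to the original image. A minimal sketch, assuming the padding is split evenly between both sides; LetterboxGeometry and compute_letterbox are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical struct: where the resized image sits inside the model input.
struct LetterboxGeometry {
    float scale;    // resize factor applied to the original image
    int resized_w;  // width of the resized image (px)
    int resized_h;  // height of the resized image (px)
    int pad_left;   // gray border on the left (px); the right gets the remainder
    int pad_top;    // gray border on the top (px); the bottom gets the remainder
};

// Fit an src_w x src_h image into a dst_w x dst_h input while keeping
// its aspect ratio, splitting the padding evenly on both sides.
LetterboxGeometry compute_letterbox(int src_w, int src_h, int dst_w, int dst_h) {
    LetterboxGeometry g{};
    g.scale = std::min(static_cast<float>(dst_w) / src_w,
                       static_cast<float>(dst_h) / src_h);
    g.resized_w = static_cast<int>(std::round(src_w * g.scale));
    g.resized_h = static_cast<int>(std::round(src_h * g.scale));
    g.pad_left = (dst_w - g.resized_w) / 2;
    g.pad_top = (dst_h - g.resized_h) / 2;
    return g;
}
```

For a 1920x1080 frame and a 1280x1280 input this gives a 1280x720 image with pad_top = 280, i.e. 280 px of gray above and below.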


+ Optional

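Optionally, the image height can be adjusted to a value divisible by the stride, so that it lines up with the stride of the convolutions at the model input.
With YOLOv8's default stride of 32, the height of the image placed inside the 1280x1280 input can be adjusted from 720 to 736 (the results should be compared).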
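Rounding a length up to the next multiple of the stride is a one-liner; the sketch below (the name round_up_to_stride is hypothetical) reproduces the 720 → 736 example.

```cpp
// Hypothetical helper: round a positive length up to the nearest multiple of
// stride, e.g. 720 -> 736 for stride 32 (720 / 32 = 22.5, so 23 * 32 = 736).
int round_up_to_stride(int length, int stride = 32) {
    return ((length + stride - 1) / stride) * stride;
}
```

With a 1280x1280 input, a 736-row image leaves (1280 - 736) / 2 = 272 px of padding above and below instead of 280.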

Code

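#include <opencv2/opencv.hpp>
#include <cmath>

// Resize src while keeping its aspect ratio, then pad it to new_width x new_height
// with a constant gray border (the letterbox). Intended for a 3-channel image.
cv::Mat letterbox(const cv::Mat& src, int new_width, int new_height, int color = 114) {
    const float aspect_ratio = static_cast<float>(src.cols) / src.rows;
    const float new_aspect_ratio = static_cast<float>(new_width) / new_height;

    // Size of the resized image inside the letterbox, rounded to whole pixels
    int resized_width, resized_height;
    if (aspect_ratio >= new_aspect_ratio) {
        // Relatively wider than the target: fit the width, shrink the height
        resized_width = new_width;
        resized_height = static_cast<int>(std::round(new_width / aspect_ratio));
    } else {
        // Relatively taller than the target: fit the height, shrink the width
        resized_height = new_height;
        resized_width = static_cast<int>(std::round(new_height * aspect_ratio));
    }
    cv::Mat resized;
    cv::resize(src, resized, cv::Size(resized_width, resized_height), 0, 0, cv::INTER_LINEAR);
 
    // Split the padding evenly between top/bottom and left/right
    const int top = (new_height - resized_height) / 2;
    const int left = (new_width - resized_width) / 2;
 
    // Fill the whole canvas with the letterbox color, then paste the resized image into it
    cv::Mat dst(new_height, new_width, src.type(), cv::Scalar(color, color, color));
    resized.copyTo(dst(cv::Rect(left, top, resized_width, resized_height)));
 
    return dst;
}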

Output Data Post-processing

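In the YOLOv8 model output, BBox coordinates are expressed as relative coordinates (values between 0 and 1) with respect to the model input image.
Because the input image is 1:1, the coordinates must be converted before the BBoxes can be matched to the original image (e.g. 16:9).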

Bounding Box Coordinate Conversion

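Terms: original image (16:9 or another ratio), input image (1:1), relative coordinates (fraction of the image size), absolute coordinates (px).
First, use the letterbox offset between the original image and the input image to compute new relative coordinates that match the original image's proportions.
Then, use these relative coordinates to compute absolute coordinates in the original image.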
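A minimal sketch of the two steps above, assuming the letterbox was built with an aspect-preserving resize and padding split evenly on both sides (as in the letterbox code earlier), and that the model returns coordinates normalized to the input size. PixelBox, unletterbox_axis and to_original_pixels are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical struct: a box in absolute pixel coordinates of the original image.
struct PixelBox {
    float x_min, y_min, x_max, y_max;
};

// Map one normalized coordinate (0..1 of the letterboxed input) back to a
// pixel coordinate of the original image along one axis.
float unletterbox_axis(float norm, int input_len, int orig_len, float scale) {
    const float resized_len = orig_len * scale;                // image size inside the input
    const float pad = (input_len - resized_len) / 2.0f;        // letterbox border on one side
    const float rel = (norm * input_len - pad) / resized_len;  // step 1: relative coord in original
    const float clamped = std::min(std::max(rel, 0.0f), 1.0f); // boxes may spill into the border
    return clamped * orig_len;                                 // step 2: absolute px in original
}

PixelBox to_original_pixels(float x_min, float y_min, float x_max, float y_max,
                            int input_w, int input_h, int orig_w, int orig_h) {
    // Same resize factor that was used to build the letterbox
    const float scale = std::min(static_cast<float>(input_w) / orig_w,
                                 static_cast<float>(input_h) / orig_h);
    return {unletterbox_axis(x_min, input_w, orig_w, scale),
            unletterbox_axis(y_min, input_h, orig_h, scale),
            unletterbox_axis(x_max, input_w, orig_w, scale),
            unletterbox_axis(y_max, input_h, orig_h, scale)};
}
```

Because pad is computed as a float here, it can differ by half a pixel from the integer padding used when building the letterbox if (input - resized) is odd; for 1920x1080 into 1280x1280 both give 280.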


Result


Output Data: NMS

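For the NMS algorithm itself, see: 13-0. NMS (Non Maximum Suppression)
Typically, when a YOLO model is compiled for Hailo, an NMS layer is added so that NMS runs on the device.
Hailo's Object Detection sample code includes code that decodes the (already NMS-processed) output received from the YOLO model.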

NMS Decode

Hailo describes how to decode the NMS output as follows.

NMS output decode method
------------------------
 
decodes the nms buffer received from the output tensor of the network.
returns a vector of DetectonObject filtered by the detection threshold.
 
The data is sorted by the number of the classes.
for each class - first comes the number of boxes in the class, then the boxes one after the other,
each box contains x_min, y_min, x_max, y_max and score (uint16_t\float32 each) and can be casted to common::hailo_bbox_t struct (5*uint16_t).
means that a frame size of one class is sizeof(bbox_count) + bbox_count * sizeof(common::hailo_bbox_t).
and the actual size of the data is (frame size of one class)*number of classes.
 
If the data comes after quantization - so dequantization to float32 is needed.
 
As an example - quantized data buffer of a frame that contains a person and two dogs:
(person class id = 1, dog class id = 18)
 
1 107 96 143 119 172 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 123 124 140 150 92 112 125 138 147 91 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 
taking the dogs as example - 2 123 124 140 150 92 112 125 138 147 91
can be splitted to two different boxes
common::hailo_bbox_t st_1 = 123 124 140 150 92
common::hailo_bbox_t st_2 = 112 125 138 147 91
now after dequntization of st_1 - we get common::hailo_bbox_float32_t:
ymin = 0.551805 xmin = 0.389635 ymax = 0.741805 xmax = 0.561974 score = 0.95

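In short, the NMS buffer received from the network's output tensor is decoded, and a vector of DetectonObject filtered by the detection threshold is returned.
The data is ordered by class, and each class is stored as [{box count}, {box}, {box}, …].
Each {box} holds five values, {x_min, y_min, x_max, y_max, score}, each of type uint16_t or float32, and this layout can be mapped onto the common::hailo_bbox_t struct. Note that the dequantized example above lists the fields in ymin, xmin, ymax, xmax order, so check the field order of the struct you cast to.
If the box data is quantized, it must be dequantized by mapping uint16_t → float32.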
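The layout above can be walked sequentially. The sketch below assumes the output has already been converted to float32 (so the box count is also stored as a float), that each class block holds exactly count boxes with no padding, and that the fields follow the ymin, xmin, ymax, xmax, score order of the dequantized example. BboxF32, Detection and decode_nms are hypothetical stand-ins, not HailoRT types or functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for hailo_bbox_float32_t (field order assumed, see above).
struct BboxF32 {
    float y_min, x_min, y_max, x_max, score;
};

struct Detection {
    int class_id;
    BboxF32 box;
};

// Walk a per-class NMS buffer laid out as [count][box][box]...[count][box]...
// and keep only boxes whose score is at least `threshold`.
std::vector<Detection> decode_nms(const std::vector<float>& buffer, int num_classes,
                                  float threshold, int first_class_id = 0) {
    std::vector<Detection> detections;
    std::size_t offset = 0;
    for (int c = 0; c < num_classes; ++c) {
        if (offset >= buffer.size()) throw std::out_of_range("truncated NMS buffer");
        const auto count = static_cast<std::size_t>(buffer[offset++]);  // boxes in this class
        if (offset + count * 5 > buffer.size()) throw std::out_of_range("truncated NMS buffer");
        for (std::size_t b = 0; b < count; ++b, offset += 5) {
            const BboxF32 box{buffer[offset], buffer[offset + 1], buffer[offset + 2],
                              buffer[offset + 3], buffer[offset + 4]};
            if (box.score >= threshold) {
                detections.push_back({first_class_id + c, box});
            }
        }
    }
    return detections;
}
```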

uint16_t → float32

uint16_t is the 16-bit unsigned integer type in C/C++ and always occupies 2 bytes (16 bits), regardless of platform.
It covers the range 0 to 65,535, and a quantized value is mapped to float32 as follows.

real_value = scale * (quantized_value - zero_point)

# scale and zero_point are values fixed when the network is trained or compiled

bbox dequantization example

For example, assume the following quantized buffer.

[1][107 96 143 119 172][0 0 0 0 0 ...]  <- class 1: person, 1 box
[2][123 124 140 150 92][112 125 138 147 91][0 0 0 ...] <- class 18: dog, 2 boxes

For class 18, the following two boxes can be extracted.

common::hailo_bbox_t st_1 = 123 124 140 150 92
common::hailo_bbox_t st_2 = 112 125 138 147 91
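Splitting the example buffer can be sketched as follows. It assumes the sequential layout described above (each class with no boxes contributes a single 0 count), that the first class block in the buffer is class 1, and the ymin-first field order of the dequantized example; BboxU16 and split_by_class are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical raw (still quantized) box: 5 x uint16_t, field order assumed.
struct BboxU16 {
    std::uint16_t y_min, x_min, y_max, x_max, score;
};

// Split a quantized per-class buffer into boxes grouped by class id.
// Each class block is [count][box 0][box 1]...; empty classes are a single 0.
std::map<int, std::vector<BboxU16>> split_by_class(const std::vector<std::uint16_t>& buf,
                                                   int num_classes, int first_class_id) {
    std::map<int, std::vector<BboxU16>> out;
    std::size_t offset = 0;
    for (int c = 0; c < num_classes; ++c) {
        if (offset >= buf.size()) throw std::out_of_range("truncated buffer");
        const std::size_t count = buf[offset++];  // number of boxes in this class
        if (offset + count * 5 > buf.size()) throw std::out_of_range("truncated buffer");
        for (std::size_t b = 0; b < count; ++b, offset += 5) {
            out[first_class_id + c].push_back(
                {buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3], buf[offset + 4]});
        }
    }
    return out;
}
```

Feeding it the person/dog buffer from above (with 16 empty classes between class 1 and class 18) yields one box for class 1 and the two boxes st_1 and st_2 for class 18.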

Each box can then be mapped to the common::hailo_bbox_float32_t struct.

# Example (dequantized st_1, values from the Hailo comment above)
ymin = 0.551805 xmin = 0.389635 ymax = 0.741805 xmax = 0.561974 score = 0.95

💻️ MMMSK
