Metadata

HOnnotate

HOnnotate: A method for 3D Annotation of Hand and Object Poses
Shreyas Hampali, Mahdi Rad, Markus Oberweger, and Vincent Lepetit
Institute for Computer Graphics and Vision, Graz University of Technology, Austria
LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
CVPR 2020


Abstract

✔ Proposes a method for jointly annotating the 3D poses of the hand and the object in images of a hand manipulating an object, together with a dataset created with that method.

✔ When a hand manipulates an object, mutual occlusion generally makes 3D pose estimation difficult → few datasets exist for this setting.

✔ To address this, sequences are captured with one or several RGB-D cameras and the 3D hand and object poses are optimized jointly, which allows accurate automatic annotation even under heavy mutual occlusion.

✔ Built HO-3D, the first markerless dataset of color images with 3D annotations for both the hand and the object. It consists of 77,557 frames, 68 sequences, 10 persons, and 10 objects.

✔ Using this dataset, a method is developed that predicts the hand pose from a single RGB image and is robust to mutual occlusion.


1. Introduction

1.1. Background

  • Methods for estimating the 3D pose of objects and hands from monocular images have made considerable progress recently thanks to deep learning and large datasets, but they still fail under large mutual occlusion because there is no dedicated dataset for (hand + object) interaction.
  • (Hand + object) interaction data are very hard to annotate, yet such data would be very useful for augmented reality applications and for learning by imitation in robotics.

1.2. Annotating real images

  • Images can be annotated automatically with a 3D hand pose estimation algorithm. The resulting annotations are noisy, but they are usually taken for granted and used for both training and evaluation.
  • Another option, as in the figure, is to use sensors attached to the hand. This provides the 3D pose directly, but the sensors can be visible in the images and bias learning. → Markers change the appearance of the hand, so they cannot be used to label 3D hands in color images.


1.3. Generating synthetic images

  • For synthetic images (apparently images rendered from a hand model), the 3D pose is known perfectly.
  • Synthetic images can be used for training through realistic rendering and domain transfer (see references).
  • However, complex manipulation is hard to simulate, and generalizing to real data still requires real images with 3D annotations.

Proposal

  • Proposes a method for automatically annotating 3D poses in real images of hands interacting with objects.


  • The method works with a single RGB-D camera, but can exploit more cameras when available for better accuracy.

  • The single-camera setup assumes that the grasp pose changes only marginally over the sequence; the multi-camera setup can handle complex hand+object interaction scenarios.


  • Instead of tracking the poses frame by frame, the method optimizes all 3D poses of the hand and object over the whole sequence jointly.


  • Relies on the MANO hand model and on 3D models of the objects (from the YCB-Video dataset).


  • The figure above shows the HO-3D dataset collected with the proposed method; it is used to train 3D pose prediction for a hand manipulating an object from a single RGB image.

    • Given an image as input, a deep network is trained to predict 2D joint locations and joint direction vectors; these predictions are lifted to 3D by fitting the MANO model to them.

    • This experiment validates that the 3D poses estimated by the annotation method are usable in practice.


  • Compared with ObMan, a hand+object pose estimation method that regresses the MANO parameters directly, predicting 2D keypoints and lifting them to 3D is shown to be more accurate.



2. Related Work

2.1. 3D Object Pose Estimation

  • The problem of estimating the 3D pose of an object from a single frame.
  • Some methods are robust to occlusion, but most rely on RGB-D data to fit a 3D object model to the depth data.
  • In that case, when a hand grasps the object, the hand can be mistaken for part of the object surface, and pose estimation can fail.

2.2. 3D Hand Pose Estimation

  • The problem of estimating the 3D pose of a hand from a single image.
  • Methods can be divided into discriminative and generative approaches.
  • Discriminative methods directly locate the joint positions in RGB or RGB-D images.
  • They have evolved from early methods based on random forests to current deep-learning methods with impressive performance.
  • However, discriminative methods degrade under partial occlusion.
  • Generative methods exploit the kinematic structure of a hand model, which has the advantage of producing physically plausible hand pose hypotheses. GANerated and method 2 predict 2D joint locations and then lift them to 3D.
  • The proposed method relates to both: a generative approach produces the pose annotations within a global optimization framework, and discriminative methods are used to initialize this complex optimization.

2.3. Synthetic Images for 3D Pose Estimation

  • Because annotations for real images are hard to obtain, being able to train discriminative methods on synthetic data is valuable.
  • GANerated uses a GAN to make synthetic hand images more realistic.
  • Using synthetic images is appealing, but building virtual scenes is costly and time-consuming.

2.4. Joint Hand+Object Pose Estimation

  • Early approaches to joint hand+object pose estimation typically relied on multi-view camera setups and frame-by-frame tracking, which require careful initialization and are prone to drift accumulating over time.
  • Method A proposes a generative method that tracks finger contact points for RGB-D object shape scanning.
  • Method B proposes using a physics simulator and a 3D renderer for frame-to-frame tracking of hands and objects in RGB-D.
  • Method C uses an ensemble of collaborative trackers for multi-object, multi-hand tracking in RGB-D images.
  • These methods appear accurate qualitatively, but because ground truth is notoriously hard to collect in real settings, they are evaluated either on synthetic datasets or by measuring the standard deviation of the hand or object pose differences during a grasping scenario.

2.5. Hand+Object Datasets

  • Several datasets for hand+object interaction have been proposed, but most focus on grasp and action labels and do not provide 3D poses.


  • Method [54] proposed an RGB-D dataset of a hand manipulating a cube, with manual ground truth for both the fingertip positions and the 3D pose of the cube.


  • Method [15] provides a hand-object interaction dataset with 3D annotations of both the hand joints and the object pose. To obtain the 3D hand pose annotations for the RGB-D video sequences, it uses a motion capture system based on magnetic sensors attached to the user's hand and to the object. However, the sensors and the tape holding them are visible in the color images.

  • Recently, ObMan (2019) introduced a large-scale dataset of images of hands grasping objects. It consists of synthetic images, with grasps generated by an algorithm from robotics.


  • FreiHAND (2019) proposed a multi-view RGB dataset that includes hand-object interactions, but its annotations cover only the 3D pose and shape of the hand.

  • FreiHAND uses a human-in-the-loop method to obtain the annotations from multiple RGB cameras in a green-screen environment.


Proposed method

The proposed method annotates fully automatically, and HO-3D is the first markerless dataset to provide both 3D hand joint and 3D object pose annotations for real images.


3. 3D Annotation Method

  • (3.1) Definition of the 3D hand and object poses
  • (3.2) Definition of the cost function
  • (4.1, 4.2) How the poses are initialized automatically and optimized in multiple stages

3.1. 3D Hand and Object Poses

  • Goal: estimate the 3D poses of the object and the hand in every image of a sequence.
  • Adopts the MANO hand model and uses objects from the YCB-Video dataset, because their 3D models are available and of good quality.
  • The MANO hand pose has 51 DoF = 45 DoF (3 DoF for each of the 15 finger joints) + 6 DoF for the wrist joint (3 rotation, 3 translation).
  • The wrist joint and the 15 finger joints form a kinematic tree whose root is the wrist node.
  • Besides the pose parameters, the hand model has shape parameters $\beta$, which are estimated in a similar way to method [58].
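The 51-DoF breakdown and the wrist-rooted kinematic tree can be made concrete with a small sketch. The ordering of the pose vector (wrist rotation, wrist translation, then 15 × 3 finger parameters) and the joint numbering below are assumptions for illustration; actual MANO implementations may order or split these parameters differently, and `split_pose` / `chain_to_root` are hypothetical helpers.

```python
import numpy as np

N_FINGER_JOINTS = 15  # each has 3 axis-angle parameters -> 45 DoF

# Assumed joint numbering: 0 is the wrist (root), then 5 fingers x 3 joints,
# each finger a chain hanging off the wrist. PARENTS[j] is the parent of joint j.
PARENTS = [-1] + [0 if k % 3 == 0 else k for k in range(N_FINGER_JOINTS)]


def split_pose(p):
    """Split a 51-D hand pose into wrist rotation, wrist translation and finger angles.

    Assumed layout: p[0:3] wrist rotation (axis-angle), p[3:6] wrist translation,
    p[6:51] the 15 finger joints, 3 axis-angle parameters each.
    """
    p = np.asarray(p, dtype=float)
    if p.shape != (51,):
        raise ValueError("expected 6 wrist DoF + 15 x 3 finger DoF = 51 values")
    wrist_rotation = p[0:3]
    wrist_translation = p[3:6]
    finger_angles = p[6:].reshape(N_FINGER_JOINTS, 3)
    return wrist_rotation, wrist_translation, finger_angles


def chain_to_root(joint):
    """List the joints from `joint` up to the wrist by following parent links."""
    path = [joint]
    while PARENTS[path[-1]] != -1:
        path.append(PARENTS[path[-1]])
    return path


rot, trans, fingers = split_pose(np.arange(51))
print(fingers.shape)     # (15, 3)
print(chain_to_root(3))  # [3, 2, 1, 0]
```

Rotating a parent joint moves every joint further down its chain, which is why the wrist, as the root, carries the global rotation and translation.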

3.2. Cost Function

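  • Hand + object pose estimation is formulated as an energy minimization over all $N_F$ frames of the sequence:
    $\hat{P} = \arg\min_{P} \sum_{t=1}^{N_F} \big( E_D(t) + E_C(t) \big)$   (1)
    where $P = \{p_h^t, p_o^t\}_{t=1}^{N_F}$ collects the hand and object poses of every frame, and $E_D$ and $E_C$ denote the data terms and the constraint terms evaluated at frame $t$.
  • First, $E_D$ combines four data terms (the first three are computed per camera and summed over the $N_C$ cameras; $\lambda$ are weights):
    $E_D(t) = \sum_{c=1}^{N_C} \big( \lambda_{mask} E_{mask} + \lambda_{dpt} E_{dpt} + \lambda_{j2D} E_{j2D} \big) + \lambda_{3D} E_{3D}$
    • $E_{mask}$: silhouette discrepancy term
    • $E_{dpt}$: depth residual term
    • $E_{j2D}$: 2D hand joint location error
    • $E_{3D}$: 3D error term
  • $E_C$ is defined as follows:
    $E_C(t) = \lambda_{joint} E_{joint} + \lambda_{phy} E_{phy} + \lambda_{tc} E_{tc}$
    • $E_{joint}$: a prior on the hand pose (joint angle limits) that prevents unnatural poses
    • $E_{phy}$: physical plausibility, keeping the hand and the object from interpenetrating
    • $E_{tc}$: temporal consistency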
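The total energy is a weighted sum of per-frame terms accumulated over the whole sequence. A minimal sketch of that bookkeeping, assuming each term is a hypothetical callable that receives the full list of frames plus the current frame index (so a temporal term can look at earlier frames); the toy terms and weights are made up for illustration.

```python
from typing import Callable, Dict, Sequence

# A term maps (all frames, frame index t) -> energy contribution of frame t.
Term = Callable[[Sequence[dict], int], float]


def total_energy(frames: Sequence[dict], terms: Dict[str, Term],
                 weights: Dict[str, float]) -> float:
    """Weighted sum of every term evaluated at every frame of the sequence."""
    energy = 0.0
    for t in range(len(frames)):
        for name, term in terms.items():
            energy += weights[name] * term(frames, t)
    return energy


# Toy 1-D "poses" and two hypothetical terms: a prior pulling x toward 0,
# and a temporal term penalizing the jump from the previous frame.
frames = [{"x": 1.0}, {"x": 2.0}]
terms = {
    "prior": lambda f, t: f[t]["x"] ** 2,
    "tc": lambda f, t: 0.0 if t == 0 else (f[t]["x"] - f[t - 1]["x"]) ** 2,
}
weights = {"prior": 0.5, "tc": 1.0}
print(total_energy(frames, terms, weights))  # 3.5
```

Passing the whole sequence rather than a single frame is what lets the temporal term couple the frames, which is why all poses are optimized jointly instead of frame by frame.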

Silhouette discrepancy term

  • Compares the silhouettes of the hand and object rendered with the currently estimated poses against their segmentation masks:
    $E_{mask}(I_c, p_h, p_o) = \lVert RS_c(p_h, p_o) - S_c(I_c) \rVert^2$
  • $RS_c(p_h, p_o)$ is the silhouette of the hand and object rendered in camera $c$.
  • The hand and object models are rendered onto the camera plane with a differentiable renderer, so the derivatives of the term with respect to the pose parameters can be computed.
  • The segmentation $S_c(I_c)$ is obtained from the color image $I_c$ of camera $c$, using DeepLabv3 trained on synthetic images made by overlaying and underlaying hand images on YCB objects.
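A minimal NumPy sketch of the silhouette comparison, assuming the rendered silhouette and the segmentation are given as arrays of the same shape (optionally one channel for the hand and one for the object) and using a plain sum of squared per-pixel differences. In the method itself the rendering comes from a differentiable renderer so the term can be differentiated with respect to the poses; here it is just a precomputed array, and `silhouette_discrepancy` is a hypothetical name.

```python
import numpy as np


def silhouette_discrepancy(rendered_silhouette, segmentation):
    """Sum of squared per-pixel differences between rendered and observed masks.

    Both arrays have the same shape, e.g. (H, W) or (2, H, W) with one channel
    for the hand and one for the object, and values in [0, 1].
    """
    r = np.asarray(rendered_silhouette, dtype=float)
    s = np.asarray(segmentation, dtype=float)
    if r.shape != s.shape:
        raise ValueError("rendered silhouette and segmentation must have the same shape")
    return float(np.sum((r - s) ** 2))


rendered = np.zeros((4, 4))
rendered[:2, :2] = 1.0  # 2x2 block
observed = np.zeros((4, 4))
observed[:2, :3] = 1.0  # 2x3 block
print(silhouette_discrepancy(rendered, observed))  # 2.0
```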

Depth residual term

  • The depth residual term $E_{dpt}$ compares the rendered depth map of the hand and object with the depth map captured by camera $c$.
  • $RD_c(p_h, p_o)$ is the depth rendering of the hand and object in their currently estimated poses.
  • $D_c$ is the depth map captured by camera $c$.
  • The depth map is rendered with the differentiable renderer.
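A sketch of the depth comparison, assuming both depth maps are arrays in the same units and that 0 marks a pixel with no sensor measurement or no rendered surface. The plain squared error is an assumption made here for simplicity; a robust penalty could be substituted. `depth_residual` is a hypothetical name.

```python
import numpy as np


def depth_residual(rendered_depth, observed_depth):
    """Sum of squared depth differences over pixels where both depths are available.

    Assumption: 0 marks "no value" (a sensor hole, or no rendered surface).
    """
    r = np.asarray(rendered_depth, dtype=float)
    d = np.asarray(observed_depth, dtype=float)
    valid = (r > 0) & (d > 0)
    return float(np.sum((r[valid] - d[valid]) ** 2))


rendered = np.array([[1.0, 0.0], [2.0, 3.0]])
observed = np.array([[1.5, 2.0], [0.0, 2.0]])
print(depth_residual(rendered, observed))  # 1.25 (only two pixels are valid in both)
```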

2D Joint error term

  • ==The 21 hand joints consist of 15 finger joints, 5 fingertips, and the wrist joint.==
  • The term is a confidence-weighted reprojection error:
    $E_{j2D}(I_c, p_h) = \sum_{i=1}^{21} h[i] \, \lVert \mathrm{proj}_c(J_{p_h}[i]) - K_c[i] \rVert^2$
  • $J_{p_h}[i]$ is the $i$-th 3D hand joint location for hand pose $p_h$.
  • $\mathrm{proj}_c(\cdot)$ projects a 3D point into camera $c$.
  • $K_c[i]$ is the predicted 2D location of joint $i$.
  • $h[i]$ is its confidence.
  • In the heatmap of joint $i$, $K_c[i]$ is the location of the maximum and $h[i]$ is the maximum value itself.
  • To predict the heatmaps, a CNN with the CPM (Convolutional Pose Machines) architecture was trained.
  • The training data consist of an initial dataset generated with the authors' semi-automatic method (grasp and object poses manually initialized on the first frame of each sequence, then optimized) plus the Panoptic Studio dataset.
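The heatmap-to-keypoint step and the confidence-weighted reprojection error can be sketched as follows. This assumes the 3D joints are already expressed in the frame of camera $c$ and that `intrinsics` is a standard 3×3 pinhole matrix; all function names are hypothetical, and the usage uses 2 joints instead of 21 for brevity.

```python
import numpy as np


def keypoints_from_heatmaps(heatmaps):
    """heatmaps: (N, H, W). Return K (N, 2) as (x, y) of each maximum and h (N,) the maxima."""
    hm = np.asarray(heatmaps, dtype=float)
    n, height, width = hm.shape
    flat = hm.reshape(n, -1)
    ys, xs = np.unravel_index(flat.argmax(axis=1), (height, width))
    K = np.stack([xs, ys], axis=1).astype(float)
    h = flat.max(axis=1)
    return K, h


def project(points_cam, intrinsics):
    """Pinhole projection of (N, 3) points given in the camera frame."""
    uvw = points_cam @ intrinsics.T
    return uvw[:, :2] / uvw[:, 2:3]


def joint_2d_error(joints_cam, intrinsics, K, h):
    """sum_i h[i] * ||proj(J[i]) - K[i]||^2, a confidence-weighted reprojection error."""
    residual = project(np.asarray(joints_cam, dtype=float), intrinsics) - K
    return float(np.sum(h * np.sum(residual ** 2, axis=1)))


heatmaps = np.zeros((2, 5, 5))
heatmaps[0, 1, 3] = 0.9  # joint 0 peaks at x=3, y=1
heatmaps[1, 4, 0] = 0.5  # joint 1 peaks at x=0, y=4
K, h = keypoints_from_heatmaps(heatmaps)
intr = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
joints = np.array([[0.02, -0.01, 1.0], [-0.02, 0.02, 1.0]])
print(K.tolist(), h.tolist())  # [[3.0, 1.0], [0.0, 4.0]] [0.9, 0.5]
print(joint_2d_error(joints, intr, K, h))  # approximately 0.9: joint 0 is one pixel off in x
```

Weighting by the heatmap maximum lets low-confidence detections (for example, joints hidden behind the object) contribute less to the energy.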

3D error term

  • Since the depth of all cameras is already used in the $E_{dpt}$ term, this term is not strictly necessary, but it helps speed up convergence to the minimum.
  • The depth maps of the RGB-D cameras are transformed into a reference frame and merged to build a point cloud $P$.
  • Using the segmentation mask $S_c$ of each camera image, $P$ is split into an object point cloud $P^o$ and a hand point cloud $P^h$.
  • At each iteration of the optimization, nearest-neighbor pairs are found between the point clouds and the meshes, (object points, object mesh vertices) and (hand points, hand mesh vertices), and their distances are penalized.
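A sketch of how the merged point cloud and the nearest-vertex comparison could be computed, assuming calibrated pinhole intrinsics, a known rigid transform `(R, t)` from each camera to the reference frame, and boolean masks for selecting hand or object pixels. The squared-distance form and all function names are assumptions; the brute-force nearest-neighbor search is for clarity only, and a spatial index such as a k-d tree would be the usual choice at real point-cloud sizes.

```python
import numpy as np


def backproject(depth, intrinsics, mask=None):
    """Turn an (H, W) depth map into (N, 3) camera-frame points.

    Pixels with depth 0 are skipped; an optional boolean mask (e.g. the hand or
    object segmentation) keeps only the pixels of one class.
    """
    depth = np.asarray(depth, dtype=float)
    height, width = depth.shape
    v, u = np.mgrid[0:height, 0:width]
    keep = depth > 0
    if mask is not None:
        keep &= np.asarray(mask, dtype=bool)
    z = depth[keep]
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    x = (u[keep] - cx) * z / fx
    y = (v[keep] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def to_reference(points, R, t):
    """Apply the rigid camera-to-reference transform x -> R x + t."""
    return points @ R.T + t


def nearest_vertex_error(points, vertices):
    """Sum over points of the squared distance to the closest mesh vertex (brute force)."""
    d2 = np.sum((points[:, None, :] - vertices[None, :, :]) ** 2, axis=2)
    return float(d2.min(axis=1).sum())


# Toy example: one valid pixel, identity intrinsics, camera shifted by 1 along z.
depth = np.array([[0.0, 2.0], [0.0, 0.0]])
points = to_reference(backproject(depth, np.eye(3)), np.eye(3), np.array([0.0, 0.0, 1.0]))
print(points.tolist())  # [[2.0, 0.0, 3.0]]
vertices = np.array([[2.0, 0.0, 2.0], [0.0, 0.0, 0.0]])
print(nearest_vertex_error(points, vertices))  # 1.0
```

With several cameras, each back-projected cloud would be mapped to the reference frame and concatenated, and the object and hand parts scored against their own meshes, e.g. `nearest_vertex_error(P_obj, V_obj) + nearest_vertex_error(P_hand, V_hand)` (hypothetical variable names).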

Joint angle constraint

  • To keep the resulting poses natural, limits are imposed on the 15 joints of the hand.
  • In MANO, the 3D rotation of each joint is parameterized in axis-angle form, which yields 45 joint angle parameters:
    $E_{joint}(p_h) = \sum_{i=1}^{45} \big( \max(\underline{a}_i - a[i], 0) + \max(a[i] - \overline{a}_i, 0) \big)$
  • $a[i]$ is the $i$-th joint angle parameter of $p_h$.
  • $\underline{a}_i$ is the lower limit and $\overline{a}_i$ the upper limit.
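The limit constraint can be sketched as a hinge that is zero inside the allowed range and grows linearly outside it. The limit values in the usage are made up for illustration, and `joint_angle_penalty` is a hypothetical name.

```python
import numpy as np


def joint_angle_penalty(angles, lower, upper):
    """sum_i max(lower_i - a_i, 0) + max(a_i - upper_i, 0); zero when every angle is in range."""
    a = np.asarray(angles, dtype=float)
    below = np.maximum(np.asarray(lower, dtype=float) - a, 0.0)
    above = np.maximum(a - np.asarray(upper, dtype=float), 0.0)
    return float(np.sum(below + above))


# Three of the 45 parameters, with made-up limits in radians.
print(joint_angle_penalty([0.0, 1.5, -1.0], [-0.5, -0.5, -0.5], [1.0, 1.0, 1.0]))  # 1.0
```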

Physical plausibility

  • During optimization, the hand model can penetrate the object model, which is physically impossible. To avoid this, this term pushes the hand and the object apart when they interpenetrate.
  • For each hand vertex $V_h[m]$, the amount of penetration $\phi[m]$ is defined as:
    $\phi[m] = \max\big( -n_o(V_o[m])^\top (V_h[m] - V_o[m]),\ 0 \big)$

$V_o[m]$ is the object vertex closest to the hand vertex $V_h[m]$, and $n_o(\cdot)$ is the normal vector at an object vertex.

  • In other words, the penetration is estimated by projecting the vector joining the hand vertex to its nearest object vertex onto the normal at that object vertex.
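A sketch of the penetration computation, assuming outward-pointing object normals (so a negative projection means the hand vertex lies inside the object) and a brute-force nearest-vertex search. How the per-vertex values are aggregated into the energy is not stated above; the sum of squares in `physical_penalty` is an assumption, and both function names are hypothetical.

```python
import numpy as np


def penetration_depths(hand_vertices, object_vertices, object_normals):
    """phi[m] = max(-n_o(V_o[m]) . (V_h[m] - V_o[m]), 0), V_o[m] = nearest object vertex."""
    vh = np.asarray(hand_vertices, dtype=float)
    vo = np.asarray(object_vertices, dtype=float)
    no = np.asarray(object_normals, dtype=float)
    d2 = np.sum((vh[:, None, :] - vo[None, :, :]) ** 2, axis=2)
    nearest = d2.argmin(axis=1)
    # Signed offset along the outward normal: negative means inside the object.
    signed = np.sum(no[nearest] * (vh - vo[nearest]), axis=1)
    return np.maximum(-signed, 0.0)


def physical_penalty(phi):
    """Hypothetical aggregation: sum of squared penetration depths."""
    return float(np.sum(phi ** 2))


obj_v = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
obj_n = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])   # a flat surface facing +z
hand_v = np.array([[0.0, 0.0, -0.2], [1.0, 0.0, 0.3]])  # first vertex is 0.2 inside
phi = penetration_depths(hand_v, obj_v, obj_n)
print(phi)  # first entry 0.2, second 0
```

Only vertices on the wrong side of the surface contribute, so a hand resting on the object without penetrating it is not penalized.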

Temporal consistency

  • All the previous terms are applied to each frame independently; this term constrains the poses of all frames together.
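The notes do not give the exact form of the temporal term. One common choice, assumed here purely for illustration, penalizes changes in frame-to-frame velocity (a discrete second difference) of the pose parameters, applied separately to the hand and object tracks; treating axis-angle parameters as plain vectors is a simplification, and `temporal_penalty` is a hypothetical name.

```python
import numpy as np


def temporal_penalty(poses):
    """Sum over t of ||(p_t - p_{t-1}) - (p_{t-1} - p_{t-2})||^2 for a (T, D) pose track."""
    p = np.asarray(poses, dtype=float)
    if p.shape[0] < 3:
        return 0.0
    acceleration = p[2:] - 2.0 * p[1:-1] + p[:-2]
    return float(np.sum(acceleration ** 2))


print(temporal_penalty([[0.0], [1.0], [2.0], [3.0]]))  # 0.0, constant velocity
print(temporal_penalty([[0.0], [1.0], [3.0]]))         # 1.0, velocity jumps from 1 to 2
```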

4. Optimization

  • Optimizing Eq. (1) is difficult because it is a highly non-convex problem with many parameters to estimate, so, as shown in the figure, the optimization is performed in multiple stages.

4.1. Multi-Camera Setup

Initialization

  • In the multi-camera setup, a first estimate $p_h^0$ of the hand pose in the first frame ($t = 0$) is obtained by a dedicated minimization.
  • A Dogleg optimizer is used.
  • The first estimate $p_o^0$ of the object pose is obtained with BB8, trained on a synthetic dataset of YCB objects and hands.
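The notes only state that a Dogleg optimizer is used. To illustrate what that method does, here is a minimal sketch of Powell's dogleg trust-region method for nonlinear least squares, run on a toy residual (the Rosenbrock function written as two residuals) rather than on the actual initialization energy. The trust-radius update constants are conventional choices, not values from the paper, and all function names are hypothetical.

```python
import numpy as np


def dogleg_step(J, r, radius):
    """Powell dogleg step for the model 0.5 * ||r + J dx||^2 inside a trust radius."""
    g = J.T @ r                                    # gradient of 0.5 * ||r||^2
    dx_gn = -np.linalg.lstsq(J, r, rcond=None)[0]  # Gauss-Newton step
    if np.linalg.norm(dx_gn) <= radius:
        return dx_gn
    Jg = J @ g
    dx_sd = -(g @ g) / (Jg @ Jg) * g               # Cauchy point along -g
    if np.linalg.norm(dx_sd) >= radius:
        return -radius * g / np.linalg.norm(g)
    # Move from the Cauchy point toward the GN step until reaching the boundary.
    d = dx_gn - dx_sd
    a, b, c = d @ d, 2.0 * (dx_sd @ d), dx_sd @ dx_sd - radius ** 2
    tau = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return dx_sd + tau * d


def dogleg_minimize(residual, jacobian, x0, radius=1.0, max_iter=200, gtol=1e-10):
    """Trust-region loop: accept steps that reduce the cost, adapt the radius."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r, J = residual(x), jacobian(x)
        if np.linalg.norm(J.T @ r) < gtol:
            break
        dx = dogleg_step(J, r, radius)
        cost = 0.5 * (r @ r)
        predicted = cost - 0.5 * np.sum((r + J @ dx) ** 2)
        r_new = residual(x + dx)
        actual = cost - 0.5 * (r_new @ r_new)
        rho = actual / predicted if predicted > 0 else 0.0
        if rho < 0.25:
            radius *= 0.25
        elif rho > 0.75 and np.linalg.norm(dx) >= 0.99 * radius:
            radius *= 2.0
        if rho > 0:
            x = x + dx
    return x


# Toy problem: Rosenbrock written as residuals r = [10 (x1 - x0^2), 1 - x0].
def rosen_residual(x):
    return np.array([10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]])


def rosen_jacobian(x):
    return np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])


print(dogleg_minimize(rosen_residual, rosen_jacobian, [-1.2, 1.0]))  # close to [1, 1]
```

The dogleg path interpolates between the cautious steepest-descent step and the fast Gauss-Newton step, which helps on badly scaled, non-convex energies. In an annotation pipeline the residual vector would instead stack the weighted terms of the initialization energy.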

References

