
[Deep Learning] CNN Concepts, Object Detection

탱젤 2021. 2. 8. 23:51

Let's take a look at the CNN (Convolutional Neural Network), one of the deep learning models.

CNNs are widely used in computer vision problems. One of the most common applications is object detection.

What is Object Detection?

  • Feature extraction
    • Extract the useful features that can be pulled out of the image
  • Bounding box generation
    • Generate a bounding box that encloses the object
  • Class classification
    • Classify which class the object inside the bounding box belongs to

CNN(Convolutional Neural Network)

  • Takes the input as matrix-shaped data so that the shape of the image is preserved and no information is lost, and automatically learns each element of the filter (the weights, also expressed as a matrix) so that it suits the data being processed.
  • Applies a specific filter (= kernel) to the image in order to extract the image's features. → Feature extraction
  • Convolution Operation
    • Element-wise matrix multiplication + sum: multiply the elements at the same position in the two matrices, then add the products together.
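
The multiply-and-sum step described above can be sketched in plain Python. This is a minimal illustration rather than code from the post: the function names `multiply_and_sum` and `convolve2d` and the sample 4x4 image and 3x3 filter are made up for the example, stride 1 and no padding are assumed, and a real implementation would normally rely on a library such as NumPy.

```python
def multiply_and_sum(patch, kernel):
    """Multiply same-position elements of two equal-size matrices and sum the products."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


def convolve2d(image, kernel):
    """Slide the kernel over the image (assumed: stride 1, no padding)."""
    f = len(kernel)
    out_h = len(image) - f + 1
    out_w = len(image[0]) - f + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Cut out the f x f region of the image that the kernel currently covers.
            patch = [image_row[j:j + f] for image_row in image[i:i + f]]
            row.append(multiply_and_sum(patch, kernel))
        output.append(row)
    return output


# Hypothetical sample data, chosen only to make the arithmetic easy to follow.
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [2, 0, 1],
    [0, 1, 2],
    [1, 0, 2],
]

result = convolve2d(image, kernel)
print(result)  # [[15, 16], [6, 15]]
```

Each output value comes from one placement of the filter, so a 4x4 image and a 3x3 filter give a 2x2 result here.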

Convolution Operation

  • A CNN extracts features using a variety of filters, such as edge detection filters that emphasize edges in a particular direction, or sharpness filters that emphasize fine detail and remove blurring.

edge detection kernel, sharpness filter
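
To make the two filter types concrete, here is a small sketch that applies a vertical edge detection kernel and a sharpening kernel to a toy image. The kernel values are commonly used textbook choices and are an assumption here, since the original figure is not available; `convolve2d` and the 6x6 test image are likewise hypothetical, with stride 1 and no padding assumed.

```python
def convolve2d(image, kernel):
    """Multiply-and-sum the kernel over every position (assumed: stride 1, no padding)."""
    f = len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(f) for b in range(f))
            for j in range(len(image[0]) - f + 1)
        ]
        for i in range(len(image) - f + 1)
    ]


# Assumed example kernels (not taken from the post's figure).
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
sharpen = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]

# Toy image: bright left half, dark right half, so there is one vertical edge.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

edges = convolve2d(image, vertical_edge)
sharpened = convolve2d(image, sharpen)

print(edges[0])      # [0, 30, 30, 0]
print(sharpened[0])  # [10, 20, -10, 0]
```

The edge kernel responds only where the brightness changes from left to right, while the sharpening kernel leaves flat regions unchanged and exaggerates the values on either side of the edge.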

  • Padding
    • The output gets smaller depending on the size of the convolution filter, so padding is used to keep the spatial size fixed when passing the result on to the next layer.
    • Zero padding, which fills the surrounding pixels with 0, is the most common choice. Applying padding=1 and a 3x3 filter to a 6x6 image pads the image as shown by the blue square in the figure below.

convolution-padding
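
The 6x6 example above can be reproduced with a few lines of Python. This is only a sketch: `zero_pad` is a hypothetical helper name, the image contents are placeholder ones, and stride 1 is assumed when computing the output size.

```python
def zero_pad(image, p):
    """Surround the image with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left and right border
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


image = [[1] * 6 for _ in range(6)]  # placeholder 6x6 image
padded = zero_pad(image, 1)

filter_size = 3
out_size = len(padded) - filter_size + 1  # assumed stride of 1

print(len(padded), len(padded[0]))  # 8 8
print(out_size)                     # 6
```

With padding=1 the padded image is 8x8, so a 3x3 filter produces a 6x6 output, the same spatial size as the input.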

  • Stride
    • The interval between the positions at which the convolution filter is applied
    • The output size depends on how far apart the filter positions are; this interval is called the stride.
    • For example, for a 7x7 image (assuming a 3x3 filter and no padding), a stride of 1 gives a 5x5 output and a stride of 2 gives a 3x3 output.

convolution-stride
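
The effect of the stride on the output size can be checked with a short sketch. The `stride` parameter of this hypothetical `convolve2d`, the all-ones 3x3 filter, and the 7x7 image filled with 0..48 are assumptions made for the illustration; no padding is used.

```python
def convolve2d(image, kernel, stride=1):
    """Apply the kernel every `stride` pixels (assumed: no padding)."""
    f = len(kernel)
    output = []
    for i in range(0, len(image) - f + 1, stride):
        row = []
        for j in range(0, len(image[0]) - f + 1, stride):
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


image = [[r * 7 + c for c in range(7)] for r in range(7)]  # 7x7 image with values 0..48
kernel = [[1] * 3 for _ in range(3)]                       # 3x3 filter of ones

out_s1 = convolve2d(image, kernel, stride=1)
out_s2 = convolve2d(image, kernel, stride=2)

print(len(out_s1), len(out_s1[0]))  # 5 5
print(len(out_s2), len(out_s2[0]))  # 3 3
```

With stride 2 the filter simply skips every other position, so the stride-2 output is a subsampled version of the stride-1 output.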

  • When a CNN takes an n x n image as input and applies an f x f filter with padding p and stride s, each side of the output image has floor((n + 2p - f) / s) + 1 pixels.
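
As a worked check, the standard output-size relation for these symbols can be written out and applied to the padding and stride examples above. The 3x3 filter in the two stride cases and the stride of 1 in the padding case are assumptions, since the text does not state them explicitly.

```latex
\begin{aligned}
n_{\text{out}} &= \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \\[4pt]
n_{\text{out}} &= \left\lfloor \frac{6 + 2 \cdot 1 - 3}{1} \right\rfloor + 1 = 6
  \qquad (n = 6,\ f = 3,\ p = 1,\ s = 1) \\[4pt]
n_{\text{out}} &= \left\lfloor \frac{7 + 2 \cdot 0 - 3}{1} \right\rfloor + 1 = 5
  \qquad (n = 7,\ f = 3,\ p = 0,\ s = 1) \\[4pt]
n_{\text{out}} &= \left\lfloor \frac{7 + 2 \cdot 0 - 3}{2} \right\rfloor + 1 = 3
  \qquad (n = 7,\ f = 3,\ p = 0,\ s = 2)
\end{aligned}
```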


One-layer of a CNN

Now let's look at how a single layer of a CNN, which extracts the features of an image, is put together.

The input is an image made up of pixels, and the input size has three components: width, height, and channels. An RGB image has 3 channels and a grayscale image has 1.

The layer works in exactly the same way as the training process of the artificial neural network covered in the previous post.

It takes the input, performs the convolution operation with the weight filter, adds a bias, passes the result through an activation function, and produces the output value. To extract several different features, several convolution filters can be used.
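
A single layer as described above (convolution, then bias, then activation, with one output per filter) might look like the following sketch. ReLU is assumed as the activation function because the text does not name one; `conv_layer`, the data layout `[channels][height][width]`, and all sample values are hypothetical, and stride 1 with no padding is assumed.

```python
def relu(x):
    """Assumed activation function: pass positive values, clamp negatives to 0."""
    return max(0, x)


def conv_layer(image, filters, biases):
    """image: [channels][height][width]; filters: list of [channels][f][f]; one bias per filter."""
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    feature_maps = []
    for kernel, bias in zip(filters, biases):
        f = len(kernel[0])
        fmap = []
        for i in range(height - f + 1):
            row = []
            for j in range(width - f + 1):
                z = bias
                # Multiply-and-sum over every channel of the input.
                for c in range(channels):
                    for a in range(f):
                        for b in range(f):
                            z += image[c][i + a][j + b] * kernel[c][a][b]
                row.append(relu(z))
            fmap.append(row)
        feature_maps.append(fmap)
    return feature_maps


# Grayscale input (1 channel, 3x3) and two 2x2 filters, so two feature maps come out.
image = [[[1, 2, 0],
          [0, 1, 3],
          [2, 1, 1]]]
filters = [
    [[[1, 0], [0, 1]]],
    [[[-1, 0], [0, -1]]],
]
biases = [0, 3]

feature_maps = conv_layer(image, filters, biases)
print(feature_maps[0])  # [[2, 5], [1, 2]]
print(feature_maps[1])  # [[1, 0], [2, 1]]
```

Using two filters yields two feature maps, each highlighting a different pattern in the same input.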


Types of Layers

A CNN has three types of layers in total.

1. Convolution Layer: the convolution layer described above

2. Pooling Layer

  • A layer that shrinks the spatial size along the width and height → reduces the amount of computation in the network + makes the output less sensitive to small changes in the input
  • An example of applying 2x2 pooling to a 4x4 image is shown below.
  • Variants include max pooling and average pooling.

pooling layer example
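
The 4x4 image with 2x2 pooling case can be sketched as follows. The helper names `pool2d` and `mean` and the pixel values are hypothetical (the values in the original figure are not available), and the windows are assumed not to overlap, so the stride equals the window size.

```python
def pool2d(image, size, reduce_fn):
    """Summarize each non-overlapping size x size window with reduce_fn."""
    output = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(reduce_fn(window))
        output.append(row)
    return output


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 input.
image = [
    [1, 3, 2, 1],
    [2, 9, 1, 1],
    [1, 3, 2, 3],
    [5, 6, 1, 2],
]

max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, mean)

print(max_pooled)  # [[9, 2], [6, 3]]
print(avg_pooled)  # [[3.75, 1.25], [3.75, 2.0]]
```

Either way the 4x4 input shrinks to 2x2; max pooling keeps the strongest response in each window, while average pooling smooths it.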

3. Fully-Connected Layer: the layer that outputs the classification result


Overall structure of a CNN

1. Take the input,

2. pass it repeatedly through convolution layers and pooling layers,

3. then pass it through a fully connected layer and output the result.

Here, the convolution layers extract the features of the image, and the fully connected layer classifies the image based on those features.
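
The whole pipeline (input → convolution → pooling → fully connected → result) can be traced end to end with a toy example. Everything specific here is an assumption made for illustration: the weights are hand-picked rather than learned, ReLU and max pooling are chosen arbitrarily, only one convolution/pooling stage is used, and the two classes ("has a vertical edge" = 0, "no edge" = 1) are invented.

```python
def conv2d(image, kernel):
    """Convolution with assumed stride 1 and no padding."""
    f = len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(f) for b in range(f))
            for j in range(len(image[0]) - f + 1)
        ]
        for i in range(len(image) - f + 1)
    ]


def relu_map(fmap):
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """One score per class: weighted sum of the features plus a bias."""
    return [
        sum(w * x for w, x in zip(weight_row, features)) + bias
        for weight_row, bias in zip(weights, biases)
    ]


# 1. Input: 6x6 grayscale image with a vertical edge in the middle.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# 2. Convolution (hand-picked edge filter) + activation, then pooling.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
feature_map = relu_map(conv2d(image, kernel))  # 4x4
pooled = max_pool(feature_map)                 # 2x2

# 3. Flatten and classify with a fully connected layer (hand-picked weights).
features = [v for row in pooled for v in row]
fc_weights = [[1, 1, 1, 1], [-1, -1, -1, -1]]
fc_biases = [0, 50]
scores = fully_connected(features, fc_weights, fc_biases)
predicted_class = scores.index(max(scores))

print(features)         # [30, 30, 30, 30]
print(scores)           # [120, -70]
print(predicted_class)  # 0
```

In a trained network the filter values and the fully connected weights would be learned from data instead of being written by hand.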
