PL(Programming Language)/Python

[Python] Pandas Dataframe ๊ธฐ๋ณธ(merge, concat, concat ํ–‰, ์—ด ๊ธฐ์ค€์œผ๋กœ ๋ณ‘ํ•ฉ, ์—ฐ๊ฒฐ)

ํƒฑ์ ค 2021. 1. 14. 13:53

์˜ˆ์‹œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„

import pandas as pd

left = pd.DataFrame({
   'id':[1,2,3,4,5],
   'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
   'subject_id':['sub1','sub2','sub4','sub6','sub5']})

right = pd.DataFrame(
   {'id':[1,2,3,4,5],
   'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
   'subject_id':['sub2','sub4','sub3','sub6','sub5']})

 

jupyer ntoebook์—์„œ ์‹คํ–‰ ๊ฒฐ๊ณผ left, right ์ถœ๋ ฅ

1. ๋‘ ๊ฐœ์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ Key ๊ธฐ์ค€์œผ๋กœ ํ•ฉ์น˜๊ธฐ

pd.merge(left,right,on='id')

id ๊ธฐ์ค€์œผ๋กœ ์˜ค๋ฅธ์ชฝ์— ํ•ฉ์ณ์ง„ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค 

2. ๋‘ ๊ฐœ์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ multiple key๋กœ ํ•ฉ์น˜๊ธฐ

pd.merge(left,right,on=['id','subject_id'])

id์™€ subject_id๊ฐ€ ๊ฐ™์€ ๋ฐ์ดํ„ฐ๊ฐ€ ํ•ฉ์ณ์ง

3. merge์— how ์ธ์ž ์ถ”๊ฐ€ํ•ด์„œ ํ•ฉ์น˜๊ธฐ

Merge ๋ฐฉ๋ฒ• SQL์—์„ ? ์„ค๋ช…
left LEFT OUTER JOIN ์™ผ์ชฝ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ key๋กœ ํ•ฉ์น˜๊ธฐ
right RIGHT OUTER JOIN ์˜ค๋ฅธ์ชฝ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ key๋กœ ํ•ฉ์น˜๊ธฐ
outer FULL OUTER JOIN key์˜ ํ•ฉ์ง‘ํ•ฉ ์ด์šฉ
inner INNER JOIN key์˜ ๊ต์ง‘ํ•ฉ ์ด์šฉ

Left join

pd.merge(left, right, on='subject_id', how='left')

์™ผ์ชฝ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ๊ธฐ์ค€์œผ๋กœ ํ•ฉ์ณ์ง

Right join

pd.merge(left, right, on='subject_id', how='right')

์˜ค๋ฅธ์ชฝ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„ ๊ธฐ์ค€์œผ๋กœ ํ•ฉ์ณ์ง

Outer join

print pd.merge(left, right, how='outer', on='subject_id')

๋‘ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ ํ•ฉ์ง‘ํ•ฉ์œผ๋กœ ํ•ฉ์ณ์ง

Inner join

pd.merge(left, right, on='subject_id', how='inner')

๋‘ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์˜ ๊ต์ง‘ํ•ฉ์œผ๋กœ ํ•ฉ์ณ์ง

4. ๋‘ ๊ฐœ์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ concat์œผ๋กœ ์—ฐ๊ฒฐํ•˜๊ธฐ (ํ–‰ ์•„๋ž˜๋กœ ์‚ฝ์ž…)

pd.concat([left, right])

728x90