
PL(Programming Language)/Python 17

[Python] Removing list duplicates and taking a list difference with the set type

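When you have two lists and want the values of one that do not also appear in the other, use the set type. Given a = [1, 2, 3, 4] and b = [2, 3, 5, 6, 7], the overlapping elements are 2 and 3. [x for x in a if x not in set(b)] returns the difference with the order of a preserved (for long lists, build set(b) once beforehand, since the expression is re-evaluated for every element). set(a) - set(b) returns the same difference without any guaranteed order. The same pattern gives a list difference for strings: with a = ['abc', 'abcd', 'abcde'] and b = ['bc', 'abc', 'abcd'], print([x for x in a if x not in b]) prints ['abcde'].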
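The two approaches above can be compared side by side. This is a minimal sketch using the same a and b as the post; the names b_set, ordered_diff, unordered_diff, dupes, and deduped are made up for illustration, and the dict.fromkeys de-duplication is one common order-preserving idiom rather than something the post itself shows.

```python
a = [1, 2, 3, 4]
b = [2, 3, 5, 6, 7]

# Build the lookup set once so each membership test is O(1) on average.
b_set = set(b)

# Order-preserving difference: elements of a that do not appear in b.
ordered_diff = [x for x in a if x not in b_set]

# Set difference: same elements, but iteration order is not guaranteed.
unordered_diff = set(a) - set(b)

# Order-preserving de-duplication of a single list
# (dict keys keep insertion order, so the first occurrence wins).
dupes = [3, 1, 3, 2, 1]
deduped = list(dict.fromkeys(dupes))

print(ordered_diff)   # [1, 4]
print(sorted(unordered_diff))  # [1, 4]
print(deduped)        # [3, 1, 2]
```

Going through set requires the elements to be hashable; for lists of unhashable items (such as nested lists) the plain `x not in b` form still works, only more slowly.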

[Python] Pandas DataFrame: silencing "sys:1: DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False."

This warning appears when a column contains NaN values or a mix of data types. As the message itself suggests, either declare the column types with the dtype option or pass low_memory=False, and the warning is no longer printed. pd.read_csv('[filename].txt', delimiter='\t', low_memory=False)
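Both options named in the warning can be sketched as follows. The in-memory text, the column names id and code, and the variable names are hypothetical stand-ins for the real file; only the read_csv arguments come from the post.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for '[filename].txt'.
raw = "id\tcode\n1\t007\n2\tA12\n3\t\n"

# Option 1: declare the type of the ambiguous column up front,
# so pandas does not have to guess it chunk by chunk.
typed = pd.read_csv(io.StringIO(raw), delimiter="\t", dtype={"code": str})

# Option 2: let pandas infer types from the whole file in one pass.
inferred = pd.read_csv(io.StringIO(raw), delimiter="\t", low_memory=False)

print(typed.dtypes)
print(inferred.dtypes)
```

Declaring dtype is usually the more explicit choice, because it also keeps values such as '007' as text; low_memory=False only changes how the inference is done and can use more memory on large files.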

[Python] Resolving SettingWithCopyWarning in a pandas DataFrame

This warning often appears when you copy or index part of an existing DataFrame and then modify its values. It is raised because, when a new DataFrame is derived from an existing one, pandas cannot tell whether the assignment is meant to modify the original or the copy. There are two ways to deal with it: silence the warning, or use copy.
1. Silencing the warning
Use pd.set_option to change how the warning is reported:
# SettingWithCopyError --> raises an error, so the code does not run
pd.set_option('mode.chained_assignment', 'raise')
# SettingWithCopyWarning --> the code runs, but the warning is shown
pd.set_option('mode.chained_assignment', 'warn')
# err..
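The excerpt is cut off before the second approach, so the following is only a sketch of what "using copy" typically looks like, not the post's own code. The DataFrame contents and the names df and subset are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 20, 30]})

# Taking an explicit copy makes it unambiguous that `subset` is a new object,
# so assigning into it does not trigger SettingWithCopyWarning.
subset = df[df["score"] > 10].copy()
subset["score"] = subset["score"] * 2   # changes only the copy

# To change the original instead, assign through a single .loc call
# rather than chaining two indexing operations.
df.loc[df["score"] > 10, "score"] = 0

print(subset)
print(df)
```

Compared with silencing the warning globally, the explicit copy states the intent in the code itself, so a genuinely accidental chained assignment elsewhere still gets reported.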

[Python] Differences between pandas concat, append, join, and merge

Pandas concat vs append vs join vs merge
Concat gives the flexibility to join along either axis (all rows or all columns).
Append is the specific case of concat with axis=0, join='outer'.
Join is based on the indexes (set by set_index); its how parameter takes one of ['left', 'right', 'inner', 'outer'].
Merge is based on a particular column of each of the two dataframes; these columns are passed as parameters such as 'left_on', 'ri..
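A small side-by-side sketch of the three operations that are still available in current pandas. The frames left and right and their columns are hypothetical. DataFrame.append is left out because it was removed in pandas 2.0; pd.concat with axis=0 covers the same case.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# concat: stack along an axis. axis=0 adds rows (what append used to do);
# join="outer" keeps the union of the columns and fills gaps with NaN.
stacked = pd.concat([left, right], axis=0, join="outer", ignore_index=True)

# join: aligns on the index, so the key column is moved into the index first.
joined = left.set_index("key").join(right.set_index("key"), how="inner")

# merge: aligns on named columns; left_on/right_on can be used instead of `on`
# when the key columns have different names in the two frames.
merged = left.merge(right, on="key", how="left")

print(stacked)
print(joined)
print(merged)
```

Here stacked has all six rows, joined keeps only the keys present in both frames, and merged keeps every row of left with NaN in y where right has no match.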

[Python] Pandas DataFrame: removing duplicates and checking distinct values

df.drop_duplicates() can remove duplicates across the whole df, and it can also de-duplicate on specific columns. The data used here has many repeated values in the pert_iname column, and df.drop_duplicates() shows how many distinct values there are: the original 13553 rows reduce to 6798 once duplicates are excluded. Alternatively, df.value_counts() finds the distinct values and also reports how many times each one occurs.
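The whole-frame and per-column variants can be sketched on a tiny example. Only the column name pert_iname comes from the post; the values, the dose column, and the variable names are made up, and the counts here have nothing to do with the 13553 and 6798 figures above.

```python
import pandas as pd

# Hypothetical stand-in data.
df = pd.DataFrame({
    "pert_iname": ["drugA", "drugB", "drugA", "drugC", "drugB", "drugA"],
    "dose": [1, 1, 2, 1, 1, 1],
})

# Rows that are identical in every column are collapsed into one.
unique_rows = df.drop_duplicates()

# Only pert_iname is compared; the first row for each name is kept.
unique_names = df.drop_duplicates(subset=["pert_iname"])

# Distinct values together with how often each one occurs.
counts = df["pert_iname"].value_counts()

print(len(df), len(unique_rows), len(unique_names))  # 6 4 3
print(counts)
```

If only the number of distinct values is needed, df["pert_iname"].nunique() returns it directly without building a de-duplicated frame.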

[Python] Using sys.path from the sys module for relative-path imports

The sys module lets you set up imports relative to a directory of your choice: import sys, then sys.path.append('my_path'). Once this runs, that directory is added to Python's module search path (it does not change the working directory), so other files can be imported relative to it with from ~ import ~. For example, if a parent folder contains a child folder, and the child folder contains example.py defining a function myfunc, then import sys, sys.path.append('C:/Parent'), and from child.example import myfunc imports myfunc by its path relative to the parent folder. Writing imports this way is convenient because you do not have to spell out full absolute paths everywhere, but it can become inconvenient if the files are later moved.
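To try the parent/child layout without touching a real project, the sketch below builds it in a temporary directory. The folder and function names follow the post's example; the temporary directory, the empty __init__.py, and the return value of myfunc are assumptions added for illustration.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway parent/child/example.py layout.
parent = Path(tempfile.mkdtemp())
child = parent / "child"
child.mkdir()
(child / "__init__.py").write_text("")  # marks child as a regular package
(child / "example.py").write_text(
    "def myfunc():\n"
    "    return 'hello from child'\n"
)

# Add the parent directory to the module search path,
# then import by the path relative to that directory.
sys.path.append(str(parent))
from child.example import myfunc

print(myfunc())  # hello from child
```

In a real script, deriving the directory from the script's own location (for example with Path(__file__).resolve().parent) rather than hard-coding something like 'C:/Parent' tends to soften the "files were moved" problem mentioned above.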
