
Coding (20)

[Python] Finding NaN values in a Pandas DataFrame (checking for missing values, counting missing values)

How to check NaN in Pandas Dataframe
Check for null values: df.isnull(), isnull(df)
Check for non-null values: df.notnull(), notnull(df)
1. Create an example dataframe
import pandas as pd
import numpy as np
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
2. Add null values
Null values can be inserted arbitrarily using 'NaN' or None.
df['A'][1] = 'NaN'
df['B'][2] = None
df['C'][2] = 'NaN'
d..
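The excerpt above is cut off before the counting step promised in the title, so here is a minimal sketch of how the check and the count might look. It is not the post's own code: the fixed seed, the use of `np.nan` and `.loc` in place of the string 'NaN' and chained indexing, and the variable names `null_mask`, `nulls_per_column`, `total_nulls`, and `has_any_null` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same shape as the excerpt's example frame; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Assumption: np.nan / None assigned through .loc, instead of the string 'NaN'
# and chained indexing used in the excerpt.
df.loc[dates[1], "A"] = np.nan
df.loc[dates[2], "B"] = None
df.loc[dates[2], "C"] = np.nan

null_mask = df.isnull()                     # True where a value is missing
nulls_per_column = null_mask.sum()          # number of missing values in each column
total_nulls = int(nulls_per_column.sum())   # number of missing values in the whole frame
has_any_null = bool(null_mask.any().any())  # True if at least one value is missing

print(nulls_per_column)
print(total_nulls, has_any_null)
```

`df.notnull()` returns the inverse mask, so `df.notnull().sum()` gives the count of present values per column in the same way.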

What is a unit test? Why you should write test code, and a simple example (+ GitHub)

What is a unit test?
Code that tests a small part of the overall codebase. Test code is written for each function individually. It lets you check whether bugs exist. It should be written so that it is clear which part has a problem and where the fix is needed. It should be simple and clear.
Why unit tests are needed
Having unit tests that check for bugs makes problems easier to resolve. When the test code is well written, it is clear where the problem is and what needs fixing, so problems can be identified early during development. Tests are written so that code can be run easily without pulling in complex resources. They are good for catching bugs early.
A simple example
Writing test code for an example that uses an API. Ensemble_REST_API_Test.ipyn..
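The excerpt's own API example is truncated, so the following is only a rough sketch of the idea it describes: one small function, tested in isolation, with no external resources involved. The function `parse_status` and the response shape are hypothetical and are not taken from the post.

```python
import unittest


def parse_status(response: dict) -> str:
    """Hypothetical helper: return the lowercased 'status' field of an API-style response."""
    if "status" not in response:
        raise KeyError("response has no 'status' field")
    return str(response["status"]).lower()


class ParseStatusTest(unittest.TestCase):
    def test_returns_lowercased_status(self):
        # A plain dict stands in for a real API response, so no network is needed.
        self.assertEqual(parse_status({"status": "OK"}), "ok")

    def test_missing_status_raises(self):
        # A failure here points directly at the missing-field handling.
        with self.assertRaises(KeyError):
            parse_status({})


# Run the suite explicitly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseStatusTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

Running the suite through `TextTestRunner` rather than `unittest.main()` avoids the interpreter exit that `unittest.main()` triggers by default, which tends to be more convenient inside a notebook.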

Git 2021.01.16

[Git] Git and GitHub concepts, collaborating with Git

Git
One of the VCSs (version control systems). A distributed source version control system: software that lets servers be set up in a distributed way. A tool for managing source code efficiently.
GitHub
A server that stores Git data. A website that Git data can be uploaded to. A platform developers use when collaborating. Multiple local computers (multiple users) connect to a single main server and communicate through it. The data is distributed across each person's computer (in the manner of a backup).
A branch on my local computer: local branch
A branch on an external server: remote branch
Some Git commands
merge: merge work completed on one branch into another branch
add: select modified code and add it
co..

Git 2021.01.16

[Python] Pandas DataFrame basics (loading/saving a DataFrame, counting rows, concatenating DataFrames, listing columns, checking a column's values with pd.Series value_counts)

Loading files with pandas
- Loading a CSV file
import pandas as pd
df = pd.read_csv('filename.csv') # CSV files load without extra options
- Loading a tab-separated txt file
import pandas as pd
df = pd.read_csv('filename.txt', delimiter='\t') # load a tab-separated txt file (TSV also works)
- Loading a space-separated file
import pandas as pd
df = pd.read_csv('filename.extension', delimiter=' ') # load a space-separated file
Counting the rows in a DataFrame
print(len(df.index))
print(df.shape[0])
print(len(df))..
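To make the loading and counting calls above runnable without any files on disk, here is a small sketch that feeds in-memory text to `pd.read_csv`. The sample data, the use of `io.StringIO`, and the names `df_csv`, `df_tsv`, `df_space`, and `row_counts` are assumptions for illustration only.

```python
import io

import pandas as pd

# Assumption: in-memory text stands in for the files named in the excerpt.
csv_text = "name,score\nkim,90\nlee,85\npark,70\n"
tsv_text = "name\tscore\nkim\t90\nlee\t85\npark\t70\n"
space_text = "name score\nkim 90\nlee 85\npark 70\n"

df_csv = pd.read_csv(io.StringIO(csv_text))                    # comma is the default separator
df_tsv = pd.read_csv(io.StringIO(tsv_text), delimiter="\t")    # tab-separated
df_space = pd.read_csv(io.StringIO(space_text), delimiter=" ") # single-space-separated

# The three row-count idioms from the excerpt give the same number.
row_counts = (len(df_csv.index), df_csv.shape[0], len(df_csv))
print(row_counts)
```

All three idioms count rows only; the header line is not included in the count.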
