
PL(Programming Language) 26

[Python] Using sys.path to import modules by a relative path

sys ๋ชจ๋“ˆ์„ ์ด์šฉํ•ด ์ƒ๋Œ€๊ฒฝ๋กœ ์„ค์ • ๊ฐ€๋Šฅ import sys sys.path.append('๋‚ด๊ฒฝ๋กœ') ์œ„ ์ฝ”๋“œ๊ฐ€ ๋“ค์–ด๊ฐ€๋ฉด ๋‚ด ๊ฒฝ๋กœ๊ฐ€ ํŒŒ์ผ ์‹คํ–‰ ์œ„์น˜๊ฐ€ ๋˜๊ณ  ๋‹ค๋ฅธ ํŒŒ์ผ์„ import ํ•  ๋•Œ from ~ import ~๋ฅผ ์‚ฌ์šฉํ•ด ์ƒ๋Œ€๊ฒฝ๋กœ๋กœ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. ex) parent ํด๋”์— child ํด๋”๊ฐ€ ์กด์žฌํ•˜๊ณ , child ํด๋” ์•ˆ์— myfuncํ•จ์ˆ˜๋ฅผ ๋‹ด์€ example.py ์žˆ๋‹ค๋ฉด import sys sys.path.append('C:/Parent') from child.example import myfunc ์œ„์ฒ˜๋Ÿผ myfuncํ•จ์ˆ˜๋ฅผ ์ƒ๋Œ€๊ฒฝ๋กœ๋กœ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ๋‹ค. ์ƒ๋Œ€๊ฒฝ๋กœ๋กœ ์ž‘์„ฑํ•˜๋ฉด ์ ˆ๋Œ€๊ฒฝ๋กœ๋กœ ๊ฒฝ๋กœ๋ฅผ ๋‹ค ์จ์ฃผ์ง€ ์•Š๊ณ ๋„ ํŽธํ•˜๊ฒŒ ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์ง€๋งŒ, ํŒŒ์ผ ์œ„์น˜๊ฐ€ ๋ฐ”๋€๋‹ค๊ฑฐ๋‚˜ ํ•˜๋ฉด ๋ถˆํŽธํ•ด์งˆ ์ˆ˜ ์žˆ๋‹ค.

[Python] Combining, joining, and merging Pandas DataFrames (Join, Merge)

Join
1. Create example DataFrames
import pandas as pd
df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'], 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})
2. Join on the row index (both frames have a 'key' column, so suffixes are needed to tell the two apart)
df.join(other, lsuffix='_caller', rsuffix='_other')
3. Join on 'key' by setting it as the index of both frames
df.set_index('key').join(other.set_index('key'))
4. Parameters of the join method ..
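The join variants above can be compared side by side with the same two frames. The on='key' form and the how='inner' call go slightly beyond the excerpt. Both are standard DataFrame.join arguments, and they are included here only for contrast.

```python
import pandas as pd

df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                   'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']})

# Join on the row index (0..5 vs 0..2); 'key' exists in both, so suffixes are required
by_index = df.join(other, lsuffix='_caller', rsuffix='_other')

# Join on 'key' by moving it into the index of both frames
by_key = df.set_index('key').join(other.set_index('key'))

# Same matching without re-indexing the caller: 'on' names a column of df
by_on = df.join(other.set_index('key'), on='key')

# Default is how='left' (keep every row of df); 'inner' keeps only keys present in both
inner = df.set_index('key').join(other.set_index('key'), how='inner')

print(by_index)
print(by_key)
```

Rows K3 to K5 have no partner in other, so the left joins fill B with NaN for them.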

[Python] Checking which values a Pandas DataFrame column contains, counting each value, checking duplicates, finding unique values

df.unique() ์œ„์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์€ cell_line์— ๋Œ€ํ•œ ๋‚ด ์˜ˆ์‹œ ๋ฐ์ดํ„ฐ์ด๋‹ค. ์ด์ œ ์ด cell_lien ๋ฐ์ดํ„ฐ์—์„œ ์œ ๋‹ˆํฌํ•œ ๊ฐ’์„ ์ฐพ์•„๋ณผ ๊ฒƒ์ด๋‹ค. 1. ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•˜์ง€ ์•Š๊ณ  ๊ฐ’ ํ™•์ธ cell_line = f['epigenomes_with_experimental_evidence'].values # values = df[์ปฌ๋Ÿผ๋ช…].values ์œ„์˜ ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•˜๋ฉด arrayํ˜•ํƒœ๋กœ ๋ชจ๋“  ๊ฐ’์ด ์ถœ๋ ฅ๋œ๋‹ค. 2. ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ ๊ฐ ์š”์†Œ๋ณ„๋กœ ๋ช‡๊ฐœ์˜ ๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธ f['epigenomes_with_experimental_evidence'].value_counts() # df[์ปฌ๋Ÿผ๋ช…].value_counts() 3. ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์—์„œ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•˜๊ณ  ์œ ๋‹ˆํฌํ•œ ๊ฐ’ ํ™•์ธ f['epigenome..

[Python] Pandas Dataframe ์ž๋ฃŒํ˜•์—์„œ NaN ๊ฐ’ ์ฐพ๊ธฐ(๊ฒฐ์ธก๊ฐ’ ์—ฌ๋ถ€ ํ™•์ธ, ๊ฒฐ์ธก๊ฐ’ ๊ฐœ์ˆ˜ ์„ธ๊ธฐ)

How to check NaN in a Pandas DataFrame
Check for null values: df.isnull() or pd.isnull(df)
Check for non-null values: df.notnull() or pd.notnull(df)
1. Create an example DataFrame
import pandas as pd
import numpy as np
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
2. Add null values
Null values can be inserted with np.nan or None. The string 'NaN' is ordinary text and is not detected as null.
df.iloc[1, 0] = np.nan
df.iloc[2, 1] = None
df.iloc[2, 2] = np.nan
d..
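The following sketch goes from the isnull() mask to the counts the title mentions: missing values per column, in the whole frame, and the rows that contain any NaN. It also demonstrates why the string 'NaN' does not count as missing.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Insert missing values by position with .iloc (avoids chained assignment like df['A'][1] = ...)
df.iloc[1, 0] = np.nan   # column A, second row
df.iloc[2, 1] = None     # None in a float column is stored as NaN
df.iloc[2, 2] = np.nan   # column C, third row

mask = df.isnull()                           # True where a value is missing
per_column = df.isnull().sum()               # missing values per column
total = int(df.isnull().sum().sum())         # missing values in the whole frame
rows_with_nan = df[df.isnull().any(axis=1)]  # rows containing at least one NaN

# The string 'NaN' is just text, not a missing value
s = pd.Series(['NaN', None, np.nan])
print(s.isnull().tolist())  # [False, True, True]
print(per_column)
```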

[Python] Applying parallel processing (multiprocessing) to a function with multiple arguments

๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ๋ฅผ ํ•  ๋•Œ, ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ๋ฅผ ์ด์šฉํ•˜๋ฉด ์—ฐ์‚ฐ ์†๋„๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค. ๋”ฅ๋Ÿฌ๋‹ ํ”„๋ ˆ์ž„์›Œํฌ์ธ tensorflow๋‚˜ pytorch ๋“ฑ์˜ ๊ฒฝ์šฐ ํ•™์Šต ๊ณผ์ •์—์„œ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•˜์ง€๋งŒ, python์˜ ๊ฒฝ์šฐ ๋”ฐ๋กœ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•˜์ง€ ์•Š๋Š”๋‹ค. ์ด ๋•Œ python์˜ multiprocessing ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ด์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. 1. ํ•จ์ˆ˜ ์ธ์ž๊ฐ€ 1๊ฐœ์ธ ๊ฒฝ์šฐ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ multiprocessing ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์˜ pool.map์„ ์ด์šฉํ•˜๋ฉด ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ๋ฅผ ํ•  ์ˆ˜ ์žˆ๋‹ค. https://ahnty0122.tistory.com/12 [Python] ํŒŒ์ด์ฌ multiprocessing package๋กœ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„๋ณ‘๋ ฌ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์ตœ๊ทผ ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ๋กœ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ๊ฐ€ ๋งŽ์ด ์ค‘์š”ํ•ด์กŒ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํŒŒ์ด์ฌ์—๋Š” ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ๋ฅผ ์ œ๊ณตํ•˜๋Š” ํŒจํ‚ค์ง€์ธ multipr..

[Python] Pandas DataFrame basics (merge, concat, merging and concatenating by rows or columns)

์˜ˆ์‹œ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ import pandas as pd left = pd.DataFrame({ 'id':[1,2,3,4,5], 'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'], 'subject_id':['sub1','sub2','sub4','sub6','sub5']}) right = pd.DataFrame( {'id':[1,2,3,4,5], 'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'], 'subject_id':['sub2','sub4','sub3','sub6','sub5']}) 1. ๋‘ ๊ฐœ์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ Key ๊ธฐ์ค€์œผ๋กœ ํ•ฉ์น˜๊ธฐ pd.merge(left,right,on='id') 2. ๋‘ ๊ฐœ์˜ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์„ m..

[Python] ํŒŒ์ด์ฌ multiprocessing package๋กœ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ, ์—ฐ์‚ฐ ์†๋„ ๊ฐœ์„ 

With today's large datasets, parallel processing has become much more important, and Python ships a package for it: multiprocessing. This post shows how to use the multiprocessing package to process data in parallel across as many workers as there are CPU cores.
1. Check the number of CPU cores with multiprocessing.cpu_count()
import multiprocessing as mp
num_cores = mp.cpu_count()  # returns the number of CPUs (logical cores)
2. DataFrame multiprocessing: process a DataFrame in parallel, row by row
def parallel_dataframe(df, func, num_cores):
    df_split = np.array_split(df, num_cor..
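The excerpt cuts off inside parallel_dataframe. The sketch below is an independent version of the common split, Pool.map, and concat pattern, not the author's code. It splits by row positions (np.array_split on np.arange(len(df))) rather than passing the DataFrame to np.array_split directly, which some pandas versions flag with a deprecation warning. split_dataframe and add_total_column are hypothetical helpers.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def split_dataframe(df, n_chunks):
    # Split row positions into n_chunks nearly equal groups, then slice the frame
    positions = np.array_split(np.arange(len(df)), n_chunks)
    return [df.iloc[pos] for pos in positions]


def add_total_column(chunk):
    # Hypothetical per-chunk work: row-wise sum of columns 'a' and 'b'
    chunk = chunk.copy()
    chunk["total"] = chunk["a"] + chunk["b"]
    return chunk


def parallel_dataframe(df, func, num_cores):
    chunks = split_dataframe(df, num_cores)
    with mp.Pool(num_cores) as pool:
        results = pool.map(func, chunks)  # one chunk per task, results in input order
    return pd.concat(results)             # reassemble; original index is preserved


if __name__ == "__main__":
    df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
    num_cores = mp.cpu_count()
    print(parallel_dataframe(df, add_total_column, num_cores))
```

Each chunk is pickled and sent to a worker process, so this pays off only when func does substantial work per chunk. For cheap column arithmetic like the hypothetical worker above, a plain vectorized df["a"] + df["b"] is faster.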

[Python] REST API (fetching DNA sequences from the Ensembl site)

์ „์— ์˜ฌ๋ ธ๋˜ TFBS(Transcription Factor Binding Site) data์— start, end๋ฅผ ์ด์šฉํ•ด (์‹œ์ž‘, ์ข…๊ฒฐ ์ฝ”๋ˆ) ์—ผ๊ธฐ์„œ์—ด์„ ์ถ”๊ฐ€ํ•ด ๋ณด์•˜๋‹ค. ์•™์ƒ๋ธ”์—์„œ ์ œ๊ณตํ•˜๋Š” api ์ด์šฉ rest.ensembl.org/documentation/info/sequence_region Ensembl Rest API - GET sequence/region/:species/:region Returns the genomic sequence of the specified region of the given species. Supports feature masking and expand options. rest.ensembl.org import requests, sys import pandas as p..
