
Programming 33

[Data Analysis] The data analysis process and the importance of preprocessing

Data Analysis Process
1. Goal Definition: define the analysis target objectively and concretely (= problem definition); understand the domain; understand the project.
2. Data Searching & Collecting: once the problem is defined, search for the data that is needed; collect the data and get to know it.
3. Data Preparation: includes data preprocessing, which removes noise from the data and transforms it into the desired form; this is the stage that prepares the data for building the final model; define relationships between related data, understand the data, and merge it.
4. Modeling: plan how the model will be designed; apply a variety of algorithms, such as machine learning algorithms, using R, Python, etc.
5. Evaluatio..

[Development Environment] Installing torch on Windows to match the CUDA version, using the GPU with PyTorch

ahnty0122.tistory.com/37?category=454641 [Environment Setup] Installing Tensorflow-gpu on Windows (NVIDIA driver, CUDA Toolkit, cuDNN installation): Training a model on a GPU is far faster, which is why deep learning models are hard to train without one. But the thing that really gave me a hard time when I first ran a deep learning model..^^ tensorflow-gpu ..

Once the nvidia driver, cuda, and cudnn installations described in the previous post are done, installing torch is very easy. First, open cmd and check the cuda version: nvcc --version. Since cuda is 10.0, the matching torc..
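The version check in this excerpt can be sketched roughly as follows. The snippet pulls the release number out of `nvcc --version` output. The sample text and the helper names `parse_cuda_release` and `installed_cuda_release` are assumptions for illustration, and the exact wording nvcc prints may vary between toolkit versions.

```python
import re
import shutil
import subprocess
from typing import Optional

# Sample text in the shape nvcc typically prints (an assumption, not captured output).
SAMPLE_NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 10.0, V10.0.130
"""


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the 'major.minor' CUDA release found in nvcc output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run `nvcc --version` if nvcc is on PATH; return None when it is not."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


cuda_release = parse_cuda_release(SAMPLE_NVCC_OUTPUT)
```

The parsed release string is what you would then match against the CUDA build listed for the torch package you intend to install.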

siRNA, RNAi, off-target effect

RNAi (RNA interference)
The phenomenon in which gene expression is suppressed in a sequence-specific manner by 12~21 mer dsRNA called siRNA (short interfering RNA) --> RNA interference
Gene silencing by RNAi
RNA interference can be used to suppress the activity of a specific gene. Suppression is achieved by introducing into the cell a double-stranded RNA that is complementary to the target mRNA. Drawbacks: the effect may be transient, and the expression of other genes may be suppressed as well.
siRNA (short interfering RNA)
Must be designed so that the off-target effect is reduced.
Off-target effect
A side effect of RNAi using siRNA: the expression of genes other than the target is suppressed too. https://www.ibri..
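To make "complementary to the target mRNA" concrete, here is a minimal sketch that derives an antisense strand for a short target site by taking the RNA reverse complement. The target sequence and the function name are invented for illustration, and real siRNA design involves much more than this, including screening candidates for off-target matches.

```python
# Watson-Crick pairing for RNA: A pairs with U, G pairs with C.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    seq = seq.upper()
    unknown = set(seq) - set(RNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


# Hypothetical 21-nt target site on an mRNA (5'->3'); not taken from a real gene.
target_site = "AUGGCUAGCUAGGAUCCGAUA"

# The antisense strand pairs with the target site along its full length.
antisense_strand = reverse_complement_rna(target_site)
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.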

Bioinformatics 2021.02.03

[Python] Combining, joining, and merging Pandas DataFrames (Join, Merge)

Join
1. Create the example DataFrames
import pandas as pd
df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'], 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})
2. Join on the index, adding suffixes to the overlapping column names
df.join(other, lsuffix='_caller', rsuffix='_other')
3. Join with key set as the index
df.set_index('key').join(other.set_index('key'))
4. Parameters of the join method ..
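The join calls in this excerpt can be run end to end as in the sketch below. The `how="inner"` call is an added illustration of one `join` parameter; the excerpt is cut off before its own parameter list, so which parameters the original post covers is not known.

```python
import pandas as pd

df = pd.DataFrame({"key": ["K0", "K1", "K2", "K3", "K4", "K5"],
                   "A": ["A0", "A1", "A2", "A3", "A4", "A5"]})
other = pd.DataFrame({"key": ["K0", "K1", "K2"],
                      "B": ["B0", "B1", "B2"]})

# Index-on-index join; the suffixes disambiguate the two 'key' columns.
by_index = df.join(other, lsuffix="_caller", rsuffix="_other")

# Join on 'key' by making it the index on both sides (left join by default),
# so keys missing from `other` get NaN in column B.
left_joined = df.set_index("key").join(other.set_index("key"))

# how="inner" keeps only the keys present in both frames.
inner_joined = df.set_index("key").join(other.set_index("key"), how="inner")
```

With the default left join all six rows of `df` survive, while the inner join keeps only K0 to K2.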

[Web Crawling] Web crawling with Python: HTML parsing with the requests and BeautifulSoup libraries

Getting href attribute values by web crawling (HTML parsing)
import requests
from bs4 import BeautifulSoup
ftp.ensembl.org/pub/current_regulation/homo_sapiens/RegulatoryFeatureActivity/ (Index of /pub/current_regulation/homo_sapiens/RegulatoryFeatureActivity/)
A web crawling exercise that pulls only the data names from a page laid out like this.
1. GET the URL with the requests library
import requests
from bs4 import BeautifulSoup
res = requests.get('http://ftp.en..
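A minimal sketch of the href extraction, run on an inline HTML string so that it needs no network access. The sample markup, the entry names, and the helper `extract_hrefs` are all invented for illustration. On a live page the HTML would come from `requests.get(url).text` instead.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical stand-in for a directory-index page; the entry names are invented.
SAMPLE_HTML = """
<html><body>
<h1>Index of /example/</h1>
<pre>
<a href="../">../</a>
<a href="sample_A/">sample_A/</a>
<a href="sample_B/">sample_B/</a>
</pre>
</body></html>
"""


def extract_hrefs(html: str) -> List[str]:
    """Return the href value of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


# Skip the parent-directory link and strip the trailing slash to get bare names.
names = [h.rstrip("/") for h in extract_hrefs(SAMPLE_HTML) if h != "../"]
```

Passing `href=True` to `find_all` skips anchors that have no href attribute, so the list comprehension cannot raise a `KeyError`.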

[Python] Checking which values a Pandas DataFrame column contains, counting rows per value, checking for duplicates, finding unique values

df[column_name].unique()
The DataFrame above is my example cell_line data. Next, I will find the unique values in this cell_line data.
1. Check the values in the DataFrame without removing duplicates
cell_line = f['epigenomes_with_experimental_evidence'].values # values = df[column_name].values
Running the code above outputs every value as an array.
2. Check how many rows the DataFrame has for each value
f['epigenomes_with_experimental_evidence'].value_counts() # df[column_name].value_counts()
3. Check the unique values in the DataFrame with duplicates removed
f['epigenome..
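The three checks can be tried on a small stand-in DataFrame, as sketched below. The cell-line values are invented, and using `unique()` for the third step is an assumption based on the excerpt's opening line, since that step is cut off in the excerpt. The `duplicated()` call is an added illustration of the duplicate check named in the title.

```python
import pandas as pd

# Hypothetical stand-in for the post's DataFrame `f`; the values are invented.
f = pd.DataFrame({
    "epigenomes_with_experimental_evidence":
        ["HepG2", "K562", "HepG2", "GM12878", "K562", "HepG2"]
})
col = f["epigenomes_with_experimental_evidence"]

all_values = col.values                   # every value, duplicates kept
counts = col.value_counts()               # rows per distinct value, most frequent first
unique_values = col.unique()              # distinct values, in order of first appearance
has_duplicates = col.duplicated().any()   # True if any value repeats
```

Note that `unique()` is a method of a single column (a Series); a whole DataFrame does not have it, which is why the column is selected first.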
