
PYTHON 26

[Python] Pandas DataFrame: checking which values a column holds, counting rows per value, checking for duplicates, finding unique values

df[column_name].unique()
The dataframe above is my example data on cell_line. Now let's find the unique values in this cell_line data.
1. Check the values in the dataframe without removing duplicates
    cell_line = f['epigenomes_with_experimental_evidence'].values  # values = df[column_name].values
Running the code above prints every value as an array.
2. Check how many rows the dataframe has for each element
    f['epigenomes_with_experimental_evidence'].value_counts()  # df[column_name].value_counts()
3. Check the unique values in the dataframe with duplicates removed
    f['epigenome..
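The three steps in the excerpt can be sketched on a small made-up frame. The column name matches the excerpt, but the frame `f` built here and its cell-line values (`HepG2`, `K562`, `GM12878`) are assumptions for illustration, not the author's data.

```python
import pandas as pd

# Hypothetical stand-in for the author's frame `f`
f = pd.DataFrame({
    "epigenomes_with_experimental_evidence": [
        "HepG2", "K562", "HepG2", "GM12878", "K562", "HepG2",
    ]
})
col = f["epigenomes_with_experimental_evidence"]

all_values = col.values           # 1. every value, duplicates included (NumPy array)
counts = col.value_counts()       # 2. number of rows per distinct value
unique_values = col.unique()      # 3. distinct values, in order of first appearance
n_unique = col.nunique()          # how many distinct values there are
has_dupes = col.duplicated().any()  # True if any value appears more than once

print(counts)
print(unique_values)
```

Note that `unique()` is called on a single column (a Series), not on the whole DataFrame.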

[Python] Finding NaN values in a Pandas DataFrame (checking for missing values, counting missing values)

How to check NaN in Pandas Dataframe
Check for null values: df.isnull(), pd.isnull(df)
Check for non-null values: df.notnull(), pd.notnull(df)
1. Create an example dataframe
    import pandas as pd
    import numpy as np
    dates = pd.date_range("20130101", periods=6)
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
2. Add null values
You can insert null values arbitrarily using 'NaN' or None.
    df['A'][1] = 'NaN'
    df['B'][2] = None
    df['C'][2] = 'NaN'
    d..
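A minimal sketch of checking and counting missing values on the same kind of frame. Two things here are substitutions of mine rather than the excerpt's code: the cells are set with `.loc` instead of chained indexing, and `np.nan` is used instead of the string 'NaN', because a literal string is not guaranteed to be treated as missing.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Insert missing values (assumed positions, mirroring the excerpt)
df.loc[dates[1], "A"] = np.nan
df.loc[dates[2], "B"] = None      # None is stored as NaN in a float column
df.loc[dates[2], "C"] = np.nan

mask = df.isnull()                # boolean frame, True where a value is missing
per_column = mask.sum()           # missing-value count per column
total = int(mask.sum().sum())     # missing-value count for the whole frame
rows_with_nan = df[mask.any(axis=1)]  # rows containing at least one NaN

print(per_column)
print(total)
```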

Getting the reverse complement sequence with Biopython

Install Biopython, then get the reverse complement sequence
Install Biopython
    pip install biopython
Verify the installation and run the reverse_complement code
    import Bio  # verify the installation
    from Bio.Seq import Seq
    my_seq = Seq('TGGTGAAACCCCA').reverse_complement()
    print(my_seq)
    print(type(my_seq))
Result
Result of checking the sequence's type
Reference: biopython.org/wiki/Getting_Started (Biopython, Getting Started: Download and In..)
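To see what a reverse complement is, here is a plain-Python sketch that does not use Biopython at all. It is not how Biopython implements `reverse_complement()`; it only handles the unambiguous bases A, C, G, T, and the function name is my own.

```python
# Complement table for unambiguous DNA bases only (an assumption of this sketch)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


result = reverse_complement("TGGTGAAACCCCA")
print(result)  # TGGGGTTTCACCA
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.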

Bioinformatics 2021.01.19

[Dev Environment] Installing tensorflow-gpu on Windows (NVIDIA driver, CUDA Toolkit, cuDNN installation)

Training a model on a GPU is much, much, much faster, which is why deep learning models are hard to train without one. But installing tensorflow-gpu gave me real trouble the first time I ran a deep learning model..^^ Back then I eventually gave up, connected to a Linux server, and installed it there. When I tried again it worked fine, so I'm posting this to share the method.
+++++++
You need to install Visual Studio for the compiler environment. Download Visual Studio 2019 or 2017!
docs.microsoft.com/ko-kr/visualstudio/releases/2019/release-notes
Visual Studio 2019 version 16.8 release notes: the latest features and bug fixes in Visual Studio 2019 version 16.8..

[Python] Pandas DataFrame basics (merge, concat; merging and concatenating by row or by column)

Example dataframes
    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
1. Merge two dataframes on a key
    pd.merge(left, right, on='id')
2. Two dataframes m..
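Using the same two frames, the merge and concat operations named in the title might look like the sketch below. Only `pd.merge(left, right, on='id')` appears in the excerpt; the merge on `subject_id` and the two `pd.concat` calls are my assumptions about what the rest of the post covers.

```python
import pandas as pd

left = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Name": ["Alex", "Amy", "Allen", "Alice", "Ayoung"],
    "subject_id": ["sub1", "sub2", "sub4", "sub6", "sub5"],
})
right = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Name": ["Billy", "Brian", "Bran", "Bryce", "Betty"],
    "subject_id": ["sub2", "sub4", "sub3", "sub6", "sub5"],
})

# Merge on a key; other columns present in both frames get _x / _y suffixes
merged = pd.merge(left, right, on="id")

# Inner merge on a different key keeps only subject_ids found in both frames
by_subject = pd.merge(left, right, on="subject_id", how="inner")

# Concatenate by row (stack) and by column (side by side)
stacked = pd.concat([left, right], ignore_index=True)
side_by_side = pd.concat([left, right], axis=1)

print(merged)
print(by_subject)
```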

[Python] Parallel processing with Python's multiprocessing package to speed up computation

With large-scale data, parallel processing has become much more important lately, and Python has a package that provides it: multiprocessing. This post shows how to use the multiprocessing package to run work in parallel across as many processes as there are CPU cores.
1. Check the number of CPU cores with multiprocessing.cpu_count()
    import multiprocessing as mp
    num_cores = mp.cpu_count()  # returns the number of CPU cores
2. Dataframe multiprocessing: process a dataframe in parallel, row by row
    def parallel_dataframe(df, func, num_cores):
        df_split = np.array_split(df, num_cor..
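Since the excerpt cuts off mid-function, here is one way such a helper could be completed. This is a sketch under my own assumptions, not the author's code: the frame is split by row position with `iloc` rather than by passing the DataFrame to `np.array_split`, a serial fallback is added for `num_cores <= 1`, and `split_frame` and `add_square` are hypothetical names.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd


def split_frame(df, n_chunks):
    """Split df row-wise into up to n_chunks pieces using positional indices."""
    index_chunks = np.array_split(np.arange(len(df)), n_chunks)
    return [df.iloc[idx] for idx in index_chunks if len(idx) > 0]


def parallel_dataframe(df, func, num_cores):
    """Apply func to each chunk of df and concatenate the results."""
    chunks = split_frame(df, num_cores)
    if num_cores <= 1:
        # Serial fallback: no worker pool needed
        return pd.concat([func(chunk) for chunk in chunks])
    with mp.Pool(num_cores) as pool:
        return pd.concat(pool.map(func, chunks))


def add_square(chunk):
    """Hypothetical per-chunk work; top-level so worker processes can pickle it."""
    chunk = chunk.copy()
    chunk["square"] = chunk["x"] ** 2
    return chunk
```

When using more than one core, call it from under an `if __name__ == "__main__":` guard, for example `parallel_dataframe(df, add_square, mp.cpu_count())`, so that worker processes do not re-run the script body on platforms that spawn new interpreters.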

[Python] REST API (fetching DNA sequences from the Ensembl site)

I added base sequences to the TFBS (Transcription Factor Binding Site) data I posted earlier, using start and end (start and stop codons). This uses the API provided by Ensembl.
rest.ensembl.org/documentation/info/sequence_region
Ensembl Rest API - GET sequence/region/:species/:region: Returns the genomic sequence of the specified region of the given species. Supports feature masking and expand options. rest.ensembl.org
    import requests, sys
    import pandas as p..
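A minimal sketch of calling the `GET sequence/region/:species/:region` endpoint for one region. The helper names `region_path` and `fetch_sequence`, the timeout, and the example coordinates are my assumptions; how the author loops over the TFBS table is not visible in the excerpt, so that part is left out.

```python
import requests

SERVER = "https://rest.ensembl.org"


def region_path(species, chrom, start, end, strand=1):
    """Build the endpoint path, e.g. /sequence/region/human/X:1000000..1000100:1"""
    return f"/sequence/region/{species}/{chrom}:{start}..{end}:{strand}"


def fetch_sequence(species, chrom, start, end, strand=1):
    """Return the genomic sequence of one region as plain text (needs network access)."""
    response = requests.get(
        SERVER + region_path(species, chrom, start, end, strand),
        headers={"Content-Type": "text/plain"},
        timeout=30,
    )
    response.raise_for_status()  # raise on HTTP errors instead of returning bad data
    return response.text


# Example call (hypothetical coordinates):
# seq = fetch_sequence("human", "X", 1000000, 1000100)
```

For a table with chromosome, start, and end columns, the same function could be applied row by row, keeping in mind that a public server may rate-limit many requests.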

[Python] Pandas explode, Pandas DataFrame, column split: building up Pandas skills by processing bio data

(Written from actual code results, using transcription factor binding site data)
Creating a new column in a DataFrame
Splitting a DataFrame column and saving the parts as other columns  # uses df[column_name].str.split()
Splitting a DataFrame column's data and saving it back
Using the Pandas explode method (turning the elements of lists stored in a DataFrame column into rows)
    import pandas as pd
    f1 = pd.read_csv('test.txt', delimiter = '\t', names = ['1', '2', '3', '4', '5', '6', '7', '8', '9'])  # columns named 1~9, tab-sep..
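The operations listed in the excerpt can be sketched on a tiny made-up frame. The column names and values below (`tf`, `location`, `cell_lines`) are hypothetical and stand in for the author's `test.txt`, whose contents are not shown.

```python
import pandas as pd

# Hypothetical TFBS-like rows; not the author's data
df = pd.DataFrame({
    "tf": ["CTCF", "MYC"],
    "location": ["chr1:100-200", "chr2:300-400"],
    "cell_lines": ["HepG2,K562", "GM12878"],
})

# Split one column and save the parts as new columns
df[["chrom", "range"]] = df["location"].str.split(":", expand=True)

# Split a column and save the resulting lists back into the same column
df["cell_lines"] = df["cell_lines"].str.split(",")

# explode: one row per list element, other columns repeated
exploded = df.explode("cell_lines", ignore_index=True)

print(exploded)
```

With `expand=True`, `str.split` returns a DataFrame of parts; without it, each cell holds a list, which is the shape `explode` expects.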
