
coding 2

[Python] REST API (Fetching DNA Sequences from the Ensembl Site)

์ „์— ์˜ฌ๋ ธ๋˜ TFBS(Transcription Factor Binding Site) data์— start, end๋ฅผ ์ด์šฉํ•ด (์‹œ์ž‘, ์ข…๊ฒฐ ์ฝ”๋ˆ) ์—ผ๊ธฐ์„œ์—ด์„ ์ถ”๊ฐ€ํ•ด ๋ณด์•˜๋‹ค. ์•™์ƒ๋ธ”์—์„œ ์ œ๊ณตํ•˜๋Š” api ์ด์šฉ rest.ensembl.org/documentation/info/sequence_region Ensembl Rest API - GET sequence/region/:species/:region Returns the genomic sequence of the specified region of the given species. Supports feature masking and expand options. rest.ensembl.org import requests, sys import pandas as p..

[Python] Pandas DataFrame Basics (loading/saving DataFrames, counting rows, concatenating DataFrames, listing columns, checking a column's values with pd.Series value_counts)

pandas ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ - csv ํ˜•์‹ ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ import pandas as pd df = pd.read_csv('ํŒŒ์ผ๋ช….csv') # csvํŒŒ์ผํ˜•์‹์€ ๊ฐ„๋‹จํ•˜๊ฒŒ ๋ถˆ๋Ÿฌ์™€์ง - ํƒญ์œผ๋กœ ๋ถ„๋ฆฌ๋œ txt ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ import pandas as pd df= pd.read_csv('ํŒŒ์ผ๋ช….txt', delimiter = '\t') # ํƒญ์œผ๋กœ ๋ถ„๋ฆฌ๋œ txt(tsv ํ˜•์‹๋„ ๊ฐ€๋Šฅ) ๋ถˆ๋Ÿฌ์˜ค๊ธฐ - ๊ณต๋ฐฑ์œผ๋กœ ๋ถ„๋ฆฌ๋œ ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ import pandas as pd df = pd.read_csv(โ€˜ํŒŒ์ผ๋ช….ํ™•์žฅ์žโ€™, delimiter = ' ') # ๊ณต๋ฐฑ์œผ๋กœ ๋ถ„๋ฆฌ๋œ ํŒŒ์ผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ Dataframe์˜ data ๊ฐœ์ˆ˜ ์„ธ๊ธฐ print(len(df.index)) print(df.shape[0]) print(len(df))..
