
[Python] REST API (Fetching DNA Sequences from the Ensembl Site)

탱젤 2021. 1. 13. 22:45

I took the TFBS (Transcription Factor Binding Site) data I posted earlier and used its start and end coordinates to add the nucleotide sequence of each site.

 

This uses the API provided by Ensembl:

 

rest.ensembl.org/documentation/info/sequence_region

 

 

Ensembl Rest API - GET sequence/region/:species/:region

Returns the genomic sequence of the specified region of the given species. Supports feature masking and expand options.

rest.ensembl.org
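The masking and expand options mentioned in the description above are passed as query parameters. Below is a minimal sketch of how such a URL could be assembled. The helper name `sequence_region_url` and the example coordinates are made up for illustration; `mask`, `expand_5prime`, and `expand_3prime` are the parameter names as I understand them from the Ensembl documentation, so check the documentation page linked above before relying on them.

```python
from urllib.parse import urlencode

SERVER = "https://rest.ensembl.org"


def sequence_region_url(species, chrom, start, end, strand=1, **options):
    """Build a GET sequence/region URL (hypothetical helper, not part of any library)."""
    # Region format: chromosome:start..end:strand
    region = f"{chrom}:{start}..{end}:{strand}"
    url = f"{SERVER}/sequence/region/{species}/{region}"
    if options:
        # Optional query parameters such as mask or expand_5prime
        url += "?" + urlencode(options)
    return url


# Example coordinates are arbitrary and only show the URL shape
print(sequence_region_url("human", "21", 1000, 1100,
                          mask="soft", expand_5prime=10, expand_3prime=10))
```

This only builds the URL string; nothing is sent to the server until the URL is passed to something like `requests.get`.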

import requests, sys
import pandas as pd

f = pd.read_csv('21test.csv')  # load a subset of the chromosome 21 data

# Take the start and end coordinates from the dataframe columns
start = f['4']
end = f['5']

Result of print(f): the original data

# Fetch each region from the REST API, building the URL with str.format
# Collect every returned sequence in a list

sequence = []

server = "https://rest.ensembl.org"
for i in f.index:
    ext = "/sequence/region/human/{0}:{1}..{2}".format('21', start[i], end[i])

    # Content-Type text/plain asks for the bare sequence as plain text
    r = requests.get(server + ext, headers={"Content-Type": "text/plain"})

    # Stop on any HTTP error
    if not r.ok:
        r.raise_for_status()
        sys.exit()

    print(r.text)
    sequence.append(r.text)

# Assign the sequence list as a new dataframe column
f['sequence'] = sequence

Result of print(sequence)

You can see that the sequence column has been added.
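For a larger file, it may help to wrap the request loop in a small function. The following is only a sketch under my own assumptions: the names `region_url` and `fetch_sequences`, the injectable `get` argument, and the 0.1 second pause are all made up for illustration and are not something the Ensembl documentation prescribes. The pause is there because Ensembl rate-limits clients, and the `get` argument lets the function be tried out without touching the network.

```python
import time

SERVER = "https://rest.ensembl.org"


def region_url(chrom, start, end, species="human"):
    """Build the sequence/region URL for one region (hypothetical helper)."""
    return f"{SERVER}/sequence/region/{species}/{chrom}:{start}..{end}"


def fetch_sequences(regions, get=None, pause=0.1):
    """Fetch one sequence per (chrom, start, end) tuple.

    get   -- callable like requests.get; defaults to requests.get
    pause -- seconds to wait between calls (arbitrary value, an assumption)
    """
    if get is None:
        import requests  # imported lazily so a stand-in `get` works without requests
        get = requests.get

    sequences = []
    for chrom, start, end in regions:
        r = get(region_url(chrom, start, end),
                headers={"Content-Type": "text/plain"})
        r.raise_for_status()              # raise on HTTP errors
        sequences.append(r.text.strip())  # drop any trailing newline
        time.sleep(pause)
    return sequences
```

With the dataframe from above, this could be called as `f['sequence'] = fetch_sequences(zip(['21'] * len(f), f['4'], f['5']))`.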

 

The REST API call worked~~

 
