
CS50 Problem Set 6 (Python): I can't count non-consecutive DNA sequences; my code succeeds on the small database but fails on the large one

Python

哈士奇WWW 2022-09-13 19:15:24

I am a beginner in programming, so I decided to take the CS50 course. For Problem Set 6 (Python) I wrote code that works for the small database but fails for the large one, so I am only asking for help with the idea. This is the course page, and you can download my code here (from Google Drive).

My code:

import csv
from sys import argv


class DnaTest(object):
    """CLASS HELP: the DNA test, simply give DNA sequence to the program, and it searches in the database to
    determine the person who owns the sample.

    type the following in cmd to run the program:
    python dna.py databases/small.csv sequences/1.txt """

    def __init__(self):
        # get filename from the command line without directory names "database" and "sequence"
        self.sequence_argv = str(argv[2][10:])
        self.database_argv = str(argv[1][10:])

        # Automatically open and close the database file
        with open(f"databases/{self.database_argv}", 'r') as database_file:
            self.database_file = database_file.readlines()

        # Automatically open and close the sequence file
        with open(f"sequences/{self.sequence_argv}", 'r') as sequence_file:
            self.sequence_file = sequence_file.readline()

        # Read CSV file as a dictionary, function: compare_database_with_sequence()
        self.csv_database_dictionary = csv.DictReader(self.database_file)
        # Read CSV file to take the first row, function: get_str_list()
        self.reader = csv.reader(self.database_file)
        # computed dictionary from the sequence file
        self.dict_from_sequence = {}

    # returns the first row of the CSV file (database file)
    def get_str_list(self):
        # get first row from CSV file
        self.keys = next(self.reader)
        # remove 'name' from list, get STR only.
        self.keys.remove("name")
        return self.keys

The problem is in the function get_str_count_from_sequence(self): I use count, and it works as long as all the repeats are consecutive, but in some sequence files (for example 5.txt) the STR also appears non-consecutively, and I can't work out the length of each consecutive run. I searched, but I didn't find anything simple. Some people use the regex module and others use the re module, and I have not found a solution yet.
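To make the difficulty concrete, here is a minimal sketch contrasting `str.count` with a position-by-position scan for the longest consecutive run. The sample sequence and the helper name `longest_consecutive_run` are made up for illustration; they are not taken from the CS50 files or from the asker's code, and the helper assumes a non-empty STR.

```python
def longest_consecutive_run(sequence, str_key):
    """Return the longest number of back-to-back repeats of str_key in sequence.

    Hypothetical helper for illustration; assumes str_key is non-empty.
    """
    best = 0
    step = len(str_key)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward while the next chunk is another copy of the STR.
        while sequence[pos:pos + step] == str_key:
            run += 1
            pos += step
        best = max(best, run)
    return best


# Made-up sample: a run of 2, a gap, then a run of 3.
sample_sequence = "AGATAGATCCAGATAGATAGAT"

total_occurrences = sample_sequence.count("AGAT")               # every occurrence, gaps ignored
longest_run = longest_consecutive_run(sample_sequence, "AGAT")  # only the longest unbroken run
print(total_occurrences, longest_run)
```

The two numbers differ whenever the STR occurs in more than one run, which would explain why a `count`-based solution can pass on some inputs and fail on others.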


2 Answers

DIEA


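Thank you, "Piyush Singh". I took your advice and used re to solve the problem. First, I used re to build the set of matches (the consecutive runs) and stored their lengths in a dictionary. Then I took the maximum value for each STR and cleared the dictionary so it could hold the data for the next STR. I also updated the function that compares the two dictionaries (the one read from the database and the one computed from the sequence file).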

import csv
from sys import argv
import re


class DnaTest(object):
    """CLASS HELP: the DNA test, simply give DNA sequence to the program, and it searches in the database to
    determine the person who owns the sample.

    type the following in cmd to run the program:
    python dna.py databases/small.csv sequences/1.txt """

    def __init__(self):
        # get filename from the command line without directory names "database" and "sequence"
        self.sequence_argv = str(argv[2][10:])
        self.database_argv = str(argv[1][10:])

        # Automatically open and close the database file
        with open(f"databases/{self.database_argv}", 'r') as database_file:
            self.database_file = database_file.readlines()

        # Automatically open and close the sequence file
        with open(f"sequences/{self.sequence_argv}", 'r') as sequence_file:
            self.sequence_file = sequence_file.readline()

        # Read CSV file as a dictionary, function: compare_database_with_sequence()
        self.csv_database_dictionary = csv.DictReader(self.database_file)
        # Read CSV file to take the first row, function: get_str_list()
        self.reader = csv.reader(self.database_file)
        # computed dictionary from the sequence file
        self.dict_from_sequence = {}
        self.select_max = {}

    # returns the first row of the CSV file (database file)
    def get_str_list(self):
        # get first row from CSV file
        keys = next(self.reader)
        # remove 'name' from list, get STR only.
        keys.remove("name")
        return keys

    # returns dictionary of computed STRs from the sequence file (key(STR): value(count))
    def get_str_count_from_sequence(self):  # PROBLEM HERE AND RETURN DICTIONARY FROM IT !
        for str_key in self.get_str_list():
            regex = rf"({str_key})+"
            matches = re.finditer(regex, self.sequence_file, re.MULTILINE)

            # my code
            for match in matches:
                match_len = len(match.group())
                key_len = len(str_key)
                self.select_max[match] = match_len
                # select max value from results dictionary (select_max)
                max_values = max(self.select_max.values())

                if max_values >= key_len:
                    result = int(max_values / key_len)
                    self.select_max[str_key] = result
                    self.dict_from_sequence[str_key] = result

            # clear compare dictionary to select new key
            self.select_max.clear()

    # compare computed dictionary with the database dictionaries and get the person name
    def compare_database_with_sequence(self):
        # comparison function between database dictionary and sequence computed dictionary
        def dicts_equal(from_sequence, from_database):
            """ return True if all keys and values are the same """
            return all(k in from_database and int(from_sequence[k]) == int(from_database[k]) for k in from_sequence) \
                and all(k in from_sequence and int(from_sequence[k]) == int(from_database[k]) for k in from_database)

        def check_result():
            for dictionary in self.csv_database_dictionary:
                dict_from_database = dict(dictionary)
                dict_from_database.pop('name')

                if dicts_equal(self.dict_from_sequence, dict_from_database):
                    dict_from_database = dict(dictionary)
                    print(dict_from_database['name'])
                    return True

        if check_result():
            pass
        else:
            print("No match")


# run the class and its functions (Program control)
if __name__ == '__main__':
    RunTest = DnaTest()
    RunTest.get_str_count_from_sequence()
    RunTest.compare_database_with_sequence()
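The regular-expression idea used in this answer can also be isolated into a small function. The following is only a sketch under assumptions: the name `longest_run_regex` and the sample sequence are hypothetical, `re.escape` is added as a precaution even though DNA letters contain no regex metacharacters, and `max(..., default=0)` is used so that an STR that never occurs yields 0.

```python
import re


def longest_run_regex(sequence, str_key):
    """Longest number of consecutive repeats of str_key, found with re.finditer.

    Hypothetical helper for illustration, not part of the answer's class.
    """
    # (?:...)+ matches one or more back-to-back copies of the STR.
    pattern = re.compile(rf"(?:{re.escape(str_key)})+")
    # Each match is one run of repeats; its length divided by the STR length is the repeat count.
    run_lengths = (len(m.group()) // len(str_key) for m in pattern.finditer(sequence))
    return max(run_lengths, default=0)


# Made-up sample: a run of 2, a gap, then a run of 3.
sample_sequence = "AGATAGATCCAGATAGATAGAT"
print(longest_run_regex(sample_sequence, "AGAT"))
```

Because `finditer` returns non-overlapping matches scanned left to right, this sketch could in principle undercount for an STR that overlaps with itself; that edge case is not handled here.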

Checking the solution

Run your program as python dna.py databases/small.csv sequences/1.txt. Your program should output Bob.

Run your program as python dna.py databases/small.csv sequences/2.txt. Your program should output No match.

For more checks, see the CS50 DNA problem set.

Answered 2022-09-13

浮云間


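To get the maximum number of consecutive repeats of each STR, I wrote only a few lines of code. The idea is this: you search for the STR once; if you find it, you search for STR×2; if that is found too, you search for STR×3, and so on, until STR×n can no longer be found. The maximum count is then n-1. Because STR×n is by construction a consecutive run, you don't need to worry about non-consecutive occurrences. You don't need any Python libraries other than sys and csv. My whole program is less than 30 lines.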


import csv
import sys

# check command-line arguments, expect 3 including dna.py
n = len(sys.argv)
if n != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    sys.exit(1)

with open(sys.argv[1], 'r') as database:  # read database
    data_lines = csv.reader(database)  # read line-by-line, store in data_lines
    data = [row for row in data_lines]  # convert to list of lists, store in data

with open(sys.argv[2], 'r') as sequences:
    dna = sequences.read()  # read sequence data, store in string dna

counts = []  # list to store counts of the longest run of consecutive repeats of each STR
for i in range(1, len(data[0])):  # loop through all STR
    count = 1
    string = data[0][i]  # assign each STR to a string
    while string * count in dna:  # if find 1 string, then try to find string*2, and so on
        count += 1
    counts.append(str(count - 1))  # should be decreased by 1 as initialized to 1. int to str

for j in range(1, len(data)):  # loop through all rows in database
    if data[j][1:len(data[0])] == counts:  # compare only numbers in each row to counts
        print(data[j][0])  # print corresponding name
        sys.exit(0)

print('No match')
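The same idea can be wrapped in functions so that it can be tried without files or command-line arguments. This is a sketch under assumptions: `longest_run`, `find_match`, and the tiny in-memory database and sequence are all made up for illustration, and `csv.DictReader` over `io.StringIO` stands in for reading the real CSV file.

```python
import csv
import io


def longest_run(dna, str_key):
    """Grow the repeated string until it no longer occurs in dna (hypothetical helper)."""
    count = 0
    while str_key * (count + 1) in dna:
        count += 1
    return count


def find_match(database_text, dna):
    """Return the name whose STR counts all match dna, or "No match" (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(database_text))
    # Every column except "name" is an STR.
    str_keys = [field for field in reader.fieldnames if field != "name"]
    profile = {key: longest_run(dna, key) for key in str_keys}
    for row in reader:
        if all(int(row[key]) == profile[key] for key in str_keys):
            return row["name"]
    return "No match"


# Made-up data, not the CS50 files.
sample_db = "name,AGAT,AATG\nAlice,3,1\nBob,2,2\n"
sample_dna = "AGATAGATCCAGATAGATAGATGGAATG"
print(find_match(sample_db, sample_dna))
```

Comparing integers rather than strings avoids depending on how the counts are formatted in the CSV file.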

Answered 2022-09-13
