2 Answers

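Contributed 1820 experience points · received more than 2 upvotes
Thank you, Piyush Singh. I took your advice and used re to solve the problem. First, I used re.finditer to collect the matches (runs of consecutive repeats) into a dictionary, then I took the largest value for each STR, and then I cleared the dictionary so it could hold the results for the next STR. I also updated the function that compares the two dictionaries (one read from the database, the other computed from the sequence file).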
import csv
from sys import argv
import re


class DnaTest(object):
    """CLASS HELP: the DNA test, simply give DNA sequence to the program, and it searches in the database to
       determine the person who owns the sample.

    type the following in cmd to run the program:
    python dna.py databases/small.csv sequences/1.txt """

    def __init__(self):
        # get filename from the command line without directory names "database" and "sequence"
        self.sequence_argv = str(argv[2][10:])
        self.database_argv = str(argv[1][10:])

        # Automatically open and close the database file
        with open(f"databases/{self.database_argv}", 'r') as database_file:
            self.database_file = database_file.readlines()

        # Automatically open and close the sequence file
        with open(f"sequences/{self.sequence_argv}", 'r') as sequence_file:
            self.sequence_file = sequence_file.readline()

        # Read CSV file as a dictionary, function: compare_database_with_sequence()
        self.csv_database_dictionary = csv.DictReader(self.database_file)
        # Read CSV file to take the first row, function: get_str_list()
        self.reader = csv.reader(self.database_file)
        # computed dictionary from the sequence file
        self.dict_from_sequence = {}
        self.select_max = {}

    # returns the first row of the CSV file (database file)
    def get_str_list(self):
        # get first row from CSV file
        keys = next(self.reader)
        # remove 'name' from list, get STR only.
        keys.remove("name")
        return keys

    # returns dictionary of computed STRs from the sequence file (key(STR): value(count))
    def get_str_count_from_sequence(self):  # PROBLEM HERE AND RETURN DICTIONARY FROM IT !
        for str_key in self.get_str_list():
            regex = rf"({str_key})+"
            matches = re.finditer(regex, self.sequence_file, re.MULTILINE)

            # my code
            for match in matches:
                match_len = len(match.group())
                key_len = len(str_key)
                self.select_max[match] = match_len
                # select max value from results dictionary (select_max)
                max_values = max(self.select_max.values())

                if max_values >= key_len:
                    result = int(max_values / key_len)
                    self.select_max[str_key] = result
                    self.dict_from_sequence[str_key] = result

            # clear compare dictionary to select new key
            self.select_max.clear()

    # compare computed dictionary with the database dictionaries and get the person name
    def compare_database_with_sequence(self):

        # comparison function between database dictionary and sequence computed dictionary
        def dicts_equal(from_sequence, from_database):
            """ return True if all keys and values are the same """
            return all(k in from_database and int(from_sequence[k]) == int(from_database[k]) for k in from_sequence) \
                and all(k in from_sequence and int(from_sequence[k]) == int(from_database[k]) for k in from_database)

        def check_result():
            for dictionary in self.csv_database_dictionary:
                dict_from_database = dict(dictionary)
                dict_from_database.pop('name')

                if dicts_equal(self.dict_from_sequence, dict_from_database):
                    dict_from_database = dict(dictionary)
                    print(dict_from_database['name'])
                    return True

        if check_result():
            pass
        else:
            print("No match")


# run the class and its functions (Program control)
if __name__ == '__main__':
    RunTest = DnaTest()
    RunTest.get_str_count_from_sequence()
    RunTest.compare_database_with_sequence()
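For anyone who only needs the counting step, the regex idea above can be condensed into one small function. This is a minimal sketch rather than the code from the answer: the name longest_run is hypothetical, the sample strings are made up, and it assumes the STR does not overlap with itself (a greedy left-to-right regex scan can undercount in that case).

```python
import re


def longest_run(sequence: str, str_key: str) -> int:
    """Return the largest number of back-to-back copies of str_key in sequence.

    Hypothetical helper; assumes str_key is non-empty and does not
    overlap with itself.
    """
    # (?:...)+ matches one or more consecutive copies of the key;
    # re.escape guards against regex metacharacters in the key.
    pattern = re.compile(rf"(?:{re.escape(str_key)})+")
    # default=0 covers the case where the STR never occurs in the sequence.
    longest = max((len(m.group()) for m in pattern.finditer(sequence)), default=0)
    return longest // len(str_key)


# Made-up sample sequences, not taken from the CS50 data files.
print(longest_run("AGATCAGATCAGATCTTTTAGATC", "AGATC"))  # 3
print(longest_run("TTTT", "AGATC"))  # 0
```

Using max(..., default=0) avoids the intermediate dictionary and also returns a count of 0 for STRs that never appear.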
Checking the solution
Run your program as python dna.py databases/small.csv sequences/1.txt. Your program should output Bob.
Run your program as python dna.py databases/small.csv sequences/2.txt. Your program should output No match.
For more checks, see the CS50 DNA problem set.

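Contributed 1829 experience points · received more than 4 upvotes
To get the longest run of consecutive repeats for each STR, I only wrote a few lines of code. The idea is: search for the STR; if you find it, search for STR x 2; if that is found too, search for STR x 3, and so on until STR x n is not found. The maximum count is then n-1. Because STR x n is contiguous by construction, you don't have to worry about non-consecutive occurrences being counted. You don't need any Python libraries other than sys and csv. My whole program is under 30 lines.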
import csv
import sys

# check command-line arguments, expect 3 including dna.py
n = len(sys.argv)
if n != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    sys.exit(1)

with open(sys.argv[1], 'r') as database:  # read database
    data_lines = csv.reader(database)  # read line-by-line, store in data_lines
    data = [row for row in data_lines]  # convert to list of lists, store in data

with open(sys.argv[2], 'r') as sequences:
    dna = sequences.read()  # read sequence data, store in string dna

counts = []  # list to store counts of the longest run of consecutive repeats of each STR
for i in range(1, len(data[0])):  # loop through all STR
    count = 1
    string = data[0][i]  # assign each STR to a string
    while string * count in dna:  # if find 1 string, then try to find string*2, and so on
        count += 1
    counts.append(str(count - 1))  # should be decreased by 1 as initialized to 1. int to str

for j in range(1, len(data)):  # loop through all rows in database
    if data[j][1:len(data[0])] == counts:  # compare only numbers in each row to counts
        print(data[j][0])  # print corresponding name
        sys.exit(0)

print('No match')
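The same idea can be wrapped in functions so it is easy to try without the CS50 files. This is only a sketch under a few assumptions: longest_run and find_match are hypothetical names, and the small database and sequence below are made-up sample data, not the contents of databases/small.csv.

```python
import csv
import io


def longest_run(dna: str, str_key: str) -> int:
    """Count the longest run of back-to-back copies of str_key in dna."""
    count = 0
    # Keep growing the repeated string until it is no longer found.
    while str_key * (count + 1) in dna:
        count += 1
    return count


def find_match(database_text: str, dna: str) -> str:
    """Return the name whose STR counts all match dna, else 'No match'."""
    rows = list(csv.reader(io.StringIO(database_text)))
    header, people = rows[0], rows[1:]
    # CSV values are strings, so the counts are compared as strings too.
    counts = [str(longest_run(dna, key)) for key in header[1:]]
    for row in people:
        if row[1:] == counts:
            return row[0]
    return "No match"


# Made-up sample data for illustration only.
database = "name,AGATC,AATG\nAlice,2,1\nBob,3,2\n"
dna = "AGATCAGATCAGATCTTAATGAATGCC"
print(find_match(database, dna))     # Bob
print(find_match(database, "CCCC"))  # No match
```

Starting count at 0 and testing count + 1 removes the need to subtract 1 at the end.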