Repeated DNA Sequences
题目描述
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
解题方法
原来的解法是因为A,C,G,T的ascii码只有后三位不同,所以用这后三位来区分,每次取30位,不断shift,放到hashmap里看 是否有重复的。
因为python可以用string做key,所以也可以直接用10-letter-long的字符串做key,就很简单了。
注意点
- 去掉重复的
- Index range
- 长度小于10的
Solution
class Solution(object):
def findRepeatedDnaSequences(self, s):
"""
:type s: str
:rtype: List[str]
"""
dic = {}
length = len(s)
result = []
if length < 10:
return []
for i in range(length - 9):
subStr = s[i:i+10]
if subStr in dic:
if dic[subStr] == 1:
result.append(subStr)
dic[subStr] += 1
else:
dic[subStr] = 1
return result