algorithm - How to remove duplicate entries from a list in python -
algorithm - How to remove duplicate entries from a list in python -
recently in interview asked write python code remove duplicate entries list.
for illustration :
input list = {1,2,4,5,2,3,1} expected output list = {4,5,3} in above example, 1's & 2's appear more 1 time , should removed. , order preservation important. question asked.
again did not want me utilize inbuilt functions set(), unique() etc. testing algorithm , ds skills, guess. had made clear in start.
i thought of 2 approaches on spot. 1.) sorting (complexity of nlog(n)) 2.) hashtable (faster sorting)
hashtable approach :
arr = [1,2,4,5,2,3,1] //function : create hash table key = arr[i] & value = occurence count def datacounttable(arr): counttable = {} = 0 while i<len(arr) : if arr[i] in counttable : counttable[arr[i]] += 1 else : counttable[arr[i]] = 1 i+=1 homecoming counttable //function : remove duplicates using arr & hash table def rmvallduplicate(arr, counttable): outlist = list() = 0 while i<len(arr) : if counttable[arr[i]] == 1 : outlist.append(arr[i]); i+=1 homecoming outlist print rmvallduplicate(arr, datacounttable(arr)) the interviewer did not seem impressed answer. , kept me searching improve aprooach. could'nt find one.
if help me improve solution or suggest new , improve solution, nice!
thanks!
although hash-table implementation made more concise, readable, , idiomatic, little boost in speed, suspect isn't interviewer disappointed in.
more likely, pushed improve solution in hopes come argument why solution optimal, instead, searched while , gave up.
so, there multiple things prove here:
any solution problem has o(n) time. your solution (amortized, average-and-almost-always) o(n) time. the multiplier in solution's time complexity reasonable. any solution problem that's o(n) time has o(m) space (where m number of distinct values). your solution o(m) space. the multiplier in solution's space complexity reasonable.even easy ones you're not going come actual rigorous proof in interview. , of them, may not able create compelling case—but raising possible exceptions , acknowledging you're waving hands may sufficient. example:
python'sdict (and set) has o(n) worst-case time; it's o(1) amortized average-case. there info create worse o(1)? not, but… what if part of server wanted dos, , send input info wanted? all of values gave little integers. guaranteed true? in case, don't utilize dict counts, utilize list(range(0, p)) p max number. it's o(p) space, sounds worse o(m), except multiplier lot smaller—a list takes 1/3rd space (just values, rather hashes, keys, , values), if p << m/3 it's win. , it's win on speed, because there's no need maintain hashing values. , can improve array.array. a python hash table overkill storing both sets , dicts little counts. custom hash table cutting things significantly, or not plenty worth it? python algorithm sorting hashmap
Comments
Post a Comment