algorithm - How to remove duplicate entries from a list in python -



algorithm - How to remove duplicate entries from a list in python -

recently in interview asked write python code remove duplicate entries list.

for illustration :

input list = {1,2,4,5,2,3,1} expected output list = {4,5,3}

in above example, 1's & 2's appear more 1 time , should removed. , order preservation important. question asked.

again did not want me utilize inbuilt functions set(), unique() etc. testing algorithm , ds skills, guess. had made clear in start.

i thought of 2 approaches on spot. 1.) sorting (complexity of nlog(n)) 2.) hashtable (faster sorting)

hashtable approach :

arr = [1,2,4,5,2,3,1] //function : create hash table key = arr[i] & value = occurence count def datacounttable(arr): counttable = {} = 0 while i<len(arr) : if arr[i] in counttable : counttable[arr[i]] += 1 else : counttable[arr[i]] = 1 i+=1 homecoming counttable //function : remove duplicates using arr & hash table def rmvallduplicate(arr, counttable): outlist = list() = 0 while i<len(arr) : if counttable[arr[i]] == 1 : outlist.append(arr[i]); i+=1 homecoming outlist print rmvallduplicate(arr, datacounttable(arr))

the interviewer did not seem impressed answer. , kept me searching improve aprooach. could'nt find one.

if help me improve solution or suggest new , improve solution, nice!

thanks!

although hash-table implementation made more concise, readable, , idiomatic, little boost in speed, suspect isn't interviewer disappointed in.

more likely, pushed improve solution in hopes come argument why solution optimal, instead, searched while , gave up.

so, there multiple things prove here:

any solution problem has o(n) time. your solution (amortized, average-and-almost-always) o(n) time. the multiplier in solution's time complexity reasonable. any solution problem that's o(n) time has o(m) space (where m number of distinct values). your solution o(m) space. the multiplier in solution's space complexity reasonable.

even easy ones you're not going come actual rigorous proof in interview. , of them, may not able create compelling case—but raising possible exceptions , acknowledging you're waving hands may sufficient. example:

python's dict (and set) has o(n) worst-case time; it's o(1) amortized average-case. there info create worse o(1)? not, but… what if part of server wanted dos, , send input info wanted? all of values gave little integers. guaranteed true? in case, don't utilize dict counts, utilize list(range(0, p)) p max number. it's o(p) space, sounds worse o(m), except multiplier lot smaller—a list takes 1/3rd space (just values, rather hashes, keys, , values), if p << m/3 it's win. , it's win on speed, because there's no need maintain hashing values. , can improve array.array. a python hash table overkill storing both sets , dicts little counts. custom hash table cutting things significantly, or not plenty worth it?

python algorithm sorting hashmap

Comments

Popular posts from this blog

assembly - What is the addressing mode for ld, add, and rjmp instructions? -

vowpalwabbit - Interpreting Vowpal Wabbit results: Why are some lines appended by "h"? -

Is there a way to convert an HTML page styled with Bootstrap CSS into email-compatible html? -