Caveat when removing elements from Python list or dict
There is a caveat when removing elements from a list in Python using a for loop. To get it right, you need to iterate over a copy of the list, not the list itself.
Suppose you have this list.
l = [1, 2, 3, 1, 2, 3]
And you want to remove all occurrences of number 1.
Seems pretty straightforward. You would probably write something like
for i in l:
    if i == 1:
        l.remove(i)
It seems to work: l ends up as [2, 3, 2, 3], which is the correct result, but the approach is not correct. It only works because of how the elements happen to be arranged in this list. In particular, the elements that I want to remove are not next to each other. If they were, it would not work.
Let us see. Now you have this list.
l = [1, 1, 2, 2, 3, 3]
Run the same code again to remove all ones.
for i in l:
    if i == 1:
        l.remove(i)
If you check the list l now, you will see it contains [1, 2, 2, 3, 3]. How is this possible, and why does it happen?
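It is because the for loop walks the list by position. When you remove an element, every later element shifts one position to the left, but the loop still moves on to the next position, so the element that just shifted into the current slot is skipped. Here the second 1 slides into index 0 right after the first 1 is removed, and the loop never looks at it.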
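To make the skipping visible, here is a small sketch that mimics the loop with an explicit index. The helper name remove_while_iterating is made up for this illustration, and the index bookkeeping is only an approximation of what CPython's list iterator does internally.

```python
def remove_while_iterating(items, target):
    """Mimic `for x in items: if x == target: items.remove(x)` with an explicit index.

    Hypothetical helper for illustration; returns the elements the loop actually visited.
    """
    visited = []
    index = 0
    while index < len(items):
        current = items[index]
        visited.append(current)
        if current == target:
            # Removing shifts every later element one slot to the left.
            items.remove(current)
        # The position still advances, so the element that shifted
        # into the current slot is never inspected.
        index += 1
    return visited


l = [1, 1, 2, 2, 3, 3]
visited = remove_while_iterating(l, 1)
print(visited)  # [1, 2, 2, 3, 3] -> the second 1 was never visited
print(l)        # [1, 2, 2, 3, 3] -> so it is still in the list
```

The first list, [1, 2, 3, 1, 2, 3], only comes out right because each skipped element happens to be one that should stay.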
A simple fix is to make a copy of the list. If you iterate over the copy, the loop's position refers to the copy, which never changes, while you remove the elements from the original.
This is how it looks.
for i in l[:]:
    if i == 1:
        l.remove(i)
Now if you check the result, it will be correct: [2, 2, 3, 3]. The syntax l[:] makes a shallow copy using slicing. You can also use l.copy(), list(l), or copy.copy() from the copy module.
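For comparison, here is a short sketch of the different ways to copy a list, plus an alternative that avoids removing inside a loop altogether by filtering with a list comprehension. The variable names are only for illustration, and the comprehension is a different technique from the copy-and-remove loop, not a replacement the approach above requires.

```python
import copy

l = [1, 1, 2, 2, 3, 3]

# Three equivalent ways to make a shallow copy of a list.
copy_by_slice = l[:]
copy_by_method = l.copy()
copy_by_module = copy.copy(l)

# Alternative: build the filtered list in one pass and assign it back
# through a slice, so the original list object is updated in place.
l[:] = [item for item in l if item != 1]
print(l)  # [2, 2, 3, 3]
```

The comprehension also does a single pass over the list, while calling remove() for every match scans the list from the start each time.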
The behavior is similar when removing elements from a dictionary in a loop.
If you have a dictionary like this
d = {"a": 1, "b": 2, "c": 3}
You might want to write a loop removing an item this way
for key, value in d.items():
    if value == 1:
        del d[key]
But this will not work. It raises RuntimeError: dictionary changed size during iteration.
The solution is again to iterate over a copy of the original dictionary.
for key, value in dict(d).items():
    if value == 1:
        del d[key]
Notice dict(d).items(), which makes a shallow copy of the original. We iterate over this copy but remove from the original using del d[key]. Do not use d.popitem() here. It takes no key and removes the last inserted item (an arbitrary one before Python 3.7), so it is not a synonym for del d[key].
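Two other common ways to do the same thing are sketched below: iterating over a snapshot of the keys only, and building a new dictionary with a comprehension. The names d2 and filtered are made up for this example.

```python
d = {"a": 1, "b": 2, "c": 3}

# Option 1: list(d) copies only the keys, so deleting from d
# while looping over that list is safe.
for key in list(d):
    if d[key] == 1:
        del d[key]
print(d)  # {'b': 2, 'c': 3}

# Option 2: leave the original untouched and build a filtered dictionary.
d2 = {"a": 1, "b": 2, "c": 3}
filtered = {key: value for key, value in d2.items() if value != 1}
print(filtered)  # {'b': 2, 'c': 3}
```

Copying only the keys is a bit cheaper than copying the whole dictionary, and the comprehension is handy when you do not need to keep the same dictionary object.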
Getting the RuntimeError is actually much better than silently producing incorrect results. I do not know why Python does this check for a dict but not for a list. It is a bit odd.