Python Set Methods

Chapter 26

Learning outcomes:

  1. Adding elements using add()
  2. Removing elements using remove(), discard() and pop()
  3. Checking for subsets via issubset()
  4. Checking for supersets via issuperset()
  5. Performing set operations using union(), intersection(), difference() and symmetric_difference()
  6. Updating a set through update(), intersection_update(), difference_update() and symmetric_difference_update()
  7. Clearing a set via clear()
  8. Copying a set using copy()

Adding stuff

As stated in the previous chapter, the way to add a single element to a set is the add() method.

It takes in a single argument and adds it to the given set.

set.add(element)

An example follows:

evens = {0, 2, 4, 6, 8}
evens.add(10)

print(evens)
{0, 2, 4, 6, 8, 10}
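A short sketch of two behaviours of add() that are easy to miss: adding an element that is already present changes nothing, and only hashable objects can be added (a list is rejected, while a tuple of numbers is accepted). The tuple (12, 14) is just an illustrative value.

```python
evens = {0, 2, 4, 6, 8}

# Adding an element that is already present leaves the set unchanged
evens.add(4)
print(len(evens))           # 5

# add() modifies the set in place and returns None
result = evens.add(10)
print(result)               # None

# Elements must be hashable: a list is rejected, while a tuple is accepted
try:
    evens.add([12, 14])
except TypeError as err:
    print('TypeError:', err)

evens.add((12, 14))
print((12, 14) in evens)    # True
```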

Removing given elements

The remove() method can be used to remove stuff from a set.

As with add(), just pass it the element you want removed from the set.

set.remove(element)

Below we remove the number 4 from our evens set:

evens = {0, 2, 4, 6, 8}
evens.remove(4)

print(evens)
{0, 2, 6, 8}

Keep in mind that if the element to be removed doesn't exist in the set, remove() raises a KeyError exception.

This can be seen as follows:

evens = {0, 2, 4, 6, 8}
evens.remove(10)
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
    evens.remove(10)
KeyError: 10
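If you're not sure the element is present, here is a minimal sketch of two common ways to use remove() safely: test membership with in first, or catch the KeyError it raises.

```python
evens = {0, 2, 4, 6, 8}

# Option 1: check membership before removing
if 10 in evens:
    evens.remove(10)

# Option 2: catch the KeyError that remove() raises
try:
    evens.remove(10)
except KeyError as err:
    print('Not in the set:', err)   # Not in the set: 10

print(sorted(evens))                # [0, 2, 4, 6, 8]
```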

Discarding given elements

The discard() method is similar to remove() in that it also removes an element from a set.

set.discard(element)

Below, we discard 4 from our evens set:

evens = {0, 2, 4, 6, 8}
evens.discard(4)

print(evens)
{0, 2, 6, 8}

The only difference is that if the set doesn't contain the element to be removed, discard() silently does nothing, whereas remove() raises a KeyError.

Below, we discard an element that doesn't exist in evens, and no error is raised:

evens = {0, 2, 4, 6, 8}
evens.discard(10)
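One way to think about discard() is as remove() guarded by a membership test. The sketch below uses a hypothetical helper, discard_like(), written only for illustration (it is not part of Python), and checks that it leaves a set in the same state as discard() does.

```python
def discard_like(s, element):
    """Hypothetical helper for illustration: behaves like s.discard(element)."""
    if element in s:
        s.remove(element)

a = {0, 2, 4, 6, 8}
b = {0, 2, 4, 6, 8}

for x in (4, 10):        # 10 is in neither set
    a.discard(x)
    discard_like(b, x)

print(a == b)            # True
print(sorted(a))         # [0, 2, 6, 8]
```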

Here's a simple way to remember which of the two methods remove() and discard() raises an error:

The word 'error' starts with an 'e', and of the two names 'remove' and 'discard', only 'remove' contains an 'e', implying that remove() is the one that raises an error.

Pop elements

To remove an arbitrary element from a set, use the pop() method. It returns the element it removed, and raises a KeyError if the set is empty.

Since it doesn't remove any specific element, it doesn't require any argument.

Shown below is an elementary example:

evens = {0, 2, 4, 6, 8}
removed = evens.pop() # which element gets removed is not specified

print(removed)
print(evens)
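Because pop() returns the element it removes, it can be used to drain a set one element at a time. The sketch below collects the popped elements (sorted, since the removal order is arbitrary) and then shows that popping from an empty set raises a KeyError.

```python
evens = {0, 2, 4, 6, 8}
removed = []

# Keep popping until the set is empty (an empty set is falsy)
while evens:
    removed.append(evens.pop())

print(sorted(removed))   # [0, 2, 4, 6, 8]
print(evens)             # set()

# Popping from an empty set raises KeyError
try:
    evens.pop()
except KeyError as err:
    print('KeyError:', err)
```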

Check for subsets

A set A is said to be a subset of another set B if all its elements exist in B.

For instance, the set of even integers is a subset of the set of integers. Similarly, the set of integers is a subset of the set of real numbers and so on.

To check if a set is a subset of another set, we have the issubset() method at our disposal.

a.issubset(b)

It returns True if a is a subset of b, and False otherwise.

Let's inspect the method on two sets: one holding the first two non-negative evens and the other one holding the first five non-negative evens.

first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}
first_two_evens.issubset(first_five_evens)
True
first_five_evens.issubset(first_two_evens)
False

Reading out the first statement: 'first_two_evens is a subset of first_five_evens.' Since this is true, we get True returned.

The second statement reads as follows: 'first_five_evens is a subset of first_two_evens.' Since this is false, we get False returned.
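Sets also support subset checks through operators: <= corresponds to issubset(), while < checks for a proper subset (a subset that isn't equal to the other set). A quick sketch using the same two sets, plus a range to show that issubset() accepts any iterable, not just a set:

```python
first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}

# <= is the operator form of issubset(); both operands must be sets
print(first_two_evens <= first_five_evens)           # True

# A set is always a subset of itself...
print(first_five_evens.issubset(first_five_evens))   # True
# ...but never a proper subset of itself, which is what < checks
print(first_five_evens < first_five_evens)           # False
print(first_two_evens < first_five_evens)            # True

# issubset() accepts any iterable, such as a range
print(first_two_evens.issubset(range(10)))           # True
```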

Check for supersets

A superset is the reverse of a subset. If A is a superset of B, then everything in B exists in A. To define it another way, if A is a subset of B, then B is a superset of A.

In Python, we can use the issuperset() method to check for supersets.

a.issuperset(b)

It returns True if a is a superset of b, and False otherwise.

Let's take the same example above:

first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}
first_two_evens.issuperset(first_five_evens)
False
first_five_evens.issuperset(first_two_evens)
True

In the first statement we're saying: 'first_two_evens is a superset of first_five_evens.' Clearly this is incorrect, so we get False returned.

The second statement goes like: 'first_five_evens is a superset of first_two_evens.' As this is correct, we get True returned.
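Since a superset is the reverse of a subset, a.issuperset(b) should always agree with b.issubset(a). The sketch below checks this for both orderings of our two sets and also shows >=, the operator form of issuperset().

```python
first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}

# a.issuperset(b) gives the same answer as b.issubset(a)
pairs = [(first_two_evens, first_five_evens), (first_five_evens, first_two_evens)]
for a, b in pairs:
    print(a.issuperset(b), b.issubset(a))
# False False
# True True

# >= is the operator form of issuperset()
print(first_five_evens >= first_two_evens)   # True
```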

Set operations

In the previous chapter, we came across set operations in Python powered by operators: | for union, & for intersection, - for difference, and ^ for symmetric difference.

The set class also provides these operations as methods. union(), intersection(), difference() and symmetric_difference() each take another set, perform the respective operation, and return the resulting set.

As with the operators, none of these methods mutates the original set.

Below we demonstrate all four of these methods:

a = {0, 2, 4}
b = {2, 3, 5}

print('Union:', a.union(b))
print('Intersection:', a.intersection(b))
print('Difference:', a.difference(b))
print('Symmetric difference:', a.symmetric_difference(b))
Union: {0, 2, 3, 4, 5}
Intersection: {2}
Difference: {0, 4}
Symmetric difference: {0, 3, 4, 5}
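A small sketch confirming that each method returns the same set as its operator counterpart, and that neither form touches the original sets. The last line shows that union() can also take more than one argument.

```python
a = {0, 2, 4}
b = {2, 3, 5}

# Each method returns the same set as its operator counterpart
assert a.union(b) == a | b
assert a.intersection(b) == a & b
assert a.difference(b) == a - b
assert a.symmetric_difference(b) == a ^ b

# Neither form changes the original sets
print(a, b)                       # {0, 2, 4} {2, 3, 5}

# union() accepts several arguments at once
print(sorted(a.union(b, {7})))    # [0, 2, 3, 4, 5, 7]
```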

A common question that arises at this stage is: what's the purpose of these four methods if the same operations can be done using operators?

Purpose of the methods for set operations

As far as the result goes, there is no difference: both the methods and the operators perform the same operation on two given sets.

The main difference is that the methods accept any iterable as argument, whereas the operators require both operands to be sets (set or frozenset).

The reason for not allowing arbitrary iterables with the operators is to prevent ambiguous expressions such as the one shown below:

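set('123') & '345' # Python doesn't allow this!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for &: 'set' and 'str'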

This expression looks fairly ambiguous — a set is being intersected with a string. One might think that the set {'1', '2', '3'} is being intersected with the set {'345'}.

Compare this to the expression:

set('123').intersection('345')

Here we can clearly see that both the strings are wrapped up in function calls, implying that they aren't directly being used in the intersection operation, but first being coerced into a set.

Visually, this expression looks much better than the previous one.

Restricting the operators to sets prevents confusing expressions, such as the one we just saw above and the one shown below:

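[1, 2] | '23' | ('4', '2') # Python doesn't allow this!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'list' and 'str'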

Here we're trying to compute the union of a list, a string and a tuple — which sounds really weird!
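For comparison, here is a sketch of the method-call versions of the two expressions above. They run fine, and the union also shows why such expressions can be confusing: the integer 2 and the string '2' are different elements, so both end up in the result.

```python
# With a method call, any iterable is accepted as the argument
common = set('123').intersection('345')
print(common)        # {'3'}

# The union from above, written with union(), which accepts several iterables
mixed = set([1, 2]).union('23', ('4', '2'))
print(len(mixed))    # 5: the ints 1 and 2, plus the strings '2', '3' and '4'

# The operator form still refuses a non-set operand
try:
    set('123') & '345'
except TypeError as err:
    print('TypeError:', err)   # TypeError: unsupported operand type(s) for &: 'set' and 'str'
```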

Perform operation and update

Sometimes, after performing a set operation on a set s, we want to update s with the result of that operation.

For instance, consider the example below:

a = {0, 2, 4}
b = {1, 2, 3}

a = a & b
print(a)
{2}

We compute the intersection of a with the set b and then assign the resulting set back to a. In other words, we update a with the intersection set.

This can easily be done on any set using the assignment syntax shown above. However, Python also provides methods out of the box to do so.

The methods update(), intersection_update(), difference_update() and symmetric_difference_update() all take in an iterable (usually another set) as argument, perform the respective operation on the calling set and the argument, and finally update the calling set to the result of the operation.

update(), intersection_update() and difference_update() can also accept more than one argument; symmetric_difference_update() accepts exactly one.

Here's what each method does:

  1. a.update(b) computes the union of a and b and updates a to the result.
  2. a.intersection_update(b) computes the intersection of a and b and updates a to the result.
  3. a.difference_update(b) computes the difference of a and b and updates a to the result.
  4. a.symmetric_difference_update(b) computes the symmetric difference of a and b and updates a to the result.
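Each of these four methods also has an augmented assignment operator counterpart: |=, &=, -= and ^=. The sketch below uses the in-place functions of the standard operator module (ior, iand, isub and ixor, which behave like those operators) to apply both forms to copies of the same set and compare the results.

```python
import operator

a = {0, 2, 4}
b = {1, 2, 3}

# Each update method paired with the in-place operator function it corresponds to
pairs = [
    ('update', operator.ior),                        # x |= b
    ('intersection_update', operator.iand),          # x &= b
    ('difference_update', operator.isub),            # x -= b
    ('symmetric_difference_update', operator.ixor),  # x ^= b
]

results = {}
for method_name, in_place_op in pairs:
    via_method = a.copy()
    via_operator = a.copy()
    getattr(via_method, method_name)(b)  # mutates via_method, returns None
    in_place_op(via_operator, b)         # same as: via_operator op= b
    results[method_name] = via_method
    print(method_name, sorted(via_method), via_method == via_operator)

# update [0, 1, 2, 3, 4] True
# intersection_update [2] True
# difference_update [0, 4] True
# symmetric_difference_update [0, 1, 3, 4] True
```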

Consider the following code:

evens = {0, 2, 4, 6, 8}
evens.update({2, 4, 8, 12, 16})

print(evens)
{0, 2, 4, 6, 8, 12, 16}

The set evens is updated using the set {2, 4, 8, 12, 16}. Obviously, since evens already contains 2, 4 and 8, these won't (and technically can't) be added again to the set, as duplicates. It's only 12 and 16 that get added to the set.
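Note the contrast with add(): add() inserts its argument as one element, whereas update() iterates over its argument and inserts each item. A string makes the difference visible, as in this sketch (letters is just an illustrative name):

```python
letters = {'x'}

letters.add('ab')      # one new element: the string 'ab'
print(len(letters))    # 2

letters.update('ab')   # iterates over 'ab': adds 'a' and 'b'
print(len(letters))    # 4

print('ab' in letters, 'a' in letters, 'b' in letters)  # True True True
```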

Shown below is another example, using intersection_update():

a = {0, 2, 4}
b = {1, 2, 3}

a.intersection_update(b)
print(a)
{2}

The intersection of the sets {0, 2, 4} and {1, 2, 3} is computed and stored back in a; the set object that a refers to is modified in place.

All these methods return None — they perform the set operation and then update the calling set as a side effect.
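To see what 'in place' means here, compare the rebinding approach from earlier with intersection_update(), using a second name (alias) for the same set. The sketch also passes several iterables to difference_update() at once.

```python
b = {1, 2, 3}

# Rebinding: a & b builds a new set, and only the name a is pointed at it
a = {0, 2, 4}
alias = a
a = a & b
print(a, alias)           # {2} {0, 2, 4}

# In place: intersection_update() modifies the set object itself
a = {0, 2, 4}
alias = a
result = a.intersection_update(b)
print(a, alias, result)   # {2} {2} None

# update(), intersection_update() and difference_update() accept several iterables
c = {0, 1, 2, 3, 4, 5}
c.difference_update({0, 1}, [4], range(5, 10))
print(c)                  # {2, 3}
```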

Clearing everything

The clear() method serves the same purpose on sets as it does on lists: it removes every element.

It can be handy when we want to erase all the contents of an existing set without deleting the set itself.

Below is an example:

evens = {0, 2, 4, 6, 8}
print(evens)

evens.clear()
print(evens)
{0, 2, 4, 6, 8}
set()
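A sketch of why clear() differs from simply assigning a new empty set: clear() empties the existing set object, so every name referring to it sees the change, while rebinding a variable to set() leaves other references to the old set untouched.

```python
evens = {0, 2, 4, 6, 8}
alias = evens

# clear() empties the set object that both names refer to
evens.clear()
print(alias)        # set()

# Rebinding one name to a new empty set leaves the other reference untouched
odds = {1, 3, 5}
odds_alias = odds
odds = set()
print(odds_alias)   # {1, 3, 5}
```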

Copying a set

Assigning a set to a variable, and then assigning that variable to another variable, does not copy the set: both variables end up referring to the same set object in memory.

This means that we can't work on the two variables independently: since sets are mutable, a change made through one variable is visible through the other. The same issue exists for all other mutable types as well, most commonly for the list and dict classes.
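A minimal sketch of the problem: plain assignment gives the same set a second name, so a change made through one name is visible through the other.

```python
evens = {0, 2, 4, 6, 8}
same_set = evens          # no copy is made: both names refer to one set object

evens.add(10)             # change the set through the name evens

print(sorted(same_set))   # [0, 2, 4, 6, 8, 10]
print(same_set is evens)  # True
```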

Fortunately, this problem is easily avoided by copying the set, which can be done using the copy() method.

Consider the code below:

evens = {0, 2, 4, 6, 8}
evens_copy = evens.copy()

evens.update({10}) # change evens

print(evens)
print(evens_copy)
{0, 2, 4, 6, 8, 10}
{0, 2, 4, 6, 8}

We first create a set evens, make a copy of it and store that in evens_copy. Then we update the set evens to see whether or not the change shows up in evens_copy.

Since evens_copy is a copy of evens (not another name for the same set), it remains unchanged, as the second line of output confirms.
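copy() isn't the only way to get a new set with the same elements. Passing the set to the set() constructor, or using copy.copy() from the standard copy module, gives the same result, as this sketch checks:

```python
import copy

evens = {0, 2, 4, 6, 8}

# Three ways to get a new set holding the same elements
copies = [evens.copy(), set(evens), copy.copy(evens)]

for c in copies:
    print(c == evens, c is evens)   # True False
```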

Note that the copy() method returns a shallow copy of a set.

The set class has no method of its own for making a deep copy (though the deepcopy() function from the standard copy module does work on sets). In practice, a deep copy of a set is rarely needed: set elements must be hashable, and the built-in hashable types (numbers, strings, tuples of hashable values, frozensets) are immutable, so it makes no practical difference whether the copy holds the original element objects or copies of them.
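A sketch of what 'shallow' means in practice, using a set of tuples as an example: the copy is a new set, but it holds the very same element objects as the original. It also shows that the deepcopy() function from the standard copy module can still be applied to a set if you want one.

```python
import copy

# A set whose elements are tuples (hashable, and here immutable)
points = {(1, 2), (3, 4)}

shallow = points.copy()
deep = copy.deepcopy(points)

print(shallow == points, shallow is points)   # True False
print(deep == points, deep is points)         # True False

# The shallow copy holds the very same tuple objects as the original
original_ids = {id(p) for p in points}
print(all(id(p) in original_ids for p in shallow))   # True
```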