Python Sets Basics

Chapter 25 11 mins

Learning outcomes:

  1. Creating sets
  2. Adding and removing elements
  3. The null set
  4. Checking for membership
  5. Set operations
  6. Disjoint check
  7. The set() function

Introduction

In mathematics, sets are collections of unique objects often similar in nature. Python provides the set data type to aid in working with sets easily.

A set can hold unique elements only — no duplicates. This means that if you create a set that has duplicate items, Python automatically removes the duplicate values and keeps only one of them.
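As a quick illustration of this behaviour, the sketch below builds a set from a literal containing repeated values. The variable name letters is just an example name chosen here.

```python
# Repeated values in the literal collapse into a single element each.
letters = {'a', 'b', 'a', 'c', 'b'}

print(len(letters))                 # 3
print(letters == {'a', 'b', 'c'})   # True
```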

Creating sets

As with lists, there is a literal way to denote a set in Python. That is using a pair of {} curly braces.

{element_1, element_2, ..., element_n}

The elements of the set go within these braces, delimited by commas, just as with lists.

Below we create a set of the first 5 non-negative even integers:

evens = {0, 2, 4, 6, 8}

Remember that the elements of a set don't all need to be of the same data type. A set can hold heterogeneous data, i.e. data of different types, as long as every element is hashable. Numbers, strings and tuples of such values are hashable; lists are not.

Below we have a set of some random values:

s = {10, True, 'Hello', (0, 1, 2)}

Although a set does support storing data of different types, this is not usually done in a real program. In a real program, sets are created to hold similar data.
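One caveat worth keeping in mind is that every element of a set must be hashable. The sketch below, using example names chosen here, shows that a tuple is accepted as an element while a list is rejected with a TypeError.

```python
# Immutable values such as numbers, strings and tuples are hashable.
mixed = {10, 'Hello', (0, 1, 2)}
print(len(mixed))  # 3

# A list is mutable and therefore unhashable, so it cannot be a set element.
message = None
try:
    bad = {10, [0, 1, 2]}
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```

The exact wording of the error message can differ between Python versions, so it is printed rather than relied upon here.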

Anyway, let's now see how to process sets in different ways.

Adding elements

To add a single element to a set, use the add() method.

set.add(element)

It takes in a single argument that is the element to add to the set.

Below we add 10 to our old set of the first five non-negative even integers. Now the set holds the first six non-negative even integers.

evens = {0, 2, 4, 6, 8}
evens.add(10)

print(evens)
{0, 2, 4, 6, 8, 10}

As can be seen above, the add() method mutates the original set.
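Two related details may be useful here: adding a value that is already present changes nothing, and the built-in update() method adds every element of an iterable in one call. A minimal sketch:

```python
evens = {0, 2, 4}

# Adding a value that is already in the set leaves it unchanged.
evens.add(4)
print(len(evens))  # 3

# update() adds each element of the given iterable; duplicates are ignored.
evens.update([6, 8, 8])
print(evens == {0, 2, 4, 6, 8})  # True
```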

Removing elements

To remove an element from a set we can use the remove() method.

Simply give it the element you want to remove, and then let it do all the work itself.

set.remove(element)

Consider the following code:

evens = {0, 2, 4, 6, 8}
evens.remove(0)

print(evens)
{2, 4, 6, 8}

We remove 0 from the set evens, so it now holds the first four positive even integers.
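It's worth knowing what happens when the element isn't in the set: remove() raises a KeyError, whereas the related discard() method silently does nothing. The sketch below shows both.

```python
evens = {0, 2, 4, 6, 8}

# remove() raises KeyError when the element is missing.
try:
    evens.remove(5)
except KeyError:
    print('5 is not in the set')

# discard() removes the element if present and does nothing otherwise.
evens.discard(5)
evens.discard(8)
print(evens == {0, 2, 4, 6})  # True
```

Which one to use depends on whether a missing element should be treated as an error in your program.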

The null set

In mathematics, there is a special representation for a set that contains nothing, i.e. is empty. It's represented as ∅ (sometimes written as φ) and is known as the null set, or the empty set.

In Python, one might go forward and denote an empty set as follows:

null_set = {}

However, this is not an empty set — in fact, it's not even a set.

{} denotes an empty dictionary — another data type in Python which we shall explore later in this course.

To denote an empty set, we ought to call the set() function without any arguments.

null_set = set()

When called without any arguments, the set() function creates a set without any elements. That is, it creates a null set.

Even when printing out a null set, Python uses the representation set() — not {}.

null_set = set()
print(null_set)
set()
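The difference between {} and set() can be confirmed by inspecting the types directly. The names empty_braces and empty_set below are example names chosen for this sketch.

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# An empty set has no elements and is treated as False in a condition.
print(len(empty_set))      # 0
print(bool(empty_set))     # False
```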

Checking for membership

In set theory, it's a common task to check whether a set contains a given element. In other words, it's common to check for the membership of a given value in a set.

This can be done just like it is done in the case of a list — using the in operator.

When used on a set, in returns True if the given value is a member of the set; otherwise False.

Consider the following snippet:

evens = {0, 2, 4, 6, 8}

print('0 in evens:', 0 in evens)
print('1 in evens:', 1 in evens)
0 in evens: True
1 in evens: False

Unlike the in operator on lists, the in operator on sets is very fast. On average, it finds a value in approximately constant time, whereas searching a list with in takes linear time, i.e. time proportional to the length of the list.

This means that whether the set consists of 10 items or 10,000 items, searching takes roughly the same amount of time. This follows from the way sets are stored internally in Python: as hash tables.
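The difference in lookup speed can be observed with a rough measurement using the standard timeit module. This is only a sketch: the size n and the repeat count are arbitrary choices made here, and the actual timings depend on the machine.

```python
import timeit

n = 100_000
numbers_list = list(range(n))
numbers_set = set(numbers_list)

# Search for the last value, which is the worst case for the list.
target = n - 1

# Time 100 membership checks on each container.
list_time = timeit.timeit(lambda: target in numbers_list, number=100)
set_time = timeit.timeit(lambda: target in numbers_set, number=100)

print(f'list: {list_time:.6f} s')
print(f'set:  {set_time:.6f} s')
```

On a typical machine the set lookup should come out far faster, and the gap should widen as n grows.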

Moving on, to check whether a value is not a member of a set, use the negated not in operator.

evens = {0, 2, 4, 6, 8}

print('8 not in evens:', 8 not in evens)
print('10 not in evens:', 10 not in evens)
8 not in evens: False
10 not in evens: True

Set operations

If you've ever worked with sets in mathematics, then you'll surely be aware of set operations such as intersection, union, relative complement and so on.

In Python, all these operations are available on sets by means of operators and methods.

Let's explore each one...

Union

The union of two sets A and B is the set containing every element that is in A, in B, or in both.

The | (or) operator computes the union of two sets, and returns the resulting set.

evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}

print(evens | odds)
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

The | operator (and each operator discussed below) returns a new set. It doesn't mutate either of the given sets.
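The following sketch checks that the operands are left untouched, and also shows the equivalent union() method, which accepts any iterable rather than only sets.

```python
evens = {0, 2, 4}
odds = {1, 3, 5}

combined = evens | odds
print(combined == {0, 1, 2, 3, 4, 5})  # True

# The original sets are unchanged.
print(evens == {0, 2, 4})  # True
print(odds == {1, 3, 5})   # True

# union() does the same job and also accepts a list, tuple, etc.
print(evens.union([1, 3]) == {0, 1, 2, 3, 4})  # True
```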

Intersection

The intersection of two sets A and B is the set that contains elements present in both A and B.

The & (and) operator computes the intersection of two sets and returns the resulting set.

first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}

print(first_two_evens & first_five_evens)
{0, 2}

Let's consider another example:

evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}

print(evens & odds)
set()

Since evens and odds have no elements in common, the intersection of these sets is the empty set.
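As with union, there is an equivalent method form, intersection(). A short sketch with two partially overlapping sets (example values chosen here):

```python
multiples_of_2 = {0, 2, 4, 6, 8, 10, 12}
multiples_of_3 = {0, 3, 6, 9, 12}

# Elements present in both sets.
common = multiples_of_2 & multiples_of_3
print(common == {0, 6, 12})  # True

# The method form gives the same result.
print(multiples_of_2.intersection(multiples_of_3) == common)  # True
```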

Difference (relative complement)

The difference of two sets A and B, denoted as A - B, is the set that contains all elements present in A but not in B.

The - (difference) operator computes this difference when used as A - B.

Think of A - B as removing B from A, just like 5 - 2 is removing 2 from 5. Everything in A, that exists in B, gets removed from it — what's left behind is everything that is in A, but not in B.

The following example demonstrates set difference very clearly:

first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}

print(first_five_evens - first_two_evens)
{4, 6, 8}

What's happening in the expression first_five_evens - first_two_evens can be better understood by the following: 'Remove the first two evens from the first five evens'.

first_two_evens = {0, 2}
first_five_evens = {0, 2, 4, 6, 8}

print(first_two_evens - first_five_evens)
set()

Once again, read the expression first_two_evens - first_five_evens aloud and try to comprehend what it says: 'Remove the first five evens from the first two evens'.

Strictly speaking, the first two evens don't contain all of the first five evens, but the effect is clear: every element of first_two_evens also appears in first_five_evens, so everything gets removed from it. Thus, we get back the empty set.
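The two examples above show that the order of the operands matters. The sketch below makes this explicit with two overlapping sets, a and b (example names), and also shows the equivalent difference() method.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

# Elements of a that are not in b.
print(a - b == {1, 2})  # True

# Elements of b that are not in a: a different result.
print(b - a == {5})     # True

# The method form gives the same result as the operator.
print(a.difference(b) == a - b)  # True
```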

Symmetric difference

The symmetric difference of two sets A and B is the set that contains everything in A or B, but not in both.

In Python, the ^ (xor) operator can be used to compute the symmetric difference of two sets.

Consider the following code:

multiples_of_2 = {0, 2, 4, 6, 8}
multiples_of_3 = {0, 3, 6, 9, 12}

print(multiples_of_2 ^ multiples_of_3)
{2, 3, 4, 8, 9, 12}

The set {2, 3, 4, 8, 9, 12} contains all those elements that are either in multiples_of_2 or in multiples_of_3, but not in both. Since 0 and 6 appeared in both the sets, the symmetric difference set doesn't contain them.
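The symmetric difference can also be expressed in terms of the other operations, which may help in understanding it. The sketch below checks two such identities on the same sets.

```python
multiples_of_2 = {0, 2, 4, 6, 8}
multiples_of_3 = {0, 3, 6, 9, 12}

sym_diff = multiples_of_2 ^ multiples_of_3

# Everything in the union, minus what is in the intersection.
via_union = (multiples_of_2 | multiples_of_3) - (multiples_of_2 & multiples_of_3)

# What is only in the first set, together with what is only in the second.
via_differences = (multiples_of_2 - multiples_of_3) | (multiples_of_3 - multiples_of_2)

print(sym_diff == via_union)        # True
print(sym_diff == via_differences)  # True
```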

Disjoint check

Two sets are said to be disjoint if they have no elements in common. That is, their intersection is the empty set.

To check whether two sets are disjoint in Python, using what we've learnt up until now, we can first compute the intersection of the two sets and see whether the result is non-empty, as follows:

evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}

intersection = evens & odds

if intersection:
    print('Not disjoint')
else:
    print('Disjoint')

If intersection holds a non-empty set, it is treated as True in the conditional in line 6, so the if block is executed. An empty set is treated as False, so the else block is executed instead.

See how natural the conditional above sounds: 'If there is an intersection of these sets, they are not disjoint'.

However, Python provides a much simpler way to accomplish this and that is using the isdisjoint() method.

a.isdisjoint(b)

The method returns True if the sets a and b are disjoint; or otherwise False.

Below is a simple example.

evens = {0, 2, 4, 6, 8}
odds = {1, 3, 5, 7, 9}
ints = {0, 1, 2, 3, 4}

print(evens.isdisjoint(odds))
print(evens.isdisjoint(ints))
True
False
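The sketch below checks that isdisjoint() agrees with the intersection-based test, and shows that its argument can be any iterable, not only a set. The list candidates is an example value chosen here.

```python
evens = {0, 2, 4, 6, 8}
candidates = [1, 3, 4]

# 4 appears in both, so they are not disjoint.
result = evens.isdisjoint(candidates)
print(result)  # False

# Same answer as testing whether the intersection is empty.
via_intersection = not (evens & set(candidates))
print(result == via_intersection)  # True
```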

The set() function

Another way to create a set is to use the set() function.

We've already seen above that when called without an argument, the set() function creates an empty set.

The set() function can also create a set from a given iterable sequence like a string, a list, etc.

set([iterable])

Provide it with the iterable as an argument and it will convert it to a set.

Consider the code below:

nums_list = [10, 20, 30]
nums_set = set(nums_list)

print(nums_set)

We have a list of numbers saved in nums_list which we convert into a set by passing it to the set() function. This set is saved in the variable nums_set and ultimately printed.

{10, 20, 30}
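The same works for other iterables. In the sketch below, a string yields a set of its distinct characters and a range yields a set of integers. The results are compared with == rather than printed directly, since the order in which a set displays its elements isn't guaranteed.

```python
# Each character of the string becomes an element; the repeated 'l' is kept once.
letters = set('hello')
print(letters == {'h', 'e', 'l', 'o'})  # True
print(len(letters))                     # 4

# A range is also an iterable.
small_evens = set(range(0, 10, 2))
print(small_evens == {0, 2, 4, 6, 8})   # True
```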

The set() function can be useful if we want to remove duplicate values from a list. First convert the list into a set using set(). Then convert the set back into a list using list(). The list obtained would contain only unique values.

Consider the following snippet:

nums = [0, 0, 5, 2, -1, 3, 10, 3, 5, 0, 0]

# convert to a set to remove duplicates
nums_set = set(nums)

# convert the set back to a list
nums = list(nums_set)

print(nums)
[0, 2, 3, 5, 10, -1]

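We want to remove duplicates from the list nums, so we follow the steps detailed above to do so. Note that a set is unordered, so the resulting list is not guaranteed to keep the elements in their original order.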
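If the order of the result matters, two alternatives are sketched below. Neither is the set-and-list round trip described above: the first sorts the unique values, and the second swaps in dict.fromkeys(), which keeps the first occurrence of each value in its original position (dictionaries preserve insertion order in Python 3.7 and later).

```python
nums = [0, 0, 5, 2, -1, 3, 10, 3, 5, 0, 0]

# Unique values in ascending order.
unique_sorted = sorted(set(nums))
print(unique_sorted)    # [-1, 0, 2, 3, 5, 10]

# Unique values in order of first appearance (uses a dict, not a set).
unique_in_order = list(dict.fromkeys(nums))
print(unique_in_order)  # [0, 5, 2, -1, 3, 10]
```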