Python String Methods

Chapter 17 18 mins

Learning outcomes:

  1. Manipulating casing using title(), lower() and upper()
  2. Checking for casing via istitle(), islower() and isupper()
  3. Stripping characters using strip(), lstrip() and rstrip()
  4. Splitting strings using split()
  5. Joining iterables into a string using join()
  6. Counting substrings through count()
  7. Finding substrings using find()

Title casing

The title() method converts a string into what Python calls title casing.

In title casing, the first letter of each word is uppercased while all other characters are lowercased.

An example follows:

s = "hello world, in title casing!"
print(s.title())
Hello World, In Title Casing!

See how the first letter of every word is in uppercase.

Note, however, that sometimes title() can give unexpected results as in the example below:

s = "Earth's core is hot!"
print(s.title())
Earth'S Core Is Hot!

Ideally, this string should've been converted into "Earth's Core Is Hot!". However, title() treats the apostrophe as a word boundary and so uppercases the letter following it as well.

To correctly convert strings of this type into a title, Python provides a more careful utility, the capwords() function of the string module, which we shall explore later on.
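
As a brief preview, here is a minimal sketch comparing title() with string.capwords() on the same string. capwords() splits on whitespace only, capitalizes each chunk, and joins the chunks back together, so the apostrophe is not treated as a word boundary. The variable names are only illustrative.

```python
import string

s = "Earth's core is hot!"

# title() starts a new "word" after any non-letter, including the apostrophe.
titled = s.title()

# capwords() splits on whitespace only, then capitalizes each chunk.
capped = string.capwords(s)

print(titled)  # Earth'S Core Is Hot!
print(capped)  # Earth's Core Is Hot!
```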

Lower and upper casing

To convert a string into all lowercase characters, use the lower() method. Similarly, to convert it into all uppercase characters, use the upper() method.

Consider the snippet below:

s = 'Hello World'
print(s.lower())
hello world

The lower() method is frequently used to normalise casing when comparing two strings.

For example, suppose you want to check whether a city entered by the user exists in a list of cities saved by your program. The list is as follows:

cities = ['London', 'Tokyo', 'Paris', 'Vancouver']

The user might enter the city as 'LONDON', which differs from the stored 'London' only in casing. However, if we use a plain equality comparison like the one shown below, no match is found:

cities = ['London', 'Tokyo', 'Paris', 'Vancouver']
city_exists = False

city = input("Enter your city: ")

for c in cities:
    if c == city:
        city_exists = True

print("The entered city exists:", city_exists)
Enter your city: LONDON
The entered city exists: False

Usually, in such cases, the comparison is made on the lowercase versions of both strings, which makes it case-insensitive.

cities = ['London', 'Tokyo', 'Paris', 'Vancouver']
city_exists = False

city = input("Enter your city: ")

for c in cities:
    if c.lower() == city.lower():
        city_exists = True

print("The entered city exists:", city_exists)
Enter your city: LONDON
The entered city exists: True
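
The same case-insensitive check can also be wrapped in a small function, which makes it easy to try without typing input. This is only a sketch: the helper name has_city is hypothetical and not part of Python.

```python
def has_city(city, cities):
    """Return True if city matches any entry in cities, ignoring case."""
    wanted = city.lower()  # normalise the user's input once
    return any(c.lower() == wanted for c in cities)


cities = ['London', 'Tokyo', 'Paris', 'Vancouver']

print(has_city('LONDON', cities))  # True
print(has_city('Berlin', cities))  # False
```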

Checking the casing

To check whether a given string is in title, lowercase or uppercase format, we can use the methods istitle(), islower() and isupper(), respectively.

Each of these is called on a string, takes no arguments, and returns a Boolean indicating whether the string is in the respective format or not.

The following shell snippets illustrate all three methods one by one.

Let's first see the istitle() method:

'Hello World'.istitle()
True
'Hello world'.istitle()
False
'This Is Title Casing'.istitle()
True
'10'.istitle()
False

The last statement here, i.e. '10'.istitle(), returns False since stringified numbers ('10' in this case) contain no cased characters and so don't follow title casing. In fact, they don't follow any casing at all!

Now let's see the islower() method:

'Hello World'.islower()
False
'hello world'.islower()
True
'this is lower casing'.islower()
True
'10'.islower()
False

As stated before, stringified numbers don't follow any casing; hence '10'.islower() returns False.

Finally, let's see the isupper() method:

'Hello World'.isupper()
False
'HELLO WORLD'.isupper()
True
'THIS IS UPPER CASING'.isupper()
True
'10'.isupper()
False
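
To make the rule behind these results concrete, the short sketch below mixes letters with digits and punctuation. It illustrates that islower() and isupper() look only at the cased characters in a string, and return False when there are none. The variable names are only illustrative.

```python
# Digits and punctuation are ignored; only cased characters are checked.
mixed_upper = 'HELLO 10!'.isupper()   # every cased character is uppercase
mixed_lower = 'hello 10!'.islower()   # every cased character is lowercase

# With no cased characters at all, both checks return False.
no_case_upper = '10!'.isupper()
no_case_lower = '10!'.islower()

print(mixed_upper, mixed_lower)      # True True
print(no_case_upper, no_case_lower)  # False False
```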

Stripping characters

Often, while reading input from files or other external sources, strings come padded with unnecessary whitespace characters.

An example is 'Hello World        ', with a run of space characters at the end.

In Python, one can use the strip() method to remove these characters from both ends of a given string.

Just call strip() on the string and you'll get a cleaned up version of it.

s = 'Hello World        '

print(s + '!')
print(s.strip() + '!')
Hello World        !
Hello World!

Apart from whitespace characters, the strip() method can also strip other characters from both ends of a given string. This can be done by providing an argument to the strip() method.

s.strip(characters)

characters is a string containing all the individual characters to be removed from both ends of the string s.

'....Hello World....!!!!'.strip('.!')
Hello World

The string '.!' passed to strip() here tells it to remove all leading and trailing '.' and '!' characters from the given string.

The order of characters in the characters argument doesn't matter — it just matters which characters are mentioned. For example, the result above could've been accomplished using strip('!.') as well, instead of strip('.!').
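
The following sketch checks both points on a slightly trickier string: the order of the characters in the argument makes no difference, and characters in the middle of the string are left alone because stripping stops at the first character not in the set. The sample string is made up for illustration.

```python
s = '..!Hello. World!..'

# The argument is treated as a set of characters, so order is irrelevant.
a = s.strip('.!')
b = s.strip('!.')

print(a)       # Hello. World
print(a == b)  # True

# The '.' after 'Hello' survives: only the two ends are stripped.
print('.' in a)  # True
```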

Left strip and right strip

If we don't want to strip given characters from both ends of a string, but rather from just one end, we can use the lstrip() or rstrip() methods.

Both lstrip() and rstrip() work exactly like strip() except that they operate on just one end of the given string.

The method lstrip() works on the left end whereas rstrip() works on the right end.

Consider the following example:

'....Hello World....'.lstrip('.')
Hello World....
'....Hello World....'.rstrip('.')
....Hello World
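
One place where a one-sided strip is handy is cleaning up lines of text whose leading indentation matters. The sketch below uses a hard-coded list in place of lines read from a file, which is an assumption made to keep the example self-contained.

```python
# Stand-in for lines read from a file; each ends with trailing whitespace.
lines = ['    indented line   \n', 'plain line\n']

# rstrip() with no argument removes trailing whitespace, including '\n',
# while leaving the leading indentation untouched.
cleaned = [line.rstrip() for line in lines]
print(cleaned)  # ['    indented line', 'plain line']

# lstrip() with an argument removes leading characters only.
print('0042'.lstrip('0'))  # 42
```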

Splitting into substrings

Splitting a string into a list of substrings is an operation frequently performed on strings.

Say you have a string of numbers each separated by a single space as follows:

s = '10 20 30 40'

and you want to extract each number from this string in order to process it.

This type of problem is very common in coding competitions. The input is supplied as a string of space-delimited numbers which the programmer has to break apart in order to work with each number.

Like most programming languages, Python has a way to split strings into smaller chunks at given separators: the split() method.

The split() method splits a string at given positions and returns a list containing all the individual chunks of the string.

If we call split() as is on a string, it will split it at every sequence of whitespace characters, as shown below:

'10 20 30 40'.split()
['10', '20', '30', '40']

The string is broken apart at every space character to yield a list containing all the stringified numbers.

It isn't necessary to have a single space between the numbers. As stated before, split() will split the string at every sequence of whitespace characters — space, tab, newline characters etc.

This can be confirmed by the snippet below:

'10  20 30\n40      50'.split()
['10', '20', '30', '40', '50']
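
Since split() returns strings, the chunks usually have to be converted before any arithmetic. A minimal sketch of that step, using the same input string as above, could look like this:

```python
s = '10  20 30\n40      50'

# split() yields strings; int() converts each chunk to a number.
numbers = [int(chunk) for chunk in s.split()]
total = sum(numbers)

print(numbers)  # [10, 20, 30, 40, 50]
print(total)    # 150
```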

On the other hand, if we pass an argument to split(), then it'll split the string at every position where the argument occurs.

s.split(delimiter)

delimiter specifies the substring at which to split the string s.

A very simple example follows:

'10,20,30,40'.split(',')
['10', '20', '30', '40']

Between each number there appears a comma character (,). This goes as the argument to the split() method, which divides the string at each occurrence of this delimiter.

Note that if an argument is provided to split(), it's searched for in the string exactly as it appears. For example, in the snippet below, see how the string is split:

'10,20,30, 40'.split(',')
['10', '20', '30', ' 40']

The string is divided at every point where , occurs. This leaves the last number as ' 40', with the space character included. This is because the splitting occurs only at the given character — nothing is matched beyond it.
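
One common way to deal with such stray spaces, shown here only as a sketch, is to call strip() on every chunk after splitting:

```python
s = '10,20,30, 40'

# Split at the commas, then strip surrounding whitespace from each chunk.
parts = [chunk.strip() for chunk in s.split(',')]

print(parts)  # ['10', '20', '30', '40']
```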

This also means that split(' ') is not equivalent to calling split(). The former splits a string at every single space character, whereas the latter does so at every sequence of whitespace characters.

The code below distinguishes between these two expressions:

s = '10 20  30\n40'

print(s.split(' '))
print(s.split())
['10', '20', '', '30\n40']
['10', '20', '30', '40']

As you can clearly see, split(' ') breaks s at each space character, producing an empty substring in the process and leaving the newline untouched. However, split() breaks it as intended, taking into account multiple spaces and the newline character.

Joining into string

Just as common as splitting a string into a list of substrings is the task of joining the elements of a list into a string.

This can be accomplished using the join() method.

The string on which join() is called behaves as the delimiter in the final joined string.

In the code below we join the list ['Python', 'is', 'cool!'] using the delimiter ' '.

l = ['Python', 'is', 'cool!']
print(' '.join(l))
Python is cool!

Apart from lists, we can use join() on any other iterable of strings in Python. Below we use it on a tuple and a string:

', '.join(('Apples', 'Onions', 'Oranges'))
Apples, Onions, Oranges
'-'.join('100')
1-0-0
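
One thing to keep in mind is that join() expects every element of the iterable to be a string. The sketch below shows the TypeError raised for a list of integers and one way around it, converting each element with str() first. The variable names are only illustrative.

```python
numbers = [10, 20, 30]

# Joining non-string elements directly raises a TypeError.
try:
    ', '.join(numbers)
    failed = False
except TypeError:
    failed = True

# Converting each element to a string first makes the join work.
joined = ', '.join(str(n) for n in numbers)

print(failed)  # True
print(joined)  # 10, 20, 30
```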

Why does Python have join() as a method of strings, and not as a method of lists?

If you've worked with another programming language before, you may be wondering why join() is a method of strings in Python. Why isn't it provided on lists instead?

Well, nearly every Python developer coming from some other language wonders about this. So what's the big idea?

In languages like JavaScript, arrays are capable of being joined into a string using the join() method. However, no other iterable data type shares this feature. For instance, in JavaScript, we can't join custom iterables into a single string.

Python looks at joining from another angle. It holds that every iterable should be capable of being joined into a single string, not just lists. If join() lived on the iterables themselves, every iterable class would have to define its own join(), which is tedious.

A better option is to put a single join() method on the string data class that accepts an iterable as argument. In this way all iterables can be made capable of being joined into a string, while at the same time ensuring that there is no clutter of join() methods all around iterable classes in Python.
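
To illustrate this design, the sketch below passes three different kinds of iterables to the same join() method: a generator expression, a dictionary (which iterates over its keys), and a custom class. The Countdown class is hypothetical and exists only for this example.

```python
class Countdown:
    """A hypothetical custom iterable that yields numbers as strings."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for n in range(self.start, 0, -1):
            yield str(n)


# The same str.join() works on every one of these iterables.
from_generator = '-'.join(ch.upper() for ch in 'abc')
from_dict = ', '.join({'x': 1, 'y': 2})  # iterates over the keys
from_custom = ' '.join(Countdown(3))

print(from_generator)  # A-B-C
print(from_dict)       # x, y
print(from_custom)     # 3 2 1
```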

Counting substrings

To see how many times a given substring occurs within a string, use the count() method.

Pass the substring as an argument to the method, and let it do all the counting for you.

'This is inspirational'.count('i')
5
'This is inspirational'.count('is')
2
'This is inspirational'.count('this')
0

count() is case-sensitive, as one would expect. This can be confirmed by the last statement above: 'this' doesn't exist in the string.

One special case is counting "" in a string. You may think that it would give a 0, but it doesn't; rather it returns a number that is one greater than the length of the string.

Consider the following:

'This is inspirational'.count('')
22
'Hello'.count('')
6
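
One way to think about this result is that the empty string matches at every position in the string: before each character, and once more at the very end. The sketch below checks that relationship, and also shows that count() counts non-overlapping occurrences only.

```python
word = 'Hello'

# '' matches before each of the 5 characters and once at the end.
gaps = word.count('')
print(gaps)                   # 6
print(gaps == len(word) + 1)  # True

# Occurrences are non-overlapping: 'aa' fits twice in 'aaaa', not three times.
overlap = 'aaaa'.count('aa')
print(overlap)                # 2
```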

Finding substrings

An extremely common concern while working with strings is searching for given substrings. This can be accomplished using the find() method.

The find() method accepts the substring as an argument and returns the index of its first occurrence. If no match is found, -1 is returned.

Consider the following:

'Hello World!'.find('World')
6

The substring 'World' occurs at index 6 in the string 'Hello World!'; hence, find('World') returns 6.

The find() method is case-sensitive, so the following statement would return -1:

'Two languages'.find('two')
-1

Apart from the substring to look for, find() accepts two further optional arguments specifying the indexes at which to start and end the search, respectively.

s.find(sub[, start[, end]])

start defaults to 0 whereas end defaults to the length of the string.

Consider the following:

'Two apples are two apples'.find('apples')
4
'Two apples are two apples'.find('apples', 5)
19

First, 'apples' is searched for in the string starting at index 0. The search terminates at the first occurrence of 'apples', which is at index 4, and so 4 is returned.

Then, in the second statement, 'apples' is searched for in the string starting at index 5. Since the second occurrence of 'apples' begins at index 19, this is returned by the second statement.

If the start argument in the second statement above had been 4 or less, the find() method would've found the first occurrence of 'apples' instead of the second one.

Let's now see an example of the third argument, end:

'Two apples are two apples'.find('apples')
4
'Two apples are two apples'.find('apples', 0, 3)
-1

The first statement is the same as the one above. The second statement is of interest: searching begins at index 0 and ends at index 3. Since the first occurrence of 'apples' is at index 4, which is beyond end, this statement returns -1.
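
The start argument also makes it possible to collect every occurrence of a substring by repeatedly resuming the search just past the previous match. The following is a minimal sketch of that idea; the helper name find_all is hypothetical and not part of Python. The last two lines additionally show that the whole substring must lie before end for a match to be reported.

```python
def find_all(s, sub):
    """Return the start index of every non-overlapping occurrence of sub in s."""
    if not sub:
        raise ValueError('sub must be a non-empty string')
    indexes = []
    start = 0
    while True:
        i = s.find(sub, start)
        if i == -1:          # no further occurrence
            break
        indexes.append(i)
        start = i + len(sub)  # resume just past this match
    return indexes


s = 'Two apples are two apples'
print(find_all(s, 'apples'))  # [4, 19]
print(find_all(s, 'pears'))   # []

# 'apples' spans indexes 4 to 9, so it only fits when end is at least 10.
print(s.find('apples', 0, 10))  # 4
print(s.find('apples', 0, 9))   # -1
```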