Python String Methods
Learning outcomes:
- Manipulating casing using title(), lower() and upper()
- Checking for casing via istitle(), islower() and isupper()
- Stripping characters using strip(), lstrip() and rstrip()
- Splitting strings using split()
- Joining iterables into a string using join()
- Counting substrings through count()
- Finding substrings using find()
Title casing
The title() method converts a string into what Python calls title casing.
In title casing, the first letter of each word is uppercased while all other characters are lowercased.
An example follows:
s = "hello world, in title casing!"
print(s.title())
See how the first letter of every word is in uppercase.
Note, however, that title() can sometimes give unexpected results, as in the example below:
s = "Earth's core is hot!"
print(s.title())
Ideally, this string should've been converted into "Earth's Core Is Hot!". However, title() uppercases even the letter following the apostrophe.
To correctly convert strings of this type into a title, Python provides a more careful utility, the capwords() function of the string module, which we shall explore later on.
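As a brief preview, the difference between the two can be sketched as follows. This is only a minimal illustration; the variable names titled and capped are chosen here for the example.

```python
import string

s = "Earth's core is hot!"

# str.title() treats the apostrophe as a word boundary,
# so the 's' that follows it gets uppercased too.
titled = s.title()

# string.capwords() splits on whitespace, capitalizes each chunk
# and joins the chunks back together with single spaces.
capped = string.capwords(s)

print(titled)  # Earth'S Core Is Hot!
print(capped)  # Earth's Core Is Hot!
```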
Lower and upper casing
To convert a string into all lowercase characters, use the lower() method. Similarly, to convert it into all uppercase characters, use the upper() method.
Consider the snippet below:
s = 'Hello World'
print(s.lower())
The lower() method is frequently used to normalise casing when comparing two strings.
For example, suppose you want to check whether a city entered by the user exists in a list of cities saved by your program. The list is as follows:
cities = ['London', 'Tokyo', 'Paris', 'Vancouver']
The user might enter the city as 'LONDON', which names a city that does exist in the list. However, if we use a comparison like the one shown below, it will fail to find a match:
cities = ['London', 'Tokyo', 'Paris', 'Vancouver']
city_exists = False
city = input("Enter your city: ")
for c in cities:
    if c == city:
        city_exists = True
print("The entered city exists:", city_exists)
Usually, in such cases, the comparison is made on the lowercase versions of both strings, which makes it case-insensitive.
cities = ['London', 'Tokyo', 'Paris', 'Vancouver']
city_exists = False
city = input("Enter your city: ")
for c in cities:
    if c.lower() == city.lower():
        city_exists = True
print("The entered city exists:", city_exists)
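The same check can also be wrapped in a small function so that it can be tried without typing input. The sketch below is one possible way to do it; the helper name city_in_list is hypothetical, and any() is used in place of the explicit loop and flag.

```python
def city_in_list(city, cities):
    """Return True if city matches any entry of cities, ignoring case.

    Hypothetical helper written for this example.
    """
    target = city.lower()
    # any() stops at the first match instead of scanning the whole list
    return any(c.lower() == target for c in cities)


cities = ['London', 'Tokyo', 'Paris', 'Vancouver']

print(city_in_list('LONDON', cities))  # True
print(city_in_list('Berlin', cities))  # False
```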
Checking the casing
To check whether a given string is in title, lowercase or uppercase format, we can use the methods istitle(), islower() and isupper(), respectively.
Each of these is called on a string, takes no arguments, and returns a Boolean indicating whether the string is in the respective format or not.
The following shell snippets illustrate these three methods one by one.
Let's first see the istitle() method:
'Hello World'.istitle()
'Hello world'.istitle()
'This Is Title Casing'.istitle()
'10'.istitle()
The last statement here, i.e. '10'.istitle(), returns False since stringified numbers ('10' in this case) don't follow title casing. In fact, they don't follow any casing at all!
Now let's see the islower() method:
'Hello World'.islower()
'hello world'.islower()
'this is lower casing'.islower()
'10'.islower()
As stated before, stringified numbers don't follow any casing; hence, '10'.islower() returns False.
Finally, let's see the isupper() method:
'Hello World'.isupper()
'HELLO WORLD'.isupper()
'THIS IS UPPER CASING'.isupper()
'10'.isupper()
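To see the three checks side by side, the short sketch below runs each of them on a few of the strings used above. The results dictionary is just a convenience for this example.

```python
samples = ['Hello World', 'hello world', 'HELLO WORLD', '10']

# Map each sample to a tuple of (istitle, islower, isupper)
results = {
    text: (text.istitle(), text.islower(), text.isupper())
    for text in samples
}

for text, checks in results.items():
    print(repr(text), checks)

# 'Hello World' -> (True, False, False)
# 'hello world' -> (False, True, False)
# 'HELLO WORLD' -> (False, False, True)
# '10'          -> (False, False, False)
```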
Stripping characters
Often, when reading input from files or other external sources, strings come padded with unnecessary whitespace characters.
An example is 'Hello World ', with trailing space characters at the end.
In Python, one can use the strip() method to remove these characters from both ends of a given string.
Just call strip() on the string and you'll get a cleaned-up version of it.
s = 'Hello World '
print(s + '!')
print(s.strip() + '!')
Apart from whitespace characters, the strip() method can also strip other characters from both ends of a given string. This can be done by providing an argument to the strip() method.
s.strip(characters)
characters is a string containing all the individual characters to be removed from both ends of the string s.
'....Hello World....!!!!'.strip('.!')
The string '.!' passed to strip() here specifies that all leading and trailing '.' and '!' characters should be removed from the given string.
The order of characters in the characters argument doesn't matter; it only matters which characters are mentioned. For example, the result above could've been accomplished using strip('!.') as well, instead of strip('.!').
Left strip and right strip
If we don't want to strip given characters from both ends of a string, but rather from just one end, we can use the lstrip() or rstrip() methods.
Both lstrip() and rstrip() work exactly like strip(), except that they operate on just one end of the given string.
The method lstrip() works on the left end whereas rstrip() works on the right end.
Consider the following example:
'....Hello World....'.lstrip('.')
'....Hello World....'.rstrip('.')
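The three stripping methods can be compared on a single string, as in the sketch below. The variable names are chosen for this example only. Note that characters in the middle of a string are never touched.

```python
raw = '....Hello World....!!!!'

both = raw.strip('.!')    # removes '.' and '!' from both ends
left = raw.lstrip('.!')   # removes them from the left end only
right = raw.rstrip('.!')  # removes them from the right end only

print(both)   # Hello World
print(left)   # Hello World....!!!!
print(right)  # ....Hello World

# The order of characters in the argument makes no difference
print(raw.strip('!.') == both)  # True

# Only the ends are stripped; the inner '.' stays
inner = '..a.b..'.strip('.')
print(inner)  # a.b
```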
Splitting into substrings
Splitting a string into a list of substrings is one of the most frequently performed string operations.
Say you have a string of numbers each separated by a single space as follows:
s = '10 20 30 40'
and you want to extract each number from this string in order to process it.
This type of problem is very common in coding competitions: the input is supplied as a string of space-delimited numbers, which the programmer has to break apart in order to work with each number.
Like most programming languages, Python has a way to split strings into smaller chunks at given separators: the split() method.
The split() method splits a string at given positions and returns a list containing all the individual chunks of the string.
If we call split() as is on a string, it will split it at every sequence of whitespace characters, as shown below:
'10 20 30 40'.split()
The string is broken apart at every space character to yield a list containing all the stringified numbers.
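Since split() returns strings, the chunks still have to be converted before doing arithmetic on them. A minimal sketch of that follow-up step, with variable names chosen for this example:

```python
s = '10 20 30 40'

# split() yields ['10', '20', '30', '40']; int() converts each chunk
numbers = [int(chunk) for chunk in s.split()]
total = sum(numbers)

print(numbers)  # [10, 20, 30, 40]
print(total)    # 100
```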
It isn't necessary to have exactly one space between the numbers. As stated before, split() will split the string at every sequence of whitespace characters (spaces, tabs, newlines and so on).
This can be confirmed by the snippet below:
'10 20 30\n40 50'.split()
On the other hand, if we pass an argument to split(), then it'll split the string at every position where the argument occurs.
s.split(delimiter)
delimiter specifies the substring at which to split the string s.
A very simple example follows:
'10,20,30,40'.split(',')
Between each pair of numbers there appears a comma character (,). This goes as the argument to the split() method, which divides the string at each occurrence of this delimiter.
Note that if an argument is provided to split(), it's searched for in the string exactly as it appears. For example, in the snippet below, see how the string is split:
'10,20,30, 40'.split(',')
The string is divided at every point where , occurs. This leaves the last number as ' 40', with the space character included. This is because the splitting occurs only at the given character; nothing is matched beyond it.
This also means that split(' ') is not equivalent to calling split(). The former would split apart a string at every single space character whereas the latter would do so at every sequence of whitespace characters.
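One common way to deal with such leftover whitespace is to strip each chunk after splitting. The sketch below shows this approach; it is one option among several, and the variable names are chosen for this example.

```python
line = '10,20,30, 40'

raw_parts = line.split(',')
# Remove any whitespace left around each chunk
clean_parts = [part.strip() for part in raw_parts]

print(raw_parts)    # ['10', '20', '30', ' 40']
print(clean_parts)  # ['10', '20', '30', '40']
```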
The code below distinguishes between these two expressions:
s = '10  20 30\n40'
print(s.split(' '))  # ['10', '', '20', '30\n40']
print(s.split())     # ['10', '20', '30', '40']
As you can clearly see, split(' ') breaks s at each space character, producing an empty substring in the process and leaving the newline untouched. However, split() breaks it correctly, taking into account multiple spaces and the newline character.
Joining into string
Just as common as splitting a string into a list of substrings is the reverse task: joining the elements of a list into a string.
This can be accomplished using the join() method.
The string on which join() is called behaves as the delimiter in the final joined string.
In the code below we join the list ['Python', 'is', 'cool!'] using the delimiter ' '.
l = ['Python', 'is', 'cool!']
print(' '.join(l))
Apart from lists, we can use join() on any other iterable object in Python, as long as its elements are strings. Below we use it on a tuple and a string:
', '.join(('Apples', 'Onions', 'Oranges'))
'-'.join('100')
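One detail worth keeping in mind is that join() only accepts strings as elements. If the iterable holds other types, such as integers, each element has to be converted first. A minimal sketch, with variable names chosen for this example:

```python
numbers = [10, 20, 30]

# Convert each number to a string before joining
joined = ', '.join(str(n) for n in numbers)
print(joined)  # 10, 20, 30

# Joining the integers directly raises a TypeError
try:
    ', '.join(numbers)
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```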
Why does Python have join() as a method of strings, and not as a method of lists?
If you've worked with another programming language before, you may wonder why join() is a method of strings in Python. Why isn't it provided on lists instead?
Well, many developers coming to Python from another language ask this. So what's the big idea?
In languages like JavaScript, arrays can be joined into a string using their own join() method. However, no other iterable data type shares this feature. For instance, in JavaScript, we can't join custom iterables into a single string in the same way.
Python looks at joining from another angle. Its view is that every iterable should be capable of being joined into a single string, not just lists. If join() lived on the iterable, every iterable class would have to define its own join(), which is tedious.
A better option is to put a single join() method on the string class that accepts an iterable as argument. In this way, all iterables can be joined into a string without cluttering every iterable class with its own join() method.
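To illustrate the point, the sketch below joins a custom iterable and a generator expression without either of them defining a join() method. The Countdown class is hypothetical and written only for this example.

```python
class Countdown:
    """Hypothetical iterable yielding 'start', ..., '2', '1' as strings."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        for n in range(self.start, 0, -1):
            yield str(n)


# join() works on any iterable of strings, not just lists
countdown_text = '-'.join(Countdown(3))
shouted = ' '.join(word.upper() for word in ('python', 'is', 'cool'))

print(countdown_text)  # 3-2-1
print(shouted)         # PYTHON IS COOL
```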
Counting substrings
To see how many times a given substring occurs within a string, use the count() method.
Pass the substring as an argument to the method, and let it do all the counting for you.
'This is inspirational'.count('i')
'This is inspirational'.count('is')
'This is inspirational'.count('this')
count() is case-sensitive, as one would expect. This can be confirmed by the last statement above: 'this' doesn't exist in the string.
One special case is counting "" in a string. You may think that it would give 0, but it doesn't; rather, it returns a number that is one greater than the length of the string.
Consider the following:
'This is inspirational'.count('')
'Hello'.count('')
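The sketch below checks the empty-string case and shows two further behaviours that are easy to trip over: count() counts non-overlapping occurrences, and a case-insensitive count can be obtained by lowercasing the string first. The variable names are chosen for this example.

```python
text = 'Hello'

# Counting '' gives one more than the length of the string
empty_count = text.count('')
print(empty_count, len(text) + 1)  # 6 6

# Occurrences are counted without overlapping:
# 'aaaa' contains 'aa' at indexes 0 and 2, not at 1
overlap_count = 'aaaa'.count('aa')
print(overlap_count)  # 2

# Lowercasing first makes the count case-insensitive
insensitive_count = 'This is inspirational'.lower().count('this')
print(insensitive_count)  # 1
```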
Finding substrings
A very common task while working with strings is searching for given substrings. This can be accomplished using the find() method.
The find() method accepts the substring as an argument and returns the index of its first occurrence. If no match is found, -1 is returned.
Consider the following:
'Hello World!'.find('World')
The substring 'World' occurs at index 6 in the string 'Hello World!'; hence, find('World') returns 6.
The find() method is case-sensitive, so the following statement would return -1:
'Two languages'.find('two')
Apart from the substring to look for, find() can accept two further optional arguments specifying the indexes at which to start and end the search, respectively.
s.find(sub[, start[, end]])
start defaults to 0 whereas end defaults to the length of the string.
Consider the following:
'Two apples are two apples'.find('apples')
'Two apples are two apples'.find('apples', 5)
First, 'apples' is searched for in the string starting at index 0. The search terminates at the first occurrence of 'apples', which is at index 4, so 4 is returned.
Then, in the second statement, 'apples' is searched for in the string starting at index 5. Since the second occurrence of 'apples' begins at index 19, this is what the second statement returns.
If the start argument in the second statement above had been 4 or less, the find() method would've found the first occurrence of 'apples' instead of the second one.
Let's now see an example of the third argument, end:
'Two apples are two apples'.find('apples')
'Two apples are two apples'.find('apples', 0, 3)
The first statement is the same as the one above. The second statement is of interest: searching begins at index 0 and ends at index 3. Since the first occurrence of 'apples' begins at index 4, which is beyond end, this statement returns -1.
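The start argument makes it possible to collect every occurrence of a substring by repeatedly resuming the search after the previous match. The sketch below is one way to do this; the helper name find_all is hypothetical and not part of Python's string API.

```python
def find_all(text, sub):
    """Return the indexes of all non-overlapping occurrences of sub in text.

    Hypothetical helper written for this example.
    """
    if not sub:
        # An empty substring would never advance the search position
        raise ValueError('sub must be a non-empty string')

    positions = []
    start = 0
    while True:
        index = text.find(sub, start)
        if index == -1:
            break
        positions.append(index)
        # Resume searching right after the match just found
        start = index + len(sub)
    return positions


positions = find_all('Two apples are two apples', 'apples')
print(positions)  # [4, 19]
```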