Python Data Types

Chapter 6 33 mins

Learning outcomes:

  1. Primitives vs objects
  2. Integers, floats and Booleans
  3. Strings, lists, tuples
  4. Sets and dictionaries

Introduction

Every programming language on this planet enables us to deal with data. That data can be one of several types such as an integer, a string, an object and so on.

Different languages segment data differently: some start with many categories of data, whereas others offer only a few categories with many subcategories beneath them.

Python has its own data type system, as we shall see in detail in this chapter.

Primitives vs. objects

Before we start the discussion on Python's data types, it's worthwhile to understand two commonly used terms in programming when discussing data types of a language: primitives and objects.

Any data type that is implemented in a language without any sort of bound information is known as a primitive.

An object is the exact opposite of this: it has information bound to it.

Let's understand this using a very simple example.

In Java, we can create an integer using the int keyword followed by the same assignment pattern used in Python, as shown below:

int x = 10;

The variable x here is considered a primitive. It has no properties or methods available on it — it is just a pure number present in memory. The integer data type in Java is a primitive data type.

Compare this to Java's string data type which is not primitive:

String s = "Hello World!";

In the snippet above, the variable s has many properties and methods available on it.

For example, s.length() returns the number of characters in s, s.toLowerCase() returns a copy of s in all lowercase characters, and so on. length() and toLowerCase() here are part of the information bound to strings in Java.

The variable s does not hold a string directly — rather it holds an object which has some attribute pointing to the string data "Hello World!" in memory and some attribute pointing to information and functionality for that string data, like the length() and toLowerCase() methods.

If you don't understand any of these details now, don't worry. As you learn programming in general, the concept of primitives and objects will come naturally to you.

If you are really curious to understand this quickly, then head over to our JavaScript course. In the first six chapters you'll learn not only what primitives and objects are but also one of the most popular languages out there: JavaScript!

Coming back to the topic, we now know that a primitive data type can have no sort of information attached to it as compared to an object data type, which does have information attached — for instance, the string data type in Java that has methods attached to it, such as length(), toLowerCase() and so on.

Talking about Python, it has no primitive data type:

Everything in Python is an object.

Let's explore this in detail...

Everything is an object

Python is an object-oriented language where everything is an object. Let's first understand what exactly an object is.

Think of the real world objects around you such as a computer — it has characteristics such as color, size, weight, price and so on and similarly some behavior as well — it can be powered on, shut down and so on.

Let's take another example: a toaster. It also has properties — color, size, weight, wattage, brand name and even behavior like toasting bread.

This concept of an object is exactly what the term 'object' in Python and in all OOP languages refers to:

An object is an entity with properties and/or some behavior.

This simply means that everything in Python has properties and/or behavior attached to it.

But how do we confirm this fact?

There's a simple, yet clever way to do this.

In Python, passing a given value to the dir() function returns the names of all the information bound to the value, in the form of a list.

Although it's too early to fully understand functions and lists, it won't take long to grasp the basics of these concepts.

A function, as we've seen in Python Basics, is a block of code that can be executed by calling the function. A function is called by writing the name of the function followed by a pair of () parentheses.

Here's how we would call the dir() function on an integer 10 in Python:

dir(10)

First comes the name dir followed by a pair of () parentheses. Inside these parentheses goes the integer 10. The integer 10 here is called an argument to the function dir().

An argument is data that we provide to a function to let it do its work.

Let's see what dir(10) returns:

dir(10)
['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']

As you can see, it returns a huge list of information. Well, what's returned is indeed called a list in Python.

A list is another type of data in Python. It holds multiple values as one single unit. In CPython, the reference implementation, references to these values are stored in a contiguous block of memory, and each value is reachable by its index.

Once again, don't worry if you don't understand what a list is from this short description. You'll learn about Python lists in the section below, and then, in even more detail, in the unit Python Lists.

Coming to the dir() function, when we call it on the integer 10, we get a lot of information returned. This confirms the fact that an integer is an object in Python, not a primitive.

Let's work with some of this information:

10 .__add__(5)
15

__add__() is referred to as a method of the integer 10. Its purpose is apparent in its name — it adds the main integer to a provided value.

A method is a special kind of a function, in that it is attached to an object.

We call it by writing a period (.) after the value (on which the method is desired to be called) followed by the name of the method, followed by a pair of parentheses. Any arguments to the method are provided in the parentheses, just like we do in normal functions.

In this case, we call the method __add__() on the value 10, and pass it the value 5 as argument. The return value is the sum of 10 and 5, which is 15.

Simple, isn't it?

The space between 10 and . (i.e. 10 .__add__(5)) in the code above is very important. It distinguishes a method call from a floating-point number.

Recall that a floating-point number also has a period character in it, which denotes the decimal point. If we wrote 10.__add__(5), without a space, Python would misinterpret the dot as a decimal point, ultimately leading to a syntax error.
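As a small sketch of the point above, the space is not the only way to keep the dot from being read as a decimal point: wrapping the integer in parentheses works too. The variable names a, b and c below are purely illustrative and not part of the chapter's own code.

```python
# Three ways to add 5 to 10 that give the same result.
a = 10 .__add__(5)    # a space before the dot
b = (10).__add__(5)   # parentheses around the integer
c = 10 + 5            # the + operator, the usual way to write an addition

print(a, b, c)
```

In everyday code we would simply write 10 + 5; calling __add__() directly is only done here to show that the integer really does carry methods.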

In this way, every piece of information shown above can be accessed on the integer 10, and on all integers in Python.

To boil it down,

Everything in Python can be inspected using dir() and what we get in return is always a list holding some information. This confirms the bigger picture — everything is, in effect, an object.
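The following is a minimal sketch of this idea, calling dir() on an integer and on a string. It borrows the type() function and the in operator, both of which are covered later, and the variable names int_info and str_info are made up for illustration.

```python
# dir() returns a list of names, whatever kind of value we give it.
int_info = dir(10)
str_info = dir('Hello')

print(type(int_info))         # the class of the returned value
print(type(str_info))
print('__add__' in int_info)  # is '__add__' one of the names for integers?
print('upper' in str_info)    # is 'upper' one of the names for strings?
```

Both calls give back a list, and each list contains names specific to the kind of value that was inspected.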

Note that this type model is not used in every programming language.

For instance, in Java, some data types such as integers, floats and Booleans are primitives, i.e. they are not objects, and so have no methods or properties available on them.

In terms of memory, this object type model adds overhead, since every value carries extra information around. Despite this, it allows for quick and flexible programming, which outweighs the overhead in many applications.

Programming day-to-day applications in a language that treats every data type as an object, such as Python, rarely causes any noticeable performance problems. It's mainly in performance-critical or memory-intensive applications, such as 3D games, that working with such languages becomes a concern.

Couldn't understand all this? No problem. All this will become clear with time as you learn Python and programming in general.

So what we've learnt so far is that everything in Python is of type object.

However, it's paramount to realise that not everything is the same type of object.

Integers are a different type of object as compared to floats. Strings are another type of object and so are Booleans. Everything is definitely an object, but a different kind of an object.

Take the example of your house — everything within it can be thought of as an object, but not everything is the same kind of object. You have chairs, tables, lights, fans, and so on!

From this point onwards, we'll refer to these individual types of objects simply as data types, and not as object types. Just keep in mind that every data type in Python is, in effect, an object type.

Integers

Integers are whole numbers, without a decimal point.

Examples include -2, -1, 0, 1, 2 and so on.

Even if a number is technically a whole number but has a decimal point in it, it is not classified as an integer. Rather, it's classified as a float, as we shall see in the next section.

For instance, 4.0 is technically a whole number as its fractional part is equal to zero. Nonetheless, Python recognises this as a float, not as an integer!

But how do we know which value is considered an integer and which one is considered a float?

Well one way is to use the type() function.

It works as follows: we provide it a value whose type we want to know, as an argument, similar to passing a value to the dir() or print() functions. The function returns the object type of the value, in a special notation.

As we shall see later on in this course, what type() actually returns is technically the class of the given value.

Let's inspect the type of the numbers 4 and 4.0 in the shell:

type(4)
<class 'int'>
type(4.0)
<class 'float'>

As can be confirmed from the snippet above, 4 is an integer since type(4) returns <class 'int'>. Here int refers to an integer.

On the same lines, 4.0 is not an integer, since type(4.0) returns <class 'float'>.
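As a short sketch, the result of type() can also be compared against the names int and float directly. The variable names whole and with_point are illustrative only, and the == operator used here is explained later, in the section on sets.

```python
whole = 4
with_point = 4.0

print(type(whole) == int)         # True
print(type(with_point) == float)  # True
print(whole == with_point)        # True: equal in value, yet of different types
```

So 4 and 4.0 represent the same quantity, but Python still treats them as two different types of object.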

Moving on, unlike many languages, Python sets no specific limit to the size of integers - they can be as large as one desires, but obviously within the limits of the machine being used.

You can't store something like 100 ** 100 ** 100 (that is, 100 raised to the power 100 ** 100) on a machine with a limited amount of memory!

Below we multiply two large numbers together to obtain an even larger number, yet capable of being processed by Python:

x = 1984548948495055640
y = 400379004593964645645405

z = x * y
print(z) # 794571732566449589026609381571299185334200

If you think this is big enough, consider the following code, where we generate a number spanning close to 5 lines!

z = 5 ** 500
print(z)
30549363634996046820519793932136176997894027405723266638936139092812916265247204577018572351080152282568751526935904671553178534278042839697351331142009178896307244205337728522220355888195318837008165086679301794879136633899370525163649789227021200352450820912190874482021196014946372110934030798550767828365183620409339937395998276770114898681640625

Having no sort of limit on the size of integers is one of the many reasons developers prefer Python in coding competitions (where numbers can easily go out of control!) and some number-intensive applications.
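To get a feel for how large such a number is, one option is to count its digits. The sketch below does so by converting the integer to a string with str() and measuring it with len(), both of which appear later in this chapter or course; the name digits is just illustrative.

```python
z = 5 ** 500

# Convert the integer to text, then count its characters.
digits = len(str(z))

print(digits)   # 350
print(type(z))  # <class 'int'>
```

Despite having 350 digits, z is still an ordinary int.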

Floats

The second classification of numbers in Python is that of floats.

Floats, or floating-point numbers, are numbers with a decimal point.

Examples include -5.1, -0.7, 0.0, 3.89, 10.001.

x = 0.5

Floats in Python are based on the IEEE-754 double-precision floating-point format, the same format used in JavaScript for all numbers, and in Java for the double data type.

In this format, each floating-point number is represented using 8 bytes of memory.

Python floats aren't 8 bytes large!

Remember that in Python, a floating-point number won't be 8 bytes large if you inspect it. Rather, it will be larger than that. Why?

Simply because of Python's everything-is-an-object type system. Floats are also objects with attached information, and storing this information requires memory. This memory, along with the 8 bytes needed to store the actual floating-point number (in the IEEE-754 format), adds up to something definitely greater than 8 bytes!
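One way to check this, sketched below, is the getsizeof() function from the standard sys module, which reports the size of an object in bytes. The exact figure depends on the Python implementation and platform (on a typical 64-bit CPython build it is 24), so treat the printed number as indicative only.

```python
import sys

x = 0.5

# Size of the float object in bytes, including its object overhead.
size = sys.getsizeof(x)

print(size)
print(size > 8)
```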

Let's inspect the type of floating-point numbers in the shell:

type(0.3)
<class 'float'>

As we saw before, type() called on a float returns <class 'float'>, which is the class representing all floats in Python. We'll see more details on this class in the Python Number Basics chapter.

Strings

As we saw back in the Python Basics chapter, a string is a sequence of textual characters that can be denoted in a multitude of ways — the most common being '' and "".

An example is shown below:

s1 = 'Hello'
s2 = "World!"

We have two strings stored in two variables s1 and s2, one denoted using '' and the other using "".

We'll explore other ways to denote a string in the Python Strings Basics chapter.

Let's now see some other aspects of strings...

Since a string is just a sequence of characters, it has two typical concepts related to sequences in computer science i.e. length and index.

The length of a string is the total number of characters in it.

The index of a particular character in a string is its position in the string. Indexes begin at 0 and increment by 1 with every subsequent character.

Hence, the first character is at index 0, the second is at index 1, the third is at index 2, and so on and so forth.

To determine the length of a string in Python, we use the len() function.

An example follows:

len('Hello')
5
len('A B C D')
7
len('cat')
3
len(' ')
1
s = 'Programming geeks'
len(s)
17

Note that a space is also a valid character and hence also gets counted in the length of the string.

Following is a quick test for you:

What will len('') return?

  • -1
  • 0

Answer: 0. An empty string (i.e. '' or "") has no characters in it. That is, its length is 0. And this means that len() called on that string returns 0.

As far as the index is concerned, it's quite common to need to retrieve a particular character from a string based on its index. This can be done via bracket notation.

Here's the general syntax of bracket notation as applied on a string:

string[index]

string is the string whose character ought to be retrieved, and index is the index of that very character, given as an integer. The expression string[index] returns the given character.

The use of brackets ([]) here is the reason why this notation is called 'bracket notation'.

Consider the code below:

s = 'Hello World!'
s[0]
'H'
s[1]
'e'
s[5]
' '
s[9]
'l'

The first character lies at index 0, hence s[0] returns the first character of s, i.e. 'H'. The second character, similarly, lies at index 1 and this is what s[1] returns, i.e. 'e'. And so on and so forth.
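Length and index fit together neatly: because indexes start at 0, the last character of a string sits at index len(s) - 1. Here is a small sketch of that relationship, with last_index being an illustrative name.

```python
s = 'Hello World!'

# Indexes run from 0 up to one less than the length.
last_index = len(s) - 1

print(len(s))         # 12
print(last_index)     # 11
print(s[last_index])  # !
```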

As we know from the previous chapters, one of the most common and useful operations performed on strings in Python, and many other programming languages, is concatenation. The operator used in this regard is +.

Concatenation is to join two strings together into one single string.

The snippet below demonstrates concatenation:

'Hello' + ' World!'
'Hello World!'
'Hello' + 'World!'
'HelloWorld!'

Simple.
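Concatenation works the same way on strings stored in variables. The sketch below joins the two strings from earlier with a space in between; the name greeting is made up for this example.

```python
s1 = 'Hello'
s2 = "World!"

# Join three strings: s1, a single space, and s2.
greeting = s1 + ' ' + s2

print(greeting)       # Hello World!
print(len(greeting))  # 12
```

Note that + never adds a space on its own; any space in the result has to come from one of the strings being joined.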

Finally, let's explore what is returned when type() is called with a string:

type(s)
<class 'str'>

<class 'str'> is returned, since all strings in Python belong to the class str.

Booleans

One of the most useful concepts in computer programming is that of conditional execution. Conditional execution is when a piece of code is executed only if a given condition is met.

At the heart of this concept sit Booleans, which are simply true or false values.

In Python, the two Boolean values are True and False.

Let's create two Boolean variables:

is_raining = True
user_authorised = False

Both True and False are reserved keywords in Python!

In some languages, such as JavaScript and PHP, Boolean values are written as true and false. In Python, these values are capitalised, and they must be written that way for Python to recognise them.

At least for now, you won't find Booleans very useful. It's only once we get the hang of control-flow structures like while, for, if etc. that the significance of Booleans will become apparent.
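As a brief preview, and only as a sketch, Booleans are also what Python gives back when we compare values. The comparison operators > and == used below are covered properly in later chapters.

```python
is_raining = True

print(type(is_raining))  # <class 'bool'>
print(5 > 3)             # True
print(len('cat') == 4)   # False
```

Just like the other data types, Booleans have their own class, named bool.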

Lists

Lists in Python are an extremely useful data type. They represent a sequence of values that can be of any type.

Consider how lists work in real life - we have items one after another in an ordered manner. This is just how lists work in Python.

To create a list, we start by writing a pair of square brackets []. Inside this pair we put the items of the list, also known as the elements of the list. Each new item is separated from the previous one using a , comma.

A simple example is shown below:

odds = [1, 3, 5]

The variable odds is a list of three elements, all integers (and odd numbers).

Each item in a list is at a specific position. This position is formally referred to as an index.

The first element is at index 0, the second one is index 1, the third is at index 2 and so on.

To access a given element of a list we ought to use its index.

First comes the name of the list, followed by a pair of [] square brackets and then within these brackets, the index of the element we wish to be retrieved.

Let's access the first and third elements of the list odds:

odds[0]
1
odds[2]
5

The first element is at index 0 and so we write odds[0] to access it. The same goes for the third element.

The index used to access an element of a list must be an integer. A float won't work, not even one like 2.0!
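Two points from this section can be sketched in code: a list may mix values of different types, and a float index is rejected. The try and except keywords used to catch the error are covered much later in the course; they are used here only so that the snippet runs to completion. The names mixed and float_index_failed are illustrative.

```python
odds = [1, 3, 5]
mixed = [1, 2.5, 'three', True]  # elements can be of any type

print(len(mixed))  # 4
print(mixed[2])    # three

float_index_failed = False
try:
    odds[2.0]  # a float index, even a whole one, is not accepted
except TypeError as error:
    float_index_failed = True
    print('TypeError:', error)
```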

We'll learn more about lists including the syntax of creating a list, the concept of list comprehensions, dimensions of a list, how to loop over a given list, sorting lists, and much much more in the Python Lists unit.

Tuples

In mathematics, a tuple is simply an ordered collection of numbers denoted using a pair of () parentheses. The following are examples of tuples.

(1, 2), (0, 1, 2), (1.2, 3.7)

Lists aren't the only way to store sequences of data in Python - it provides another data type to serve this purpose and that is tuples.

Generally, tuples behave just like lists, except that they are immutable, i.e. we can't change a tuple's items once it has been defined.

Creating a tuple in Python follows the same syntax as creating a tuple in mathematics - write a pair of () parentheses and then within these parentheses, put the individual items of the tuple, separated by a , comma.

Below we create a tuple holding the first 3 odd numbers:

odds_tuple = (1, 3, 5)

To access items in a tuple we use the same index logic as we did in the case of lists and strings, since tuples are also sequences.

odds_tuple[0]
1
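The difference in mutability can be seen by trying to replace the first item of a list and of a tuple. This is a minimal sketch; the try and except keywords are covered later in the course, and the names odds_list and tuple_change_failed are illustrative.

```python
odds_list = [1, 3, 5]
odds_tuple = (1, 3, 5)

odds_list[0] = 7   # fine: lists can be changed in place
print(odds_list)   # [7, 3, 5]

tuple_change_failed = False
try:
    odds_tuple[0] = 7  # not allowed: tuples are immutable
except TypeError as error:
    tuple_change_failed = True
    print('TypeError:', error)

print(odds_tuple)  # (1, 3, 5)
```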

Sets

A great deal of mathematics utilises the concept of set theory. Sets are unordered collections of data that usually meet a given property (although this is not necessary).

In Python, the set data type is exactly based on sets in mathematics. It's denoted in the same way, it works in the same way - it just does everything in the same way!

To create a set, we start with a pair of {} curly braces. Inside these, we put the elements of the set, separated from one another using the same old , comma character.

Below we create a set s holding the first 5 non-negative even numbers:

s = {0, 2, 4, 6, 8}

Remember that a set is unordered in nature, which means that we can't just access any of its elements using an index. There is no concept of indexes in sets!
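The sketch below shows what happens when we try bracket notation on a set anyway, and how we can still ask whether a value belongs to the set using the in operator (covered in the Python Sets unit). The try and except keywords and the name index_failed are used only for illustration.

```python
s = {0, 2, 4, 6, 8}

print(len(s))  # 5
print(4 in s)  # True
print(5 in s)  # False

index_failed = False
try:
    s[0]  # sets do not support indexes
except TypeError as error:
    index_failed = True
    print('TypeError:', error)
```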

Being unordered in nature also means that the two sets {0, 1} and {1, 0} are equal to one another. Let's compare them in the shell:

{0, 1} == {1, 0}
True

The == double equals sign here denotes the equality operator.

The equality operator compares two values and returns True if they are equal to one another; or otherwise False.

In the snippet above, True was returned by the given equality operation which confirms the fact that Python considers {0, 1} and {1, 0} as identical sets.

In fact, any two sets that hold the same elements, in whatever order, are considered equal to one another.
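For contrast, here is a short sketch comparing the same elements stored as sets, lists and tuples. Order is ignored for sets but matters for the two sequence types. The variable names are illustrative.

```python
sets_equal = {0, 2, 4} == {4, 0, 2}
lists_equal = [0, 2, 4] == [4, 0, 2]
tuples_equal = (0, 2, 4) == (4, 0, 2)

print(sets_equal)    # True
print(lists_equal)   # False
print(tuples_equal)  # False
```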

In the Python Sets unit, we'll explore how to perform set operations on Python sets. These include intersection, union, difference, symmetric difference; checking whether a set is a subset or superset of another set; and much more.

Dictionaries

If you want to store labeled information of a given object in one place, then a dictionary is your way to go.

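A dictionary is a collection of key-value pairs. (Since Python 3.7, a dictionary remembers the order in which its pairs were inserted, but its values are looked up by key, not by position.) A key is usually a characteristic of the object the dictionary represents and a value is its corresponding value.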

Creating a dictionary is superbly easy...

Start with a pair of {} curly braces and then inside these, put the key-value pairs separated by a , comma. A key-value pair is formed as follows: write the key, followed by a : colon, and finally write the value that belongs to this key.

Dictionary keys can be any hashable value (roughly, a value that can't be changed), such as strings, integers, or tuples of such values. However, in most cases they are strings.

The general syntax of a dictionary can be represented as:

{key1: value1, key2: value2, ....}

Consider the code below:

item = {'category': 'Dairy', 'name': 'Eggs', 'price': 1.2}

Notice how the dictionary item here models a real-world item in a grocery store, i.e. a box of eggs. The keys represent properties of the item, such as its category and its price, whereas the values represent their corresponding values.

Dictionaries in Python are made for this purpose - they can encapsulate labeled data of a given item.
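To read a value back out of a dictionary, we can use bracket notation once more, this time with a key in place of an index. The sketch below assumes the same item dictionary as above.

```python
item = {'category': 'Dairy', 'name': 'Eggs', 'price': 1.2}

print(item['name'])   # Eggs
print(item['price'])  # 1.2
print(len(item))      # 3: the number of key-value pairs
```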

However, there isn't any necessity that you have to use a dictionary for only this purpose - you can use it for other cases as well.

One is highlighted below:

students = {'maths': 60, 'chemistry': 56, 'physics': 31}

The dictionary students here shows how many students are enrolled in each subject offered at an institute.

Notice that the dictionary does not denote a real world item here whose properties are 'maths', 'physics' or 'chemistry'. Rather, it's just a convenient name for us to denote how many students are enrolled in a particular subject.
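As a small sketch of putting such a dictionary to work, we could look up each subject by its key and add the enrolment counts together. The name total is illustrative, and it counts enrolments rather than distinct students.

```python
students = {'maths': 60, 'chemistry': 56, 'physics': 31}

# Look up each count by its key and add them up.
total = students['maths'] + students['chemistry'] + students['physics']

print(total)  # 147
```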

We'll learn more about dictionaries in the Python Dictionaries unit.

More types

The list of data types in Python doesn't end here. The ones mentioned above are the most basic, which is why they are covered in this chapter.

There are a number of other data types, such as classes, modules, functions, bytearrays etc., left to be discovered in the later part of this course.

For now, getting the hang of these elementary data types is important so that you can become more fluent in Python and, as a result, more confident with the concepts you'll learn in the coming chapters.