
5. Data types

This section covers the basic material that any program works with: data.

Data comes in two basic forms: literals and variables (cf. 1. Basic concepts). It is also distinguished by its type. Just as there are different file types for music, pictures and documents, there are different data types for text, whole numbers and decimal numbers. In general, you will use the string type ('str') for text, the integer type ('int') for whole numbers, the floating-point type ('float') for decimal numbers and the Boolean type ('bool') for True or False values.
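You can check which type a value has with the built-in type function. This short sketch prints the type of one literal of each basic kind:

```python
# type() reports the type of any value; here, one literal of each basic kind.
print(type("hello"))  # <class 'str'>
print(type(42))       # <class 'int'>
print(type(3.14))     # <class 'float'>
print(type(True))     # <class 'bool'>
```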

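Unlike C, Python does not have separate types such as char, double or long. A single character is simply a string of length 1, a float is stored internally as a C double, and an int can grow as large as memory allows. Modules such as ctypes and numpy provide C-style types when you need them, but you will not need to use these explicitly in a normal Python program.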

5.1. Strings

Strings are like sentences (and in fact, often are). They're sequences of typographical characters, like numbers and letters, and are delimited by several types of quotation marks:

a) Single quotes:

python code
'This is a string'
        

b) Double quotes:

python code
"This is a string"
        

c) Triple quotes:

python code
"""This is a string
that extends over several lines"""
        

Generally speaking, it makes no difference which delimiter you choose to enclose your string, apart from triple quotes allowing the string to span several lines. The differences arise with characters that mean something special to Python. For example, you can't put an unescaped single quote inside a string delimited by single quotes, because Python will interpret it as the end of the string. So if your text contains a lot of single quotes, it's better to enclose it in double quotes, and vice versa. To include such a special character in a string, you need to tell Python that it's not what it seems to be, using the 'escape character', the backslash:

python code
print("Hello, it's me!") # prints: Hello, it's me!
print('Hello, it\'s me!') # same thing
print("Your dad said: \"I'm coming home late.\"") # prints: Your dad said: "I'm coming home late."
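To check the claims above, the sketch below compares strings written with different delimiters. It also uses the built-in compile function to show that an unescaped single quote inside a single-quoted string is a syntax error. The variable names are only for illustration.

```python
# The choice of delimiter does not change the resulting string.
single = 'This is a string'
double = "This is a string"
triple = """This is a string"""
print(single == double == triple)  # True

# An escaped quote and a quote inside the "other" delimiter give the same text.
print('it\'s' == "it's")  # True

# An unescaped ' ends the string early, so this source text does not parse.
source = "print('Hello, it's me!')"
try:
    compile(source, "<example>", "exec")
except SyntaxError:
    print("SyntaxError: the quote in it's ended the string")
```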
        

You can also indicate certain whitespace characters by escape sequences:

python code
print("This is one line.\nThis will appear on the next line.") # the \n tells Python to output a newline
        

To output a backslash, you need to escape the backslash itself:

python code
print("The Windows system folder is C:\\Windows\\System") # prints: The Windows system folder is C:\Windows\System
        

A list of escape sequences follows:

python code
\n Newline
\t Tab
\' Single quote
\" Double quote
\\ Backslash
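Each escape sequence takes two characters to type but stands for a single character in the finished string. A small sketch using len and repr (repr shows a string the way you would type it in code):

```python
# Each escape sequence becomes exactly one character in the string.
escapes = ["\n", "\t", "\'", "\"", "\\"]
for s in escapes:
    print(repr(s), len(s))  # len is 1 every time

# A doubled backslash is stored as one backslash.
path = "C:\\Windows"
print(path)       # C:\Windows
print(len(path))  # 10
```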
        

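Triple-quoted strings (previously mentioned with respect to docstrings) can contain unescaped single and double quotes, as well as tabs and newlines. Backslash escape sequences are still processed inside them, though. To make Python ignore escape sequences too, put an r in front of the opening quotes to create a 'raw' string:

python code
r"""This text won't care that there are double quotes "" single quotes ' ' or backslashes\ strewn
about
   It even preserves tabs and newlines! It only ends when it sees another triple quote mark """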
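Comparing lengths shows the difference between a triple-quoted string and a raw string. In an ordinary string, triple-quoted or not, \n becomes one newline character. In a raw string, the backslash and the n stay as two separate characters. A minimal sketch:

```python
plain = """one\ntwo"""  # \n is still an escape here: it becomes a newline
raw = r"""one\ntwo"""   # r prefix: the backslash and the n are kept literally

print(len(plain))  # 7
print(len(raw))    # 8
print(plain)       # prints "one" and "two" on separate lines
print(raw)         # prints: one\ntwo
```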
        

5.1.1. String operations

String manipulation is one of the most common things that simple programs are written to do. There are a few key operations that you'll use frequently: concatenating strings; getting the length of a string; extracting parts (substrings) of a string; replacing parts of a string; and checking whether a string includes a given substring.

Concatenating strings is easy in Python: you just add them with +:

python code
firstword="Hello"
secondword="world!"
phrase = firstword+", "+secondword
print(phrase) # prints "Hello, world!"
        

As mentioned previously, you can also repeat strings with *:

python code
phrasetwice = phrase*2 # "Hello, world!Hello, world!"
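Both operators build new strings. Note that + only joins a string to another string. To attach a number, convert it with str first (conversion is covered in 5.3). A short sketch; the variable apples is just an illustration:

```python
phrase = "Hello" + ", " + "world!"
print(phrase * 2)  # Hello, world!Hello, world!
print("-" * 10)    # ----------

apples = 3
# "Apples: " + apples would raise a TypeError (str + int), so convert first.
label = "Apples: " + str(apples)
print(label)       # Apples: 3
```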
        

To extract a substring from a string, you need to understand how indexing works. Consider the string phrase from the example above ("Hello, world!"). This string is a sequence of characters, and each character has a unique number (its index) that indicates its position in the string, starting with 0. So phrase[0] is "H", phrase[1] is "e" and so on. You can also refer to characters by a negative number counting back from the end of the string: phrase[-1] is "!", phrase[-2] is "d" and so on. To refer to a substring of more than one character (a 'slice'), put a colon between two indices. The slice starts at the first index and stops just before the second, so the character at the second index is not included:

python code
phrase[1:4] # "ell"
phrase[-6:-1] # "world"
phrase[1:-1] # "ello, world"
        

Including the colon but omitting one of the numbers extends the reference to the beginning or end of the string:

python code
phrase[1:] # "ello, world!"
phrase[:-1] # "Hello, world"
        

This functionality can be useful for cutting out the last letter of a string, for example.

To find out the length of a string, use the built-in len function:

python code
len(phrase) # returns the integer number 13
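You can check the indexing and slicing rules above directly. Remember that a slice stops just before its second index, and that a negative index -n refers to the same position as len(phrase) - n:

```python
phrase = "Hello, world!"

print(len(phrase))                            # 13
print(phrase[0], phrase[-1])                  # H !
print(phrase[-1] == phrase[len(phrase) - 1])  # True

print(phrase[1:4])    # ell  (indices 1, 2 and 3)
print(phrase[-6:-1])  # world
print(phrase[1:-1])   # ello, world
print(phrase[:-1])    # Hello, world  (last character cut off)
print(phrase[:5])     # Hello
```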
        

There are two useful ways of checking whether a string includes a particular substring (without getting into the wide world of regular expressions).

The "find" method searches for the given substring and returns the index of the first match, or -1 if the substring does not occur:

python code
phrase.find("world") # returns 7
        

The "count" method returns the number of (non-overlapping) times the given substring occurs in the string:

python code
phrase.count("l") # returns 3
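This short sketch runs both methods, including the -1 that find returns when there is no match. It also shows that matching is case-sensitive:

```python
phrase = "Hello, world!"

print(phrase.find("world"))  # 7
print(phrase.find("World"))  # -1 (no match: the search is case-sensitive)
print(phrase.count("l"))     # 3
print(phrase.count("xyz"))   # 0

# A common pattern: treat -1 as "not found".
if phrase.find("world") != -1:
    print("phrase mentions the world")
```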
        

To replace one substring with another, the replace method works well:

python code
greeting = phrase.replace("world", "Mom") # greeting = "Hello, Mom!"
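Note that replace does not change phrase itself. Strings cannot be modified in place, so the method returns a new string. An optional third argument limits how many occurrences are replaced. A minimal sketch:

```python
phrase = "Hello, world!"
greeting = phrase.replace("world", "Mom")
print(greeting)  # Hello, Mom!
print(phrase)    # Hello, world!  (unchanged)

# Replace only the first "l" rather than all three.
print(phrase.replace("l", "L", 1))  # HeLlo, world!
```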
        

You can split a string into a list of its component words:

python code
words = phrase.split() # words[0] = "Hello,"; words[1] = "world!"
        

(You can specify a different separator by passing it as an argument to the split method. This is occasionally useful when turning text data delimited by tabs or commas into lists.)
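For example, here is splitting on the default whitespace compared with splitting on an explicit comma or tab separator. The sample strings are made up for illustration:

```python
phrase = "Hello, world!"
words = phrase.split()  # default: split on runs of whitespace
print(words)            # ['Hello,', 'world!']

row = "apples,oranges,pears"
print(row.split(","))   # ['apples', 'oranges', 'pears']

line = "1\t2\t3"
print(line.split("\t"))  # ['1', '2', '3']
```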

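Lastly, you can convert strings into upper or lower case with the 'upper' and 'lower' methods. Note the parentheses: without them, you get the method itself rather than the converted string.

python code
print(phrase.upper()) # outputs HELLO, WORLD!
print(phrase.lower()) # outputs hello, world!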
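Like replace, these methods return a new string and leave the original alone. One handy use is lowercasing text before comparing it, so that case is ignored. The variable answer is only an example:

```python
phrase = "Hello, world!"
print(phrase.upper())  # HELLO, WORLD!
print(phrase.lower())  # hello, world!
print(phrase)          # Hello, world!  (unchanged)

answer = "YeS"
print(answer.lower() == "yes")  # True
```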
        

5.2. Lists & Arrays

Arrays are rather like matrices in linear algebra. In plain Python, however, you will mostly deal with lists: ordered collections of items, which can be indexed in all the same ways as strings:

python code
phrase = ["an", "apple", "a", "day"]
print(phrase[0]) # prints: an
print(phrase[2]) # prints: a
print(phrase[1:3]) # prints: ['apple', 'a']
        

You can have lists of lists (multidimensional lists):

python code
threesquarematrix = [[1,2,3],[4,5,6],[7,8,9]]
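Each inner list is one row, so you reach an element with two indices in a row: first the row, then the position within that row. A minimal sketch:

```python
threesquarematrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(threesquarematrix[1])     # [4, 5, 6]  (second row)
print(threesquarematrix[1][2])  # 6          (third element of the second row)
print(len(threesquarematrix))   # 3          (number of rows)
```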
        

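Lists can be treated in much the same way strings can: they can be concatenated with +, repeated with *, measured with len and sliced. Unlike strings, however, lists can also be changed in place, for example by assigning to an index or by calling append.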
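This sketch shows the string-like operations on a list, followed by the in-place changes that strings do not allow. The fruit names are just sample data:

```python
fruit = ["apple", "banana"]
more = fruit + ["cherry"]             # concatenation builds a new list
print(more)                           # ['apple', 'banana', 'cherry']
print(fruit * 2)                      # ['apple', 'banana', 'apple', 'banana']
print(len(more), more[-1], more[:2])  # 3 cherry ['apple', 'banana']

# Unlike strings, lists can be changed in place.
more[0] = "apricot"
more.append("date")
print(more)                           # ['apricot', 'banana', 'cherry', 'date']
```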

There is also something called a tuple, which is mostly the same as a list. However, a tuple is created with round parentheses rather than square brackets, and its contents cannot be changed after creation (e.g., trying to assign mytuple[3] = "orange" raises a TypeError).
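A minimal sketch of that behaviour, with a made-up tuple named point. It also shows one common way around the restriction: converting the tuple to a list with list() when you need to change it.

```python
point = (3, 4, 5)           # a tuple: round parentheses
print(point[0], point[1:])  # 3 (4, 5)

try:
    point[0] = 10           # tuples cannot be changed after creation
except TypeError as err:
    print("TypeError:", err)  # 'tuple' object does not support item assignment

as_list = list(point)       # convert to a list if you need to change it
as_list[0] = 10
print(as_list)              # [10, 4, 5]
```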

5.2.1. Array operators

The numpy library provides a true array data type, which can be created from a list or tuple with numpy's 'array' function:

python code
array([1, 2, 3, 4])

Note the presence of both the parentheses and the square brackets. The square brackets denote a list, while the parentheses call the array function, which turns that list into an array.

numpy is not part of Python's standard library, so it may need to be installed first. To use numpy arrays, you must include one of the following lines at the start of your code:

python code
import numpy
        

or

python code
from numpy import *
        

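With import numpy, the function is written numpy.array([1, 2, 3, 4]); with from numpy import *, plain array([1, 2, 3, 4]) works as shown above. For more on arrays, refer to the numpy tutorial.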

5.3. Objects; conversion of data types

Python is an object-oriented programming language, and in Python essentially everything, including numbers and strings, is an object. Python hides most of this machinery from you unless you need it. In general, when you write a line such as the following:

python code
phrase.replace("world", "Mom")
        

you are invoking the "method" 'replace' on the string object named phrase with the arguments given in the parentheses.

Remember that although every value in Python is an object, objects can have different types even if they seem to hold the same information (e.g. the string "5" is not the same as the number 5). This is why data type conversion is important.
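Here is a short sketch of how the string "5" and the number 5 behave differently, and how converting first makes them work together:

```python
text = "5"
number = 5

print(text == number)             # False: a str is never equal to an int
print(type(text), type(number))   # <class 'str'> <class 'int'>
print(text + text)                # 55  (string concatenation)
print(number + number)            # 10  (arithmetic)
print(int(text) + number)         # 10  (convert first, then add)
```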

The following functions are useful:

python code
b = int(a) # interprets variable 'a' as an integer number and puts it into variable 'b', truncating any decimals (i.e. rounding toward zero)
b = float(a) # interprets variable 'a' as a floating point (decimal) number and puts it into variable 'b'
b = str(a) # formats the variable 'a' into a nicely-printable string
b = bin(a) # turns the integer 'a' into a binary string with a '0b' prefix, e.g. bin(5) gives '0b101'
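Here are a few concrete conversions. Two edge cases are worth knowing. First, int truncates toward zero, so negative numbers are rounded up. Second, int cannot directly parse a string that contains a decimal point.

```python
print(int("42"))     # 42
print(int(3.9))      # 3
print(int(-3.9))     # -3  (truncated toward zero, not rounded down)
print(float("2.5"))  # 2.5
print(str(7) + "!")  # 7!
print(bin(5))        # 0b101

try:
    int("3.5")       # a decimal string is not a valid int literal
except ValueError:
    print(int(float("3.5")))  # 3: go through float first
```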
        

N.B. Remember, the input function always returns a string! So convert your result to the appropriate type before doing anything with it.
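Here is a minimal sketch of that pattern. The helper name to_number is hypothetical, not a built-in. It takes the text as an argument so that you can try it without typing anything; in a real program you would pass it the result of input(...).

```python
def to_number(text):
    """Hypothetical helper: turn text typed by a user into an int or a float."""
    try:
        return int(text)    # whole numbers such as "42"
    except ValueError:
        return float(text)  # decimals such as "2.5"; raises ValueError otherwise

print(to_number("42") + 1)   # 43
print(to_number("2.5") * 2)  # 5.0

# Typical use: age = to_number(input("How old are you? "))
```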
