The string is one of the simplest data types in Python. Strings can be created by enclosing text in either single quotation marks (') or double quotation marks (").
There are some characters that cannot be easily expressed within a string. These characters, called escape characters, can be written inside a string as a sequence of two or more characters. In Python, we denote escape characters with a backslash (\) at the beginning. For example, to start a new line in the string we could add a linefeed (\n).
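As a small illustration of the escape characters described above, the sketch below uses a few common ones. The variable names are made up for this example.

```python
# A few common escape sequences; each one starts with a backslash.
two_lines = "first\nsecond"      # \n starts a new line
tabbed = "name\tvalue"           # \t inserts a tab
quoted = 'It\'s here'            # \' puts a single quote inside single quotes
backslash = "C:\\temp"           # \\ produces one literal backslash

print(two_lines)
print(tabbed)
print(quoted)
print(backslash)

# An escape sequence counts as a single character.
print(len("a\nb"))  # 3
```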
Now, let's say you want to print out some multi-line text. You could do it like this.
>>> print("""\
... Heya!
... Hi!
... Hello!
... Welcome!""")
Heya!
Hi!
Hello!
Welcome!
Some of you may have noticed that print() automatically ends its output with an extra linefeed (\n). There is a way to bypass this: pass a different value for the end argument.
>>> print("I love Wikiversity!", end="")
I love Wikiversity!>>>
A useful way to span a string across multiple lines without inserting automatic linefeeds is to use parentheses. Adjacent string literals inside the parentheses are joined into one string.
>>> spam = ("Hello, "
... "world!")
>>> print(spam)
Hello, world!
Formatting
Strings in Python can be given special formatting, much like strings in C. Formatting makes it easier to produce well-formatted output. You can format a string using a percent sign (%) or you could use the newer curly brackets ({}) formatting. A simple example is given below.
>>> print("The number three (%d)." % 3)
The number three (3).
The above code uses a special format specifier (%d), which is replaced with a decimal integer. The value after the percent sign (%) that follows the string is what replaces the format specifier. That can be a lot to take in. Let's demonstrate this a couple more times.
>>> name = "I8086"
>>> print("Copyright (c) %s 2014" % name)
Copyright (c) I8086 2014
This time, we use a different format type, one that inserts a string. You'll need to do some extra work if the string contains more than one format specifier.
>>> name = "I8086"
>>> date = 2014
>>> print("Copyright (c) %s %d" % (name, date))
Copyright (c) I8086 2014
Notice the need for parentheses and the comma. The % operator is applied before the comma is considered, so if we don't add the parentheses around the format arguments, only name is passed to the format string and we get an error.
>>> name = "I8086"
>>> date = 2014
>>> print("Copyright (c) %s %d" % name, date)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: not enough arguments for format string
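The error comes from how Python groups the expression. A rough sketch of the two groupings is given below; the variables good and bad are only there for illustration.

```python
name = "I8086"
date = 2014

# With parentheses, both values reach the format string as one tuple.
good = "Copyright (c) %s %d" % (name, date)
print(good)  # Copyright (c) I8086 2014

# Without parentheses, Python reads the call as ("..." % name), date,
# so the format string receives one value for two placeholders.
try:
    bad = "Copyright (c) %s %d" % name
except TypeError as error:
    bad = None
    print("TypeError:", error)
```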
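To keep you from guessing what is what, here is a table of the common format types with a little information about each. The table follows the curly-bracket style; the percent sign style accepts the same letters except 'b' and 'n'.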
Type | Meaning |
s | String format. Default for formatting. |
b | Binary format. |
c | Converts an integer to a Unicode character before it is formatted. |
d | Decimal format. |
o | Octal format. |
x | Hexadecimal format. Uses lowercase letters for the digits a-f. |
X | Hexadecimal format. Uses uppercase letters for the digits A-F. |
n | Number format. This is the same as 'd', except that it uses the current locale setting to insert the appropriate number separator characters.[2] |
e | Exponent notation. Prints a number in scientific notation. Default precision is 6. |
E | Exponent notation. Same as 'e', except it prints 'E' in the notation. |
f | Fixed point. Displays a fixed-point number. Default precision is 6. |
F | Fixed point. Same as 'f', but converts nan to NAN and inf to INF .[3] |
g | General format. |
G | General format. Switches to 'E' if numbers are too large. |
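The table above can be tried out with a short sketch. It assumes the curly-bracket style is used through str.format(), where the format type follows a colon inside the brackets. The values are arbitrary.

```python
# Integer format types from the table.
print("{:d}".format(255))   # 255
print("{:b}".format(255))   # 11111111
print("{:o}".format(255))   # 377
print("{:x}".format(255))   # ff
print("{:X}".format(255))   # FF
print("{:c}".format(65))    # A

# Floating-point format types; the default precision is 6.
print("{:e}".format(1234.5))  # 1.234500e+03
print("{:f}".format(1234.5))  # 1234.500000
print("{:g}".format(1234.5))  # 1234.5

# The percent sign style accepts most of the same type letters.
print("%x and %s" % (255, "text"))  # ff and text
```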
Indexing
Strings in Python support indexing, which allows you to retrieve a single character from the string. The concept is easier to grasp from an example, so we'll show you some indexing before explaining how it's done.
>>> "Hello, world!"[1]
'e'
>>> spam = "Hello, world!"
>>> spam[1]
'e'
By putting the index number inside brackets ([]), you can extract a character from a string. But what magic numbers correspond to the characters? Indexing in Python starts at 0, so the maximum index of a string is one less than its length. Let's try to index a string beyond its limits.
>>> spam = "abc"
>>> spam[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
Here's a little chart of the character positions in "Hello, world!".
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
H | e | l | l | o | , | | w | o | r | l | d | ! |
Hopefully that chart above helped to visually clarify some things about indexing. Now that we know the formula for the last character in a string, we should be able to get that character.
>>> eggs = "Hello, world!"
>>> eggs[len(eggs)-1]
'!'
Note: In Python, and most languages, the length of a string is measured by how many characters are contained within the string. The string "abc" is only 3 characters long.
In the above code, we used the formula, string length minus one, to get the last character of a string. By using the built-in function len(), we can get the length of a string. In this instance, len() returns 13, from which we subtract 1, resulting in 12. This can be a bit exhausting and repetitive when you need to do it over and over again. Luckily, Python has a special indexing method that allows you to get the last character of a string without needing to know the string's length. By using negative numbers, we can index from right to left instead of left to right.
>>> spam = "I love Wikiversity!"
>>> spam[-1]
'!'
>>> spam[-2]
'y'
Note: Since -0 is still considered 0 in Python, negative indexing starts with -1. For the 19-character string above, spam[-19] is 'I', while spam[-18] is not 'I' but ' '.
There is a table below showing the indexing number corresponding to the character. Take some time to study the table.
-19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
I | | l | o | v | e | | W | i | k | i | v | e | r | s | i | t | y | ! |
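To connect the two ways of indexing, the sketch below checks that a negative index -k picks the same character as the index len(spam) - k. It is only a quick check of the table, not a new technique.

```python
spam = "I love Wikiversity!"

# A negative index -k refers to the same character as len(spam) - k.
for k in range(1, len(spam) + 1):
    assert spam[-k] == spam[len(spam) - k]

print(spam[-1], spam[len(spam) - 1])   # ! !
print(spam[-19], spam[0])              # I I
```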
It is important that you understand that strings are immutable, which means that their content cannot be changed in place. Immutable data types have a fixed value that cannot change. The only way to get a different value is to re-assign the variable to a new string.
>>> spam = "Hello,"
>>> spam = spam + " world!"
>>> spam
'Hello, world!'
In the above example, spam is re-assigned to a different value. So what does this have to do with indexing? Well, the same rule applies to indexing: the character at an index cannot be assigned a new value or otherwise modified. The example below will help clarify this concept.
>>> spam = "Hello, world!"
>>> spam[3] = "y"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
>>> spam[7] = " Py-"
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Re-assigning a string variable while replacing part of the string takes a little extra work with slicing. If you aren't familiar with slicing, it is taught in the next section. You'll probably want to come back to this after you're done reading that section.
>>> spam = "Hello, world!"
>>> spam = spam[:2] +"y" + spam[3:]
>>> spam
'Heylo, world!'
>>> spam = "Hello, world!"
>>> spam = spam[:6] + " Py-" + spam[7:]
>>> spam
'Hello, Py-world!'
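The pattern above can be wrapped in a small function. This is a minimal sketch: replace_at is a hypothetical helper written for this example, not something built into Python, and it assumes a non-negative index that lies within the string.

```python
def replace_at(text, index, replacement):
    """Return a new string with the character at index swapped for replacement.

    Hypothetical helper for illustration; assumes 0 <= index < len(text).
    """
    # Everything before index, then the replacement, then everything after it.
    return text[:index] + replacement + text[index + 1:]

spam = "Hello, world!"
print(replace_at(spam, 0, "J"))      # Jello, world!
print(replace_at(spam, 6, " Py-"))   # Hello, Py-world!
print(spam)                          # Hello, world! (the original is unchanged)
```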
Slicing
Slicing is an important concept that you'll be using in Python. Slicing allows you to extract a substring from a string. A substring is a part of a string, so "I", "love", and "Python" are all substrings of "I love Python.". When you slice in Python, you'll need to remember that the colon (:) is important. It would be better to show you than to tell you right away how to slice strings.
>>> spam = "I love Python."
>>> spam[0:1]
'I'
>>> spam[2:6]
'love'
>>> spam[7:13]
'Python'
As you can see, slicing builds on Python's indexing concepts, which were taught in the previous section. spam[0:1] gets the substring that starts with the character at index 0 and stops just before the character at index 1. So the first number is where you start your slice, and the number after the colon (:) is where you end your slice; the character at the end position is not included.
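One way to see where a slice starts and ends is to rebuild it one character at a time. The sketch below does that; the names start, stop, piece, and same_piece are chosen just for this example.

```python
spam = "I love Python."
start, stop = 2, 6

# The slice includes index start and stops just before index stop.
piece = spam[start:stop]
same_piece = "".join(spam[i] for i in range(start, stop))

print(piece)                 # love
print(piece == same_piece)   # True
print(len(piece))            # 4, which is stop - start
print(repr(spam[stop]))      # ' ' (index 6 is not part of the slice)
```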
Slicing like this can be helpful in some situations, but what if you'd like to get just the first few characters of a string, or everything after them? We could use the len() function to help us, but there is an easier way. If one of the numbers in the slice is omitted, the slice runs from the beginning or to the end of the string, depending on which number was omitted.
>>> eggs = "Hello, world!"
>>> eggs[:6]
'Hello,'
>>> eggs[6:]
' world!'
By slicing like this, we can remove or get part of a string without needing to know its length. As you can see from the example above, eggs[:6] and eggs[6:] joined together are equal to eggs. Because the end of a slice is not included, the character at index 6 appears only in the second slice, which ensures that we don't get the same character in both strings.
>>> eggs = "Hello, world!"
>>> eggs[:6]+eggs[6:]
'Hello, world!'
>>> eggs[:6] + eggs[6:] == eggs
True
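The same property holds for every split position, not only 6. The sketch below checks this, and also shows that negative numbers can be used in slices, counting from the right as they do in indexing.

```python
eggs = "Hello, world!"

# Splitting at any position i gives two pieces that rebuild the original.
for i in range(len(eggs) + 1):
    left, right = eggs[:i], eggs[i:]
    assert left + right == eggs
    assert len(left) + len(right) == len(eggs)

# Negative numbers work in slices too, counting from the right.
print(eggs[-6:])   # world!
print(eggs[:-1])   # Hello, world
```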
Out-of-range positions are handled differently when slicing than when indexing. Attempting to index a string with a number larger than (or equal to) its length produces an IndexError.
>>> "Hiya!"[10]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
While slicing, this kind of error is suppressed: the slice returns whatever part of the requested range exists, which may be an empty string ('').
>>> "Hiya!"[10:]
''
>>> "Hiya!"[10:11]
''
>>> "Hiya!"[:10]
'Hiya!'
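If you want indexing to be as forgiving as slicing, one option is to catch the IndexError yourself. The sketch below assumes a hypothetical helper named char_at, which is not part of Python.

```python
def char_at(text, index, default=""):
    """Return text[index], or default when index is out of range.

    Hypothetical helper for this example only.
    """
    try:
        return text[index]
    except IndexError:
        return default

greeting = "Hiya!"
print(repr(char_at(greeting, 1)))     # 'i'
print(repr(char_at(greeting, 10)))    # ''
print(repr(greeting[10:11]))          # '' (the slice needs no error handling)
```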
Encoding
So we know what a string is and how it works, but what really is a string? Underneath, a string is stored as numbers, and the encoding decides which number stands for which character. The two most prominent character sets are ASCII and Unicode. ASCII is a simple encoding that covers the basic Latin letters, digits, punctuation, and a few symbols such as the dollar sign, but not accented letters or other scripts. Unicode is a much larger standard that defines well over 100,000 characters. Its purpose is to provide one standard that can represent all of the world's alphabets, characters, and scripts. In Python 3, strings are Unicode by default, so we can put almost any character into a string and have it print correctly. This is great news for speakers of languages other than English, since ASCII doesn't permit many types of characters. In fact, ASCII defines only 128 characters! Here are some examples using different languages, some with non-Latin characters.
>>> print("Witaj świecie!")
Witaj świecie!
>>> print("Hola mundo!")
Hola mundo!
>>> print("Привет мир!")
Привет мир!
>>> print("שלום עולם!")
שלום עולם!
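To see the difference between characters and their encoded form, the sketch below encodes one of the words above. It uses the built-in str.encode() and bytes.decode() methods; the choice of UTF-8 as the target encoding is an assumption made for this example.

```python
# "\u015b" is the letter s with an acute accent, so this spells the Polish
# word from the first example above.
word = "\u015bwiecie"

# A Python 3 string is a sequence of Unicode characters.
print(len(word))             # 7
print(ord(word[0]))          # 347, the Unicode code point of the first letter

# Encoding turns the characters into bytes; UTF-8 needs two bytes for
# the accented letter and one byte for each of the others.
utf8_bytes = word.encode("utf-8")
print(len(utf8_bytes))       # 8
print(utf8_bytes.decode("utf-8") == word)  # True

# ASCII has no code for the accented letter, so encoding to ASCII fails.
try:
    word.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)              # False
```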