How do I interact with files in Python?

Python comes with libraries that allow your programs to interact with files on your computer. This document covers part of the os module.

File Systems

Your computer drive is organized in a hierarchical structure of files and directories.

Your filesystem starts from a root directory, notated by a forward slash / on Unix and by a drive letter such as C:/ on Windows.

Absolute and Relative file paths

Absolute file paths are notated by a leading forward slash or drive label. For example, /home/example_user/example_directory or C:/system32/cmd.exe. An absolute file path describes how to access a given file or directory, starting from the root of the file system. A file path is also called a pathname.

Relative file paths are notated by the lack of a leading forward slash or drive label. For example, example_directory. A relative file path is interpreted from the perspective of your current working directory. If you use a relative file path from the wrong directory, then the path will refer to a different file than you intend, or it will refer to no file at all.

In a sense, whenever you use a relative file path, it is joined with your current working directory to create an absolute file path. That is, if your current working directory is /home/example_user and you use a relative file path of example_directory/example_python_program.py, then that is equivalent to using the absolute file path /home/example_user/example_directory/example_python_program.py.
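
The joining described above can be sketched in Python. This is only an illustration: the helper name to_absolute is hypothetical and not part of the os module. The standard library function os.path.abspath does the same job (and also tidies up the result), so in real programs you would normally call that instead.

```python
import os

def to_absolute(relative_path, working_directory=None):
    """Join a relative path onto a working directory (hypothetical helper)."""
    if working_directory is None:
        # Default to the directory the program was started from.
        working_directory = os.getcwd()
    return os.path.join(working_directory, relative_path)

relative = os.path.join("example_directory", "example_python_program.py")
absolute = to_absolute(relative)
print(absolute)

# For a simple relative path like this one, os.path.abspath agrees.
print(absolute == os.path.abspath(relative))
```

The file does not need to exist for this to work: both functions only manipulate the text of the path.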

In the following example usage of a Unix command-line shell, the current working directory is initially /home/example_user/example_directory. There is a program called example_python_program.py, which prints "this is an example python program". At first, the program can be referenced by the relative file path example_python_program.py. After the directory is changed to /home/example_user, the relative file path to access the program becomes example_directory/example_python_program.py. Please note that the $ symbolizes a prompt where the user is allowed to type.

$ pwd
/home/example_user/example_directory
$ ls
example_python_program.py
$ python example_python_program.py
this is an example python program
$ cd ..
$ pwd
/home/example_user
$ python example_python_program.py
python: can't open file 'example_python_program.py': [Errno 2] No such file or directory
$ python example_directory/example_python_program.py
this is an example python program

os.getcwd

When you run a Python program, its current working directory is initialized to whatever your current working directory was when you ran the program. It can be retrieved using the function os.getcwd. Consider the following program, cwd_printer.py.

import os
print("The current working directory is " + os.getcwd())

The following example usage of the command-line shell illustrates how Python's current working directory is set.

$ pwd
/home/example_user/example_directory
$ ls
cwd_printer.py
$ python cwd_printer.py
The current working directory is /home/example_user/example_directory
$ cd ..
$ pwd
/home/example_user
$ python example_directory/cwd_printer.py
The current working directory is /home/example_user

os.listdir and os.path.join

The os.listdir function takes one argument: an absolute or relative pathname, which should refer to a directory. The function returns a list of relative pathnames (strings), one for each file or directory inside the given directory.
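
As a small sketch of what os.listdir returns, the following program builds a temporary directory and lists it. The file names people.txt and places.txt and the use of the tempfile module are assumptions made for the illustration, so that the example does not depend on files already on your computer.

```python
import os
import shutil
import tempfile

# Create a throwaway directory containing two small files.
scratch_directory = tempfile.mkdtemp()
for name in ("people.txt", "places.txt"):
    with open(os.path.join(scratch_directory, name), "w") as handle:
        handle.write("example\n")

# os.listdir returns bare names in no particular order, so sort them.
names = sorted(os.listdir(scratch_directory))
print(names)  # ['people.txt', 'places.txt']

# Remove the throwaway directory again.
shutil.rmtree(scratch_directory)
```

Notice that the returned strings do not include the directory itself.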

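These strings should be used relative to os.listdir's argument. Consider the following situation: the directory /home/example_user contains a directory named data, which in turn contains a single file, people.txt.

The following example usage shows a wrong and a right way to open people.txt.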

$ pwd
/home/example_user
$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> filenames = os.listdir("data")
>>> filenames
['people.txt']
>>> file = open(filenames[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'people.txt'
>>> pathname = os.path.join("data", filenames[0])
>>> pathname
'data/people.txt'
>>> open(pathname)
<open file 'data/people.txt', mode 'r' at 0x7f74ad5d8270>
>>>

Notice that when the first attempt to open the file was made, Python reported that no such file exists. Consider that we are trying to open the file with the absolute path of /home/example_user/data/people.txt. When we try to use the relative path of people.txt (recall that this is interpreted as relative because it does not have a leading /), our computer will combine the relative path with the current working directory, which in this case is /home/example_user. This generates the incorrect absolute path of /home/example_user/people.txt.

Instead, we can combine data and people.txt using os.path.join to generate the relative path data/people.txt. When we try to use this new relative path, our computer generates the absolute path /home/example_user/data/people.txt, which is what we want.
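
The same pattern applies whenever you loop over the result of os.listdir: join each name with the directory before using it. Here is a minimal sketch, in which the helper name read_all_files and the sample file contents are hypothetical and a temporary directory stands in for data.

```python
import os
import shutil
import tempfile

def read_all_files(directory):
    """Map each file name in directory to its contents (hypothetical helper)."""
    contents = {}
    for name in os.listdir(directory):
        # Join the bare name with the directory it was listed from.
        pathname = os.path.join(directory, name)
        if os.path.isfile(pathname):  # skip subdirectories
            with open(pathname) as handle:
                contents[name] = handle.read()
    return contents

# Build a stand-in for the data directory, read it, then clean up.
data_directory = tempfile.mkdtemp()
with open(os.path.join(data_directory, "people.txt"), "w") as handle:
    handle.write("Ada\nGrace\n")
result = read_all_files(data_directory)
shutil.rmtree(data_directory)
print(result)
```

Because every name is joined with its directory first, this works no matter what the current working directory is.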

Do not combine paths using string concatenation (+) or anything other than os.path.join. Different computers represent paths in different ways. In particular, Windows uses \ as a directory separator in pathnames, while Unix (Mac and Linux) machines use / as a directory separator in pathnames. Your code will not handle pathnames as elegantly and correctly as os.path.join will.
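
To see the difference between platforms without switching computers, the sketch below calls the two standard library modules that sit behind os.path: posixpath on Unix and ntpath on Windows. Importing them directly is done here only for demonstration; ordinary programs should keep using os.path.join, which picks the right one automatically.

```python
import ntpath      # the Windows flavor of os.path
import posixpath   # the Unix flavor of os.path

unix_path = posixpath.join("data", "people.txt")
windows_path = ntpath.join("data", "people.txt")
print(unix_path)     # data/people.txt
print(windows_path)  # data\people.txt

# Plain concatenation silently forgets the separator.
careless = "data" + "people.txt"
print(careless)      # datapeople.txt

# join also avoids doubling a separator that is already there.
print(posixpath.join("data/", "people.txt"))  # data/people.txt
```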