File Handling
Python reads and writes files using the built-in open() function. The with statement ensures files are closed properly, even when errors occur. Text mode, binary mode, encoding, and file position are the main concepts to know.
File handling shows up in data processing, config loading, and logging. Knowing when to use text vs binary mode, how read() affects the file position, and why with matters avoids subtle bugs and impresses interviewers.
Opening Files
open(file, mode='r', encoding=None) returns a file object. The file path can be a string or a pathlib.Path. The mode controls whether the file is read, written, or appended, and whether it is text or binary.
| Mode | Description |
|---|---|
| 'r' | Read (default). File must exist. |
| 'w' | Write. Creates or truncates the file. |
| 'a' | Append. Creates the file if missing; writes at the end. |
| 'x' | Exclusive create. Fails if the file exists. |
| 'b' | Binary mode. Combine with r, w, a, or x. |
| 't' | Text mode (default). |
| '+' | Open for updating (read and write). Combine with r, w, a, or x. |
Combine modes: 'rb' reads binary, 'wb' writes binary, 'r+' reads and writes text. The 'b' and 't' modes are mutually exclusive.
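The '+' combinations are easy to mix up. 'r+' and 'w+' both allow reading and writing, but 'r+' keeps the existing content (and requires the file to exist), while 'w+' truncates at open time. The minimal sketch below uses a scratch file in a temporary directory so it runs anywhere. The file name demo.txt is only for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"  # hypothetical scratch file
    path.write_text("original\n", encoding="utf-8")

    # 'r+': content is kept; position starts at 0
    with open(path, "r+", encoding="utf-8") as f:
        kept = f.read()   # existing content is still there
        f.seek(0)
        f.write("O")      # overwrites the first character in place
    overwritten = path.read_text(encoding="utf-8")

    # 'w+': the file is emptied as soon as open() returns
    with open(path, "w+", encoding="utf-8") as f:
        after_truncate = f.read()  # "" because the old content is gone
        f.write("fresh\n")
        f.seek(0)                  # rewind before reading back what was written
        fresh = f.read()
```

Note that 'r+' never shortens the file. Writing fewer characters than were there leaves the rest of the old content in place.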
```python
# Each call returns a new file object that must eventually be closed
f = open("data.txt", "r")    # read text (default)
f = open("image.png", "rb")  # read binary
f = open("out.txt", "w")     # write text (truncates)
f = open("log.txt", "a")     # append text
```

Encoding: In text mode, encoding defaults to the locale's preferred encoding. This is usually UTF-8 on Linux and macOS and often cp1252 on Windows. Specify it explicitly for portability: open("file.txt", encoding="utf-8").
Newline: The newline parameter controls how line endings are translated in text mode. The default None translates \n to the platform line separator when writing, and normalizes \r\n and \r to \n when reading. Use newline="" for CSV files so that newlines inside quoted fields are read correctly and no extra \r is added when writing on Windows.
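To see the read-side translation without depending on the current OS, the sketch below writes raw bytes with Windows-style \r\n endings in binary mode. It then reads them back in text mode twice: once with the default newline=None and once with newline="". The file name and CSV-like contents are made up for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "crlf.txt"
    path.write_bytes(b"a,b\r\nc,d\r\n")  # \r\n line endings on disk

    # Default newline=None: universal newlines, \r\n is read as \n
    with open(path, encoding="utf-8") as f:
        translated = f.read()

    # newline="": lines are still split, but endings come back untouched
    with open(path, encoding="utf-8", newline="") as f:
        untranslated = f.read()
        f.seek(0)
        lines = f.readlines()
```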
The with Statement
The with statement guarantees the file is closed when the block exits, whether by normal completion or by an exception. It calls __enter__ when entering and __exit__ when leaving, and __exit__ closes the file.
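One way to picture the guarantee is the try/finally that a with block over open() roughly corresponds to. This is a simplification, since the real protocol also passes exception details to __exit__. The sketch uses a temporary file so it runs anywhere, and also shows that the file is closed when the block raises.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_text("hello\n", encoding="utf-8")

    # Roughly what "with open(path) as f: content = f.read()" does
    f = open(path, encoding="utf-8")
    try:
        content = f.read()
    finally:
        f.close()  # runs whether or not read() raised

    # The with form still closes the file when the block raises
    try:
        with open(path, encoding="utf-8") as g:
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass
```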
```python
with open("data.txt") as f:
    content = f.read()
# f is closed here, even if read() raised an error
```

Without with: If an exception occurs before f.close(), the file stays open, possibly until it is garbage-collected. A program that leaks many handles can hit the operating system's limit and fail with OSError: [Errno 24] Too many open files. Always prefer with for file handling.
```python
# Risky: file may stay open on error
f = open("data.txt")
content = f.read()
f.close()

# Safe: file closed on any exit path
with open("data.txt") as f:
    content = f.read()
```

Multiple files: Open several files in one with statement by separating them with commas.
```python
with open("in.txt") as fin, open("out.txt", "w") as fout:
    for line in fin:
        fout.write(line.upper())
```

Reading Files
read()
f.read(size) reads up to size characters (text mode) or bytes (binary mode). With no argument, it reads until the end of the file. Each call advances the file position.
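Each read(size) call advances the position, and an empty result signals end of file. Together, these make a fixed-size chunk loop straightforward. The minimal sketch below uses binary mode and a deliberately tiny 4-byte chunk so the loop runs several times. Real code would pick something larger, such as 64 KiB. The file contents are arbitrary.

```python
import tempfile
from pathlib import Path

CHUNK_SIZE = 4  # tiny on purpose for the demo

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "payload.bin"
    path.write_bytes(b"0123456789")

    chunks = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)  # at most CHUNK_SIZE bytes
            if not chunk:               # b"" means end of file
                break
            chunks.append(chunk)
```

On Python 3.8+, the loop can also be written as while chunk := f.read(CHUNK_SIZE):.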
```python
with open("data.txt") as f:
    all_text = f.read()  # entire file as one string
```

With size: f.read(20) reads at most 20 characters. Useful for chunked reading of large files.
```python
with open("data.txt") as f:
    chunk = f.read(20)       # first 20 characters
    next_chunk = f.read(20)  # next 20
```

readline()
f.readline() reads one line, including the newline character (if present). Returns an empty string at end of file.
```python
# Assuming data.txt contains exactly two lines: "line one" and "line two"
with open("data.txt") as f:
    first = f.readline()   # "line one\n"
    second = f.readline()  # "line two\n"
    last = f.readline()    # "" at EOF
```

readlines()
f.readlines() reads all lines and returns a list of strings. Each string ends with \n except possibly the last line. For large files, this loads everything into memory.
```python
with open("data.txt") as f:
    lines = f.readlines()  # ["line 1\n", "line 2\n", "line 3\n"]
```

Iterating Over Lines
The file object is iterable. for line in f: reads one line at a time without loading the whole file. This is the preferred way to process large files line by line.
```python
with open("data.txt") as f:
    for line in f:
        print(line.strip())  # process each line
```

Memory: Iteration uses a buffer and does not load the entire file. readlines() loads all lines at once. For a file with millions of lines, iteration is the right choice.
Writing Files
write()
f.write(s) writes the string s to the file. In text mode, s must be a string. In binary mode, s must be bytes. Returns the number of characters or bytes written.
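In text mode, the return value counts characters, not the bytes that end up on disk. With a non-ASCII string under UTF-8, the two differ. A small sketch, with a scratch file in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.txt"
    with open(path, "w", encoding="utf-8") as f:
        chars_written = f.write("café")      # counts str characters
    size_on_disk = path.stat().st_size       # "é" takes 2 bytes in UTF-8

    with open(path, "wb") as f:
        bytes_written = f.write("café".encode("utf-8"))  # counts bytes
```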
```python
with open("out.txt", "w") as f:
    n = f.write("hello\n")
    f.write("world\n")
# n is 6 (length of "hello\n")
```

No automatic newline: write() does not add a newline. Add \n explicitly when writing lines.
```python
with open("out.txt", "w") as f:
    f.write("line 1\n")
    f.write("line 2\n")
```

writelines()
f.writelines(iterable) writes each string from the iterable. It does not add newlines between items. If each item should be a line, each string must end with \n.
```python
lines = ["a\n", "b\n", "c\n"]
with open("out.txt", "w") as f:
    f.writelines(lines)
```

Common mistake: writelines(["a", "b", "c"]) produces "abc" with no line breaks. Use "a\n", "b\n", "c\n" or write in a loop with write().
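When the items do not already end in newlines, one option is a generator expression that appends "\n" to each item as it is written. A sketch with made-up item names:

```python
import tempfile
from pathlib import Path

items = ["apple", "banana", "cherry"]  # no trailing newlines

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "items.txt"
    with open(path, "w", encoding="utf-8") as f:
        # writelines() adds nothing itself, so each item gets "\n" here
        f.writelines(f"{item}\n" for item in items)
    result = path.read_text(encoding="utf-8")
```

An alternative is f.write("\n".join(items)). Note that join() puts no newline after the last item.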
File Modes in Detail
Read ('r')
The file must exist. FileNotFoundError is raised if it does not. The file position starts at 0.
```python
# open("missing.txt")  # FileNotFoundError if file does not exist
with open("data.txt") as f:
    print(f.read())
```

Write ('w')
Creates the file if it does not exist. If it exists, it is truncated to zero length. All previous content is lost. Use 'a' when you want to keep existing content.
```python
with open("out.txt", "w") as f:
    f.write("new content\n")
# Previous content of out.txt is gone
```

Append ('a')
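Creates the file if it does not exist. If it exists, writes go to the end and existing content is preserved. The Python documentation notes that on some Unix systems, all writes go to the end regardless of the current seek position. Because this is platform-dependent, do not rely on seek() to write earlier in the file.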
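A quick check that 'a' keeps existing content, adds new writes after it, and creates a missing file. The log file names are throwaway files in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")
    with open(path, "a", encoding="utf-8") as f:  # existing content preserved
        f.write("second\n")
    with open(path, "a", encoding="utf-8") as f:
        f.write("third\n")
    log = path.read_text(encoding="utf-8")

    new_path = Path(tmp) / "new.log"
    with open(new_path, "a", encoding="utf-8") as f:  # created because it is missing
        f.write("entry\n")
    created = new_path.exists()
```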
```python
with open("log.txt", "a") as f:
    f.write("entry at end\n")
```

Exclusive Create ('x')
Creates the file only if it does not exist. Raises FileExistsError if the file already exists. Useful to avoid overwriting an existing file by mistake.
```python
# First run creates new.txt; a second run raises FileExistsError
with open("new.txt", "x") as f:
    f.write("first write\n")
```

Text vs Binary Mode
| Mode | Data type | Newline handling |
|---|---|---|
| Text ('t' or default) | str | Translates \n to/from platform newline |
| Binary ('b') | bytes | No translation; raw bytes |
Text mode: read() returns strings. write() accepts strings. With the default newline=None, \n is written as \r\n on Windows and left as \n on Unix. When reading, \r\n and \r are normalized to \n on every platform.
Binary mode: read() returns bytes. write() accepts bytes. No newline conversion. Use for images, executables, and any non-text data.
```python
# Text: strings
with open("text.txt", "w") as f:
    f.write("hello\n")

# Binary: bytes
with open("data.bin", "wb") as f:
    f.write(b"\x00\x02\x04\x06")
```

File Position: seek() and tell()
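Every file object has a position: the place where the next read or write occurs. tell() returns the current position and seek(offset, whence) moves it. In binary mode the position is a byte offset. In text mode, tell() returns an opaque number that is only meaningful when passed back to seek(). It is not a character count.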
| whence | Meaning |
|---|---|
| 0 (default) | From start of file |
| 1 | From current position |
| 2 | From end of file |
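```python
with open("data.txt", "rb") as f:  # binary mode: relative seeks are allowed
    f.read(4)       # read first 4 bytes; position now 4
    pos = f.tell()  # 4
    f.seek(0)       # back to start
    f.seek(2, 1)    # move 2 bytes forward from current position (whence=1)
```

Text mode caveat: Text files only allow seeks relative to the start of the file, with an offset of 0 or a value previously returned by tell(). A zero offset with whence=1 or whence=2 is accepted; for example, seek(0, 2) jumps to the end. A nonzero offset with whence=1 or whence=2 raises io.UnsupportedOperation on every platform. Use binary mode when random access is needed.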
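The restriction is easy to confirm. The same relative seek that works on a binary file raises io.UnsupportedOperation on a text file, while seek(0, 2) is still accepted. A sketch using a temporary file:

```python
import io
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_bytes(b"abcdefgh")

    with open(path, "rb") as f:
        f.read(2)
        f.seek(2, 1)            # binary: relative seek is fine
        binary_rest = f.read()  # everything from position 4 onward

    with open(path, encoding="utf-8") as f:
        f.read(2)
        try:
            f.seek(2, 1)        # text: nonzero relative seek
            text_relative_ok = True
        except io.UnsupportedOperation:
            text_relative_ok = False
        f.seek(0, 2)            # allowed: jump to end of file
        at_end = f.read()       # nothing left to read
```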
After read(): A full read() leaves the position at end of file. A subsequent read() returns an empty string. Use seek(0) to read again.
```python
with open("data.txt") as f:
    first = f.read()
    second = f.read()  # "" - position at EOF
    f.seek(0)
    again = f.read()   # same as first
```

Encoding
In text mode, encoding controls how bytes on disk are decoded to strings (read) and how strings are encoded to bytes (write). The default comes from the system; specify encoding="utf-8" for portable, Unicode-safe handling.
```python
with open("file.txt", encoding="utf-8") as f:
    content = f.read()

with open("out.txt", "w", encoding="utf-8") as f:
    f.write("café\n")
```

Encoding errors: Invalid bytes raise UnicodeDecodeError when reading, because the default is errors="strict". Use errors="replace" to substitute the replacement character U+FFFD, or errors="ignore" to drop the bad bytes silently: open("file.txt", encoding="utf-8", errors="replace").
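To compare the error handlers, the sketch writes the byte 0xE9 (é in Latin-1, but not valid UTF-8 on its own) and reads it back three ways. Since errors="ignore" silently loses data, errors="replace" is usually the safer fallback. Decoding with the correct codec is better still.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin1.txt"
    path.write_bytes(b"caf\xe9\n")  # Latin-1 encoded "café"

    try:
        with open(path, encoding="utf-8") as f:  # errors="strict" by default
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    with open(path, encoding="utf-8", errors="replace") as f:
        replaced = f.read()  # the bad byte becomes U+FFFD

    with open(path, encoding="latin-1") as f:
        correct = f.read()   # the right codec decodes it cleanly
```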
Path Handling with pathlib
The pathlib module provides Path objects for cross-platform path handling. Path works with open() and avoids string concatenation for paths.
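The pure path classes in pathlib make the separator handling visible on any OS. PurePosixPath and PureWindowsPath apply one platform's rules without touching the file system, so this sketch behaves the same everywhere:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix = PurePosixPath("data") / "file.txt"
windows = PureWindowsPath("data") / "file.txt"

# Windows paths accept forward slashes and normalize them to backslashes
mixed = PureWindowsPath("data/sub\\file.txt")
```

A plain Path resolves to whichever of these matches the running system. This is why Path("data") / "file.txt" is portable.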
```python
from pathlib import Path

p = Path("data") / "file.txt"
with open(p) as f:
    content = f.read()

# Create parent dirs
Path("logs/2024").mkdir(parents=True, exist_ok=True)
```

Useful methods: Path.exists(), Path.is_file(), Path.read_text(), Path.write_text(), Path.iterdir().
```python
from pathlib import Path

p = Path("config.txt")
if p.exists():
    text = p.read_text(encoding="utf-8")
p.write_text("new config", encoding="utf-8")
```

Common Patterns
Read Entire File
```python
with open("data.txt") as f:
    content = f.read()
```

Read Line by Line (Large Files)
```python
with open("large.txt") as f:
    for line in f:
        process(line)  # process() stands for your own per-line handler
```

Read Lines into a List
```python
with open("data.txt") as f:
    lines = f.readlines()
# or
with open("data.txt") as f:
    lines = list(f)
```

Write Multiple Lines
```python
lines = ["a\n", "b\n", "c\n"]
with open("out.txt", "w") as f:
    f.writelines(lines)
# or
with open("out.txt", "w") as f:
    for line in lines:
        f.write(line)
```

Append to a File
```python
with open("log.txt", "a") as f:
    f.write("new entry\n")
```

Copy a File
```python
# Fine for small text files; shutil.copyfile() copies any file without decoding it
with open("source.txt") as fin, open("copy.txt", "w") as fout:
    fout.write(fin.read())
```

Read and Process Numbers
```python
# numbers.txt: 2, 4, 6, 8 on separate lines
with open("numbers.txt") as f:
    values = [int(line.strip()) for line in f if line.strip()]  # skip blank lines
```

Buffering
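open() accepts a buffering argument. By default, binary files use a fixed-size buffer sized from the device's block size, with io.DEFAULT_BUFFER_SIZE as the fallback. On many systems this is 4096 or 8192 bytes, though the exact size depends on the system and Python version. Interactive text streams (where isatty() is true) use line buffering. Usually the default is fine. Pass buffering=1 for line buffering (text mode only) or buffering=0 for unbuffered I/O (binary mode only).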
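Buffering is why data can sit in memory after write() returns. The sketch below checks the on-disk size before and after flush(). It assumes a regular file rather than a terminal, so line buffering is off and the short write stays in Python's buffer until flushed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
        size_before_flush = os.path.getsize(path)  # data still in Python's buffer
        f.flush()                                  # hand the buffered data to the OS
        size_after_flush = os.path.getsize(path)
```

flush() only passes data to the operating system. If it must reach the physical disk (for example, before a crash-sensitive step), os.fsync(f.fileno()) is the usual extra step.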
Tricky Behaviors
Open - 'w' truncates immediately
Opening with 'w' truncates the file as soon as open() is called, not when the first write() happens. The previous content is lost before any write.
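Opening for writing and closing without a single write() is enough to empty a file, which is a common way to lose data by accident. A sketch on a throwaway file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "important.txt"
    path.write_text("precious data\n", encoding="utf-8")

    with open(path, "w", encoding="utf-8"):
        pass  # no write() at all

    size_after = path.stat().st_size
```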
Open - 'x' fails if file exists
'x' is for exclusive creation. If the file already exists, open() raises FileExistsError. Use it when overwriting would be a mistake.
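A typical use is a "create once" guard that catches FileExistsError instead of overwriting the file. A sketch; the helper name create_once is made up for this example:

```python
import tempfile
from pathlib import Path

def create_once(path: Path, text: str) -> bool:
    """Write text to path only if it does not exist yet; return True if created."""
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write(text)
    except FileExistsError:
        return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "new.txt"
    first = create_once(target, "first write\n")
    second = create_once(target, "second write\n")  # refused; content untouched
    content = target.read_text(encoding="utf-8")
```

The existence check and the creation happen in one open() call. This avoids the race window of checking Path.exists() first and opening with 'w' afterwards.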
Read - read() advances position
Each read() moves the position forward. A second read() without seek() returns what remains. After reading the whole file, read() returns an empty string.
Read - readlines() loads everything
readlines() reads the entire file into a list. For a 2 GB file, that needs at least 2 GB of memory, and more once per-string overhead is counted. Prefer for line in f: for large files.
Read - Last line may lack newline
The last line of a file might not end with \n. readlines() returns it as-is. for line in f: also returns it without a trailing newline. Check line.endswith('\n') if that matters.
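A common defensive pattern is to strip only the line terminator with rstrip("\n"). This handles the last line the same way whether or not it ends in a newline. Unlike strip(), it also keeps leading and trailing spaces that may be meaningful. Sketch with made-up contents:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "no_final_newline.txt"
    path.write_text("alpha\nbeta\ngamma", encoding="utf-8")  # no "\n" after gamma

    with open(path, encoding="utf-8") as f:
        raw = f.readlines()

    with open(path, encoding="utf-8") as f:
        clean = [line.rstrip("\n") for line in f]
```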
Write - No automatic newline
write("hello") and write("world") produce "helloworld" with no separator. Add \n explicitly for line-based output.
Write - writelines() does not add newlines
writelines(["a", "b"]) writes "ab". Each string must include its own newline if lines are desired.
Mode - 'a' ignores seek() for writing
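In append mode, seek() can move the position, which affects reads when the mode is 'a+'. However, writes typically go to the end regardless. The Python documentation describes this as the behavior on some Unix systems, so it can vary by platform. Do not rely on seek() to insert in the middle when using 'a'.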
Mode - Text vs binary and write()
In text mode, write() accepts only strings. In binary mode, it accepts only bytes-like objects (bytes, bytearray, memoryview). Passing the wrong type raises TypeError.
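The sketch below triggers both failure directions. It also shows the usual fix of decoding or encoding explicitly at the boundary between str and bytes. The file names are throwaway.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "t.txt"
    bin_path = Path(tmp) / "b.bin"
    errors = []

    with open(text_path, "w", encoding="utf-8") as f:
        try:
            f.write(b"bytes")              # bytes into a text file
        except TypeError as exc:
            errors.append(exc)
        f.write(b"bytes".decode("utf-8"))  # fix: decode to str first

    with open(bin_path, "wb") as f:
        try:
            f.write("text")                # str into a binary file
        except TypeError as exc:
            errors.append(exc)
        f.write("text".encode("utf-8"))    # fix: encode to bytes first

    text_result = text_path.read_text(encoding="utf-8")
    bin_result = bin_path.read_bytes()
```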
Encoding - Default is platform-specific
Without encoding, Python uses the default from the system. On Windows it might be cp1252; on Linux, often utf-8. Specify encoding="utf-8" for consistent behavior across machines.
Newline - Default translation on Windows
With newline=None (default), \n is converted to \r\n when writing on Windows, and \r\n is converted to \n when reading. For CSV or other formats with newlines inside quoted fields, use newline="" to avoid corrupting the data.
Path - String paths and separators
"data/file.txt" works on Unix and Windows (Python accepts / on Windows). "data\\file.txt" is Windows-specific. pathlib.Path handles separators for you.
with - Variable persists after block
The variable (e.g. f) still exists after the with block, but the file is closed. Calling f.read() after the block raises ValueError: I/O operation on closed file.
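A short demonstration on a temporary file. The name stays bound after the block, but the file object reports itself closed and refuses further I/O.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.txt"
    path.write_text("hello\n", encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        content = f.read()

    # f is still bound, but the underlying file is closed
    try:
        f.read()
        error_message = None
    except ValueError as exc:
        error_message = str(exc)
```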
with - Multiple files and early exit
When opening multiple files in one with, all are closed when the block exits. If an error occurs while opening the second file, the first is still closed correctly.
Open - FileNotFoundError vs IOError
FileNotFoundError (a subclass of OSError) is raised when the file does not exist in read mode. Other I/O problems raise other OSError subclasses, such as PermissionError for permission denied, or plain OSError for failures like a full disk. In Python 3, IOError is an alias for OSError.
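The hierarchy can be checked directly. Catching the specific subclass lets a program treat a missing file differently from other failures. In the sketch, load_or_default is a hypothetical helper, and any other OSError (such as PermissionError) propagates to the caller.

```python
import tempfile
from pathlib import Path

def load_or_default(path: Path, default: str = "") -> str:
    """Return the file's text, or default if the file does not exist.

    Other OSErrors (e.g. PermissionError) are not caught.
    """
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return default

with tempfile.TemporaryDirectory() as tmp:
    missing = load_or_default(Path(tmp) / "missing.txt", default="fallback")
    existing = Path(tmp) / "present.txt"
    existing.write_text("hello\n", encoding="utf-8")
    present = load_or_default(existing)

hierarchy_ok = (
    issubclass(FileNotFoundError, OSError)
    and issubclass(PermissionError, OSError)
    and IOError is OSError
)
```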
Interview Questions
What is the difference between 'r', 'w', and 'a' modes?
'r' reads; the file must exist. 'w' writes and truncates the file; it creates the file if missing. 'a' appends; it creates the file if missing and writes at the end without truncating.
Why use with when opening files?
with ensures the file is closed when the block exits, whether normally or via an exception. Without it, an error before close() can leave the file open, which may cause resource leaks or “too many open files” errors.
What happens if you call read() twice without seek()?
The first read() advances the position to the end. The second read() returns an empty string because there is nothing left to read. Use seek(0) to rewind and read again.
When would you use for line in f: instead of readlines()?
Use iteration when the file is large. readlines() loads every line into memory. for line in f: reads one line at a time and keeps memory use low.
What does writelines() do? Does it add newlines?
writelines(iterable) writes each string from the iterable. It does not add newlines. Each string must end with \n if line breaks are wanted.
What is the 'x' mode for?
'x' creates the file only if it does not exist. If the file exists, open() raises FileExistsError. Use it to avoid accidentally overwriting an existing file.
What is the difference between text and binary mode?
Text mode ('t') works with strings and translates newlines. Binary mode ('b') works with bytes and does no translation. Use binary for images, executables, and any non-text data.
When does 'w' truncate the file?
As soon as open() is called with 'w'. The file is truncated before any write(). The previous content is lost at open time.
What does seek(0) do?
It moves the file position to the start. After reading the whole file, seek(0) allows reading again from the beginning.
Why specify encoding="utf-8" when opening files?
The default encoding is platform-specific. Specifying encoding="utf-8" makes behavior consistent across systems and avoids UnicodeDecodeError when the file uses UTF-8 and the system default is different.
Can you use seek() with whence=1 or whence=2 in text mode?
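Not with a nonzero offset. Text files only allow seeks relative to the start, using 0 or a value returned by tell(). A zero offset with whence=1 or whence=2 is accepted, such as seek(0, 2) to jump to the end. A nonzero offset with whence=1 or whence=2 raises io.UnsupportedOperation on every platform. Use binary mode for random access.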
What happens if you read or write to a file after the with block exits?
The file is closed. Any read or write raises ValueError: I/O operation on closed file. The variable still exists, but the underlying file handle is closed.
How do you open multiple files in one with statement?
Use commas: with open("a.txt") as fa, open("b.txt", "w") as fb:. Both files are closed when the block exits.
What does Path.read_text() do? When would you use it?
Path.read_text() opens the file, reads the entire contents as a string, and closes it. Use it for short config or data files when a one-liner is convenient. For large files, use open() with iteration.
Why might the last line from readlines() or for line in f: not end with \n?
The last line of a file may not have a trailing newline. Both readlines() and iteration return it as-is. Code that assumes every line ends with \n can fail on the last line.
What does newline="" do when opening a file?
With newline="", no newline translation occurs: input is returned as-is, and output is written as-is. Use it for CSV or other formats where newlines inside quoted fields must not be translated. With the default newline=None, \r\n on disk becomes \n when read, and \n becomes \r\n when written on Windows.
What happens when you open a file that does not exist in read mode?
open("missing.txt", "r") raises FileNotFoundError. Use Path.exists() or try/except to handle missing files. In write or append mode, the file is created if it does not exist.