In Python, the csv
module allows you to read and write CSV files.
Despite its name "csv", the module can handle any text file separated by various character strings, not just CSV (Comma-Separated Value) files.
Contents
- Read CSV files: csv.reader()
- Basic usage
- Delimiter: delimiter
- Quotation marks: quoting, quotechar
- Line breaks
- Headers
- Read CSV files as dictionaries: csv.DictReader()
- Write CSV files: csv.writer()
- Basic Usage
- Append to an existing CSV file
- Delimiter: delimiter
- Quotation marks: quoting, quotechar
- Line breaks
- Headers
- Write dictionaries: csv.DictWriter()
- Read and write CSV files with NumPy
- Read and write CSV files with pandas
This article doesn't cover all possible arguments for the functions in the csv
module. For a comprehensive understanding of these, refer to the official documentation above.
As will be discussed later in this article, it is recommended to use libraries like NumPy or pandas for operations such as calculating averages or totals from data read from CSV files.
The sample code in this article uses the csv and pprint modules. Both are included in the standard library, so no additional installation is necessary.
- Pretty-print with pprint in Python
import csvimport pprint
source: csv_reader.py
Read CSV files: csv.reader()
Use csv.reader()
to read CSV files.
Basic usage
Consider the following file as an example. Here, it is simply read as a text file using the built-in open()
function.
- Read, write, and create files in Python (with and open())
with open('data/src/sample.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34
source: csv_reader.py
Specify the file object, which is opened with open()
, as the first argument of csv.reader()
. A csv.reader
object is an iterator that processes rows. For example, you can get data for each row as a list with a for
loop.
with open('data/src/sample.csv') as f: reader = csv.reader(f) for row in reader: print(row)# ['11', '12', '13', '14']# ['21', '22', '23', '24']# ['31', '32', '33', '34']
source: csv_reader.py
If you want to get it as a two-dimensional array (list of lists), use a list comprehension.
- List comprehensions in Python
with open('data/src/sample.csv') as f: reader = csv.reader(f) l = [row for row in reader]print(l)# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]
source: csv_reader.py
Get rows, columns, and elements
If you read a CSV file as a two-dimensional array (list of lists), you can get a row by specifying the row number starting from 0
with an index []
.
print(l[1])# ['21', '22', '23', '24']
source: csv_reader.py
You can get an element by specifying the column number starting from 0
with an index []
.
print(l[1][1])# 22
source: csv_reader.py
To extract a column, you need to transpose the array (i.e., switch rows and columns) and then use an index []
.
- Transpose 2D list in Python (swap rows and columns)
l_T = [list(x) for x in zip(*l)]print(l_T)# [['11', '21', '31'], ['12', '22', '32'], ['13', '23', '33'], ['14', '24', '34']]print(l_T[1])# ['12', '22', '32']
source: csv_reader.py
Note that rows and columns can be extracted more easily using libraries like NumPy or pandas, which will be discussed later.
Convert strings to numbers
By default, each element is a string (str
).
print(l[0][0])# 11print(type(l[0][0]))# <class 'str'>
source: csv_reader.py
To convert a string to a number, use int()
or float()
.
- Convert a string to a number (int, float) in Python
To convert a list of strings to a list of numbers, you can use list comprehension. It is also possible to convert the entire two-dimensional array (list of lists) at once.
print(l[0])# ['11', '12', '13', '14']print([int(v) for v in l[0]])# [11, 12, 13, 14]print([[int(v) for v in row] for row in l])# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]
source: csv_reader.py
By setting the quoting
argument to csv.QUOTE_NONNUMERIC
, elements not enclosed in quotation marks (by default, double quotes "
) will be retrieved as floating point numbers (float
). However, note that an error will occur if an element that cannot be converted to float
is not enclosed in quotation marks.
with open('data/src/sample.csv') as f: reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC) l_f = [row for row in reader]print(l_f)# [[11.0, 12.0, 13.0, 14.0], [21.0, 22.0, 23.0, 24.0], [31.0, 32.0, 33.0, 34.0]]print(l_f[0][0])# 11.0print(type(l_f[0][0]))# <class 'float'>
source: csv_reader.py
Delimiter: delimiter
By default, csv.reader()
treats a comma ,
as a delimiter. You can specify any string as a delimiter with the delimiter
argument.
Consider the following file separated by spaces as an example.
with open('data/src/sample.txt') as f: print(f.read())# 11 12 13 14# 21 22 23 24# 31 32 33 34
source: csv_reader.py
By setting delimiter=' '
, elements are split at each space and correctly retrieved.
with open('data/src/sample.txt') as f: reader = csv.reader(f, delimiter=' ') l = [row for row in reader]print(l)# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]
source: csv_reader.py
For TSV (Tab-Separated Values) files, you should set delimiter='\t'
.
Quotation marks: quoting
, quotechar
In some CSV files, elements are enclosed within quotation marks, such as double quotes "
.
Consider the following CSV file as an example.
with open('data/src/sample_quote.csv') as f: print(f.read())# 1,2,"3"# "a,b,c",x,y
source: csv_reader.py
By default, the quotation marks themselves are not included in the value of the element, and the delimiter within the part surrounded by quotation marks is ignored. In many situations, this behavior is considered appropriate.
with open('data/src/sample_quote.csv') as f: reader = csv.reader(f) for row in reader: print(row)# ['1', '2', '3']# ['a,b,c', 'x', 'y']
source: csv_reader.py
If you set the quoting
argument to csv.QUOTE_NONE
, no special processing will be done for quotation marks. The delimiter within the part surrounded by quotation marks is also treated as an element separator.
with open('data/src/sample_quote.csv') as f: reader = csv.reader(f, quoting=csv.QUOTE_NONE) for row in reader: print(row)# ['1', '2', '"3"']# ['"a', 'b', 'c"', 'x', 'y']
source: csv_reader.py
By default, the double quote "
is treated as a quotation mark, but you can specify any value as a quotation mark with the quotechar
argument. If you want to use a single quote '
as a quotation mark, you should set quotechar="'"
.
Line breaks
Consider the following CSV file example that includes line breaks within parts surrounded by quotation marks.
with open('data/src/sample_linebreak.csv') as f: print(f.read())# 1,2,"3"# "a# b",x,y
source: csv_reader.py
According to the official documentation, in environments where the newline code is \r\n
(like Windows), it is safe to set the newline
argument of open()
to ''
.
If
newline=''
is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use\r\n
line endings on write an extra\r
will be added. It should always be safe to specifynewline=''
, since the csv module does its own (universal) newline handling.csv — CSV File Reading and Writing — Python 3.11.4 documentation
with open('data/src/sample_linebreak.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row)# ['1', '2', '3']# ['a\nb', 'x', 'y']
source: csv_reader.py
Headers
Consider the following CSV file that includes a header row and an index column.
with open('data/src/sample_header_index.csv') as f: print(f.read())# ,a,b,c,d# ONE,11,12,13,14# TWO,21,22,23,24# THREE,31,32,33,34
source: csv_reader.py
csv.reader()
does not specially handle header rows or index columns. To extract only the data, you may need to use operations such as list comprehensions and slicing.
- List comprehensions in Python
- How to slice a list, string, tuple in Python
with open('data/src/sample_header_index.csv') as f: reader = csv.reader(f) l = [row for row in reader]pprint.pprint(l)# [['', 'a', 'b', 'c', 'd'],# ['ONE', '11', '12', '13', '14'],# ['TWO', '21', '22', '23', '24'],# ['THREE', '31', '32', '33', '34']]print([row[1:] for row in l[1:]])# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]print([[int(v) for v in row[1:]] for row in l[1:]])# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]
source: csv_reader.py
For such files, it's easier to use pandas, which will be mentioned later.
Read CSV files as dictionaries: csv.DictReader()
While csv.reader()
reads each line as a list, csv.DictReader()
interprets each line as a dictionary.
Consider the following CSV file as an example.
with open('data/src/sample_header.csv') as f: print(f.read())# a,b,c,d# 11,12,13,14# 21,22,23,24# 31,32,33,34
source: csv_dictreader.py
By default, the values in the first row are used as the dictionary keys.
with open('data/src/sample_header.csv') as f: reader = csv.DictReader(f) l = [row for row in reader]pprint.pprint(l)# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},# {'a': '21', 'b': '22', 'c': '23', 'd': '24'},# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]print(l[1])# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}print(l[1]['c'])# 23
source: csv_dictreader.py
If there is no header in the first row, you can specify the fieldnames
argument.
with open('data/src/sample.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34with open('data/src/sample.csv') as f: reader = csv.DictReader(f, fieldnames=['a', 'b', 'c', 'd']) for row in reader: print(row)# {'a': '11', 'b': '12', 'c': '13', 'd': '14'}# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}
source: csv_dictreader.py
The index column is not specially processed. If you wish to exclude it, you can remove it using the pop()
method, as demonstrated below.
- Remove an item from a dictionary in Python (clear, pop, popitem, del)
with open('data/src/sample_header_index.csv') as f: print(f.read())# ,a,b,c,d# ONE,11,12,13,14# TWO,21,22,23,24# THREE,31,32,33,34with open('data/src/sample_header_index.csv') as f: reader = csv.DictReader(f) l = [row for row in reader]pprint.pprint(l, width=100)# [{'': 'ONE', 'a': '11', 'b': '12', 'c': '13', 'd': '14'},# {'': 'TWO', 'a': '21', 'b': '22', 'c': '23', 'd': '24'},# {'': 'THREE', 'a': '31', 'b': '32', 'c': '33', 'd': '34'}]print([od.pop('') for od in l])# ['ONE', 'TWO', 'THREE']pprint.pprint(l)# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},# {'a': '21', 'b': '22', 'c': '23', 'd': '24'},# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]
source: csv_dictreader.py
Write CSV files: csv.writer()
Use csv.writer()
to write CSV files.
Basic Usage
Specify the file object, which is opened with open()
, as the first argument of csv.writer()
.
To create a new file, pass the path to open()
with the write mode 'w'
. Be aware that if the write mode 'w'
is specified for an existing file, it will overwrite the original content
- Read, write, and create files in Python (with and open())
Use the writerow()
method to write one line at a time. Specify a list as an argument.
with open('data/temp/sample_writer_row.csv', 'w') as f: writer = csv.writer(f) writer.writerow([0, 1, 2]) writer.writerow(['a', 'b', 'c'])with open('data/temp/sample_writer_row.csv') as f: print(f.read())# 0,1,2# a,b,c
source: csv_writer.py
The writerows()
method allows you to write a two-dimensional array (list of lists) in a single operation.
l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]with open('data/temp/sample_writer.csv', 'w') as f: writer = csv.writer(f) writer.writerows(l)with open('data/temp/sample_writer.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34
source: csv_writer.py
Append to an existing CSV file
To append to an existing CSV file, use open()
with the append mode 'a'
. Then, just like when creating a new file, use writerow()
or writerows()
to write contents. The contents will be appended at the end.
with open('data/temp/sample_writer_row.csv', 'a') as f: writer = csv.writer(f) writer.writerow(['X', 'Y', 'Z'])with open('data/temp/sample_writer_row.csv') as f: print(f.read())# 0,1,2# a,b,c# X,Y,Z
source: csv_writer.py
Delimiter: delimiter
As in the previous examples, the default output is a CSV (Comma-Separated Value) file. You can specify any delimiter with the delimiter
argument.
To save as a TSV (Tab-Separated Values) file, you should set delimiter='\t'
.
l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]with open('data/temp/sample_writer.tsv', 'w') as f: writer = csv.writer(f, delimiter='\t') writer.writerows(l)with open('data/temp/sample_writer.tsv') as f: print(f.read())# 11 12 13 14# 21 22 23 24# 31 32 33 34
source: csv_writer.py
Quotation marks: quoting
, quotechar
By default, elements containing delimiters and other special characters are written enclosed in quotation marks.
l = [[0, 1, 2], ['a,b,c', 'x', 'y']]with open('data/temp/sample_writer_quote.csv', 'w') as f: writer = csv.writer(f) writer.writerows(l)with open('data/temp/sample_writer_quote.csv') as f: print(f.read())# 0,1,2# "a,b,c",x,y
source: csv_writer.py
You can manipulate the quotation marks using the quoting
argument. The default is quoting=csv.QUOTE_MINIMAL
, which encloses only elements that contain special characters like delimiters in quotation marks, as shown in the above result.
If you set quoting=csv.QUOTE_ALL
, all elements will be enclosed in quotation marks.
with open('data/temp/sample_writer_quote_all.csv', 'w') as f: writer = csv.writer(f, quoting=csv.QUOTE_ALL) writer.writerows(l)with open('data/temp/sample_writer_quote_all.csv') as f: print(f.read())# "0","1","2"# "a,b,c","x","y"
source: csv_writer.py
If you set quoting=csv.QUOTE_NONNUMERIC
, elements that are not numeric will be enclosed in quotation marks.
with open('data/temp/sample_writer_quote_nonnumeric.csv', 'w') as f: writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC) writer.writerows(l)with open('data/temp/sample_writer_quote_nonnumeric.csv') as f: print(f.read())# 0,1,2# "a,b,c","x","y"
source: csv_writer.py
If you set quoting=csv.QUOTE_NONE
, no elements will be enclosed in quotation marks. In this case, you need to specify the character to use for escaping with the escapechar
argument.
with open('data/temp/sample_writer_quote_none.csv', 'w') as f: writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\') writer.writerows(l)with open('data/temp/sample_writer_quote_none.csv') as f: print(f.read())# 0,1,2# a\,b\,c,x,y
source: csv_writer.py
The default quotation mark is the double quotation mark "
. This can be specified with the quotechar
argument.
with open('data/temp/sample_writer_quote_char.csv', 'w') as f: writer = csv.writer(f, quotechar="'") writer.writerows(l)with open('data/temp/sample_writer_quote_char.csv') as f: print(f.read())# 0,1,2# 'a,b,c',x,y
source: csv_writer.py
Line breaks
If an element contains a line break, it is written enclosed in quotation marks.
l = [[0, 1, 2], ['a\nb', 'x', 'y']]with open('data/temp/sample_writer_linebreak.csv', 'w', newline='') as f: writer = csv.writer(f) writer.writerows(l)with open('data/temp/sample_writer_linebreak.csv') as f: print(f.read())# 0,1,2# "a# b",x,y
source: csv_writer.py
According to the official documentation, in environments where the newline code is \r\n
(like Windows), it is safe to set the newline
argument of open()
to ''
.
If
newline=''
is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use\r\n
line endings on write an extra\r
will be added. It should always be safe to specifynewline=''
, since the csv module does its own (universal) newline handling.csv — CSV File Reading and Writing — Python 3.11.4 documentation
with open('data/src/sample_linebreak.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row)# ['1', '2', '3']# ['a\nb', 'x', 'y']
source: csv_reader.py
Headers
csv.writer()
has no special feature for adding a header row or an index column.
To add a header row, you simply write it first with writerow()
. For an index column, add an element to the start of each list being written.
- Add an item to a list in Python (append, extend, insert)
l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]header = ['', 'a', 'b', 'c', 'd']index = ['ONE', 'TWO', 'THREE']with open('data/temp/sample_writer_header_index.csv', 'w') as f: writer = csv.writer(f) writer.writerow(header) for i, row in zip(index, l): writer.writerow([i] + row)with open('data/temp/sample_writer_header_index.csv') as f: print(f.read())# ,a,b,c,d# ONE,11,12,13,14# TWO,21,22,23,24# THREE,31,32,33,34
source: csv_writer.py
For such operations, it's easier to use pandas, which will be mentioned later.
Write dictionaries: csv.DictWriter()
While csv.writer()
writes a list to each row, csv.DictWriter()
writes a dictionary to each row.
Specify the list of dictionary keys you want to write for the second argument fieldnames
of csv.DictWriter()
. The writeheader()
method writes the field names as a header.
Pass a dictionary to writerow()
to write rows.
d1 = {'a': 1, 'b': 2, 'c': 3}d2 = {'a': 10, 'c': 30}with open('data/temp/sample_dictwriter.csv', 'w') as f: writer = csv.DictWriter(f, ['a', 'b', 'c']) writer.writeheader() writer.writerow(d1) writer.writerow(d2)with open('data/temp/sample_dictwriter.csv') as f: print(f.read())# a,b,c# 1,2,3# 10,,30
source: csv_dictwriter.py
You can also write a list of dictionaries using writerows()
.
with open('data/temp/sample_dictwriter_list.csv', 'w') as f: writer = csv.DictWriter(f, ['a', 'b', 'c']) writer.writeheader() writer.writerows([d1, d2])with open('data/temp/sample_dictwriter_list.csv') as f: print(f.read())# a,b,c# 1,2,3# 10,,30
source: csv_dictwriter.py
As shown in the example above, if a dictionary with no matching key is written, that element is skipped (i.e., nothing is written).
If you write a dictionary with keys not included in 'fieldnames'
, you can set the extrasaction
argument to 'ignore'
to avoid writing those key elements.
with open('data/temp/sample_dictwriter_ignore.csv', 'w') as f: writer = csv.DictWriter(f, ['a', 'c'], extrasaction='ignore') writer.writeheader() writer.writerows([d1, d2])with open('data/temp/sample_dictwriter_ignore.csv') as f: print(f.read())# a,c# 1,3# 10,30
source: csv_dictwriter.py
The default is extrasaction='raise'
, which throws a ValueError
.
Read and write CSV files with NumPy
The third-party library NumPy offers more flexible manipulation of multidimensional arrays. It's also possible to convert between NumPy arrays (numpy.ndarray
) and Python built-in lists.
- Convert numpy.ndarray and list to each other
Read a CSV file:
- NumPy: Read and write CSV files (np.loadtxt, np.genfromtxt, np.savetxt)
import numpy as npwith open('data/src/sample.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34a = np.loadtxt('data/src/sample.csv', delimiter=',')print(type(a))# <class 'numpy.ndarray'>print(a)# [[11. 12. 13. 14.]# [21. 22. 23. 24.]# [31. 32. 33. 34.]]
source: csv_numpy.py
Extract a specific range:
- NumPy: Slicing ndarray
print(a[1:, :2])# [[21. 22.]# [31. 32.]]
source: csv_numpy.py
Calculate the average over all elements and column-wise sums:
- NumPy: Sum, mean, max, min for entire array, column/row-wise
print(a.mean())# 22.5print(a.sum(axis=0))# [63. 66. 69. 72.]
source: csv_numpy.py
Read and write CSV files with pandas
The third-party library pandas allows more flexible manipulation of mixed numeric and string data. It's also possible to convert between pandas.DataFrame
and Python built-in lists.
- Convert pandas.DataFrame, Series and list to each other
With pandas, reading and writing CSV files is also easy. Refer to the following articles for details.
- pandas: Read CSV into DataFrame with read_csv()
- pandas: Write DataFrame to CSV with to_csv()
Read a CSV file:
import pandas as pdwith open('data/src/sample_pandas_normal.csv') as f: print(f.read())# name,age,state,point# Alice,24,NY,64# Bob,42,CA,92# Charlie,18,CA,70# Dave,68,TX,70# Ellen,24,CA,88# Frank,30,NY,57df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)print(type(df))# <class 'pandas.core.frame.DataFrame'>print(df)# age state point# name # Alice 24 NY 64# Bob 42 CA 92# Charlie 18 CA 70# Dave 68 TX 70# Ellen 24 CA 88# Frank 30 NY 57
source: csv_pandas.py
Select columns or elements:
- pandas: Select rows/columns by index (numbers and names)
- pandas: Get/Set values with loc, iloc, at, iat
print(df['age'])# name# Alice 24# Bob 42# Charlie 18# Dave 68# Ellen 24# Frank 30# Name: age, dtype: int64print(df.at['Bob', 'age'])# 42
source: csv_pandas.py
Extract rows by specifying conditions:
- pandas: Query DataFrame and extract rows with query()
print(df.query('state == "NY"'))# age state point# name # Alice 24 NY 64# Frank 30 NY 57print(df.query('age > 30'))# age state point# name # Bob 42 CA 92# Dave 68 TX 70
source: csv_pandas.py
Calculate summary statistics:
- pandas: Get summary statistics for each column with describe()
print(df.describe())# age point# count 6.000000 6.000000# mean 34.333333 73.500000# std 18.392027 13.707662# min 18.000000 57.000000# 25% 24.000000 65.500000# 50% 27.000000 70.000000# 75% 39.000000 83.500000# max 68.000000 92.000000
source: csv_pandas.py
Thus, pandas efficiently processes tabular data, especially those with mixed numerical and textual columns. For other articles related to pandas, see the link below:
- List of pandas articles