Read and write CSV files in Python | note.nkmk.me (2024)

In Python, the csv module allows you to read and write CSV files.

Despite its name "csv", the module can handle any text file separated by various character strings, not just CSV (Comma-Separated Value) files.

Contents

  • Read CSV files: csv.reader()
    • Basic usage
    • Delimiter: delimiter
    • Quotation marks: quoting, quotechar
    • Line breaks
    • Headers
    • Read CSV files as dictionaries: csv.DictReader()
  • Write CSV files: csv.writer()
    • Basic Usage
    • Append to an existing CSV file
    • Delimiter: delimiter
    • Quotation marks: quoting, quotechar
    • Line breaks
    • Headers
    • Write dictionaries: csv.DictWriter()
  • Read and write CSV files with NumPy
  • Read and write CSV files with pandas

This article doesn't cover all possible arguments for the functions in the csv module. For a comprehensive understanding of these, refer to the official documentation above.

As will be discussed later in this article, it is recommended to use libraries like NumPy or pandas for operations such as calculating averages or totals from data read from CSV files.

The sample code in this article uses the csv and pprint modules. Both are included in the standard library, so no additional installation is necessary.

  • Pretty-print with pprint in Python
import csvimport pprint

Read CSV files: csv.reader()

Use csv.reader() to read CSV files.

Basic usage

Consider the following file as an example. Here, it is simply read as a text file using the built-in open() function.

  • Read, write, and create files in Python (with and open())
with open('data/src/sample.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34

Specify the file object, which is opened with open(), as the first argument of csv.reader(). A csv.reader object is an iterator that processes rows. For example, you can get data for each row as a list with a for loop.

with open('data/src/sample.csv') as f: reader = csv.reader(f) for row in reader: print(row)# ['11', '12', '13', '14']# ['21', '22', '23', '24']# ['31', '32', '33', '34']

If you want to get it as a two-dimensional array (list of lists), use a list comprehension.

  • List comprehensions in Python
with open('data/src/sample.csv') as f: reader = csv.reader(f) l = [row for row in reader]print(l)# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

Get rows, columns, and elements

If you read a CSV file as a two-dimensional array (list of lists), you can get a row by specifying the row number starting from 0 with an index [].

print(l[1])# ['21', '22', '23', '24']

You can get an element by specifying the column number starting from 0 with an index [].

print(l[1][1])# 22

To extract a column, you need to transpose the array (i.e., switch rows and columns) and then use an index [].

  • Transpose 2D list in Python (swap rows and columns)
l_T = [list(x) for x in zip(*l)]print(l_T)# [['11', '21', '31'], ['12', '22', '32'], ['13', '23', '33'], ['14', '24', '34']]print(l_T[1])# ['12', '22', '32']

Note that rows and columns can be extracted more easily using libraries like NumPy or pandas, which will be discussed later.

Convert strings to numbers

By default, each element is a string (str).

print(l[0][0])# 11print(type(l[0][0]))# <class 'str'>

To convert a string to a number, use int() or float().

  • Convert a string to a number (int, float) in Python

To convert a list of strings to a list of numbers, you can use list comprehension. It is also possible to convert the entire two-dimensional array (list of lists) at once.

print(l[0])# ['11', '12', '13', '14']print([int(v) for v in l[0]])# [11, 12, 13, 14]print([[int(v) for v in row] for row in l])# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

By setting the quoting argument to csv.QUOTE_NONNUMERIC, elements not enclosed in quotation marks (by default, double quotes ") will be retrieved as floating point numbers (float). However, note that an error will occur if an element that cannot be converted to float is not enclosed in quotation marks.

with open('data/src/sample.csv') as f: reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC) l_f = [row for row in reader]print(l_f)# [[11.0, 12.0, 13.0, 14.0], [21.0, 22.0, 23.0, 24.0], [31.0, 32.0, 33.0, 34.0]]print(l_f[0][0])# 11.0print(type(l_f[0][0]))# <class 'float'>

Delimiter: delimiter

By default, csv.reader() treats a comma , as a delimiter. You can specify any string as a delimiter with the delimiter argument.

Consider the following file separated by spaces as an example.

By setting delimiter=' ', elements are split at each space and correctly retrieved.

with open('data/src/sample.txt') as f: reader = csv.reader(f, delimiter=' ') l = [row for row in reader]print(l)# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]

For TSV (Tab-Separated Values) files, you should set delimiter='\t'.

Quotation marks: quoting, quotechar

In some CSV files, elements are enclosed within quotation marks, such as double quotes ".

Consider the following CSV file as an example.

with open('data/src/sample_quote.csv') as f: print(f.read())# 1,2,"3"# "a,b,c",x,y

By default, the quotation marks themselves are not included in the value of the element, and the delimiter within the part surrounded by quotation marks is ignored. In many situations, this behavior is considered appropriate.

with open('data/src/sample_quote.csv') as f: reader = csv.reader(f) for row in reader: print(row)# ['1', '2', '3']# ['a,b,c', 'x', 'y']

If you set the quoting argument to csv.QUOTE_NONE, no special processing will be done for quotation marks. The delimiter within the part surrounded by quotation marks is also treated as an element separator.

with open('data/src/sample_quote.csv') as f: reader = csv.reader(f, quoting=csv.QUOTE_NONE) for row in reader: print(row)# ['1', '2', '"3"']# ['"a', 'b', 'c"', 'x', 'y']

By default, the double quote " is treated as a quotation mark, but you can specify any value as a quotation mark with the quotechar argument. If you want to use a single quote ' as a quotation mark, you should set quotechar="'".

Line breaks

Consider the following CSV file example that includes line breaks within parts surrounded by quotation marks.

with open('data/src/sample_linebreak.csv') as f: print(f.read())# 1,2,"3"# "a# b",x,y

According to the official documentation, in environments where the newline code is \r\n (like Windows), it is safe to set the newline argument of open() to ''.

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.csv — CSV File Reading and Writing — Python 3.11.4 documentation

with open('data/src/sample_linebreak.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row)# ['1', '2', '3']# ['a\nb', 'x', 'y']

Headers

Consider the following CSV file that includes a header row and an index column.

with open('data/src/sample_header_index.csv') as f: print(f.read())# ,a,b,c,d# ONE,11,12,13,14# TWO,21,22,23,24# THREE,31,32,33,34

csv.reader() does not specially handle header rows or index columns. To extract only the data, you may need to use operations such as list comprehensions and slicing.

  • List comprehensions in Python
  • How to slice a list, string, tuple in Python
with open('data/src/sample_header_index.csv') as f: reader = csv.reader(f) l = [row for row in reader]pprint.pprint(l)# [['', 'a', 'b', 'c', 'd'],# ['ONE', '11', '12', '13', '14'],# ['TWO', '21', '22', '23', '24'],# ['THREE', '31', '32', '33', '34']]print([row[1:] for row in l[1:]])# [['11', '12', '13', '14'], ['21', '22', '23', '24'], ['31', '32', '33', '34']]print([[int(v) for v in row[1:]] for row in l[1:]])# [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

For such files, it's easier to use pandas, which will be mentioned later.

Read CSV files as dictionaries: csv.DictReader()

While csv.reader() reads each line as a list, csv.DictReader() interprets each line as a dictionary.

Consider the following CSV file as an example.

with open('data/src/sample_header.csv') as f: print(f.read())# a,b,c,d# 11,12,13,14# 21,22,23,24# 31,32,33,34

By default, the values in the first row are used as the dictionary keys.

with open('data/src/sample_header.csv') as f: reader = csv.DictReader(f) l = [row for row in reader]pprint.pprint(l)# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},# {'a': '21', 'b': '22', 'c': '23', 'd': '24'},# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]print(l[1])# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}print(l[1]['c'])# 23

If there is no header in the first row, you can specify the fieldnames argument.

with open('data/src/sample.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34with open('data/src/sample.csv') as f: reader = csv.DictReader(f, fieldnames=['a', 'b', 'c', 'd']) for row in reader: print(row)# {'a': '11', 'b': '12', 'c': '13', 'd': '14'}# {'a': '21', 'b': '22', 'c': '23', 'd': '24'}# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}

The index column is not specially processed. If you wish to exclude it, you can remove it using the pop() method, as demonstrated below.

  • Remove an item from a dictionary in Python (clear, pop, popitem, del)
with open('data/src/sample_header_index.csv') as f: print(f.read())# ,a,b,c,d# ONE,11,12,13,14# TWO,21,22,23,24# THREE,31,32,33,34with open('data/src/sample_header_index.csv') as f: reader = csv.DictReader(f) l = [row for row in reader]pprint.pprint(l, width=100)# [{'': 'ONE', 'a': '11', 'b': '12', 'c': '13', 'd': '14'},# {'': 'TWO', 'a': '21', 'b': '22', 'c': '23', 'd': '24'},# {'': 'THREE', 'a': '31', 'b': '32', 'c': '33', 'd': '34'}]print([od.pop('') for od in l])# ['ONE', 'TWO', 'THREE']pprint.pprint(l)# [{'a': '11', 'b': '12', 'c': '13', 'd': '14'},# {'a': '21', 'b': '22', 'c': '23', 'd': '24'},# {'a': '31', 'b': '32', 'c': '33', 'd': '34'}]

Write CSV files: csv.writer()

Use csv.writer() to write CSV files.

Basic Usage

Specify the file object, which is opened with open(), as the first argument of csv.writer().

To create a new file, pass the path to open() with the write mode 'w'. Be aware that if the write mode 'w' is specified for an existing file, it will overwrite the original content

  • Read, write, and create files in Python (with and open())

Use the writerow() method to write one line at a time. Specify a list as an argument.

with open('data/temp/sample_writer_row.csv', 'w') as f: writer = csv.writer(f) writer.writerow([0, 1, 2]) writer.writerow(['a', 'b', 'c'])with open('data/temp/sample_writer_row.csv') as f: print(f.read())# 0,1,2# a,b,c

The writerows() method allows you to write a two-dimensional array (list of lists) in a single operation.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]with open('data/temp/sample_writer.csv', 'w') as f: writer = csv.writer(f) writer.writerows(l)with open('data/temp/sample_writer.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34

Append to an existing CSV file

To append to an existing CSV file, use open() with the append mode 'a'. Then, just like when creating a new file, use writerow() or writerows() to write contents. The contents will be appended at the end.

with open('data/temp/sample_writer_row.csv', 'a') as f: writer = csv.writer(f) writer.writerow(['X', 'Y', 'Z'])with open('data/temp/sample_writer_row.csv') as f: print(f.read())# 0,1,2# a,b,c# X,Y,Z

Delimiter: delimiter

As in the previous examples, the default output is a CSV (Comma-Separated Value) file. You can specify any delimiter with the delimiter argument.

To save as a TSV (Tab-Separated Values) file, you should set delimiter='\t'.

l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]with open('data/temp/sample_writer.tsv', 'w') as f: writer = csv.writer(f, delimiter='\t') writer.writerows(l)with open('data/temp/sample_writer.tsv') as f: print(f.read())# 11 12 13 14# 21 22 23 24# 31 32 33 34

Quotation marks: quoting, quotechar

By default, elements containing delimiters and other special characters are written enclosed in quotation marks.

l = [[0, 1, 2], ['a,b,c', 'x', 'y']]with open('data/temp/sample_writer_quote.csv', 'w') as f: writer = csv.writer(f) writer.writerows(l)with open('data/temp/sample_writer_quote.csv') as f: print(f.read())# 0,1,2# "a,b,c",x,y

You can manipulate the quotation marks using the quoting argument. The default is quoting=csv.QUOTE_MINIMAL, which encloses only elements that contain special characters like delimiters in quotation marks, as shown in the above result.

If you set quoting=csv.QUOTE_ALL, all elements will be enclosed in quotation marks.

with open('data/temp/sample_writer_quote_all.csv', 'w') as f: writer = csv.writer(f, quoting=csv.QUOTE_ALL) writer.writerows(l)with open('data/temp/sample_writer_quote_all.csv') as f: print(f.read())# "0","1","2"# "a,b,c","x","y"

If you set quoting=csv.QUOTE_NONNUMERIC, elements that are not numeric will be enclosed in quotation marks.

with open('data/temp/sample_writer_quote_nonnumeric.csv', 'w') as f: writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC) writer.writerows(l)with open('data/temp/sample_writer_quote_nonnumeric.csv') as f: print(f.read())# 0,1,2# "a,b,c","x","y"

If you set quoting=csv.QUOTE_NONE, no elements will be enclosed in quotation marks. In this case, you need to specify the character to use for escaping with the escapechar argument.

with open('data/temp/sample_writer_quote_none.csv', 'w') as f: writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\') writer.writerows(l)with open('data/temp/sample_writer_quote_none.csv') as f: print(f.read())# 0,1,2# a\,b\,c,x,y

The default quotation mark is the double quotation mark ". This can be specified with the quotechar argument.

with open('data/temp/sample_writer_quote_char.csv', 'w') as f: writer = csv.writer(f, quotechar="'") writer.writerows(l)with open('data/temp/sample_writer_quote_char.csv') as f: print(f.read())# 0,1,2# 'a,b,c',x,y

Line breaks

If an element contains a line break, it is written enclosed in quotation marks.

l = [[0, 1, 2], ['a\nb', 'x', 'y']]with open('data/temp/sample_writer_linebreak.csv', 'w', newline='') as f: writer = csv.writer(f) writer.writerows(l)with open('data/temp/sample_writer_linebreak.csv') as f: print(f.read())# 0,1,2# "a# b",x,y

According to the official documentation, in environments where the newline code is \r\n (like Windows), it is safe to set the newline argument of open() to ''.

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.csv — CSV File Reading and Writing — Python 3.11.4 documentation

with open('data/src/sample_linebreak.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row)# ['1', '2', '3']# ['a\nb', 'x', 'y']

Headers

csv.writer() has no special feature for adding a header row or an index column.

To add a header row, you simply write it first with writerow(). For an index column, add an element to the start of each list being written.

  • Add an item to a list in Python (append, extend, insert)
l = [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]header = ['', 'a', 'b', 'c', 'd']index = ['ONE', 'TWO', 'THREE']with open('data/temp/sample_writer_header_index.csv', 'w') as f: writer = csv.writer(f) writer.writerow(header) for i, row in zip(index, l): writer.writerow([i] + row)with open('data/temp/sample_writer_header_index.csv') as f: print(f.read())# ,a,b,c,d# ONE,11,12,13,14# TWO,21,22,23,24# THREE,31,32,33,34

For such operations, it's easier to use pandas, which will be mentioned later.

Write dictionaries: csv.DictWriter()

While csv.writer() writes a list to each row, csv.DictWriter() writes a dictionary to each row.

Specify the list of dictionary keys you want to write for the second argument fieldnames of csv.DictWriter(). The writeheader() method writes the field names as a header.

Pass a dictionary to writerow() to write rows.

d1 = {'a': 1, 'b': 2, 'c': 3}d2 = {'a': 10, 'c': 30}with open('data/temp/sample_dictwriter.csv', 'w') as f: writer = csv.DictWriter(f, ['a', 'b', 'c']) writer.writeheader() writer.writerow(d1) writer.writerow(d2)with open('data/temp/sample_dictwriter.csv') as f: print(f.read())# a,b,c# 1,2,3# 10,,30

You can also write a list of dictionaries using writerows().

with open('data/temp/sample_dictwriter_list.csv', 'w') as f: writer = csv.DictWriter(f, ['a', 'b', 'c']) writer.writeheader() writer.writerows([d1, d2])with open('data/temp/sample_dictwriter_list.csv') as f: print(f.read())# a,b,c# 1,2,3# 10,,30

As shown in the example above, if a dictionary with no matching key is written, that element is skipped (i.e., nothing is written).

If you write a dictionary with keys not included in 'fieldnames', you can set the extrasaction argument to 'ignore' to avoid writing those key elements.

with open('data/temp/sample_dictwriter_ignore.csv', 'w') as f: writer = csv.DictWriter(f, ['a', 'c'], extrasaction='ignore') writer.writeheader() writer.writerows([d1, d2])with open('data/temp/sample_dictwriter_ignore.csv') as f: print(f.read())# a,c# 1,3# 10,30

The default is extrasaction='raise', which throws a ValueError.

Read and write CSV files with NumPy

The third-party library NumPy offers more flexible manipulation of multidimensional arrays. It's also possible to convert between NumPy arrays (numpy.ndarray) and Python built-in lists.

  • Convert numpy.ndarray and list to each other

Read a CSV file:

  • NumPy: Read and write CSV files (np.loadtxt, np.genfromtxt, np.savetxt)
import numpy as npwith open('data/src/sample.csv') as f: print(f.read())# 11,12,13,14# 21,22,23,24# 31,32,33,34a = np.loadtxt('data/src/sample.csv', delimiter=',')print(type(a))# <class 'numpy.ndarray'>print(a)# [[11. 12. 13. 14.]# [21. 22. 23. 24.]# [31. 32. 33. 34.]]

source: csv_numpy.py

Extract a specific range:

  • NumPy: Slicing ndarray
print(a[1:, :2])# [[21. 22.]# [31. 32.]]

source: csv_numpy.py

Calculate the average over all elements and column-wise sums:

  • NumPy: Sum, mean, max, min for entire array, column/row-wise
print(a.mean())# 22.5print(a.sum(axis=0))# [63. 66. 69. 72.]

source: csv_numpy.py

Read and write CSV files with pandas

The third-party library pandas allows more flexible manipulation of mixed numeric and string data. It's also possible to convert between pandas.DataFrame and Python built-in lists.

  • Convert pandas.DataFrame, Series and list to each other

With pandas, reading and writing CSV files is also easy. Refer to the following articles for details.

  • pandas: Read CSV into DataFrame with read_csv()
  • pandas: Write DataFrame to CSV with to_csv()

Read a CSV file:

import pandas as pdwith open('data/src/sample_pandas_normal.csv') as f: print(f.read())# name,age,state,point# Alice,24,NY,64# Bob,42,CA,92# Charlie,18,CA,70# Dave,68,TX,70# Ellen,24,CA,88# Frank,30,NY,57df = pd.read_csv('data/src/sample_pandas_normal.csv', index_col=0)print(type(df))# <class 'pandas.core.frame.DataFrame'>print(df)# age state point# name # Alice 24 NY 64# Bob 42 CA 92# Charlie 18 CA 70# Dave 68 TX 70# Ellen 24 CA 88# Frank 30 NY 57

Select columns or elements:

  • pandas: Select rows/columns by index (numbers and names)
  • pandas: Get/Set values with loc, iloc, at, iat
print(df['age'])# name# Alice 24# Bob 42# Charlie 18# Dave 68# Ellen 24# Frank 30# Name: age, dtype: int64print(df.at['Bob', 'age'])# 42

Extract rows by specifying conditions:

  • pandas: Query DataFrame and extract rows with query()
print(df.query('state == "NY"'))# age state point# name # Alice 24 NY 64# Frank 30 NY 57print(df.query('age > 30'))# age state point# name # Bob 42 CA 92# Dave 68 TX 70

Calculate summary statistics:

  • pandas: Get summary statistics for each column with describe()
print(df.describe())# age point# count 6.000000 6.000000# mean 34.333333 73.500000# std 18.392027 13.707662# min 18.000000 57.000000# 25% 24.000000 65.500000# 50% 27.000000 70.000000# 75% 39.000000 83.500000# max 68.000000 92.000000

Thus, pandas efficiently processes tabular data, especially those with mixed numerical and textual columns. For other articles related to pandas, see the link below:

  • List of pandas articles
Read and write CSV files in Python | note.nkmk.me (2024)
Top Articles
Latest Posts
Article information

Author: Kareem Mueller DO

Last Updated:

Views: 6126

Rating: 4.6 / 5 (66 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Kareem Mueller DO

Birthday: 1997-01-04

Address: Apt. 156 12935 Runolfsdottir Mission, Greenfort, MN 74384-6749

Phone: +16704982844747

Job: Corporate Administration Planner

Hobby: Mountain biking, Jewelry making, Stone skipping, Lacemaking, Knife making, Scrapbooking, Letterboxing

Introduction: My name is Kareem Mueller DO, I am a vivacious, super, thoughtful, excited, handsome, beautiful, combative person who loves writing and wants to share my knowledge and understanding with you.