to_csv writes wrong with NaN value #18676

jackasser · 2017-12-07T08:09:28Z

When the top row contains a NaN value, to_csv writes an unexpected "" to the CSV file.

import pandas as pd
df = pd.DataFrame([None,1,2])
df.to_csv("df.csv", header=None, index=None, encoding='utf-8')

# df.csv
#""
#1.0
#2.0

# Expected df.csv (empty first line)
#
#1.0
#2.0

Versions:

pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None


jreback · 2017-12-08T02:26:44Z

this does look like a bug.

@gfyoung ?

gfyoung · 2017-12-08T02:32:54Z

Agreed. I can also replicate this on 0.21.0 (@jackasser : I noticed that you were on 0.20.3). Investigation and PR are welcome!

Licht-T · 2017-12-08T05:55:57Z

I am working on this. It does not seem to be an encoding issue; the result is the same without the encoding argument.

df.to_csv("df.csv",header=None,index=None)

$ cat df.csv
""
1.0
2.0

Licht-T · 2017-12-08T06:29:48Z

Also, this returns the right result.

df = pd.DataFrame([1,None,1,2])

Licht-T · 2017-12-08T08:38:25Z

@gfyoung This seems to be either a bug or the default behavior of Python's csv library.

Case 1

Input:

import csv
fp = open('test.csv', 'w')
w = csv.writer(fp, dialect=csv.excel)
w.writerow(['1'])
w.writerow([''])
fp.close()

Output:

Case 2

Input:

import csv
fp = open('test.csv', 'w')
w = csv.writer(fp, dialect=csv.excel)
w.writerow([''])
w.writerow(['1'])
fp.close()

Output:

""
1
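
The two cases can also be reproduced in memory, which avoids leaving test.csv behind. This is a minimal sketch that assumes only the standard csv and io modules; the helper name write_rows is made up for illustration. Interpreters that include the later CPython fix are expected to quote the empty record in Case 1 as well, so the exact Case 1 text depends on the Python version.

```python
import csv
import io


def write_rows(rows):
    """Write rows with csv.writer (excel dialect) and return the raw text."""
    buf = io.StringIO(newline="")  # keep the writer's "\r\n" terminators as-is
    writer = csv.writer(buf, dialect=csv.excel)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


case_1 = write_rows([["1"], [""]])  # empty record written second
case_2 = write_rows([[""], ["1"]])  # empty record written first

print(repr(case_1))
print(repr(case_2))
```

Printing the repr makes the quoting and the line terminators visible, which is hard to see with cat.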

Licht-T · 2017-12-08T08:48:10Z

But this works.

import csv
fp = open('test.csv', 'w')
w = csv.writer(fp, dialect=None)
w.writerow(['', '1'])
w.writerow(['3', '2'])
fp.close()

,1
3,2
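
The difference between the one-column and two-column cases can be shown side by side. A small sketch using only the standard library; row_to_text is a hypothetical helper, and the default excel dialect is assumed.

```python
import csv
import io


def row_to_text(row):
    """Serialize a single row with the default csv.writer settings."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow(row)
    return buf.getvalue()


two_fields = row_to_text(["", "1"])  # empty field next to another field
one_field = row_to_text([""])        # record made of one empty field

print(repr(two_fields))
print(repr(one_field))
```

With two fields the delimiter already marks the empty field, so no quotes are written; only the lone empty field gets quoted.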

Licht-T · 2017-12-08T08:59:18Z

The csv library in CPython explicitly quotes a record that consists of a single empty field.
https://github.com/python/cpython/blob/0b3ec192259a65971001ce8f0de85a9c1e71d9c7/Modules/_csv.c#L1244

I don't know why this is needed.

gfyoung · 2017-12-08T09:16:05Z

@Licht-T : Ah! That's very good to know. Okay, this means that the issue is out of the control of the pandas library, since we wrap Python's csv library when writing.

@jackasser : Looks like you may have hit upon a point of contention in Python's CSV library. I would raise this with them by submitting an issue on their python.org website.

gfyoung · 2017-12-08T09:16:21Z

Closing because this is out of pandas control. @Licht-T : Thanks for the investigation!

jreback · 2017-12-08T10:18:00Z

@gfyoung this can be fixed on the pandas side. Just because the csv library doesn't do this correctly doesn't mean we should be inconsistent.

Doesn't passing dialect=None work?

gfyoung · 2017-12-08T10:47:09Z

@jreback : to_csv uses Python's csv.writer class to write to CSV. The dialect parameter goes straight to Python's csv writer, and that AFAICT has no impact on the output in the example @Licht-T provided.

Here's the code where we initialize our writer:

https://github.com/pandas-dev/pandas/blob/master/pandas/io/formats/format.py#L1644-L1656

UnicodeWriter utilizes the csv.writer class behind the scenes. Thus, we are always using Python's csv library to get the job done, meaning our CSV writing is vulnerable to any issues in Python's CSV module for the time being.

gfyoung · 2017-12-08T10:55:36Z

I suppose we could hack our way around this by checking for an empty first row before writing it to CSV and replacing it with a space, for example, though as I said, that is very hackish IMO.
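
For anyone who needs the unquoted blank line in the meantime, post-processing the written text is one alternative to patching the writer. This is a rough sketch, not something pandas does: unquote_empty_records is a hypothetical helper, and it assumes single-column output with a known line terminator and no multi-line quoted fields.

```python
def unquote_empty_records(csv_text, lineterminator="\r\n"):
    """Replace records that consist solely of "" with truly empty lines."""
    records = csv_text.split(lineterminator)
    return lineterminator.join("" if rec == '""' else rec for rec in records)


cleaned = unquote_empty_records('""\r\n1.0\r\n2.0\r\n')
print(repr(cleaned))
```

Note that a reader will generally treat the resulting blank line as an empty row rather than as a missing value, so this only makes sense when the consumer expects that layout.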

jreback · 2017-12-08T11:02:44Z

Try passing dialect=None when we construct the csv writer.

Licht-T · 2017-12-08T11:07:29Z

@jreback I already tried that, but the result is the same.

import csv
fp = open('test.csv', 'w')
w = csv.writer(fp, dialect=None)
w.writerow([''])
w.writerow(['1'])
fp.close()

""
1

Licht-T · 2017-12-08T11:11:15Z

The only parameter that has any effect is quoting, and with QUOTE_NONE the writer raises an error instead.

import csv
fp = open('test.csv', 'w')
w = csv.writer(fp, dialect=None, quoting=csv.QUOTE_NONE)
w.writerow([''])
w.writerow(['1'])
fp.close()

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-56-97051671206d> in <module>()
      2 fp = open('test.csv', 'w')
      3 w = csv.writer(fp, dialect=csv.excel,quoting=csv.QUOTE_NONE)
----> 4 w.writerow([''])
      5 w.writerow(['1'])
      6 fp.close()

Error: single empty field record must be quoted
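
The same failure can be caught programmatically rather than read from a traceback. A minimal sketch, assuming only the standard library; the exact error message may differ between Python versions, so only the exception type is relied on here.

```python
import csv
import io

buf = io.StringIO(newline="")
writer = csv.writer(buf, quoting=csv.QUOTE_NONE)

try:
    writer.writerow([""])  # a lone empty field cannot be written unquoted
    raised = False
except csv.Error as exc:
    raised = True
    print("csv.Error:", exc)

writer.writerow(["1"])  # non-empty records are still written normally
print(repr(buf.getvalue()))
```

So QUOTE_NONE does not produce the blank line the reporter wanted; it refuses to write the record at all.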

Licht-T · 2017-12-08T11:24:23Z

Well, this behavior of the csv library is very inconsistent and should be fixed in CPython.
Or is there a reason for it to behave this way?
https://github.com/python/cpython/blame/0b3ec192259a65971001ce8f0de85a9c1e71d9c7/Modules/_csv.c#L1244

jreback · 2017-12-08T11:34:20Z

@Licht-T you can certainly file a bug report there.

OK, I guess there is no easy way to fix this here. However, maybe we should add a small note in the code about this?

jackasser · 2017-12-08T14:57:01Z

@jreback @Licht-T @gfyoung thank you for looking into this!
I understand that this problem should be solved in Python's CSV library.
But I think we should add a small note in the code.

Licht-T · 2017-12-08T15:08:12Z

@jreback I created an issue on CPython.
https://bugs.python.org/issue32255

I'll add the small note on pandas.
Where should I add the note?

jackasser · 2017-12-09T01:30:34Z

@Licht-T thank you for creating the issue!

pandas/pandas/core/frame.py

Line 1522 in ba3a442

↑ I think here.
↓ and this page
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html?highlight=to_csv#pandas.DataFrame.to_csv

Licht-T · 2017-12-12T23:47:02Z

@jackasser @jreback @gfyoung Actually, writing a double-quoted blank field is the intended default behavior when writing a single-column CSV. In other words, the output reported in this issue is correct, and it was "Case 1" in #18676 (comment) that showed the wrong behavior. This is now fixed in CPython, and the patch has been backported to CPython 3.6.
python/cpython#4769

Please note that this bug does not exist in CPython 2.7.
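
One plausible reason for the quoting, offered here as an observation rather than something stated in the CPython discussion, is round-tripping: csv.reader returns an empty row for a blank line, so without the quotes a record holding one empty field would not read back as written. A small sketch using only the standard library; read_rows is a hypothetical helper.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text and return the rows as a list of lists."""
    return list(csv.reader(io.StringIO(text, newline="")))


quoted = read_rows('""\r\n1\r\n')   # what csv.writer produces
unquoted = read_rows('\r\n1\r\n')   # what the reporter asked for

print(quoted)
print(unquoted)
```

The quoted form reads back as a row with one empty string, while the blank line reads back as a row with no fields.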

jreback · 2017-12-13T14:11:18Z

@Licht-T OK, can you add a test for >= 3.6 only, and xfail it for now (I'm not sure which release it's in, though maybe it's actually out?).

Licht-T · 2017-12-14T00:19:38Z

@jreback Okay! (That fix is not released yet, will be included in the next release of CPython 3.6.)


jreback added the IO CSV read_csv, to_csv label Dec 8, 2017

gfyoung added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Dec 8, 2017

gfyoung closed this as completed Dec 8, 2017

gfyoung removed the Bug label Dec 8, 2017

gfyoung added this to the No action milestone Dec 8, 2017

jreback reopened this Dec 8, 2017

jreback removed this from the No action milestone Dec 8, 2017

Licht-T mentioned this issue Dec 9, 2017

bpo-32255: Fix inconsistent behavior when csv.writer writes None python/cpython#4769

Merged

jreback added this to the 0.22.0 milestone Dec 13, 2017

Licht-T mentioned this issue Jan 5, 2018

TST: Add to_csv test when writing the single column CSV #19091

Merged

4 tasks

gfyoung added the Testing pandas testing functions or related to the test suite label Jan 5, 2018

gfyoung closed this as completed in #19091 Feb 11, 2018

gfyoung pushed a commit that referenced this issue Feb 11, 2018

TST: Add to_csv test when writing the single column CSV (#19091)

b9d8b26

Closes gh-18676

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018

TST: Add to_csv test when writing the single column CSV (pandas-dev#1…

c416fea

…9091) Closes pandas-devgh-18676

ahawryluk mentioned this issue Feb 24, 2021

Add back skip_blank_lines to read_excel in pandas v>1.1.4 #39808

Closed

asishm mentioned this issue Jun 26, 2024

BUG: None becomes empty string when writing multiple columns to CSV, but double quotes "" when writing single columns #59116

Open

3 tasks
