Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

CSV is hell. Some idiot somewhere decided that Comma Separated Values in certain locales should be based on semicolons (who would have thought files would be shared across country borders!?), so when we open CSV files that are actually comma separated all the information is in the first cell (until a semicolon appears).

To get comma separated CSVs to show properly in Excel we have to mess around with OS language settings. CSV as a format should have died years ago, it's a shame so many apps/services only export CSV files. Many developers (mainly US/UK based) are probably not aware of how much of a headache they inflict on people in other countries by using CSV files.



A CSV importer absolutely needs to be configurable. I've seen delimiters including tabs, vertical bars, tildes, colons, and random control characters (they didn't even choose RS and US).


I shouldn't have to resort to arcane concepts like importing files to get them to display properly when people in other locales can just open them.


Good luck with configuration if your CSV parser is ten layers removed from any human, and still needs to get it right. Now what? (Now we guess. We call it "heuristics," of course.)


> CSVs to show properly in Excel we have to mess around with OS language settings.

Why? Aren't the import settings enough?

https://support.office.com/en-us/article/Text-Import-Wizard-...


That copies the data from the CSV file into a worksheet, you aren't editing the CSV file anymore.

I'm not just being pedantic, it makes a big difference. If I want to change some values in a spreadsheet I should be able to just open it, change the values, save, and be certain that the document will be identical apart from the deliberate changes. This is especially important for CSV files, which are commonly used for import/export operations.


I think I understand your use case, but in that case aren't there better apps for that? Excel is not a CSV editor. Even if they would look very similar...


If you open a .csv file in excel and save it, you may find that it has converted any long numbers into scientific notation for you, and saved that as text. It will also get rid of any leading zeros for you.


Also it will round those numbers silently to 15 digits.

It was a fun time explaining to accounting why their report showed duplicate billing events. Try telling an accountant "Don't use Excel". The solution of course was to prefix a "'" character, making the file useless outside of spreadsheet programs.


This is the classic response of the Excelista. "You're holding it wrong". Yeah well Excel is a hammer with two claws and no handle, there's no right way to hold it.

If Excel isn't for opening CSV files why does it associate itself with that extension by default? Why does it implicitly convert the CSV file to a half-assed Excel workbook? Why is there no option to change this behavior?


I haven't used excel in years, so your rant is a bit misdirected. There's a simple reason why it associates that way - it can open CSV files. Same as notepad associates to things which it's useless for. (but if you don't have anything else, it will work)

I'm just saying that editing CSVs regularly, preserving the formatting, while avoiding the import/export functionality is a very specific use case. It's likely not something excel project managers care about that much either. If the use case is popular, then there's going to be an editor that does it better. If not... tough.


Sorry, I didn't intend that rant to be directed toward you specifically. I just mean that your comment sounds like something those Excel project managers would tell me and I'm tired of hearing it.

After years of hearing "that's not what Excel is for" I am left wondering what it is for. I have asked many "Excel pros" how to solve some problem in Excel and I honestly can't think of any satisfactory answers. Just "that's not a common use case".

The CSV behavior is just one of the annoyances. Conflating display format with data type is another. Silently changing data is another. My list of gripes with Excel is long and on this topic my fuse is short.


My experience is that Excel is the Swiss Army Knife application, sufficing to do almost any job even if it's not the best choice. Being installed on almost every desktop means it becomes the first choice tool for almost any task.


Not really. I've found TableTool [1], but it's incredibly basic (doesn't even support sorting columns or search/replace).

[1] https://github.com/jakob/TableTool


> Some idiot somewhere decided that Comma Separated Values in certain locales should be based on semicolons

Semicolons are really better though, because they aren't used as a decimal separator unlike commas in most countries.

I don't know about Excel, but LibreOffice allows very easily to select which parameters to use when opening a CSV file, it works just fine.


> Semicolons are really better though, because they aren't used as a decimal separator unlike commas in most countries.

If you're going to separate values with semicolons--which is perfectly reasonable--I feel like you probably shouldn't do that with a format called Comma Separated Values.


Picking a less-common separator might help but you could also just follow RFC 4180 and quote fields that have commas then double any single quotes in values.


You can also use the file import wizard in Excel to make similar choices. But that's not the default behavior for files with a .csv extension.


The only good CSV dialect is the dif-named DSV (Delimiter Separated Values) where you select and support just one supported delimiter, and you require escaping of the delim character inside values. It's simple, it works. Quotes are hard to parse so don't use those. Just \escape.

http://www.catb.org/esr/writings/taoup/html/ch05s02.html


What's a good alternative non-proprietary format that all major spreadsheet software supports?


Unicode is vast. There's absolutely no good reason we don't have Snowman Separated Values (or some other proper separator sign that isn't commonly used elsewhere) other than that people don't demand it.


You don't even need to go to Unicode. ASCII has

  0x1C  FS  File Separator
  0x1D  GS  Group Separator
  0x1E  RS  Record Separator
  0x1F  US  Unit Separator
You just need a font with glyphs for them.

https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text


Where can I find those on my keyboard? Entering escape sequences by hand isn't user friendly, if it works at all. One of the benefits of csv is that it's universal, I can open it in vim or write an awk script to extract values, someone else can open in notepad++, someone else should be able to open in excel.


Just thinking out loud here, but isn't part of the point is that they are not keyboard characters. If they're on the keyboard, then they will pop up in ordinary text, similar to the | (pipe) character and friends.


They are ^\ ^] ^^ and ^_, respectively. Of course, most text editors will interpret those keys as something else. In emacs you can enter them literally by using C-q (quote) first.


Since we're kind of talking about things from an I18N perspective, those characters are written like this on a bog standard Swedish keyboard:

    \ is AltGr+?
    ] is AltGr+9
    ^ is Shift+^, then space
    _ is Shift+-
AltGr is the right Alt key, to the right of the space bar.

So none of those are single keys, which means that combining them with Ctrl to write control characters becomes almost comically difficult. Not very accessible to typical users, I'd say.


Actually \ is AltGr++, i.e. "+" is the symbol you get from that key without any modifier. With shift you get ? and with AltGr you get \. My bad, and too late to edit.


Herein lies the slippery slope to APL...


While XLSX is proprietary by descent, it is standardized; thus, it's readable/writable by man and machine alike (essentially a zipped XML with some bells and whistles). I have not encountered a less broken format that is similarly widespread.


I don't think that's common knowledge, although I have unzipped a .xlsx file in the past to inspect the contents. Works on .docx files too.

P.S. Aren't .xlsx files subject to the same problem as the .csv?


Well, some of the issues are shared between CSV and XLSX. However, it does have a clear distinction between structure (which also can be validated!) and content (which moots the issue with comma-semicolon-tab separation), a well-defined character set and somewhat-sane character escaping rules.

It does have similar "executability" issues as CSV (and more), but 1. the formula evaluation is documented and expected behavior, 1b. there is a documented way to suppress it, and 2. programs reading it are aware that security is a thing, and either a) constrain/sandbox it (in the case of table processors such as MSOffice or LibreOffice), or b) don't execute its macros and expansions at all (in the case of libraries such as PhpExcel). Not sure about the Google Docs issue.

(As far as "common knowledge" - knowledge for manual inspection of strings is IMNSHO not required, all that's needed is that it's program-readable; in this respect, most table processors are capable of this. The point "but you can inspect CSVs by hand" comes from experience: it is also possible to inspect binaries by hand, neither of these is intuitive, both are a learned skill)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: