Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The phrases "avoid data silos" and "we want single source of truth" have been used repeatedly about Excel misuse/abuse/overuse for 20+ years. The WSJ story talks about the complaint but not about the underlying tension between data redundancy vs data manipulation.

Excel endures because it's the non-programmer's REPL for data. Like a Lisp REPL, using Excel is nimble and has an instant feedback loop. Click "A-Z" to sort ascending, click "Z-A" to sort descending. Move/hide/freeze columns. If one has intermediate-to-advanced skills, one can pivot the data on multiple axis. None of the centralized systems with even the slickest web GUIs can match Excel's capabilities.

That's why the centralized systems like SAP ERP, Microsoft Team Foundation, mainframe sytems, etc offer functionality to export to Excel spreadsheets. (Or they offer export to csv files -- which is 1 step removed from Excel.) With MS TFS, you can even re-import from Excel so in that case, Excel acts as a sort of offline data-entry client for the project management tool.

One middle ground between Excel data silos and the utopia of a centralized system is a shared document store like Sharepoint or Dropbox. The hierarchy would be something like:

1) Excel spreadsheets emailed around. Lots of redundancy; uncertainty of which version is canonical.

2) Excel xls file stored on Sharepoint with automatic version control. Employees must have discipline to only email links to the "one true" spreadsheet and never attach the actual xls file in the emails.

3) The data that was in Excel is migrated to centralized system and all data entry and reporting is done there. However, because the central systems are so cumbersome, this abstraction "leaks" back out to Excel because of the highly desired "export to Excel" functions! (This restarts the dreaded Excel "data silos" complaint cycle all over again.)

There isn't an IT department in the world that can develop centralized IT solutions fast enough to keep up with the ways employees slice & dice data. That's why Excel is "overused".



> There isn't an IT department in the world that can develop centralized IT solutions fast enough to keep up with the ways employees slice & dice data. That's why Excel is "overused".

It wasn't very long ago that I realized that my two decades in IT could be summed up as: making proper applications out of Excel workbooks with feeble macros. I hate Microsoft as much as the next Linux zealot, but dang if they didn't create a world in which I've made a career. Where I'm at now, I'm drowning in opportunities to ameliorate horrific Excel-based workflows, and I'm fairly confident that there's enough work out there to last me until another 20 years to retirement.


>making proper applications out of Excel workbooks

Yes, exactly.

If we're being uncharitable, we can spin Greenspun's 10th rule[1] of programming as:

>"Any sufficiently complicated Excel spreadsheet contains an ad-hoc, informally-specified, bug-ridden, slow implementation of a proper centralized database."

Or, if we're being charitable, we frame it as an internal MVP (Minimum Viable Product):

>"Spreadsheets are the internal 'mvp' that proves the business value before you build the centralized systems. When the spreadsheet becomes unmaintable spaghetti formulas and the xls email workflow crushes under its own weight, that will give the company the evidence and the confidence to spend $1 million and migrate the spreadsheet to a proper centralized database."

[1] https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule


I’ve just done exactly this: replace an old xls file which was sent each month by email with a simple web based application. The app includes some nice dataviz.

But then the people keep asking “how can I export to excel”. One user was copying an html table split in 20 pages one by one to excel.

So I found a elegant and simple solution: an excel spreadsheet linked to the PostgreSQL db of the app via ODBC. Now users can just click on Refresh All in excel and get the latest data. I’ve also added an “click here to open” link in excel which opens the form with the row’s id in the web app.

Now everyone is happy: the data is kept centralized and available to everyone in real time. And users can make all the pivot tables and pivot charts that they want natively in excel.

Do not underestimate the power of ODBC with excel!

(may not work for big data though, but most business app do not contain more than a few thousand rows)


>that will give the company the evidence and the confidence to spend $1 million and migrate the spreadsheet to a proper centralized database

...along with understanding that migrating at earlier stage would cost only $100K.


Migrating too early before a business flow is properly understood could lead to that reduced cost but also a product that doesn't quite fit the actual needs.


A database is not a spreadsheet replacement. A database backed application is a spreadsheet replacement.


Henceforth known as "jasode's corollary."


How did you build that career if you don't mind me asking? I've moved on from that stuff in my early career but would love to get back into it.


None of the centralized systems with even the slickest web GUIs can match Excel's capabilities.

You can very easily use Excel as a client to a central database and get the best of both worlds.


Actually you run into the one of Excel's biggest problems when you do this, namely "Multiple versions of the truth". Here's how that happens: The central database often contains bad data. Users "fix" the bad data in their spreadsheet. Excel doesn't have the ability to write the updated data back to the database. Now everyone has their own versions of "the truth". There are products that attempt to solve this problem by adding the ability to write modified data back to the database. See for example Pebble Reports, http://pebblereports.com/features/ but this is a reporting program and does not have spreadsheet capabilities.


My first question was going to follow the poster above, and you answered it for me. I didn't realize Excel couldn't write back to the database. It shouldn't be too hard to develop an extension for that purpose, would it?

Aside from that — what about MS cloud services? I mean, maybe that's not appropriate for healthcare or certain businesses, but for others it allows there to be a single source of truth if the workflow is correctly tuned, no? I can produce a doc in my local instace, push to 365, edit remotely on the web, have somebody else edit remotely, and sync it back to my local instance without a hitch.

I use Office365 for that purpose regularly at my job. I'm a developer, but I have regular communication with editors, producers, project managers, division managers, operations managers, art directors, and so on. A lot of the information we share is tabled and needs to stay updated, and editable by each party involved. It's worked well for me in that regard.

Maybe others have had different experiences?


"I didn't realize Excel couldn't write back to the database. It shouldn't be too hard to develop an extension for that purpose, would it?"

Well, more accurately, it can write back to the database, but normal usage of Excel won't afford that result. Normal usage of Excel you have an independent value.

It is also a dubious proposition as to whether we want people to actually write back to the central database; you'd be one stray macro away from disaster. You sorta want it if someone manually corrects something, but not if they operate on it programmatically, and now we're making things really complicated.


There shouldn't be any risk if you have version control.


Thanks for the insight.


> There are products that attempt to solve this problem by adding the ability to write modified data back to the database.

Another company/product that I know about (though it's been over a decade since I last touched it) that attempted to fix this issue was Actuate:

https://en.wikipedia.org/wiki/Actuate_Corporation

Hmm - and now that I've started digging into it, I see that this company has radically changed from what I knew of it. I had known it as a company which created software for designing and running reports from various data sources (whatever could feed it, honestly - excel, odbc, csv files, etc), which internally could be configured for more advanced usage using a language that had similarities to Visual Basic (it was OO long before VB, and geared for the reporting system, not general purpose usage). Before I left my employer who used it, they had just announced a Java-based system of "business objects" and some kind of tool to interact with data in an Excel-like manner (the tool essentially had all of the functionality of Excel at the time - including pivot tables), where changes would propagate thru the system. Plus interaction could only be done thru those objects, which were defined as "views" into the dataset.

So - after research this (just now!) - imagine my surprise at finding that this company now is named:

https://www.opentext.com/

...and they also did something called "BIRT" - which was open source and part of Eclipse:

https://en.wikipedia.org/wiki/BIRT_Project

Apparently this started in 2004 (not long after I left my employer - and that it is Java-based also fits).

I'm not sure what is what with the company any longer? If I had to guess (probably wrongly), they spun off their Actuate reporting stuff and the other tools as open source (?), and are now concentrating on other things, possibly in the same vein?

Weird. But very interesting.


>easily use Excel as a client to a central database and get the best of both worlds.

Yes it seems that way but many spreadsheets exist because the centralized system is an incomplete solution.

For example, a company might have a "human resources tracking" centralized system. (E.g. Ariba, Peoplesoft, etc). It has some standard features like ingesting resumes and extracting "skills" from parsing MS Word docs, etc.

However, the company happens to use a interviewing process where the candidate must pass (1) phone interview and (2) onsite with 5 peer programmers, and (3) hiring manager.

The hiring manager wants to "capture" all of those states (phone_passed=True/False, peer interviews with 5 different numerical scores, manager_pass=True/False). The centralized system doesn't have fields that capture all that so the hiring manager creates a spreadsheet that has those extra columns. Many spreadsheets out in the wild "fill the gap" between incomplete centralized systems and custom IT programming. An adhoc xls file can be created far faster than submitting a change request to the corporate IT department to customize Ariba -- which could take months (or never) to implement.

Or spreadsheets act as the "intersection" of data between multiple systems. Again, one purpose of centralized "data warehouses" was to eliminate Excel "data silos" but again, no IT department can create data warehouse solutions (which also includes all the custom reports) as fast as someone creating spreadsheets and just emailing them around.

Having a hybrid process where Excel acts as a round-trip offline client to a central db doesn't really eliminate the fundamental reason Excel (over)use keeps multiplying like rabbits.


This also illustrates another problem: all the programmers here are assuming perfect domain knowledge even exists in the first place.

When it doesn't, Excel will work fine (and will only cause disaster in individual cases). A "proper" unit-tested solution will either not work at all or just won't support half the use cases.

And without someone specifying the domain knowledge, there's nothing any programmer, no matter how well-versed in unit testing, agile, or ... can do about it.


Can you easily integrate with externals systems via REST & SOAP? I've done some quick web searches on this topic and have never seen anything that jumped out at me.

I know Google Sheets gives you a little bit of this with Javascript embedding but I was searching for something more tightly integrated with Excel or LibreOffice.


Look on the "Get & Transform" tab


> There isn't an IT department in the world that can develop centralized IT solutions fast enough to keep up with the ways employees slice & dice data.

That's why the principal way you sell centralized systems to users isn't in the ease of manipulating results but in the ease / automation of data entry. The whole pain point and problem with Excel is in manual data entry, and having a single source of truth, even if you succeed in enforcing its property as a single source of truth, does not automatically solve disagreements between different human sources of data.

That's why tools like Grafana have interesting futures for business users, because they enforce a clear separation between data sources and data visualization, while offering a lot of control to the end user over statistical manipulations and visualizations that they can create, and keeping those visualizations updated in real time, so that they don't have to be re-prepared every time period for some new report.


"One middle ground between Excel data silos and the utopia of a centralized system is a shared document store like Sharepoint or Dropbox." ... 100% agree, and add I find that Google Sheet is the best for this role: 1 source of truth, modification history, read/write permissions and shareable. I am now starting to use spreadsheets for use cases I would have strongly objected in the past. I'm surprised Google sheet isn't more mainstream.


And an API that lets data be used in other 'real' applications


Excel endures because it's the non-programmer's REPL for data. Like a Lisp REPL, using Excel is nimble and has an instant feedback loop. Click "A-Z" to sort ascending, click "Z-A" to sort descending. Move/hide/freeze columns. If one has intermediate-to-advanced skills, one can pivot the data on multiple axis. None of the centralized systems with even the slickest web GUIs can match Excel's capabilities.

What about Business Intelligence products such as Hyperion? They can provide reports from a defined data model which gets data directly from the database. People can create their pivots and other transformations on the data fairly easily. Is their something that Excel offers that makes it much better than one of these Business Intelligence products in certain situations?


Excel offers a few things in most scenarios, primarily because that any enforced structure always implies lost flexibility, and one of the issues often quoted with Excel is the lack of enforcement of using the right structures and data sources.

A simple one: If I have been asked to do a report, but it turns out I don't have the correct access rights to some data, my manager can simply send it to me in an Excel sheet, or he can log an issue with IT to change the acess rights to a specific key/dimension/hierarchy.

The 'one true data source' move often correspond to tighter access controls when good auditing really would have been the correct solution, hence the above situation is fairly common.


Thanks very much for the info that Excel may offer least friction in a lot of circumstances, especially when ad-hoc things need to be done with data.


For qualitative data management, but numerical analysis, 1 source of truth collaboratively - I use Atlassian Jira with great success.

Got a complex work stream? Got thousand small problems that needs to be slowly figured out over 50 different main topics? Got a team of 2,50 or 1000? No problem.

1. Just set up a bunch of jira entries while tagging them with the same labels

2. set up filters on these labels.

3. Set up dashboard(s) that give a complete realtime/update to any inquiring user.

Need to report status? No problem, screenshot of the Dashboard and done.

It's really great tbh.


Alternatively, use Google Sheets :) Centralized, single source of truth spreadsheets.


Google Sheets is great and makes sense for a lot of vanilla use cases, but even in its current form, doesn't hold a candle to Excel running on a Windows machine when it comes to doing anything even slightly fancy or non-trivial.


That's not true. Like newer versions of Excel, you can use Javascript on Google Sheets now https://developers.google.com/apps-script/guides/sheets It's better than Excel's VBA.

Also Google Sheets has an API for stuff like Python.


Excel can cover many use cases with just with UI workflows without requiring its users to actually learn programming.

Only the very advanced users turn to VBA programming.


Example of fancy or non-trivial?


Google Sheets and other similar web apps replicate most if not all of the Excel features you just listed, and it's centralized and more than one person can work on it simultaneously.


> The phrases "avoid data silos" and "we want single source of truth" have been used repeatedly about Excel misuse/abuse/overuse for 20+ years.

Yep. Even if they get centralize data via RDBMs like SQL Server or even god forbid Microsoft Access, office workers are going to want the front-end of that data to be excel.

There is too much knowledge investment in excel for many office workers to want to switch.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: