
I've used GnuCash for years before switching to beancount (https://bitbucket.org/blais/beancount/ with smart_importer: https://github.com/beancount/smart_importer) and fava (https://github.com/beancount/fava/). Much easier to work on your journals (ledger, trades, prices...) since they are just text files. Really great if you're using the beancount package for Sublime: https://packagecontrol.io/packages/Beancount.

More importantly, the importers (for all my banks and financial services) let me import and reconcile all transactions, but also archive all documents (including PDF, text files, etc) in one, well organized directory: each file is saved into a folder that corresponds to my account structure such as Asset:Current:Cash, Liability:Mortgage, Income:Salary, Expenses:Health:Dentist. It's great to rely on fava (example: https://fava.pythonanywhere.com/example-beancount-file/incom...) to check your accounting (with all files listed in the journal by date, with tags and links and other neat features) and still be able to browse documents in your file browser.
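A minimal sketch of the account-to-folder mapping described above. It assumes the date-prefixed file naming that beancount's document filing uses (e.g. `2020-05-15.invoice.pdf`). `filing_path` and the `documents` root are hypothetical names, and the function only computes the destination path. It does not move anything.

```python
from datetime import date
from pathlib import PurePosixPath


def filing_path(root: str, account: str, doc_date: date, filename: str) -> PurePosixPath:
    """Map an account name to a nested folder and prefix the file with its date."""
    # "Expenses:Health:Dentist" -> root/Expenses/Health/Dentist
    folder = PurePosixPath(root).joinpath(*account.split(":"))
    # The ISO date prefix keeps files sorted chronologically inside each folder.
    return folder / f"{doc_date.isoformat()}.{filename}"


print(filing_path("documents", "Expenses:Health:Dentist", date(2020, 5, 15), "invoice.pdf"))
```

Because the folder tree mirrors the chart of accounts, the same directory is easy to browse in a file manager and straightforward for a tool to scan.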



Wow, your setup sounds very similar to what I've converged on. Except I wrote a fairly substantial pipeline that preprocesses the data before giving it to beancount and Fava.

Maybe I missed something, but it seemed like beancount wants everything to live in One Giant Journal file. I really wanted a pipeline where each bank statement PDF would output one file with a corresponding list of transactions (this stage can run completely in parallel and I use "ninja" to make it very fast).

Then another process can run over these files looking for matches (+$X, -$X), and spit out "transaction groups", where each transaction group is a set of transaction ids that sum to zero. And then a different interactive tool lets you categorize transactions and spits out transaction groups with embedded "expense" transactions. It's all non-destructive; each tool only adds data, and nothing ever modifies existing files. Then a final step can combine all these files and spit out a beancount file for Fava.
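The matching step can be sketched as a greedy pass that pairs equal and opposite amounts from different accounts within a few days. This is a simplification under stated assumptions: it handles only two-leg groups, and the 3-day window and the `Txn`/`match_transfers` names are hypothetical. The pipeline described above may produce groups with more legs.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    txn_id: str        # hypothetical id, e.g. "<statement file>#<row>"
    account: str
    day: date
    amount: Decimal    # positive = money in, negative = money out


def match_transfers(txns: list[Txn], max_days: int = 3) -> list[tuple[str, str]]:
    """Pair each outflow with an unused inflow of the same size in another account
    within max_days. Each returned pair is a two-leg group that sums to zero."""
    groups: list[tuple[str, str]] = []
    used: set[str] = set()
    outflows = sorted((t for t in txns if t.amount < 0), key=lambda t: t.day)
    inflows = sorted((t for t in txns if t.amount > 0), key=lambda t: t.day)
    for out in outflows:
        for inc in inflows:
            if (inc.txn_id not in used
                    and inc.account != out.account
                    and inc.amount == -out.amount
                    and abs((inc.day - out.day).days) <= max_days):
                used.add(inc.txn_id)
                groups.append((out.txn_id, inc.txn_id))
                break
    return groups


txns = [
    Txn("checking#1", "Assets:BankA:Checking", date(2020, 5, 15), Decimal("-500")),
    Txn("card#7", "Liabilities:BankB:CreditCard", date(2020, 5, 16), Decimal("500")),
    Txn("checking#2", "Assets:BankA:Checking", date(2020, 5, 20), Decimal("-42.10")),
]
print(match_transfers(txns))  # [('checking#1', 'card#7')]
```

Greedy first-fit is the simplest choice. When several identical amounts fall inside the same window, it can pair the wrong legs, which is one reason to keep a separate review step.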

How does this compare with the way beancount's importer does it? How does its importer handle transfers and categorization? Is it destructive or non-destructive?


You're parsing PDF? I'd imagine most places allow some sort of data export like csv or something? That's how I get the data out of Chase and BofA.


Yes, in my experience CSV/XML/etc export is spotty. Some institutions don't have it at all, and even when they do the time range can be limited or hard to time-window reliably.

For example, I can't get CSV about my pay stub and all the places money flows (taxes, insurance, etc). So I use PDF as the best way to get all the data.

Parsing PDF is a huge PITA, since PDF is really designed only for layout and not to encode semantic document structure. But if you want the greatest amount of visibility, nothing is as authoritative as the actual statements.


Yeah, I guess I don't really look at paystubs often, and my banks support good CSV export going back 2+ years, so I'll always have it when I need it to refresh those sheets. I do year-to-date in those sheets to keep them from getting huge.

Regarding taxes, I do a tax year calculation using my W-2s and returns to compute an effective tax rate and figure out ways I can improve my situation re taxes, but my budgeting basically only starts once the net money hits a bank account, which has worked pretty well.


There are a few problems with exporting data... here's what comes to my mind at the moment:

- It's an additional step and unnecessary when you're already downloading PDF statements, which you probably should be doing regardless

- Data is often unavailable for export after a few months or a year or two (depending on the bank), far earlier than PDFs

- If you have PDF scans (especially from earlier days) then you need to parse the OCR anyway

- If you're using official APIs (instead of writing a scraper or downloading by hand) then you may need to pay extra


To each their own, csv for me has been the easiest thing moving forward every month or three.

I did do a historical net worth calculation and the only data I had going back far enough was PDFs, so through a mix of bash scripts and pdftotext, I was able to get a number for each month back about 10 years. But I ended up just putting that monthly balance for each account in a google sheet so I could sum and plot it there. Now I just stick the month-end balance for each account in a sheet to keep this updated.
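The spreadsheet step, which sums each account's month-end balance into one figure per month, is simple enough to sketch in Python too. The account names and figures below are made-up placeholders. Liabilities are assumed to be stored as negative balances.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical month-end balances (month, account, balance), e.g. as extracted
# from statements; liabilities are entered as negative numbers.
balances = [
    ("2020-04", "Checking", Decimal("3200.00")),
    ("2020-04", "CreditCard", Decimal("-450.00")),
    ("2020-05", "Checking", Decimal("2900.00")),
    ("2020-05", "CreditCard", Decimal("-120.00")),
]


def net_worth_by_month(rows):
    """Sum every account's month-end balance into one net-worth figure per month."""
    totals = defaultdict(Decimal)
    for month, _account, balance in rows:
        totals[month] += balance
    return dict(sorted(totals.items()))


for month, total in net_worth_by_month(balances).items():
    print(month, total)
```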


> beancount wants everything to live in One Giant Journal file

Actually, beancount supports the include directive. See the Includes section [1] of the language syntax document.

[1]: https://docs.google.com/document/d/1wAMVrKIA2qtRGmoVDSUBJGmY...
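With `include`, the final "combine" step of a per-statement pipeline can simply generate a top-level file. Below is a minimal sketch under these assumptions:

- There is one `.beancount` file per statement, in a subdirectory of the main file's directory.
- Include paths are resolved relative to the file that contains them.

`write_main_ledger` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def write_main_ledger(statement_dir: Path, main_file: Path) -> str:
    """Write a top-level ledger that includes every per-statement .beancount file."""
    lines = []
    for path in sorted(statement_dir.glob("*.beancount")):
        # Assumes statement_dir sits under the main file's directory, and that
        # include paths are resolved relative to the file that contains them.
        rel = path.relative_to(main_file.parent)
        lines.append(f'include "{rel.as_posix()}"')
    text = "\n".join(lines) + "\n"
    main_file.write_text(text)
    return text


# Usage: two empty per-statement files in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "statements").mkdir()
    for name in ("2020-05-bankA.beancount", "2020-05-cardB.beancount"):
        (root / "statements" / name).write_text("")
    print(write_main_ledger(root / "statements", root / "main.beancount"), end="")
```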


Thanks for the reference. But a single transaction can span multiple financial institutions. Say you pay off a credit card. One half of the transaction comes from the credit card statement, the other half comes from your bank. The two halves sum to zero.

My understanding is that the whole transaction has to live in one file with Beancount. This doesn't seem amenable to having each bank statement generate a single Beancount output file.

But I could be wrong. I don't know how Beancount's importer works.


I use beancount with a somewhat custom importer and categorizer (happy to expand on those, but this comment is about transactions that span multiple institutions).

I have the categorizer send such transactions to a temporary "limbo account". So I have two transactions:

  # This TX is imported from checking acct statement:
  2020-05-15 * "ACH Payment BankA credit card"
      Assets:BankA:Checking  -500
      Equity:Limbo:CreditCardPayments

  # This TX is imported from credit card statement:
  2020-05-16 * "Payment applied THANK YOU"
      Liabilities:BankB:CreditCard  500
      Equity:Limbo:CreditCardPayments

In both cases the second posting is added by the automatic categorizer based on the description.
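The categorizer above is custom and not shown. A rule-based stand-in might look like the following sketch. The regex rules, `categorize`, and `render` are hypothetical. The second posting is emitted without an amount so that beancount fills it in. The amount carries a currency (`USD`, an assumption) because beancount amounts need a commodity.

```python
import re

# Hypothetical rules: description regex -> account for the second posting.
RULES = [
    (re.compile(r"ACH Payment .*credit card", re.IGNORECASE), "Equity:Limbo:CreditCardPayments"),
    (re.compile(r"Payment applied", re.IGNORECASE), "Equity:Limbo:CreditCardPayments"),
    (re.compile(r"DENTAL|DENTIST", re.IGNORECASE), "Expenses:Health:Dentist"),
]
FALLBACK = "Expenses:Uncategorized"


def categorize(description: str) -> str:
    """Return the account of the first rule whose pattern matches the description."""
    for pattern, account in RULES:
        if pattern.search(description):
            return account
    return FALLBACK


def render(day: str, description: str, account: str, amount: str) -> str:
    """Emit a two-posting transaction; the second posting's amount is left out."""
    return (f'{day} * "{description}"\n'
            f"  {account}  {amount}\n"
            f"  {categorize(description)}\n")


print(render("2020-05-15", "ACH Payment BankA credit card", "Assets:BankA:Checking", "-500 USD"))
```

First-match-wins keeps the rules easy to reason about. More specific patterns should be listed before broader ones.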

After doing a bout of imports, I look at the Limbo account, which should always sum to zero. Sometimes the two TXs don't appear on the same day, so occasionally there's a temporary balance there but that's fine.

I use the same to match up my direct deposits (one end from paystubs import, other from checking account), wire transfers between banks/brokerage, etc. I tend to put them in different subaccounts of "Equity:Limbo" so that I can more easily spot any discrepancies.
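The "look at the Limbo account" check amounts to summing postings per subaccount and flagging anything non-zero. Here is a minimal sketch over a hypothetical flattened list of postings. In practice these would come from the ledger itself. A non-zero result may be a real discrepancy, or only the timing lag between the two imports.

```python
from collections import defaultdict
from decimal import Decimal

# (date, account, amount) postings: a hypothetical flattened view of the ledger.
postings = [
    ("2020-05-15", "Equity:Limbo:CreditCardPayments", Decimal("500")),
    ("2020-05-16", "Equity:Limbo:CreditCardPayments", Decimal("-500")),
    # Deposit seen in checking, matching paystub not imported yet.
    ("2020-05-29", "Equity:Limbo:DirectDeposit", Decimal("-2100.00")),
]


def limbo_balances(rows, prefix="Equity:Limbo"):
    """Sum postings per Limbo subaccount and return only the non-zero ones."""
    totals = defaultdict(Decimal)
    for _day, account, amount in rows:
        if account == prefix or account.startswith(prefix + ":"):
            totals[account] += amount
    return {acct: bal for acct, bal in sorted(totals.items()) if bal != 0}


print(limbo_balances(postings))
```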

Edit: Should've first read the previous reply that says the same thing!


If you really want to do that, there is a beancount plugin called ZeroSum that does it

https://github.com/redstreet/beancount_reds_plugins/tree/mas...


Conceptually this is one transaction; from a book keeping (and beancount) perspective it is, or can be, two.

Often (in the US) the dates are different: debit from the bank on day X, post to the credit card on day X + 1 or X + 2. For my stuff I want the dates in beancount to reflect the dates on the respective statements, so that the balances, which are date-specific, match.

So I capture these as two transactions. I do it as: debit the bank and credit the "ether", then debit the "ether" and credit the credit card.

Beancount just wants the postings within a literal transaction to sum to zero. If this conceptual transaction is split into two literal transactions, with some fictional bridge account in between, they can live in different beancount files.
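The following sketch shows the split described above: one conceptual transfer becomes two dated transactions, each balancing on its own, with a bridge account that nets to zero once both sides are recorded. `Transaction`, `split_transfer`, and the `Equity:Ether` account name are hypothetical illustrations, not beancount's API.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Transaction:
    day: date
    narration: str
    postings: list[tuple[str, Decimal]]  # (account, amount)

    def balances(self) -> bool:
        """The rule beancount enforces: postings in one transaction sum to zero."""
        return sum((amt for _, amt in self.postings), Decimal(0)) == 0


def split_transfer(src, dst, amount, out_day, in_day, bridge="Equity:Ether"):
    """Represent one conceptual transfer as two dated transactions via a bridge account."""
    outgoing = Transaction(out_day, "Transfer out", [(src, -amount), (bridge, amount)])
    incoming = Transaction(in_day, "Transfer in", [(bridge, -amount), (dst, amount)])
    return outgoing, incoming


out_tx, in_tx = split_transfer("Assets:Bank:Checking", "Liabilities:CreditCard",
                               Decimal("500"), date(2020, 5, 15), date(2020, 5, 17))
print(out_tx.balances(), in_tx.balances())  # True True
```

Each transaction carries its own statement date, so per-account balance checks still line up with the statements.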


Can I ask how you parse PDFs? I'm curious both in terms of reading the PDF data (Python library?) and parsing it (regex?)... and do you have to deal with OCR as well?


I use "pdftotext -layout" and then parse that. Here is some more info from people who have tried this approach:
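Here is a minimal Python sketch of that approach. `pdftotext -layout` (from poppler-utils) writes to stdout when the output file is `-`, and a regex then picks out rows. The row layout (date, description, amount, separated by runs of two or more spaces) is an assumption. Each bank's statement would need its own pattern, and this parser does not handle multi-line descriptions or scanned PDFs that need OCR.

```python
import re
import subprocess
from decimal import Decimal

# Hypothetical statement row: "05/15  ACH PAYMENT BANKA CARD     -500.00"
ROW = re.compile(r"^\s*(\d{2}/\d{2})\s+(.+?)\s{2,}(-?[\d,]+\.\d{2})\s*$")


def pdf_to_text(pdf_path: str) -> str:
    """Run `pdftotext -layout` and capture the text it writes to stdout."""
    result = subprocess.run(["pdftotext", "-layout", pdf_path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_rows(text: str) -> list[tuple[str, str, Decimal]]:
    """Keep lines that look like transaction rows; everything else is ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            day, desc, amount = m.groups()
            rows.append((day, desc.strip(), Decimal(amount.replace(",", ""))))
    return rows


sample = """\
  Date   Description                      Amount
  05/15  ACH PAYMENT BANKA CARD          -500.00
  05/18  DIRECT DEPOSIT ACME CORP       2,100.00
"""
print(parse_rows(sample))
```

Requiring at least two spaces before the amount leans on the column alignment that `-layout` preserves, so single spaces inside descriptions are not mistaken for column breaks.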

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


Thanks!


"each file is saved into a folder that corresponds to my account structure such as Asset:Current:Cash, Liability:Mortgage, Income:Salary, Expenses:Health:Dentist"

Beancount user here: even if you don't bother with any finance tracking, organizing financial documents following the folder structure is useful. See "Filing Documents" in https://docs.google.com/document/d/1e4Vz3wZB_8-ZcAwIFde8X5Cj...

Shameless promos:

- People are starting to make Fava extensions for things like envelope budgeting, portfolio tracking, etc. Usually these are announced on the mailing list: https://groups.google.com/forum/#!forum/beancount

- The sublime plugin is great but I have a really useful improvement to search by org-mode headlines that never seemed to get merged - see https://github.com/norseghost/sublime-beancount/pull/14


> People are starting to make Fava extensions for things like envelope budgeting

I've been hearing about making extensions for envelope budgeting for years, with little progress. I haven't checked in the last 2 years though - any progress on this?

To be frank, it was fairly annoying that the author kept declaring it "not too hard", and yet no one came up with one.



Thanks. Too soon to see if it will be workable. I'll stick to Ledger until this becomes mature.


Thanks for this. Looks compelling. Is there a simple installation guide? I find it hard to figure out how to get going.


As a longtime beancount user: it should be easy to install. There are packages for most OSes and they mostly work; the maintainer, Martin Blais, is amazing and indefatigable; and the community is active, friendly, and technical.

But the "how to get going" part is where I find the real difficulty lies.

There is a conceptual friction in double-entry accounting, which beancount enforces. If you have no prior exposure to it, even if you are financially and mathematically inclined, grokking double entry can be a life-changing practice. It is the Iyengar Yoga to the 7-minute workout that is just tracking transactions as in GnuCash or Quicken or whatever.

One has to have the appetite to consume a worldview. As someone living in that world, FWIW, I highly recommend it. I would not undersell the conceptual work required. It is not for everyone.

But if the approach speaks to you, reach out on the mailing list and people will help with the mechanics.

Cheers.


A thousand times yes! The abstraction of double-entry accounting is absolutely central to understanding the modern economy and why (large) economic agents behave the way they do. It’s also essential for understanding fiat money, which is more or less just numbers in a double-entry ledger.


The documentation says to type "pip install beancount". That doesn't seem hard, so I'm guessing you're having some other problem? If you post on the mailing list, people will help you.


The installation is really just a Python library, but if you want to keep everything contained you can put it all in a Dockerfile.


I made this VS Code extension to help with manually entering Beancount transactions: https://github.com/aostiles/Beanquick


I'm also a happy beancount user. It's also pretty simple to extend, if not well documented.

I was able to write a plugin to get data from the Milan stock exchange in a few hours, which was very nice. The vim plugin is also pretty ok.


Been an avid beancount'er for about two years now too!

I've never used the import features, the manual inputting of transactions into a text file doesn't take too long.


This sounds amazing. It would be great if one day you could publish a detailed how-to / tutorial.


How do you parse the pdfs?



