A python library for analyzing personal financial data.
tcat
provides two building pathways: a dedicated Docker
image containing a
Jupyter Lab environment, and a .whl
package to be installed with via pip
.
To build and install the .whl
package, ensure that you have python 3.9 and
poetry installed on your machine. Then run the
following commands:
poetry install && poetry build -f wheel && pip install dist/*.whl
To build and run the Jupyter Lab environment, a run.sh
script is provided,
otherwise just run:
docker build -t tcatenv:latest .
The core unit of tcat
is the Transaction
object. A Transaction
represents
the data one might find contained in a single line of a transaction CSV file
downloaded from their bank. Each transaction object is a
dataclass containing the
following bits of information:
Field Name | Data Type | Description |
---|---|---|
account |
str |
The account associated with the transaction, such as "checking" or "savings". |
amount |
float |
The dollar amount of the transaction. |
balance |
float |
The dollar amount corresponding to the balance of the associated account after registering the transaction. |
bank |
str |
The name of the financial institution associated with the account. |
date |
datetime.date |
The post date associated with the transaction. |
desc |
str |
The description of the transaction, as raw text pulled from the source CSV file. |
name |
Optional[str] |
A more human-friendly version of desc , such as "Money Transfer" or "Pizza Planet". |
note |
Optional[str] |
A convenient field to add any additional arbitrary notes about the transaction. |
tags |
list[str] |
A list of tag strings used to categorize the transaction (see below for more information). |
Note that the name
, note
, and tags
fields may be "empty" (either None
or
[]
), in which case the transaction contains the same data as would be parsed
from a transaction CSV file. Such a transaction is considered uncategorized,
and is ultimately not very useful apart noting the change in balance over time.
tcat
provides tools for automatically categorizing transactions, but more on
that later. A properly categorized transaction might look like the following:
from tcat import Transaction
t = Transaction(
account = 'checking',
amount = -3.87,
balance = 456.78,
bank = 'bank1',
date = datetime.date(2021, 8, 12),
desc = 'PIZZA PLANET 0123456789 1 20210812 XXYYZZ',
name = 'Pizza Planet',
tags = ['food', 'fast-food', 'pizza']
)
While the Transaction
object might be considered the "core" unit of tcat
,
most interaction with the library will be done with Transactions
objects (note
the s
!). Transactions
objects provide a convenient wrapper over a list of
transactions, and are bundled with a ton of useful methods for sorting,
filtering, importing/exporting, etc. The following provides a brief overview of
some of the things you can do:
from tcat import Transactions
# Lets pretend this is a large collection of bank transactions.
transactions = Transactions()
# Transactions objects can be iterated over.
for t in transactions:
print(str(t))
# What banks/accounts are associated with this collection?
accounts = transactions.accounts()
banks = transactions.banks()
# Grab all food-related transactions within the last 90 days from all checking
# accounts.
food = transactions.filter(account='checking', date=90, tags='food')
# What was the most I spent on groceries in the last 30 days?
ans1 = transactions.filter(account='checking', date=30, tags='groceries').max_amount(absolute_value=True)
# What was the average transaction amount each week for the last 90 days?
ans2 = [w.mean_amount() for w in transactions.filter(date=90).group(by='date-weekly').values()]
Before categorizing transaction objects, one typically parses them from transaction CSV files. Such a file might look something like the following:
Date,Description,Amount,Balance
03/19/2020,"PIZZA PLANET 0123456789 2",-$9.47,$67.45
03/20/2020,"SUB SHOP 0123456789 1",-$3.75,$63.70
...
tcat
provides the Parser
class for handling the translation between the
string content above and a resulting Transactions
object:
from tcat import Parser
# Parse a collection of transactions from "example.csv", indicating the bank/account they belong to.
transactions = Parser().parse_transaction_csv(path='example.csv', bank='bank1', account='checking')
Note that for convenience sake, the above function may imply the bank and/or
account associated with the list of transactions depending on whether the bank
or account
parameters are set to None
(their defaults).
- If neither
bank
noraccount
are defined, the parser assumes the name of the input CSV file is of the form{bank}-{account}.csv
. - If
bank
is defined but notaccount
, the parser will set the account to the name of the input file. - If
account
is defined but notbank
, the parser will set the bank to the name of the input file.
A few additional notes about how the parser works:
- Amounts or balances within parentheses are considered negative. For example,
($3.87)
would be parsed to-3.87
. - The CSV file must have a header line, although field matching is
case-insensitive and pretty flexible (either
description
, ordesc
is allowed, for example). - Dates may be of the form
YYYY/MM/DD
orMM/DD/YYYY
, and may be mixed within the same file.
To categorize a collection of transactions, tcat
leverages a database of
regular expressions defined in YAML documents. Typically tcat
is pointed to a
directory containing many YAML files loosely grouped by some higher-level
category (food.yaml
, bills.yaml
, etc.), where each file looks something like
the following:
# food.yaml
# A collection of tags inherited by anything that matches within this file.
tags:
- 'expense'
- 'food'
# Defines a list of match objects to which an uncategorized transaction's `desc`
# field will be tested.
data:
- name: 'Pizza Planet'
match: >-
^pizza\s*planet
tags:
- 'fast-food'
- 'pizza'
- name: 'Sub Shop'
match: >-
sub\s*shop
tags:
- 'fast-food'
- 'subs'
Note that matching is done against the lowercase form of the desc
field.
To ingest these files, one instantiates a Categorizer
object pointing at the
directory containing them, combined with the Parser
object explained above,
one might write the following:
from tcat import Categorizer, Parser
uncategorized = Parser().parse_transaction_csv('~/example/csv')
categorized = Categorizer(data_path='~/example/data').cat(uncategorized)