Initial commit.
commit 6f7e52a42b
@ -0,0 +1,3 @@
__pycache__/
.venv/
.envrc
@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2024 Julien Palard

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
@ -0,0 +1,128 @@
# BoursoBank statement parser

⚠ This library was developed independently of BoursoBank.


## Installation

    pip install boursobank


## Security

### Password

This library **does not connect to the internet** (when in doubt,
read the code): it only reads statements already downloaded as PDF
files, and all processing is done locally.

If in doubt, it should be possible to run the application inside
[firejail](https://github.com/netblue30/firejail) or a similar sandbox.

So there is no need to worry about your password: it is never
asked for (no need to re-read the code this time: if the library
does not ask for your password, it does not have it).


### Parser errors

Reading PDFs [is not simple](https://pypdf.readthedocs.io/en/stable/user/extract-text.html#ocr-vs-text-extraction).

To make sure no error slips into your analyses, this library
provides a `validate()` method that checks that the opening balance
plus all the lines adds up to the closing balance; otherwise a
`ValueError` is raised.

This example therefore raises an exception only on a parsing error
(or on a bank error, as in Monopoly):

```python
for file in args.files:
    statement = Statement.from_pdf(file)
    statement.pretty_print()
    statement.validate()
```


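As a rough illustration of the idea behind `validate()`, here is a minimal, self-contained sketch of such a consistency check. The function name `check_balance` and the amounts are assumptions made up for this example: it is not the library's code, only the same kind of arithmetic on `Decimal` values.

```python
from decimal import Decimal


def check_balance(balance_before, values, balance_after):
    """Raise ValueError unless balance_before + sum(values) == balance_after."""
    computed = balance_before + sum(values, Decimal(0))
    if computed != balance_after:
        raise ValueError(
            f"Inconsistent total, found: {computed!r}, expected: {balance_after!r}"
        )


# Made-up amounts, kept as Decimal so that the comparison is exact.
values = [Decimal("42.42"), Decimal("99.00"), Decimal("-123.45")]

# Consistent statement: nothing happens.
check_balance(Decimal("1000.00"), values, Decimal("1017.97"))

# Inconsistent statement: the error is reported.
try:
    check_balance(Decimal("1000.00"), values, Decimal("1018.00"))
except ValueError as error:
    print(error)
```

Keeping the amounts as `Decimal` makes the equality test exact, which a comparison between floats may not be.

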
## Command-line interface

This library can be used from the command line:

    boursobank *.pdf

displays your statements (card or account), for example:

    $ boursobank 2024-01.pdf
                    2024-01.pdf
    ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
    ┃ Date       ┃ RIB                        ┃
    ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
    │ 2024-01-01 │ 12345 12345 00000000000 99 │
    └────────────┴────────────────────────────┘
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
    ┃                                    Label ┃ Value    ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
    │                            VIR SEPA Truc │ 42.42    │
    │                     VIR SEPA Machin truc │ 99.00    │
    │    Relevé différé Carte 4810********0000 │ -123.45  │
    └──────────────────────────────────────────┴──────────┘


## API

The whole point is to be able to work with your statements from Python,
for example to export them to CSV:

```python
import argparse
import csv
import sys
from pathlib import Path

from boursobank import Statement


def main():
    args = parse_args()
    statement = Statement.from_pdf(args.ifile)
    writer = csv.writer(sys.stdout)
    for line in statement.lines:
        writer.writerow((line.label, line.value))


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("ifile", type=Path, help="PDF file")
    return parser.parse_args()


if __name__ == "__main__":
    main()
```

The library provides a single entry point: the `Statement` class.

From this class you can parse PDF files:

    relevé_bancaire = Statement.from_pdf("test.pdf")

or text:

    relevé_bancaire = Statement.from_string("blah blah")


This class mainly provides two attributes, a `headers` dictionary containing:

- `date`: the first day of the month covered by this statement.
- `emit_date`: the date on which the statement was issued.
- `RIB`: the RIB/IBAN of the statement.
- `devise`: the currency, probably `"EUR"`.
- `card_number`: the bank card number, if this is a card statement.
- `card_owner`: the name of the cardholder, if this is a card statement.

and a `lines` attribute containing instances of the `Line` class, whose
main attributes are:

- `label`: the short description of the line.
- `description`: the rest of the description, when it spans several lines.
- `value`: the amount of the line (positive for a credit, negative for a debit).

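To give an idea of what can be done with `lines`, the sketch below totals credits and debits. It does not import the library: `FakeLine` is a hypothetical stand-in exposing only the `label` and `value` attributes listed above, and `summarize` is a name invented for this example. With real data, the list would presumably come from the `lines` attribute of a parsed statement instead.

```python
from collections import namedtuple
from decimal import Decimal

# Hypothetical stand-in for the library's line objects: only the `label`
# and `value` attributes are mimicked here.
FakeLine = namedtuple("FakeLine", ["label", "value"])


def summarize(lines):
    """Return (total of credits, total of debits) for objects with `.value`."""
    total_credits = sum((l.value for l in lines if l.value > 0), Decimal(0))
    total_debits = sum((l.value for l in lines if l.value < 0), Decimal(0))
    return total_credits, total_debits


# Made-up lines, reusing the amounts of the command-line example.
lines = [
    FakeLine("VIR SEPA Truc", Decimal("42.42")),
    FakeLine("VIR SEPA Machin truc", Decimal("99.00")),
    FakeLine("Relevé différé Carte 4810********0000", Decimal("-123.45")),
]
total_credits, total_debits = summarize(lines)
print(f"credits: {total_credits}, debits: {total_debits}")
```

Because debits are negative, adding the two totals gives the net change over the statement.
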
@ -0,0 +1,384 @@
"""Parses BoursoBank account statements."""

import datetime as dt
import logging
import re
from decimal import Decimal

from pypdf import PdfReader
from rich.console import Console
from rich.table import Table
from rich import print as rich_print
from rich.panel import Panel

__version__ = "0.1"

DATE_RE = r"([0-9]{1,2}/[0-9]{2}/[0-9]{2,4})"

HEADER_VALUE_PATTERN = rf"""\s*
    (?P<date>{DATE_RE})\s+
    (?P<RIB>[0-9]{{5}}\s+[0-9]{{5}}\s+[0-9]{{11}}\s+[0-9]{{2}})\s+
    (
        (?P<devise>[A-Z]{{3}})
        |
        (?P<card_number>[0-9]{{4}}\*{{8}}[0-9]{{4}})
    )\s+
    (?P<periode>(du)?\s+{DATE_RE}\s+(au\s+)?{DATE_RE})\s+
"""

RE_CARD_OWNER = [  # First pattern is tried first
    re.compile(r"Porteur\s+de\s+la\s+carte\s+:\s+(?P<porteur>.*)$", flags=re.M),
    re.compile(
        r"44\s+rue\s+Traversiere\s+CS\s+80134\s+92772\s+"
        r"Boulogne-Billancourt\s+Cedex\s+(?P<porteur>.*)$",
        flags=re.M,
    ),
]


logger = logging.getLogger(__name__)


def parse_decimal(value: str):
    """Parse a French value like 1.234,56 to a Decimal instance."""
    return Decimal(value.replace(".", "").replace(",", "."))


class Line:
    """Represents one line (debit or credit) in a bank statement."""

    PATTERN = re.compile(
        rf"\s+(?P<date>{DATE_RE})\s*(?P<label>.*)\s+"
        rf"(?P<valeur>{DATE_RE})\s+(?P<amount>[0-9.,]+)$"
    )

    def __init__(self, statement, line):
        self.statement = statement
        self.line = line
        self.description = ""
        self.match = self.PATTERN.match(line)

    @property
    def label(self):
        """Line short description."""
        return re.sub(r"\s+", " ", self.match["label"]).strip()

    @property
    def safe_label(self):
        """Line short description without double quotes."""
        return self.label.replace('"', "")

    def add_description(self, description_line):
        """Add a line to a long description."""
        self.description += description_line

    @property
    def direction(self):
        """Returns '-' for outbound, and '+' for inbound.

        There are two columns in the PDF: Débit, Crédit.

        Sadly we don't really know where they are, and there are
        variations depending on the format, so we have to use a
        heuristic.
        """
        if self.statement.headers["date"] < dt.date(2021, 1, 1):
            column_at = 98
        else:
            column_at = 225

        column = self.match.start("amount")
        return "-" if column < column_at else "+"

    @property
    def amount(self):
        """Raw value for this line, regardless of its 'debit'/'credit' column."""
        return parse_decimal(self.match["amount"])

    @property
    def value(self):
        """Value for this line. Positive for credits, negative for debits."""
        return self.amount if self.direction == "+" else -self.amount

    def __str__(self):
        return f"{self.safe_label} {self.value}"


class AccountLine(Line):
    """Represents one line (debit or credit) in a bank statement."""

    PATTERN = re.compile(
        rf"\s+(?P<date>{DATE_RE})\s*(?P<label>.*)\s+"
        rf"(?P<valeur>{DATE_RE})\s+(?P<amount>[0-9.,]+)$"
    )


class BalanceBeforeLine(AccountLine):
    PATTERN = re.compile(rf"\s+SOLDE\s+AU\s+:\s+{DATE_RE}\s+(?P<amount>[0-9,.]+)$")


class BalanceAfterLine(AccountLine):
    PATTERN = re.compile(r"\s+Nouveau\s+solde\s+en\s+EUR\s+:\s+(?P<amount>[0-9,.]+)$")


class CardLine(Line):
    """Represents one line (debit or credit) in a card statement."""

    PATTERN = re.compile(
        rf"\s*(?P<date>{DATE_RE})\s+CARTE\s+(?P<valeur>{DATE_RE})"
        rf"\s+(?P<label>.*)\s+(?P<amount>[0-9.,]+)$"
    )

    @property
    def direction(self):
        """Returns '-' for outbound, and '+' for inbound.

        As it's a card, we have only one column: debits.
        """
        return "-"


class CardLineDebit(CardLine):
    PATTERN = re.compile(
        rf"\s+A\s+VOTRE\s+DEBIT\s+LE\s+{DATE_RE}\s+(?P<amount>[0-9.,]+)$"
    )


class CardLineDebitWithFrancs(CardLineDebit):
    PATTERN = re.compile(
        rf"\s+A\s+VOTRE\s+DEBIT\s+LE\s+{DATE_RE}\s+"
        rf"(?P<amount>[0-9.,]+)\s+(?P<debit_francs>[0-9.,]+)$"
    )


class CardLineWithFrancs(CardLine):
    """Represents one line in a card statement that also shows francs."""

    PATTERN = re.compile(
        rf"\s*(?P<date>{DATE_RE})\s+CARTE\s+(?P<label>.*)\s+"
        rf"(?P<amount>[0-9.,]+)\s+(?P<amount_francs>[0-9.,]+)$"
    )


class Statement:
    """Represents a bank account statement."""

    LineImpl = Line

    def __init__(self, filename, text, headers, **kwargs):
        self.filename = filename
        self.text = text
        self.headers = headers
        self.lines = []
        super().__init__(**kwargs)

    @classmethod
    def from_string(cls, string, filename="-"):
        """Builds a statement from a string, useful for testing purposes."""
        headers = cls._parse_header(string, filename)
        if headers.get("card_number"):
            self = CardStatement(filename=filename, text=string, headers=headers)
        else:
            self = AccountStatement(filename=filename, text=string, headers=headers)
        self._parse()
        return self

    @classmethod
    def from_pdf(cls, filename):
        """Builds a statement from a PDF file."""
        buf = []
        for page in PdfReader(filename).pages:
            try:
                buf.append(
                    page.extract_text(extraction_mode="layout", orientations=[0])
                )
            except AttributeError:
                # Maybe just a blank page
                pass  # logger.exception("while parsing PDF %s", filename)
        return cls.from_string("\n".join(buf), filename)

    @classmethod
    def _parse_header(cls, text: str, filename: str) -> dict:
        headers = {}
        for text_line in text.splitlines():
            if values := re.match(HEADER_VALUE_PATTERN, text_line, re.VERBOSE):
                headers["emit_date"] = dt.datetime.strptime(
                    values["date"], "%d/%m/%Y"
                ).date()
                headers["date"] = (
                    dt.datetime.strptime(values["periode"].split()[-1], "%d/%m/%Y")
                    .date()
                    .replace(day=1)
                )
                headers["RIB"] = re.sub(r"\s+", " ", values["RIB"])
                headers["devise"] = values["devise"]
                headers["card_number"] = values["card_number"]
                break
        else:
            logger.warning("Cannot find header values in %s.", filename)
            return {}
        return headers

    def _parse_lines(self):
        current_line = None
        for text_line in self.text.splitlines():
            line = self.LineImpl(self, text_line)
            if line.match:
                if current_line:
                    self.lines.append(current_line)
                current_line = line
            elif current_line:
                current_line.add_description(text_line)
        if current_line:
            self.lines.append(current_line)

    def __str__(self):
        buf = [f"Date: {self.headers['date']}", f"RIB: {self.headers['RIB']}"]
        for line in self.lines:
            buf.append(str(line))
        return "\n".join(buf)


class AccountStatement(Statement):
    LineImpl = AccountLine

    def __init__(self, **kwargs):
        self.balance_before = Decimal(0)
        self.balance_after = Decimal(0)
        super().__init__(**kwargs)

    def validate(self):
        """Consistency check.

        It just verifies that all the lines sum to the right total.
        """
        computed = sum(line.value for line in self.lines)
        if self.balance_before + computed != self.balance_after:
            raise ValueError(
                f"Inconsistent total, found: {self.balance_before + computed!r}, "
                f"expected: {self.balance_after!r} in {self.filename}."
            )

    def _parse(self):
        self._parse_soldes()
        self._parse_lines()

    def _parse_soldes(self):
        for text in self.text.splitlines():
            line = BalanceBeforeLine(self, text)
            if line.match:
                self.balance_before = line.value
            line = BalanceAfterLine(self, text)
            if line.match:
                self.balance_after = line.value

    def pretty_print(self):
        table = Table(title=str(self.filename))
        table.add_column("Date")
        table.add_column("RIB")
        table.add_row(str(self.headers["date"]), self.headers["RIB"])
        Console().print(table)

        table = Table()
        table.add_column("Label", justify="right", style="cyan", no_wrap=True)
        table.add_column("Value", style="magenta")
        for line in self.lines:
            table.add_row(line.label, str(line.value))

        Console().print(table)


class CardStatement(Statement):
    LineImpl = CardLine

    def __init__(self, **kwargs):
        self.card_debit = Decimal(0)
        super().__init__(**kwargs)

    def validate(self):
        """Consistency check.

        It just verifies that all the lines sum to the right total.
        """
        computed = sum(line.value for line in self.lines)
        if computed != self.card_debit:
            raise ValueError(
                f"Inconsistent total, found: {computed!r}, "
                f"expected: {self.card_debit!r} in {self.filename}."
            )

    def _parse(self):
        self._parse_card_owner()
        self._parse_card_debit()
        self._parse_lines()

    def _parse_card_debit(self):
        for text in self.text.splitlines():
            line = CardLineDebitWithFrancs(self, text)
            if line.match:
                self.card_debit = line.value
                self.LineImpl = CardLineWithFrancs
                return
            line = CardLineDebit(self, text)
            if line.match:
                self.card_debit = line.value
                return

    def _parse_card_owner(self):
        for pattern in RE_CARD_OWNER:
            if match := pattern.search(self.text):
                self.headers["card_owner"] = re.sub(r"\s+", " ", match["porteur"])
                break

    def pretty_print(self):
        table = Table(title=str(self.filename))
        table.add_column("Date")
        table.add_column("RIB")
        table.add_column("Card number")
        table.add_column("Card debit")
        table.add_column("Card owner")
        table.add_row(
            str(self.headers["date"]),
            self.headers["RIB"],
            self.headers["card_number"],
            str(self.card_debit),
            self.headers["card_owner"],
        )
        Console().print(table)

        table = Table()
        table.add_column("Label", justify="right", style="cyan", no_wrap=True)
        table.add_column("Value", style="magenta")
        for line in self.lines:
            table.add_row(line.label, str(line.value))

        Console().print(table)


def main():
    args = parse_args()

    logging.getLogger("pypdf._text_extraction._layout_mode._fixed_width_page").setLevel(
        logging.ERROR
    )

    for file in args.files:
        statement = Statement.from_pdf(file)
        if args.debug:
            rich_print(Panel(statement.text))
        statement.pretty_print()
        statement.validate()


def parse_args():
    import argparse
    from pathlib import Path

    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--debug", action="store_true")
    parser.add_argument("files", nargs="*", type=Path)
    return parser.parse_args()


if __name__ == "__main__":
    main()
@ -0,0 +1,22 @@
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "boursobank"
authors = [{name = "Julien Palard", email = "julien@palard.fr"}]
license = {file = "LICENSE"}
classifiers = ["License :: OSI Approved :: MIT License"]
dynamic = ["version", "description"]
dependencies = [
    "pypdf",
    "rich",
]

[project.scripts]
boursobank = "boursobank:main"

[project.urls]
Home = "https://git.afpy.org/mdk/boursobank"

[tool.black]
@ -0,0 +1,88 @@
"""Simple non-regression tests for the Statement PDF parser.

It's possible to drop some PDF files in the test directory to run some
tests against them too.
"""

import datetime as dt
from pathlib import Path

import pytest
from boursobank import Statement, CardStatement, AccountStatement


def test_parse_header_2012():
    """Test parsing an old format of headers where the date can have a
    single digit.
    """
    statement = Statement.from_string(
        """
        ...
        1/02/2012 12345 12345 00000000000 99 EUR  31/12/2011\
        31/01/2012 1.000,00 € 0,000000 % 1
        ...
        """
    )
    assert statement.headers["date"] == dt.date(2012, 1, 1)
    assert isinstance(statement, AccountStatement)


def test_parse_header_cb():
    """Test parsing a Bank Card statement header (with a bank card number in it)."""

    statement = Statement.from_string(
        """
        ...
        28/02/2024 12345 12345 00000000000 99 4810********9999 \
        du 30/01/2024 au 27/02/2024 1/2
        ...
        """
    )
    assert statement.headers["date"] == dt.date(2024, 2, 1)
    assert isinstance(statement, CardStatement)


def test_parse_cb_line():
    """Test parsing a CB line which contains the label AFTER the value date."""

    statement = Statement.from_string(
        """
        28/02/2024 12345 12345 00000000000 99 4810********9999 du \
        30/01/2024 au 27/02/2024 1/2
        12/02/2024 CARTE 10/02/24 PHOTOMATON 8,00
        ...
        """
    )
    assert statement.lines
    assert statement.lines[0].value == -8


@pytest.mark.parametrize("pdf", list(Path(__file__).parent.glob("*.pdf")))
def test_cb_consistency_from_files(pdf):
    """Test PDF files in the tests/ directory (bring your own, none are provided)."""
    statement = Statement.from_pdf(pdf)
    if not isinstance(statement, CardStatement):
        return
    found = statement.card_debit
    computed = sum(line.value for line in statement.lines)
    assert (
        found == computed
    ), f"Inconsistent total, found: {found!r}, computed: {computed!r}"


def test_old_owner():
    statement = Statement.from_string(
        """
        EN CAS DE PERTE OU DE VOL
        - Appelez le Centre d'opposition au 09 77 40 10 08
        pour une Carte VISA Classic, au 04 42 60 53 44
        pour une Carte VISA Premier.
        - Faites une déclaration au Commissariat de Police
        - Confirmez par courrier à Boursorama Banque, Service Client
        44 rue Traversiere CS 80134 92772 Boulogne-Billancourt Cedex THE OWNER IS HERE

        28/02/2024 12345 12345 00000000000 99 4810********9999 du \
        30/01/2024 au 27/02/2024 1/2
        """
    )
    assert statement.headers["card_owner"] == "THE OWNER IS HERE"