Code
Coding Guides
Practical work from the field — built around real problems in data, investigations, and software development.
Featured Scripts
excel_ingestor.pyOne-call Excel → DataFrame with built-in data cleaning
Loads any Excel sheet into a pandas DataFrame and silently strips the noise: currency symbols, thousand-separator commas, accounting-style negatives, percentage signs, and Excel’s many ways of writing “blank”. Column names get sanitised in the same pass.
- →Single-pass numeric cleaning (currency, %, accounting negatives)
- →Automatic column-name normalisation
- →Structured logging — silent by default, verbose on request
View on GitHub ↗
sttm_function.pySource-to-Target Mapping extraction for embedded SQL
Scans Python source for embedded SQL and returns a sorted, deduplicated DataFrame of [Database, Table, Column] lineage — the documentation artefact every AML data engineer ends up building by hand. Built from real work supporting financial-crimes investigations.
- →Regex parser handles JOINs, aliases, multi-database references
- →Deduplicated, sorted output ready for documentation
- →Optional source-SQL and filename columns for traceability
View on GitHub ↗
pbkdf2.pyEmail pseudonymization for Teradata PII lookups
Generates a deterministic PBKDF2-HMAC-SHA256 hash from an email address — the same scheme used by Teradata databases storing hashed PII. Given a list of plain emails, builds a DataFrame of pseudonymous identifiers ready to query against encrypted-email columns without ever exposing raw data.
- →Deterministic output — same email always produces the same hash
- →Matches existing Teradata hashing scheme exactly
- →Pandas-free core; DataFrame output only when run directly
View on GitHub ↗
df2table + df2insertCreate and load Teradata volatile tables from a DataFrame
A two-script workflow for Teradata DBAs and SQL developers. df2table generates a CREATE MULTISET VOLATILE TABLE DDL statement by inferring Teradata types from a pandas DataFrame. df2insert generates the INSERT statements to populate it — as a single batched UNION ALL SELECT or individual VALUES rows.
- →df2table — infers DATE, TIMESTAMP, INTEGER, BIGINT, FLOAT, BYTEINT, VARCHAR by sampling each column
- →df2insert — batched UNION ALL SELECT (default) or individual INSERT VALUES per row
- →Optional PRIMARY INDEX; table and column name validation against Teradata identifier rules
View on GitHub ↗
Areas of Focus
Python
From scripting and automation to data pipelines and investigation tooling. Practical Python built around real-world problems.
SQL
Query design, performance tuning, and patterns for financial data — from basic SELECT to complex analytical queries.
APIs & Integration
Building and consuming REST APIs with FastAPI. Authentication, CORS, and connecting systems securely.
Financial Crimes Tech
Technical patterns and tooling for AML, fraud detection, and investigative workflows at scale inside a financial institution.
More scripts and write-ups added as I build them. Real problems, real code — pulled from the day-to-day of financial-crimes engineering.