Code

Coding Guides

Practical work from the field — built around real problems in data, investigations, and software development.

Featured Scripts

📊excel_ingestor.py

One-call Excel → DataFrame with built-in data cleaning

Loads any Excel sheet into a pandas DataFrame and silently strips the noise: currency symbols, thousand-separator commas, accounting-style negatives, percentage signs, and Excel’s many ways of writing “blank”. Column names get sanitised in the same pass.

  • Single-pass numeric cleaning (currency, %, accounting negatives)
  • Automatic column-name normalisation
  • Structured logging — silent by default, verbose on request
pythonpandasopenpyxl

View on GitHub ↗

🔍sttm_function.py

Source-to-Target Mapping extraction for embedded SQL

Scans Python source for embedded SQL and returns a sorted, deduplicated DataFrame of [Database, Table, Column] lineage — the documentation artefact every AML data engineer ends up building by hand. Built from real work supporting financial-crimes investigations.

  • Regex parser handles JOINs, aliases, multi-database references
  • Deduplicated, sorted output ready for documentation
  • Optional source-SQL and filename columns for traceability
pythonsqlregexdata lineage

View on GitHub ↗

🔐pbkdf2.py

Email pseudonymization for Teradata PII lookups

Generates a deterministic PBKDF2-HMAC-SHA256 hash from an email address — the same scheme used by Teradata databases storing hashed PII. Given a list of plain emails, builds a DataFrame of pseudonymous identifiers ready to query against encrypted-email columns without ever exposing raw data.

  • Deterministic output — same email always produces the same hash
  • Matches existing Teradata hashing scheme exactly
  • Pandas-free core; DataFrame output only when run directly
pythoncryptographyteradatapiidata engineering

View on GitHub ↗

🗄️df2table + df2insert

Create and load Teradata volatile tables from a DataFrame

A two-script workflow for Teradata DBAs and SQL developers. df2table generates a CREATE MULTISET VOLATILE TABLE DDL statement by inferring Teradata types from a pandas DataFrame. df2insert generates the INSERT statements to populate it — as a single batched UNION ALL SELECT or individual VALUES rows.

  • df2table — infers DATE, TIMESTAMP, INTEGER, BIGINT, FLOAT, BYTEINT, VARCHAR by sampling each column
  • df2insert — batched UNION ALL SELECT (default) or individual INSERT VALUES per row
  • Optional PRIMARY INDEX; table and column name validation against Teradata identifier rules
pythonsqlteradataddl

View on GitHub ↗

Areas of Focus

🐍

Python

From scripting and automation to data pipelines and investigation tooling. Practical Python built around real-world problems.

scriptingautomationpandasdata pipelines
🗄️

SQL

Query design, performance tuning, and patterns for financial data — from basic SELECT to complex analytical queries.

queriesanalyticsjoinsaggregations
🔌

APIs & Integration

Building and consuming REST APIs with FastAPI. Authentication, CORS, and connecting systems securely.

RESTFastAPIauthJSON
🔍

Financial Crimes Tech

Technical patterns and tooling for AML, fraud detection, and investigative workflows at scale inside a financial institution.

AMLfrauddata engineeringcompliance

More scripts and write-ups added as I build them. Real problems, real code — pulled from the day-to-day of financial-crimes engineering.