Software · Systems · Practice

I build reliable, low-latency systems and clear interfaces—measured by telemetry, not vibes.

Senior software engineer shipping data-centric products at large scale across operations, risk, and data platforms. Focus: event-driven services, workflow engines, performance harnesses, and developer experience. Certificates: Applied Data Science (MIT) and Applied Machine Learning (Columbia).

About

I design for observability, explicit contracts, and fast feedback. Default toolkit: event streams, reproducible performance tests, and built-in recovery paths. Most wins come from cutting moving parts and tightening loops between code, metrics, and people.

Event-driven architectures · Sub-second E2E delivery · Workflow engines · On-call calmness · DX & docs that teach

10+ yrs: building data-centric systems

Ops · Risk · Data: high-traffic platforms

Selected Work

Real-time event pipeline for customer operations — sub-second E2E, strict SLAs, benchmarking harness

Built a Kafka-backed stream to track live interactions across client/server/vendor layers; introduced a performance harness that kept latency within SLA and reduced drift at the tails. Outcome: faster triage and steadier operations during peaks. (Large-scale production)

Risk workflow engine — composable rules + model hooks + full audit trail

Unified legacy and modern inputs into a rule graph with inline model hooks and end-to-end auditability; reduced false positives via better triage and feature routing; shipped iteratively with product and QA. (Enterprise platform)

Data platform migration & self-service tooling — governed access, onboarding portal, catalogs

Migrated diverse data sets to a common platform; delivered self-service onboarding and data services for analytics users. Tech included distributed storage, stream processing, and policy enforcement. (Multi-team program)

Real-time telemetry dashboard — health, traces, calmer incident response

Built a live troubleshooting surface for health and traces; paired dashboards with playbooks so the “what now?” was obvious. Result: faster, calmer incidents. (Operational tooling)

DX · Performance harness: reproducible load tests + golden signals + CLI so anyone can run them.
Dev · Internal demos & workshops: taught telemetry, budgets, and failure drills across teams.

Principles

Make state observable

Metrics, logs, and traces first. If we can’t see it, we can’t improve it.

Small, sharp interfaces

Keep contracts narrow and explicit. Compatibility beats cleverness.

Recovery is a feature

Design steady-state and failure modes separately; drill the handoffs.

Cut latency, not corners

Measure p50–p99.9. Budgets and alerts keep drift honest.
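
As a rough illustration of what a latency budget check can look like, here is a minimal sketch. It assumes nearest-rank percentiles and made-up thresholds; the names `percentile`, `budget_violations`, and `BUDGETS_MS` are hypothetical and not taken from any particular harness.

```python
import math

# Hypothetical budgets in milliseconds, keyed by percentile (illustrative values only).
BUDGETS_MS = {50: 120, 99: 400, 99.9: 900}


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def budget_violations(samples_ms, budgets=BUDGETS_MS):
    """Return {percentile: (observed_ms, budget_ms)} for each percentile that is over budget."""
    violations = {}
    for q, limit in budgets.items():
        observed = percentile(samples_ms, q)
        if observed > limit:
            violations[q] = (observed, limit)
    return violations
```

A check like this could run at the end of a load test and fail the run whenever the returned dictionary is non-empty, so tail regressions surface as a failed run instead of a chart someone has to notice.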

Capabilities

Languages & Frameworks

Java (Spring), Python & Shell for tooling, REST APIs, microservices.

Java · Spring · REST · Python · Shell

Data & Streaming

Kafka, Spark, distributed storage, access control; real-time event processing; data warehousing.

Kafka · Spark · Distributed storage · Governance

Cloud & DevOps

CI/CD, containerized deploys, automated tests, performance & resiliency testing (e.g., load, chaos), QA automation.

CI/CD · Containers · Perf & Resiliency · QA Automation

Learning & Teaching

Applied Data Science (MIT), Applied Machine Learning (Columbia). Frequent internal talks and code walkthroughs.

MIT · Columbia · Workshops

Notes

Short essays I keep to guide builds.

Ops · Friction beats features: the right defaults and fewer steps outperform clever options.
Testing · Golden signals > dashboards: a few enforced thresholds beat a sea of charts.
DX · Docs as UI: onboarding should feel like discovering a tool, not reading homework.

Contact

Reach out

Briefs welcome; constraints appreciated.

Now

Building event pipelines, keeping on-call calm, training strength, and reading widely.

Rock climbing · Strength · Longevity