Theo Marchetti
Staff platform engineer · reliability and infrastructure
theo@marchetti.dev · Boston, MA · marchetti.dev · github.com/tmarchetti
Summary
Eleven years on platform teams at small-to-mid-sized companies. Last four years split between incident response and the infrastructure work that prevents repeat incidents. Looking for a staff or principal role where the platform is the product.
Experience
Staff Platform Engineer
Rivermark • Boston, MA
Jan 2023 — Present
- Cut p99 API latency from 940ms to 280ms by replacing a synchronous billing webhook fanout with a queued retry layer; landed inside one quarter against an original two-quarter estimate.
- Owned the on-call rotation rebuild: wrote the runbooks, ran the post-incident reviews, and cut mean time to resolve from 47 minutes to 12 over six months.
- Led the migration of a 240-instance fleet from EC2 to ECS Fargate, retiring two pager-rotation roles in the process.
- Mentored the two senior engineers who took over the on-call captaincy; both promoted to staff within a year.
Senior Site Reliability Engineer
Honeycomb • Remote
Mar 2020 — Dec 2022
- Built the internal observability layer the customer-facing product team now uses; reduced time-to-detection on customer-impacting issues from 22 minutes to under 4.
- Authored the team's incident review template; still in use, and picked up by two acquirers during the M&A diligence cycle.
- Ran the reliability hiring pipeline through 2021: interviewed 90+ candidates and made eight hires, two of whom are still on the team today.
Software Engineer
Stitch Fix • San Francisco, CA
Aug 2016 — Feb 2020
- Shipped the inventory-allocation service that replaced a nightly Spark job with a sub-second API; the service is still in production seven years later.
- Co-led the platform team's adoption of Kubernetes: wrote the migration playbook, ran the workshops, and kept the rollout to a single quarter.
Education
Carnegie Mellon University
BS • Computer Science
2014