Professional work and what I do for a living

I enjoy building systems that feel simple, reliable, and thoughtful. Most of my work revolves around understanding real-world behavior and designing solutions that stay practical and predictable.

Work

Tapsy Mobile App

Tapsy began as a thought experiment to understand how quickly a modern mobile app can be built and released using today’s tooling. The intention was not to create a feature-rich product, but to observe the full journey from idea to production under realistic constraints.

The core application was built in roughly three hours using React Native, with Cursor and ChatGPT assisting in scaffolding and iteration. Writing the code itself was surprisingly straightforward — modern frameworks and AI tools have made shipping functionality faster than ever.

What turned out to be significantly harder was everything around the code. App store compliance, Expo configuration, signing, environment setup, and platform-specific checks consumed far more time than development itself. This gap between “working code” and “production-ready software” was hard to miss.

Beyond release mechanics, the real complexity surfaced in non-functional concerns: observability, security, secrets management, and defining sensible reliability boundaries. These aspects are rarely visible in demos, but they ultimately determine whether a system is safe, operable, and sustainable.

The experiment reinforced a lesson I’ve seen repeatedly in larger systems as well: shipping features is increasingly easy, but building software that can be trusted, observed, and operated in the real world is where most of the engineering effort still lives.

The source code for this experiment is available on GitHub: github.com/sacssuresh/Tapsy

Writing

Observability vs Monitoring

Monitoring tells you what is happening. Observability helps you understand why it’s happening.

Monitoring is tool-driven — dashboards, alerts, metrics. Observability is behaviour-driven — signals, traces, and context that reveal how a system behaves under real-world conditions.

How do you set up effective monitoring?

In my experience, the foundation of monitoring must be driven by a senior architect or a distinguished engineer — not delegated downward. Proper monitoring is a first-class architecture concern.

Step 1 — Define SLAs, SLOs, and Error Budgets
Sit down and write your service-level objectives clearly. Define acceptable latency, error rate, and failure windows. The error budget follows from them: it is the amount of failure an objective still permits. These numbers drive everything else.

Example: A user-facing authentication service needs tighter alert boundaries than a notification service. A missed login breaks the user journey; a delayed notification does not.
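To make the error budget idea concrete, here is a minimal sketch that turns an availability target into allowed downtime. The service names, the targets (99.9% and 99%), and the 30-day window are illustrative assumptions, not numbers from a real system.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO permits over a window."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


# Hypothetical targets: the login path gets a tighter objective
# than the notification path.
SLO_TARGETS = {"auth": 0.999, "notifications": 0.99}

budgets = {name: error_budget_minutes(target) for name, target in SLO_TARGETS.items()}

for name, minutes in budgets.items():
    print(f"{name}: about {minutes:.0f} minutes of budget per 30 days")
```

Under these assumed targets, the authentication service gets roughly 43 minutes of budget per 30 days and the notification service roughly 432, which is why the two deserve different alert boundaries.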

Step 2 — Socialize shared language
Teams should deeply understand terms like error budget, acceptable downtime, and recovery time. Without common language, monitoring becomes inconsistent across services.

Step 3 — Prefer percentages, not raw counts
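“Error rate > 5%” is a sound alert condition. “Error count > 10” is not: ten errors can be background noise at peak traffic and a serious problem during a quiet hour. Percentages scale with traffic; raw counts don’t.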
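The difference is easy to see in a small sketch. The 5% threshold comes from the example above; the `min_requests` guard is my own assumption, added so that a handful of requests in a quiet period cannot produce a dramatic percentage on its own.

```python
def error_rate(errors: int, total: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if total <= 0:
        return 0.0
    return errors / total


def should_alert(errors: int, total: int,
                 threshold: float = 0.05, min_requests: int = 100) -> bool:
    """Alert on the rate, and only when there is enough traffic to trust it."""
    return total >= min_requests and error_rate(errors, total) > threshold


# 11 errors would trip a naive "count > 10" rule in both cases below,
# but only the second case is a real problem.
busy_hour = should_alert(errors=11, total=10_000)   # 0.11% of traffic
quiet_hour = should_alert(errors=11, total=150)     # about 7.3% of traffic
```

With these numbers, `busy_hour` is False and `quiet_hour` is True, even though the raw error count is identical.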

Step 4 — Avoid unnecessary alerts
Alert fatigue is very real. I’ve seen on-call engineers and escalation leads get woken up at midnight for alerts that were simply misconfigured.

Example: Firing an alert for “error increase in last 15 minutes” is misleading. What you want is a sustained error rate increase, not small pockets of failure due to natural downstream variance, throttling, or partner system hiccups.
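One way to express "sustained" is to require several consecutive evaluation intervals above the threshold before firing. The sketch below assumes a 5% threshold and three consecutive breaches; both values, and the class name, are hypothetical. The idea is similar in spirit to the `for` field in Prometheus alerting rules.

```python
from collections import deque


class SustainedRateAlert:
    """Fire only when the error rate stays above a threshold for N intervals in a row."""

    def __init__(self, threshold: float = 0.05, required_breaches: int = 3):
        self.threshold = threshold
        # Keeps only the most recent N breach flags; older ones fall off.
        self.recent = deque(maxlen=required_breaches)

    def observe(self, errors: int, total: int) -> bool:
        """Record one evaluation interval and return True if the alert should fire."""
        rate = errors / total if total > 0 else 0.0
        self.recent.append(rate > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


alert = SustainedRateAlert()

# Hypothetical (errors, total) samples, one per evaluation interval.
samples = [(2, 100), (9, 100), (3, 100), (8, 100), (9, 100), (7, 100)]
fired = [alert.observe(errors, total) for errors, total in samples]
```

The isolated spikes in the second and fourth intervals do not page anyone; only the final sample, which completes three breaches in a row, sets the last entry of `fired` to True.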

Modern Tooling

Modern observability platforms offer anomaly detection, pattern-based alerting, and AI-assisted insights. But the real value still comes from choosing an architecture that fits your system’s nature, scale, and cost boundaries.

What I Prefer

I like OpenTelemetry for tracing, Prometheus for metrics, and Elastic for logs. It’s a modern, clean, vendor-neutral setup — but cost remains the invisible architecture decision-maker.

Over the years, I’ve also used AppDynamics, Datadog, and Splunk. All are excellent tools if cost is not your top concern.

Cursor vs Other AI coding assistants

I will add my experience with these tools to this section...

Event-Driven Architecture

Why predictable asynchronous flows matter.