Telemetry and Stability Tooling

G-Xchange Inc.
Full-time
On-site
15F W City Center Philippines

Do you want to take the first step in making Filipinos’ lives better every day? Here at GCash, we want to stay at the forefront of the FinTech industry by creating innovative, meaningful, and convenient financial solutions for the nation! G ka ba? Join the G Nation today!

  • Telemetry and Tools Engineering

    • Design and implement telemetry instrumentation across distributed applications.

    • Design, develop, and maintain custom tools, scripts, and services that support operations.

    • Integrate with observability platforms (e.g., Datadog, Prometheus, Grafana, OpenTelemetry).

    • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.

    • Develop real-time dashboards and alerting frameworks aligned with operational and business KPIs.

    • Standardize telemetry across teams with reusable patterns, naming conventions, and tagging schemas.

    • Instrument services for logs, metrics, and traces.

    • Deploy and configure telemetry agents, collectors, exporters, dashboards, and alerting rules.

  • Stability Analysis

    • Analyze incident trends, error budgets, and system health to identify high-risk areas.

    • Work closely with system owners, business stakeholders, and SRE, QA, and Dev teams to perform post-incident reviews and root cause analysis.

    • Create and maintain stability scorecards and application health reports.

    • Recommend and track stability engineering tasks such as retries, fallbacks, timeouts, and circuit breakers.

    • Use telemetry data for:

      • Operational monitoring and performance tuning

      • Incident detection and troubleshooting

      • Root cause analysis and forensics

      • Regulatory compliance reporting

      • Internal and external audits

      • Capacity planning and scaling decisions, using trends and baselines derived from telemetry to proactively recommend infrastructure or service adjustments

  • Collaboration and Governance

    • Work with teams to integrate telemetry into CI/CD pipelines and deployment gates.

    • Support governance initiatives by enforcing monitoring and alerting compliance.

    • Drive adoption of telemetry standards across product and infrastructure teams.

    • Participate in architectural reviews to ensure observability and stability are considered early.

  • Continuous Improvement

    • Propose improvements to telemetry architecture, tooling, and data ingestion pipelines.

    • Advocate for automation in alert noise reduction, incident tagging, and anomaly detection.

    • Benchmark system reliability and provide recommendations for platform-level enhancements.

    • Maintain the Telemetry Standards and Threshold Repository (TSTR) document.

What We Offer

  • Opportunity for career growth and development in the #1 FinTech company in the country

  • Working with a dynamic and highly collaborative team that wants to change the game

  • A company that values its people, with a highly competitive and flexible compensation and benefits package