Arthur Grand Technologies Inc is a staffing and recruiting company.
Site Reliability Engineer (SRE) (multiple openings)
Location: Plano, TX (onsite from day 1, or within 1 to 2 months)
Required skills/experience:
We are looking for candidates who lean toward software engineering (specifically containerization and microservices architecture on AWS) and who also have systems engineering experience implementing or building an observability layer focused on telemetry and monitoring. We are currently looking at the following stack:
• Loki for Logs
• Prometheus for Metrics
• Tempo for Tracing
• Grafana (with PromQL) for Visualization
Additional tools that would be very helpful:
• Splunk
• Dynatrace / AppDynamics
• Terraform
• Chef / Puppet / Ansible