TS/SCI clearance required; polygraph is a plus.

Hybrid, with 2-3 flexible days per week.

Examples of initiatives you might be part of include:

Re-engineering the deployment strategy to support air-gapped systems.
Designing solutions for stress testing and benchmarking candidate versions ahead of customer release.
Creating internal tools to track code quality metrics, static analysis results, and potential vulnerabilities.

Key Responsibilities:

End-to-End Platform Design: Lead the design and development of highly reliable and scalable hosting platforms across both public and private cloud environments.
Kubernetes Environments: Deploy, manage, and scale Kubernetes clusters to ensure seamless orchestration of containerized applications and services.
Infrastructure & Performance: Ensure our infrastructure delivers the high availability and performance our customers depend on, identifying system bottlenecks and implementing optimizations.
Monitoring & Alerting: Implement comprehensive monitoring and alerting solutions to manage system health and ensure smooth operation at scale.
Site Reliability Engineering (SRE) Practices: Participate in and promote a culture of SRE best practices, including defining and refining service-level objectives (SLOs).
Stability & Scalability: Work closely with cross-functional teams to drive optimization efforts and ensure system reliability across the full stack as we scale.
Incident Response: Lead troubleshooting efforts, conduct root cause analyses, and develop preventive measures to enhance system reliability.
Development Influence: Use performance data and SLOs to influence development roadmaps and ensure alignment with long-term goals.
Collaboration: Work across engineering teams to ensure infrastructure effectively supports both feature development and scaling needs.

Qualifications:

Experience: 5+ years of experience in software engineering, with a strong focus on system stability, performance optimization, and infrastructure management.
Technical Expertise: Proficiency in C++ or Go is preferred, along with familiarity with cloud platforms such as GCP or AWS. Kubernetes experience is a plus.
Hands-on Experience: Practical experience with performance and large-scale systems data analysis, visualization tools, or debugging.
Monitoring Tools: Experience with monitoring and alerting tools such as Prometheus, Grafana, or similar.
Adaptability: Proven ability to thrive in a fast-paced, high-growth environment and comfort with evolving requirements.
Problem-Solving: Strong analytical and problem-solving skills, with a track record of effectively diagnosing and resolving complex issues.
Communication: Excellent verbal and written communication skills, with the ability to convey technical concepts to diverse audiences.

Platforms Infrastructure Software Engineer