Platforms Infrastructure Software Engineer

TS/SCI clearance required; polygraph is a plus.

Hybrid with 2-3 days of flexibility per week

Some examples of initiatives you might be a part of include:

  • Re-engineer the deployment strategy to support air-gapped systems.
  • Design solutions for stress testing and benchmarking candidate versions ahead of customer release.
  • Create internal tools to track code quality metrics, static analysis results, and potential vulnerabilities.

Key Responsibilities:

  • End-to-End Platform Design: Lead the design and development of highly reliable and scalable hosting platforms across both public and private cloud environments.
  • Kubernetes Environments: Deploy, manage, and scale Kubernetes clusters to ensure seamless orchestration of containerized applications and services.
  • Infrastructure & Performance: Ensure our infrastructure delivers services with high availability and performance that our customers depend on, addressing system bottlenecks and implementing optimization solutions.
  • Monitoring & Alerting: Implement comprehensive monitoring and alerting solutions to manage system health and ensure smooth operation at scale.
  • Site Reliability Engineering (SRE) Practices: Participate in and promote a culture of SRE best practices, including defining and refining service-level objectives (SLOs).
  • Stability & Scalability: Work closely with cross-functional teams to drive optimization efforts and ensure system reliability across the full stack as we scale.
  • Incident Response: Lead troubleshooting efforts, conduct root cause analyses, and develop preventive measures to enhance system reliability.
  • Development Influence: Use performance data and SLOs to influence development roadmaps and ensure alignment with long-term goals.
  • Collaboration: Collaborate across engineering teams to ensure infrastructure supports both feature development and scaling needs effectively.

Qualifications:

  • Experience: 5+ years of experience in software engineering, with a strong focus on system stability, performance optimization, and infrastructure management.
  • Technical Expertise: Proficiency in C++ or Go is preferred, along with familiarity with cloud platforms such as GCP or AWS. Familiarity with Kubernetes is a plus.
  • Hands-On Experience: Hands-on experience with performance analysis, large-scale systems, data analysis and visualization tools, or debugging.
  • Monitoring Tools: Experience with monitoring and alerting tools such as Prometheus, Grafana or related tools.
  • Adaptability: Proven ability to thrive in a fast-paced, high-growth environment. Comfortable with evolving requirements.
  • Problem-Solving: Strong analytical and problem-solving skills, with a track record of effectively diagnosing and resolving complex issues.
  • Communication: Excellent verbal and written communication skills, with the ability to convey technical concepts to diverse audiences.
