Product Reliability Engineer - Defense
Palantir
- Posted 1 day ago
Job Description
A World-Changing Company
Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role
Product Reliability Engineers (PREs) are responsible for the health, performance, and stability of the services that power services at Palantir.

PREs take ownership over the entire end-to-end cycle of service reliability, from responding to outages to improving codebases and building lasting solutions. You will tackle critical issues for key customers, introduce observability into complex systems, address tech debt in essential codebases, and inform strategic investments in core products. We are looking for engineers who enjoy deep-dive troubleshooting, feel strong ownership over the problems they encounter, and recognize the urgency of customer-facing outages.

PREs spend the majority of their time on forward-looking product work, including but not limited to, infrastructure migrations, product contributions to improve stability and observability, and codebase enhancements that increase resilience. During periodic on-call shifts, we respond to automated alerts, investigate issues reported by customers, and share technical expertise with adjacent product teams. Whatever the technical issue or question about your service is, you'll play a central and critical role in resolving it, seeking not just a one-time fix, but a permanent solution.

We provide new team members with an experienced mentor and a clear onboarding framework to set them up for success in the role.

Core Responsibilities
- Continuously invest in documentation, metrics, monitors and other troubleshooting tools
- Participate in on-call rotations during business hours and occasional weekends. This is a challenging yet rewarding opportunity to help remediate the most pressing issues across the Palantir fleet.
- Diagnose, resolve, and prevent issues encountered in the field. Deliver end-to-end improvements to core products based on these issues you encounter in the field.
- Improve observability by refactoring codepaths and introducing telemetry
- Identify and implement data-driven opportunities for improved service resilience
- Develop strategic opinions on stability investments and inform the vision for long-term product stability

What We Value
- Comfortable with and curious about large scale production systems and technologies. For example, load balancing, monitoring, distributed systems, and configuration management.
- Confidence in troubleshooting complex issues independently using observability tools and stack traces
- Familiarity with monitoring tools such as Prometheus and health checks
- Experience coding with Java, Go and/or web technologies (e.g. HTML, CSS, JavaScript, Python/Ruby, Django/Flask/Ruby on Rails, etc.) is a plus
- Track record of identifying bugs in codebases and contributing fixes leading to long term service stability
- Demonstrated ability making data-driven decisions and engaging with stakeholders on strategy

What We Require
- Engineering background in Computer Science, Mathematics, Software Engineering, Physics or similar field
- Ability to work with a high degree of ownership and a strong sense of urgency in a dynamic environment
- Experience producing code in backend languages such as Java, as part of a past role or personal projects
- Familiarity with storage and data processing systems and cloud infrastructure
- Strong written and verbal communication and ability to iterate quickly with teammates and incorporate feedback
- Eligibility and willingness to obtain a US Security clearance
Job Information
- Company: Palantir
- Location: New York, NY
- Job Type: Full-Time
- Experience Level: Mid
- Source: Lever
- Status: Active
More from Palantir
- International Payroll Analyst (London, United Kingdom)
- Incident Management Engineer (London, United Kingdom)
- Year at Palantir - Software Engineer, Internship (New York, NY)
- Year at Palantir - Forward Deployed Software Engineer, Internship - USG (New York, NY)
- Year at Palantir - Forward Deployed Software Engineer, Internship - Commercial (New York, NY)