Member of Technical Staff (Data Scientist, Evals)
Perplexity
- Posted 1 day ago
Job Description
Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.
RESPONSIBILITIES
- Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
- Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
- Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
- Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
- Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality

QUALIFICATIONS
- PhD or MS in a technical field or equivalent experience
- 4+ years of experience in data science or machine learning
- Strong proficiency in Python and SQL (expected to write production-grade code)
- Experience building within a modern cloud data stack, specifically AWS and Databricks
- Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

PREFERRED QUALIFICATIONS
- 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
- Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
- A strong research background, with experience applying research methods to real-world ML problems
- Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets
Job Information
- Company: Perplexity
- Location: London
- Job Type: Full-Time
- Work Location: Remote
- Experience Level: Senior
- Source: Ashby
- Status: Active
More from Perplexity
- Product Manager (Builder), San Francisco
- Internship - Search Machine Learning Engineer, London
- Member of Technical Staff (Software Engineer, Monetization), San Francisco
- IT Systems Administrator, San Francisco
- Member of Technical Staff (Data Scientist/Engineer, Online Metrics), New York City