RJRP now shows Market-Observed Roles alongside verified postings, scored by our Hiring Activity algorithm.
Market-Observed Role: Likely Active (65-79)
This role was detected through Perplexity's hiring system and hasn't been verified directly by the employer. Our algorithm scored it as Likely Active (65-79) based on freshness, specificity, and company patterns.

Member of Technical Staff (Data Scientist, Evals)

Perplexity
Hiring Activity Score: 79, Likely Active (65-79)
  • Base score
  • Posted 1 day ago
  • Has location and a quality description (2,249 characters)
  • 29 new listings in 30d (×0.98 age 1d)
  • 3 skills
  • High confidence (90%)
  • Direct ATS (Ashby)
How the Hiring Activity Score works →
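
As a rough illustration of how a numeric score maps to the band shown above, here is a minimal Python sketch. Only the "Likely Active (65-79)" range and the 0-to-100 scale come from this page; the function name `band_label` and the two placeholder labels for scores outside that range are hypothetical, and the way RJRP actually weights the listed signals is not shown here.

```python
# Bounds copied from the page's "Likely Active (65-79)" band; scores are out of 100.
LIKELY_ACTIVE_MIN = 65
LIKELY_ACTIVE_MAX = 79
MAX_SCORE = 100


def band_label(score: int) -> str:
    """Return a band label for a Hiring Activity Score.

    Only the Likely Active band is documented on the page. The other two
    return values are hypothetical placeholders, not RJRP's own labels.
    """
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"score must be between 0 and {MAX_SCORE}, got {score}")
    if score > LIKELY_ACTIVE_MAX:
        return "above Likely Active"  # hypothetical placeholder
    if score >= LIKELY_ACTIVE_MIN:
        return "Likely Active (65-79)"
    return "below Likely Active"  # hypothetical placeholder


# The score shown for this listing sits at the top of the band.
print(band_label(79))
```

A score of 79 is the highest value that still falls in this band, so one more point would move the listing into whatever band sits above it.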
  • Location: London
  • First seen: 1 day, 4 hours ago
  • Last seen: 4 hours, 45 minutes ago
  • Source: Ashby

Job Description

Perplexity serves tens of millions of users daily with reliable, high-quality answers grounded in an LLM-first search engine and our specialized data sources. We aim to use the latest models as they are released, but the intelligence frontier is a jagged one, and popular benchmarks do not effectively cover our use cases. In this role, you will build specialized evals to improve answer quality across Perplexity, covering search-based LLM answers and other scenarios popular with our users.

RESPONSIBILITIES
- Architect and maintain automated evaluation pipelines to assess answer quality across Perplexity's products, ensuring high standards for accuracy and helpfulness
- Design evaluation sets and methods specifically to measure the impact of tool calls (particularly web search retrieval) on the final answer's quality
- Develop VLM-based solutions to programmatically evaluate how final answers render visually across different platforms and devices
- Continuously review public benchmarks and academic evaluations for their applicability to the Perplexity product, adapting and incorporating them into our regular performance measurements
- Operate within a small, high-impact team where your evaluation metrics directly shape product changes, collaborating closely with technical leadership to measure and improve Answer Quality

QUALIFICATIONS
- PhD or MS in a technical field or equivalent experience
- 4+ years of experience in data science or machine learning
- Strong proficiency in Python and SQL (expected to write production-grade code)
- Experience building within a modern cloud data stack, specifically AWS and Databricks
- Comfortable with agentic coding workflows and using AI-assisted development tools to iterate faster

PREFERRED QUALIFICATIONS
- 1+ years of experience working with LLMs at scale, specifically with LLM-as-a-judge setups
- Prior experience working on customer-facing web products or consumer apps, with real user traffic at scale
- A strong research background, with experience applying research methods to real-world ML problems
- Experience defining evaluation metrics (e.g., factual consistency, hallucination rate, retrieval precision) and building ground truth datasets

Skills

SQL, Python, AWS
Job Information
  • Company:
    Perplexity
  • Location:
    London
  • Job Type:
    Full-Time
  • Work Location:
    Remote
  • Experience Level:
    Senior
  • Source:
    Ashby
  • Status:
    Active
Activity Score: 79/100, Likely Active

Higher scores indicate more likely active hiring based on listing freshness, company activity, and other signals.


We now show two types of job listings

Same commitment to real jobs. More opportunities for you. Here's how it works.

Employer-Verified Posts (✓ Verified)

These jobs were posted directly to RJRP by the employer. The company has been verified through our multi-step process. This is our gold standard — the employer is real, the job is real, and you can apply with confidence.

✓ 100% employer verified
Market-Observed Roles (🔍 Observed)

These roles were detected through employer hiring systems like Workday. They haven't been verified by the employer directly, so we score each one using our Hiring Activity Score — an algorithm that analyzes freshness, specificity, company hiring patterns, and more to estimate whether the role is actively being filled.

📊 Only high-scoring listings are shown

Our promise hasn't changed. We will never show you a listing we can't stand behind. Market-observed roles must pass our scoring threshold before they appear on RJRP. Anything that looks like a ghost job, a talent pipeline, or a dead listing gets filtered out — you'll never see it.
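
The display rule described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not RJRP's implementation: the `Listing` type, the `visible_listings` function, and the example threshold of 65 are all hypothetical, since the page does not state the actual cutoff. The sketch also assumes employer-verified posts are always shown, which the page implies but does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    """Hypothetical listing record; field names are illustrative only."""
    title: str
    verified: bool                 # True for employer-verified posts
    score: Optional[int] = None    # Hiring Activity Score for observed roles


def visible_listings(listings: List[Listing], threshold: int) -> List[Listing]:
    """Keep verified posts, plus observed roles scoring at or above the threshold."""
    return [
        item
        for item in listings
        if item.verified or (item.score is not None and item.score >= threshold)
    ]


# Example data; the threshold of 65 is an assumed value, not one stated by RJRP.
sample = [
    Listing("Verified role", verified=True),
    Listing("Observed role, high score", verified=False, score=79),
    Listing("Observed role, low score", verified=False, score=40),
]
shown = visible_listings(sample, threshold=65)
print([item.title for item in shown])
```

Passing the threshold as a parameter keeps the sketch from baking in a cutoff that the page never specifies.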