Research Software Engineer, Generative AI Evaluations, Health AI

San Francisco, CA

Full-time

Onsite

$141K/yr - $202K/yr

Entry, Mid Level

Google specializes in internet-related services and products, including search, advertising, and software. It is seeking a Research Software Engineer focused on Generative AI Evaluations within Health AI to develop innovative solutions and evaluate health features, collaborating with cross-functional teams to improve AI models and health applications.

Responsibilities

Ensure robust and iterative evaluation of health features, establishing methods for continuous monitoring, feedback loops, and adaptation to new data or clinical insights.
Collaborate with cross-functional teams (e.g., Research, Engineering, Product, Clinical, Legal, Ethics) to define evaluation requirements, interpret results, and drive improvements in AI models and health applications.
Conduct in-depth analysis of AI agent behavior, identify failure modes, and provide actionable insights to improve model performance and safety.
Stay at the forefront of research in AI evaluation, particularly in agent-based systems, human-AI interaction, and the specific challenges of evaluating AI in consumer health and wellness.
Develop and advocate best practices, tools, and documentation for AI agent evaluation across health-related projects.

Qualifications

Required

Bachelor's degree or equivalent practical experience.
2 years of experience with data structures or algorithms.
Experience in AI/ML research and Generative AI agent development.
One or more scientific publication submissions to conferences (e.g., CVPR, ECCV, ICCV, NeurIPS, ICLR), journals, or public repositories.

Preferred

Master’s degree or PhD in Engineering, Computer Science, a related technical field, or equivalent practical experience.
4 years of experience in AI/ML research or agent development.
Experience leading evaluation efforts for complex AI systems, particularly AI agents or interactive AI.
Understanding of current AI agent architectures (e.g., LLM-based agents, reinforcement learning agents) and their evaluation issues.
Familiarity with the unique challenges of evaluating AI in health-related applications, such as safety and the need for clinical validation.
Excellent programming skills in Python.