GSK is a global biopharma company focused on getting ahead of disease through innovative medicines and vaccines. They are seeking an AIML Optimization Engineer to optimize and enhance their Compute and AIML platforms, ensuring high performance and scalability for scientific workflows.
Serve as a key engineer for the optimization team and contribute technical expertise to teams in closely aligned technical areas such as DevOps, Cloud and Infrastructure
Accountable for delivery of scalable solutions to the Compute and AIML Platforms that support the entire application lifecycle (interactive development and exploration/analysis, scalable batch processing, application deployment), with particular focus on performance at scale
Partner with both AIML and Compute platform teams as well as scientific users to help optimize and scale scientific workflows, drawing on a deep understanding of both the software and the underlying infrastructure (networking, storage, GPU architecture)
Participate in a scrum team and contribute technical expertise to teams in closely aligned technical areas
Qualifications
Required
Bachelor's, Master's or PhD degree in Computer Science, Software Engineering, or related discipline
1+ years of industry software engineering experience in AIML and MLOps
Experience using one interpreted and one compiled common industry programming language: e.g., Python, C/C++, Scala, Java, including toolchains for documentation, testing, and operations / observability
Preferred
Experience with application performance tuning and optimization, including parallel and distributed computing paradigms and communication libraries such as MPI, OpenMP, and Gloo, along with a deep understanding of the underlying systems (hardware, networks, storage) and their impact on application performance
Expertise in modern software development tools / ways of working (e.g. git/GitHub, DevOps tools, metrics / monitoring, …)
Cloud expertise (e.g., AWS, Google Cloud, Azure), including infrastructure-as-code tools (Terraform, Ansible, Packer, …) and scalable cloud compute technologies, such as Google Batch and Vertex AI
Understanding of AIML training optimization, including distributed multi-node training best practices and associated tools and libraries as well as hands-on practical experience in accelerating training jobs
Understanding of ML model deployment strategies, including agent systems and scalable LLM inference systems deployed in multi-GPU, multi-node environments
Experience with CI/CD implementations using git and a common CI/CD stack (e.g., Azure DevOps, CloudBuild, Jenkins, CircleCI, GitLab)
Experience with Docker, Kubernetes, and the larger CNCF ecosystem including experience with application deployment tools such as Helm
Experience with low-level application build tools (make, CMake) and understanding of optimization at the build and compile level
Demonstrated excellence with agile software development environments using tools like Jira and Confluence
Benefits
Annual bonus
Eligibility to participate in our share-based long-term incentive program
Health care and other insurance benefits (for employee and family)
Retirement benefits
Paid holidays
Vacation
Paid caregiver/parental and medical leave
We are uniting science, technology and talent to get ahead of disease together. Our community guidelines: https://GSK.to/socialmedia