Google is a leading technology company that develops next-generation technologies for billions of users. It is seeking a Software Engineer to design, develop, and enhance software for distributed debugging of numerical issues in large-scale production machine learning workloads.
Responsibilities
Design and build tooling that enables distributed debugging of numerical issues in large-scale production ML workloads
Identify sources of numerical instability in production ML workloads, and design and implement software changes that eliminate or mitigate the instability
Work with modeling teams to root-cause and fix numerical problems in their ML models
Improve understanding of numerics in ML models by analyzing and documenting compiler and hardware internals that affect numerics
Qualifications
Required
Bachelor's degree or equivalent practical experience
2 years of experience with software development in the C++ programming language
2 years of experience building developer tools (e.g., compilers, automated releases, code design and testing, test automation frameworks)
2 years of experience with developing large-scale machine learning infrastructure, distributed systems or networks
Preferred
Master's degree or PhD in Computer Science or related technical fields
2 years of experience with data structures and algorithms
Experience developing ML compilers or low-level debugging tools
Experience with machine learning frameworks such as JAX
Experience with numerical topics such as floating-point arithmetic, numerical stability, and error analysis
Benefits
Bonus
Equity
Benefits
Google specializes in internet-related services and products, including search, advertising, and software. It is a subsidiary of Alphabet.