Senior Software Engineer - Infrastructure
Company: Tbwa Chiat/Day Inc
Location: Redwood City
Posted on: January 26, 2025
Job Description:
Hybrid / San Francisco, CA or Redwood City, CA

We're on a mission to democratize AI by
building the definitive AI data development platform. The AI
landscape has gone through incredible change between 2016, when
Snorkel started as a research project in the Stanford AI Lab, and
the generative AI breakthroughs of today. But one thing has
remained constant: the data you use to build AI is the key to
achieving differentiation, high performance, and production-ready
systems. We work with some of the world's largest organizations to
empower scientists, engineers, financial experts, product creators,
journalists, and more to build custom AI with their data faster
than ever before. Excited to help us redefine how AI is built?
Apply to be the newest Snorkeler!

As a Senior Software Engineer on
the Infrastructure team, you'll accelerate the Snorkel AI team and
our customers by improving our developer platform and services for
user and data management across the stack. You'll work closely with
other engineers, researchers, and product management to align on
the highest leverage improvements for CI/CD, cloud infrastructure,
deployment, security, authentication/authorization, and more.

Main Responsibilities
- Design, build, and maintain services and deployment for
Snorkel's enterprise platforms
- Design, build, and improve observability and alerting for
Snorkel's enterprise platforms
- Contribute to Snorkel's in-house deployment management software,
which handles installation and upgrades of the various deployments
for Snorkel's enterprise customers
- Build and maintain Snorkel's production and staging
infrastructure; own our k8s and cloud strategy
- Work closely with various engineering teams to define test
strategies, and build the infrastructure to execute them
- Deploy and optimize CI/CD pipelines across multiple
environments and continuously improve development and deployment
best practices
- Collaborate with enterprise customers to understand product use
cases, translate them into engineering specifications, and deliver
high-quality solutions
- Participate in on-call rotations, post-incident reviews, and
other operational duties to ensure service delivery quality

Minimum Qualifications
- Bachelor's degree in Computer Science or related field, or
equivalent demonstrated experience
- Strong development and debugging skills in Python
- 5+ years of software development experience in distributed
systems and cloud-native applications
- Strong experience with cloud platforms and infrastructure as
code (Terraform, CloudFormation, Helm)
- Practical experience with Docker containerization and
clustering (Kubernetes/EKS/GKE)
- Proficiency in diagnosing and resolving code and system health
issues, and in software test engineering
- Strong communication and coding skills
- Consistently follows software engineering best practices and
holds a high bar for the team by leading design, code, and test
plan reviews

Preferred Skills
- Extremely well versed in building and managing cloud
infrastructure for enterprise platforms on AWS, GCP, or Azure,
including services such as EC2, EKS, and VPC
- Experience with one or more build tools such as Bazel,
Gradle, or Make; extra points for hands-on experience building
and managing large codebases with these tools
- Designed and implemented developer-friendly APIs or tools to
boost developer productivity
- Familiarity with the deployment, monitoring, and maintenance of
large-scale enterprise software products
- Familiarity with developing and releasing infrastructure software
for SaaS and on-prem platforms
- [Nice to have]: Hands-on experience setting up and operating
Kubernetes clusters at scale
- [Nice to have]: Experience with large-scale distributed
computing systems for ML training or serving, e.g., Ray, Spark,
TensorFlow
- [Nice to have]: Hands-on experience creating and maintaining
metrics and dashboards on observability platforms such as New
Relic, Datadog, Chronosphere, or similar tools
- [Nice to have]: Experience building services and infrastructure
for machine learning and AI systems
- [Nice to have]: Experience in cloud networking, security, and
service meshes such as Istio

Be Your Best At Snorkel

Snorkel AI is on a
mission to make machine learning practical for everyone, and it
starts with building a team that welcomes, represents and gives
opportunity to all. We work at the frontier of AI and software
engineering, and believe that underrepresented communities need to
play a part in shaping the future of these fields. At Snorkel AI,
we actively work to create an environment that values end-to-end
ownership, diverse forms of impact, and opportunities for personal
growth.

Snorkelers are supported by an amazing team and an amazing
set of benefits. For full-time employees, we offer comprehensive
medical, dental, and vision plans for Snorkelers and their
families, plus a yearly wellness stipend. Our 401(k) program lets
Snorkelers plan for their future, and our parental leave program
lets new parents take up to 20 weeks of paid time off. Learn more
about these and other benefits, like our workstation setup
allowance, on our Careers page.