Senior Data Engineer (Spark and GCP)
Company: Decision Minds, Inc.
Location: San Jose
Posted on: March 26, 2025
Job Description:
We are looking for
an experienced Senior Data Engineer with a strong background in
Google Cloud Platform (GCP) and Apache Spark to join our dynamic
team. You will be responsible for designing, building, and
optimizing scalable data pipelines, leveraging GCP services and
Spark to handle large-scale data processing and analytics. You will
play a key role in shaping the architecture of our data platform
and work closely with cross-functional teams to enable data-driven
decision-making.

Key Responsibilities:
- Design & Build Scalable Data Pipelines: Architect, build, and
optimize highly efficient data pipelines using Apache Spark on
Google Cloud Platform (GCP) services such as BigQuery, Dataflow,
Dataproc, and Pub/Sub.
- Data Processing & Transformation: Work with large volumes of
structured and unstructured data, developing data processing and
transformation workflows that support business intelligence and
analytics use cases.
- Collaborate with Cross-Functional Teams: Work closely with Data
Scientists, Business Intelligence teams, and Product teams to
understand business requirements and deliver scalable data
solutions.
- Big Data Engineering: Utilize Spark to process and analyze
large datasets in distributed computing environments, ensuring data
processing tasks are efficient and scalable.
- Optimize Performance & Cost Efficiency: Fine-tune the
performance of data workflows and reduce processing costs through
the effective use of GCP services and Spark performance
optimizations (e.g., partitioning, caching, memory
management).
- Cloud Infrastructure Management: Manage and optimize cloud
resources in GCP, ensuring high availability, scalability, and
reliability of data pipelines and processing jobs.
- ETL & Data Integration: Design and implement complex ETL
workflows, including data extraction, transformation, and loading
from multiple source systems into cloud-based data warehouses or
data lakes.
- Data Quality & Governance: Ensure data quality and consistency
across pipelines and adhere to data governance, security, and
privacy standards.
- Mentorship & Leadership: Provide technical leadership and
mentorship to junior data engineers and foster a culture of best
practices in data engineering.
- Monitoring & Troubleshooting: Implement monitoring solutions to
track pipeline performance, set up alerting for failures, and
troubleshoot any issues in the data processing workflows.
- Documentation & Reporting: Create detailed technical
documentation and reports to communicate data pipeline designs,
performance metrics, and optimizations to stakeholders.

Skills & Qualifications:
- Proven Experience: 5+ years of hands-on experience in data
engineering, with strong expertise in Google Cloud Platform (GCP)
and Apache Spark.
- GCP Services Expertise: Experience with GCP services such as
BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud
Composer, and Cloud Functions.
- Big Data Technologies: Proficiency in working with Apache Spark
(PySpark, Scala, or Java), Hadoop, and Kafka for building
distributed data processing pipelines.
- ETL Process Design: Expertise in designing and implementing
complex ETL workflows and understanding of data ingestion,
transformation, and storage.
- Programming Skills: Strong programming skills in Python, Scala,
or Java, with hands-on experience in big data frameworks (e.g.,
Apache Spark).
- SQL & NoSQL Databases: Expertise in SQL (BigQuery, PostgreSQL,
etc.) and knowledge of NoSQL databases (e.g., MongoDB,
Cassandra).
- Data Warehousing: Experience building and managing data
warehouses, especially using BigQuery or similar cloud-based
storage systems.
- Performance Optimization: Expertise in optimizing Spark jobs
and cloud-based data workflows for performance, scalability, and
cost efficiency.
- Cloud Infrastructure Management: Familiarity with cloud-native
DevOps practices, containerization (e.g., Docker), and CI/CD
pipelines.
- Data Governance & Security: Strong knowledge of data privacy,
governance, and security best practices in cloud environments.
- Version Control & Collaboration: Proficient in using version
control tools (e.g., Git) and agile development practices.
- Education: Bachelor's or Master's degree in Computer Science,
Engineering, Data Science, or a related field. Certifications in
GCP (e.g., Google Cloud Professional Data Engineer) are a
plus.

Preferred Qualifications:
- Real-Time Data Processing: Knowledge of real-time data
processing tools such as Apache Kafka or Google Pub/Sub.

Personal Attributes:
- Leadership: Strong leadership skills with a track record of
leading data engineering teams and driving initiatives that improve
data workflows.
- Problem-Solving: Excellent analytical and problem-solving
skills, particularly in distributed computing and large-scale data
processing.
- Collaboration: Effective communicator who can collaborate with
technical and non-technical stakeholders.
- Adaptability: Ability to thrive in a fast-paced, constantly
evolving environment and embrace new technologies.
- Mentorship: Passion for coaching and mentoring junior team
members to develop their technical skills.

Why Join Us:
- Innovative Work Environment: Join a team working with
cutting-edge technologies to build scalable data solutions.
- Career Growth: Opportunities to expand your expertise in GCP
and Spark, and work on exciting and complex data engineering
projects.
- Competitive Compensation: Attractive salary, benefits, and
opportunities for career advancement.