Looking for a career?

Senior Data Engineer

We are committed to connecting you with top-tier employers who recognize the value of your military background. Don’t miss out on this chance to unlock your potential and embark on a fulfilling civilian career. Submit your information today. Stay tuned for exciting updates and get ready to take the next step towards a brighter future with Skilled Vets!

Senior Data Engineer:

The role involves contributing to the migration of the existing data platform to an on-prem Hadoop platform, and implementing standards, governance, and automation. As a senior member of the team, the role requires strong coordination and communication skills to work with key stakeholders, data scientists, and other team members.

Responsibilities:

  • Design and Develop Data Pipelines:
    • Architect and implement scalable and efficient ETL (Extract, Transform, Load) pipelines using PySpark (a minimal example sketch appears after this list).
    • Optimize data processing workflows to handle large-scale datasets.
  • Machine Learning Model Development:
    • Develop and train machine learning models using appropriate algorithms and frameworks.
    • Collaborate with data scientists to translate models into production-ready code.
  • MLOps Implementation:
    • Establish and maintain automated CI/CD pipelines for machine learning models.
    • Implement version control for data, models, and code using tools like DVC, MLflow, or similar (see the MLflow sketch after this list).
    • Monitor and automate the retraining of models as new data becomes available.
  • Data Quality and Governance:
    • Implement data validation, quality checks, and data governance best practices.
    • Ensure data lineage and documentation for reproducibility and compliance.
  • Performance Tuning:
    • Optimize PySpark jobs for performance, including tuning Spark configurations, optimizing shuffles, and managing memory.
    • Profile and debug PySpark applications to identify and resolve performance bottlenecks.
  • Integration with Cloud Platforms:
    • Deploy and manage data pipelines and machine learning models on cloud platforms (e.g., AWS).
    • Utilize cloud-native services for data storage, processing, and orchestration.
  • Collaboration and Communication:
    • Work closely with data scientists, software engineers, and DevOps teams to integrate machine learning models into the broader software infrastructure.
    • Collaborate with business stakeholders to understand requirements and ensure the successful deployment of machine learning solutions.
    • Communicate technical concepts and project status to non-technical stakeholders effectively.
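
As an illustration of the pipeline work described above, here is a minimal PySpark ETL sketch. It is not part of the job requirements; the paths, column names, and the shuffle-partition setting are assumptions chosen only for the example.

```python
# A minimal PySpark ETL sketch: read raw CSV, apply simple quality checks,
# and write partitioned Parquet. All paths, column names, and the
# shuffle-partition value are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("etl-example")
    # Example of a tuning knob; the right value depends on data volume and cluster size.
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

# Extract: load raw events from a hypothetical landing zone.
raw = spark.read.option("header", True).csv("/data/raw/events/")

# Transform: deduplicate, drop records that fail a basic quality check,
# and derive a partition column.
clean = (
    raw.dropDuplicates(["event_id"])
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write curated data partitioned by date.
clean.write.mode("overwrite").partitionBy("event_date").parquet("/data/curated/events/")
```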
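
Similarly, a minimal MLflow sketch of the experiment and model versioning mentioned under MLOps. The experiment name, synthetic dataset, and model choice are assumptions for illustration, not details from the posting.

```python
# A minimal MLflow tracking sketch: log parameters, a metric, and the trained
# model so each (re)training run is versioned and reproducible.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("example-model")  # hypothetical experiment name

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, "model")  # the model artifact is versioned per run
```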

Qualifications:

  • 8+ years of experience in the engineering field.
  • Strong experience with PySpark and on-prem Hadoop.
  • Experience with MLOps tools like DVC, MLflow, or similar.
  • Experience with Jupyter, DataRobot, or similar tools.
  • Experience with AWS SageMaker.

Pay rate is $120K; the position is remote.