Data Engineer: W2 Onsite Role
Information Technology | Jersey City, NJ | Contract
Note: This is a W2 role; C2C candidates cannot be considered.
Role: Data Engineer
Location: Jersey City, NJ (onsite)
Long term contract
About the Role
We are looking for a highly skilled Senior Data Engineer with strong expertise in PySpark, Python, SQL, and database technologies, along with exposure to Data Science and AI/ML techniques. The ideal candidate will design and optimize scalable data pipelines, collaborate with cross-functional teams, and contribute to the development of analytical and machine learning–driven solutions.
Key Responsibilities
Data Engineering & Pipeline Development
• Design, develop, and optimize large-scale ETL/ELT pipelines using PySpark and distributed data processing frameworks.
• Build high-performance data ingestion workflows from diverse structured and unstructured sources.
• Implement scalable data models, data marts, and warehousing solutions.
Programming & Database Expertise
• Write clean, modular, and optimized code using Python for data processing and automation.
• Develop complex SQL queries, stored procedures, and performance-tuned database operations.
• Work with relational and NoSQL databases (e.g., MySQL, PostgreSQL, SQL Server, MongoDB).
Data Science + AI/ML Collaboration
• Partner with Data Science teams to productionize ML models and enable ML-driven pipelines.
• Contribute to model deployment, feature engineering, and ML workflow optimization.
• Integrate ML models into scalable data platforms.
Architecture & Best Practices
• Ensure data quality, reliability, lineage, and governance across data workflows.
• Drive best practices in coding, testing, CI/CD, and cloud-based deployments.
• Work with cross‑functional teams to translate business requirements into robust data solutions.
Required Skills & Qualifications
• 5+ years of experience in Data Engineering with strong hands-on work in PySpark.
• Strong proficiency in Python, including libraries for data processing.
• Advanced knowledge of SQL and performance optimization techniques.
• Experience with distributed data systems (Spark, Databricks, Hive, or similar).
• Exposure to AI/ML workflows, including model deployment or MLOps.
• Solid understanding of data modeling, warehousing concepts, and ETL/ELT architectures.
Good to Have
• US Healthcare domain experience (HIPAA, claims data, EHR/EMR, HL7, FHIR).
• Experience with cloud platforms (Azure, AWS, GCP).
• Knowledge of MLflow, Airflow, or similar tools.
