The Data Engineer is responsible for designing, building, and maintaining the infrastructure and systems required for collecting, storing, and processing large datasets efficiently.

**Education**: Bachelor's degree in Computer Science with 8+ years of experience.

**Experience**:
- Technical Skills
  - Programming Languages: Proficiency in Python, SQL, Java, or Scala for data manipulation and pipeline development.
  - Data Processing Frameworks: Experience with tools like Apache Spark, Hadoop, or Apache Kafka for large-scale data processing.
- Data Systems and Platforms
  - Databases: Knowledge of both relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra).
  - Data Warehousing: Experience with platforms like Snowflake, Amazon Redshift, and Azure Synapse.
  - Cloud Platforms: Familiarity with AWS and Azure for deploying and managing data pipelines; good experience with Microsoft Fabric is advantageous.
  - Experience working with distributed computing systems such as Hadoop HDFS, Hive, or Spark.
  - Managing and optimizing data lakes and delta lakes for structured and unstructured data.
- Data Modeling and Architecture
  - Expertise in designing efficient data models (e.g., star schema, snowflake schema) and maintaining data integrity.
  - Understanding of modern data architectures like Data Mesh or Lambda Architecture.
- Data Pipeline Development
  - Building and automating ETL/ELT pipelines for extracting data from diverse sources, transforming it, and loading it into target systems.
  - Monitoring and troubleshooting pipeline performance and failures.
- Workflow Orchestration
  - Hands-on experience with orchestration tools such as Azure Data Factory, AWS Glue, DMS, or Prefect to schedule and manage workflows.
- Version Control and CI/CD
  - Utilizing Git for version control and implementing CI/CD practices for data pipeline deployments.

**Key Skills**:
- Proficiency in programming languages such as Python, SQL, and optionally Scala or Java.
- Proficiency in data processing frameworks like Apache Spark and Hadoop is crucial for handling large-scale and real-time data.
- Expertise in ETL/ELT tools such as Azure Data Factory (ADF) and Fabric Data Pipelines is important for creating efficient and scalable data pipelines (see the sketch after this list).
- A solid understanding of database systems, including relational databases like MySQL and PostgreSQL, as well as NoSQL solutions such as MongoDB and Cassandra, is fundamental.
- Experience with cloud platforms such as AWS and Azure, and with cloud data services like S3, BigQuery, and Azure Data Factory, is highly valuable.
- Data modeling skills, including designing star or snowflake schemas, and knowledge of modern architectures like Lambda and Data Mesh, are critical for building scalable solutions.
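To illustrate the ETL/ELT pipeline work referenced above, here is a minimal PySpark sketch of an extract, transform, and load step. The bucket paths, column names, and partitioning scheme are illustrative assumptions only, not part of this role's actual environment.

```python
# Minimal PySpark ETL sketch: read raw CSV, apply simple transformations,
# and write the result as partitioned Parquet. Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: raw orders landed in a data lake folder (hypothetical path)
raw = spark.read.option("header", True).csv("s3://example-lake/raw/orders/")

# Transform: cast types, derive a partition column, drop obviously bad rows
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount").isNotNull())
)

# Load: write partitioned Parquet into the curated zone (hypothetical path)
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet("s3://example-lake/curated/orders/"))
```

The same pattern extends to ELT by landing the raw data first and pushing the transformation into the warehouse or lakehouse engine.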
**Role and Responsibilities**:
- Responsible for designing, developing, and maintaining data pipelines and infrastructure to support our data-driven decision-making processes.
- Design, build, and maintain data pipelines to extract, transform, and load data from various sources into our data warehouse and data lake.
- Proficient in Databricks: creating notebooks, working with catalogs and native SQL, creating clusters, parameterizing notebooks (a minimal sketch follows this list), and administering Databricks. Define security models and assign roles as per requirements.
- Responsible for creating data flows in Synapse Analytics: integrating external source systems, creating external tables and data flows, and creating data models. Schedule pipelines using jobs and triggers.
- Design and develop data pipelines using Fabric pipelines and Spark notebooks that access multiple data sources. Proficient in developing Databricks notebooks and in data optimization.
- Develop and implement data models to ensure data integrity and consistency. Manage and optimize data storage solutions, including databases and data warehouses.
- Develop and implement data quality checks and validation procedures to ensure data accuracy and reliability (a validation sketch also follows this list).
- Design and implement data infrastructure components, including data pipelines, data lakes, and data warehouses.
- Collaborate with data scientists, analysts, and other stakeholders to understand business requirements and translate them into technical solutions.
- Monitor Azure and Fabric data pipelines and Spark jobs, and work on fixes based on request priority.
- Responsible for data monitoring activities; good knowledge of reporting tools like Power BI and Tableau is required.
- Responsible for understanding client requirements and architecting solutions on both the Azure and AWS cloud platforms.
- Monitor and optimize data pipeline performance and scalability to ensure efficient data processing.
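As a sketch of the notebook parameterization mentioned above, the cell below uses Databricks widgets to pass run-time parameters into a notebook. It assumes it runs inside a Databricks workspace (where `dbutils` and `spark` are predefined), and the catalog, schema, and table names are hypothetical.

```python
# Parameterized Databricks notebook sketch. `dbutils` and `spark` exist only
# inside a Databricks runtime; widget and table names are illustrative.
dbutils.widgets.text("processing_date", "2024-01-01", "Processing date")
dbutils.widgets.text("target_catalog", "dev", "Target catalog")

processing_date = dbutils.widgets.get("processing_date")
target_catalog = dbutils.widgets.get("target_catalog")

# Use the parameters to scope the daily load (hypothetical catalog.schema.table)
df = spark.sql(
    f"SELECT * FROM {target_catalog}.sales.orders "
    f"WHERE order_date = '{processing_date}'"
)
df.write.mode("overwrite").saveAsTable(f"{target_catalog}.sales.orders_daily")
```

Passing parameters this way lets a job or pipeline trigger reuse one notebook across environments and dates instead of maintaining separate copies.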
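Similarly, a minimal sketch of the data quality checks and validation procedures could look like the following PySpark snippet. The input path, key column, and the 1% duplicate threshold are assumptions chosen for illustration.

```python
# Minimal data-quality check sketch: fail the run when basic expectations are
# violated. Path, key column, and thresholds are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_quality_checks").getOrCreate()
orders = spark.read.parquet("s3://example-lake/curated/orders/")

total = orders.count()
null_keys = orders.filter(F.col("order_id").isNull()).count()
duplicate_keys = total - orders.dropDuplicates(["order_id"]).count()

# Simple validation rules; a real pipeline might publish these as metrics
# and alerts rather than raising immediately.
if total == 0:
    raise ValueError("Quality check failed: curated orders dataset is empty")
if null_keys > 0:
    raise ValueError(f"Quality check failed: {null_keys} rows have a null order_id")
if duplicate_keys > 0.01 * total:
    raise ValueError(f"Quality check failed: {duplicate_keys} duplicate order_ids")
```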