The Data Engineering concentration builds on the curriculum of the MSDS program, focusing on a comprehensive theoretical and technical understanding of statistical analysis, machine learning, and the systems used to deploy and maintain data and models throughout their life cycle. The concentration extends the MSDS program by only one semester, for a total program time of 18 months. During the spring and summer, students complete the core MSDS curriculum, while the second-year fall semester is dedicated to Data Engineering concentration courses.
Students will:
- Possess a theoretical understanding of classical statistical models (e.g., generalized linear models and linear time series models), as well as the ability to apply those models effectively.
- Possess a theoretical understanding of machine learning techniques (e.g., random forests, neural networks, naive Bayes, and k-means), as well as the ability to apply those techniques effectively to data and maintain its life cycle.
- Effectively use modern programming languages (e.g., R, Python, and SQL), cloud computing technologies (e.g., AWS and GCP), and distributed systems (e.g., Hive, Spark, Hadoop, and Airflow) to scrape, clean, organize, query, summarize, visualize, and model large volumes and varieties of data.
- Be prepared for careers as data scientists and engineers by solving real-world, data-driven business problems with other data scientists and engineers in an ethical and responsible way.