
Apache Spark. Apache Flink. ML in production

Tue, Thu 19:00 - 22:00

MegaFon office,
41 Oruzheyniy Lane, Moscow
February 11 - March 17, 2020


Language of new opportunities
Both for data scientists and data engineers
Knowing Scala, you will write faster and more stable Apache Spark applications. You won't have to wait for new features to appear in the Python API, and you won't be intimidated by the amount of code written in Java. You will be able to work with Apache Flink and Akka and write production ML code, including with libraries such as XGBoost4j and Deeplearning4j.
What is included in the program
5 labs
Every week you will complete a lab task and an advanced one.
1000 lines of code
Written during seminars and while completing the lab tasks and advanced ones.
10 lessons
With practitioner instructors who have strong experience in data analysis with Scala.
We developed the program for
Data scientists
Already analyzing data with Python? During our program, you will learn how to write production code in Scala and get access to more Spark functionality through the Scala API.
Data engineers
Can you retrieve, process, and load data using Python or Java? Now learn how to do this with Spark, Flink, and Kafka through their Scala APIs.
Already have Scala programming experience? We will teach you to use Scala to analyze data, so you can change your career path and move into an adjacent, more promising field.
What you will learn
Our program has three components
Learn how to use Scala in both the functional and object-oriented paradigms. Learn to use higher-order functions, partial functions, currying, collections, and more.
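A minimal sketch of the language features listed above, with made-up names for illustration:

```scala
object ScalaBasics {
  // Higher-order function: takes another function as an argument
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried function: arguments supplied in separate parameter lists
  def add(a: Int)(b: Int): Int = a + b
  val addTen: Int => Int = add(10) // partially applied

  // Partial function: defined only on part of its input domain
  val reciprocal: PartialFunction[Int, Double] = {
    case n if n != 0 => 1.0 / n
  }

  // Collection pipeline built from higher-order functions
  val squaresOfEvens: Seq[Int] = (1 to 6).filter(_ % 2 == 0).map(n => n * n)
}
```

For example, `List(2, 0, 4).collect(ScalaBasics.reciprocal)` skips the zero and yields `List(0.5, 0.25)` — `collect` applies the partial function only where it is defined.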
Learn to process data using RDDs, DataFrames, and Datasets. Write ETL jobs in Scala, build machine learning models, optimize their hyperparameters, and create applications for near real-time processing.
Learn to use Apache Flink for true real-time processing. Work with machine learning libraries such as XGBoost4j and Deeplearning4j, which are well suited for production environments.
An introductory lab to learn Scala syntax and principles. Here you will implement a non-personalized recommender system: computing the top-rated films.
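The core of such a non-personalized recommender is a plain aggregation: group ratings by film, average them, and take the best. A sketch with Scala collections (in the lab itself the same groupBy/average logic runs at scale on Spark; the function name and signature below are illustrative, not the lab's actual API):

```scala
object TopFilms {
  // ratings: (filmId, rating) pairs, e.g. parsed from a ratings file
  def topN(ratings: Seq[(String, Double)], n: Int): Seq[(String, Double)] =
    ratings
      .groupBy(_._1)                             // group ratings per film
      .map { case (film, rs) =>
        film -> rs.map(_._2).sum / rs.size       // average rating per film
      }
      .toSeq
      .sortBy(-_._2)                             // best first
      .take(n)
}
```

Real systems usually also require a minimum number of votes per film, so a single 5-star rating doesn't outrank a widely watched title.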

In this lab, you will compute the similarity between the descriptions of various online courses. This will form the basis of another recommender system. You will work with DataFrames and Datasets in Spark.
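A common choice for text similarity is cosine similarity between term-frequency vectors. A plain-Scala sketch of that measure (the lab computes it over Spark DataFrames, typically with TF-IDF features rather than the raw counts used here):

```scala
object CourseSimilarity {
  // Bag-of-words term frequencies for one course description
  def termFreq(text: String): Map[String, Int] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty)
      .groupBy(identity).map { case (w, ws) => w -> ws.length }

  // Cosine similarity between two sparse word-count vectors
  def cosine(a: Map[String, Int], b: Map[String, Int]): Double = {
    val dot = a.keySet.intersect(b.keySet).toSeq
      .map(w => a(w).toDouble * b(w)).sum
    val norm = (v: Map[String, Int]) =>
      math.sqrt(v.values.map(x => x.toDouble * x).sum)
    if (dot == 0) 0.0 else dot / (norm(a) * norm(b))
  }
}
```

Identical descriptions score 1.0; descriptions with no words in common score 0.0, and everything else falls in between.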
You will solve a classification task: predicting whether a client will watch a particular movie based on what they watch on TV. To solve it, you will use the Spark ML library.

You will receive website user events from Kafka. Using Spark Streaming, you will predict each user's gender and age category.

Using data on the behavior of a bank's customers, you will predict whether a given client will leave within the next three months. The model is to be built with XGBoost4j, which is well suited for production.
Our instructors are industry practitioners who can explain complex things in simple words
Andrey Titov
Senior Spark Engineer, NVIDIA
Egor Mateshuk
Head of Analytics, Data Science and Data Engineering Department, MaximaTelecom
Dmitry Bugaychenko
Program infrastructure
What you will be working with every day
Our program is about big data, so you will work with a Hadoop cluster that we administer, configure, and support.
All the presentations, Jupyter notebooks, labs, and manuals are uploaded to a private repository on GitHub. This tool has become the standard among programmers and data professionals.
Our portal
Here you can verify your lab solutions with automatic checkers, and also watch live broadcasts and recordings of previous classes.
All communication during the program happens in Slack, a convenient messenger for teams. There you can ask questions during live broadcasts, communicate with instructors, organizers, and each other, follow GitHub updates, and stay informed of program news.
You need to know
Program prerequisites
Python 3 or Java
If you know how to analyze data using one of these programming languages, we can teach you how to do the same in Scala.
Basic Linux knowledge
You'll spend some time in the Linux command line working with the cluster. It's great if you are already familiar with navigating directories, creating and editing files, and connecting to a remote server via SSH.
SQL
On the program you will use Apache Spark. To work with it, you may need the ability to write SQL queries: selects, joins, filters, subqueries.
Statistics and linear algebra
During the program we will cover machine learning algorithms and their implementations in various libraries that have a Scala API. It's good if you know the basics of statistics and linear algebra: mean, variance, probability, Bayes' theorem, correlation, matrix rank.
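As a quick self-check of those basics, here are mean, variance, and Pearson correlation in a few lines of Scala (an illustrative helper, not part of the course materials):

```scala
object StatsCheck {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Population variance: average squared deviation from the mean
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  // Pearson correlation of two equally sized samples
  def correlation(xs: Seq[Double], ys: Seq[Double]): Double = {
    val (mx, my) = (mean(xs), mean(ys))
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum / xs.size
    cov / math.sqrt(variance(xs) * variance(ys))
  }
}
```

If computing the correlation of a sample with a scaled copy of itself (which should be exactly 1.0) feels routine, you are ready for this prerequisite.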
Where our alumni work
Here they live and work

Our principles in teaching
To make learning effective and interesting, we apply andragogy, the principles of adult learning
The material is focused on specific tasks
Our goal is to teach you to solve real-life problems, not just cover a list of topics. Theory is only a tool for solving problems, not a goal in itself.
The ability to apply new knowledge immediately
After the first week, you will learn how to deploy your own Hadoop cluster in the cloud and will be able to use this knowledge for a pilot project at work.
Autonomy in lab tasks

Our lab tasks are designed so that you often need to search for answers yourself. After the program, you will have your own collection of quality resources for tackling different tasks.
We will be happy to answer your questions
Please leave your question below
I have read and accept your Privacy Policy.