Requirements
- To follow along with the examples, you'll need a personal computer. The course is filmed using Windows 10, but the tools we install are available for Linux and macOS as well.
- We'll walk through installing the required software in the first lecture: the Scala IDE, Spark, and a JDK.
- My course "Taming Big Data with Apache Spark - Hands On!" would be a helpful introduction to Spark in general, but it is not required for this course. A quick introduction to Spark is included.
- The course includes a crash course in the Scala programming language if you're new to it; if you already know Scala, then great.
Description
"Big Data" analysis is a hot and highly valuable skill. Thing is, "big data" never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created - why wait for some nightly analysis to run when you can constantly update your analysis in real time? Whether it's clickstream data from a big website, sensor data from a massive "Internet of Things" deployment, financial data, or something else - Spark Streaming is a powerful technology for transforming and analyzing that data right when it is created.

You'll be learning from an ex-engineer and senior manager from Amazon and IMDb.

This course gets your hands on some real live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You'll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we'll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.

Across over 30 lectures and almost 6 hours of video content, you'll:
- Get a crash course in the Scala programming language
- Learn how Apache Spark operates on a cluster
- Set up discretized streams with Spark Streaming and transform them as data is received
- Analyze streaming data over sliding windows of time
- Maintain stateful information across streams of data
- Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis
- Dump streams of data in real time to NoSQL databases such as Cassandra
- Run SQL queries on streamed data in real time
- Train machine learning models in real time with streaming data, and use them to make predictions that keep getting better over time
- Package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon Elastic MapReduce

This course is very hands-on, filled with achievable activities and exercises to reinforce your learning.
By the end of this course, you'll be confidently creating Spark Streaming scripts in Scala, and be prepared to tackle massive streams of data in a whole new way. You'll be surprised at how easy Spark Streaming makes it!

Who is the target audience?
- Students with some prior programming or scripting ability SHOULD take this course.
- If you're working for a company with "big data" that is being generated continuously, or hope to work for one, this course is for you.
- Students with no prior software engineering or programming experience should seek an introductory programming course first.