August 27th, 2014 by Chance
Join us for an evening of Spark demos and use cases!
Chance Coble will be presenting on the development of a proof-of-concept project to detect network problems for a telecommunications company. As the organization’s main business is to route customer calls, price efficiency is a critical concern, along with quality of service and fraud prevention. The company manages 3 billion call minutes per month for its customers, and must store historical metadata on all of these connections. Each of these business goals has a set of metrics that must be monitored on a continual basis, and rapid intervention is essential to maintain the health of the service for customers.
Please register for this class as soon as possible. Space is limited and filling up quickly, so don’t miss out!
During this session, Chance will show how he used Spark’s low latency to quickly load new information, in the form of the telecommunications company’s call records, from HDFS into in-memory structures. He will then demonstrate how to build pattern recognition on top of those data structures using Spark’s concise functional API, identifying network problems much faster than the company had previously been able to.
Chance will explore the use of Spark as a high performance workbench to facilitate streaming analytics on the call detail records, including:
- deployment of Spark to AWS,
- execution of code on the cloud cluster, and
- accessing cloud storage with Spark’s streaming engine.
The development of a simple anomaly detection technique in Scala will be reviewed, as well as applications to other types of business cases. For those considering adding Spark to their infrastructure, Chance will also highlight the business benefits of this approach.
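For readers who want a feel for what a simple anomaly detection technique can look like in Scala, here is a minimal sketch. It is not the method from the talk. It assumes a plain z-score rule over a per-interval metric such as dropped calls per minute, and the names `AnomalySketch`, `Sample`, and `anomalies` are hypothetical, chosen only for illustration. It uses ordinary Scala collections rather than Spark so that it runs on its own.

```scala
object AnomalySketch {
  // Hypothetical record: one metric value per time bucket
  // (for example, dropped calls per minute). Not from the talk.
  final case class Sample(bucket: Int, value: Double)

  /** Returns the samples whose value lies more than `threshold`
    * population standard deviations away from the mean of all samples.
    */
  def anomalies(samples: Seq[Sample], threshold: Double): Seq[Sample] = {
    if (samples.size < 2) Seq.empty
    else {
      val n = samples.size.toDouble
      val mean = samples.map(_.value).sum / n
      val variance = samples.map(s => math.pow(s.value - mean, 2)).sum / n
      val stdDev = math.sqrt(variance)
      // A constant series has no spread, so nothing can be flagged.
      if (stdDev == 0.0) Seq.empty
      else samples.filter(s => math.abs(s.value - mean) / stdDev > threshold)
    }
  }
}
```

Calling `AnomalySketch.anomalies(samples, 2.0)` on a series that hovers around 10 with a single spike to 60 would flag only the spike. A z-score rule is one of the simplest choices and is sensitive to the very outliers it is looking for, so treat it as a starting point rather than a production detector.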
Don’t miss out on learning about this real-life Spark analytics test case. Please sign up for the Meetup group and register for the event.
Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine-learning algorithms.
Until then, here are some resources on Spark to give you background: