Formally, an RDD is a read-only, partitioned collection of records. RDDs can be created through deterministic operations on either data in stable storage or other RDDs. An RDD is a fault-tolerant collection of elements that can be operated on in parallel. The Spark documentation defines an…
Spark-Scala the machine learning algorithm
Scala is an open source programming language. Martin Odersky began designing it in 2001. Another important event in Scala's history was the founding of Typesafe Inc. in May 2011 to provide commercial support for Scala…
Spark Use Cases in Finance Industry
Spark’s generalized abstraction and growing set of helper libraries mean that companies can use Spark for a wide range of purposes. Recommendations and other personalization based on big data are a major use case, seen at companies such as Yahoo, Comcast, Ooyala, Conviva, and Netflix. Another case is data…
Actions and different transformations in Spark
Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. There are two ways to create RDDs. An action is an operation that returns a value to the calling…
Large cluster of machines in a fault-tolerant
MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. The solution is to provide a distributed memory abstraction that lets programmers perform in-memory computations on a large cluster of machines…
Modern distributed computing problems
I recall spending more than a little time at university discussing the nuances and efficiency of various sorting algorithms. Those considerations seem largely irrelevant barely a decade later. Modern distributed computing problems need a programming abstraction that is better than MapReduce, as an…
Logical directed acyclic graph
In graph theory, a graph is a set of vertices connected by edges. In a directed graph, each edge goes only one way. When a client submits Spark application code, the driver implicitly converts the code containing transformations…
Dotnet core features top 10
Octopus Deploy provides the most value when it is used by your whole team. Developers and testers might be allowed to deploy specific projects to pre-production environments, but not production environments. Stakeholders might be permitted to view certain projects, but not modify or deploy them….