# Graph Connected Component Analysis with Bulk-Synchronous-Parallel Processing

In an undirected graph, a connected component is a subgraph in which any two vertices are connected to each other by paths. Usual algorithms build a graph where each vertex is labeled with its component number. They use a search from an unclassified vertex to build the connected component containing this vertex. The process repeats until all vertices are labeled with a component. Vertices that are not connected form a component with only one vertex.

In an edge-centric BSP implementation, like GraphX, finding all the connected components is done in parallel using Pregel iterations:

```// working copy of the input graph, set label of vertices to their vertex Id
val cGraph : Graph[ Int, ED ] = graph.mapVertices(
(vid, attr) => vid )

cGraph.pregel[ Long ](initialMsg = Long.MaxValue) (
// vertex program
(vid : VertexId, attr : Long, msg : Long) => Math.min(msg, attr),
// sendMsg: send to vertex the min of component label along the edges
(edgeT : EdgeTriplet[Long, ED]) =>
if (edgeT.srcAttr < edgeT.dstAttr)
Iterator( (edgeT.dstId, edgeT.srcAttr) )
else if (edgeT.dstAttr < edgeT.srcAttr)
Iterator( (edgeT.srcId, edgeT.dstAttr) )
else
Iterator.empty,
// mergeMsg
Math.min(_, _) )
```

This algorithm will build a graph where each vertex is labeled with the lowest vertex Id of the connected component that contains this vertex. The vertex program is a function called each time a vertex receives a message, in this case a Long number which is the component Id of one of its neighbor. The `sendMsg` function processes an edge and propagate the minimum of the vertex attribute of one vertex to the other vertex, and finally messages are merged with the `min()` binary function.