In an undirected graph, a connected component is a subgraph in which any two vertices are connected to each other by paths. Usual algorithms build a graph where each vertex is labeled with its component number. They use a search from an unclassified vertex to build the connected component containing this vertex. The process repeats until all vertices are labeled with a component. Vertices that are not connected form a component with only one vertex.

In an edge-centric BSP implementation, like GraphX, finding all the connected components is done in parallel using Pregel iterations:

//working copy of the input graph, set label of vertices to their vertex Idval cGraph : Graph[ Int, ED ] = graph.mapVertices( (vid, attr) => vid ) cGraph.pregel[ Long ](initialMsg = Long.MaxValue) ( //vertex program(vid : VertexId, attr : Long, msg : Long) => Math.min(msg, attr), //sendMsg: send to vertex the min of component label along the edges(edgeT : EdgeTriplet[Long, ED]) => if (edgeT.srcAttr < edgeT.dstAttr) Iterator( (edgeT.dstId, edgeT.srcAttr) ) else if (edgeT.dstAttr < edgeT.srcAttr) Iterator( (edgeT.srcId, edgeT.dstAttr) ) else Iterator.empty, //mergeMsgMath.min(_, _) )

This algorithm will build a graph where each vertex is labeled with the lowest vertex Id of the connected component that contains this vertex. The vertex program is a function called each time a vertex receives a message, in this case a Long number which is the component Id of one of its neighbor. The `sendMsg`

function processes an edge and propagate the minimum of the vertex attribute of one vertex to the other vertex, and finally messages are merged with the `min()`

binary function.