In this post I show how to integrate a local single-node Cassandra database with standalone Spark using spark-cassandra-connector.
Set Up Cassandra, Spark, Scala & sbt (Scala Build Tool)
1. Download Cassandra, Spark, Scala and sbt. I am using Cassandra 3.11.2, Spark 2.2.1 and Scala 2.11.8.
http://cassandra.apache.org/download/
https://spark.apache.org/releases/spark-release-2-2-1.html
https://www.scala-lang.org/download/2.11.8.html
https://www.scala-sbt.org/download.html
2. Environment setup in .profile
#cassandra setup
export CASSANDRA_HOME=/home/dhanuka/software/apache-cassandra-3.11.2
#spark, sbt and scala setup
export SPARK_HOME=/home/dhanuka/software/spark/spark-2.2.1-bin-hadoop2.7
export SBT_HOME=/home/dhanuka/software/spark/sbt-launcher-packaging-0.13.13
export SCALA_HOME=/home/dhanuka/software/scala-2.11.8
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin:$SPARK_HOME/bin:$SBT_HOME/bin:$SCALA_HOME/bin:$CASSANDRA_HOME/bin
3. Reload the profile in the current shell
$ source ~/.profile
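Before moving on, it can be worth confirming that each variable resolved and that its bin directory actually landed on PATH, since a missing `$` in the PATH line fails silently. The following is a minimal sketch for a POSIX shell; the function name `check_home` is hypothetical and not part of any of the tools above.

```shell
# check_home NAME: report whether the variable called NAME is set and
# whether its bin directory appears on PATH. (Hypothetical helper.)
check_home() {
  name=$1
  eval "value=\${$name:-}"          # indirect lookup of the variable
  if [ -z "$value" ]; then
    echo "$name: not set"
    return 1
  fi
  case ":$PATH:" in
    *":$value/bin:"*) echo "$name: ok" ;;
    *) echo "$name: set, but $value/bin is not on PATH"; return 1 ;;
  esac
}

# Check the four variables exported in ~/.profile.
for v in CASSANDRA_HOME SPARK_HOME SBT_HOME SCALA_HOME; do
  check_home "$v" || true
done
```

Any line that does not end in "ok" points at the export that needs fixing.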
Create Cassandra Keyspace and Table
1. Start Cassandra in the foreground with the following command
$ cassandra -f
2. Start CQL shell
$ cqlsh
3. Create keyspace and a table
cqlsh> CREATE KEYSPACE people WITH replication = {'class': 'SimpleStrategy', 'replication_factor':1};
cqlsh> use people;
cqlsh:people> CREATE TABLE users(
... id varchar ,
... first_name varchar,
... last_name varchar,
... city varchar,
... emails varchar,
... PRIMARY KEY (id));
cqlsh:people> Insert into users (id,first_name,last_name,city,emails) values('1','dhanuka','ranasinghe','colombo','dhanuka.priyanath@gmail.com');
cqlsh:people> select * from users;
id | city | emails | first_name | last_name
---------+---------+-----------------------------+------------+------------
1 | colombo | dhanuka.priyanath@gmail.com | dhanuka | ranasinghe
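If you expect to recreate this setup more than once, the same statements can be kept in a file and replayed with `cqlsh -f`. This is only a sketch: the file name `people_schema.cql` is an assumption, and `IF NOT EXISTS` is added so that re-running the file does not fail on an existing keyspace or table.

```shell
# Write the schema and sample row to a file so it can be replayed.
# The file name is an assumption made for this sketch.
SCHEMA_FILE="${TMPDIR:-/tmp}/people_schema.cql"
cat > "$SCHEMA_FILE" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS people
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS people.users (
  id varchar,
  first_name varchar,
  last_name varchar,
  city varchar,
  emails varchar,
  PRIMARY KEY (id)
);
INSERT INTO people.users (id, first_name, last_name, city, emails)
  VALUES ('1', 'dhanuka', 'ranasinghe', 'colombo', 'dhanuka.priyanath@gmail.com');
EOF

# Replay it only when cqlsh is on PATH; Cassandra must already be running.
if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -f "$SCHEMA_FILE"
fi
```

Because the INSERT uses the same primary key, replaying the file overwrites the row instead of duplicating it.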
Build spark-cassandra-connector
1. Clone the repository from GitHub.
$ git clone https://github.com/datastax/spark-cassandra-connector.git
$ cd spark-cassandra-connector
2. Build the project for Scala 2.11 and Cassandra 3.11.2
$ sbt -Dscala-2.11=true -Dtest.cassandra.version=3.11.2 assembly
3. The assembly jar is written to the location below, relative to the directory you cloned into. The suffix after the version number comes from the commit that was built, so yours may differ.
spark-cassandra-connector/spark-cassandra-connector/target/full/scala-2.11/spark-cassandra-connector-assembly-2.0.7-82-g0369a7b.jar
4. Rename the jar to a shorter name (run this inside that scala-2.11 directory)
$ mv spark-cassandra-connector-assembly-2.0.7-82-g0369a7b.jar spark-cassandra-connector-assembly-2.0.7.jar
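The part of the jar name after the version (`-82-g0369a7b` here) has the shape of a `git describe` suffix, so it will differ from build to build. Rather than typing it by hand, the rename can be scripted. This is a sketch under two assumptions: it is run from the root of the cloned repository, and the helper name `stable_jar_name` is hypothetical.

```shell
# stable_jar_name FILE: strip a trailing "-<commits>-g<hash>" suffix,
# leaving only the release version in the file name. (Hypothetical helper.)
stable_jar_name() {
  printf '%s\n' "$1" | sed -E 's/-[0-9]+-g[0-9a-f]+\.jar$/.jar/'
}

# Run from the repository root, after the sbt build has finished.
for jar in spark-cassandra-connector/target/full/scala-2.11/spark-cassandra-connector-assembly-*.jar; do
  [ -e "$jar" ] || continue                 # the glob matched nothing
  target="$(dirname "$jar")/$(stable_jar_name "$(basename "$jar")")"
  [ "$jar" = "$target" ] || mv "$jar" "$target"   # skip jars already renamed
done
```

A jar whose name carries no such suffix is left untouched.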
Connect Spark with Cassandra through Spark-Shell
1. Copy spark-cassandra-connector-assembly-2.0.7.jar to the Spark jars directory
$ cp spark-cassandra-connector-assembly-2.0.7.jar $SPARK_HOME/jars
2. Start spark-shell
$ spark-shell --jars $SPARK_HOME/jars/spark-cassandra-connector-assembly-2.0.7.jar
3. Stop the current Spark context so that a new one can be created with the Cassandra host configured
scala> sc.stop
4. Read the Cassandra table from Spark
scala> import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@2c2a7d53
scala> val sc = new SparkContext(conf)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@15914bb5
scala> val test_spark_rdd = sc.cassandraTable("people", "users")
test_spark_rdd: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:19
scala> test_spark_rdd.first
res1: com.datastax.spark.connector.CassandraRow = CassandraRow{id: 1, city: colombo, emails: dhanuka.priyanath@gmail.com, first_name: dhanuka, last_name: ranasinghe}
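As an alternative to stopping and recreating the context in steps 3 and 4, the connector setting can be passed to spark-shell with `--conf`, in which case the shell's built-in `sc` should already carry it. The wrapper below is a sketch, not something from the connector itself: the function name `launch_spark_shell` and its arguments are assumptions, and it only adds a check that the jar exists before launching.

```shell
# launch_spark_shell JAR [HOST]: start spark-shell with the connector jar
# and the Cassandra host set up front. (Hypothetical wrapper.)
launch_spark_shell() {
  jar=$1
  host=${2:-localhost}
  if [ ! -f "$jar" ]; then
    echo "connector jar not found: $jar" >&2
    return 1
  fi
  spark-shell --jars "$jar" --conf "spark.cassandra.connection.host=$host"
}
```

Called as `launch_spark_shell "$SPARK_HOME/jars/spark-cassandra-connector-assembly-2.0.7.jar"`, it opens a shell in which `import com.datastax.spark.connector._` followed by `sc.cassandraTable("people", "users").first` can be run directly.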