5/15/2020

Secure HDFS With Kerberos And Access From Presto


The purpose of this blog post is to set up a Hadoop cluster secured by Kerberos authentication and to access it from Presto.
Please note that this setup is for learning purposes only and is not suitable for production.



Prerequisites:


  • Install Docker and docker-compose
  • Install Java 8
  • Entries added to my /etc/hosts file (shown below); change the IP address to match your machine.
192.168.0.114    hadoop.docker.com
192.168.0.114    hive-metastore hadoop.docker.com  kdc.kerberos.com kdc EXAMPLE.COM

Steps:


1. Clone the 3 GitHub projects below


git clone https://github.com/dhanuka84/docker-hadoop.git

git clone https://github.com/dhanuka84/docker-hive.git

git clone https://github.com/dhanuka84/docker-hadoop-secure.git


2. Build the docker-hadoop Docker image.


cd docker-hadoop/base


docker build -t dhanuka/hadoop:2.7.7 .


3. Build the docker-hive Docker image


cd docker-hive


docker build -t dhanuka/hive:2.3.2 .


4. Replace my local machine IP with your local machine IP.


cd docker-hadoop-secure


find ./ -type f -exec sed -i 's/192.168.0.114/your_local_ip/g' {} \;

5. Start the Kerberos Key Distribution Center (KDC) service


cd docker-hadoop-secure

docker-compose -f docker-kdc.yml up -d

6. Create Kerberos principals and keytabs


Log in to the kdc docker container.
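A minimal way to do this from the host is shown below, assuming the KDC container is named kdc in docker-kdc.yml (check docker ps if your name differs):

docker exec -it kdc bash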







Log in as kadmin.
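Inside the container, start the local admin shell with kadmin.local (it reads the KDC database directly, so no admin password is needed when run as root):

[root@kdc /]# kadmin.local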









Execute the commands below one by one to create the principals.


kadmin.local:  addprinc -randkey jhs/hadoop.docker.com@EXAMPLE.COM


kadmin.local:  addprinc -randkey yarn/hadoop.docker.com@EXAMPLE.COM


kadmin.local:  addprinc -randkey rm/hadoop.docker.com@EXAMPLE.COM


kadmin.local:  addprinc -randkey nm/hadoop.docker.com@EXAMPLE.COM


kadmin.local:  addprinc -randkey hive/hadoop.docker.com@EXAMPLE.COM

kadmin.local:  addprinc -randkey hive/hive-metastore@EXAMPLE.COM
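
Note: the keytab commands further below also reference nn, dn and spnego principals. If they were not already created by the docker-hadoop-secure setup (an assumption worth checking with listprincs), create them the same way:

kadmin.local:  addprinc -randkey nn/hadoop.docker.com@EXAMPLE.COM

kadmin.local:  addprinc -randkey dn/hadoop.docker.com@EXAMPLE.COM

kadmin.local:  addprinc -randkey spnego/hadoop.docker.com@EXAMPLE.COM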



Execute the commands below one by one to create keytabs from the principals.


kadmin.local:  ktadd -k /opt/nn.service.keytab  nn/hadoop.docker.com


kadmin.local:  ktadd -k /opt/dn.service.keytab  dn/hadoop.docker.com


kadmin.local:  ktadd -k /opt/spnego.service.keytab  spnego/hadoop.docker.com


kadmin.local:  ktadd -k /opt/jhs.service.keytab  jhs/hadoop.docker.com


kadmin.local:  ktadd -k /opt/yarn.service.keytab  yarn/hadoop.docker.com


kadmin.local:  ktadd -k /opt/nm.service.keytab  nm/hadoop.docker.com


kadmin.local:  ktadd -k /opt/rm.service.keytab  rm/hadoop.docker.com


kadmin.local:  ktadd -k /opt/hive.keytab  hive/hive-metastore

kadmin.local: q

[root@kdc /]# exit


The keytabs can be found in the kdc-opt directory. Now we need to copy them to the keytabs directory.
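A minimal sketch, run from the docker-hadoop-secure directory on the host (you may need sudo if the generated files are owned by root):

cp kdc-opt/*.keytab keytabs/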



7. Start Hadoop Cluster



docker-compose -f docker-hdfs.yml up -d
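You can confirm the containers came up and follow their startup logs with docker-compose:

docker-compose -f docker-hdfs.yml ps

docker-compose -f docker-hdfs.yml logs -f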


Full Log file

Please note the Hadoop configuration files in the location below (especially core-site.xml & hdfs-site.xml).


8. Start the Hive Metastore Service



docker-compose -f docker-hive.yml up -d

Please note that this will boot up the Hive metastore, Hive server, and Postgres DB.


We don't actually need the Hive server, so we can stop that docker container.
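Assuming the service is named hive-server in docker-hive.yml (adjust to the name shown by docker ps):

docker stop hive-server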







Log in to the Hive metastore container and use the CLI tool to create an external table.
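Assuming the container is named hive-metastore, matching the /etc/hosts entry above:

docker exec -it hive-metastore bash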


As you can see, it throws an error:

Caused by: java.net.ConnectException: Call From hive-metastore/192.168.0.114 to hadoop.docker.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

Run the command below to authenticate:

kinit hive/hive-metastore@EXAMPLE.COM -k -t /etc/security/keytabs/hive.keytab
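You can verify that a ticket was obtained with klist before starting the CLI:

klist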

root@hive-metastore:/opt# cd hive/bin
root@hive-metastore:/opt/hive/bin# ./hive









Then the Hive CLI will appear.


9. Create the HDFS folder structure and copy a sample data file into it


Log in to the HDFS docker container.
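Find the Hadoop container name with docker ps, then open a shell in it (the name below is a placeholder):

docker ps

docker exec -it <hadoop-container-name> bash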





Authenticate

bash-4.1# kinit nn/hadoop.docker.com@EXAMPLE.COM -k -t /etc/security/keytabs/nn.service.keytab

Create hdfs folder

bash-4.1# hdfs dfs -mkdir /data

List the HDFS root folder

bash-4.1# hdfs dfs -ls /

Found 2 items
drwxr-xr-x   - root root          0 2020-05-15 05:15 /data
drwxrwx---   - root root          0 2020-05-15 05:02 /tmp

Create a test data file with the content below.

dhanu,colombo
kithnu,colombo
yuki,colombo

bash-4.1# vi test.dat

Copy the test.dat file to the HDFS folder /data.
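A minimal way to do this with the HDFS client:

bash-4.1# hdfs dfs -put test.dat /data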





bash-4.1# hdfs dfs -ls /data








10. Create External Hive table.


Execute the command below in the Hive CLI:

CREATE EXTERNAL TABLE data_t5  (name string, city string) ROW FORMAT DELIMITED FIELDS TERMINATED BY "," STORED AS TEXTFILE LOCATION "/data";




Use a Postgres client application to connect to the Postgres DB.
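For example with psql. The host port, database name and credentials below are assumptions, so check docker-hive.yml for the values your Postgres container actually uses (or docker exec into it and run psql there). Hive keeps table definitions in the TBLS metastore table:

psql -h localhost -p 5432 -U hive -d metastore -c 'SELECT "TBL_ID", "TBL_NAME", "TBL_TYPE" FROM "TBLS";'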


As you can see, the table metadata is stored in the Postgres metastore database.


11. Set up Apache Presto



Follow the documentation below to set up the Presto server on your local machine.

https://docs.starburstdata.com/latest/installation/deployment.html

Follow the documentation below to set up the Presto CLI on your local machine.

https://docs.starburstdata.com/latest/installation/cli.html

I have used a single-node Presto server, which means one node acts as both coordinator and worker.

My Presto-Server folder structure:


My Presto-CLI folder structure:





Configure Presto-Server: etc/config.properties

dhanuka@dhanuka:~/research/hdfs/prestor/presto-server-332$ vim etc/config.properties

node.id=presto-master
node.environment=test
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=1024MB
query.max-memory-per-node=2048MB
query.max-total-memory-per-node=2048MB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
task.max-worker-threads=8

Configure Hive catalog and jvm:

Replace the files in presto/etc/ with the files from docker-hadoop-secure/presto/etc/.










dhanuka@dhanuka:~/research/hdfs/prestor/presto-server-332$ vim etc/catalog/hive.properties

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore.authentication.type=KERBEROS
hive.metastore.service.principal=hive/hive-metastore@EXAMPLE.COM
hive.metastore.client.principal=hive/hive-metastore@EXAMPLE.COM
hive.metastore.client.keytab=/home/dhanuka/research/hdfs/prestor/presto-server-332/keytabs/hive.keytab
hive.hdfs.authentication.type=KERBEROS
hive.hdfs.presto.principal=nn/hadoop.docker.com@EXAMPLE.COM
hive.hdfs.presto.keytab=/home/dhanuka/research/hdfs/prestor/presto-server-332/keytabs/nn.service.keytab
hive.hdfs.impersonation.enabled=false
hive.config.resources=/home/dhanuka/research/hdfs/docker-hadoop-secure/hive/conf/hdfs-site.xml,/home/dhanuka/research/hdfs/docker-hadoop-secure/hive/conf/core-site.xml

Replace /home/dhanuka/research/hdfs/ with your local machine location
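The keytab paths in hive.properties assume that the keytabs created earlier have been copied into a keytabs directory under the Presto server installation. A sketch using my layout (change the paths to yours):

mkdir -p ~/research/hdfs/prestor/presto-server-332/keytabs

cp ~/research/hdfs/docker-hadoop-secure/keytabs/hive.keytab ~/research/hdfs/docker-hadoop-secure/keytabs/nn.service.keytab ~/research/hdfs/prestor/presto-server-332/keytabs/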

dhanuka@dhanuka:~/research/hdfs/prestor/presto-server-332$ vim etc/jvm.config

-server
-Xmx3G
-XX:+UseG1GC
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-XX:ReservedCodeCacheSize=150M
-DHADOOP_USER_NAME=hive
-Duser.timezone=UTC
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
-Dpresto-temporarily-allow-java8=true
-Djava.security.krb5.conf=/home/dhanuka/research/hdfs/docker-hadoop-secure/config_files/krb5.conf
-Dsun.security.krb5.debug=true
-Dlog.enable-console=true


12. Start Presto-Server and execute SQL commands in Presto-CLI


Launch Presto-Server

./bin/launcher run











Launch Presto-CLI

 ./presto-cli --catalog hive --schema default


Run a SELECT query

select * from data_t5;
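With the test.dat contents from step 9, the result should look roughly like this:

 name   |  city
--------+---------
 dhanu  | colombo
 kithnu | colombo
 yuki   | colombo
(3 rows)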


References:


https://github.com/Knappek/docker-hadoop-secure