Apache Sqoop

Before proceeding with Sqoop, you need basic knowledge of RDBMS database concepts, the Hadoop Distributed File System (HDFS), and any flavor of the Linux operating system.
I have posted about all of these topics on this blog.

Sqoop is a tool designed to transfer data between Hadoop and relational database servers. It is used to import data from relational databases such as MySQL and Oracle into Hadoop HDFS, and to export data from the Hadoop file system back to relational databases.
Sqoop needs the JDBC driver (connector) for your database. For MySQL, download MySQL Connector/J and copy its .jar file into Sqoop's lib directory (sqoop/lib).
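Copying the driver is a one-off step. Below is a minimal sketch, assuming your Sqoop install is under $SQOOP_HOME (or that you pass the lib directory explicitly); install_connector is a hypothetical helper, not part of Sqoop.

```shell
#!/usr/bin/env bash
# Sketch: copy a downloaded MySQL Connector/J jar into Sqoop's lib directory.
# Assumption: SQOOP_HOME points at the Sqoop install, unless a lib dir is passed.
install_connector() {
  local jar=$1
  local lib_dir=${2:-}

  # Fall back to $SQOOP_HOME/lib when no directory is given.
  if [[ -z $lib_dir ]]; then
    if [[ -z ${SQOOP_HOME:-} ]]; then
      echo "set SQOOP_HOME or pass the lib directory" >&2
      return 1
    fi
    lib_dir=$SQOOP_HOME/lib
  fi

  if [[ ! -f $jar ]]; then
    echo "connector jar not found: $jar" >&2
    return 1
  fi

  # Sqoop puts the jars in its lib directory on its classpath.
  cp "$jar" "$lib_dir/" && echo "copied $(basename "$jar") to $lib_dir"
}

# Example (the file name depends on the version you downloaded):
# install_connector ~/Downloads/mysql-connector-j-<version>.jar
```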
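Sqoop imports tables from the RDBMS into HDFS in parallel.
By default it runs 4 parallel tasks (mappers); you can change this with -m or --num-mappers.
You can see this parallelism in the result of any import: it writes one output file per mapper.
By default the work is split on the table's primary key; if the table has none, you name a column (ideally a unique one) with --split-by.
Sqoop runs only map tasks; there is no reduce phase.
To divide the work into slots for the mappers, Sqoop first runs a boundary query, SELECT MIN(col), MAX(col), on the split column and cuts that range into one slice per mapper. These are called the boundary values.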
Now let's move to practice.
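Create a database in MySQL and an employee table with columns such as id, first name, last name, salary and department id. The commands below use a database named empdb with a table employees whose columns include EMPLOYEE_ID, FIRST_NAME, SALARY, MANAGER_ID, JOB_ID and HIRE_DATE.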
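A sketch of that sample schema, written as a shell script that prints the SQL so you can pipe it into the mysql client (for example ./create_empdb.sh | mysql -u root -p, where the script name is hypothetical). The column types and the LAST_NAME, DEPARTMENT_ID and LAST_UPDATED columns are assumptions; LAST_UPDATED is there because incremental lastmodified imports need a timestamp column.

```shell
#!/usr/bin/env bash
# Sketch: print DDL for the sample empdb database.
# Assumptions: column types, plus the LAST_NAME, DEPARTMENT_ID and
# LAST_UPDATED columns, are illustrative; insert your own rows afterwards.
create_empdb_sql() {
  cat <<'SQL'
CREATE DATABASE IF NOT EXISTS empdb;
USE empdb;
CREATE TABLE IF NOT EXISTS employees (
  EMPLOYEE_ID   INT PRIMARY KEY,
  FIRST_NAME    VARCHAR(50),
  LAST_NAME     VARCHAR(50),
  SALARY        DECIMAL(10, 2),
  MANAGER_ID    INT,
  JOB_ID        VARCHAR(20),
  DEPARTMENT_ID INT,
  HIRE_DATE     DATE,
  -- MySQL refreshes this column whenever a row is updated.
  LAST_UPDATED  TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
SQL
}

create_empdb_sql
```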
==============SQOOP IMPORT==========

sqoop import --connect jdbc:mysql://localhost:3306/empdb --table employees --username root --password root --target-dir /data/sqoop/example_1

============IF TABLE DOES NOT HAVE A PRIMARY KEY============
----IT HAS A UNIQUE COLUMN-----

sqoop import --connect jdbc:mysql://localhost:3306/empdb --table employees --username root --password root --target-dir /data/sqoop/example_2 --split-by 'EMPLOYEE_ID'


----IT DOES NOT HAVE ANY UNIQUE COLUMN EITHER------

sqoop import --connect jdbc:mysql://localhost:3306/empdb --table employees --username root --password root --target-dir /data/sqoop/example_3 -m 1

============COLUMNAR IMPORT============

sqoop import --connect jdbc:mysql://localhost:3306/empdb --table employees --username root --password root --columns 'EMPLOYEE_ID,FIRST_NAME,SALARY,MANAGER_ID,JOB_ID,HIRE_DATE' --target-dir /data/sqoop/example_4

============CONDITIONAL IMPORT============

sqoop import --connect jdbc:mysql://localhost:3306/empdb --table employees --username root --password root --target-dir /data/sqoop/example_5 --where 'SALARY > 5000'

sqoop import --connect jdbc:mysql://localhost:3306/empdb --table employees --username root --password root --target-dir /data/sqoop/example_6 \
--where 'FIRST_NAME LIKE "A%"'
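Perform any of the imports above, then go to HDFS and list the target directory, for example:
hdfs dfs -ls /data/sqoop/example_1
With the default 4 mappers you will see one output file per mapper (plus an empty _SUCCESS marker file):
part-m-00000
part-m-00001
part-m-00002
part-m-00003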
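Suppose your table's primary key runs from 100 to 500.
Sqoop first runs the boundary query SELECT MIN(id), MAX(id), which returns 100 and 500; the range does not start from 0.
With 4 mappers, each split covers (500 - 100) / 4 = 100 key values:
part-m-00000 gets ids 100 to 199
part-m-00001 gets ids 200 to 299
part-m-00002 gets ids 300 to 399
part-m-00003 gets ids 400 to 500
If the keys are contiguous, each file holds about 100 rows. Gaps or clustering in the key values make the files uneven, because Sqoop splits the range of values, not the number of rows.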
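The boundary arithmetic can be sketched in shell. This is a simplified model, not Sqoop's actual splitter code: print_splits is a hypothetical helper, the column name defaults to a hypothetical id, and it assumes max - min is at least the number of mappers.

```shell
#!/usr/bin/env bash
# Simplified sketch of integer splitting (not Sqoop's exact code):
# take MIN and MAX of the split column, cut the range into one slice
# per mapper, and close the last slice so MAX itself is included.
print_splits() {
  local min=$1 max=$2 mappers=$3 col=${4:-id}
  local size=$(( (max - min) / mappers ))
  local i lo hi
  for (( i = 0; i < mappers; i++ )); do
    lo=$(( min + i * size ))
    if (( i == mappers - 1 )); then
      # Last slice absorbs any remainder and includes max.
      printf 'part-m-%05d: %s >= %d AND %s <= %d\n' "$i" "$col" "$lo" "$col" "$max"
    else
      hi=$(( lo + size ))
      printf 'part-m-%05d: %s >= %d AND %s < %d\n' "$i" "$col" "$lo" "$col" "$hi"
    fi
  done
}

# The example from the text: keys 100..500 with the default 4 mappers.
print_splits 100 500 4
```

Each printed condition corresponds to the WHERE clause one mapper would apply, which is why gaps in the key values lead to unevenly sized part files.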

============INCREMENTAL IMPORT============
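What is incremental import?
After our last import into HDFS, new rows keep getting added to the RDBMS table. How do we deal with this?
For example, say the RDBMS holds 200 GB of data that grows by 5 GB every day. Re-importing the whole table daily is wasteful when only the new 5 GB has changed.
So we use incremental import, which imports only the rows added (or modified) since the last import.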
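It has two modes:
Incremental append: works on a column whose values only increase, usually the primary key. We have to remember the last value of that column that we imported.
For example, if the last primary key value we imported was 100, we must specify it (--incremental append --check-column <column> --last-value 100), and Sqoop imports only rows whose key is greater than 100.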
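The append-mode command can be sketched as a small shell function. It assumes the empdb.employees table used above and a hypothetical target directory /data/sqoop/example_7; incremental_append is a hypothetical helper. With DRY_RUN left at its default of 1 it only prints the command; set DRY_RUN=0 to run it.

```shell
#!/usr/bin/env bash
# Sketch: incremental append import on EMPLOYEE_ID.
# Assumptions: empdb.employees from this post, hypothetical target dir
# example_7. DRY_RUN=1 (default) prints the command instead of running it.
incremental_append() {
  local last_value=$1
  local cmd=(sqoop import
    --connect jdbc:mysql://localhost:3306/empdb
    --table employees
    --username root --password root
    --target-dir /data/sqoop/example_7
    --incremental append
    --check-column EMPLOYEE_ID
    --last-value "$last_value")

  if [[ ${DRY_RUN:-1} == 1 ]]; then
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}

# Imports only rows with EMPLOYEE_ID > 100 and appends them to the target dir.
incremental_append 100
```

At the end of a real run, Sqoop logs the arguments to use for the next import, including the new --last-value. A saved job (sqoop job --create NAME -- import ..., then sqoop job --exec NAME) stores that value in Sqoop's metastore and updates it after each run, so you do not have to track it yourself.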


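Incremental lastmodified: works on a timestamp column that is updated whenever a row changes. We specify the date and time of the last import.
For example, if we last imported on 20-10-18 (20 October 2018), we must specify that timestamp, e.g. --last-value "2018-10-20 00:00:00", and Sqoop imports rows modified after it.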
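A matching sketch for lastmodified mode, under the same assumptions plus a hypothetical LAST_UPDATED timestamp column and a hypothetical target directory example_8. When the target directory already holds earlier data, Sqoop needs either --append (add changed rows as new records) or --merge-key (replace old versions of changed rows); this sketch uses --merge-key EMPLOYEE_ID.

```shell
#!/usr/bin/env bash
# Sketch: incremental lastmodified import on a LAST_UPDATED timestamp.
# Assumptions: hypothetical LAST_UPDATED column and target dir example_8.
# DRY_RUN=1 (default) prints the command instead of running it.
incremental_lastmodified() {
  local last_value=$1
  local cmd=(sqoop import
    --connect jdbc:mysql://localhost:3306/empdb
    --table employees
    --username root --password root
    --target-dir /data/sqoop/example_8
    --incremental lastmodified
    --check-column LAST_UPDATED
    --last-value "$last_value"
    --merge-key EMPLOYEE_ID)

  if [[ ${DRY_RUN:-1} == 1 ]]; then
    # %q keeps the space inside the timestamp shell-safe when printed.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}

# Rows modified after 20 October 2018 are imported and merged on EMPLOYEE_ID.
incremental_lastmodified "2018-10-20 00:00:00"
```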
