Wednesday, October 5, 2016

HBase Lab 1 : HBase Shell Commands

HBase shell commands:
_____________________

 hbase(main):002:0> list
TABLE                                                     
Hbemp                                                     
Hemp                                                      
hnemp                                                     
htable                                                    
htable1                                                   
raja                                                      
sample                                                    
sample1                                                   
soon                                                      
stab                                                      
test                                                      
test1                                                     
12 row(s) in 0.0320 seconds

hbase(main):003:0> 

note: list takes a regular expression, so 'htable' also matches 'htable1'.

hbase(main):003:0> list 'htable'
TABLE                                             
htable                                            
htable1                                           
2 row(s) in 0.0200 seconds

hbase(main):004:0> 

hbase(main):005:0> create  'mytab','cf'
0 row(s) in 1.0690 seconds

hbase(main):006:0> list 'mytab' 
TABLE                                             
mytab                                             
1 row(s) in 0.0580 seconds

hbase(main):007:0> create 'urtab','f1','f2'
0 row(s) in 1.4660 seconds

hbase(main):008:0> list 'urtab'            
TABLE                                             
urtab                                             
1 row(s) in 0.0640 seconds

hbase(main):009:0> 

hbase(main):002:0> describe 'urtab'
DESCRIPTION                                             ENABLED
 {NAME => 'urtab', FAMILIES => [{NAME => 'f1',          true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'},
 {NAME => 'f2', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', COMPRESSION => 'NONE',
 VERSIONS => '3', TTL => '2147483647',
 BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}
1 row(s) in 0.0960 seconds

hbase(main):003:0> 


hbase(main):002:0> put 'urtab','r1','f1:a',100 
0 row(s) in 0.0950 seconds

hbase(main):003:0> put 'urtab','r1','f1:b',100
0 row(s) in 0.0100 seconds

hbase(main):004:0> put 'urtab','r1','f2:b',200
0 row(s) in 0.0130 seconds

hbase(main):005:0> put 'urtab','r1','f2:c',300
0 row(s) in 0.0150 seconds

hbase(main):006:0> put 'urtab','r1','f2:d',500
0 row(s) in 0.0050 seconds

hbase(main):007:0> 

hbase(main):008:0> put 'urtab','r2','f1:a',5000
0 row(s) in 0.0100 seconds

hbase(main):009:0> put 'urtab','r2','f1:x',6000
0 row(s) in 0.0130 seconds

hbase(main):010:0> put 'urtab','r2','f2:b',7000
0 row(s) in 0.0100 seconds

hbase(main):011:0> put 'urtab','r2','f2:y',8000
0 row(s) in 0.0110 seconds

hbase(main):012:0> 

hbase(main):014:0> scan 'urtab',{COLUMNS=>'f1:a'}
ROW            COLUMN+CELL
 r1            column=f1:a, timestamp=1467165821966, value=100
 r2            column=f1:a, timestamp=1467165997467, value=5000
2 row(s) in 0.0240 seconds

hbase(main):015:0> 

hbase(main):022:0> scan 'urtab',{COLUMNS=>['f1:a','f2:y']}
ROW            COLUMN+CELL
 r1            column=f1:a, timestamp=1467165821966, value=100
 r2            column=f1:a, timestamp=1467165997467, value=5000
 r2            column=f2:y, timestamp=1467166038766, value=8000
2 row(s) in 0.0380 seconds

hbase(main):023:0> 

hbase(main):023:0> scan 'urtab',{COLUMNS=>'f1:'}
ROW            COLUMN+CELL
 r1            column=f1:a, timestamp=1467165821966, value=100
 r1            column=f1:b, timestamp=1467165828140, value=100
 r2            column=f1:a, timestamp=1467165997467, value=5000
 r2            column=f1:x, timestamp=1467166021078, value=6000
2 row(s) in 0.0250 seconds

hbase(main):024:0> 


hbase(main):024:0> get 'urtab','r2'
COLUMN        CELL                                
 f1:a         timestamp=1467165997467, value=5000 
 f1:x         timestamp=1467166021078, value=6000 
 f2:b         timestamp=1467166032165, value=7000 
 f2:y         timestamp=1467166038766, value=8000 
4 row(s) in 0.0230 seconds

hbase(main):025:0> get 'urtab','r1'
COLUMN        CELL                                
 f1:a         timestamp=1467165821966, value=100  
 f1:b         timestamp=1467165828140, value=100  
 f2:b         timestamp=1467165844182, value=200  
 f2:c         timestamp=1467165878424, value=300  
 f2:d         timestamp=1467165900629, value=500  
5 row(s) in 0.0150 seconds

hbase(main):026:0> 

hbase(main):026:0> get 'urtab','r1','f2:y'
COLUMN        CELL                                
0 row(s) in 0.0110 seconds

hbase(main):027:0> get 'urtab','r2','f2:y'
COLUMN        CELL                                
 f2:y         timestamp=1467166038766, value=8000 
1 row(s) in 0.0110 seconds

hbase(main):028:0> get 'urtab','r2','f2:' 
COLUMN        CELL                                
 f2:b         timestamp=1467166032165, value=7000 
 f2:y         timestamp=1467166038766, value=8000 
2 row(s) in 0.0200 seconds

hbase(main):029:0> 

hbase(main):029:0> get 'urtab','r2',['f1:a','f2:b']
COLUMN        CELL                                
 f1:a         timestamp=1467165997467, value=5000 
 f2:b         timestamp=1467166032165, value=7000 
2 row(s) in 0.0100 seconds

hbase(main):030:0> 

______________________________

hbase(main):031:0> put 'urtab','r2','f1:x',10000
0 row(s) in 0.0090 seconds

hbase(main):032:0> get 'urtab','r2'
COLUMN        CELL                                
 f1:a         timestamp=1467165997467, value=5000 
 f1:x         timestamp=1467166686843, value=10000
 f2:b         timestamp=1467166032165, value=7000 
 f2:y         timestamp=1467166038766, value=8000 
4 row(s) in 0.0140 seconds

hbase(main):033:0>

note: a put to an existing cell writes a new version of that cell; get and scan return only the latest version. Older versions are retained up to the family's VERSIONS setting (3 here).

_________________________________

hbase(main):044:0> delete  'urtab','r2','f1:x'
0 row(s) in 0.0110 seconds

note:
  to delete a column family, all columns of the family should be deleted.

  to delete a row, all column families of the row should be deleted.
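As a sketch of the note above: the shell's delete removes a single cell, and deleteall removes a whole row. These commands (against the urtab table used earlier) are illustrative and assume a running HBase shell:

```
hbase(main):045:0> delete 'urtab','r2','f2:y'    # delete one cell of row r2
hbase(main):046:0> deleteall 'urtab','r2'        # delete every cell of row r2
hbase(main):047:0> get 'urtab','r2'              # now returns 0 row(s)
```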

Sqoop Import and Export : Lesson 3

$ mysql -u root

mysql>

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| datahub            |
| datamart           |
| halitics           |
| hivemetastore      |
| movielens          |
| mysql              |
| product            |
| siri               |
| training           |
+--------------------+
10 rows in set (0.11 sec)

mysql>

mysql> create database practice;
Query OK, 1 row affected (0.02 sec)

mysql> use practice;
Database changed


mysql> show tables;
Empty set (0.00 sec)

mysql> create table samp(id int primary key,
    ->     name char(10), sal int,
    ->   sex char(1), dno int);
Query OK, 0 rows affected (0.00 sec)

mysql> describe samp;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id    | int(11)  | NO   | PRI | NULL    |       |
| name  | char(10) | YES  |     | NULL    |       |
| sal   | int(11)  | YES  |     | NULL    |       |
| sex   | char(1)  | YES  |     | NULL    |       |
| dno   | int(11)  | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+
5 rows in set (0.00 sec)

mysql>

mysql>
mysql> insert into samp values(101,'aaaa',10000,'m',12)
    -> ;
Query OK, 1 row affected (0.01 sec)

mysql> insert into samp values(102,'addaaa',20000,'f',12);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(103,'ada',20000,'f',13);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(104,'ada',50000,'f',11);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(105,'addda',70000,'m',12);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(106,'adddda',80000,'m',11);
Query OK, 1 row affected (0.00 sec)

mysql> insert into samp values(107,'xadddda',70000,'f',12);
Query OK, 1 row affected (0.00 sec)

mysql>

mysql> select * from samp;
+-----+---------+-------+------+------+
| id  | name    | sal   | sex  | dno  |
+-----+---------+-------+------+------+
| 101 | aaaa    | 10000 | m    |   12 |
| 102 | addaaa  | 20000 | f    |   12 |
| 103 | ada     | 20000 | f    |   13 |
| 104 | ada     | 50000 | f    |   11 |
| 105 | addda   | 70000 | m    |   12 |
| 106 | adddda  | 80000 | m    |   11 |
| 107 | xadddda | 70000 | f    |   12 |
+-----+---------+-------+------+------+
7 rows in set (0.00 sec)

______________________

[training@localhost ~]$ sqoop import    --connect  jdbc:mysql://localhost/practice   --username root   --table  samp   --target-dir   sqimp1

[training@localhost ~]$ hadoop fs -ls sqimp1
Found 6 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:36 /user/training/sqimp1/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:36 /user/training/sqimp1/_logs
-rw-r--r--   1 training supergroup         42 2016-06-01 19:36 /user/training/sqimp1/part-m-00000
-rw-r--r--   1 training supergroup         38 2016-06-01 19:36 /user/training/sqimp1/part-m-00001
-rw-r--r--   1 training supergroup         21 2016-06-01 19:36 /user/training/sqimp1/part-m-00002
-rw-r--r--   1 training supergroup         45 2016-06-01 19:36 /user/training/sqimp1/part-m-00003
[training@localhost ~]$

[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00001
103,ada,20000,f,13
104,ada,50000,f,11
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00002
105,addda,70000,m,12
[training@localhost ~]$ hadoop fs -cat sqimp1/part-m-00003
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

by default, sqoop initiates 4 mappers.

 these mappers import data in parallel.

that means the table is split into 4 parts,
 and each part is taken by a separate mapper.

how to control the number of mappers:

 use the option -m <number>

  ex:   -m 2

[training@localhost ~]$ sqoop import \
>  --connect  jdbc:mysql://localhost/practice \
>  --username root  \
>  --table samp -m 2 --target-dir sqimp2

[training@localhost ~]$ hadoop fs -ls sqimp2
Found 4 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:48 /user/training/sqimp2/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:48 /user/training/sqimp2/_logs
-rw-r--r--   1 training supergroup         61 2016-06-01 19:48 /user/training/sqimp2/part-m-00000
-rw-r--r--   1 training supergroup         85 2016-06-01 19:48 /user/training/sqimp2/part-m-00001
[training@localhost ~]$ hadoop fs -cat sqimp2/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
[training@localhost ~]$
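The partitioning above can be sketched in Python. This is a rough model, not Sqoop's actual IntegerSplitter (rounding details differ): it divides the primary-key range [min(id), max(id)] into one contiguous interval per mapper, and each interval becomes a bounded SELECT. With the samp table's keys 101..107 and 4 mappers, this sketch happens to reproduce the 2/2/1/2 row split seen in the first import.

```python
def split_ranges(min_key, max_key, num_mappers):
    """Divide the key range [min_key, max_key] into num_mappers
    contiguous intervals, one per mapper (rough Sqoop-style split)."""
    step = (max_key - min_key + 1) / num_mappers
    bounds = [min_key + round(step * i) for i in range(num_mappers)]
    bounds.append(max_key + 1)
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_mappers)]

# Each mapper runs one bounded query, e.g. for samp (id 101..107):
for lo, hi in split_ranges(101, 107, 4):
    print(f"SELECT * FROM samp WHERE id >= {lo} AND id <= {hi}")
```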

____________________

mysql> create table damp(id int,
    ->   name char(10), sal int, sex char(1),
    ->   dno int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into damp select * from samp;
Query OK, 7 rows affected (0.01 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> select * from damp;
+------+---------+-------+------+------+
| id   | name    | sal   | sex  | dno  |
+------+---------+-------+------+------+
|  101 | aaaa    | 10000 | m    |   12 |
|  102 | addaaa  | 20000 | f    |   12 |
|  103 | ada     | 20000 | f    |   13 |
|  104 | ada     | 50000 | f    |   11 |
|  105 | addda   | 70000 | m    |   12 |
|  106 | adddda  | 80000 | m    |   11 |
|  107 | xadddda | 70000 | f    |   12 |
+------+---------+-------+------+------+
7 rows in set (0.00 sec)

mysql>


if the table does not have a primary key,

 the number of mappers should be 1.


why?

 when multiple mappers are initiated,
  the first mapper automatically points to the beginning record of the first split, but the remaining mappers cannot jump to the beginning of their splits, because without a primary key there is no column on which split boundaries can be computed.

  so only sequential reading is allowed, from the beginning of the table to the end.

   to make the whole table, beginning to end, a single split,
 only one mapper should be initiated:

    -m 1.
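Aside (not shown in this session): if a table lacks a primary key but has some other evenly distributed column, Sqoop can still parallelize by being told which column to split on with --split-by. The command below is a sketch; sqimp_split is an assumed target directory:

```
sqoop import --connect jdbc:mysql://localhost/practice \
  --username root --table damp \
  --split-by dno --target-dir sqimp_split
```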

______________________

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1  --target-dir sqimp4


[training@localhost ~]$ hadoop fs -ls sqimp4
Found 3 items
-rw-r--r--   1 training supergroup          0 2016-06-01 19:58 /user/training/sqimp4/_SUCCESS
drwxr-xr-x   - training supergroup          0 2016-06-01 19:58 /user/training/sqimp4/_logs
-rw-r--r--   1 training supergroup        146 2016-06-01 19:58 /user/training/sqimp4/part-m-00000
[training@localhost ~]$ hadoop fs -cat sqimp4/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
104,ada,50000,f,11
105,addda,70000,m,12
106,adddda,80000,m,11
107,xadddda,70000,f,12
[training@localhost ~]$

_______________________________

to filter rows at import time, use the --where option:

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1 --where 'sal<50000' --target-dir sqimp5



[training@localhost ~]$ hadoop fs -cat sqimp5/part-m-00000
101,aaaa,10000,m,12
102,addaaa,20000,f,12
103,ada,20000,f,13
[training@localhost ~]$

______________________

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1 --where 'sex="m"' --target-dir sqimp6


[training@localhost ~]$ hadoop fs -cat sqimp6/part-m-00000
101,aaaa,10000,m,12
105,addda,70000,m,12
106,adddda,80000,m,11
[training@localhost ~]$

_______________________

to import only selected columns, use the --columns option:

[training@localhost ~]$ sqoop import  --connect  jdbc:mysql://localhost/practice  --username root   --table damp -m 1 --columns name,sal,dno --target-dir sqimp7

[training@localhost ~]$ hadoop fs -cat sqimp7/part-m-00000
aaaa,10000,12
addaaa,20000,12
ada,20000,13
ada,50000,11
addda,70000,12
adddda,80000,11
xadddda,70000,12
[training@localhost ~]$