Tech-Notes

Cassandra

NoSQL stands for "Not Only SQL".

Types of NoSQL databases:

Document based - stores semi-structured documents such as JSON or XML. Used in content management, real-time analytics, product management
Graph based - used in ML, fraud detection
Column based - stores data in rows and column families. e.g. Cassandra, HBase
Key-value pair based - e.g. Redis, DynamoDB

Characteristics of Cassandra:

highly scalable
high performance distributed database
designed to handle large amounts of data
high availability with no single point of failure
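wide-column (column-family) storage database
fast write speed, because writes are appended to a commit log and an in-memory mem-table instead of being updated in place on disk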

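About Cassandra:
-developed at Facebook and open-sourced in 2008
-inspired by Google Bigtable and Amazon Dynamo
-designed for unstructured or semi-structured data
-masterless replication (every node can accept reads and writes)
-does not support relationships (no joins or foreign keys), but related values can be stored in collections
-replication factor is configurable per keyspace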

Data replication:
-one or more nodes act as replicas for each piece of data. On a read, Cassandra returns the most recently written value (the one with the latest write timestamp)
-Cassandra performs a read repair in the background to update stale values on replicas
-Nodes communicate with each other using the gossip protocol
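
The "most recent value wins" rule relies on the write timestamp that Cassandra stores with each value. The following is a minimal cqlsh sketch of how that timestamp can be inspected. The keyspace `demo` and table `users` are hypothetical names used only for illustration, and `CONSISTENCY` is a cqlsh command rather than a CQL statement.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS demo.users (
    user_id int PRIMARY KEY,
    name text
);

-- cqlsh command: require a majority of replicas to answer each request.
CONSISTENCY QUORUM;

INSERT INTO demo.users (user_id, name) VALUES (1, 'alice');
UPDATE demo.users SET name = 'alice_b' WHERE user_id = 1;

-- WRITETIME returns the timestamp (microseconds since the epoch) of the
-- write that produced the value; replicas are reconciled by comparing it.
SELECT name, WRITETIME(name) FROM demo.users WHERE user_id = 1;
```

On a single-node test cluster, set replication_factor to 1; otherwise QUORUM requests cannot be satisfied.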

Components in cassandra:

Node − place where data is stored
Data center − collection of related nodes
Cluster − component that contains one or more data centers.
Commit log − crash-recovery mechanism; every write operation is recorded here first
Mem-table − memory-resident data structure. After the commit log, data is written to the mem-table
SSTable − immutable file on disk; when the mem-table is full, its data is flushed here
Bloom filter − a quick, probabilistic data structure for testing whether an element is a member of a set. It can return false positives but never false negatives, and acts as a special kind of cache. Bloom filters are consulted on every read to skip SSTables that cannot contain the requested data.
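
The false-positive rate of the Bloom filter can be tuned per table through the `bloom_filter_fp_chance` table option. The sketch below assumes a hypothetical keyspace `demo` and table `events`; the chosen values are illustrative, not recommendations.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- A lower false-positive chance means fewer wasted SSTable reads,
-- at the cost of more memory for the filter.
CREATE TABLE IF NOT EXISTS demo.events (
    sensor_id int,
    event_time timestamp,
    reading double,
    PRIMARY KEY ((sensor_id), event_time)
) WITH bloom_filter_fp_chance = 0.01;

-- A higher value saves memory but lets more unnecessary disk reads through.
ALTER TABLE demo.events WITH bloom_filter_fp_chance = 0.1;
```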

Keyspace

A keyspace is a schema, or collection of tables.
It has two properties:
1. replication
2. durable_writes (default value is true; it should not be set to false when using SimpleStrategy. To check the current value, run SELECT * FROM system_schema.keyspaces;)
Basic attributes are
1. Replication factor − number of nodes in the cluster that receive copies of the same data
2. Replica placement strategy − strategy used to place replicas in the ring. The available strategies are
- SimpleStrategy (rack-unaware strategy),
- OldNetworkTopologyStrategy (rack-aware strategy, legacy), and
- NetworkTopologyStrategy (datacenter-aware strategy)
3. Column families − represent the structure of your data. Their attributes (in older Cassandra versions; newer versions use the table-level caching option instead) are
- keys_cached − the number of key locations to keep cached per SSTable.
- rows_cached − the number of rows whose entire contents will be cached in memory.
- preload_row_cache − whether to pre-populate the row cache.
```
-- Template: adjust the strategy class, replica count, and boolean as needed.
CREATE KEYSPACE keyspaceName
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
AND durable_writes = true;
```
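
For a multi-datacenter cluster, NetworkTopologyStrategy takes one replica count per data center. The sketch below assumes data centers named `dc1` and `dc2` and a keyspace named `demo_multi_dc`. All three names are placeholders, and the data center names must match those your cluster actually reports.

```cql
-- 'dc1' and 'dc2' are placeholder data center names.
CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2}
AND durable_writes = true;

-- Verify the replication settings and durable_writes flag.
SELECT keyspace_name, durable_writes, replication
FROM system_schema.keyspaces
WHERE keyspace_name = 'demo_multi_dc';

-- Change the replication settings of an existing keyspace.
ALTER KEYSPACE demo_multi_dc
WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```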

Data Type:

text - Represents UTF8 encoded string
int - Represents 32-bit signed int
ascii - Represents ASCII character string
bigint - Represents 64-bit signed long
blob - Represents arbitrary bytes
boolean - Represents true or false
counter - Represents counter column
decimal - Represents variable-precision decimal
double - Represents 64-bit IEEE-754 floating point
float - Represents 32-bit IEEE-754 floating point
inet - Represents an IP address, IPv4 or IPv6
timestamp - Represents a timestamp
timeuuid - Represents type 1 UUID
uuid - Represents type 1 or type 4 UUID
varchar - Represents UTF8 encoded string (alias for text)
varint - Represents arbitrary-precision integer

Collection types:

list - A list is an ordered collection of elements; duplicates are allowed.
map - A map is a collection of key-value pairs.
set - A set is a collection of unique elements.
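
A short sketch of the three collection types in use. The keyspace `demo`, the table `user_profiles`, and its column names are hypothetical.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.user_profiles (
    user_id int PRIMARY KEY,
    emails set<text>,
    phones list<text>,
    preferences map<text, text>
);

INSERT INTO demo.user_profiles (user_id, emails, phones, preferences)
VALUES (1, {'a@example.com'}, ['555-0100'], {'theme': 'dark'});

-- Add an element to the set (a duplicate would be ignored).
UPDATE demo.user_profiles SET emails = emails + {'b@example.com'} WHERE user_id = 1;
-- Append to the list (duplicates are kept).
UPDATE demo.user_profiles SET phones = phones + ['555-0101'] WHERE user_id = 1;
-- Set a single map entry.
UPDATE demo.user_profiles SET preferences['language'] = 'en' WHERE user_id = 1;
```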

User-defined datatypes

CQL lets users create their own data types. The commands below are used when working with user-defined datatypes.

CREATE TYPE − Creates a user-defined datatype.
ALTER TYPE − Modifies a user-defined datatype.
DROP TYPE − Drops a user-defined datatype.
DESCRIBE TYPE − Describes a user-defined datatype.
DESCRIBE TYPES − Describes user-defined datatypes.
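
As a rough illustration, a user-defined type can be created and then used as a column type. The keyspace `demo`, the type `address`, and the table `customers` are hypothetical names.

```cql
-- Hypothetical keyspace, type, and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TYPE IF NOT EXISTS demo.address (
    street text,
    city text,
    zip text
);

-- frozen<> stores the whole UDT value as a single unit.
CREATE TABLE IF NOT EXISTS demo.customers (
    customer_id int PRIMARY KEY,
    name text,
    home frozen<demo.address>
);

INSERT INTO demo.customers (customer_id, name, home)
VALUES (1, 'alice', {street: '1 Main St', city: 'Pune', zip: '411001'});

-- Add a field to the existing type.
ALTER TYPE demo.address ADD country text;

-- Individual fields can be selected with dot notation.
SELECT name, home.city FROM demo.customers WHERE customer_id = 1;
```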

Types of statements in cql:

Data definition: create table, alter table, drop table, create keyspace etc
- CREATE KEYSPACE keyspaceName − Creates a keyspace in Cassandra.
- USE keyspaceName − Connects to a created keyspace.
- ALTER KEYSPACE keyspaceName − Changes the properties of a keyspace.
- DROP KEYSPACE keyspaceName − Removes a keyspace
- CREATE TABLE tableName (column1 datatype, column2 datatype, PRIMARY KEY ((column1), column2)) − column1 is the partition key, column2 is a clustering key
- ALTER TABLE tableName ADD columnName datatype / ALTER TABLE tableName DROP columnName
- DROP TABLE tableName
- TRUNCATE tableName
- CREATE INDEX indexName ON tableName (columnName)
- DROP INDEX indexName
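
The data definition statements above can be sketched end to end as follows. The keyspace `demo`, the table `orders`, and the index name are hypothetical.

```cql
-- Hypothetical keyspace, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE demo;

-- customer_id is the partition key, order_time the clustering key.
CREATE TABLE IF NOT EXISTS orders (
    customer_id int,
    order_time timestamp,
    status text,
    total decimal,
    PRIMARY KEY ((customer_id), order_time)
) WITH CLUSTERING ORDER BY (order_time DESC);

ALTER TABLE orders ADD note text;
ALTER TABLE orders DROP note;

-- Secondary index on a non-primary-key column.
CREATE INDEX IF NOT EXISTS orders_status_idx ON orders (status);
DROP INDEX IF EXISTS orders_status_idx;

-- TRUNCATE removes all rows; DROP TABLE removes the table itself.
TRUNCATE orders;
DROP TABLE IF EXISTS orders;
```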
Data manipulation: insert, delete, update, select etc
- INSERT INTO tableName (column1, column2, ...) VALUES (value1, value2, ...) USING option − option is TTL or TIMESTAMP
- UPDATE tableName SET column1 = newValue, column2 = value, ... WHERE condition
- DELETE FROM tableName WHERE condition
- SELECT columns FROM tableName WHERE condition
- BATCH − Executes multiple DML statements at once. BEGIN BATCH insert stmt; delete stmt; update stmt; APPLY BATCH;
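
A minimal sketch of the data manipulation statements, assuming a hypothetical keyspace `demo` and table `sessions`. The UUID values are arbitrary examples.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.sessions (
    user_id int,
    session_id uuid,
    device text,
    PRIMARY KEY ((user_id), session_id)
);

-- USING TTL makes the inserted values expire after 86400 seconds (one day).
INSERT INTO demo.sessions (user_id, session_id, device)
VALUES (1, 123e4567-e89b-12d3-a456-426614174000, 'laptop')
USING TTL 86400;

-- TTL() shows the remaining time to live of a column value.
SELECT session_id, device, TTL(device) FROM demo.sessions WHERE user_id = 1;

UPDATE demo.sessions SET device = 'phone'
WHERE user_id = 1 AND session_id = 123e4567-e89b-12d3-a456-426614174000;

-- A batch groups several DML statements into one request.
BEGIN BATCH
    INSERT INTO demo.sessions (user_id, session_id, device)
    VALUES (1, 123e4567-e89b-12d3-a456-426614174001, 'tablet');
    DELETE FROM demo.sessions
    WHERE user_id = 1 AND session_id = 123e4567-e89b-12d3-a456-426614174000;
APPLY BATCH;
```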
Clauses used when reading data:
- SELECT − This clause reads data from a table
- WHERE − The WHERE clause is used along with SELECT to read specific data.
- ORDER BY − The ORDER BY clause is used along with SELECT to read data in a specific order; it applies only to clustering columns.
Secondary indexes: create, drop indexes
Materialized views: create, drop, alter materialized views etc
Database roles: creating roles and users, granting and revoking permissions etc
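
The sketch below illustrates a materialized view and a role, under a few assumptions: the keyspace `demo`, table `products`, view `products_by_category`, and role `report_reader` are hypothetical names. Materialized views may need to be enabled in the server configuration on newer Cassandra versions, and the role statements assume that authentication and authorization are enabled on the cluster.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.products (
    product_id int PRIMARY KEY,
    category text,
    name text
);

-- The view keeps the same data, partitioned by category instead of product_id.
-- Every primary key column of the view must be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW IF NOT EXISTS demo.products_by_category AS
    SELECT product_id, category, name
    FROM demo.products
    WHERE category IS NOT NULL AND product_id IS NOT NULL
    PRIMARY KEY ((category), product_id);

SELECT name FROM demo.products_by_category WHERE category = 'books';

-- Requires authentication/authorization to be enabled (assumption).
CREATE ROLE IF NOT EXISTS report_reader WITH PASSWORD = 'change_me' AND LOGIN = true;
GRANT SELECT ON KEYSPACE demo TO report_reader;
```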
Documented shell commands (cqlsh commands):
- HELP − Displays help topics for all cqlsh commands.
- CAPTURE filepath − Captures output of a command and adds it to a file.
- CONSISTENCY − Shows the current consistency level, or sets a new consistency level.
- COPY tableName TO filepath − Copies data from Cassandra to given file.
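- DESCRIBE − Describes the cluster, a keyspace, a table, or a type
  - DESCRIBE KEYSPACE keyspaceName - shows the keyspace definition, including its tables
  - DESCRIBE TABLE tableName - shows the definition of the table
  - DESCRIBE TYPE typeName - shows the definition of a user-defined datatype
  - DESCRIBE TYPES - lists all user-defined datatypes (UDT)
- EXPAND on/off − Prints each row vertically, one column per line
- EXIT − Using this command, you can terminate cqlsh.
- PAGING − Enables or disables query paging.
- SHOW HOST / SHOW VERSION − Shows the connected host, or the cqlsh, Cassandra, and CQL versions
- SOURCE fileName − Executes the CQL commands in the given file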
- TRACING − Enables or disables request tracing.
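
A sample cqlsh session using several of these commands might look like the sketch below. The keyspace `demo`, table `users`, and the file names `users.csv`, `query_output.txt`, and `setup.cql` are hypothetical; `setup.cql` must exist for SOURCE to succeed.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo.users (user_id int PRIMARY KEY, name text);

-- The lines below are cqlsh shell commands, not CQL statements.
CONSISTENCY;          -- show the current consistency level
CONSISTENCY ONE;      -- set a new consistency level
EXPAND ON;            -- print each row vertically
PAGING 50;            -- enable paging with a page size of 50 rows
TRACING ON;           -- trace the requests that follow

DESCRIBE KEYSPACES;
DESCRIBE TABLE demo.users;

-- Export the table to a CSV file with a header row.
COPY demo.users (user_id, name) TO 'users.csv' WITH HEADER = TRUE;

-- Append the output of the following queries to a file, then stop capturing.
CAPTURE 'query_output.txt';
SELECT * FROM demo.users;
CAPTURE OFF;

SOURCE 'setup.cql';   -- hypothetical file containing CQL statements
SHOW VERSION;
SHOW HOST;
```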

Rules in cassandra while select:

The WHERE condition must include all partition key columns.
Clustering key columns can then be added to the WHERE condition, in the same order in which they appear in DESCRIBE TABLE.
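
These rules can be illustrated with a hypothetical table `demo.readings` that has one partition key column and two clustering columns. The rejected queries are left commented out; whether they can be forced with ALLOW FILTERING depends on the Cassandra version.

```cql
-- Hypothetical keyspace and table, used only for illustration.
CREATE KEYSPACE IF NOT EXISTS demo
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.readings (
    sensor_id int,
    day date,
    hour int,
    value double,
    PRIMARY KEY ((sensor_id), day, hour)
);

-- Valid: partition key only.
SELECT * FROM demo.readings WHERE sensor_id = 1;

-- Valid: partition key plus the first clustering column.
SELECT * FROM demo.readings WHERE sensor_id = 1 AND day = '2024-01-01';

-- Valid: clustering columns restricted in their declared order.
SELECT * FROM demo.readings
WHERE sensor_id = 1 AND day = '2024-01-01' AND hour >= 6;

-- Rejected by default: skips the clustering column "day".
-- SELECT * FROM demo.readings WHERE sensor_id = 1 AND hour = 6;

-- Rejected by default: the partition key is missing.
-- SELECT * FROM demo.readings WHERE day = '2024-01-01';
```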