
1. General
- 1.1. Row-oriented
2. Data model
3. Testing
4. Noteworthy
- 4.1. A deep look into Cassandra's where clause
5. Resources

1 General

1.1 Row-oriented

Partitioned row store, in which data is stored in sparse multidimensional hash tables.
- sparse = any given row can have one or more columns, but rows do not all need to have the same columns
- partitioned = each row has a unique key that makes its data accessible
  - Keys distribute the rows across multiple data stores.
Cassandra stores data in a multidimensional, sorted hash table.
Data stored in each column is stored as a separate entry in the hash table.
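
The points above can be sketched as a toy in-memory model. This is only an illustration: the node names, `node_for`, `put` and `get` are hypothetical, and the hash-modulo placement is a simplification (Cassandra itself uses a partitioner that maps keys to tokens on a ring).

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]

# One store per node: row key -> {column name -> value}.
stores = {node: {} for node in NODES}


def node_for(row_key: str) -> str:
    """Pick a node from a hash of the row key (simplified placement)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def put(row_key: str, column: str, value) -> None:
    """Store one column as its own entry under the row key."""
    row = stores[node_for(row_key)].setdefault(row_key, {})
    row[column] = value


def get(row_key: str) -> dict:
    """Return the row's columns, sorted by column name."""
    row = stores[node_for(row_key)].get(row_key, {})
    return dict(sorted(row.items()))


# Sparse rows: "alice" has two columns, "bob" only one.
put("alice", "email", "alice@example.com")
put("alice", "city", "Oslo")
put("bob", "email", "bob@example.com")
```

Here `get("alice")` returns both columns in name order, while `get("bob")` simply has no `city` entry; nothing is stored for a column that a row lacks.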

2 Data model

Column: a name/value pair
Row: a container for columns, referenced by a primary key (row key)
Table: a container of rows
Keyspace: a container for tables
Cluster: a container for keyspaces that spans one or more nodes
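
The containment hierarchy above can be pictured with a few nested Python classes. This is a minimal sketch for orientation, not how Cassandra is implemented; all class names, field names and sample values are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    key: str                                      # primary key / row key
    columns: dict = field(default_factory=dict)   # column name -> value


@dataclass
class Table:
    name: str
    rows: dict = field(default_factory=dict)      # row key -> Row


@dataclass
class Keyspace:
    name: str
    tables: dict = field(default_factory=dict)    # table name -> Table


@dataclass
class Cluster:
    name: str
    nodes: list = field(default_factory=list)     # one or more nodes
    keyspaces: dict = field(default_factory=dict) # keyspace name -> Keyspace


def lookup(cluster, keyspace, table, row_key, column):
    """Walk cluster -> keyspace -> table -> row -> column."""
    return cluster.keyspaces[keyspace].tables[table].rows[row_key].columns[column]


# Hypothetical sample data.
cluster = Cluster("demo_cluster", nodes=["10.0.0.1", "10.0.0.2"])
shop = cluster.keyspaces.setdefault("shop", Keyspace("shop"))
users = shop.tables.setdefault("users", Table("users"))
users.rows["alice"] = Row("alice", {"email": "alice@example.com"})
```

A call such as `lookup(cluster, "shop", "users", "alice", "email")` then walks the same path a fully qualified read would follow.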

2.1 Clusters

Cassandra is designed to be distributed over several machines that operate together and appear to the user as a single instance. This outermost structure is the cluster, also called the ring.

2.2 Keyspaces

Outermost container for data
Container for tables
Defined by a name and a set of attributes

2.3 Tables

Container of an ordered collection of rows
- Where each row is a container of columns
- Ordering is determined by the key columns (in CQL, clustering columns set the order of rows within a partition)
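
To make the ordering idea concrete, here is a small sketch of rows kept sorted by a key as they are written. The `Partition` class and its methods are hypothetical names chosen for the example, and a sorted Python list stands in for Cassandra's on-disk ordering.

```python
from bisect import insort


class Partition:
    """Rows of one partition, kept ordered by their clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys, always sorted
        self._rows = {}   # clustering key -> {column name -> value}

    def upsert(self, clustering_key, columns):
        """Insert a new row in key order, or update an existing row in place."""
        if clustering_key not in self._rows:
            insort(self._keys, clustering_key)
            self._rows[clustering_key] = {}
        self._rows[clustering_key].update(columns)

    def scan(self):
        """Return rows in clustering-key order, regardless of write order."""
        return [(key, self._rows[key]) for key in self._keys]


# Hypothetical sensor readings, keyed by (date, hour) and written out of order.
readings = Partition()
readings.upsert(("2024-01-02", 9), {"temp": 4.5})
readings.upsert(("2024-01-01", 22), {"temp": 1.0})
readings.upsert(("2024-01-02", 7), {"temp": 3.2})
readings.upsert(("2024-01-02", 9), {"humidity": 80})
```

Scanning `readings` yields the rows for `("2024-01-01", 22)`, `("2024-01-02", 7)` and `("2024-01-02", 9)` in that order, and the last row carries both `temp` and `humidity`.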

3 Testing

Use the cassandra-stress tool together with user-defined YAML profile files. It is quite flexible and works well for quickly testing schemas.
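
As a rough sketch of what such a setup can look like, the snippet below holds a small user profile as a string and assembles a matching `cassandra-stress user` command line. The keyspace, table, column and query names are invented for illustration, the helper functions are hypothetical, and the profile keys and options shown should be checked against the documentation of the Cassandra version in use.

```python
import textwrap
from pathlib import Path

# Hypothetical profile: schema, column value distributions, insert settings, one query.
PROFILE = textwrap.dedent("""\
    keyspace: stress_demo
    keyspace_definition: |
      CREATE KEYSPACE stress_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    table: events
    table_definition: |
      CREATE TABLE events (
        sensor text,
        ts timeuuid,
        payload text,
        PRIMARY KEY (sensor, ts)
      );
    columnspec:
      - name: sensor
        size: fixed(8)
        population: uniform(1..100)
      - name: payload
        size: uniform(10..100)
    insert:
      partitions: fixed(1)
      batchtype: UNLOGGED
    queries:
      by_sensor:
        cql: SELECT * FROM events WHERE sensor = ? LIMIT 10
        fields: samerow
    """)


def write_profile(path):
    """Write the profile to disk and return its path."""
    path = Path(path)
    path.write_text(PROFILE, encoding="utf-8")
    return path


def stress_command(profile_path, ops="insert=1", n=10000, node="127.0.0.1"):
    """Build the argument list for a user-profile run (nothing is executed here)."""
    return [
        "cassandra-stress", "user",
        f"profile={profile_path}",
        f"ops({ops})",
        f"n={n}",
        "-node", node,
    ]
```

For example, `stress_command(write_profile("stress_demo.yaml"))` gives an argument list that could be passed to `subprocess.run` on a machine where cassandra-stress is installed; passing `ops="by_sensor=1"` would exercise the query instead of the insert.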

4 Noteworthy

4.1 A deep look into Cassandra's `where` clause

5 Resources

Cassandra Connector for Spark
