Q1). What is Cassandra?
Ans: Cassandra is an open-source data storage system developed at Facebook for inbox search and designed for storing and managing large amounts of data across commodity servers. It can serve as both
Q2). What is Cassandra used for, and why use it?
Ans: Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The various factors responsible for using Cassandra are
Q3).Explain what is composite type in Cassandra?
Ans: In Cassandra, composite type allows to define key or a column name with a concatenation of data of different type. You can use two types of Composite Type
Q4).How Cassandra stores data?
Ans:
Q5). What are the main components of the Cassandra data model?
Ans: The main components of Cassandra Data Model are
Q6). What is the Cassandra data model?
Ans: The Cassandra data model consists of four main components:
Cluster: made up of multiple nodes and keyspaces
Keyspace: a namespace that groups multiple column families, typically one per application
Column: consists of a column name, a value and a timestamp
ColumnFamily: multiple columns referenced by a row key
Q7). What is a column family in Cassandra?
Ans:
Q8). What is a cluster in Cassandra?
Ans:
Q9). List the other components of Cassandra.
Ans:
Q10). What is a keyspace in Cassandra?
Ans: In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.
Q11). What is the syntax to create a keyspace in Cassandra?
Ans: The syntax for creating a keyspace in Cassandra is
CREATE KEYSPACE <identifier> WITH <properties>
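As a concrete instance of the syntax above, the following Python sketch builds one possible CREATE KEYSPACE statement as a string. The helper name create_keyspace_cql and the keyspace name demo_ks are made up for illustration; SimpleStrategy, replication_factor and durable_writes are standard keyspace properties, but the values chosen here are only assumptions.

```python
def create_keyspace_cql(name, replication_factor=3, durable_writes=True):
    """Build a CREATE KEYSPACE statement (hypothetical helper, not a driver API)."""
    # The replication map is a CQL map literal; SimpleStrategy is assumed here,
    # which is only suitable for a single data center.
    replication = (
        "{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}}"
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


statement = create_keyspace_cql("demo_ks", replication_factor=3)
print(statement)
```

The resulting string could be pasted into cqlsh or passed to a client driver.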
Q12). What values are stored in a Cassandra column?
Ans: A Cassandra column basically holds three values: the column name, the value and the timestamp.
Q13). When can you use ALTER KEYSPACE?
Ans: ALTER KEYSPACE can be used to change properties such as the number of replicas and the durable_writes setting of a keyspace.
Q14). What is cqlsh in Cassandra?
Ans: Cqlsh is the command-line shell for CQL (Cassandra Query Language) that enables users to communicate with the database. By using cqlsh, you can define a schema, insert data and execute queries.
Q15). What do the shell commands “Capture” and “Consistency” do?
Ans: There are various cqlsh shell commands in Cassandra. The “Capture” command captures the output of a command and adds it to a file, while the “Consistency” command displays the current consistency level or sets a new consistency level.
Q16). What is mandatory while creating a table in Cassandra?
Ans: A primary key is mandatory when creating a table; it is made up of one or more columns of the table.
Q17). What do you need to take care of while adding a column?
Ans: While adding a column you need to take care that the
Q18). What are Cassandra CQL collections?
Ans: Cassandra CQL collections let you store multiple values in a single column. In Cassandra, you can use CQL collections in the following ways
Q19). How does Cassandra write data?
Ans: Cassandra writes data through three components: the commit log, the memtable and the SSTable.
Cassandra first writes data to the commit log, then to an in-memory table structure called the memtable, and finally to an SSTable on disk.
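The write path described above can be pictured with a toy Python model. This is a minimal sketch, not Cassandra's implementation: the Node class, its attribute names and the memtable_limit threshold are all invented for illustration, and the "disk" is just a Python list.

```python
class Node:
    """Toy model of one node's write path (all names are illustrative)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk commit log
        self.memtable = {}     # in-memory table: row key -> value
        self.sstables = []     # immutable, key-sorted tables flushed to "disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write for durability
        self.memtable[key] = value            # 2. apply it to the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. a full memtable becomes an SSTable

    def flush(self):
        # Sort by key and freeze as a tuple so the flushed table cannot change.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed writes no longer need to be replayed from the log.
        self.commit_log.clear()


node = Node(memtable_limit=2)
node.write("k2", "b")
node.write("k1", "a")  # second write fills the memtable and triggers a flush
node.write("k3", "c")  # stays in the commit log and memtable until the next flush
```

After these three writes the node holds one SSTable with k1 and k2 in key order, while k3 is still only in the commit log and the memtable.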
Q20).Explain what is Memtable in Cassandra?
Ans:
Or:
Similar to a table, a memtable is an in-memory, write-back cache that holds content in key and column format. The data in a memtable is sorted by key, and each ColumnFamily has its own memtable, which retrieves column data via the key. A memtable stores writes until it is full and is then flushed to disk.
Q21). What does an SSTable consist of?
Ans: An SSTable consists mainly of 2 files
Q22). What is an SSTable? How is it different from relational tables?
Ans: SSTable stands for “Sorted String Table”, an important data file in Cassandra to which memtables are regularly flushed. SSTables are stored on disk and exist for each Cassandra table. Being immutable, SSTables do not allow data items to be added or removed once written. For each SSTable, Cassandra creates three separate structures: a partition index, a partition summary and a bloom filter.
Q23). What is a Bloom filter used for in Cassandra?
Ans: A bloom filter is a space-efficient data structure that is used to test whether an element is a member of a set. In other words, it is used to determine whether an SSTable has data for a particular row. In Cassandra it is used to save I/O when performing a key lookup.
OR
Explain the concept of Bloom Filter.
Ans: A bloom filter is associated with each SSTable. It is an off-heap data structure (kept in native memory rather than on the Java heap) used to check whether the SSTable may hold the requested data before any disk I/O is performed.
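To make the membership test concrete, here is a minimal Bloom filter sketch in Python. It is only an illustration of the general technique: the class, the bit-array size, the number of hashes and the use of SHA-256 are all assumptions, and Cassandra's own implementation differs. The key property is that a "no" answer is definite, while a "yes" answer may be a false positive.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key was definitely never added, so the read can skip
        # this SSTable; True means the SSTable has to be checked on disk.
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(row_key)

print(sstable_filter.might_contain("user:2"))
```

A read would consult might_contain for each SSTable and touch the disk only for those that answer True.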
Q24). How does Cassandra write changed data into the commit log?
Ans:
Data will not be lost once the commit log is flushed to a file.
Q25). How does Cassandra delete data?
Ans: SSTables are immutable, so a row cannot be removed from them directly. When a row needs to be deleted, Cassandra writes a special marker called a tombstone in place of the column value. When the data is read, values marked with a tombstone are treated as deleted.
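The tombstone idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: SSTables are plain dictionaries mapping a key to a (timestamp, value) pair, TOMBSTONE is an invented sentinel object, and the read function is hypothetical rather than any Cassandra API.

```python
TOMBSTONE = object()  # sentinel standing in for Cassandra's deletion marker


def read(sstables, key):
    """Return the newest value for key, or None if it is missing or deleted."""
    newest = None
    for table in sstables:
        if key in table:
            timestamp, value = table[key]
            if newest is None or timestamp > newest[0]:
                newest = (timestamp, value)
    # A tombstone that is newer than the data hides it from readers.
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


# The older table is never modified; the delete is a new write with a later timestamp.
old_sstable = {"user:1": (1, "alice"), "user:2": (1, "bob")}
new_sstable = {"user:1": (2, TOMBSTONE)}

print(read([old_sstable, new_sstable], "user:1"))
print(read([old_sstable, new_sstable], "user:2"))
```

The original value for user:1 still exists in the older table; it simply loses to the newer tombstone at read time.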
Q26). Compare MongoDB and Cassandra
Ans:
Criteria | MongoDB | Cassandra |
Data Model | Document | Big Table like |
Database scalability | Read | Write |
Querying of data | Multi-indexed | Using Key or Scan |
Q27). List the benefits of using Cassandra.
Ans: Unlike traditional databases, Apache Cassandra delivers near real-time performance, simplifying the work of developers, administrators, data analysts and software engineers.
Q28). Explain the concept of Tunable Consistency in Cassandra.
Ans: Tunable consistency is a characteristic that makes Cassandra a favored database choice of developers, analysts and big data architects. Consistency refers to how up to date and synchronized the data rows are on all their replicas. Cassandra's tunable consistency allows users to select the consistency level best suited to their use cases. It supports two kinds of consistency: eventual consistency and strong consistency.
The former guarantees that, when no new updates are made to a given data item, all accesses eventually return the last updated value. Systems with eventual consistency are said to have achieved replica convergence.
For strong consistency, Cassandra supports the following condition:
R + W > N, where
N – Number of replicas
W – Number of nodes that need to agree for a successful write
R – Number of nodes that need to agree for a successful read
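A short Python sketch makes the R + W > N condition easy to check for specific settings. The function names are made up for illustration, and the quorum formula (a majority of replicas) is the usual definition rather than something stated above.

```python
def is_strongly_consistent(n, w, r):
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


def quorum(n):
    """Majority of n replicas, assuming the usual floor(n / 2) + 1 definition."""
    return n // 2 + 1


n = 3
q = quorum(n)

print(is_strongly_consistent(n, w=q, r=q))  # quorum writes and quorum reads
print(is_strongly_consistent(n, w=1, r=1))  # one replica for both
print(is_strongly_consistent(n, w=n, r=1))  # write to all, read from one
```

With three replicas, quorum reads and writes (2 + 2 > 3) satisfy the condition, while reading and writing a single replica (1 + 1 = 2) does not.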
Q29). How does Cassandra write?
Ans: Cassandra performs a write by applying two commits: first it writes to a commit log on disk, and then it commits to an in-memory structure known as the memtable. Once both commits are successful, the write is complete. Memtables are later written to disk in a table structure called an SSTable (sorted string table). This design gives Cassandra fast write performance.
Q30). Define the management tools in Cassandra.
Ans: DataStax OpsCenter: a web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download and includes an additional edition of OpsCenter
Q31). Explain the CAP Theorem.
Ans: Because systems must scale when additional resources are needed, the CAP theorem plays a major role in shaping a scaling strategy for distributed systems. The Consistency, Availability and Partition tolerance (CAP) theorem states that a distributed system like Cassandra can provide only two of these three characteristics.
One of them needs to be sacrificed. Consistency guarantees that the client receives the most recent write, availability means a reasonable response is returned within minimum time, and partition tolerance means the system continues to operate when network partitions occur. The two options available are AP and CP.
Q32). State the differences between a node, a cluster and a data center in Cassandra.
Ans: A node is a single machine running Cassandra, while a cluster is a collection of nodes that hold similar types of data grouped together. Data centers are useful when serving customers in different geographical areas. You can group different nodes of a cluster into different data centers.
Q33). How do you write a query in Cassandra?
Ans: Using CQL (Cassandra Query Language). Cqlsh is used for interacting with the database.
Q34). Which operating systems does Cassandra support?
Ans: Windows and Linux
Q35). What is CQL?
Ans: CQL is the Cassandra Query Language, used to access and query the Apache Cassandra distributed database. It consists of a CQL parser that passes all the implementation details to the server. The syntax of CQL is similar to SQL, but it does not alter the Cassandra data model.
Q36). Explain the concept of compaction in Cassandra.
Ans: Compaction is a maintenance process in Cassandra in which SSTables are reorganized to optimize the data structures on disk. The compaction process is useful when interacting with memtables. There are two types of compaction in Cassandra:
Minor compaction: started automatically when a new SSTable is created. Here, Cassandra condenses all the equally sized SSTables into one.
Major compaction: triggered manually using nodetool. It compacts all SSTables of a ColumnFamily into one.
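The core of compaction, merging several SSTables into one, can be sketched in Python as follows. This is a deliberately reduced model: each SSTable is a tuple of (key, (timestamp, value)) entries, the compact function is hypothetical, and details such as tombstone removal and the choice of which SSTables to merge are left out.

```python
def compact(sstables):
    """Merge SSTables into one sorted table, keeping the newest entry per key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table:
            # When the same key appears in several tables, the latest timestamp wins.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # The result is again sorted by key and frozen as a tuple.
    return tuple(sorted(merged.items()))


older = (("a", (1, "x")), ("b", (1, "y")))
newer = (("b", (2, "z")), ("c", (2, "w")))

compacted = compact([older, newer])
print(compacted)
```

Two tables with four entries become one table with three, because the stale value for key b is dropped.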
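Q37). Does Cassandra support ACID transactions?
Ans: Unlike relational databases, Cassandra does not support full ACID transactions with rollback or locking. It offers atomic, isolated and durable writes at the partition level, together with tunable consistency.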
Q38). Explain Cqlsh
Ans: Cqlsh stands for Cassandra Query Language Shell, the interactive terminal for CQL. It is a Python-based command-line prompt used on Linux or Windows that executes CQL statements and shell commands like ASSUME, CAPTURE, CONSISTENCY, COPY, DESCRIBE and many others. With cqlsh, users can define a schema, insert data and execute a query.
Q39). What is a SuperColumn in Cassandra?
Ans: A Cassandra super column is a unique element consisting of similar collections of data. Super columns are key-value pairs whose values are columns. A super column is a sorted array of columns, and the elements follow a hierarchy: keyspace > column family > super column > column, a nested data structure that can be written out as JSON.
Similar to row keys, super column entries contain no independent values but are used to collect other columns. Note that super column keys appearing in different rows do not necessarily match, and may never match.
Q40). Define the consistency levels for read operations in Cassandra.
Ans:
Q41). What is the difference between a Column and a Super Column?
Ans: Both elements follow the principle of a tuple with a name and a value. However, the former's value is a string, while the latter's value is a map of columns with different data types.
Unlike Columns, Super Columns do not contain the third component, the timestamp.
Q42). What is ColumnFamily?
Ans: As the name suggests, a ColumnFamily is a structure that can hold an unbounded number of rows. Each row is a set of key-value pairs, where the key is the name of the column and the value is the column data. It is similar to a hashmap in Java or a dictionary in Python. Remember that rows are not limited to a predefined list of columns. The ColumnFamily is also flexible: one row may have 100 columns while another has only 2.
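The dictionary analogy can be written out directly in Python. The row keys and column names below are invented sample data; the point is only that rows in the same ColumnFamily need not share the same set of columns.

```python
# A ColumnFamily pictured as a dictionary of rows, each row a dictionary of columns.
column_family = {
    "user:1": {"name": "alice", "email": "alice@example.com", "city": "Paris"},
    "user:2": {"name": "bob"},  # a row is free to carry fewer columns
}

# Count the columns each row actually has.
column_counts = {row_key: len(columns) for row_key, columns in column_family.items()}
print(column_counts)

# Columns that a row does not have are simply absent, not stored as empty values.
print("email" in column_family["user:2"])
```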
Q43). Define the use of Source Command in Cassandra.
Ans: Source command is used to execute a file consisting of CQL statements.
Q44). What is Thrift?
Ans: Thrift is a legacy RPC protocol or API combined with a code generation tool. The purpose of using Thrift in Cassandra is to facilitate access to the database across programming languages.
Q45). Explain Tombstone in Cassandra.
Ans: A tombstone is a marker in a row indicating a column deletion. The marked columns are deleted during compaction. Tombstones are of great significance because Cassandra is eventually consistent: the deletion marker has to reach all replicas before the data itself can safely be removed.
Q46). What platforms does Cassandra run on?
Ans: Since Cassandra is a Java application, it can run on any platform with a Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on RedHat, CentOS, Debian and Ubuntu Linux platforms.
Q47). Name the ports Cassandra uses.
Ans: By default, Cassandra uses port 7000 for cluster management, 9160 for Thrift clients and 8080 for JMX. These are all TCP ports and can be edited in the configuration file: bin/cassandra.in.sh
Q48). Can you add or remove Column Families in a working Cluster?
Ans: Yes, but keeping in mind the following processes.
Q49). What is Replication Factor in Cassandra?
Ans: The replication factor is the number of copies of each piece of data that exist in the cluster. It is important to increase the replication factor to log into the cluster.
Q50). Can we change the Replication Factor on a live cluster?
Ans: Yes, but it will require running repair to alter the replica count of existing data.
Q51). How do you iterate over all rows in a ColumnFamily?
Ans: Using get_range_slices. You can start iteration with the empty string and after each iteration, the last key read serves as the start key for next iteration.
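The paging pattern described above can be sketched in Python against an in-memory stand-in. The get_range_slices function here is only a simulation with a simplified signature, not the Thrift client call, and it assumes rows come back in key order, which corresponds to an order-preserving partitioner. Because the start key is inclusive, each page after the first repeats the last key already read, so the sketch skips it.

```python
def get_range_slices(rows, start_key, count):
    """Simulated range query: up to `count` rows with key >= start_key, in key order."""
    keys = sorted(k for k in rows if k >= start_key)
    return [(k, rows[k]) for k in keys[:count]]


def iterate_all_rows(rows, page_size=3):
    """Yield every row once by paging with the last key read as the next start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 because pages overlap by one row")
    start_key = ""  # the empty string sorts before every key
    first_page = True
    while True:
        page = get_range_slices(rows, start_key, page_size)
        # The start key is inclusive, so drop the repeated first row on later pages.
        fresh = page if first_page else page[1:]
        yield from fresh
        if len(page) < page_size:
            break  # a short page means the end of the range was reached
        start_key = page[-1][0]
        first_page = False


sample_rows = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
all_rows = list(iterate_all_rows(sample_rows, page_size=3))
print(all_rows)
```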