Which database types should you know? What are the differences?
Which one should I choose?
This article aims to answer these questions.
Let’s get started!
- 1 So many databases – why?
- 2 8 Examples of Databases
- 2.1 Relational databases – the enduring classic
- 2.2 Graphs – Simple Representation of Complex Data
- 2.3 Object-oriented – Close to the Programme
- 2.4 Hierarchical – Structure and Order
- 2.5 Columns – Columns and more Columns
- 2.6 Key-Value – Simplicity and Speed
- 2.7 Hadoop – The Big Data KingKong
- 2.8 Document orientated – More than expected
So many databases – why?
There are hundreds of databases on the market. Their task is to store, change and read data. You need a different database depending on your use case, budget and hardware. The old world only knew databases in the form of a table. New problems need new storage concepts.
Databases are divided into 2 basic types:
Relational – SQL
The word relation from mathematics is a set of ordered pairs – pragmatically: tables. Tables consist of rows, the database entry and the columns with the column names / attributes.
Pure relational databases can only query, add, manipulate and delete data records in this grid.
Not ONLY relational – NoSQL
NoSQL is a collective term for database types that are not only based on tabular structures. Depending on the type, it offers schema freedom, good scalability and new programming paradigms such as MapReduce. NoSQL will never completely replace SQL databases because relational databases are best suited for certain tasks.
Which database should I choose?
The choice of database can be the deciding factor in whether you succeed with your IT project. Take your time and analyse which database is right for your project. The database is the foundation, which is very difficult to replace during ongoing development.
- How long should the system exist?
- What are the requirements for the project?
- Can you estimate how much data the database needs to hold in 3 months, 1 year or 10 years?
Consider the following aspects:
- Scaling: NoSQL databases scale better than SQL databases because they can be distributed across many small servers. SQL databases are a good choice if you know that the database will not increase 100-fold by the end of the term (e.g. a personnel database at a university)
- Speed: Do you need a high read speed or a high write rate for your project? A materialised view or a read and write channel(CQRS) increases the SQL database performance.
- Data type: The data types to be stored can determine the database type. If you need to manage various documents (JSON), Couchbase or MongoDB are suitable. A big data cluster with Hadoop processes different data types in the same system.
- Accuracy or speed: The strict behavioural rules such as ACID in relational databases or the softer BASE rule set describe database properties (behaviour) that determine whether this database type is suitable for your software.
You can create a pro-con table for your implementation and evaluate it with the 4 points in each case.
8 Examples of Databases
Relational databases – the enduring classic
Relational databases use tables to store data. Relationships can exist between the tables via foreign and primary keys. The primary objective is to ensure the consistency, integrity and durability of the storage. If an error occurs during the storage process, the database rolls back the changes.
Application: Finance/transactions, business data and accounting processes
Products: MySQL, PostgresSQL
Pros
High reliability and accuracy
fixed schema
Cons
– Scales poorly for large amounts of data
– Merging tables requires a lot of main memory
Graphs – Simple Representation of Complex Data
A graph consists of nodes (entities) and edges (relationships). Attributes describe the nodes in more detail (JSON documents). The relationships can be directed (Y is X’s son) or non-directed (Z and A are friends).
Application: Graphs describe complex relationships, e.g. authorisations, genealogy, social media, networks in general
Products: Neo4j
Pro
Query and recognise complex relationships
Realistic modelling
Cons
– certain queries are very cumbersome
– Migrations are time-consuming
Object-oriented – Close to the Programme
Object-oriented programs (e.g. Java) use object-oriented databases to persist (store) the generated objects in RAM. This type of database was intended to relieve the developer of some of the work. This type has not yet become established.
Application: Any object-oriented programme (Java)
Products: ObjectDB
Pro
less development effort
no branch action level in between
Contra
– inflexible / portable?
– limited applications
Hierarchical – Structure and Order
A hierarchical database describes a tree with a root, branches and leaves that contain data. This structural form subdivides the data. This idea has not caught on, and relational databases have established themselves.
Application: The file system of an operating system corresponds to a hierarchical database. A folder structure on a computer enables the same form of storage. Some file systems work in a similar way to a database and prevent simultaneous writing.
Products: LDAP, Windows file system
Pro:
natural partitioning and storage
uncomplicated / no installation necessary
Contra:
– Updating cumbersome
– Reading cumbersome
Columns – Columns and more Columns
relational databases use rows to query data. If a programme calls up the last attribute in the row, the database iterates over all attributes. Column databases address the columns with an ID. This type supports the bundling of columns into column families, which your application retrieves in this constellation at recurring intervals.
Application: NoSQL with schema provides better planning capability.
Products: Cassandra
Pro
fast reading
partitioning of tables
Contra
– Cumbersome addressing (column, column family, timestamp)
Key-Value – Simplicity and Speed
This database stores a value for a key. This value can be a number, a string, a document, an image or another form of data. You can simply store any value that you can call up quickly and easily via the key.
Application: High score for games, caching content, PHP variable storage
Products: Redis
Pro
simplicity
speed
Contra
– No knowledge of the value (type, size, etc.)
– Key design determines success
Hadoop – The Big Data KingKong
The Hadoop database can handle petabytes of data. Heterogeneous data types such as JSON, CSV etc. are processed by the database file system (HDFS). The programmer can use MapReduce to perform calculations with the enormous amounts of data. The developer sends programming instructions to each individual computer, which calculates the results (Map) and then summarises them (Reduce). The client receives the results (if they are not too large).
Application: Hadoop is necessary wherever there is a heterogeneous data structure with huge amounts of data.
Products: Hadoop, Google Big Table (not available)
Pro
large amounts of data are possible
MapReduce offers new possibilities for data processing
Contra
– Reuse of MapReduce is difficult
– Non-trivial condition and queries
Document orientated – More than expected
instead of storing rows in a row, a document-oriented database stores the document almost unchanged as a BSON with a document ID. This schema enables more efficient queries and avoids impedeance mismatch (JOINS across multiple tables). The NoSQL design determines whether the queries are performant and how the application has to work with the data.
Application: e-commerce, customer databases, (film) ratings,
Products: MongoDB, Couchbase
Pro
efficient query
no JOINs
Contra
– Update due to schema freedom
– redundancies