Tuesday 20th November 2018

NoSQL databases after the hype

Relational databases (RDBs) have been around for a long time: Edgar F. Codd introduced the relational model in 1970. Many developers are very familiar with SQL, and relational databases have been used for virtually every persistence problem. RDBs also do a good job: terabytes of data are not a problem, reliability is not a problem, and drivers exist for all relevant programming languages. When the internet took off, search engine operators needed a database that could answer search queries quickly. The requirements here are very specific, however: a flat data model, extremely large data volumes, scaling and massive numbers of users. RDBs do not handle this particular scenario well, so databases tailored to it were developed. These new databases are called NoSQL (Not only SQL) because they do not follow the relational model.

The term NoSQL first appeared in 1998, when it meant “no SQL”. Only since 2009 has it been read as “Not only SQL”.

Difference between RDB and NoSQL

RDBs store their data in tables, and there can be relationships between the tables. This enables another feature of these databases: normalization. Normalization avoids redundancy in the data, which ultimately simplifies maintenance. Edgar F. Codd proposed four normal forms for this, of which usually only the first three are applied in practice. Some RDBs offer replication out of the box, but only a few offer clustering. An RDB is usually scaled vertically by adding hardware (CPUs, cores and RAM), since the database server typically runs on a single machine.
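
To make the idea of normalization concrete, here is a minimal sketch using SQLite from the Python standard library. The customer and purchase tables and their columns are assumptions made up for this illustration. The city is stored only once, so a single update corrects it for every purchase.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ada', 'Berlin')")
conn.executemany(
    "INSERT INTO purchase (customer_id, item) VALUES (?, ?)",
    [(1, "keyboard"), (1, "monitor")],
)

# The city is stored once, so one UPDATE fixes it for every purchase.
conn.execute("UPDATE customer SET city = 'Hamburg' WHERE id = 1")

# The relationship between the tables is resolved with a join at query time.
rows = conn.execute("""
    SELECT c.name, c.city, p.item
    FROM purchase AS p
    JOIN customer AS c ON c.id = p.customer_id
    ORDER BY p.id
""").fetchall()
print(rows)
```

Without normalization the city would be repeated in every purchase row, and each copy would have to be updated separately.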

NoSQL databases generally do not enforce a fixed schema and handle unstructured data very well. Many of them work natively with JSON or BSON (binary JSON). RDBs are catching up here, as many of them can now handle JSON natively. Since scaling is one of the reasons this category of databases was invented, almost all NoSQL databases offer horizontal scaling (sharding) by default. The data is distributed across several machines, and there is usually a router (proxy) that forwards queries to the machines and merges the results. Almost all NoSQL databases also provide fault tolerance through replication.
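
The routing step described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular product implements it: the class ShardRouter and its methods are hypothetical names, each shard is just an in-memory dict, and keys are assigned to shards by a simple hash.

```python
import hashlib


class ShardRouter:
    """Toy router: hashes each key to one of several in-memory shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database machine.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash (unlike built-in hash()) picks the same shard on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, document):
        self._shard_for(key)[key] = document

    def get(self, key):
        # A lookup by key only has to ask the one shard that owns the key.
        return self._shard_for(key).get(key)

    def find(self, predicate):
        # Scatter the query to every shard, then gather and merge the results.
        hits = []
        for shard in self.shards:
            hits.extend(doc for doc in shard.values() if predicate(doc))
        return hits


router = ShardRouter(shard_count=3)
for i in range(10):
    router.put(f"user:{i}", {"id": i, "active": i % 2 == 0})

active = router.find(lambda doc: doc["active"])
print(sorted(doc["id"] for doc in active))
```

Lookups by key touch a single shard, while a query on any other field has to be sent to all shards and merged, which is why the choice of shard key matters in practice.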

Different types of NoSQL

In principle, any database that does not follow the relational model could be filed under the keyword NoSQL. Over time, however, a few sub-categories have emerged, which are listed below.

Key / value

This type is basically a persistent hash map: the associated data object is retrieved via its key. These NoSQL databases are often used for cache systems and by search engine operators.
Examples: Redis, memcached, DynamoDB
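
As a rough illustration of the "persistent hash map" idea, the following sketch uses the shelve module from the Python standard library as a stand-in for a key / value store. It is not the API of any of the products named above, and the keys and values are made up.

```python
import os
import shelve
import tempfile

# Temporary location for the store; the file name is an arbitrary choice.
store_dir = tempfile.mkdtemp()
store_path = os.path.join(store_dir, "kv_store")

# First "session": write values under string keys.
with shelve.open(store_path) as store:
    store["session:42"] = {"user": "ada", "cart": ["keyboard"]}
    store["counter"] = 7

# Second "session": the values survive closing and reopening the store.
with shelve.open(store_path) as store:
    session = store["session:42"]
    counter = store["counter"]
    missing = store.get("no-such-key")  # unknown keys yield None here

print(session, counter, missing)
```

The only access path is the key: there is no query on the contents of a value, which is exactly what keeps this kind of store simple and fast.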

Document

Document databases have received a lot of attention lately because they offer many of the features of RDBs. There are no tables; instead there are, for example, collections (MongoDB), which are quite similar to tables. These collections do not have a fixed structure, however, so each record can be structured completely differently from the next. That makes it harder to find data again, so some common structure should still exist. Normalization can be achieved by using several collections, but atomic access across them is then lost.
Examples: MongoDB, Couchbase
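
The point about missing structure can be shown with plain Python dicts standing in for documents. This is only a sketch: the collection, the field names and the helper functions find and fields_used are assumptions for illustration, not the query API of MongoDB or Couchbase.

```python
# A "collection" as a plain list of dicts; documents need not share fields.
people = [
    {"_id": 1, "name": "Ada", "email": "ada@example.org"},
    {"_id": 2, "name": "Linus", "languages": ["C", "Shell"]},
    {"_id": 3, "title": "untitled note"},  # no "name" field at all
]


def find(collection, **criteria):
    """Return documents whose fields equal all given criteria.

    Documents that lack a requested field simply do not match.
    """
    missing = object()  # sentinel, so a stored None is not confused with "absent"
    return [
        doc for doc in collection
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


def fields_used(collection):
    """Collect every field name that occurs anywhere in the collection."""
    return sorted({field for doc in collection for field in doc})


print(find(people, name="Ada"))
print(fields_used(people))
```

The third document is silently skipped by any query on name. Nothing in the store prevents this, so a sensible common structure has to be agreed on by the application.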

Graph

Graph databases are used when the data spans a complex network of relationships. Suitable use cases would be social networks such as Facebook, Xing and LinkedIn.
Example: neo4j
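
A typical question in such a network is "who is reachable within two hops?". The sketch below answers it with a breadth-first search over an adjacency map in plain Python. The names and the function within_hops are made up for this example; a graph database would express the same traversal in its own query language.

```python
from collections import deque

# Adjacency sets for an undirected "knows" relationship; names are made up.
friends = {
    "ada": {"bob", "cy"},
    "bob": {"ada", "dee"},
    "cy": {"ada", "dee"},
    "dee": {"bob", "cy", "eve"},
    "eve": {"dee"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable person to their hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[person]:
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    del distance[start]  # the start node is not part of the result
    return distance


print(within_hops(friends, "ada", 2))
```

In a relational schema the same question needs one self-join per hop, which is where a store that keeps the relationships directly tends to be the more natural fit.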

Object

An object database is the purest form of native object storage. With an RDB, an object-relational mapper (ORM) such as Hibernate is required for this kind of persistence. In an object database, the tasks of the ORM are handled by the database itself.
Example: db4o

Columns

Relational databases are typically row-oriented. In data warehouse applications (Online Analytical Processing, OLAP) it is advantageous if the database is column-oriented, since aggregates often have to be computed over very large columns. A column-oriented database has speed advantages for such queries.
Examples: Cassandra, Amazon SimpleDB
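
The difference between the two layouts can be sketched with plain Python data structures. The order data below is invented for illustration, and real column stores add compression and on-disk formats that this toy example ignores.

```python
# Row-oriented: one record per entry, all fields of a row stored together.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.0},
    {"order_id": 3, "region": "north", "amount": 200.0},
]

# Column-oriented: one contiguous list per field, aligned by position.
columns = {
    "order_id": [1, 2, 3],
    "region": ["north", "south", "north"],
    "amount": [120.0, 80.0, 200.0],
}

# Row layout: the aggregate has to visit every record and pick out one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: the aggregate reads just the one column it needs.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both layouts give the same total. The column layout simply never touches order_id or region, which is what pays off when a table has many columns and billions of rows.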

Multi model

Multi-model NoSQL databases support more than one paradigm. OrientDB, for example, can handle documents and graphs. ArangoDB can additionally be used as a key / value store.
Examples: ArangoDB, OrientDB

Consistency with NoSQL

RDBs offer ACID (Atomicity, Consistency, Isolation, Durability). Once a database client has written a value and the transaction has ended, it can be assumed that all other clients will read this value.
NoSQL databases often work with replica sets and sharding, and usually weaken this consistency model in favour of scalability and speed. For example, a written value may at first exist on only one machine in the cluster, while the other two machines (in a three-member replica set) receive it a little later because of network latency. If all three machines are allowed to serve reads, a client can hit a machine on which the value has not yet been written. This behavior is called eventual consistency: the data in a NoSQL database does become consistent, just not immediately. This is the price of distributed data storage. With MongoDB, for example, the parameters that define the consistency behavior can be tuned very finely.
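
The stale read described above can be reproduced with a small simulation. This is a deliberately simplified sketch: the class ReplicaSet is hypothetical, the nodes are in-memory dicts, and replication is triggered by hand instead of by a network.

```python
class ReplicaSet:
    """Toy replica set: writes reach the primary first and the others later."""

    def __init__(self, size=3):
        self.nodes = [{} for _ in range(size)]  # node 0 acts as the primary
        self.pending = []                       # writes not yet replicated

    def write(self, key, value):
        self.nodes[0][key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Stands in for the delayed network transfer to the secondaries.
        for key, value in self.pending:
            for node in self.nodes[1:]:
                node[key] = value
        self.pending.clear()

    def read(self, key, node_index):
        return self.nodes[node_index].get(key)


cluster = ReplicaSet()
cluster.write("balance", 100)

fresh = cluster.read("balance", node_index=0)  # primary already has the value
stale = cluster.read("balance", node_index=1)  # secondary has not caught up yet
cluster.replicate()
caught_up = cluster.read("balance", node_index=1)

print(fresh, stale, caught_up)
```

Reading only from the primary, or waiting until a majority of nodes have acknowledged a write, avoids the stale read at the cost of latency. That is the trade-off the tunable consistency parameters expose.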

Transactions

RDBs provide transactions to perform several related read / write operations either completely or not at all. It is therefore always ensured that either all operations succeeded (commit) or nothing was changed (rollback).
With NoSQL document databases there is often no need for transactions spanning several operations, since a document contains all data that logically belongs together and is either written in full or not at all. Here, too, the two worlds are converging: MongoDB, for example, supports multi-document transactions as of v4.0.
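
The all-or-nothing behavior of a relational transaction can be tried out with SQLite from the Python standard library. The account table, the names and the transfer function below are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))
print(ok, failed, balances)
```

In the second call the credit to bob has already been executed when the debit fails, yet the rollback removes it again, so the balances are exactly those left by the first, successful transfer.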

Conclusion

For me, NoSQL databases are the specialists among persistence solutions. There is a suitable NoSQL database for almost every special case. RDBs, on the other hand, are the "all-purpose weapon" for persistence as long as no requirements speak against them. Whether a NoSQL database should be used, and if so which one, depends very much on the use case of the application. In my opinion, RDBs are far from being a dying species. For a large number of applications (not just business applications), RDBs are still a good choice. It should also not be forgotten that many developers have profound SQL knowledge, while the sheer number of NoSQL databases makes it harder to find suitable specialists.

For me, NoSQL databases have arrived in the mainstream and you should definitely consider them when developing new products.

Andreas Lüdtke