Cloud Databases: A Modern Approach in IT
Almost any technology project requires a database, and databases have been indispensable throughout the history of modern computing. Every time we make an online purchase, log in to a site, watch a show on a streaming service, or access our bank accounts, we unwittingly engage with databases. They are virtually everywhere.
The concept of a database predates computers. Journals, libraries, and hundreds of filing cabinets were the primary locations for information storage back in the day. As you can imagine, paper documents took up valuable storage space, were difficult to locate, and were very hard to back up.
Luckily, with the advent of computers, databases were conceived for more efficient record keeping. The history of computerized databases begins with two early systems. In the 1960s, Charles Bachman created the first of them, the Integrated Data Store (IDS). After that, IBM unveiled its own database system, the Information Management System (IMS).
In the 1970s, one of the most important developments in database history occurred. In 1970, E. F. Codd published "A Relational Model of Data for Large Shared Data Banks." The paper popularized the term "relational database" and prompted the creation of this innovative approach to data storage and retrieval.
Years later, cloud computing turned database operation into managed database services offered by various cloud providers. Today, database vendors include cloud or enterprise offerings in their product lines that can be deployed to various clouds. Database services differ in deployment models, database engines, consumption models, and other cloud features, and each service has its own benefits.
In this blog post, we take you from zero to hero in all things related to cloud databases. We'll walk through standard definitions, deployment models, and database types, along with the benefits of each. The second part will be devoted to cloud database benefits and managed database offerings.
What are Cloud Databases?
A cloud database is a database service built and accessed through a cloud platform. Cloud databases use different deployment models and run different database engines. Deployment models differ in customizability and management effort: more customization comes with more management effort and more customer responsibility. Conversely, low or no management effort means less customization and less customer responsibility, but more built-in best practices. Cloud providers offer various SQL and NoSQL engines as Software as a Service, which lets customers select a database that fits their needs.
Another term, Database-as-a-Service (DBaaS), often refers to cloud database services. It covers the idea that the service provider implements part of the infrastructure and performs some database management tasks, so that users can consume the database as a service. Overall, there are different ways of running a database engine. Let's get into the details.
Infrastructure as a Service deployment model
Probably the best-known deployment model for cloud-based databases is Infrastructure as a Service. To put it simply, Infrastructure as a Service, or IaaS, means that the cloud provider hosts a virtual machine on which you install and operate your database, much as you would manage a database on your own physical servers on-premises. A good example of this model is SQL Server on Azure Virtual Machines by Microsoft Azure.
Cloud providers allow users to purchase and run virtual machine instances. The image can be a bare operating system, in which case the user needs to install a database server, purchase a license if the software is commercial, and set up network access and security rules. Alternatively, users can upload their own image with a database server installed on it, or use ready-made virtual machine images that include an optimized database installation provided by the cloud provider or the database vendor. In that case, the underlying operating system is already configured, but users still have to take care of network access and security rules.
The database IaaS deployment model can be used in migration scenarios such as Relocate and Rehost. In the Relocate scenario, database servers that run on-premises as virtual machines are moved to cloud instances without any changes. In the Rehost scenario, databases from on-premises database servers are migrated to database servers hosted on virtual instances in the cloud.
In this case, a database hosted in the cloud offers no significant advantages compared to a database hosted on an organization's servers. It differs in the pricing model and integration with cloud services:
- Users pay for the time the virtual machine runs and for a commercial license, if any; there are no upfront payments. The monthly fee to essentially rent a cloud database becomes an operational expense.
- Cloud databases' storage and computation capacities are elastic, giving you more options as your workload evolves.
- As the database service is hosted in the cloud, it could be natively integrated with various cloud services.
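To make the pay-per-use pricing above more concrete, here is a minimal Python sketch that estimates a monthly bill for a database running on a rented virtual machine. The function name and the hourly rates are hypothetical and chosen only for illustration; real prices depend on the provider, region, and license terms.

```python
def monthly_iaas_cost(hours_running, vm_rate_per_hour, license_rate_per_hour=0.0):
    """Estimate a monthly bill: VM time plus an optional commercial license."""
    return round(hours_running * (vm_rate_per_hour + license_rate_per_hour), 2)


# Hypothetical rates: a VM that runs all month (about 730 hours)
# versus one that runs only 8 hours on 22 working days.
always_on = monthly_iaas_cost(730, vm_rate_per_hour=0.20, license_rate_per_hour=0.10)
work_hours = monthly_iaas_cost(22 * 8, vm_rate_per_hour=0.20, license_rate_per_hour=0.10)
```

With no upfront payment, stopping the machine outside working hours directly reduces the bill, which is the main pricing difference from owning the server.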
Disadvantages of this deployment model are quite obvious:
- A high customer effort to maintain the infrastructure.
- Additional payment for the infrastructure, the commercial license, and the staff who manage this infrastructure.
- Probably most important, this configuration is more prone to user errors.
Platform as a Service deployment model
The next deployment model is Platform as a Service, or PaaS. In this case, users don’t need to manage underlying infrastructure such as hardware and operating systems because the cloud provider takes care of it. A managed database service provides cost-efficient, resizable capacity for an industry-standard database server combined with the benefits of a fully managed, up-to-date platform as a service. An example of this deployment model is Amazon RDS by AWS.
For a managed database service in the PaaS model, the cloud provider offers a predefined set of parameters for the underlying infrastructure. Users select the instance size and type, storage, and connectivity, but all common database administration tasks such as provisioning, patching, and managing are the sole responsibility of the cloud provider. Managed database services can provide additional features out of the box, such as read replicas to improve performance, deployment across multiple availability zones to increase availability, scaling settings, and integration with logging and monitoring tools. It's also vital to calculate the total cost of the switch before making any commitments, as IaaS may be more expensive than PaaS, depending on the size of your server.
The database PaaS deployment model can be used in migration scenarios such as Repurchase and Replatform. The Repurchase scenario can be implemented by purchasing a newer version of the same database engine, run by a managed database service in the PaaS model, and moving the databases to it. In the Replatform scenario, the architecture and data structure of the application’s databases are not changed, but the databases are moved to a managed database service with the same or a compatible database engine.
This approach gives more benefits to customers than the IaaS deployment model:
- You can run the same database as on-premises and concentrate more on application-related tasks instead of managing the infrastructure.
- More payment models are available, such as pay-as-you-go or pay upfront with discounts.
- The database engine and underlying operating system can be tuned and optimized, which lets users keep their existing database architecture while achieving better performance, stability, and reliability.
- The cloud provider manages backups, software patching, and automatic failure detection and recovery. In addition, cloud providers allow users to create manual backups and restore databases from either manual or automated backups.
Disadvantages of the PaaS deployment model are the following:
- Customers still need to configure infrastructure parameters such as instance size, storage, security and connectivity. Some cloud providers even provide limited access to the underlying infrastructure to help users manage legacy business applications.
- This model is mostly used for relational databases as they require complex underlying infrastructure.
- The cost of usage is still high, and users are usually charged for the database service even during idle periods.
Serverless database services
Serverless database services are a new trend that is, in essence, the next generation of database services deployed in the Platform as a Service model. In this scenario, the cloud provider is still responsible for the entire infrastructure. Unlike in the PaaS model, developers don’t select the underlying server parameters (storage size, processor, etc.). The best-known use of the serverless approach is compute, with AWS Lambda and Google Cloud Functions as great examples. Such a service is essentially a tool that executes code when triggered. A developer is billed only for the computing power used, and the cloud provider is responsible for ensuring that the service is highly available and scales when necessary. In the case of databases, by “serverless” we usually mean a database that can automatically start up, shut down, and scale capacity up or down based on your application's needs.
Serverless databases are highly available and scalable, and they are expected to provide low latency. Most cloud databases of this kind allow for easy creation and maintenance of read replicas for better accessibility, stability, and automatic failover. In AWS, examples of such databases are Amazon DynamoDB and Amazon Aurora Serverless. Because none of the server parameters are fixed, billing is based on actual usage. For instance, in the case of the Aurora service, pricing is based on the storage used for the main database, replicas, and backups, as well as on the number of I/O operations and the amount of data transfer.
In terms of cost and maintenance, the serverless approach inherits all its pros and cons from the PaaS model. However, one more advantage usually stands out: a wide range of tools and options for scalability. Databases can be scaled up and down as needed almost limitlessly. Pricing is also more elastic than in PaaS because capacity changes more dynamically.
As for disadvantages, besides the ones mentioned in the previous section, the biggest issue is that the easiest and cheapest way to implement serverless is to use an external cloud provider. This decision has two important consequences. First, data will be stored outside of a company's internal infrastructure, which might be a problem for some companies. Second, vendor lock-in might become a problem if an organization one day decides to switch to another cloud provider.
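The scale-to-zero and usage-based billing behavior described above can be sketched in a few lines of Python. This is a toy model, not any provider's actual algorithm: the "capacity unit", the figure of 100 requests per second per unit, the 16-unit ceiling, and the price are all assumptions made for illustration.

```python
import math


def capacity_for_load(requests_per_sec, requests_per_unit=100.0, max_units=16):
    """Return the capacity units needed for a load; 0 means the database is paused."""
    if requests_per_sec <= 0:
        return 0
    return min(max_units, math.ceil(requests_per_sec / requests_per_unit))


def usage_bill(hourly_load, price_per_unit_hour):
    """Bill only for the capacity units actually provisioned in each hour."""
    unit_hours = sum(capacity_for_load(load) for load in hourly_load)
    return round(unit_hours * price_per_unit_hour, 2)


# Hypothetical average load (requests per second) for eight consecutive hours.
hourly_load = [0, 0, 50, 250, 900, 5000, 120, 0]
units = [capacity_for_load(load) for load in hourly_load]
bill = usage_bill(hourly_load, price_per_unit_hour=0.06)
```

In this sketch the idle hours cost nothing, while the peak hour is capped at the maximum capacity, which is roughly the trade-off a serverless database makes on the user's behalf.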
Software as a Service deployment model
The Software as a Service deployment model frees customers from many time-consuming database tasks such as provisioning the underlying hardware and software, installation, patching, and backups. Fully managed database services provided by top-level cloud providers cover a whole range of database engines. An example of this deployment model is Firestore by Google Cloud.
Nearly all NoSQL cloud database services are provided as fully managed database services. This supports the main idea of NoSQL databases: they should just work and solve particular application tasks. Users need just a few clicks to start using these services, but the offerings differ slightly between cloud providers, which we consider in the second part of the post.
Fully managed database services can be used in migration scenarios such as Repurchase and Refactor. In the Repurchase scenario, databases are moved to a newer version of the same database engine run by a managed database service in the SaaS model. The Refactor, or Rearchitect, scenario involves significant rework of the application architecture and, consequently, the database structure. To improve the application, databases can be migrated to a fully managed database service to reduce management tasks, or data can be migrated to NoSQL database engines provided by a database service in the SaaS model.
Fully managed database services have a lot of benefits:
- Users don’t need to care about all database management tasks and could concentrate on business tasks.
- Users don’t need to predict load and tune performance of the database services provided in the SaaS model, as they are scaled automatically by cloud providers.
- Users get additional cloud features out-of-the-box such as cross-region metrics, high availability, integration with other services, automatic backups, etc.
- Database services are usually priced based on the consumption of resources or the number of queries instead of the number of cluster nodes or the size of the predefined storage.
Disadvantages of the SaaS approach are mostly the flip side of its strengths:
- Cost is less predictable, as there is no fixed price like the instance hours in IaaS.
- As fully managed database services are created for cloud architecture, there is usually no on-premises equivalent, although compatible database services exist.
- Databases may expose only a few parameters to configure.
Different Cloud database engines
It is a popular idea to divide all databases into two classes: relational and non-relational (NoSQL). However, we can distinguish many more types, each designed for a different use case.
Relational databases are one of the oldest types of databases and, at the same time, still the most popular. They store strictly structured data, which makes the data easy to analyze and navigate, and relationships between objects can be easily tracked. However, this kind of database is hard to scale, struggles with very large amounts of data, and is not efficient for some more complex queries. Popular solutions in this category include Microsoft SQL Server, PostgreSQL, MySQL, and Oracle Database. There are also cloud-native solutions designed specifically for cloud environments, for instance Azure SQL Database and Amazon Aurora.
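As a small illustration of strictly structured data and tracked relationships in a relational database, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables are hypothetical examples, and SQLite stands in for the server engines named above only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Follow the relationship between the two tables with a JOIN
# and summarize the orders of each customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.name
""").fetchall()
conn.close()
```

Because the schema is fixed in advance, the query can rely on every order having a customer and a total, which is what makes this kind of data easy to analyze.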
Key-value databases are built on a very simple concept: a single table with a predefined or dynamic set of columns. This approach puts a lot of constraints on an application’s design, but it enables very fast data storage and retrieval and unmatched scalability potential. Some cloud solutions of this kind also offer the possibility to span a single database globally across multiple physical locations. Examples of such databases are Azure Cosmos DB and Amazon DynamoDB.
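The single-table, access-by-key model can be sketched as follows. This is a toy in-process illustration, not the API of any of the services mentioned; the class name, the partition count, and the key format are assumptions.

```python
import zlib


class KeyValueTable:
    """Toy single-table store: items are written and read only by their key."""

    def __init__(self, partitions=4):
        self._partitions = [{} for _ in range(partitions)]

    def _partition_for(self, key):
        # A stable hash of the key picks the partition. Spreading partitions
        # over many servers is what lets this model scale out.
        return self._partitions[zlib.crc32(key.encode("utf-8")) % len(self._partitions)]

    def put(self, key, item):
        self._partition_for(key)[key] = item

    def get(self, key):
        return self._partition_for(key).get(key)


table = KeyValueTable()
# Items in the same table may carry different attributes ("dynamic columns").
table.put("user#1", {"name": "Alice", "plan": "pro"})
table.put("user#2", {"name": "Bob", "last_login": "2023-01-15"})
alice = table.get("user#1")
missing = table.get("user#3")
```

The constraint on application design is visible here: there is no way to ask for "all users on the pro plan" without scanning everything, so access patterns have to be planned around the key.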
In-memory databases are the opposite of what people usually think of when you ask them about the concept of a database. Most importantly, in-memory databases are not designed to store data for long periods of time. In particular, data held only in memory is lost whenever the server is restarted. This follows from their design purpose: such databases are mostly used to store cache information or other real-time data, so they are built to provide maximum performance, and persistence is not so important in such scenarios. The most popular in-memory databases are Redis and Memcached, with dedicated cloud implementations such as GCP Memorystore and IBM Cloud® Databases for Redis.
In some cases, for example where DDD (Domain-Driven Design) is in use, document databases come in handy. They are designed to store bigger chunks of data together (e.g. entire invoice data). They have limited querying capabilities but provide high consistency together with better performance than traditional relational databases. The most popular implementation is MongoDB. There are also cloud-native implementations, for example Amazon DocumentDB, a MongoDB-compatible database from AWS, and IBM Cloud® Databases for MongoDB from IBM.
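The caching use case can be illustrated with a minimal in-process cache whose entries expire after a time to live. This is a sketch of the idea only, not the Redis or Memcached API; the class name and the injected clock (used here so that expiry can be shown without waiting) are assumptions.

```python
import time


class TTLCache:
    """Toy in-memory cache: values live in a dict and expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on access.
            del self._data[key]
            return default
        return value


# A fake clock stands in for real time in this example.
now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.set("session:42", {"user": "alice"})
fresh = cache.get("session:42")
now[0] = 31.0
expired = cache.get("session:42")
```

Everything lives in process memory, so a restart empties the cache, which is acceptable when the data can be rebuilt from a persistent source.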
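To show what storing "entire invoice data" together looks like, here is a minimal sketch that keeps an invoice as one JSON document keyed by its identifier. It uses only the Python standard library and is not the MongoDB API; the field names and the in-process dictionary standing in for a collection are assumptions.

```python
import json

invoices = {}  # document id -> serialized JSON document


def save_invoice(doc):
    """Store the whole invoice, including nested data, as a single document."""
    invoices[doc["_id"]] = json.dumps(doc)


def load_invoice(doc_id):
    """Load the complete invoice with one lookup, no joins needed."""
    return json.loads(invoices[doc_id])


save_invoice({
    "_id": "INV-001",
    "customer": {"name": "Alice", "city": "Kyiv"},
    "lines": [
        {"item": "keyboard", "qty": 1, "price": 45.0},
        {"item": "mouse", "qty": 2, "price": 12.5},
    ],
})

invoice = load_invoice("INV-001")
total = sum(line["qty"] * line["price"] for line in invoice["lines"])
```

In a relational schema the same invoice would be spread over customer, invoice, and line-item tables; here the aggregate is read and written as one unit.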
Relational databases allow users to easily follow table data structures and implement additional data structures that form specific relationships. At some point, however, engineers realized that not all types of data and not all relations can be reflected in a set of tables, because sometimes relations are as dynamic as the data itself. Imagine storing information about friendships between all users of a social network in the form of tables. It’s possible but highly inefficient.
That’s why a new kind of database was created: the graph database. Graph databases allow us to easily track relations between distinct objects and, more importantly, to analyze these relations efficiently. Continuing the example of storing friendship data, graph databases allow users to easily and quickly find, for instance, all 1st- and 2nd-level friends of a given person. A similar task would be very demanding for a traditional relational database. Popular examples of such databases are Neo4j, IBM Compose for JanusGraph, and Amazon Neptune.
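The friends-of-friends query mentioned above can be sketched with a breadth-first search over an adjacency list. This illustrates the traversal idea in plain Python and is not a graph database query language; the function name and the sample friendship data are made up for the example.

```python
from collections import deque


def friends_within(graph, start, max_depth=2):
    """Breadth-first search returning {person: distance} up to max_depth hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_depth:
            continue  # do not expand beyond the requested level
        for friend in graph.get(person, ()):
            if friend not in distances:
                distances[friend] = distances[person] + 1
                queue.append(friend)
    del distances[start]  # the person is not their own friend
    return distances


# Hypothetical friendships stored as an adjacency list.
friendships = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan", "eve"],
    "dan": ["bob", "cat", "fay"],
    "eve": ["cat"],
    "fay": ["dan"],
}
result = friends_within(friendships, "ann")
```

Each hop follows stored links directly, whereas a table-based design would typically need one self-join per level of friendship.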
Time Series databases
With the growth of IoT applications, database vendors face a new challenge: databases that need to store and analyze trillions of events per day. This gave rise to a new kind of database service called time series databases. These databases organize data into a series of events ordered by time of occurrence. They offer multiple tools to analyze this kind of data in real time, identify patterns and trends, and apply additional aggregate functions for analysis and insights, which might be useful in the finance, agricultural, automotive, or spacecraft industries. Examples are InfluxDB and Amazon Timestream.
A ledger, more commonly known as a transaction log or, in its best-known form, a blockchain, is a database designed to store a history of changes in a certain environment in a way that cannot be modified by any of the parties involved. Only new changes can be registered, upon the agreement of all the parties. Fully managed ledger databases therefore provide a transparent, immutable, and cryptographically verifiable transaction log. Examples of such solutions are Blockchain Platform Cloud Service offered by Oracle and Amazon Quantum Ledger Database (QLDB) provided by AWS.
It would be unfair not to mention other data services that are not databases in the classic sense. Among them are cloud implementations of Elasticsearch, a distributed search and analytics engine; Data Factory and Data Lake services, fully managed cloud services that automate various ETL and ELT processes; and various OLAP services that help analyze structured and semi-structured data across data warehouses, operational databases, and data lakes.
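A typical time series operation, aggregating events into fixed time windows, can be sketched as follows. This is plain Python rather than the query language of any product named above; the function name, the 60-second window, and the sample sensor readings are assumptions.

```python
from collections import defaultdict
from statistics import mean


def average_per_window(events, window_seconds=60):
    """Group (timestamp, value) events into fixed windows and average each one."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical sensor readings: (seconds since start, temperature).
readings = [(0, 20.0), (15, 21.0), (59, 22.0), (60, 30.0), (119, 32.0), (180, 25.0)]
averages = average_per_window(readings)
```

A time series database performs this kind of windowed aggregation as a built-in function and at a much larger scale, since data is already organized by time of occurrence.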
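The "cryptographically verifiable" property can be illustrated with a simple hash chain, where each entry stores the hash of the previous one. This is a minimal sketch of the general technique using Python's hashlib, not how any particular ledger service is implemented, and it does not model agreement between parties; the entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(previous_hash, payload):
    """Hash the payload together with the hash of the previous entry."""
    body = json.dumps({"prev": previous_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger, payload):
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": previous_hash, "payload": payload,
                   "hash": entry_hash(previous_hash, payload)})


def verify(ledger):
    """Recompute every hash; any edited entry breaks the chain."""
    previous_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
            return False
        previous_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"account": "A", "change": 100})
append(ledger, {"account": "A", "change": -30})
valid_before = verify(ledger)

ledger[0]["payload"]["change"] = 1000  # tamper with history
valid_after = verify(ledger)
```

Because each hash depends on everything before it, a change to an old entry is detectable by anyone who recomputes the chain.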
Differences between MySQL, PostgreSQL, and Oracle
MySQL is the most-used open-source database in the world. According to DB-Engines, it is the second most popular database overall, after Oracle Database. Facebook, Twitter, Netflix, Uber, Airbnb, Shopify, and Booking.com are just some of the most popular websites that use MySQL. Since MySQL is open source, it has many features that have been built with the help of users over more than 25 years, so it's likely that MySQL will work with your favorite app or programming language. MySQL is fast, reliable, scalable, and easy to use.
It was made to handle large databases quickly, and it has been used for many years in highly demanding environments. MySQL is still being actively improved, and it already offers a large number of useful functions. Its speed, security, and ease of use make MySQL a good fit for accessing databases over the internet.
PostgreSQL is an advanced, enterprise-level, open-source relational database that supports both SQL (relational) and JSON (non-relational) querying. It is a very stable database management system that has been developed by the community for more than 20 years, which has given it high levels of resilience, integrity, and correctness. PostgreSQL is used by many web, mobile, geospatial, and analytics apps as their main data store or data warehouse. At the time of writing, PostgreSQL 14 was the most recent major version.
PostgreSQL has been around for a long time and has a long history of supporting advanced data types. It also supports the same level of performance optimization as commercial databases like Oracle and SQL Server.
Oracle Database is a relational database management system (RDBMS) made by Oracle Corporation. Although relational at its core, it supports multiple data models. It is mostly used for enterprise grid computing and data warehousing.
It comes in five editions, each with a different set of features.
- Standard Edition One: It is good for business applications with limited features that run on a single server or a large number of servers.
- Standard Edition: It gives you all the features of Standard Edition One. It also offers support for more machines and Oracle Real Application Clusters.
- Enterprise Edition: This version has a lot of features, such as security, performance, scalability, and availability, that are needed for online transaction processing in high-stakes applications.
- Express Edition: It is a free, entry-level edition that is easy to download, install, manage, develop with, and deploy.
- Personal Edition: It has the same features as the Enterprise Edition, except for Oracle Real Application Clusters.
Maximizing Your Database-as-a-Service Approach
As systems move from monoliths with one large relational database management system (RDBMS) to microservices with dedicated databases, the "right tool for the right job" approach is being applied to databases as well. In this post, we reviewed different deployment types and their advantages and disadvantages, which should be taken into consideration when building application architectures.
Svitla Systems has unique experience helping clients extend and rearchitect current applications and build new systems using the most advanced technologies and development practices. Database services are a vital part of these applications, so Svitla’s database specialists and consultants provide our clients with all the required information concerning proposed architectures. With extensive knowledge of cloud databases, Svitla Systems has successfully implemented migration projects, including database migrations for different engines and consumption models.