Anyone who wants to use data from different sources for complex analysis usually operates a data warehouse. However, sizing it to absorb peak loads takes considerable effort and money. Are there alternatives?
A data warehouse (DWH) is necessary for companies whose analyses (reporting, advanced analytics, business intelligence, machine learning, data mining) need to combine data from different sources in a targeted manner. The data records are transferred from the respective source systems to the database using appropriate tools. More and more companies are opting for an ELT strategy, in which the transformations are performed at query time and both structured and unstructured data are stored in raw format.
The hardware of an on-premises data warehouse has to be sized so that queries can still be served at peak load. This means that the hardware is oversized most of the time. In addition, the company's own employees have to take care of operating the ETL pipelines. Operating an on-premises data warehouse is therefore challenging in terms of scalability and future-proofing. At the same time, there are conditions, such as government regulations, that make a local DWH indispensable. So are there any alternatives to on-premises operation?
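To make the oversizing argument concrete, here is a minimal sketch of the arithmetic. The hourly load profile is invented for illustration and is not taken from any real system; the function simply compares the average load with the capacity needed for the busiest hour.

```python
# Hypothetical query load per hour of one day, in arbitrary "compute units".
HOURLY_LOAD = [5, 5, 5, 5, 5, 10, 20, 40, 80, 100, 90, 70,
               60, 70, 90, 100, 80, 50, 30, 20, 10, 5, 5, 5]


def utilisation_when_sized_for_peak(load):
    """Share of the provisioned capacity that is actually used on average,
    assuming the hardware is sized exactly for the busiest hour."""
    capacity = max(load)
    return sum(load) / (capacity * len(load))


utilisation = utilisation_when_sized_for_peak(HOURLY_LOAD)
print(f"average utilisation: {utilisation:.0%}")
print(f"idle capacity:       {1 - utilisation:.0%}")
```

With this assumed profile, hardware sized for the peak is used at well under half of its capacity on average, which is the gap that pay-per-use models aim to close.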
Qlik Sense As A DWH Replacement
No question, Qlik Sense is an outstanding and modern business intelligence (BI) tool. It is excellent for visualisation, especially when data needs to be loaded from multiple sources.
Qlik Sense comes with its own powerful, SQL-like script language and, in contrast to other BI tools, can in principle even replace a data warehouse thanks to these ETL capabilities. For this purpose, Qlik's built-in means can be used to store the data as a QVD file on disk after the query ("extract") and, if necessary, after further calculations such as data cleansing or the unification of different sources ("transform"). The data stored in these QVD files is column-oriented and compressed, with a compression factor of around one to ten compared to the raw data. Subsequent access to this data is very efficient.
Corresponding analyses, reports and dashboards can be created from the QVD files, or ad-hoc self-service BI queries can be started. By storing the QVD files cleverly, delta loads can also be implemented, data marts set up, and the history of data sources recorded.
The advantage of this approach is that a Qlik project can be implemented much faster, with more agility and therefore more cheaply than programming ETL pipelines for a data warehouse. Data extraction, data modelling, data visualisation and reporting all take place in the same tool, so that licence costs for the DWH and any necessary ETL tools are eliminated.
However, the disadvantage of this option is strong vendor lock-in: QVD files can only be read and used by Qlik products, and further use of the data outside of Qlik is difficult to implement. In addition, scalability is poor: QVD files are not intended for long-term archiving at terabyte or petabyte scale, so they cannot be considered a full replacement for big data use cases.
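The idea behind a delta load can be sketched independently of Qlik. The following Python example is only an illustration under assumptions: the row layout with `id` and `modified` fields and the function name `delta_load` are hypothetical, and a plain dictionary stands in for the stored file. Only rows changed since the last run are merged into the stored data.

```python
from datetime import date


def delta_load(stored, source_rows, last_loaded):
    """Merge rows modified after `last_loaded` into `stored` (keyed by id).

    Returns the new watermark, i.e. the latest modification date seen.
    """
    watermark = last_loaded
    for row in source_rows:
        if row["modified"] > last_loaded:
            stored[row["id"]] = row  # insert new rows, overwrite changed ones
            watermark = max(watermark, row["modified"])
    return watermark


# Hypothetical state left behind by the previous load.
stored = {
    1: {"id": 1, "modified": date(2024, 1, 1), "amount": 100},
    3: {"id": 3, "modified": date(2023, 12, 31), "amount": 70},
}
# Hypothetical current content of the source system.
source = [
    {"id": 1, "modified": date(2024, 1, 3), "amount": 120},   # changed
    {"id": 2, "modified": date(2024, 1, 2), "amount": 50},    # new
    {"id": 3, "modified": date(2023, 12, 31), "amount": 70},  # unchanged, skipped
]
new_watermark = delta_load(stored, source, last_loaded=date(2024, 1, 1))
```

The returned watermark would be persisted and passed in as `last_loaded` on the next run, so that each run only touches what has changed since the previous one.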
Query Engine On A (Cloud) Data Lake
Another alternative to running a data warehouse is to combine fast but inexpensive cloud data lake storage with a query engine for querying the data. One advantage of this architecture is that the full scalability of the cloud can be used for storage and computing power (compute) independently of one another. Filling the data lake is much easier and faster than filling a conventional DWH, since the raw data is stored one-to-one in the data lake. The strategy is ELT.
Necessary transformations do not (yet) take place at this point but are deferred to the subsequent queries and thus shifted to, for example, Qlik Sense or other tools. Alternatively, prepared views, "materialised views" or "reflections" can be made available, in which logic is already built in to simplify and accelerate queries. The widespread query language SQL can be used to query the data, even unstructured data such as JSON files. There is no need to transform the data into the ordered schema of a DWH. There is no vendor lock-in, since the data can easily be moved at any time, for example to another cloud provider. In addition to the advantages already described, the data does not have to be duplicated in a DWH, so no costs arise for that.
The disadvantage, on the other hand, is that prepared views or virtual datasets have to be created so that even less experienced users can run reliable queries on a validated data basis. Much of the effort is shifted to querying the data in the appropriate tools. The complexity saved by avoiding ETL is only postponed to a later point in time and partly shifted to other parties.
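As a rough illustration of querying raw JSON with SQL, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a data lake query engine. This is an assumption made for the sake of a runnable example: real query engines have their own JSON support, and the helper `json_get`, the table `raw_events` and the sample records are all hypothetical. The raw records are stored unchanged, and their structure is only interpreted when the query runs.

```python
import json
import sqlite3


def json_get(raw, key):
    """Hypothetical helper: read one top-level field from a raw JSON string."""
    return json.loads(raw).get(key)


def average_temp_per_device(raw_lines):
    """Store raw JSON lines as-is and aggregate them with SQL at query time."""
    conn = sqlite3.connect(":memory:")
    # Register the Python helper so that it can be called from SQL.
    conn.create_function("json_get", 2, json_get)
    conn.execute("CREATE TABLE raw_events (raw TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?)",
                     [(line,) for line in raw_lines])
    rows = conn.execute(
        "SELECT json_get(raw, 'device') AS device, "
        "       AVG(json_get(raw, 'temp')) AS avg_temp "
        "FROM raw_events GROUP BY device ORDER BY device"
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sensor records; note that they do not share a fixed schema.
RAW_LINES = [
    '{"device": "a", "temp": 21.5}',
    '{"device": "b", "temp": 19.0, "humidity": 40}',
    '{"device": "a", "temp": 22.5}',
]
print(average_temp_per_device(RAW_LINES))
```

The extra `humidity` field in one record does not require any schema change, because no schema was fixed when the data was written.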
Cloud Data Warehouses
The third alternative is to use a modern cloud data warehouse. Combined with cloud data lake storage, this results in almost unlimited storage options and scalability. Compared to an on-premises data warehouse, the cloud-based variants offer considerable potential for reducing costs while increasing security and availability at the same time. According to the Forrester Wave Cloud Data Warehouse Report, the “pay-per-use” strategy can save at least 20 per cent compared to on-premises data warehouses. Some companies have even achieved savings of between 70 and 80 per cent.
A significant problem with DWH architectures is the laborious creation and operation of the ETL pipelines that fill the DWH systems. If changes are made to the source system, they trigger a great deal of follow-up work and programming effort in the downstream systems. With DWH automation tools such as Qlik Replicate or Compose, data marts can be created automatically, and data can be transferred live to the cloud using the change data capture method.
The manual effort involved in creating and maintaining the ETL pipelines is drastically reduced, and one of the main points of criticism of DWH solutions is eliminated. Of course, the combination of different solutions leads to increased licence costs.
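The principle of change data capture mentioned above can be sketched in a few lines of Python. The event format used here (`op`, `key`, `row`) and the function `apply_changes` are hypothetical and chosen only for illustration; they do not reflect the format or API of Qlik Replicate or any other product. Instead of reloading whole tables, only the recorded changes are replayed against the target in order.

```python
def apply_changes(target, events):
    """Replay a stream of change events against a target table (dict by key)."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]
        elif op == "delete":
            target.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


# Hypothetical change log captured from a source system.
events = [
    {"op": "insert", "key": 1, "row": {"customer": "Ada", "city": "Berlin"}},
    {"op": "insert", "key": 2, "row": {"customer": "Bob", "city": "Hamburg"}},
    {"op": "update", "key": 1, "row": {"customer": "Ada", "city": "Munich"}},
    {"op": "delete", "key": 2},
]
replica = apply_changes({}, events)
print(replica)
```

Because the events are applied in the order in which they occurred, the target ends up in the same state as the source without a full reload.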
Can Qlik Sense Replace A DWH?
Yes, provided that structured data, for example from ERP systems, is processed exclusively for reporting and dashboarding. In that case, Qlik Sense does not need a DWH. Although strong vendor lock-in is to be expected here, the speed of development and the savings compared to DWH costs are clear advantages.
Can A Query Engine In Combination With Cloud Storage Replace A DWH?
Yes, but the effort that a DWH spends on ETL pipelines then shifts to query time: to building materialised views and reflections in the ELT query engine, or to the downstream analysis tools. In combination with a BI tool, some disadvantages of a pure Qlik setup, namely vendor lock-in and limited scalability, can be compensated.
In the case of unstructured data, such as JSON files from IoT devices, the ELT methodology with "schema on read" has clear advantages over a DWH with ETL and "schema on write". High-performance cloud data lake storage and a fast query engine are highly recommended in these cases. However, the data may have to be transferred to the cloud data lake using an appropriate tool.
The operation of a modern cloud data warehouse combined with data warehouse automation is generally recommended. In this way, all the advantages of the cloud can be combined with the benefits of a DWH. The usually very complex programming and subsequent maintenance of the ETL pipelines can be automated to a large extent using appropriate tools. Even unstructured data can be stored at any size in appropriate data lake storage and still be queried in the DWH via SQL.
This setup also keeps the data available for future questions that are not yet in focus. The system can grow with the requirements. The disadvantages of a larger tool stack, the associated costs, and the need for a broader employee skill set are outweighed by the advantages and the future-proofing.