The biggest reason behind people’s obsession with Tableau analytics is its amazing engine, the Tableau Hyper database (“Extract”). By adding multi-node capability, that is, splitting up query processing and distributing it across servers, Hyper can act like a traditional commercial MPP database that runs on modern infrastructure such as Kubernetes. Here, we will discuss how to build an MPP database with Hyper. If you are new to this technology, join a Tableau Bootcamp to implement these ideas in practice.
Fundamentals and Background
Tableau’s Hyper database was built from scratch with state-of-the-art features (LLVM code generation, a columnar data store, in-built capabilities, etc.), a PostgreSQL-compatible SQL dialect and the Postgres network protocol. It is a fast, neat and convenient database. With the new Extract API you can issue a full set of PostgreSQL statements, including statements that copy and move data in bulk.
Conveniently, libpq-based applications such as psql and the PostgreSQL ODBC driver (psqlODBC) can connect to Hyper with only minor adjustments and tap the core capabilities of the engine.
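Pointing a libpq-based client at Hyper mostly means giving it the right connection string. The sketch below builds a libpq keyword/value connection string, quoting values the way libpq expects (empty values or values with spaces go in single quotes; quotes and backslashes are backslash-escaped). The host, port, database and user values are hypothetical placeholders, and it assumes your Hyper instance is reachable over the Postgres wire protocol.

```python
"""Build a libpq keyword/value connection string for a Hyper endpoint.

Assumption: Hyper is reachable over the Postgres wire protocol; all concrete
host/port/dbname/user values used here are hypothetical.
"""


def libpq_quote(value: str) -> str:
    """Quote a value per libpq's keyword/value rules.

    Empty values, or values containing whitespace, single quotes or
    backslashes, are wrapped in single quotes; quotes and backslashes
    inside them are escaped with a backslash.
    """
    if value and not any(c.isspace() or c in "'\\" for c in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def hyper_conninfo(host: str, port: int, dbname: str, user: str) -> str:
    """Return a conninfo string that psql or other libpq clients accept."""
    params = {"host": host, "port": str(port), "dbname": dbname, "user": user}
    return " ".join(f"{key}={libpq_quote(val)}" for key, val in params.items())


if __name__ == "__main__":
    # Hypothetical values: replace them with your own Hyper endpoint.
    print(hyper_conninfo("hyper-worker-0", 7483, "webshop", "hyper_user"))
```

The resulting string can be passed straight to psql, for example `psql "host=hyper-worker-0 port=7483 dbname=webshop user=hyper_user"`, since psql treats a conninfo string in place of the database name as a full set of connection parameters.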
MPP (Shared-Nothing Use Case)
An MPP (Massively Parallel Processing) architecture typically consists of multiple worker nodes, each holding a partial dataset, plus an aggregator that combines the results from the processing nodes. Without this horizontal scalability, a database cannot use multiple servers to accelerate a single query.
Ideally, doubling the number of nodes doubles performance. Take the example of a webshop that stores all its transactions in a single extract file. To evaluate overall customer value, you first process the transactions of each customer and then view the output for all customers in your Tableau report. If all transactions for a given customer are located on the same node, the algorithm can work independently on separate servers, each handling its own customers’ transactions, so adding nodes adds performance.
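The webshop example can be sketched in a few lines. This is a simulation under stated assumptions, not Hyper code: the transactions are hypothetical, and each "node" is just a Python list. Rows are hash-partitioned by customer, so every customer’s transactions land on exactly one node; each node then computes complete per-customer totals on its own, and the aggregator only has to concatenate the results.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical webshop transactions: (customer_id, amount).
TRANSACTIONS = [
    ("alice", 120.0), ("bob", 35.5), ("alice", 80.0),
    ("carol", 210.0), ("bob", 14.5), ("dave", 99.0),
]


def node_for(customer_id: str, num_nodes: int) -> int:
    """Assign a customer to a worker node with a stable hash.

    crc32 is used instead of Python's hash(), which is randomized per
    process and would send the same customer to different nodes.
    """
    return crc32(customer_id.encode("utf-8")) % num_nodes


def partition(rows, num_nodes):
    """Scatter rows so that all transactions of a customer share one node."""
    nodes = [[] for _ in range(num_nodes)]
    for customer_id, amount in rows:
        nodes[node_for(customer_id, num_nodes)].append((customer_id, amount))
    return nodes


def customer_value(rows):
    """Work done independently on one node: total spend per customer."""
    totals = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


def run_mpp(rows, num_nodes):
    """Each node returns complete results for its own customers, so the
    aggregator concatenates them; no cross-node merging is needed."""
    result = {}
    for node_rows in partition(rows, num_nodes):
        result.update(customer_value(node_rows))  # keys never overlap
    return result


if __name__ == "__main__":
    print(run_mpp(TRANSACTIONS, num_nodes=3))
```

Because the partitioning key (customer) matches the grouping key, the per-node work is fully independent, which is what makes near-linear scaling possible in this use case.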
Converting Hyper Database to MPP Architecture
So how would you do this? The conversion comes down to three steps:
1. Build independent worker nodes on top of a generic Hyper database
- Create a Docker image of the Hyper database that can be accessed from different sources
- For flexibility, deploy it on Kubernetes as a Service
2. Build an aggregator that will act as the master node. PostgreSQL 11 has a database-link-like facility, foreign data wrappers, that forwards queries to other databases, and Hyper can be a target because it behaves like Postgres. So first deploy PostgreSQL 11 on Kubernetes and set up a foreign data wrapper that points at the Hyper workers. Then import the workers’ table metadata into the master node and keep it synchronized.
3. Finally, run the aggregation over the shared-nothing data; the results can then be validated easily.
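For step 1, one way to run the Hyper workers "as a Service" is sketched below. It pairs a headless Service with a StatefulSet (a choice made for this sketch, not something the steps prescribe) so each worker pod gets a stable DNS name such as `hyper-worker-0.hyper-worker` that a master node can target. The image name, port and labels are hypothetical, persistent storage is omitted, and the manifests are emitted as JSON, which kubectl accepts just like YAML.

```python
import json


def hyper_worker_manifests(replicas: int, image: str, port: int) -> dict:
    """Return a Kubernetes v1 List with a headless Service and a StatefulSet.

    All names (hyper-worker, the pg port name) are hypothetical.
    """
    labels = {"app": "hyper-worker"}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "hyper-worker", "labels": labels},
        "spec": {
            "clusterIP": "None",  # headless: one stable DNS record per pod
            "selector": labels,
            "ports": [{"name": "pg", "port": port, "targetPort": port}],
        },
    }
    statefulset = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "hyper-worker"},
        "spec": {
            "serviceName": "hyper-worker",  # ties pod DNS names to the Service
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "hyper",
                        "image": image,
                        "ports": [{"name": "pg", "containerPort": port}],
                    }]
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service, statefulset]}


if __name__ == "__main__":
    # Hypothetical image and port: substitute your own Hyper Docker image.
    manifest = hyper_worker_manifests(3, "registry.example.com/hyper-worker:latest", 7483)
    print(json.dumps(manifest, indent=2))
```

Saved as a script, its output can be applied with `python hyper_manifests.py | kubectl apply -f -` (the file name is hypothetical). Scaling the workers horizontally is then a matter of changing `replicas`.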
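For step 2, the sketch below generates the DDL a PostgreSQL 11 master could run to reach the workers through postgres_fdw (the standard foreign data wrapper for Postgres-protocol targets). It creates a hash-partitioned `transactions` table whose partitions are foreign tables, one per Hyper worker. The table layout, worker host names (`hyper-worker-N.hyper-worker`), port, database and user are hypothetical, and it assumes Hyper accepts the session and cursor commands postgres_fdw issues, which we have not verified. Add a `password` option to the user mapping if your setup requires one. Hash partitioning also assumes rows are routed through the master (or that the workers’ data matches PostgreSQL’s hash placement); otherwise pruning on `customer_id` could skip the wrong worker. IMPORT FOREIGN SCHEMA is the built-in way to pull table metadata from a remote server, if Hyper supports the catalog queries it relies on.

```python
def fdw_ddl(num_workers: int, port: int, dbname: str, user: str) -> str:
    """Return postgres_fdw DDL for a master node over num_workers Hyper workers.

    Hypothetical assumptions: each worker holds a table named 'transactions'
    in its default schema and is reachable as hyper-worker-<i>.hyper-worker.
    """
    stmts = [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        # Parent table on the master; the rows live on the workers.
        "CREATE TABLE transactions (customer_id text, amount numeric) "
        "PARTITION BY HASH (customer_id);",
    ]
    for i in range(num_workers):
        server = f"hyper_{i}"
        host = f"hyper-worker-{i}.hyper-worker"
        stmts.append(
            f"CREATE SERVER {server} FOREIGN DATA WRAPPER postgres_fdw "
            f"OPTIONS (host '{host}', port '{port}', dbname '{dbname}');"
        )
        stmts.append(
            f"CREATE USER MAPPING FOR CURRENT_USER SERVER {server} "
            f"OPTIONS (user '{user}');"
        )
        # Each hash bucket is a foreign table backed by one worker.
        stmts.append(
            f"CREATE FOREIGN TABLE transactions_p{i} PARTITION OF transactions "
            f"FOR VALUES WITH (MODULUS {num_workers}, REMAINDER {i}) "
            f"SERVER {server} OPTIONS (table_name 'transactions');"
        )
    return "\n".join(stmts)


if __name__ == "__main__":
    print(fdw_ddl(3, 7483, "webshop", "hyper_user"))
```

After this runs, a query such as `SELECT customer_id, sum(amount) FROM transactions GROUP BY customer_id;` on the master touches every worker. Setting `enable_partitionwise_aggregate = on` can let the planner aggregate per partition, which postgres_fdw may push down to the workers when the grouping includes the partition key. Note that PostgreSQL 11 scans foreign partitions one after another; concurrent scans across foreign servers arrived later with postgres_fdw’s `async_capable` option in PostgreSQL 14.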
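For step 3, when the grouping key does not match the partitioning key, a group’s rows are spread across workers and the aggregator must merge partial results. The sketch below simulates that two-phase aggregation for AVG with hypothetical shard data: each worker returns a (sum, count) pair per group, and the aggregator combines them before dividing. Averaging the per-shard averages would be wrong; for "east" it would give (8 + 2) / 2 = 5 instead of 6. The `validate` function shows the easy check: compare the distributed answer with the same aggregation over all rows on a single node.

```python
from collections import defaultdict
from math import isclose

# Hypothetical shards: each worker holds a disjoint slice of (region, amount).
SHARDS = [
    [("east", 10.0), ("west", 4.0), ("east", 6.0)],
    [("west", 7.0), ("north", 3.0)],
    [("east", 2.0), ("north", 9.0)],
]


def partial_agg(rows):
    """Phase 1, on each worker: per-group (sum, count) partial state."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, amount in rows:
        acc[key][0] += amount
        acc[key][1] += 1
    return {key: (s, c) for key, (s, c) in acc.items()}


def merge(partials):
    """Phase 2, on the aggregator: add up partial states, then finalize AVG."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}


def single_node_avg(rows):
    """Reference answer: AVG computed over all rows in one place."""
    groups = defaultdict(list)
    for key, amount in rows:
        groups[key].append(amount)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


def validate(shards):
    """True if the distributed result matches the single-node result."""
    distributed = merge(partial_agg(shard) for shard in shards)
    reference = single_node_avg([row for shard in shards for row in shard])
    return distributed.keys() == reference.keys() and all(
        isclose(distributed[key], reference[key]) for key in reference
    )


if __name__ == "__main__":
    print(merge(partial_agg(shard) for shard in SHARDS), validate(SHARDS))
```

SUM and COUNT merge by simple addition, which is why they serve as the partial state for AVG; other non-decomposable aggregates need a similar split into mergeable pieces.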
With these steps in place, you can build a distributed Hyper MPP database cluster in Kubernetes that supports horizontal and vertical scaling, data ingestion, and distribution of queries between servers. The one limitation is that Tableau must connect to it through a custom SQL data source on its Postgres connector, because the connector does not work with partitioned tables directly.
So, if you want your business to get more out of Tableau software, study and solve more practical use cases. Join ExistBI’s Tableau Bootcamp to gain the knowledge and hands-on experience your company needs.
For more information about ExistBI’s Tableau Training and Tableau Consulting services, call your nearest office: US/Canada: +020 8610 1823 | UK/Europe: +44 (0)207 554 8568 or complete ExistBI’s contact form.