If you are a data expert who works in data warehouse consulting and with the different schemas in data warehouses, you probably already know the importance of these terms. If you are a beginner, however, you may not yet know the basics of the subject. As a data expert, you need to understand these basic terms, what they mean, and what purpose they serve. In this article, you will find everything you need to know about schemas in a data warehouse. We will discuss their two most significant types, the Star schema and the Snowflake schema, along with the advantages and challenges of each.
What Are Schemas In A Data Warehouse?
A schema in a data warehouse is a logical description of the database. One schema is a complete collection of database objects such as tables, views, indexes, and synonyms. Data warehouse models arrange these schema objects in different ways.
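To make the idea of a schema as a collection of objects concrete, here is a minimal sketch using Python's built-in sqlite3 module, with an in-memory database standing in for one schema. SQLite has no synonyms, so only a table, an index, and a view appear; all object names are hypothetical.

```python
import sqlite3

# An in-memory database stands in for one schema (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE INDEX idx_customer_city ON customer (city);
    CREATE VIEW customer_cities AS SELECT DISTINCT city FROM customer;
""")


def list_schema_objects(connection):
    """Return (type, name) pairs for every user-defined object in the schema."""
    rows = connection.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
    )
    return rows.fetchall()


for obj_type, name in list_schema_objects(conn):
    print(f"{obj_type:6} {name}")
```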
Different kinds of schemas in data warehouses include the Galaxy schema, the Star schema, and the Snowflake schema. We will discuss two of them below, but if you want to know more about data warehouses, ExistBi has plenty of information on the subject: what a data warehouse is, why it is essential, its advantages and disadvantages, and everything else relevant.
What Is Star Schema?
As mentioned earlier, one of the two schemas we cover is the Star schema. It is the simplest data mart schema style, which is why it is one of the most widely used approaches for developing dimensional data marts and data warehouses.
A star schema has a central fact table, and every dimension table is connected to that fact table through a foreign key. The dimension tables are not related to one another. Other characteristics include broad support from BI tools, denormalized dimension tables, a design that is easy to understand, and comparatively high disk usage, since denormalized dimensions repeat data.
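The structure described above can be sketched in SQL. The example below is an illustration, not a reference design: it uses Python's sqlite3 module, and the table and column names (fact_sales, dim_product, and so on) are hypothetical. It then asks SQLite which tables each table references, confirming that only the fact table holds foreign keys and that the dimension tables do not reference one another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension tables: every attribute lives in one table.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT, category TEXT, brand TEXT
    );
    CREATE TABLE dim_location (
        location_id INTEGER PRIMARY KEY,
        branch_name TEXT, city TEXT, region_name TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        full_date TEXT, month TEXT, year INTEGER
    );
    -- Central fact table: one foreign key per dimension plus the measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product (product_id),
        location_id INTEGER REFERENCES dim_location (location_id),
        date_id INTEGER REFERENCES dim_date (date_id),
        sales_dollars REAL
    );
""")


def referenced_tables(connection, table):
    """Names of the tables that `table` points to through declared foreign keys."""
    rows = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return sorted(row[2] for row in rows)  # column 2 holds the parent table name


for table in ("fact_sales", "dim_product", "dim_location", "dim_date"):
    print(table, "->", referenced_tables(conn, table))
```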
Designing A Star Schema:
Creating a Star schema isn't difficult if you know what you're doing. Walking through how to build one also clarifies many related concepts, such as what a star schema is made of, how complex it is, and how you can get more out of it. Here is the process broken down into simple steps:
Step 1: Identify the business process to analyze, such as sales.
Step 2: Identify the facts and measures, such as sales dollars.
Step 3: Identify the dimensions for those facts, such as the organization, time, location, and product dimensions.
Step 4: List the columns that describe each dimension, such as region name and branch name. Lining up and organizing these dimension attributes is an important part of the job.
Step 5: Determine the lowest level of summary (the grain) in the fact table, for example, the level at which sales dollars are recorded.
And that’s how you create a Star schema on your own!
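The five steps can be captured as a small design description and turned into table definitions. This is a minimal sketch under stated assumptions: the design dictionary, the star_schema_ddl helper, and the _key naming convention are all hypothetical, and the generated SQL is checked against an in-memory SQLite database rather than a production warehouse.

```python
import sqlite3

# Hypothetical design captured from the five steps above.
design = {
    "process": "sales",                       # Step 1: business process
    "measures": ["sales_dollars"],            # Step 2: facts and measures
    "dimensions": {                           # Steps 3 and 4: dimensions and their columns
        "organization": ["org_name"],
        "time": ["day", "month", "year"],
        "location": ["branch_name", "region_name"],
        "product": ["product_name", "category"],
    },
    # Step 5: grain, the lowest level of summary stored in the fact table.
    "grain": "one row per organization, day, branch, and product",
}


def star_schema_ddl(design):
    """Return CREATE TABLE statements for one fact table and its dimensions."""
    statements = []
    for dim, attributes in design["dimensions"].items():
        columns = [f"{dim}_key INTEGER PRIMARY KEY"] + [f"{a} TEXT" for a in attributes]
        statements.append(f"CREATE TABLE dim_{dim} ({', '.join(columns)});")
    # The fact table gets one foreign key per dimension plus the measures.
    keys = [f"{dim}_key INTEGER REFERENCES dim_{dim} ({dim}_key)" for dim in design["dimensions"]]
    measures = [f"{m} REAL" for m in design["measures"]]
    statements.append(f"CREATE TABLE fact_{design['process']} ({', '.join(keys + measures)});")
    return statements


conn = sqlite3.connect(":memory:")
conn.executescript("\n".join(star_schema_ddl(design)))
for statement in star_schema_ddl(design):
    print(statement)
```

Generating the DDL from the design keeps the fact table's foreign keys in step with the list of dimensions; a real project would also need proper data types, indexes, and surrogate-key handling, which this sketch leaves out.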
The star schema is widely used because it has several benefits over other types of schemas, including the following:
- Star schema queries are relatively fast, because they need fewer joins.
- They deliver high, efficient performance for read-only (query and reporting) workloads.
- Queries are simpler to write and manage, because each dimension is represented by a single, wide table.
- The star schema supplies data to Online Analytical Processing (OLAP) systems.
- It also simplifies business reporting, such as period-over-period reports.
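To illustrate the reporting benefit, the sketch below runs a year-over-year (period-over-period) report on a tiny star schema. The data and names are made up for the example, and SQLite stands in for a warehouse engine; the point is that each dimension is reached through a single join from the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date (date_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        sales_dollars REAL
    );
    INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2), (3, 2024, 1), (4, 2024, 2);
    INSERT INTO dim_product VALUES (10, 'Books'), (11, 'Games');
    INSERT INTO fact_sales VALUES
        (1, 10, 100.0), (2, 10, 150.0), (3, 10, 120.0), (4, 10, 180.0),
        (1, 11, 50.0), (3, 11, 80.0);
""")

# Year-over-year sales by category: one join from the fact table to each dimension.
YOY_QUERY = """
    SELECT p.category,
           SUM(CASE WHEN d.year = 2023 THEN f.sales_dollars ELSE 0 END) AS sales_2023,
           SUM(CASE WHEN d.year = 2024 THEN f.sales_dollars ELSE 0 END) AS sales_2024
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_id = d.date_id
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
"""

report = conn.execute(YOY_QUERY).fetchall()
for category, before, after in report:
    print(f"{category}: {before:.0f} -> {after:.0f} ({(after - before) / before:+.0%})")
```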
What Is Snowflake Schema?
This is the other significant type of schema in a data warehouse. A snowflake schema is a logical arrangement of tables in a multidimensional database whose diagram resembles a snowflake, hence the name. It is an extension of the Star schema, so the two are quite similar, but in a snowflake schema the dimension tables are normalized: each dimension's data is split into additional, separate tables.
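As a sketch of the normalization described above (hypothetical names, SQLite as a stand-in), the product dimension below is split into product and category tables, and the location dimension into branch and region tables. Answering "sales by region and category" then needs a join for every level of each hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product dimension is normalized into product -> category.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES dim_category (category_id)
    );
    -- The location dimension is normalized into branch -> region.
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE dim_branch (
        branch_id INTEGER PRIMARY KEY,
        branch_name TEXT,
        region_id INTEGER REFERENCES dim_region (region_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product (product_id),
        branch_id INTEGER REFERENCES dim_branch (branch_id),
        sales_dollars REAL
    );
    INSERT INTO dim_category VALUES (1, 'Books');
    INSERT INTO dim_product VALUES (100, 'SQL Primer', 1), (101, 'Data Modeling', 1);
    INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
    INSERT INTO dim_branch VALUES (10, 'Downtown', 1), (11, 'Harbor', 2);
    INSERT INTO fact_sales VALUES (100, 10, 30.0), (101, 10, 45.0), (100, 11, 20.0);
""")

# Sales by region and category: each hierarchy level costs an extra join.
QUERY = """
    SELECT r.region_name, c.category_name, SUM(f.sales_dollars)
    FROM fact_sales AS f
    JOIN dim_branch AS b ON f.branch_id = b.branch_id
    JOIN dim_region AS r ON b.region_id = r.region_id
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_category AS c ON p.category_id = c.category_id
    GROUP BY r.region_name, c.category_name
    ORDER BY r.region_name
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```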
A snowflake schema has its own characteristics. For example, it requires more maintenance and effort because of its additional lookup tables. Queries also have to join more tables, so performance is somewhat reduced. Because it takes more time and effort than the Star schema, many people find it intimidating. However, once you understand how it is composed and how to build it, you may well come to like it!
Designing a Snowflake Schema:
The parameters often listed for creating a Snowflake schema actually belong to the CREATE SCHEMA command of Snowflake, the cloud data platform, which shares its name with the snowflake schema model. That command accepts the following parameters:
- Name: you must create a unique name for your schema.
- Transient: creates a transient schema. Unlike a temporary object, a transient schema is not deleted when your session ends; it persists until it is dropped, but it has no Fail-safe period, so its data is not protected by Fail-safe.
- Clone: creates an identical copy of an existing schema. You simply enter the name of the source schema.
- At|Before: when cloning, specifies the point in time (using Snowflake's Time Travel) from which the existing schema is copied, for example a timestamp or an offset from the current time.
- With Managed Access: creates a managed access schema, in which the schema owner (or a role with the MANAGE GRANTS privilege), rather than each object's owner, controls privilege grants on the objects in the schema. This lets you centralize your access controls.
- Data Retention: DATA_RETENTION_TIME_IN_DAYS sets the number of days for which historical data in the schema is retained for Time Travel. The default is 1, and you can change it; the allowed range depends on your Snowflake edition and on whether the schema is transient.
- Comments: the comments provide a minimalistic description of your schema that you just created.
That is how you create a schema with these parameters in the Snowflake platform. The snowflake schema model itself, by contrast, is defined by how you normalize your dimension tables.
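Because the parameters above belong to Snowflake's CREATE SCHEMA command, a small helper that assembles that statement shows how they fit together. This is a hypothetical Python sketch that only builds SQL text and never connects to Snowflake. It covers just the options listed above (omitting others such as OR REPLACE and IF NOT EXISTS), uses an OFFSET in seconds for the AT clause, and does not validate or quote identifiers, so check its output against the official CREATE SCHEMA documentation before running it.

```python
def create_schema_sql(name, *, transient=False, clone_of=None, offset_seconds=None,
                      managed_access=False, retention_days=None, comment=None):
    """Assemble a Snowflake CREATE SCHEMA statement (hypothetical helper, text only)."""
    parts = ["CREATE", "TRANSIENT" if transient else None, "SCHEMA", name]
    if clone_of:
        parts += ["CLONE", clone_of]
        if offset_seconds is not None:
            # Time Travel: clone the source schema as it was N seconds ago.
            parts.append(f"AT(OFFSET => -{int(offset_seconds)})")
    elif offset_seconds is not None:
        raise ValueError("AT/BEFORE only applies when cloning")
    if managed_access:
        parts.append("WITH MANAGED ACCESS")
    if retention_days is not None:
        parts.append(f"DATA_RETENTION_TIME_IN_DAYS = {int(retention_days)}")
    if comment is not None:
        escaped = comment.replace("'", "''")  # double single quotes inside the literal
        parts.append(f"COMMENT = '{escaped}'")
    return " ".join(p for p in parts if p) + ";"


staging_sql = create_schema_sql("sales_staging", transient=True, retention_days=0,
                                comment="Scratch area")
backup_sql = create_schema_sql("sales_backup", clone_of="sales", offset_seconds=3600,
                               managed_access=True)
print(staging_sql)
print(backup_sql)
```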
Despite the challenges discussed above, the Snowflake schema has some significant advantages:
- A Snowflake schema typically occupies less disk space than a Star schema, because normalized dimensions store each value only once. Less disk space means more convenience and less hassle.
- A Snowflake schema offers some protection from data integrity issues, since each fact about a dimension is stored in one place. Many people prefer the Snowflake schema because of this added safety.
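The integrity advantage comes from storing each attribute value once. The sketch below (hypothetical tables, SQLite as a stand-in) stores a category name in a denormalized, star-style product table and in a normalized, snowflake-style pair of tables, counts the stored copies, and then shows how a partial rename leaves the star-style table inconsistent while the snowflake-style rename is a single-row change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star-style (denormalized): the category name is repeated on every product row.
    CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
    INSERT INTO star_product VALUES
        (1, 'SQL Primer', 'Books'), (2, 'Data Modeling', 'Books'), (3, 'ETL Guide', 'Books');

    -- Snowflake-style (normalized): the category name is stored once.
    CREATE TABLE snow_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE snow_product (product_id INTEGER PRIMARY KEY, product_name TEXT,
                               category_id INTEGER REFERENCES snow_category (category_id));
    INSERT INTO snow_category VALUES (1, 'Books');
    INSERT INTO snow_product VALUES (1, 'SQL Primer', 1), (2, 'Data Modeling', 1), (3, 'ETL Guide', 1);
""")

# How many copies of the category string each design stores.
star_copies = conn.execute("SELECT COUNT(category_name) FROM star_product").fetchone()[0]
snow_copies = conn.execute("SELECT COUNT(category_name) FROM snow_category").fetchone()[0]

# A rename that touches only one row leaves the star-style dimension inconsistent.
conn.execute("UPDATE star_product SET category_name = 'Print Books' WHERE product_id = 1")
star_variants = conn.execute(
    "SELECT COUNT(DISTINCT category_name) FROM star_product").fetchone()[0]

# In the snowflake-style design, one single-row update is seen by every product.
conn.execute("UPDATE snow_category SET category_name = 'Print Books' WHERE category_id = 1")
snow_variants = conn.execute(
    "SELECT COUNT(DISTINCT c.category_name) FROM snow_product AS p "
    "JOIN snow_category AS c ON p.category_id = c.category_id"
).fetchone()[0]

print(star_copies, snow_copies, star_variants, snow_variants)
```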
Which Schema Is Best For A Data Warehouse?
Because both designs have their perks and drawbacks, experts choose between the Snowflake and Star schemas according to their needs and preferences. Snowflake schemas generally take up less space, which is always convenient. The Star schema, however, is faster and has a more straightforward design. So, depending on your priorities and needs, choose the one that fits you best.
That said, IT teams around the world generally prefer the Star schema over the snowflake schema, for several reasons. One is that a star schema consists of one or more fact tables surrounded by dimension tables, which is much more straightforward than the other schema. Since the Star schema does not compromise a team's speed and efficiency, experts around the world use it widely, as mentioned at the beginning.
Examples of Dimensional Schemas
Apart from the Star and Snowflake schemas, there is another type of schema as well: the Galaxy schema, or Fact Constellation schema.
This is another extension of the star schema: a collection of multiple stars. A fact constellation is used for online analytical processing. It contains several fact tables that share dimension tables, and its dimensions are split into several independent dimensions according to their hierarchy levels. It is often called a Galaxy schema, although some argue that the two are different designs; you will find a lot of mixed information and opinions on the web.
For example, suppose geography has five hierarchy levels: city, state, country, region, and territory. In that case, a fact constellation schema would have five dimensions instead of one. You can also generate a Galaxy schema by splitting one star schema into multiple star schemas. Galaxy schemas are generally larger, and they are helpful for aggregating fact tables and getting a better understanding of the data.
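A minimal galaxy schema can be sketched with two fact tables that share the same date and product dimensions. The names and figures are hypothetical, and SQLite stands in for the warehouse. The query aggregates each fact table on its own before combining them on the shared date dimension; joining the two fact tables directly would repeat each shipping row once per matching sales row and inflate the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared dimensions used by both fact tables.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    -- Two fact tables: two "stars" in one galaxy.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date (date_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        sales_dollars REAL
    );
    CREATE TABLE fact_shipping (
        date_id INTEGER REFERENCES dim_date (date_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        shipping_cost REAL
    );
    INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
    INSERT INTO dim_product VALUES (1, 'Widget');
    INSERT INTO fact_sales VALUES (1, 1, 500.0), (1, 1, 250.0), (2, 1, 400.0);
    INSERT INTO fact_shipping VALUES (1, 1, 40.0), (2, 1, 35.0);
""")

# Drill across: aggregate each fact table separately, then combine on the shared dimension.
QUERY = """
    WITH s AS (SELECT date_id, SUM(sales_dollars) AS sales FROM fact_sales GROUP BY date_id),
         h AS (SELECT date_id, SUM(shipping_cost) AS shipping FROM fact_shipping GROUP BY date_id)
    SELECT d.month, s.sales, h.shipping
    FROM dim_date AS d
    JOIN s ON s.date_id = d.date_id
    JOIN h ON h.date_id = d.date_id
    ORDER BY d.month
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```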
Is Snowflake OLAP or OLTP?
Before discussing the answer to this question, let’s first discuss the terms OLTP and OLAP and what they stand for.
These are two different kinds of systems. OLTP stands for online transaction processing: it captures, stores, and processes data from transactions in real time. OLAP, online analytical processing, analyzes aggregated historical data collected from OLTP systems through complex queries.
Now, let us apply this to the question. Snowflake, the cloud data platform, is designed for OLAP workloads. One of its most highlighted features is that it separates compute (processing) from storage, an architecture aimed at analytical queries over large volumes of data. The snowflake schema, likewise, is a data-modeling pattern for OLAP-style analysis.
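The difference between the two workloads can be sketched in a few lines. Here one SQLite table plays both roles purely for illustration, whereas in practice the OLTP system and the analytical system are usually separate: place_order is a short OLTP-style transaction, and monthly_revenue is an OLAP-style aggregate over the accumulated history. Both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")


def place_order(connection, order_date, amount):
    """OLTP-style work: a short transaction that records one business event."""
    with connection:  # commits on success, rolls back if an exception is raised
        connection.execute(
            "INSERT INTO orders (order_date, amount) VALUES (?, ?)", (order_date, amount)
        )


def monthly_revenue(connection):
    """OLAP-style work: an aggregate query that scans the accumulated history."""
    return connection.execute(
        "SELECT substr(order_date, 1, 7) AS month, SUM(amount) FROM orders "
        "GROUP BY month ORDER BY month"
    ).fetchall()


for date, amount in [("2024-01-05", 20.0), ("2024-01-19", 35.0), ("2024-02-02", 15.0)]:
    place_order(conn, date, amount)

print(monthly_revenue(conn))
```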
What Are The Major Differences Between The Star And Snowflake Schemas?
Different schemas in data warehouses build on one another, and they have a lot in common. However, they differ significantly in several respects. For example, even though the Snowflake schema is an extension of the Star schema, some characteristics differ sharply between the two. These differences are discussed below:
- The star schema offers higher-performing queries through star join query optimization, and its tables may be connected to multiple dimensions. In the Snowflake schema, the fact table is still central, but it reaches the outer levels of each dimension only indirectly, through chains of normalized tables.
- Cube processing is much faster in a Star schema than in a Snowflake schema. As mentioned earlier, this is because a Snowflake schema is more complex and requires more time and effort.
- Thanks to this reduced time and effort, productivity and efficiency are higher with star schemas than with Snowflake schemas: the processes are simpler, and results come back faster.
- The star schema has higher data redundancy, whereas a Snowflake schema has low data redundancy.
- In a Star schema, each dimension is held in a single table that combines all of its attributes, while a snowflake schema splits that data across several dimension tables.
- Star schemas have a denormalized data structure, which is one reason their queries run faster. A Snowflake schema, in contrast, has a normalized data structure.
- A Star schema has a simpler, more straightforward database design, while a snowflake schema has a more complex one.
- A star schema needs only one join between the fact table and each dimension table. A Snowflake schema needs multiple joins to collect the same data.
- A star schema contains fact tables and dimension tables, whereas a snowflake schema contains fact tables, dimension tables, and sub-dimension tables.
- Hierarchies in a star schema are stored in a single dimension table, while the hierarchies in a snowflake schema are divided into multiple tables.
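Several of these differences (normalization, join count, where hierarchies live) show up when the same question is asked of both designs. In this hypothetical sketch, one fact table is shared by a star-style branch dimension and a snowflake-style branch and region pair, with SQLite as a stand-in; both queries return the same totals, but the snowflake version needs an extra join. Foreign keys are omitted from the fact table because it serves both designs here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star: the region level of the hierarchy lives inside one dimension table.
    CREATE TABLE star_branch (branch_id INTEGER PRIMARY KEY, branch_name TEXT, region_name TEXT);
    -- Snowflake: the same hierarchy is split into two tables.
    CREATE TABLE snow_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE snow_branch (branch_id INTEGER PRIMARY KEY, branch_name TEXT,
                              region_id INTEGER REFERENCES snow_region (region_id));
    CREATE TABLE fact_sales (branch_id INTEGER, sales_dollars REAL);

    INSERT INTO star_branch VALUES (1, 'Downtown', 'North'), (2, 'Harbor', 'South');
    INSERT INTO snow_region VALUES (1, 'North'), (2, 'South');
    INSERT INTO snow_branch VALUES (1, 'Downtown', 1), (2, 'Harbor', 2);
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")

STAR_QUERY = """
    SELECT b.region_name, SUM(f.sales_dollars) FROM fact_sales AS f
    JOIN star_branch AS b ON f.branch_id = b.branch_id
    GROUP BY b.region_name ORDER BY b.region_name
"""
SNOWFLAKE_QUERY = """
    SELECT r.region_name, SUM(f.sales_dollars) FROM fact_sales AS f
    JOIN snow_branch AS b ON f.branch_id = b.branch_id
    JOIN snow_region AS r ON b.region_id = r.region_id
    GROUP BY r.region_name ORDER BY r.region_name
"""

for label, query in [("star", STAR_QUERY), ("snowflake", SNOWFLAKE_QUERY)]:
    joins = query.upper().count("JOIN ")  # number of joins the design requires
    print(label, joins, conn.execute(query).fetchall())
```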
Which Schema Is Faster, Star Or Snowflake?
As discussed earlier, Star schemas are popular for their speed and efficiency. Because their dimension and fact tables are simpler, they result in faster, more straightforward SQL queries. For this reason, IT teams and specialists around the world prefer the Star schema, since it speeds up their work. Snowflake schemas, on the other hand, use less space than a Star schema, but they are more complex, so queries against them take more effort and generally run more slowly.
Schemas in data warehouses serve different purposes, and understanding them is essential for professionals. Knowing which schema works best in a given scenario helps you choose the right design and maximize efficiency. For a data warehouse expert, this knowledge is essential.
If you lack the necessary expertise in data warehousing, check out ExistBi first and read its articles on data warehouses. Once you understand the basics of a data warehouse and how it works, you can come back and learn more about schemas. If you would like consultations and professional guidance, ExistBi offers those as well.