Basic Concepts and Terminology in DataModel
Core Ideas
Measures
Measures are quantitative variables: numerical values to which mathematical functions such as sum or average can be applied. They represent the metrics you want to analyze, such as sales amounts, counts, or calculated values, and are typically aggregated across groups of dimension values.
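As a rough illustration of "mathematical functions applied to a measure", the sketch below computes a few common aggregations over a single column of numbers. It is plain JavaScript rather than the DataModel API, and the `horsepower` values are illustrative.

```javascript
// Illustrative values of a single measure, e.g. engine horsepower.
const horsepower = [130, 165, 88];

// Common aggregations that can be applied to a measure.
const sum = horsepower.reduce((acc, value) => acc + value, 0);
const avg = sum / horsepower.length;
const min = Math.min(...horsepower);
const max = Math.max(...horsepower);

console.log({ sum, avg, min, max });
```

The same operations would not be meaningful on a qualitative field such as a manufacturer name, which is what separates measures from dimensions.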
Dimensions
Dimensions are qualitative variables that help categorize data points. They provide the context for your measures and can be divided into two main types:
Type | Description | Examples |
---|---|---|
Categorical | Represents types of data that can be divided into groups or categories | Race, sex, age group, educational level |
Temporal | Represents date and time values | Dates, timestamps |
What is a Schema?
A schema describes the variables (fields) present in your data. Each schema definition is a set of key-value pairs that tells DataModel the type of data in a field and lets you override that field's default behavior.
Schema Attributes
Attribute | Description |
---|---|
name | Describes the field name |
type | Specifies whether the field is a dimension or measure |
subtype | For dimensions, specifies if it is categorical or temporal |
defAggFn | For measures, specifies the default aggregation function to apply; defaults to sum |
format | For temporal fields, describes the date format string to parse the raw data |
displayName | Specifies the field name to be shown when displaying the field |
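To show how optional attributes might be resolved, here is a minimal sketch that fills in defaults for a single schema entry. The `withDefaults` helper is hypothetical and not part of DataModel. Only the sum default for `defAggFn` comes from the table above; falling back to `name` for `displayName` is an assumption made for this example.

```javascript
// Hypothetical helper: returns a copy of a schema entry with defaults applied.
function withDefaults(field) {
  const result = { ...field };
  // From the attribute table: measures aggregate with "sum" by default.
  if (result.type === "measure" && result.defAggFn === undefined) {
    result.defAggFn = "sum";
  }
  // Assumption for this sketch: show the raw field name when no
  // displayName is given.
  if (result.displayName === undefined) {
    result.displayName = result.name;
  }
  return result;
}

const measure = withDefaults({ name: "Horsepower", type: "measure" });
console.log(measure.defAggFn); // sum

const dimension = withDefaults({ name: "Origin", type: "dimension" });
console.log(dimension.defAggFn); // undefined
```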
Example
Sample Data
[
  {
    "Maker": "chevrolet",
    "Horsepower": 130,
    "Origin": "USA",
    "Year": 1978
  },
  {
    "Maker": "buick",
    "Horsepower": 165,
    "Origin": "USA",
    "Year": 1989
  },
  {
    "Maker": "datsun",
    "Horsepower": 88,
    "Origin": "Japan",
    "Year": 1981
  }
]
Corresponding Schema
[
  {
    "name": "Maker",
    "type": "dimension",
    "subtype": "categorical",
    "displayName": "Car Manufacturer"
  },
  {
    "name": "Horsepower",
    "type": "measure",
    "defAggFn": "avg",
    "displayName": "Engine Horsepower"
  },
  {
    "name": "Origin",
    "type": "dimension",
    "subtype": "categorical",
    "displayName": "Country of Origin"
  },
  {
    "name": "Year",
    "type": "dimension",
    "subtype": "temporal",
    "format": "%Y",
    "displayName": "Manufacturing Year"
  }
]
The schema example above demonstrates how to properly define each field from the sample data:
- Categorical dimensions: Maker and Origin
- Temporal dimension: Year (with format for four-digit year)
- Measure: Horsepower (with avg as its default aggregation function)
Every field declares its type and a user-friendly display name, and each dimension adds a subtype. Horsepower sets defAggFn to avg, overriding the sum default, and Year supplies the %Y format so that its values are parsed as four-digit years.
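To see how the schema could drive an aggregation, the sketch below groups the sample data by a dimension and applies each measure's `defAggFn` within every group. This is plain JavaScript and not DataModel's own grouping API. The `aggregators` table and the `aggregateBy` function are hypothetical names introduced for illustration, and the schema is trimmed to the two fields used here.

```javascript
const data = [
  { Maker: "chevrolet", Horsepower: 130, Origin: "USA", Year: 1978 },
  { Maker: "buick", Horsepower: 165, Origin: "USA", Year: 1989 },
  { Maker: "datsun", Horsepower: 88, Origin: "Japan", Year: 1981 },
];

const schema = [
  { name: "Horsepower", type: "measure", defAggFn: "avg" },
  { name: "Origin", type: "dimension", subtype: "categorical" },
];

// Hypothetical lookup from a defAggFn name to an implementation.
const aggregators = {
  sum: (values) => values.reduce((a, b) => a + b, 0),
  avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
};

// Group rows by one dimension, then aggregate every measure in the schema
// with its defAggFn (falling back to "sum" when none is given).
function aggregateBy(rows, fields, dimensionName) {
  const measures = fields.filter((field) => field.type === "measure");
  const groups = new Map();
  for (const row of rows) {
    const key = row[dimensionName];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const result = [];
  for (const [key, members] of groups) {
    const out = { [dimensionName]: key };
    for (const measure of measures) {
      const aggregate = aggregators[measure.defAggFn || "sum"];
      out[measure.name] = aggregate(members.map((r) => r[measure.name]));
    }
    result.push(out);
  }
  return result;
}

const byOrigin = aggregateBy(data, schema, "Origin");
console.log(byOrigin); // USA averages to 147.5, Japan to 88
```

Because Horsepower declares avg, the two USA rows collapse to their mean rather than their total. Removing `defAggFn` from the schema entry would make this sketch sum them instead.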