Default Aggregation in DataModel
Overview
Raw data often needs cleaning, transformation, and aggregation before it is useful for analysis. DataModel provides built-in features for these operations. Aggregation is the key capability for summarizing grouped data.
Understanding Aggregations
Why Aggregations Matter
When you perform operations like groupBy, or create a stacked visualization using Muze, each group often contains multiple values for a single field. Aggregation functions determine how those values are combined into a single meaningful value.
For example, when grouping car data by country of origin, you might want to:
- Average the miles per gallon
- Sum the total sales
- Count the number of models
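To make the idea concrete, here is a minimal plain-JavaScript sketch of what grouping with aggregation does conceptually. It does not use DataModel. The `groupAndAggregate` helper, the per-field reducer map, and the sample rows (including the `Sales` field) are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper, not part of DataModel: groups rows by one key field,
// then collapses each listed field's values with the reducer given for it.
function groupAndAggregate(rows, keyField, reducers) {
  const groups = new Map(); // preserves first-seen order of keys
  for (const row of rows) {
    const key = row[keyField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const result = [];
  for (const [key, groupRows] of groups) {
    const out = { [keyField]: key };
    for (const [field, reduce] of Object.entries(reducers)) {
      out[field] = reduce(groupRows.map((r) => r[field]));
    }
    result.push(out);
  }
  return result;
}

const avg = (values) => values.reduce((a, b) => a + b, 0) / values.length;
const sum = (values) => values.reduce((a, b) => a + b, 0);
const count = (values) => values.length;

// Made-up sample rows, for illustration only.
const cars = [
  { Origin: "USA", Name: "car A", Miles_per_Gallon: 18, Sales: 100 },
  { Origin: "USA", Name: "car B", Miles_per_Gallon: 22, Sales: 50 },
  { Origin: "Japan", Name: "car C", Miles_per_Gallon: 30, Sales: 80 },
];

const summary = groupAndAggregate(cars, "Origin", {
  Miles_per_Gallon: avg, // average mileage per origin
  Sales: sum,            // total sales per origin
  Name: count,           // number of models per origin
});
console.log(summary);
// USA: Miles_per_Gallon 20, Sales 150, Name 2
// Japan: Miles_per_Gallon 30, Sales 80, Name 1
```

Each group ends up as a single row, and every field in it holds one value produced by that field's reducer.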
Default Behavior
DataModel handles measure aggregation in the following way:
- If no defAggFn is specified for a measure, it defaults to SUM
- Each measure can have its own aggregation function
- Aggregations are applied automatically during grouping operations
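The defaulting rule above can be pictured with a small sketch. This is plain JavaScript, not DataModel's internal code. Aggregation functions are represented by hypothetical string names (`"SUM"`, `"AVG"`) instead of `DataModel.AggregationFunctions` values, and `resolveAggFns` is a hypothetical helper.

```javascript
// Hypothetical helper illustrating the documented rule: each measure uses its
// own defAggFn when one is given, otherwise it falls back to SUM.
// Only measures are considered; dimension fields are skipped.
function resolveAggFns(schema) {
  const resolved = {};
  for (const field of schema) {
    if (field.type !== "measure") continue;
    resolved[field.name] = field.defAggFn ?? "SUM";
  }
  return resolved;
}

// Illustrative schema using string names in place of DataModel.AggregationFunctions.
const schema = [
  { name: "Origin", type: "dimension" },
  { name: "Miles_per_Gallon", type: "measure", defAggFn: "AVG" },
  { name: "Horsepower", type: "measure" }, // no defAggFn, so SUM applies
];

const aggFns = resolveAggFns(schema);
// aggFns is { Miles_per_Gallon: "AVG", Horsepower: "SUM" }
```

Because the lookup is done per field, changing one measure's defAggFn has no effect on how any other measure is aggregated.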
Specifying Default Aggregations
You can specify default aggregations in your schema using the defAggFn property:
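// Assumes DataModel is already in scope, e.g. const DataModel = muze.DataModel;
const schema = [
  {
    // Dimension used as the grouping key in the example below
    name: "Origin",
    type: "dimension",
  },
  {
    name: "Miles_per_Gallon",
    type: "measure",
    defAggFn: DataModel.AggregationFunctions.AVG,
  },
  {
    name: "Displacement",
    type: "measure",
    defAggFn: DataModel.AggregationFunctions.SUM,
  },
  {
    name: "Horsepower",
    type: "measure",
    defAggFn: DataModel.AggregationFunctions.SUM,
  },
];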
Example: Grouping with Default Aggregations
Here's how to use default aggregations when grouping data:
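const DataModel = muze.DataModel;

// `data` is the raw car dataset and `schema` is the schema shown above.
// Because of `await`, this must run inside an async function
// (or an ES module that supports top-level await).
const formattedData = await DataModel.loadData(data, schema);
const dm = new DataModel(formattedData);

// Group by Origin; each measure is combined using its defAggFn
const groupedDm = dm.groupBy(["Origin"]);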
The resulting grouped DataModel contains one row per origin. In this example:
- Miles per gallon is averaged for each origin
- Displacement and horsepower are summed
- Each row represents aggregated values for a group of cars from the same origin
Available Aggregation Functions
DataModel provides several aggregation functions through DataModel.AggregationFunctions:
- SUM: Calculate the total of values
- AVG: Calculate the arithmetic mean
- MIN: Find the minimum value
- MAX: Find the maximum value
- COUNT: Count the number of values
Assign one of these functions to a measure's defAggFn in the schema to control how that measure is combined during grouping operations.
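For reference, the arithmetic behind the five listed functions is sketched below in plain JavaScript. These hypothetical reducers only mirror the one-line descriptions above. They are not DataModel's implementations, and behavior this section does not describe (null values, empty groups) is deliberately left out.

```javascript
// Plain-JavaScript reducers mirroring the descriptions of SUM, AVG, MIN, MAX
// and COUNT. Hypothetical illustrations, not DataModel's own implementations;
// null handling and empty input are deliberately ignored.
const reducers = {
  SUM: (values) => values.reduce((total, v) => total + v, 0),
  AVG: (values) => values.reduce((total, v) => total + v, 0) / values.length,
  MIN: (values) => Math.min(...values),
  MAX: (values) => Math.max(...values),
  COUNT: (values) => values.length,
};

// One group's Horsepower values (made-up numbers).
const horsepower = [130, 165, 150, 140];

for (const [name, fn] of Object.entries(reducers)) {
  console.log(name, fn(horsepower));
}
// SUM 585, AVG 146.25, MIN 130, MAX 165, COUNT 4
```

The same four values yield five different summaries, which is why choosing the right defAggFn for each measure matters.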