Version: 1.0.0

Default Aggregation in DataModel

Overview

Raw data often requires cleaning, transformation, and aggregation to be useful for analysis. DataModel provides built-in features for these operations, with aggregation being a key capability for summarizing grouped data.

Understanding Aggregations

Why Aggregations Matter

When you perform an operation such as groupBy, or create a stacked visualization with Muze, a single field often has multiple values within each group. Aggregation functions determine how those values are combined into a single meaningful value.

For example, when grouping car data by country of origin, you might want to:

  • Average the miles per gallon
  • Sum the total sales
  • Count the number of models
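To make these three cases concrete, here is a plain JavaScript sketch with no Muze dependency. It groups a small, made-up list of cars by origin and combines each field differently: averaging mpg, summing sales, and counting models. The field names `origin`, `mpg` and `sales`, the function `summarizeByOrigin`, and the sample rows are all hypothetical; the sketch illustrates the idea only and is not how DataModel computes aggregations internally.

```javascript
// Hypothetical sample rows; values are invented for illustration only.
const cars = [
  { model: "A", origin: "USA", mpg: 18, sales: 100 },
  { model: "B", origin: "USA", mpg: 22, sales: 150 },
  { model: "C", origin: "Japan", mpg: 30, sales: 200 },
];

// Group rows by origin, then reduce each group to one value per field.
function summarizeByOrigin(rows) {
  const groups = new Map(); // Map keeps groups in first-seen order
  for (const row of rows) {
    if (!groups.has(row.origin)) groups.set(row.origin, []);
    groups.get(row.origin).push(row);
  }
  const result = [];
  for (const [origin, members] of groups) {
    const totalMpg = members.reduce((acc, r) => acc + r.mpg, 0);
    result.push({
      origin,
      avgMpg: totalMpg / members.length, // average
      totalSales: members.reduce((acc, r) => acc + r.sales, 0), // sum
      modelCount: members.length, // count
    });
  }
  return result;
}

console.log(summarizeByOrigin(cars));
```

With these rows, the USA group has an average mpg of 20, total sales of 250 and 2 models, while the Japan group has 30, 200 and 1.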

Default Behavior

DataModel handles measure aggregation in the following way:

  • If no defAggFn is specified for a measure, it defaults to SUM
  • Each measure can have its own aggregation function
  • Aggregations are applied automatically during grouping operations
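The fallback rule above can be sketched in plain JavaScript. The `reducers` object, `resolveAggFn`, `groupByWithDefaults`, and the string keys `"sum"`/`"avg"` are hypothetical stand-ins, not DataModel's API. The sketch only shows the behavior described: a measure without `defAggFn` is summed, a measure with one uses its own function, and both are applied while grouping.

```javascript
// Hypothetical reducers keyed by name; they mirror the idea of
// DataModel.AggregationFunctions but are NOT the library's implementation.
const reducers = {
  sum: (values) => values.reduce((acc, v) => acc + v, 0),
  avg: (values) => values.reduce((acc, v) => acc + v, 0) / values.length,
};

// Pick the aggregation for one measure: its own defAggFn if set, else "sum".
function resolveAggFn(field) {
  return field.defAggFn ?? "sum";
}

// Group rows by one dimension and aggregate every measure with its resolved function.
function groupByWithDefaults(rows, schema, dimension) {
  const measures = schema.filter((f) => f.type === "measure");
  const groups = new Map();
  for (const row of rows) {
    const key = row[dimension];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return [...groups].map(([key, members]) => {
    const out = { [dimension]: key };
    for (const m of measures) {
      const values = members.map((r) => r[m.name]);
      out[m.name] = reducers[resolveAggFn(m)](values);
    }
    return out;
  });
}

// Hypothetical schema: Miles_per_Gallon opts into "avg"; Horsepower has no defAggFn.
const schema = [
  { name: "Origin", type: "dimension" },
  { name: "Miles_per_Gallon", type: "measure", defAggFn: "avg" },
  { name: "Horsepower", type: "measure" },
];

// Hypothetical rows; values are invented for illustration only.
const rows = [
  { Origin: "USA", Miles_per_Gallon: 18, Horsepower: 130 },
  { Origin: "USA", Miles_per_Gallon: 22, Horsepower: 110 },
  { Origin: "Europe", Miles_per_Gallon: 26, Horsepower: 90 },
];

console.log(groupByWithDefaults(rows, schema, "Origin"));
```

Here the USA row ends up with Miles_per_Gallon averaged to 20 and Horsepower summed to 240, because Horsepower falls back to summation.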

Specifying Default Aggregations

You can specify default aggregations in your schema using the defAggFn property:

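// Assumes DataModel = muze.DataModel is already in scope
const schema = [
  {
    name: "Origin",
    type: "dimension", // dimension to group by
  },
  {
    name: "Miles_per_Gallon",
    type: "measure",
    defAggFn: DataModel.AggregationFunctions.AVG, // average per group
  },
  {
    name: "Displacement",
    type: "measure",
    defAggFn: DataModel.AggregationFunctions.SUM, // total per group
  },
  {
    name: "Horsepower",
    type: "measure",
    defAggFn: DataModel.AggregationFunctions.SUM, // total per group
  },
];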

Example: Grouping with Default Aggregations

Here's how to use default aggregations when grouping data:

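const DataModel = muze.DataModel;

// Format the raw data against the schema and create a DataModel instance.
// `await` needs an async function or an ES module (top-level await);
// `data` is your raw dataset and `schema` is the schema array shown above.
const formattedData = await DataModel.loadData(data, schema);
const dm = new DataModel(formattedData);

// Group by Origin; each measure is aggregated with its defAggFn (SUM if none)
const groupedDm = dm.groupBy(["Origin"]);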

The resulting grouped data would look like this:

[Figure: Aggregated fields]

In this example:

  • Miles per gallon is averaged for each origin
  • Displacement and horsepower are summed
  • Each row represents aggregated values for a group of cars from the same origin

Available Aggregation Functions

DataModel provides aggregation functions through DataModel.AggregationFunctions, including:

  • SUM: Calculate the total of values
  • AVG: Calculate the arithmetic mean
  • MIN: Find the minimum value
  • MAX: Find the maximum value
  • COUNT: Count the number of values

These functions can be specified in the schema to control how measures are combined during grouping operations.
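As a quick reference for what each of these computes, the following plain JavaScript sketch applies equivalent reducers to one hypothetical group of values. The `aggregations` object is illustrative only and is not `DataModel.AggregationFunctions`; in particular, it makes no attempt to match how DataModel handles missing values or empty groups (for example, this `AVG` returns `NaN` for an empty array).

```javascript
// Plain-JavaScript equivalents of the five listed aggregations (illustrative only).
const aggregations = {
  SUM: (values) => values.reduce((acc, v) => acc + v, 0),
  AVG: (values) => values.reduce((acc, v) => acc + v, 0) / values.length,
  MIN: (values) => Math.min(...values),
  MAX: (values) => Math.max(...values),
  COUNT: (values) => values.length,
};

// Apply every aggregation to the same hypothetical group of values.
const group = [18, 22, 26, 14];
for (const [name, fn] of Object.entries(aggregations)) {
  console.log(name, fn(group)); // SUM 80, AVG 20, MIN 14, MAX 26, COUNT 4
}
```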