Data Wrangling with DataModel
DataModel provides a set of operators for transforming your data. These operators behave as pure functions: each one returns a new DataModel instance and leaves the instance it was called on unchanged, so all transformations are immutable.
Understanding Operators
Operators in DataModel fall into two categories:
- Relational algebra operators (selection, projection, etc.)
- Utility operators for specific cases
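Because every operator returns a new DataModel instance, the original data stays intact and operators can be combined through method chaining. This makes it straightforward to derive several views from the same source data, which is useful when building visualizations and interactive applications.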
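The immutability and chaining described above can be illustrated without DataModel itself. The following is a minimal sketch, not DataModel's implementation: ImmutableTable is a hypothetical class whose rows are assumed to be plain objects, and each of its methods returns a new instance instead of modifying the current one.

```javascript
// Hypothetical illustration of the immutable-operator pattern; this is NOT
// DataModel's implementation. Rows are assumed to be plain objects.
class ImmutableTable {
  constructor(rows) {
    // Copy, then freeze, the row array so in-place edits such as push() throw.
    this.rows = Object.freeze([...rows]);
  }

  // Every operator builds and returns a brand-new instance.
  select(predicate) {
    return new ImmutableTable(this.rows.filter(predicate));
  }

  project(fields) {
    return new ImmutableTable(
      this.rows.map((row) => Object.fromEntries(fields.map((f) => [f, row[f]])))
    );
  }
}

// Two illustrative rows.
const source = new ImmutableTable([
  { Name: "datsun pl510", Origin: "Japan", Horsepower: 88 },
  { Name: "ford torino", Origin: "USA", Horsepower: 140 },
]);

// Chaining works because each call returns a new ImmutableTable.
const japanNames = source
  .select((row) => row.Origin === "Japan")
  .project(["Name"]);

console.log(japanNames.rows); // [ { Name: 'datsun pl510' } ]
console.log(source.rows.length); // 2: the source table is unchanged
```

Because source is never modified, other views (for example, only USA cars) can be derived from it independently.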
Core Operations
Selection (Filtering)
Filter rows based on specific conditions using the select operator:
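// Assumes the muze library is loaded and that data and schema are already defined.
const Datamodel = muze.DataModel;
// loadData is asynchronous: run this inside an async function or an ES module.
const formattedData = await Datamodel.loadData(data, schema);
const dm = new Datamodel(formattedData);
// Keep only the rows whose Origin is "Japan"
const outputDm = dm.select({
  value: "Japan",
  field: "Origin",
  operator: Datamodel.ComparisonOperators.EQUAL,
});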
Output:
Name | Maker | Miles_per_Gallon | Displacement | Horsepower | Weight_in_lbs | Acceleration | Origin | Cylinders | Year |
---|---|---|---|---|---|---|---|---|---|
toyota corona mark ii | toyota | 24 | 113 | 95 | 2372 | 15 | Japan | 4 | -19800000 |
datsun pl510 | datsun | 27 | 97 | 88 | 2130 | 14.5 | Japan | 4 | -19800000 |
datsun pl510 | datsun | 27 | 97 | 88 | 2130 | 14.5 | Japan | 4 | 31516200000 |
toyota corona | toyota | 25 | 113 | 95 | 2228 | 14 | Japan | 4 | 31516200000 |
toyota corolla 1200 | toyota | 31 | 71 | 65 | 1773 | 19 | Japan | 4 | 31516200000 |
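Conceptually, select keeps the rows whose field value satisfies a comparison. The sketch below mimics that behavior on an array of plain row objects. selectRows and the string operator names in comparators are hypothetical; they are not DataModel's ComparisonOperators constants, and the rows are illustrative.

```javascript
// Hypothetical sketch of comparison-based selection; not DataModel's API.
// The operator names below are assumptions chosen for illustration.
const comparators = {
  EQUAL: (a, b) => a === b,
  NOT_EQUAL: (a, b) => a !== b,
  GREATER_THAN: (a, b) => a > b,
  LESS_THAN: (a, b) => a < b,
};

function selectRows(rows, { field, value, operator }) {
  const compare = comparators[operator];
  if (!compare) throw new Error(`Unknown operator: ${operator}`);
  // Keep rows whose field satisfies the comparison; filter returns a new array.
  return rows.filter((row) => compare(row[field], value));
}

const cars = [
  { Name: "toyota corona", Origin: "Japan", Horsepower: 95 },
  { Name: "ford torino", Origin: "USA", Horsepower: 140 },
  { Name: "datsun pl510", Origin: "Japan", Horsepower: 88 },
];

const japanese = selectRows(cars, { field: "Origin", value: "Japan", operator: "EQUAL" });
console.log(japanese.map((r) => r.Name)); // [ 'toyota corona', 'datsun pl510' ]
```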
Projection (Column Selection)
Select specific fields (columns) using the project operator:
const outputDm = dm.project(["Name", "Origin"]);
Output:
Name | Origin |
---|---|
chevrolet chevelle malibu | USA |
buick skylark 320 | USA |
plymouth satellite | USA |
amc rebel sst | USA |
ford torino | USA |
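Projection works column-wise rather than row-wise: every row is kept, but only the requested fields survive. A minimal sketch on plain objects follows. projectRows is a hypothetical helper, and silently skipping fields that a row lacks is an assumption; DataModel may handle an unknown field differently.

```javascript
// Hypothetical sketch of projection: keep only the listed fields of each row.
// Assumption: fields missing from a row are skipped rather than raising an error.
function projectRows(rows, fields) {
  return rows.map((row) => {
    const projected = {};
    for (const field of fields) {
      // Fields are copied in the order they were requested.
      if (field in row) projected[field] = row[field];
    }
    return projected;
  });
}

const cars = [
  { Name: "chevrolet chevelle malibu", Origin: "USA", Horsepower: 130 },
  { Name: "buick skylark 320", Origin: "USA", Horsepower: 165 },
];

const namesAndOrigins = projectRows(cars, ["Name", "Origin"]);
console.log(namesAndOrigins[0]); // { Name: 'chevrolet chevelle malibu', Origin: 'USA' }
```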
Grouping
Aggregate data using the groupBy operator:
const Datamodel = muze.DataModel;
const { MAX } = Datamodel.AggregationFunctions;
// Group rows by Origin and reduce Horsepower with the MAX aggregation
const groupDm = dm.groupBy(["Origin"], ["Horsepower", MAX]);
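The sample output below also aggregates the other measures, presumably with their default aggregation functions. Its Horsepower values are fractional, so they cannot be maximums of the whole-number Horsepower field; the sample may have been produced with a different reducer than MAX.
Output: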
Origin | Miles_per_Gallon | Displacement | Horsepower | Weight_in_lbs | Acceleration |
---|---|---|---|---|---|
USA | 20.128225806451606 | 455 | 119.60642570281125 | 1800 | 14.928458498023707 |
Europe | 27.891428571428573 | 183 | 81 | 1825 | 16.82191780821918 |
Japan | 30.450632911392397 | 168 | 79.83544303797468 | 1613 | 16.172151898734175 |
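Grouping can be thought of as two steps: bucket the rows by the values of the grouping fields, then collapse each bucket's measure values with a reducer such as max. The sketch below illustrates that idea on plain objects using the hypothetical helpers groupRows, max, and mean. It handles a single measure and is not how DataModel computes aggregations internally.

```javascript
// Hypothetical sketch of group-and-reduce; not DataModel's implementation.
// Rows sharing the same values for groupFields are bucketed, then `reduce`
// collapses each bucket's measure values into a single number.
function groupRows(rows, groupFields, measure, reduce) {
  const buckets = new Map();
  for (const row of rows) {
    // Composite key so grouping on several fields also works.
    const key = JSON.stringify(groupFields.map((f) => row[f]));
    if (!buckets.has(key)) buckets.set(key, { keyRow: row, values: [] });
    // Skip missing values such as NaN so they don't poison the result.
    if (Number.isFinite(row[measure])) buckets.get(key).values.push(row[measure]);
  }
  // Map preserves insertion order, so groups appear in first-seen order.
  return [...buckets.values()].map(({ keyRow, values }) => ({
    ...Object.fromEntries(groupFields.map((f) => [f, keyRow[f]])),
    [measure]: reduce(values),
  }));
}

const max = (values) => Math.max(...values);
const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;

const cars = [
  { Origin: "Japan", Horsepower: 95 },
  { Origin: "USA", Horsepower: 140 },
  { Origin: "Japan", Horsepower: 88 },
  { Origin: "USA", Horsepower: 130 },
  { Origin: "Japan", Horsepower: 65 },
];

// Expected groups: Japan with Horsepower 95, then USA with Horsepower 140
console.log(groupRows(cars, ["Origin"], "Horsepower", max));
```

Passing mean instead of max for the same rows gives Japan about 82.7 and USA 135, which shows how strongly the reducer choice shapes a summary view. Note that this max returns -Infinity for a bucket with no valid values.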
Sorting
Order data using the sort operator, which supports multi-level sorting:
// Sort by Maker, then by Weight_in_lbs (descending) within each maker
const sortDm = dm.sort([["Maker"], ["Weight_in_lbs", "desc"]]);
Output:
Name | Maker | Miles_per_Gallon | Displacement | Horsepower | Weight_in_lbs | Acceleration | Origin | Cylinders | Year |
---|---|---|---|---|---|---|---|---|---|
amc matador (sw) | amc | 14 | 304 | 150 | 4257 | 15.5 | USA | 8 | 12621060000 |
amc matador | amc | 15.5 | 304 | 120 | 3962 | 13.9 | USA | 8 | 18928260000 |
amc matador (sw) | amc | 15 | 304 | 150 | 3892 | 12.5 | USA | 8 | 63052200000 |
amc ambassador dpl | amc | 15 | 390 | 190 | 3850 | 8.5 | USA | 8 | -19800000 |
amc rebel sst (sw) | amc | NaN | 360 | 175 | 3850 | 11 | USA | 8 | -19800000 |
The example outputs show the first few rows of the transformed data. Your actual results will depend on your dataset.
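Multi-level sorting compares rows on the first criterion and consults the next criterion only when the previous ones tie. The sketch below implements this with a hypothetical sortRows helper that takes [field, direction] pairs like the sort call above; treating an omitted direction as ascending is an assumption based on the sample output, and this is not DataModel's implementation.

```javascript
// Hypothetical sketch of multi-level sorting; not DataModel's implementation.
// criteria is a list of [field, direction] pairs; direction is assumed to
// default to "asc" when omitted.
function sortRows(rows, criteria) {
  // Copy first so the caller's array is not reordered in place.
  return [...rows].sort((a, b) => {
    for (const [field, direction = "asc"] of criteria) {
      const sign = direction === "desc" ? -1 : 1;
      if (a[field] < b[field]) return -sign;
      if (a[field] > b[field]) return sign;
      // Tie on this field: fall through to the next criterion.
    }
    return 0; // tie on every criterion
  });
}

const cars = [
  { Name: "ford torino", Maker: "ford", Weight_in_lbs: 3449 },
  { Name: "amc matador", Maker: "amc", Weight_in_lbs: 3962 },
  { Name: "buick skylark 320", Maker: "buick", Weight_in_lbs: 3693 },
  { Name: "amc matador (sw)", Maker: "amc", Weight_in_lbs: 4257 },
];

const sorted = sortRows(cars, [["Maker"], ["Weight_in_lbs", "desc"]]);
// Order: amc matador (sw), amc matador, buick skylark 320, ford torino
console.log(sorted.map((r) => r.Name));
```

Because Array.prototype.sort is stable in ES2019 and later, rows that tie on every criterion keep their original order. NaN values compare as neither smaller nor larger here, which is a simplification.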
Operator Chaining
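Chain multiple operators for complex transformations. For example, to list Japanese cars from heaviest to lightest:
const resultantDm = dm
  .select({
    value: "Japan",
    field: "Origin",
    operator: Datamodel.ComparisonOperators.EQUAL,
  })
  .project(["Name", "Weight_in_lbs"])
  .sort([["Weight_in_lbs", "desc"]]);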
Operator chaining provides a clean, functional approach to data transformation. Each operation in the chain receives the output of the previous operation as its input.
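A chain like the one above is left-to-right function composition: each step receives the previous step's output. The sketch below makes that explicit with plain functions. pipe, onlyJapan, nameAndWeight, and heaviestFirst are hypothetical helpers operating on arrays of row objects, not DataModel methods.

```javascript
// Hypothetical sketch: compose steps left to right, feeding each step's
// output into the next one.
const pipe = (...steps) => (input) => steps.reduce((acc, step) => step(acc), input);

// Each step returns a new array and leaves its input untouched.
const onlyJapan = (rows) => rows.filter((r) => r.Origin === "Japan");
const nameAndWeight = (rows) => rows.map(({ Name, Weight_in_lbs }) => ({ Name, Weight_in_lbs }));
const heaviestFirst = (rows) => [...rows].sort((a, b) => b.Weight_in_lbs - a.Weight_in_lbs);

const transform = pipe(onlyJapan, nameAndWeight, heaviestFirst);

const cars = [
  { Name: "toyota corona mark ii", Origin: "Japan", Weight_in_lbs: 2372 },
  { Name: "datsun pl510", Origin: "Japan", Weight_in_lbs: 2130 },
  { Name: "ford torino", Origin: "USA", Weight_in_lbs: 3449 },
  { Name: "toyota corona", Origin: "Japan", Weight_in_lbs: 2228 },
];

console.log(transform(cars).map((r) => r.Name));
// [ 'toyota corona mark ii', 'toyota corona', 'datsun pl510' ]
```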
Common Use Cases
Filtering by Region
// Show only Japanese cars (select returns a new DataModel; dm is unchanged)
const japaneseCarsDm = dm.select({
  value: "Japan",
  field: "Origin",
  operator: Datamodel.ComparisonOperators.EQUAL,
});
Creating Summary Views
// Get max horsepower by origin
const maxHorsepowerDm = dm.groupBy(["Origin"], ["Horsepower", Datamodel.AggregationFunctions.MAX]);
Multi-level Sorting
// Sort by maker, then by weight descending
const sortedDm = dm.sort([["Maker"], ["Weight_in_lbs", "desc"]]);
The examples use a car dataset containing fields such as Name, Origin, and Horsepower. Your actual field names should match your dataset's schema.