Version: 1.0.0

Parsing Data in DataModel

Understanding Data Formatting

Raw data can arrive in several formats, such as JSON, CSV, DSV, JavaScript objects, or column-major arrays. Before DataModel can work with it, the raw data must be converted into a standardized internal format, so that operations perform well and behave consistently regardless of the source.

Supported Input Formats

DataModel natively supports these common data formats:

  • JavaScript Object Notation (JSON)

    • Standard data interchange format for web applications and APIs
    • Objects with named fields
  • Comma-Separated Values (CSV)

    • Tabular format with comma-delimited values
    • Header row defines field names
  • Delimiter-Separated Values (DSV)

    • Similar to CSV but with custom field separators
    • The delimiter can be specified in the formatter configuration
  • JavaScript Objects

    • Arrays of native JavaScript objects
    • Can be passed directly in browser environments
  • Column-Major Arrays

    • Nested arrays where each inner array represents a column
    • Efficient for columnar operations

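To make the list above concrete, the sketch below writes the same two records in each of the five formats as plain JavaScript values. The variable names and the `columnsToRows` helper are hypothetical, and the exact shape DataModel expects for each format (for example, whether JSON is passed as a string or as an already-parsed array) is an assumption here; check the loadData reference before relying on it.

```javascript
// Hypothetical sample values: the same two records expressed in each format.

// JSON: text that parses to an array of objects with named fields
const jsonText = '[{"Maker":"buick","Horsepower":165},{"Maker":"datsun","Horsepower":88}]';

// CSV: the header row defines the field names
const csvText = "Maker,Horsepower\nbuick,165\ndatsun,88";

// DSV: same layout as CSV, but with a custom separator (here "|")
const dsvText = "Maker|Horsepower\nbuick|165\ndatsun|88";

// JavaScript objects: already in memory, no parsing step
const jsObjects = [
  { Maker: "buick", Horsepower: 165 },
  { Maker: "datsun", Horsepower: 88 },
];

// Column-major arrays: one inner array per field, in field order
const columnMajor = [
  ["buick", "datsun"], // Maker
  [165, 88], // Horsepower
];

// Rebuild row objects from column-major arrays to show both hold the same data
function columnsToRows(columns, fieldNames) {
  const rowCount = columns.length ? columns[0].length : 0;
  const rows = [];
  for (let r = 0; r < rowCount; r++) {
    const row = {};
    fieldNames.forEach((name, c) => {
      row[name] = columns[c][r];
    });
    rows.push(row);
  }
  return rows;
}

console.log(JSON.stringify(columnsToRows(columnMajor, ["Maker", "Horsepower"])) === JSON.stringify(jsObjects)); // true
console.log(JSON.stringify(JSON.parse(jsonText)) === JSON.stringify(jsObjects)); // true
```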
Data Transformation Process

DataModel automatically detects the input format and converts it to its internal representation. Here's an example of the transformation:

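// Schema: one entry per field, in the order the columns should appear
const schema = [
  { name: "Name", type: "dimension" },
  { name: "Maker", type: "dimension" },
  { name: "Horsepower", type: "measure", defAggFn: "avg" },
  { name: "Origin", type: "dimension" },
];

// Raw data: an array of JavaScript objects keyed by field name
const data = [
  {
    Name: "chevrolet chevelle malibu",
    Maker: "chevrolet",
    Horsepower: 130,
    Origin: "USA",
  },
  {
    Name: "buick skylark 320",
    Maker: "buick",
    Horsepower: 165,
    Origin: "USA",
  },
  {
    Name: "datsun pl510",
    Maker: "datsun",
    Horsepower: 88,
    Origin: "Japan",
  },
];

// loadData returns a promise: call it inside an async function or an
// ES module (top-level await), after DataModel has been imported
const formattedData = await DataModel.loadData(data, schema);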
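DataModel's actual detection rules are not spelled out here. As an illustration only, the hypothetical `detectFormat` heuristic below shows one way a loader could tell these inputs apart; it is not DataModel's implementation. An array of arrays is ambiguous (it could hold rows or columns), so this sketch simply assumes column-major, following the supported-formats list.

```javascript
// Hypothetical heuristic, NOT DataModel's actual detection logic.
function detectFormat(input) {
  if (typeof input === "string") {
    const trimmed = input.trim();
    // A string starting with "[" or "{" is treated as JSON text
    if (trimmed.startsWith("[") || trimmed.startsWith("{")) return "json";
    // Otherwise inspect the header line: a comma suggests CSV, anything else DSV
    const header = trimmed.split("\n")[0];
    return header.includes(",") ? "csv" : "dsv";
  }
  if (Array.isArray(input)) {
    if (input.length === 0) return "empty";
    // Array of plain objects: row-oriented JavaScript objects
    const isPlainObject = (item) => item !== null && typeof item === "object" && !Array.isArray(item);
    if (input.every(isPlainObject)) return "objects";
    // Array of arrays: assumed column-major (see lead-in)
    if (input.every(Array.isArray)) return "column-major";
  }
  return "unknown";
}

console.log(detectFormat('[{"Maker":"buick"}]')); // json
console.log(detectFormat("Maker,Horsepower\nbuick,165")); // csv
console.log(detectFormat("Maker|Horsepower\nbuick|165")); // dsv
console.log(detectFormat([{ Maker: "buick" }])); // objects
console.log(detectFormat([["buick"], [165]])); // column-major
```

A single-column CSV has no comma in its header, so this heuristic would report it as DSV; that kind of edge case is one reason an explicit delimiter setting is useful.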

Internal Format Structure

The loadData method transforms the input into a standardized format consisting of two main components:

1. Data Array

  • Two-dimensional array in column-major order
  • Each column array corresponds to a field defined in the schema
  • Column order matches the schema field order
  • Optimized for DataModel's internal operations

2. Schema Reference

  • Copy of the original schema
  • Maintains field definitions and metadata
  • Preserves the relationship between data columns and their descriptions
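The two components described above can be mimicked with a small helper. `toColumnMajor` and its `{ data, schema }` return shape are hypothetical and only mirror this description; they are not DataModel's internal code, and the real result of loadData may be structured differently.

```javascript
// Hypothetical sketch of the described internal format: { data, schema }.
function toColumnMajor(rows, schema) {
  // One column array per schema field, in schema order.
  // A field missing from a row becomes undefined in that column.
  const data = schema.map((field) => rows.map((row) => row[field.name]));
  // Shallow copy of each field definition so later edits don't touch the caller's schema
  const schemaCopy = schema.map((field) => ({ ...field }));
  return { data, schema: schemaCopy };
}

const schema = [
  { name: "Maker", type: "dimension" },
  { name: "Horsepower", type: "measure", defAggFn: "avg" },
];
const rows = [
  { Maker: "chevrolet", Horsepower: 130 },
  { Maker: "buick", Horsepower: 165 },
  { Maker: "datsun", Horsepower: 88 },
];

const formatted = toColumnMajor(rows, schema);
console.log(JSON.stringify(formatted.data)); // [["chevrolet","buick","datsun"],[130,165,88]]
```

Column `i` of `data` always describes field `i` of `schema`, which is what keeps the data columns tied to their definitions.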

This internal format enables:

  • Efficient data operations
  • Consistent behavior across different input sources
  • Optimal memory usage
  • Fast access patterns for DataModel operations
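One reason column-major storage suits aggregation is that every value of a measure sits in a single array. The sketch below aggregates a measure by scanning only its column, defaulting to the field's `defAggFn` (here `avg`, as in the example schema). The `aggFns` table and `aggregateMeasure` helper are hypothetical and not part of DataModel's API, and falling back to `sum` when `defAggFn` is missing is an assumption of this sketch.

```javascript
// Hypothetical aggregation functions, keyed by names like those used in defAggFn
const aggFns = {
  sum: (values) => values.reduce((acc, v) => acc + v, 0),
  avg: (values) => (values.length ? aggFns.sum(values) / values.length : NaN),
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
};

// Aggregate a single measure by reading only its column.
// Falling back to "sum" when defAggFn is absent is an assumption.
function aggregateMeasure({ data, schema }, fieldName, fnName) {
  const index = schema.findIndex((field) => field.name === fieldName);
  if (index === -1) throw new Error(`Unknown field: ${fieldName}`);
  const field = schema[index];
  if (field.type !== "measure") throw new Error(`${fieldName} is not a measure`);
  const name = fnName || field.defAggFn || "sum";
  const fn = aggFns[name];
  if (!fn) throw new Error(`Unsupported aggregation: ${name}`);
  return fn(data[index]);
}

// Column-major data in the { data, schema } shape described above
const formatted = {
  data: [
    ["chevrolet", "buick", "datsun"],
    [130, 165, 88],
  ],
  schema: [
    { name: "Maker", type: "dimension" },
    { name: "Horsepower", type: "measure", defAggFn: "avg" },
  ],
};

console.log(aggregateMeasure(formatted, "Horsepower").toFixed(2)); // 127.67
console.log(aggregateMeasure(formatted, "Horsepower", "max")); // 165
```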

For delimiter-separated values (DSV) that use a custom delimiter, specify the delimiter in the formatter configuration when calling loadData. Standard formats are detected automatically and need no additional configuration.
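The exact option name loadData uses for the delimiter is not given here, so the sketch below does not call DataModel at all. It is a hypothetical `parseDSV` helper that only illustrates what a custom delimiter changes: the same header-plus-rows layout as CSV, split on a different character. It does not handle quoted fields or escaped delimiters, and all values stay strings; converting measures to numbers would be a separate, schema-driven step.

```javascript
// Hypothetical minimal DSV parser: no support for quoted fields or escaped delimiters.
function parseDSV(text, delimiter = ",") {
  // Accept both \n and \r\n line endings and skip blank lines
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  // The header row defines the field names
  const header = lines[0].split(delimiter);
  return lines.slice(1).map((line) => {
    const values = line.split(delimiter);
    const row = {};
    header.forEach((name, i) => {
      row[name] = values[i];
    });
    return row;
  });
}

const pipeText = "Maker|Horsepower|Origin\nbuick|165|USA\ndatsun|88|Japan";
console.log(JSON.stringify(parseDSV(pipeText, "|")));
// [{"Maker":"buick","Horsepower":"165","Origin":"USA"},{"Maker":"datsun","Horsepower":"88","Origin":"Japan"}]
```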