Documentation
Unify Data
Logo
Spreadsheet Formula

Spreadsheet Formula

Logo

3 mins READ

Overview

Spreadsheet Formula transformations bring familiar Excel-like formulas to your data pipeline, enabling complex calculations and manipulations using syntax that spreadsheet users already know. This powerful feature bridges the gap between spreadsheet expertise and data pipeline development.

Frame 427319236 (6).png
Frame 427319236 (6).png

Why Use Spreadsheet Formulas in Your Pipeline?

  • Leverage Existing Skills: Use familiar Excel-like syntax without learning complex programming

  • Accelerate Development: Implement complex logic quickly with pre-built functions

  • Enhance Data Quality: Create validation rules and cleansing operations with simple formulas

  • Improve Consistency: Standardize calculations across your entire data ecosystem

How to Apply a Spreadsheet Formula Transformation?

  1. Click on the "+ Transformed Field" button.

  2. Select "Spreadsheet Formula" from the Transformations list.

  3. Enter your formula using source fields as datapills.

  4. Specify the name and datatype for the transient field that will contain the results.

  5. Test your formula with sample data before applying to the entire dataset.

    Frame 427319237 (5).png
    Frame 427319237 (5).png

Function Categories and Common Pipeline Use Cases

Text Processing Functions

FunctionDescriptionPipeline Use Case
CONCATENATEJoins text valuesCombine first and last names or build complete addresses
LEFT, RIGHT, MIDExtract substringsParse product codes or extract specific portions of identifiers
TRIMRemove extra spacesClean messy data from external sources
UPPER, LOWER, PROPERChange text caseStandardize inconsistent text formatting
SUBSTITUTE, REPLACEReplace textFix common typos or standardize terminology
LENReturn string lengthValidate field length requirements

Pipeline Example: CONCATENATE(PROPER(First Name), " ", PROPER(Last Name)) - Create properly formatted full names from inconsistent source data.

Date and Time Functions

FunctionDescriptionPipeline Use Case
DATE, TIMECreate date/time valuesConvert separate date components into standard format
DAY, MONTH, YEARExtract date partsEnable date-based aggregation and analysis
WEEKDAYGet day of weekSupport day-of-week based business rules
TODAYCurrent dateCalculate time-based metrics (age, expiration, etc.)
DATEDIFDifference between datesCalculate duration metrics (days outstanding, age)

Pipeline Example: IF(DATEDIF(Invoice Date, Payment Date, "D") > 30, "Overdue", "Current") - Create payment status indicators based on date differences.

Numeric and Statistical Functions

FunctionDescriptionPipeline Use Case
ROUND, ROUNDUP, ROUNDDOWNControl roundingStandardize decimal precision
SUM, AVERAGE, MIN, MAXBasic calculationsAggregates for derived metrics
ABSAbsolute valueCalculate magnitudes (e.g., price differences)
CEILING, FLOORRound to multiplesCreate price bands or size categories
PRODUCTMultiply valuesCalculate area, volume, or compound metrics

Pipeline Example: ROUND(Quantity * Unit Price * (1 - Discount Rate), 2) - Calculate standardized line-item totals.

Lookup and Reference Functions

FunctionDescriptionPipeline Use Case
VLOOKUP, HLOOKUPLook up values in tablesMap codes to descriptions or implement business rules
INDEX, MATCHAdvanced lookupsCreate complex category mappings
CHOOSESelect from alternativesImplement conditional value mapping

Pipeline Example: VLOOKUP(Country Code, Country_Mapping_Table, 2, FALSE) - Map country codes to full country names.

Logical Functions

FunctionDescriptionPipeline Use Case
IFConditional logicImplement business rules and data-driven decisions
AND, OR, NOTBoolean operationsCreate complex conditional transformations
ISBLANK, ISERROR, ISTEXTData validationImplement data quality checks

Pipeline Example: IF(AND(Credit Score > 700, Income > 50000), "Approved", "Review") - Create approval status based on multiple criteria.

Advanced Pipeline Applications

Data Cleansing and Standardization

// Phone number formatting

IF(LEN(Phone) = 10, 

   CONCATENATE("(", LEFT(Phone, 3), ") ", MID(Phone, 4, 3), "-", RIGHT(Phone, 4)),

   Phone)

Derived Business Metrics

// Calculate customer lifetime value

ROUND(

  Average Order Value * 

  Purchase Frequency * 

  Customer Lifespan,

  2)

Data Validation and Quality Checks

// Validate email format

IF(AND(

   SEARCH("@", Email) > 1,

   SEARCH(".", Email, SEARCH("@", Email)) > SEARCH("@", Email)

), "Valid", "Invalid")

Enhanced Categorization

// Create price bands

CHOOSE(

  IF(Price < 10, 1, 

    IF(Price < 50, 2,

      IF(Price < 100, 3, 4))),

  "Budget", "Economy", "Standard", "Premium")

Testing and Troubleshooting

To ensure your formulas are working as expected:

  1. Click the Test button before saving your transformation

  2. Enter sample values for all fields used in your formula

  3. Verify the test results match your expected output

  4. For complex formulas, test incrementally by building up from simpler components

    Frame 427319238 (6).png
    Frame 427319238 (6).png

Best Practices

  • Document Your Logic: Add comments to explain complex business rules

  • Break Down Complexity: Use multiple transformations for very complex calculations

  • Consistent Naming: Create a naming convention for transformed fields

  • Performance Awareness: Be mindful of formula complexity with large datasets

  • Create a Formula Library: Maintain a repository of validated formulas for reuse

FAQs

Can I reference fields created by other transformations?

Yes, you can use any field available in your pipeline, including those created by previous transformations.

How do I handle errors in my formulas?

Use the IFERROR function to provide fallback values when errors occur, e.g., IFERROR(calculation, default_value). This is especially important since records with NA/NULL/ERROR values will fail processing.

What happens if my transformation outputs NA, NULL, or ERROR values?

Records that result in NA, NULL, or ERROR values will fail processing and appear in the failed logs. Always implement proper error handling in your formulas to prevent data loss in your pipeline.

How do spreadsheet formulas handle NULL values?

NULL values are treated similarly to Excel's blank cells. Consider using ISBLANK or IFNA functions to handle these cases explicitly.

Are user-defined functions supported?

No, only the built-in functions are available. Complex custom logic should be implemented through combinations of existing functions or other transformation types.