Business Analytics is about transforming data into actionable insights that inform an organization's strategic and tactical decisions and reflect the current state of the business. Data transformation is the process of converting data from one format to another, usually from the format of a source system into the format required by a destination system, for example, translating data extracted from a Salesforce object or another data source into our defined format. Transformation covers the cleansing and aggregation that data may need before analysis, and it occurs prior to loading or creating the final datasets.
Data transformation may be constructive (adding, copying, and replicating data), destructive (deleting fields and records), aesthetic (standardizing salutations or street names), or structural (renaming, moving, and combining columns in a database). To achieve this, we use different transformations such as sfdcDigest, edgemart, augment, computeExpression, computeRelative, flatten, and sfdcRegister. Data Builder offers many different manipulations to transform your datasets.
In this article, we will discuss the computeExpression transformation. A compute expression adds derived fields to a dataset without changing or altering the source data.
A single computeExpression node can create many fields, and the order in which you create them matters when one field refers to another on the same node. For example, if you want to create three fields A, B, and C, and C must be referenced in both A and B, then you need to create C first and A and B later, so that C is available in the input stream for A and B; otherwise, the references to C fail.
mergeWithSource: this checkbox indicates whether the input fields (all source fields) are included alongside the derived fields (computed fields) in the resulting dataset. When true, the resulting dataset contains all input fields from the source as well as the newly generated derived fields. When false, the resulting dataset contains only the computed/derived fields. If you don't change the checkbox, the node defaults to true.
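As a rough sketch of how this looks in a dataflow definition file (the node, source, and field names here are hypothetical), a computeExpression node carries its derived fields in a computedFields array:

```json
"ComputeFields": {
  "action": "computeExpression",
  "parameters": {
    "source": "ExtractOpportunities",
    "mergeWithSource": true,
    "computedFields": [
      {
        "name": "AmountInThousands",
        "type": "Numeric",
        "precision": 16,
        "scale": 2,
        "saqlExpression": "'Amount' / 1000"
      }
    ]
  }
}
```

With mergeWithSource set to true, the resulting dataset keeps every field from ExtractOpportunities plus AmountInThousands; set it to false and only AmountInThousands survives.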
We will discuss the following SAQL operators and functions with examples:
- Arithmetic operators: for creating Measure/Numeric fields
- String operators: for creating Dimension/Text fields
- Date functions: to create Date fields
Arithmetic operators: Use arithmetic operators to perform addition, subtraction, multiplication, division, and modulo operations. The Analytics field type associated with arithmetic calculation fields is Numeric, and specifying precision and scale is mandatory. Precision is the maximum number of digits in a numeric value; it includes all digits to the left and right of the decimal point (but excludes the decimal point character) and must be from 1 to 16. Scale is the number of digits to the right of the decimal point; it must be from 1 to 15 and less than the precision value. For example, you can choose precision 6 and scale 2 for a number like 1000.10.
Einstein Analytics internally stores numeric values in datasets as long values. For example, Analytics stores the number 3,200.99 with a scale of 2 as 320099. The user interface converts the stored value back to decimal notation to display the number as 3200.99. The maximum numeric value that can be stored in a dataset is 36,028,797,018,963,967 and the minimum numeric value is -36,028,797,018,963,968.
When you create or update a dataset through your dataflow, recipe, or a CSV upload, Einstein Analytics replaces blank numeric values with the specified default value. When no default value is specified, Analytics replaces blanks in numeric columns with 0 or null, depending on whether you enable null measure handling. Null measure handling lets you specify null as the default value for numeric columns in your recipes, dataflows, and CSV uploads. When no default value is specified and null measure handling is enabled, Analytics replaces blanks with nulls in the numeric columns of your datasets. You can also update your existing dataflow definition files to use null instead of 0 in the defaultValue attribute of each derived numeric field.
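For instance, a derived numeric field that should default to null rather than 0 would carry a defaultValue attribute like the one below (a sketch with hypothetical field names, assuming null measure handling is enabled in your org):

```json
{
  "name": "DiscountAmount",
  "type": "Numeric",
  "precision": 16,
  "scale": 2,
  "defaultValue": null,
  "saqlExpression": "'Amount' * 'Discount_Rate'"
}
```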
Suppose you have a scenario where you need to calculate the percentage of expenses versus savings. You would create compute expression fields with the following syntax:
Compute expression for derived field: TotalExpenses
'ExpensesField1' + 'ExpensesField2'
Compute expression for derived field: Expenses percentage
(('ExpensesField1' + 'ExpensesField2') / 'TotalSalaryField1') * 100
Compute expression for derived field: Savings percentage
(('TotalSalaryField1' - 'TotalExpenses') / 'TotalSalaryField1') * 100
Suppose you have a scenario where you need to evaluate whether the record creation date falls on a working day or a weekend: 'createdDateInSeconds' % 7
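One way to sketch this as a derived Text field, assuming the source provides a day-granularity epoch field such as CreatedDate_day_epoch (days since January 1, 1970, which was a Thursday, so remainders 2 and 3 land on Saturday and Sunday):

```json
{
  "name": "CreatedDayType",
  "type": "Text",
  "saqlExpression": "case when ('CreatedDate_day_epoch' % 7) == 2 || ('CreatedDate_day_epoch' % 7) == 3 then \"Weekend\" else \"Weekday\" end"
}
```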
String operators: Use string operators to combine the data of two text fields (concatenate/append with the plus "+" sign), remove extra spaces (trim), or extract part of a text field (substring). The Analytics field type associated with string calculation fields is Text.
- 'EndDate_Week' + "-" + 'EndDate_Year' to show data as 32-2020
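As a computed field in the dataflow definition, that concatenation would look roughly like this (the field name WeekYear is hypothetical):

```json
{
  "name": "WeekYear",
  "type": "Text",
  "saqlExpression": "'EndDate_Week' + \"-\" + 'EndDate_Year'"
}
```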
Date functions: When Einstein Analytics loads dates into a dataset, it breaks up each date into multiple columns, such as day, week, month, quarter, and year, based on the calendar year. For example, if you extract dates from a CreateDate column, Analytics generates columns such as CreateDate_Day and CreateDate_Week. If your fiscal year differs from the calendar year, you can enable Analytics to generate fiscal date columns as well.
When combining data from different data sources, dates are sometimes stored in different formats, such as MM-dd-yyyy and yyyy-MM-dd'T'HH:mm:ss'Z'; ensure consistent date formats. Einstein Analytics does not officially support multiple time zones. If a column contains a mix of time zones, consider creating a calculated column with a formula that adds or subtracts hours from the datetime values to normalize them to a single time zone. Fix time zone differences in datetime columns and confirm that the date values in a column are uniform in both format and time zone.
Specifying the format of the derived date field is mandatory. Date format examples: "yyyy-MM-dd", "yyyy-MM-dd HH:mm:ss"
- daysBetween(toDate('CreatedDate_LocalTimeStamp'), toDate('LastModifiedDate_LocalTimeStamp'))
- case when 'CreatedDate_LocalTime' is null then null else ('CreatedDate_LocalTimeStamp' + 'UTC offset') end
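As a sketch of how these date expressions could sit in a computeExpression node (field names are hypothetical, and 'UTC_offset' is assumed to be a numeric offset in seconds), note the mandatory format attribute on the derived Date field:

```json
[
  {
    "name": "DaysToUpdate",
    "type": "Numeric",
    "precision": 10,
    "scale": 0,
    "saqlExpression": "daysBetween(toDate('CreatedDate_LocalTimeStamp'), toDate('LastModifiedDate_LocalTimeStamp'))"
  },
  {
    "name": "CreatedDateLocal",
    "type": "Date",
    "format": "yyyy-MM-dd HH:mm:ss",
    "saqlExpression": "toDate('CreatedDate_LocalTimeStamp' + 'UTC_offset')"
  }
]
```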
Hope this article helps you in creating Compute expressions in dataflows.
Thank you for reading this long article.
Happy Reading 🙂