GROUP/GROUP BY clauses

The GROUP/GROUP BY clause specifies how to map source records to result records to group statement output.

There are three ways to use this clause in a query:

Specifying only GROUP

You can use a GROUP clause to aggregate results into a single bucket.

For example, the following statement uses the SUM statement to return a single sum across a set of records:
RETURN "ReviewCount" AS SELECT
SUM(number_of_reviews) AS "NumReviews"
GROUP

This statement returns one record for NumReviews. The value is the sum of the values for the attribute number_of_reviews.

Grouping is allowed on source and locally defined attributes.

Note: If you group by a locally defined attribute, that attribute cannot refer to non-grouping attributes and cannot contain any aggregates.

All grouping attributes are part of the result records. A NULL value in any grouping attribute causes the source record to map to no result records. This is different from SQL, which treats NULL like any other value. For information about user-defined NULL-value handling in EQL, see COALESCE.

Specifying GROUP BY

You can use GROUP BY to aggregate results into buckets with common values for the grouping keys.

For example, suppose we have sales transaction data with records consisting of the following attributes:
{ TransId, ProductType, Amount, Year, Quarter, Region,
  SalesRep, Customer }
For example:
{ TransId = 1, ProductType = "Widget", Amount = 100.00,
  Year = 2011, Quarter = "11Q1", Region  = "East",
  SalesRep = "J. Smith", Customer = "Customer1" }
If an EQL statement uses Region and Year as GROUP BY attributes, the statement results contain an aggregated record for each valid, non-empty combination of Region and Year. In EQL, this example is expressed as:
DEFINE RegionsByYear AS
GROUP BY Region, Year
resulting in the aggregates of the form { Region, Year }, for example:
{ "East", "2010" }
{ "West", "2011" }
{ "East", "2011" }

Using a GROUP BY that is an output of a SELECT expression

A GROUP BY key can be the output of a SELECT expression, as long as that expression itself does not contain an aggregation function.

For example, the following syntax is a correct usage of GROUP BY:
SELECT COALESCE(Person, 'Unknown Person')
as Person2, ... GROUP BY Person2
The following syntax is incorrect and results in an error, because Sales2 contains an aggregation function (SUM):
SELECT SUM(Sales) as Sales2, ... GROUP
BY Sales2

Specifying the hierarchy level for a managed attribute

You can group by a specified depth of each managed attribute.

If you group results by a managed attribute, you can specify a hierarchy depth at which to group, using the syntax:

GROUP BY ManagedAttr:<level>
For example, the Region attribute contains the hierarchy Country, State, and City. To group the results at the State level (one level below the root of the managed attribute hierarchy), you would use the following syntax:
GROUP BY "Region":1
Note: This is equivalent to ANCESTOR(ManagedAttr, level), but GROUP BY statements need to use the syntax managedAttr:level, because you cannot group by an expression.

Grouping by a multi-assign attribute

If you group by a multi-assign attribute, each source record will map to multiple corresponding output records. For example, the record [A:1, A:2, B:3, B:4, B:5] will map to:

This can only occur with a corpus source, because result records are always single assign.

In this example, UserTag is multi-assign:

RETURN "Example" AS SELECT
   AVG("Gross") AS "AvgGross",
   SUM("Gross") AS "TotalGross",
GROUP BY UserTag

To define the set of resulting buckets, a statement must specify a set of GROUP BY attributes. The cross product of all values in these grouping attributes defines the set of candidate buckets.

The results are automatically pruned to include only non-empty buckets.

If an attribute reference appears in a statement with a GROUP clause in the definition of an attribute not in the GROUP clause, the attribute will have an implicit ARB aggregate applied.