Friday, February 13, 2009

Oracle function to aggregate a number by date

One of the common scenarios while working with relational database tables is to aggregate a column on a table and find out the corresponding value of a different column. For example, if you have a table that has the name, location and checkin time of all employees in a table and and you want to find out the location from which each employee ckecked in last, the sql will involve creating a inner query. For most cases, an inner query is fine. However, if the table has a huge number of rows querying the table twice might be expensive. That is when a custom aggregate function might be helpful.
Here is the sql that represents the table:

CREATE TABLE EMP_TIME_LOG
(
REPORT_ID NUMBER(5),
LOCATION_ID NUMBER(2),
LOGIN_DTG DATE DEFAULT sysdate,
EMP_NAME VARCHAR2(50 BYTE)
)
/

SET DEFINE OFF;
Insert into EMP_TIME_LOG
(REPORT_ID, LOCATION_ID, LOGIN_DTG, EMP_NAME)
Values
(1, 1, TO_DATE('02/13/2008 13:35:59', 'MM/DD/YYYY HH24:MI:SS'), 'emp1');
Insert into EMP_TIME_LOG
(REPORT_ID, LOCATION_ID, LOGIN_DTG, EMP_NAME)
Values
(2, 1, TO_DATE('02/13/2008 13:36:23', 'MM/DD/YYYY HH24:MI:SS'), 'emp2');
Insert into EMP_TIME_LOG
(REPORT_ID, LOCATION_ID, LOGIN_DTG, EMP_NAME)
Values
(3, 2, TO_DATE('02/13/2008 13:36:50', 'MM/DD/YYYY HH24:MI:SS'), 'emp1');
Insert into EMP_TIME_LOG
(REPORT_ID, LOCATION_ID, LOGIN_DTG, EMP_NAME)
Values
(4, 2, TO_DATE('02/13/2008 13:37:32', 'MM/DD/YYYY HH24:MI:SS'), 'emp3');
Insert into EMP_TIME_LOG
(REPORT_ID, LOCATION_ID, LOGIN_DTG, EMP_NAME)
Values
(5, 4, TO_DATE('02/13/2008 13:45:48', 'MM/DD/YYYY HH24:MI:SS'), 'emp3');
COMMIT;

In this example we need to get group the rows by employee name but we want to get the latest checkin time and the location_id of the row corresponding to the latest checkin time. Without using any custom function here is how one might achieve this.
select t1.emp_name , t1.location_id from emp_time_log T1 ,
(select location_id, emp_name, max(login_dtg) max_login_dt from emp_time_log group by emp_name,location_id ) T2
where t1.emp_name = t2.emp_name and t2.max_login_dt = t1.LOGIN_DTG

Oracle supports a feature called User-Defined Aggregate Functions that makes this very easy to implement. If we had a custom aggregate function that can take both the location id and the login_dtg and returned the location_id of the row corresponding to the max(login_dtg) of each group, we could eliminate the inner query. However, for aggregate functions there is limit on the number of arguments that can be passed in. Fortunately, we could create a custom data type that can hold both the values and use that custom type as a parameter to the aggregate function. With a custom function, the query to achieve the same result would look like the following.
select emp_name , MAX_DATES_NUMBER (NUMBER_DATE (location_id, login_dtg)) as location_id from emp_time_log group by emp_name

Here NUMBER_DATE is the custom data type that I created to hold both the number and a date, and MAX_DATES_NUMBER is the custom aggregate function.
The function definition is as follows:

CREATE OR REPLACE FUNCTION MAX_DATES_NUMBER
( idDtVal Number_Date
) RETURN NUMBER
PARALLEL_ENABLE
AGGREGATE USING Number_Date;
/

The code for creating the custom data type and the the logic for implementing the aggregation function is given below. Jonathan Gennick has written a nice article "Build Custom Aggregate Functions" that explains who the custom aggregation function works. I encourage you to read that article.

CREATE OR REPLACE TYPE Number_Date
AS OBJECT (

maxId NUMBER,
maxDt DATE,

STATIC FUNCTION ODCIAggregateInitialize
( actx IN OUT Number_Date
) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateIterate
( self IN OUT Number_Date,
idDtVal IN Number_Date
) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateTerminate
( self IN Number_Date,
returnValue OUT NUMBER,
flags IN NUMBER
) RETURN NUMBER,

MEMBER FUNCTION ODCIAggregateMerge
(self IN OUT Number_Date,
ctx2 IN Number_Date
) RETURN NUMBER

);
/

CREATE OR REPLACE TYPE BODY Number_Date AS

STATIC FUNCTION ODCIAggregateInitialize
( actx IN OUT Number_Date
) RETURN NUMBER IS
BEGIN
IF actx IS NULL THEN
actx := Number_Date (0,'01-JAN-1900');
ELSE
actx.maxDt := '01-JAN-1900';
actx.maxId := 0;
END IF;
RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateIterate
( self IN OUT Number_Date,
idDtVal IN Number_Date
) RETURN NUMBER IS
BEGIN
IF idDtVal.maxDt > self.maxDt THEN
self.maxId := idDtVal.maxId;
self.maxDt := idDtVal.maxDt;
END IF;
RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateTerminate
( self IN Number_Date,
ReturnValue OUT NUMBER,
flags IN NUMBER
) RETURN NUMBER IS
BEGIN
returnValue := self.maxId;
RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateMerge
(self IN OUT Number_Date,
ctx2 IN Number_Date
) RETURN NUMBER IS
BEGIN
IF ctx2.maxDt > self.maxDt THEN
self.maxId := ctx2.maxId;
self.maxDt := ctx2.maxDt;
END IF;

RETURN ODCIConst.Success;
END;

END;
/

No comments: