SRB Error Recovery

Now that we can schedule a SRB into another address space and use cross memory post to synchronize between the scheduling task and the SRB, it is almost time to start writing some more complex code in our SRB routine. Before doing this we should consider what happens when our SRB routine encounters an error.

A Simple Test

We can start with a simple experiment to see what happens when our SRB routine fails. All we need to do is cause an error condition, which can easily be done by inserting the line:

         DC    X'0000'

into our SRB routine. It can be placed anywhere before the BALR to the POST routine. I placed it immediately after the LR to save the address of the parameter list.

When our program is assembled, link edited, and run we see nothing happen. The SRB is scheduled, the task waits on the ECB, the SRB is dispatched and immediately fails with a S0C1 (program interruption code x’01’, an operation exception, to be more specific), and the ECB is never posted. The scheduling task has no way of knowing the SRB failed and continues to wait. If the job is not cancelled by an operator command it should eventually fail with a S522 ABEND when its wait time limit is exceeded.

This is probably not the behaviour we want when our SRB routine fails.  It would probably be better if the scheduling task failed when the SRB routine failed. This can be done with just a few more lines of code.  There are two fields in the SRB control block we need to fill in.  SRBPASID is a two byte field where we place the ASID (not the ASCB address) of the task associated with the SRB.  SRBPTCB is a pointer to the TCB associated with the SRB.  When these fields are specified and the SRB routine fails the failure will be propagated back to the TCB.

We already have the address of our ASCB so it is pretty simple to get the ASID value.  It is also pretty simple to get the address of our TCB at the same time we are finding our ASCB address.

          L     R8,CVTPTR          CVT ADDRESS         
          L     R8,0(,R8)          PSATNEW             
          L     R10,4(,R8)         CURRENT TCB ADDRESS  <------ Insert This Line
          L     R8,12(,R8)         CURRENT ASCB (OURS)

Now register 10 will contain the address of our TCB.

         ST    R10,SRBPTCB        PURGE TCB ADDRESS       
         SLR   R1,R1                                      
         ICM   R1,B'0011',ASCBASID-ASCB(R8)  GET OUR ASID 
         STH   R1,SRBPASID        PURGE ASID

By adding the above lines in our SRB setup code we now should be able to detect SRB errors.  Now we assemble, link edit, and run our test program.

JOB 6771  $HASP100 SRB03    ON INTRDR      RUN TEST PGM            
JOB 6771  $HASP373 SRB03    STARTED - INIT 12 - CLASS A - SYS TCS3 
JOB 6771  IEF450I SRB03 MVSSP - ABEND S0C1 U0000 - TIME=08.33.41   
JOB 6771   8.33.42   0.00.00   0.00.00  S0C1   SRB03     MVSSP     
JOB 6771   8.33.42   0.00.00   0.00.00  S0C1   SRB03     ########  
JOB 6771  $HASP395 SRB03    ENDED

And there we are! When the SRB routine abnormally terminates, our job ABENDs with a S0C1. This is much better than waiting for the eventual S522 ABEND.

Functional Recovery Routines

Now we can deal with how to recover from an error while running a SRB routine. We can’t use ESTAE to trap errors since ESTAE doesn’t work when running under a SRB. Instead we have to use a Functional Recovery Routine (FRR) to trap the error. If you are familiar with ESTAE then using a FRR should feel pretty familiar. When a FRR has been established and an error occurs, the FRR is given control and decides what action to take.

A FRR is established using the SETFRR macro.

         SETFRR A,FRRAD=(R3),WRKREGS=(R7,R8),RELATED=(DELETE)

This will add a FRR, the address of the FRR is contained in register 3, and we supply two work registers to the macro (registers 7 and 8).  The RELATED parameter is provided to allow a method of associating comments directly with the macro.  It must be specified but the value doesn’t matter to the expansion of the macro.  To use this macro we must include the mapping macros IHAFRRS and IHAPSA.

Now when an error occurs during execution of our SRB the FRR will be given control.  On entry to the FRR register 0 contains the address of a 200 byte FRR work area.  Register 1 contains the address of the SDWA.  Unlike an ESTAE routine where there are conditions where a SDWA may not be available, a SDWA is always passed to the FRR.

Here is a simple FRR that will resume execution of our SRB at a recovery point.  This type of recovery is good when we want to try an action that may fail and recover from that failure.

FRREXIT  DS    0H
         USING FRREXIT,R15
*
         L     R7,RETADDR
         SETRP RC=4,RETREGS=YES,RETADDR=(R7)
         BR    R14
*
RETADDR  DC    A(0)

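We use the SETRP macro to set recovery options in the SDWA. SETRP expects register 1 to still contain the SDWA address, which is why the FRR leaves that register alone, and it needs the IHASDWA mapping macro to be included. A return code of 4 specifies a retry action. We indicate that the registers should be restored from the SDWA and that the retry address is contained in register 7. We then return using register 14 to initiate the retry.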
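
Retry is not the only choice a FRR can make. As a minimal sketch (it is not part of the test program, and the label FRRPERC is made up for illustration), a FRR that decides it cannot recover can set a return code of 0 to let termination continue:

FRRPERC  DS    0H
         USING FRRPERC,R15
*
         SETRP RC=0               PERCOLATE - CONTINUE TERMINATION
         BR    R14

With SRBPASID and SRBPTCB filled in as shown earlier, I would expect the failure to be passed on to the scheduling task, just as it was before we had a FRR.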

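Note that the retry address (RETADDR) must be set before establishing the FRR environment. We can’t specify it using something like A(RETRY) because we relocate the SRB code from our private area into CSA, so an address constant resolved when the program was loaded would still point at the original copy. Instead we compute the address at run time from the SRB routine’s base register, using something like this.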

         LA    R1,RETRY
         ST    R1,RETADDR
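
Storing into RETADDR means the SRB routine modifies its own storage. A possible alternative, sketched here on the assumption that the PARMAD keyword of SETFRR and the SDWAPARM field of the SDWA behave as I remember them, is to keep the retry address in the 24 byte parameter area that SETFRR hands back. Register 2 is an arbitrary choice, and RELATED=(DEL) is shortened only to keep the statement within column 71.

         LA    R3,FRREXIT         ADDRESS OF FRR
         SETFRR A,FRRAD=(R3),PARMAD=(R2),WRKREGS=(R7,R8),RELATED=(DEL)
         LA    R1,RETRY           ADDRESS OF RETRY POINT
         ST    R1,0(,R2)          SAVE IN FRR PARAMETER AREA

The FRR would then pick the address up from the SDWA instead of from RETADDR:

FRREXIT  DS    0H
         USING FRREXIT,R15
*
         L     R7,SDWAPARM-SDWA(,R1)  ADDRESS OF PARAMETER AREA
         L     R7,0(,R7)          RETRY ADDRESS SAVED BY MAINLINE
         SETRP RC=4,RETREGS=YES,RETADDR=(R7)
         BR    R14

The rest of this post stays with the simpler RETADDR approach.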

We now have enough to implement a FRR in our SRB routine.  Here is the updated SRB routine.

SRBRTN   DS    0H                                            
         LR    R6,R15             SRB ROUTINE EP             
         USING SRBRTN,R6                                     
*                                                            
         LR    R4,R1              SAVE PARM REGISTER         
         USING PARMLIST,R4                                   
*                                                            
         LR    R9,R14             SAVE RETURN ADDRESS        
*                                                            
         LA    R1,RTRYRTN         ADDRESS OF RETRY POINT     
         ST    R1,RETADDR         SAVE FOR FRR               
*                                                            
         LA    R3,FRREXIT         ADDRESS OF FRR             
         SETFRR A,FRRAD=(R3),WRKREGS=(R7,R8),RELATED=(DELETE)
*                                                            
         LA    R10,X'41'          POST                       
         SLL   R10,24                 COMPLETION CODE        
*                                                            
         DC    X'0000'            SOC1 ABEND                 
*                                                            
         LA    R10,X'7F'          POST                       
         SLL   R10,24                 COMPLETION CODE        
*                                                            
RTRYRTN  DS    0H                                            
         SETFRR D,WRKREGS=(R7,R8),RELATED=(ADD)              
*                                                            
         L     R11,WAITECB        ECB ADDRESS                
         LA    R1,X'80'           SET                        
         SLL   R1,24                 HIGH-ORDER              
         OR    R11,R1                          BIT           
*                                                            
         LA    R12,POSTERR        ERROR ROUTINE              
         L     R13,WAITASCB       ASCB ADDRESS               
*                                                            
         L     R15,CVTPTR         CVT ADDRESS                
         L     R15,CVT0PT01-CVTMAP(,R15)   POST BRANCH-ENTRY 
         BALR  R14,R15                                       
*                                                            
*                                                            
         BR    R9                  DONE - EXIT               
*                                                        
SRBCLEAN DS    0H                  SRB PURGE ROUTINE     
         XC    0(SRBSIZE,R1),0(R1)                       
         BR    R14                                       
*                                                        
*                                                        
POSTERR  DS    0H                  POST ERROR ROUTINE    
         BR    R14                                       
*                                                        
*                                                        
FRREXIT  DS    0H                                        
         USING FRREXIT,R15                               
*                                                        
         L     R7,RETADDR                                
         SETRP RC=4,RETREGS=YES,RETADDR=(R7)             
         BR    R14                                       
*                                                        
RETADDR  DC    A(0)                                      
*                            
         LTORG ,             
*                            
*                            
         DROP  R4,R6         
*                            
*                            
ENDSRTN  EQU   *

Notice we included a SETFRR macro with the D (delete) option to remove our FRR environment once we were past the section of code where we wanted to trap errors. If we did not delete our FRR and an error occurred after our retry point, the FRR would retry at that same point again and an endless loop could occur.

Checking The Post Code

Now we have two different post codes used by our SRB routine. A value of x’7F000000’ indicates success and a value of x’41000000’ indicates a failure. There is nothing magic about these codes and they only have meaning within our program. We can update our scheduling program to report on the post code status.

         CLI   LOCALECB,X'7F'     CHECK FOR GOOD COMPLETION CODE 
         BE    SUCCESS                                           
*                                                                
         CLI   LOCALECB,X'41'     CHECK FOR FAIL COMPLETION CODE 
         BE    FAIL                                              
*                                                                
         WTO   '*** UNKNOWN POST CODE ***',ROUTCDE=(1,11)        
         B     CLEANUP                                           
*                                                                
*                                                                
SUCCESS  DS    0H                                                
         WTO   '*** SUCCESS POST CODE ***',ROUTCDE=(1,11)        
         B     CLEANUP                                           
*                                                                
*                                                                
FAIL     DS    0H                                                
         WTO   '*** FAIL POST CODE ***',ROUTCDE=(1,11)           
         B     CLEANUP

Now with these updates in place we can assemble, link edit, and execute our program.

JOB 6775  $HASP100 SRB04    ON INTRDR      RUN TEST PGM              
JOB 6775  $HASP373 SRB04    STARTED - INIT 12 - CLASS A - SYS TCS3   
JOB 6775  *** FAIL POST CODE ***                                     
JOB 6775   7.39.43   0.00.00   0.00.00  0000   SRB04     MVSSP       
JOB 6775   7.39.43   0.00.00   0.00.00  0000   SRB04     ########    
JOB 6775  $HASP395 SRB04    ENDED

We can remove our abend by commenting out the line

*------->DC    X'0000'            SOC1 ABEND

and then we assemble, link edit, and execute.

6778  $HASP100 SRB04    ON INTRDR      RUN TEST PGM            
6778  $HASP373 SRB04    STARTED - INIT 12 - CLASS A - SYS TCS3 
6778  *** SUCCESS POST CODE ***                                
6778   8.43.04   0.00.00   0.00.00  0000   SRB04     MVSSP     
6778   8.43.04   0.00.00   0.00.00  0000   SRB04     ########  
6778  $HASP395 SRB04    ENDED

And this proves we can capture and report on an error condition in our SRB routine.
