Source Scanner for HLASM/ASM


This document describes the entities and relations the HLASM Scanner creates on encountering various ALC code constructs.

Metrics

The scanner computes a variety of standard code-line metrics, including:

  • total lines of code
  • comment line count
  • blank line count
  • non-comment line count

It also computes a variety of complexity and maintainability metrics based on standard industry practices, including:

  • McCabe_cyclomatic_complexity
  • decision_density
  • volume_sum
  • perCM_sum
  • number_of_unique_operators
  • number_of_unique_operands
  • number_of_operator_occurrences
  • number_of_operand_occurrences
  • halstead_program_length
  • halstead_program_vocabulary
  • halstead_program_volume
  • halstead_program_difficulty
  • halstead_program_effort
  • halstead_bug_prediction
  • SEI_maintainability_index

All metrics are reported in the same XML format, and in the same manner, as for other languages such as JCL and COBOL.
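As a rough illustration of how the Halstead-style metrics listed above relate to the four operator and operand counts, the following Python sketch applies the textbook Halstead formulas. It is not taken from the scanner: the function name halstead is hypothetical, the bug-prediction formula (volume / 3000) is only one of the commonly cited variants, and the scanner's own definitions may differ.

```python
import math

def halstead(n1, n2, N1, N2):
    """Textbook Halstead measures (illustrative, not the scanner's code).

    n1, n2: number of unique operators / unique operands
    N1, N2: number of operator / operand occurrences
    All four counts are assumed to be positive.
    """
    length = N1 + N2                      # program length
    vocabulary = n1 + n2                  # program vocabulary
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume
    return {
        "halstead_program_length": length,
        "halstead_program_vocabulary": vocabulary,
        "halstead_program_volume": volume,
        "halstead_program_difficulty": difficulty,
        "halstead_program_effort": effort,
        # Assumption: volume / 3000 is one common variant of the estimate.
        "halstead_bug_prediction": volume / 3000,
    }
```

The dictionary keys reuse the metric names from the list above so the correspondence is easy to follow.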

File Input/Output

The scanner assumes that all I/O is done using a fixed set of standard HLASM macro invocations: the DCB, ACB, OPEN, GET, PUT, READ, WRITE, and CLOSE macros.

FILE entities are created by recognition of a DCB or ACB instruction with a DDNAME setting. If the DDNAME value is an ordinary identifier or a string, the value is recorded as the name of a FILE entity. Invocations of the other I/O-related macros are analyzed to establish whether the file is read or written. PRG_FILE relations between the containing COMPILATION_UNIT entity and the FILE entity are then created, with mode settings indicating READ or WRITE (or, if the file is used in UPDAT mode, both READ and WRITE).
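The recognition described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scanner's implementation: the helper names split_statement and scan_files are hypothetical, only GET and PUT with a direct DCB label as first operand are examined, quoted DDNAME strings and UPDAT mode are left out, and continuation lines are ignored.

```python
import re

# An ordinary-identifier DDNAME value; variable symbols such as &DD do not match.
DDNAME_RE = re.compile(r"(?:^|,)DDNAME=([A-Za-z@#$][A-Za-z0-9@#$]*)(?=,|$)")

def split_statement(line):
    """Split a line into (name, operation, operands); None for blank or comment lines."""
    if not line.strip() or line.startswith(("*", ".*")):
        return None
    tokens = line.split()
    if line[0].isspace():
        tokens.insert(0, "")          # column 1 is blank, so there is no name field
    tokens += ["", ""]                # pad so short statements still unpack
    return tokens[0], tokens[1].upper(), tokens[2]

def scan_files(source):
    """Map each statically named DDNAME to the set of modes observed for it."""
    statements = [s for s in map(split_statement, source.splitlines()) if s]
    # First pass: DCB/ACB labels, which often appear after the code that uses them.
    label_to_ddname = {}
    for name, op, operands in statements:
        match = DDNAME_RE.search(operands)
        if op in ("DCB", "ACB") and name and match:
            label_to_ddname[name] = match.group(1)
    modes = {ddname: set() for ddname in label_to_ddname.values()}
    # Second pass: GET implies READ, PUT implies WRITE.
    for name, op, operands in statements:
        target = operands.split(",")[0]
        if op in ("GET", "PUT") and target in label_to_ddname:
            modes[label_to_ddname[target]].add("READ" if op == "GET" else "WRITE")
    return modes

SAMPLE_IO = "\n".join([
    "         GET   INFILE,BUFFER",
    "         PUT   OUTFILE,LINE",
    "INFILE   DCB   DDNAME=SYSIN,DSORG=PS,MACRF=GM",
    "OUTFILE  DCB   DDNAME=SYSPRINT,DSORG=PS,MACRF=PM",
    "DYNFILE  DCB   DDNAME=&DD,DSORG=PS,MACRF=GM",
])
```

In the sample, DYNFILE is skipped because its DDNAME is a variable symbol, which matches the rule that only simple identifiers or strings are recorded.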

We note that internally the scanner maintains additional information regarding streams and specific input-output sites, but this need not be communicated by the CAScanner, as it has no bearing on the CA Repository model.

Entities like the DDNAME are recorded only if their names are simple identifiers or strings. Variable references serving as a DDNAME value in a DCB instruction take on values at execution time and could be altered in ways difficult to detect or track by a static analyzer.

Examples of DDNAMEs that are not recorded include those built during program execution (overlaying a placeholder setting in the DCB macro call) and those modified via XODDMOD. Likewise, the base tool does not handle dynamically allocated files, as this requires execution monitoring.

COPY Instructions

Programs are scanned for COPY statements. Upon encountering a COPY statement, the name of the copylib is used to name a COPYBOOK entity, and a PRG_COPY relation is introduced between the COMPILATION_UNIT and COPYBOOK entities.

COPY statements are not expanded during analysis of a compilation unit.
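A minimal Python sketch of this recognition step is shown here. It is an illustration only: the names split_statement and scan_copy are hypothetical, the copied member is simply recorded by name without being opened, and continuation lines are ignored.

```python
def split_statement(line):
    """Split a line into (name, operation, operands); None for blank or comment lines."""
    if not line.strip() or line.startswith(("*", ".*")):
        return None
    tokens = line.split()
    if line[0].isspace():
        tokens.insert(0, "")          # column 1 is blank, so there is no name field
    tokens += ["", ""]                # pad so short statements still unpack
    return tokens[0], tokens[1].upper(), tokens[2]

def scan_copy(source, unit_name):
    """Return (copybooks, relations) for the COPY statements of one compilation unit."""
    copybooks, relations = [], []
    for line in source.splitlines():
        stmt = split_statement(line)
        if stmt is None:
            continue
        _, op, operands = stmt
        if op == "COPY" and operands and operands not in copybooks:
            copybooks.append(operands)               # one COPYBOOK entity per member
            relations.append((unit_name, operands))  # compilation unit -> copybook
    return copybooks, relations

SAMPLE_COPY = "\n".join([
    "         COPY  REGEQU",
    "* COPY NOTTHIS",
    "         COPY  CUSTREC",
    "         COPY  REGEQU",
])
```

Because nothing is expanded, the result depends only on the text of the primary program file.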

Not expanding COPY instructions has several consequences, among which are the following:

  • Virtually any of the instructions that trigger actions by the scanner may be buried within copylib abstractions.

Thus, some of the input-output behavior of a program may be missed, or the naming of its entry points, or the declaration or use of its variables. This is not entirely undesirable, as it localizes the analytical result for a compilation unit to its primary program file. Copylib files may be independently processed to derive their entities and relations, which would then be stored with containment relations relative to the COPYBOOK entities. The net result may be information that is more usefully tracked via repository navigation.

  • DSECT declarations terminate (in preprocessed code) with a CSECT, another DSECT, or an END instruction.

The scanner, however, analyzes code that has not been preprocessed. The occurrences of these terminators may be buried in the expansion of a COPY instruction, the consequence of which is that the termination may not be recognized, and subsequent code in the COMPILATION_UNIT may be interpreted as within the DSECT when in fact it is not. (To mitigate this problem, the scanner regards any machine instruction as the termination of a DSECT.)

External Calls

The SD HLASM CA Scanner tracks the use of several conventional mechanisms that allow HLASM programs to make calls to external programs. These are described individually below. To understand the scheme, keep in mind that an HLASM program file is treated as a COMPILATION_UNIT, that a COMPILATION_UNIT has various ENTRY points, and that calls to external programs are made by reference to their ENTRY points. The ultimate connectivity suggested by these relations involves determining the containment relationship between the called ENTRY points and their respective COMPILATION_UNIT entities. These relationships can only be resolved with linker information, which is absent from the source code analysis but might be obtained by the repository by other means.

Instruction Reference

ENTRY, CSECT, RSECT, and START Instructions

For any ENTRY, CSECT, RSECT, or START instruction with a name entry, an ENTRY entity is created bearing that name. These and all subsequent ENTRY entities are related to the COMPILATION_UNIT by a CONTAINS relation. (This is derived by the repository loader by virtue of the ENTRY entities being physically nested within the COMPILATION_UNIT entity.)

The LOAD Instruction

Example:

LOAD EP=CSZIPSC

If the EP parameter has a name as a value, the scanner will record an ENTRY entity of that name and a PRG_LINK relation from the COMPILATION_UNIT in which the LOAD instruction appears to the entity named by the EP setting.

The LINK Instruction

Example:

LINK EP=IDCAMS,PARAM=(IDCAMPRM,DDLIST),VL=1

The LINK macro with an EP setting of a direct name indicates a call to the entry point by that name. The scanner will record an ENTRY entity of that name and a call from the COMPILATION_UNIT in which the LINK instruction appears to the entity named from the EP setting.

The ATTACH Instruction

As with LINK.
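The EP handling described for LOAD, LINK, and ATTACH might look roughly like the Python sketch below. It is an assumption-laden illustration rather than the scanner's code: the names split_statement and scan_ep_calls are hypothetical, only a directly named EP= value is recognized, and continuation lines are ignored.

```python
import re

# EP= followed by a direct name; EPLOC= and register forms do not match.
EP_RE = re.compile(r"(?:^|,)EP=([A-Za-z@#$][A-Za-z0-9@#$]*)(?=,|$)")

def split_statement(line):
    """Split a line into (name, operation, operands); None for blank or comment lines."""
    if not line.strip() or line.startswith(("*", ".*")):
        return None
    tokens = line.split()
    if line[0].isspace():
        tokens.insert(0, "")          # column 1 is blank, so there is no name field
    tokens += ["", ""]                # pad so short statements still unpack
    return tokens[0], tokens[1].upper(), tokens[2]

def scan_ep_calls(source, unit_name):
    """Return (unit, entry) pairs for LOAD/LINK/ATTACH statements with a named EP."""
    links = []
    for line in source.splitlines():
        stmt = split_statement(line)
        if stmt is None:
            continue
        _, op, operands = stmt
        match = EP_RE.search(operands)
        if op in ("LOAD", "LINK", "ATTACH") and match:
            links.append((unit_name, match.group(1)))
    return links

SAMPLE_EP = "\n".join([
    "         LOAD  EP=CSZIPSC",
    "         LINK  EP=IDCAMS,PARAM=(IDCAMPRM,DDLIST),VL=1",
    "         LOAD  EPLOC=PGMNAME",
])
```

The third sample statement yields nothing because its target is named indirectly through EPLOC rather than by an EP setting.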

EXTRN/V

In all discussion below, we record the connection to an external entry point only if the name of the entry point is declared in an EXTRN declaration or used in a Vcon setting, e.g., V(PROGNAME).

The CALL Instruction

The CALL macro performs a call to the entry point named in its first argument. The scanner records an ENTRY entity with the given name and a PRG_LINK relation between the COMPILATION_UNIT and the ENTRY entity.

The XCTL/XCTLX Instructions

Example:

XCTL (2,12),EPLOC=XCTLEP

The XCTL macro names its target with an EPLOC parameter setting. If this is a direct name, the scanner will record the name as an ENTRY entity and a PRG_LINK relation from the current COMPILATION_UNIT to that entity.

The SETAF Instruction

The SETAF macro performs a call to the "function name" or entry point named as the first argument. The named entry point will be recorded as an ENTRY entity, and the call will be represented by a PRG_LINK relation between the current COMPILATION_UNIT and the ENTRY entity.

The SETCF Instruction

The SETCF macro also performs a call to its first argument, but in some use cases this first argument may be an opcode attribute reference or a type attribute reference instead of a character expression (literal name or string). Only calls where the argument is a literal name or string will be recorded, and in the same manner as with SETAF.

BALR, BASR, BAL, BAS, and Other Branching Instructions

Several HLASM branching instructions can be used to perform calls to subroutines and external programs, including BALR (branch and link register), BASR (branch and save register), BAL (branch and link), and BAS (branch and save). For purposes of connectivity analysis, these instructions all have identical form and semantics, varying only in their suitability for different situations and in the treatment of a few special cases. Each takes two arguments: the name of a register to contain the return address, and a destination address, typically represented as a register but also possibly as an explicit entry point name.

In most common use cases, the destination address used for external calls in these branch instructions is computed dynamically, for example via indirect reference. As a static analysis tool, the ALC/HLASM CAScanner does not perform the symbolic execution necessary to resolve these references. Therefore, these branching statements receive no special treatment.

Heuristic Detection of External Calls

Rather than resolving branch targets, the scanner employs a heuristic that allows it to capture not only the directly named transfers (required by contract) but also computed transfers that ultimately resolve to named entry points.

Connections to external programs are suggested by EXTRN instructions and Vcons, e.g., V(FOONAME). As references to external data are handled via the USING mechanism, typically in conjunction with a DSECT, the practical and conventional use of EXTRN and Vcons is limited to resolution of references to external control entry points.

That being the case, the appearance of a name in an EXTRN instruction or a Vcon is sufficient to interpret that name as the target entry point of a program call. As the granularity of linkage desired in the repository is simply that the calling compilation unit calls to the target entry point, the existence of the EXTRN or Vcon for the entry point within the calling program is sufficient to establish that a call will or could take place. Not only is no symbolic execution or data flow analysis required to discover all the candidate external entry points, but the vagaries inherent in such static analysis are avoided, simply by noticing that the declarations of the external entry points exist.

Therefore, the scanner interprets any name appearing in an EXTRN or Vcon within a compilation unit as an external entry point. These are declared as ENTRY entities, and a PRG_LINK relation recording a call from the current COMPILATION_UNIT entity to the ENTRY point is created.
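The heuristic can be illustrated with a short Python sketch. This is a simplified assumption of how such collection could work, not the scanner's actual logic: the names split_statement and scan_external_names are hypothetical, V-type constants with duplication factors or length modifiers are not handled, and continuation lines are ignored.

```python
import re

# A V-type constant at the start of the operand field (as in DC) or a =V(...) literal.
VCON_RE = re.compile(r"(?:^|=)V\(([^)]+)\)")

def split_statement(line):
    """Split a line into (name, operation, operands); None for blank or comment lines."""
    if not line.strip() or line.startswith(("*", ".*")):
        return None
    tokens = line.split()
    if line[0].isspace():
        tokens.insert(0, "")          # column 1 is blank, so there is no name field
    tokens += ["", ""]                # pad so short statements still unpack
    return tokens[0], tokens[1].upper(), tokens[2]

def scan_external_names(source):
    """Collect, in order of first appearance, names declared via EXTRN or a Vcon."""
    names = []
    for line in source.splitlines():
        stmt = split_statement(line)
        if stmt is None:
            continue
        _, op, operands = stmt
        if op == "EXTRN":
            found = operands.split(",")
        else:
            found = [n for m in VCON_RE.finditer(operands) for n in m.group(1).split(",")]
        for name in found:
            if name and name not in names:
                names.append(name)
    return names

SAMPLE_EXT = "\n".join([
    "         EXTRN SUBA,SUBB",
    "         L     15,=V(SUBC)",
    "         BALR  14,15",
    "ADDRS    DC    V(SUBD)",
    "* V(IGNORED) in a comment",
])
```

Note that the BALR statement contributes nothing by itself; the call to SUBC is inferred solely from the Vcon literal, which is the point of the heuristic.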

Macros

Macro definitions are recorded as MACRO entities bearing the name of the macro, and the containment relationship of a macro within a compilation unit is expressed in scanner output by nesting the MACRO entity within the COMPILATION_UNIT entity. Within macro definitions, the information recorded by the scanner includes only data (named DS and DC) and DSECT declarations, nested macro declarations, and nested calls to the COPY macro.
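To illustrate how the name of a macro definition can be found, the Python sketch below relies on the HLASM convention that the prototype statement follows the MACRO header and carries the macro name in its operation field. The names split_statement and scan_macro_definitions are hypothetical, and the sketch ignores continuation lines; it is not the scanner's implementation.

```python
def split_statement(line):
    """Split a line into (name, operation, operands); None for blank or comment lines."""
    if not line.strip() or line.startswith(("*", ".*")):
        return None
    tokens = line.split()
    if line[0].isspace():
        tokens.insert(0, "")          # column 1 is blank, so there is no name field
    tokens += ["", ""]                # pad so short statements still unpack
    return tokens[0], tokens[1].upper(), tokens[2]

def scan_macro_definitions(source):
    """Return the names of the macros defined in the source, in order of appearance."""
    names = []
    expect_prototype = False
    for line in source.splitlines():
        stmt = split_statement(line)
        if stmt is None:
            continue
        _, op, _ = stmt
        if expect_prototype:
            names.append(op)          # the prototype's operation field is the macro name
            expect_prototype = False
        elif op == "MACRO":
            expect_prototype = True   # the next statement is the prototype
    return names

SAMPLE_MACROS = "\n".join([
    "         MACRO",
    "&LBL     SAVEALL &AREA",
    "         STM   14,12,12(13)",
    "         MEND",
    "         MACRO",
    "         NOARGS",
    "         MEND",
])
```

Nested definitions are picked up the same way, since each inner MACRO header is again followed by its own prototype.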

Upon encountering a macro call, the scanner will create a MACRO entity representing the macro, if it does not already exist. It will also create a USES_MACRO relation between the current COMPILATION_UNIT and the MACRO entity.

The CA Repository meta-model will need to be extended to accommodate the requested recording of macro uses by an HLASM program.

Macros will not be expanded during analysis of a compilation unit.

Not expanding macros has several consequences, among which are the following:

  • Virtually any of the instructions which trigger actions by the scanner may be buried within macro abstractions.

Thus, some of the input-output behavior of a program or the naming of its entry points may be missed. This is not entirely undesirable, as it localizes the analytical result for a compilation unit to its primary program file. Moreover, copylib files are typically used to contain macro definitions, and these may be independently processed to derive their entities and relations. The net result may be information that is more usefully tracked via repository navigation.

  • DSECT declarations terminate (in preprocessed code) with a CSECT, another DSECT, or an END instruction.

The scanner, however, analyzes code that has not been preprocessed. The occurrences of these terminators may be buried in the expansion of a macro call, the consequence of which is that the termination may not be recognized, and subsequent code in the COMPILATION_UNIT may be interpreted as within the DSECT when in fact it is not. (To mitigate this problem, the scanner regards any machine instruction as the termination of a DSECT.)

Calls To Stored Procedures

Calls to stored procedures within HLASM programs are assumed to occur only within EXEC SQL passages. These are treated in the same manner as by the COBOL Scanner.

Variables Used Within the Program

Variables are declared in HLASM with DS and DC instructions. These will be treated similarly by the scanner. Only DC and DS instructions with name entries are recorded by the scanner. For any such instruction, the scanner records the named data as a GROUP entity and also a PRG_DATA relation between the current COMPILATION_UNIT entity and the GROUP entity.

The scanner does not currently record the nesting of named or unnamed HLASM data elements within named GROUP entities.

When a reference to a named variable occurs, the scanner creates a REFERENCES_VARIABLE relation between the COMPILATION_UNIT entity and the GROUP entity.

DSECT declarations are treated differently, as they are not actually variable data declarations but rather types. Upon encountering a named DSECT instruction, the scanner creates a RECORD entity. It also creates a CONTAINS_DSECT relation linking the current COMPILATION_UNIT entity and the RECORD entity. Any named DS or DC instruction within the "body" of the DSECT is recorded not as a GROUP entity but as a RECORD_GROUP entity, and a DSECT_CONTAINS relation is created linking the RECORD and the RECORD_GROUP entities. Both the CONTAINS_DSECT and the DSECT_CONTAINS relations are expressed in the scanner output via physical XML containment rather than by separately expressed relation declarations.
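The distinction between GROUP and RECORD_GROUP entities, together with the rule that a machine instruction ends a DSECT, might be sketched in Python as follows. This is a hedged illustration only: the names split_statement, classify_data, and MACHINE_OPS are hypothetical, the opcode set is a tiny sample rather than a real opcode table, fields of unnamed DSECTs are simply skipped (an assumption), and continuation lines are ignored.

```python
# Assumption: a tiny illustrative subset of machine instruction mnemonics.
MACHINE_OPS = {"L", "ST", "LA", "LR", "MVC", "BALR", "BR", "STM", "LM"}

def split_statement(line):
    """Split a line into (name, operation, operands); None for blank or comment lines."""
    if not line.strip() or line.startswith(("*", ".*")):
        return None
    tokens = line.split()
    if line[0].isspace():
        tokens.insert(0, "")          # column 1 is blank, so there is no name field
    tokens += ["", ""]                # pad so short statements still unpack
    return tokens[0], tokens[1].upper(), tokens[2]

def classify_data(source):
    """Return (groups, records): GROUP names, and RECORD name -> RECORD_GROUP names."""
    groups, records = [], {}
    current = None                    # None: outside any DSECT; "": unnamed DSECT
    for line in source.splitlines():
        stmt = split_statement(line)
        if stmt is None:
            continue
        name, op, _ = stmt
        if op == "DSECT":
            current = name
            if name:
                records.setdefault(name, [])
        elif op in ("CSECT", "END") or op in MACHINE_OPS:
            current = None            # explicit terminator, or the machine-instruction rule
        elif op in ("DS", "DC") and name:
            if current is None:
                groups.append(name)
            elif current:
                records[current].append(name)
    return groups, records

SAMPLE_DATA = "\n".join([
    "PROG     CSECT",
    "COUNT    DS    F",
    "CUSTREC  DSECT",
    "CUSTID   DS    CL8",
    "CUSTNAME DS    CL30",
    "         COPY  RESUME",
    "         LA    1,COUNT",
    "TOTAL    DC    F'0'",
    "         END",
])
```

In the sample, the real terminator is imagined to be hidden inside the unexpanded COPY member (RESUME is a made-up name), so it is the LA instruction that closes CUSTREC and lets TOTAL be classified as a GROUP.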

See the discussion on macros regarding an issue identifying the termination of the lexical scope of a DSECT.

Files Used Within the Program

A "file" to an HLASM program is a DDNAME. All DDNAMEs will be identified by the scanner and recorded as entities associated with a program, along with whether the "file" is opened for read, write, or update. See the discussion of File Input/Output above.

Calls to Databases, Including the Type of Access to Tables and Columns Affected

Calls to supported databases are assumed to occur only within EXEC SQL passages and are recorded in the same manner as by the COBOL scanner.