HP Fortran
Release Notes for Tru64 UNIX Systems



1.8.3 Version 5.3 ECO 01 HPF New Features

The following information pertains to HPF using MPI.

Overview of HPF and MPI

The Compaq Fortran compiler now generates code that uses MPI as its message-passing library instead of PSE's HPF-specific support. The compiler provides a choice of three variants of MPI: one for Compaq's SC supercomputer systems, one that supports shared-memory and Memory Channel interconnects, and a public-domain MPI for other interconnects, including Ethernet and FDDI.

It is now possible to write HPF programs that also call or use MPI (such as distributed-memory libraries that invoke MPI). The compiler's MPI runtime library uses its own private MPI communicator, so it does not interfere with other MPI code. A new example program, /usr/examples/hpf/call_mpi.f90, illustrates this.
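The usual pattern is to make the MPI calls from inside an EXTRINSIC (HPF_LOCAL) routine, where each peer runs ordinary local code. The following is only a minimal sketch of that idea, not the shipped call_mpi.f90 example; the routine name is illustrative, and the sketch assumes the caller declares it with an explicit EXTRINSIC (HPF_LOCAL) interface and that the HPF runtime has already initialized MPI (it checks with MPI_INITIALIZED to be safe).


! Sketch only -- not the shipped call_mpi.f90 example.
      EXTRINSIC(HPF_LOCAL) SUBROUTINE report_rank()
      INCLUDE 'mpif.h'
      INTEGER :: rank, ierr
      LOGICAL :: flag

      ! The HPF runtime normally initializes MPI already; check first.
      CALL MPI_INITIALIZED(flag, ierr)
      IF (.NOT. flag) CALL MPI_INIT(ierr)

      ! MPI_COMM_WORLD can be used freely here because the HPF runtime
      ! communicates over its own private communicator.
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      ! Note: output from peers other than peer 0 may be discarded,
      ! depending on the MPI variant (see "Changing HPF Programs for
      ! MPI" later in this section).
      WRITE (*,*) 'Hello from MPI rank ', rank
      END SUBROUTINE report_rank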

You enable the new MPI-based runtime library, which supports Compaq Fortran's HPF directives, by adding the -wsf_target option. This option, which requires an argument, must appear on both the compile and link commands.

Compiling HPF Programs for MPI

You must now specify which variant of MPI support you want for HPF programs by including the -wsf_target option with an MPI selection (the target argument) in the f90 compile command. The following example selects Compaq MPI:


% f90 -wsf 2 -wsf_target cmpi -c lu.f90

The following expanded example invokes both the compiler and the linker:


% f90 -wsf 2 -wsf_target cmpi -o lu lu.f90

The values of target for the -wsf_target target option are as follows:

target  Explanation

smpi    SC (Quadrics) MPI
        This MPI comes installed on SC-series systems. It works with the SC's RMS software, which provides a set of commands for launching MPI jobs, scheduling those jobs on SC clusters, and performing other miscellaneous tasks.

cmpi    Compaq MPI
        This MPI is a version specifically tuned for Alpha systems and is distributed as a Compaq layered product. Compaq MPI supports only Memory Channel clusters and shared-memory (SMP) machines.

gmpi    Generic MPI
        This target is for use with MPICH V1.2.0 or other compatible libraries. MPICH is a public-domain implementation of the MPI specification, available for many platforms from http://www-unix.mcs.anl.gov/mpi/mpich/. MPICH V1.2.0 supports many interconnection networks, including Ethernet, FDDI, and other hardware. Using Compaq Fortran and HPF with this MPI is not officially supported, and Compaq does not guarantee to fix problems caused by specifying -wsf_target gmpi; however, Compaq remains interested in receiving problem reports and will attempt to respond to them.

If the command to the f90 compiler includes -wsf_target target, then the command must also include -wsf.

Instead of using the -wsf_target option, you can specify the MPI variant to the compiler by setting the environment variable DECF90_WSF_TARGET to one of the values in the first column of the previous table. For example, the command


% f90 -wsf 2 -wsf_target cmpi -c lu.f90

is equivalent to the commands


% setenv DECF90_WSF_TARGET cmpi
% f90 -wsf 2 -c lu.f90

If an f90 command contains -wsf_target with a value (such as cmpi) and the environment variable DECF90_WSF_TARGET is set to a different value, the value on the f90 command line overrides the environment variable.

Using the environment variable to select the desired MPI variant is the recommended method: it requires the fewest changes to existing scripts for building HPF programs and makes it easier to generate code for more than one MPI variant. Compaq also recommends setting the environment variable in your shell initialization file (for example, .cshrc if you use csh), particularly if you usually use only one MPI variant.

The following table shows all changes to HPF-related compiler options between Fortran V5.3 and V5.3 ECO 01:

Fortran V5.3          Fortran V5.3 ECO 01
-assume bigarrays     No change
-assume nozsize       No change
-hpf_matmul           Deleted
-nearest_neighbor     No change
-nowsf_main           No change (but currently does not work)
-pprof                Use only with -wsf_target pse
-show hpf*            No change
-show wsfinfo         No change
-wsf                  No change
(none)                -wsf_target target (new option)

Linking HPF Programs with MPI

You must also specify which variant of MPI support you want for HPF programs by including the -wsf_target option with an MPI selection (the target argument) in the link command. For example:


% f90 -wsf 2 -wsf_target cmpi -o lu lu.o

The values of target come from the table in the section "Compiling HPF Programs for MPI".

If you specified generic MPI at compilation time, either by including the -wsf_target gmpi option or by setting the environment variable DECF90_WSF_TARGET to gmpi, you must specify a path to the desired generic MPI library during linking. Do this in one of these ways:

The following is an example of a link command for a generic MPI library:


% f90 -wsf 2 -wsf_target gmpi -o lu lu.o /usr/users/me/libmpich.a

In addition, the Developer's Tool Kit software must be installed on your computer to link properly with the -wsf_target gmpi option.

Finally, programs linked with -wsf_target and an MPI target must be linked with -call_shared (which is the default); the -non_shared option does not link correctly.

Running HPF Programs Linked with MPI

The dmpirun command executes program files created with the -wsf_target cmpi option. Include the -n n option on the command line, where n is the value given with -wsf n on the compilation command line; if no value was given with -wsf, set n to the desired number of peers. Also include the name of the program file.

The following example assumes that the compilation command line included -wsf 4 and that the program file is named heat8:


% dmpirun -n 4 heat8

If your AlphaServer SC system is running with Revision A of the Quadrics switch, your boot log will contain the message:


  elan0: Rev A Elite network detected - disabling adaptive routing (1) 

To make MPI programs (including HPF programs generated with the "-wsf_target smpi" option) run properly with Revision A hardware, you need to set the LIBELAN_GROUP_HWBCAST environment variable to DISABLE; for example, from csh:


% setenv LIBELAN_GROUP_HWBCAST DISABLE

The dmpirun manpage contains a full description of this command.

The prun command executes program files created with the -wsf_target smpi option. Include the -n n option on the command line, where n is the value given with -wsf n on the compilation command line; if no value was given with -wsf, set n to the desired number of peers. Also include the name of the program file.

The following example assumes that the compilation command line included -wsf 4 and that the program file is named heat8:


% prun -n 4 -N 4 heat8

The mpirun command executes program files created with the -wsf_target gmpi option. Include the -np n option on the command line, where n is the value given with -wsf n on the compilation command line. Also include the name of the program file. The location of the mpirun command varies according to where you installed the generic MPI.

The following example assumes that the compilation command line included -wsf 4 and that the program file is named heat8:


% /usr/users/me/mpirun -np 4 heat8

The /usr/examples/hpf directory contains a sample script, hpfrun, that launches an HPF program for any variant of MPI. The script even determines the number of processors a source program was compiled for (if that was specified at compile time) and invokes the proper MPI run command with that number of processors. Portions of the script, or the entire script, may be useful if you are automating the building and running of HPF programs.

Cleaning up After Running HPF Programs Linked with MPI

Execution of the dmpirun command (but not the prun and mpirun commands) may leave various system resources allocated after the program has completed. To free them, run the mpiclean command with no arguments:


% mpiclean

Changing HPF Programs for MPI

There are two changes you should make to Fortran source files before compiling them for MPI. First, if a module contains an EXTRINSIC (HPF_LOCAL) procedure and it executes on a peer other than peer 0, output intended for stdout may (depending on the variant of MPI used) go to /dev/null instead. Change such modules, or your execution commands, so that the extrinsic subroutine performs input/output only from peer 0.
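For example (a minimal sketch, assuming that MY_PROCESSOR() from HPF_LOCAL_LIBRARY numbers peers starting at 0 and that the caller declares the routine with an explicit EXTRINSIC (HPF_LOCAL) interface), a local routine can guard its output so that only peer 0 writes to stdout:


      EXTRINSIC(HPF_LOCAL) SUBROUTINE report_progress(step)
      USE HPF_LOCAL_LIBRARY            ! provides MY_PROCESSOR()
      INTEGER, INTENT(IN) :: step

      ! Only peer 0 writes to stdout; output from other peers may be
      ! discarded (sent to /dev/null), depending on the MPI variant.
      IF (MY_PROCESSOR() == 0) THEN
         WRITE (*,*) 'Completed step ', step
      END IF
      END SUBROUTINE report_progress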

In addition, the ability to call parallel HPF subprograms from non-parallel (Fortran or non-Fortran) main programs is not supported in this release. For more information, see Chapter 6 of the DIGITAL High Performance Fortran 90 HPF and PSE Manual.

1.8.4 Version 5.3 New Features

The following new Compaq Fortran features are now supported:

1.8.5 Version 5.3 Important Information

Some important information to note about this release:

1.8.6 Version 5.3 Corrections

From version X5.2-829-4296F ECO 01 to FT1 T5.3-860-4498G, the following corrections have been made:

From version FT1 T5.3-860-4498G to FT2 T5.3-893-4499U, the following corrections have been made:

From version FT2 T5.3-893-4499U to V5.3-915-449BB, the following corrections have been made:

1.8.7 HPF in Compaq Fortran Version 5.3

As in Fortran 90 Version 5.2, the HPFLIBS subset replaces the old PSESHPF subset. If you previously installed the PSESHPF subset, you do not need to delete it. If you choose to delete it, do so before you install the Fortran 90 V5.3 HPFLIBS170 subset. If you delete the PSESHPF subset after you install the Fortran HPFLIBS170 subset, you must delete the HPFLIBS170 subset and then reinstall it. For information on using the setld command to check for and delete subsets, see the Compaq Fortran Installation Guide for Tru64 UNIX Systems.

To execute HPF programs compiled with the -wsf switch, you must have both PSE160 and Fortran 90 Version 5.3 with the HPFLIBS170 subset installed. For this release the installation order is important: install PSE160 first, then install Fortran 90 Version 5.3 with the HPFLIBS170 subset. The HPFLIBS170 subset must be installed last.

If you also need to use the latest versions of MPI and PVM, you must install PSE180. PSE180 contains only MPI and PVM support; support for HPF programs compiled with the -wsf option is found only in PSE160. Therefore you must install both versions of PSE, and you must install PSE180 after PSE160.

To install Compaq Fortran with HPF, MPI, and PVM, install the components in the following order. The order is very important.

  1. Delete any old versions that you wish to delete.
  2. Install PSE160.
  3. Install Compaq Fortran Version 5.3 including the HPFLIBS170 subset.
  4. Install PSE180.

The HPF runtime libraries in Compaq Fortran Version 5.3 are compatible only with PSE Version 1.6. Programs compiled with this version will not run correctly with older versions of PSE. In addition, code compiled with older compilers will no longer run correctly when linked with code compiled with this version. Relinking is not sufficient; programs must be recompiled and relinked.

If you cannot install these in the order described, follow these directions to correct the installation:

For more information about installing PSE160, see the Compaq Parallel Software Environment Release Notes, Version 1.6.

For more information about installing PSE180, see the Compaq Parallel Software Environment Release Notes, Version 1.8.

1.8.8 Version 5.3 Known Problems

The following known problems exist with Compaq Fortran Version 5.3:

1.9 New Features, Corrections, and Known Problems in Version 5.2

Version 5.2 is a minor release that includes corrections to problems discovered since Version 5.1 was released and certain new features.

The following topics are discussed:

1.9.1 Version 5.2 ECO 01 New Features

The following new Compaq Fortran (DIGITAL Fortran 90) features are now supported:

Some important information to note about this release:

From version V5.2-705-428BH to X5.2-829-4296F, the following corrections have been made:

1.9.2 Version 5.2 New Features

Version 5.2 supports the following new features:

1.9.3 Version 5.2 Important Information

Some important information to note about this release:

1.9.4 Version 5.2 Corrections

From version V5.1-594-3882K to FT1 T5.2-682-4289P, the following corrections have been made:

From version FT1 T5.2-682-4289P to FT2 T5.2-695-428AU, the following corrections have been made:

From version FT2 T5.2-695-428AU to V5.2-705-428BH, the following corrections have been made:

1.10 High Performance Fortran (HPF) Support in Version 5.2

Compaq Fortran (DIGITAL Fortran 90) Version 5.2 supports the entire High Performance Fortran (HPF) Version 2.0 specification with the following exceptions:

In addition, the compiler supports many HPF Version 2.0 approved extensions including:

1.10.1 Optimization

This section contains release notes relevant to increasing code performance. You should also refer to Chapter 7 of the DIGITAL High Performance Fortran 90 HPF and PSE Manual for more detail.

1.10.1.1 The -fast Compile-Time Option

To get optimal performance from the compiler, use the -fast option if possible.

Use of the -fast option is not permitted in certain cases, such as programs with zero-sized data objects or with very small nearest-neighbor arrays.

For More Information:

1.10.1.2 Non-Parallel Execution of Code

The following constructs are not handled in parallel:

If an expression contains a non-parallel construct, the entire statement containing the expression is executed in a non-parallel fashion. Using such constructs can degrade performance. Compaq recommends avoiding constructs to which the above conditions apply in the computationally intensive kernel of a routine or program.

1.10.1.3 INDEPENDENT DO Loops Currently Parallelized

Not all INDEPENDENT DO loops are currently parallelized. Use the -show hpf or -show hpf_indep compile-time option, which reports a message whenever a loop marked INDEPENDENT is not parallelized.

Currently, a nest of INDEPENDENT DO loops is parallelized whenever the following conditions are met:

When the entire loop nest is encapsulated in an ON HOME RESIDENT region, only the first two restrictions apply.

For More Information:

1.10.1.4 Nearest-Neighbor Optimization

The following is a list of conditions that must be satisfied in an array assignment, FORALL statement, or INDEPENDENT DO loop in order to take advantage of the nearest-neighbor optimization:

Compile with the -show hpf or -show hpf_nearest switch to see which lines are treated as nearest-neighbor.
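As an illustration, the following sketch (array names and sizes are illustrative) shows the kind of statement the optimization targets: a BLOCK-distributed array updated from elements at most one position away in each dimension. Use the -show hpf_nearest output to confirm which lines the compiler actually treats as nearest-neighbor.


      PROGRAM jacobi_sketch
      INTEGER, PARAMETER :: n = 512
      INTEGER :: i, j
      REAL, DIMENSION(n,n) :: u, unew
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: u, unew

      u = 0.0
      u(1,:) = 1.0                  ! boundary condition

      ! Each interior element is updated from its four immediate
      ! neighbors -- a classic nearest-neighbor (stencil) computation.
      FORALL (i=2:n-1, j=2:n-1)  unew(i,j) =                           &
         0.25 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))

      PRINT *, unew(n/2, n/2)
      END PROGRAM jacobi_sketch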

Nearest-neighbor communications are not profiled by the pprof profiler. See the section about the pprof Profile Analysis Tool in the Parallel Software Environment (PSE) Version 1.6 release notes.

For More Information:

1.10.1.5 Widths Given with the SHADOW Directive Should Agree with Automatically Generated Widths

When compiler-determined shadow widths do not agree with the widths given in the SHADOW directive, less efficient code is usually generated.

To avoid this problem, create a version of your program without the SHADOW directive and compile it with the -show hpf or -show hpf_nearest option. The compiler will generate messages that include the sizes of the compiler-determined shadow widths. Make sure that any widths you specify with the SHADOW directive match the compiler-generated widths.
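For example, in the following sketch (illustrative names; verify the widths against the -show output for your own program), the kernel reads one element on each side of a, so a SHADOW width of 1 per side should agree with what the compiler determines:


      PROGRAM shadow_sketch
      INTEGER, PARAMETER :: n = 1000
      INTEGER :: i
      REAL, DIMENSION(n) :: a, b
!HPF$ DISTRIBUTE (BLOCK) :: a, b
      ! The kernel below reads a(i-1) and a(i+1), so one shadow element
      ! on each side matches the compiler-determined widths.
!HPF$ SHADOW a(1:1)

      a = 1.0
      b = 0.0
      FORALL (i=2:n-1)  b(i) = (a(i-1) + a(i+1)) / 2.0
      PRINT *, b(n/2)
      END PROGRAM shadow_sketch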

1.10.1.6 Using EOSHIFT Intrinsic for Nearest Neighbor Calculations

In the current version, the compiler does not always recognize nearest-neighbor calculations coded using EOSHIFT. Also, EOSHIFT is sometimes converted into a series of statements, only some of which may be eligible for the nearest-neighbor optimization.

To avoid these problems, Compaq recommends using CSHIFT or FORALL instead of EOSHIFT if these alternatives meet the needs of your program.
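For instance, an interior stencil written with EOSHIFT can often be rewritten as a FORALL (or, for periodic boundaries, with CSHIFT). The sketch below is illustrative only; note that the boundary handling of the FORALL form differs from the zero-fill behavior of EOSHIFT.


      PROGRAM shift_sketch
      INTEGER, PARAMETER :: n = 512
      INTEGER :: i
      REAL, DIMENSION(n) :: a, b
!HPF$ DISTRIBUTE (BLOCK) :: a, b

      a = 1.0
      b = 0.0

      ! EOSHIFT form; may not be recognized as nearest-neighbor:
      !    b = 0.5 * (EOSHIFT(a, -1) + EOSHIFT(a, 1))

      ! FORALL form over the interior, which the compiler can treat
      ! as a nearest-neighbor computation:
      FORALL (i=2:n-1)  b(i) = 0.5 * (a(i-1) + a(i+1))

      PRINT *, SUM(b)
      END PROGRAM shift_sketch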

1.10.2 New Features

This section describes the new HPF features in this release of Compaq Fortran.

1.10.2.1 RANDOM_NUMBER Executes in Parallel

The RANDOM_NUMBER intrinsic subroutine now executes in parallel for mapped data. The result is a significant decrease in execution time.
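For example, filling a mapped array now runs in parallel (a minimal sketch; the names and distribution shown are illustrative):


      PROGRAM random_sketch
      INTEGER, PARAMETER :: n = 1000000
      REAL, DIMENSION(n) :: x
!HPF$ DISTRIBUTE x(BLOCK)

      ! RANDOM_NUMBER now executes in parallel for mapped data such
      ! as the BLOCK-distributed array x.
      CALL RANDOM_NUMBER(x)
      PRINT *, 'mean = ', SUM(x) / n
      END PROGRAM random_sketch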

1.10.2.2 Improved Performance of TRANSPOSE Intrinsic

The TRANSPOSE intrinsic will execute faster for most arrays that are mapped either * or BLOCK in all dimensions.
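For example (a sketch; BLOCK in all dimensions is one of the mappings covered by this improvement):


      PROGRAM transpose_sketch
      INTEGER, PARAMETER :: n = 1024
      REAL, DIMENSION(n,n) :: a, b
!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: a, b

      a = 1.0
      ! Both a and b are BLOCK-mapped in all dimensions, so this
      ! TRANSPOSE benefits from the faster implementation.
      b = TRANSPOSE(a)
      PRINT *, b(1,n)
      END PROGRAM transpose_sketch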

1.10.2.3 Improved Performance of DO Loops Marked as INDEPENDENT

Certain induction variables are now recognized as affine functions of the INDEPENDENT DO loop indices, thus meeting the requirements listed in Section 1.10.1.3. The compiler can now parallelize array references that contain such variables as subscripts. An example follows:


!     Compiler now recognizes a loop as INDEPENDENT because it 
!        knows that variable k1 is k+1. 
      PROGRAM gauss 
      INTEGER, PARAMETER    :: n = 1024 
      REAL, DIMENSION (n,n) :: A 
      !HPF$ DISTRIBUTE A(*,CYCLIC) 
 
      DO k = 1, n-1 
         k1 = k+1 
         !HPF$ INDEPENDENT, NEW(i) 
         DO j = k1, n 
            DO i = k1, n 
               A(i,j) = A(i,j) - A(i,k) * A(k,j) 
            ENDDO 
         ENDDO 
      ENDDO 
      END PROGRAM gauss 

1.10.3 Corrections

This section lists problems in previous versions that have been fixed in this version.

1.10.4 Known Problems

1.10.4.1 "Variable used before its value has been defined" Warning

The compiler may inappropriately issue a "Variable is used before its value has been defined" warning. If the variable named in the warning does not appear in your program (e.g. var$0354), you should ignore the warning.

1.10.4.2 Mask Expressions Referencing Multiple FORALL Indices

FORALL statements containing mask expressions referencing more than seven FORALL indices do not work properly.

1.10.5 Unsupported Features

This section lists unsupported features in this release of Compaq Fortran.

1.10.5.1 SMP Decomposition (OpenMP) not Currently Compatible with HPF

Manual decomposition directives for SMP (such as the OpenMP directives enabled with the -omp option, or the directives enabled with the -mp option) are not currently compatible with the -wsf option.

1.10.5.2 Command Line Options not Compatible with the -wsf Option

The following command line options may not be used with the -wsf option:

1.10.5.3 HPF_LOCAL Routines

Arguments passed to HPF_LOCAL procedures cannot be distributed CYCLIC(n). Furthermore, they can have neither the INHERIT attribute nor a transcriptive distribution.
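For illustration, a caller-side interface that stays within these restrictions might give each distributed dummy a prescriptive BLOCK mapping. The following fragment is a sketch only (the routine and argument names are hypothetical), not a complete program:


      INTERFACE
         EXTRINSIC(HPF_LOCAL) SUBROUTINE local_work(x)
            REAL, DIMENSION(:) :: x
            ! A prescriptive BLOCK mapping; CYCLIC(n), the INHERIT
            ! attribute, and transcriptive mappings are not allowed here.
            !HPF$ DISTRIBUTE x(BLOCK)
         END SUBROUTINE local_work
      END INTERFACE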

Also, the following procedures in the HPF Local Routine Library are not supported in the current release:

1.10.5.4 SORT_UP and SORT_DOWN Functions

The SORT_UP and SORT_DOWN HPF library procedures are not supported. Instead, use GRADE_UP and GRADE_DOWN, respectively.
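For example, GRADE_UP returns the permutation that sorts an array into ascending order; applying that permutation gives the result SORT_UP would have produced (a minimal sketch with illustrative names):


      PROGRAM grade_sketch
      USE HPF_LIBRARY                  ! provides GRADE_UP and GRADE_DOWN
      INTEGER, PARAMETER :: n = 8
      REAL,    DIMENSION(n) :: x = (/ 3., 1., 4., 1., 5., 9., 2., 6. /)
      INTEGER, DIMENSION(n) :: order
      REAL,    DIMENSION(n) :: sorted

      ! GRADE_UP yields the sorting permutation; x(order) is the
      ! ascending sequence that SORT_UP would have returned.
      order  = GRADE_UP(x, DIM=1)
      sorted = x(order)
      PRINT '(8F5.1)', sorted
      END PROGRAM grade_sketch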

1.10.5.5 Restricted Definition of PURE

In addition to the restrictions on PURE functions listed in the Fortran 95 language standard and in the High Performance Fortran Language Specification, Compaq Fortran adds the restriction that PURE functions must be resident. "Resident" means that the function can execute on each processor without reading or writing any data that is not local to that processor.

Non-resident PURE functions are not handled. They will probably cause failure of the executable at run-time if used in FORALLs or in INDEPENDENT DO loops.

1.10.5.6 Restrictions on Procedure Calls in INDEPENDENT DO and FORALL

In order to execute in parallel, procedure calls from FORALL and DO INDEPENDENT constructs must be resident. "Resident" means that the function can execute on each processor without reading or writing any data that is not local to that processor. The compiler requires an explicit assertion that all procedure calls are resident. You can make this assertion in one of two ways:

  1. by labeling every procedure called by the FORALL or INDEPENDENT DO loop as PURE
  2. by encapsulating the entire body of the loop in an ON HOME RESIDENT region.

Because of the restricted definition of PURE in Compaq Fortran (see Section 1.10.5.5), the compiler interprets PURE as an assertion by the program that a procedure is resident.

Unlike procedures called from inside FORALLs, procedures called from inside INDEPENDENT DO loops are not required to be PURE. To assert to the compiler that any non-PURE procedures called from the loop are resident, you can encapsulate the entire body of the loop in an ON HOME RESIDENT region.

If you incorrectly assert that a procedure is resident (using either PURE or ON HOME RESIDENT), the program will either fail at run time or produce incorrect results.

Here is an example of an INDEPENDENT DO loop containing an ON HOME RESIDENT directive and a procedure call:


!HPF$ INDEPENDENT 
DO i = 1, 10 
   !HPF$ ON HOME (B(i)), RESIDENT  BEGIN 
   A(i) = addone(B(i)) 
   !HPF$ END ON 
END DO 
. 
. 
. 
 
CONTAINS 
  FUNCTION addone(x) 
    INTEGER, INTENT(IN) :: x 
    INTEGER addone 
    addone = x + 1 
  END FUNCTION addone 

The ON HOME RESIDENT region does not impose any syntactic restrictions. It is merely an assertion that inter-processor communication will not actually be required at run time.

For More Information:

1.10.5.7 Restrictions on Routines Compiled with -nowsf_main

The following are restrictions on dummy arguments to routines compiled with the -nowsf_main compile-time option:

Failure to adhere to these restrictions may result in program failure, or incorrect program results.

1.10.5.8 RAN and SECNDS Are Not PURE

The intrinsic functions RAN and SECNDS are serialized (not executed in parallel). As a result, they are not PURE functions, and cannot be used within a FORALL construct or statement.
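If you need random values inside a FORALL, one work-around (a sketch, not a drop-in replacement for RAN's generator) is to fill a mapped array with the RANDOM_NUMBER intrinsic subroutine first, which does execute in parallel (see Section 1.10.2.1), and reference that array from the FORALL:


      PROGRAM ran_workaround
      INTEGER, PARAMETER :: n = 100000
      INTEGER :: i
      REAL, DIMENSION(n) :: r, y
!HPF$ DISTRIBUTE (BLOCK) :: r, y

      ! Generate the random values up front with RANDOM_NUMBER (which
      ! runs in parallel for mapped data) rather than calling RAN,
      ! which is not PURE, from inside the FORALL.
      CALL RANDOM_NUMBER(r)
      FORALL (i=1:n)  y(i) = 2.0 * r(i) - 1.0
      PRINT *, SUM(y) / n
      END PROGRAM ran_workaround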

1.10.5.9 Nonadvancing I/O on stdin and stdout

Nonadvancing I/O does not work correctly on stdin and stdout. For example, the following program is supposed to print a prompt ending with a colon and keep the cursor on that line. Unfortunately, the prompt does not appear until after the input is entered.


PROGRAM SIMPLE 
 
 
        INTEGER STOCKPRICE 
 
        WRITE (6,'(A)',ADVANCE='NO') 'Stock price1   : ' 
        READ  (5, *) STOCKPRICE 
 
        WRITE (6,200) 'The number you entered was ', STOCKPRICE 
200     FORMAT(A,I) 
 
 
END PROGRAM SIMPLE 

The work-around for this bug is to insert a CLOSE statement after the WRITE to stdout. This effectively flushes the buffer.


PROGRAM SIMPLE 
 
 
        INTEGER STOCKPRICE 
 
        WRITE (6,'(A)',ADVANCE='NO') 'Stock price1   : ' 
        CLOSE (6)                        ! Add close to get around bug 
        READ  (5, *) STOCKPRICE 
 
        WRITE (6,200) 'The number you entered was ', STOCKPRICE 
200     FORMAT(A,I) 
 
 
END PROGRAM SIMPLE 

1.10.5.10 WHERE and Nested FORALL

The following statements are not currently supported:

When nested DO loops are converted into FORALLs, nesting is ordinarily not necessary. For example,


DO x=1, 6 
  DO y=1, 6 
    A(x, y) = B(x) + C(y) 
  END DO 
END DO 

can be converted into


FORALL (x=1:6, y=1:6)  A(x, y) = B(x) + C(y) 

In this example, both indices (x and y) can be defined in a single FORALL statement that produces the same result as the nested DO loops.

In general, nested FORALLs are required only when the outer index is used in the definition of the inner index. For example, consider the following DO loop nest, which adds 3 to the elements in the upper triangle of a 6 x 6 array:


DO x=1, 6 
  DO y=x, 6 
    A(x, y) = A(x, y) + 3 
  END DO 
END DO 

In Fortran 95/90, this DO loop nest can be replaced with the following nest of FORALL constructs:


FORALL (x=1:6) 
  FORALL (y=x:6) 
    A(x, y) = A(x, y) + 3 
  END FORALL 
END FORALL 

However, nested FORALL is not currently supported in parallel (i.e. with the -wsf option).

A work-around is to use the INDEPENDENT directive:


      integer, parameter :: n=6 
      integer, dimension (n,n) :: A 
!hpf$ distribute A(block,block) 
 
      A = 8 
 
      !hpf$ independent, new(i) 
      do j=1,n 
         !hpf$ independent 
         do i=j,n 
            A(i,j) = A(i,j) + 3 
         end do 
      end do 
 
      print "(6i3)", A 
 
      end 

All three of these code fragments would convert a matrix like this:
     8  8  8  8  8  8
     8  8  8  8  8  8
     8  8  8  8  8  8
     8  8  8  8  8  8
     8  8  8  8  8  8
     8  8  8  8  8  8

into this matrix:
    11 11 11 11 11 11
     8 11 11 11 11 11
     8  8 11 11 11 11
     8  8  8 11 11 11
     8  8  8  8 11 11
     8  8  8  8  8 11

