6. Output Files & Postprocessing

This section covers the syntax for the statements that define output.

It also explains a great deal about ADB's "postprocessing" cycle, which can consist of:

  • Sorting and/or Duplicate Removal

  • Applying an Awk Script

  • Conversion to commas-n'-quotes

Feel free to mail your comments to me at: rog@NOSPAM_rs-freeware.org.



1:  Basic Output File Syntax

2:  Output File Postprocessing Sequence

3:  Output files with Awk and CQ (& multiple output files)

4:  Complete Output File Syntax

5:  Checking file definitions with the "-P" parm


1:  Basic Output File Syntax

In the simplest case, you code an output file statement as follows:

M&T  fieldset  { sort_fieldset  {NODUPS} }  {AWK}  {CQ}  END

The "M&T" here is for records that are in both the mast and trans input files (i.e. they're equal based on the key).  Note that if the mast key is shorter than the trans key, then only the shorter length will be used for the case-INsensitive key comparison.
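A minimal sketch of that key rule, in Python and purely for illustration (this is not ADB's code, and the function name is made up).  It assumes the comparison runs over the leading characters of each key, and that the same rule would apply if the trans key were the shorter one:

```python
def keys_match(mast_key: str, trans_key: str) -> bool:
    """Compare two keys case-insensitively over the shorter key's length."""
    n = min(len(mast_key), len(trans_key))
    # Assumption: only the leading n characters of each key take part.
    return mast_key[:n].lower() == trans_key[:n].lower()
```

With this rule, a mast key of "ABC" matches a trans key of "abcde", because only the first three characters are compared.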

Instead of "M&T", you can also code "M-T" for master records that have no corresponding trans record.  This works even if there is no trans file defined on the command line (with the "TRANS" parm).  "T-M" works exactly the same way, but for transactions that aren't matched by masters.
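The three statement names read like set operations, and that's a handy way to picture them.  The following Python sketch is only an illustration under simplifying assumptions (keys of equal length, no duplicate keys); the function name is hypothetical and this is not how ADB is implemented:

```python
def classify(mast_keys, trans_keys):
    """Split keys into the three output categories, ignoring case."""
    mast = {k.lower() for k in mast_keys}
    trans = {k.lower() for k in trans_keys}
    return {
        "M&T": sorted(mast & trans),  # in both master and trans
        "M-T": sorted(mast - trans),  # masters with no matching trans
        "T-M": sorted(trans - mast),  # trans with no matching master
    }
```

Passing an empty trans list puts every master key under "M-T", which mirrors the case where no trans file is defined.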

The first fieldset specifies the file layout for the output file, and is obviously mandatory.

The curly braces around the sort fieldset (and the optional "NODUPS") specify that this sequence is optional.  Obviously you can't code "NODUPS" without a sort fieldset.

"AWK" and "CQ" mean the same things as they do for input files, but the postprocessing sequence (described next) is essentially the reverse of the preprocessing sequence.


As with input files, the command line parms correspond precisely to the statements.  To specify the output file name for (e.g.) the records resulting from each matched pair of master and trans records, you code "-M&Tfilename" on the command line.  To specify the output file name for unmatched masters, you code "-M-Tfilename" on the command line.  And for unmatched trans, the "-T-M" command-line parm specifies the output file name in the same way.

As with input files, the extension (the part after the dot) must be equal to your project file suffix.  You may specify it if you wish, but ADB will add it for you if it's not there.
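A small Python sketch of the extension rule, for illustration only.  The function name is hypothetical, and two points are assumptions rather than documented ADB behavior: that the extension is compared without regard to case, and that a mismatched extension is treated as an error:

```python
def output_file_name(parm_value: str, suffix: str) -> str:
    """Return the file name, adding the project suffix if none was coded."""
    base, dot, ext = parm_value.partition(".")
    if not dot:
        # No extension coded: add the project file suffix.
        return f"{parm_value}.{suffix}"
    if ext.lower() != suffix.lower():
        # Assumption: a different extension is rejected.
        raise ValueError(f"extension must be .{suffix}: {parm_value}")
    return parm_value
```

So both "whatever" and "whatever.xx" resolve to "whatever.xx" when the project file suffix is "xx".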


I personally recommend that you never code the extension (the dot and what's after it) . . . for the simple reason that this makes it harder to "clone" DOS batch files that are intended for one project.


2:  Output File Postprocessing Sequence

Output files are initially built from matching the two input files.

If you've specified a sorting fieldset, then the results are sorted (case-INsensitively) according to that fieldset.  If you've asked for "NODUPS", then all but the first of each set of duplicates in the file will be removed.
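Here is a rough Python sketch of the sort step.  It is not ADB's implementation; the names are made up, and it assumes that "duplicate" means "same sort key, ignoring case" and that the first record in the original order is the one that survives:

```python
def sort_records(records, key_of, nodups=False):
    """Sort case-insensitively by key; optionally keep one record per key."""
    # sorted() is stable, so records with equal keys keep their original order.
    ordered = sorted(records, key=lambda r: key_of(r).lower())
    if not nodups:
        return ordered
    kept, last_key = [], None
    for rec in ordered:
        k = key_of(rec).lower()
        if k != last_key:  # first record seen for this key
            kept.append(rec)
            last_key = k
    return kept
```

For example, sorting "b1", "A2", "a1" on the first character gives "A2", "a1", "b1", and with nodups=True only "A2" and "b1" remain.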

After any sorting step, your AWK script will be run (if you've coded "AWK").  As with input files, the name of the AWK script is the same as the output file name (everything before the period), followed by the project file suffix, and an "A".

For example, if you code -M-Twhatever on the command line, and your project file suffix is "xx", then the output file name corresponding to unmatched masters will be "whatever.xx", and the Awk script that will be used to postprocess this file will be called "whatever.xxA" (where the "A" is a constant).
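The naming rule in that example can be written down in a couple of lines of Python (an illustrative sketch only; the function name is hypothetical):

```python
def awk_script_name(output_parm: str, suffix: str) -> str:
    """Build the Awk script name: base name, project suffix, constant 'A'."""
    base = output_parm.split(".", 1)[0]  # everything before the period
    return f"{base}.{suffix}A"
```

Both awk_script_name("whatever", "xx") and awk_script_name("whatever.xx", "xx") give "whatever.xxA".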

If you've specified "CQ", that will be the last step in output file postprocessing.


It's interesting to note that output file postprocessing is almost the reverse of input file preprocessing:


Input files:                      Output files:
------------                      -------------
Commas-n-quotes conversion        Sorting &/or dup removal
Basic preprocessing               Awk Script
Sorting &/or dup. removal         Commas-n-quotes conversion
Awk script
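The output-file column of the table can be expressed as a small Python sketch that lists the steps in the order they run.  This is illustration only: the function and step names are hypothetical, and raising an error for "NODUPS" without a sort fieldset is an assumption about how that rule would be enforced:

```python
def postprocess_steps(sort_fieldset=False, nodups=False, awk=False, cq=False):
    """Return the output postprocessing steps in the order they are applied."""
    if nodups and not sort_fieldset:
        # Assumption: NODUPS without a sort fieldset is rejected.
        raise ValueError("NODUPS requires a sort fieldset")
    steps = []
    if sort_fieldset:
        steps.append("sort + remove duplicates" if nodups else "sort")
    if awk:
        steps.append("awk script")
    if cq:
        steps.append("commas-n'-quotes conversion")  # always the last step
    return steps
```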


3:  Output files with Awk and CQ (& multiple output files)

If you've coded both AWK and CQ with output files, then you need to make sure that your AWK script doesn't disturb the file layout.

I'll discuss this in more detail in the Awk sections, but the bottom line is that the commas-n'-quotes conversion will rely upon the fact that your output file is in its specified fixed-length format.


However, if you aren't using commas-n'-quotes conversion, your Awk script can output any file format that it wishes.

This is precisely how you can generate multiple output files.

For example, if you want to separate the output records into two different groups and write two different reports, you prefix each of the different output record types with a unique prefix.  E.g., group 1 records get a prefix of "1", group 2 records a prefix of "2", and the records for the two reports get prefixes of "3" and "4" (respectively).

You can then run the output files through the utility "FileMux.Exe," which will split them into separate files (and remove the file type prefix).

I'll be showing you precisely how to do this in the extended example.
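In the meantime, the idea behind the split can be sketched in Python.  This is NOT FileMux.Exe and says nothing about its actual options; it's a made-up function that assumes a one-character record type prefix, just to show the principle:

```python
from collections import defaultdict

def split_by_prefix(lines, prefix_len=1):
    """Group lines by their leading prefix and strip that prefix off."""
    groups = defaultdict(list)
    for line in lines:
        groups[line[:prefix_len]].append(line[prefix_len:])
    return dict(groups)
```

Given the lines "1alpha", "2beta", and "1gamma", this returns "alpha" and "gamma" under "1", and "beta" under "2".  Each group could then be written to its own file.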


4:  Complete Output File Syntax

Normally, ADB will compare the input file(s)' fieldsets to the output file fieldset.

If a field name is in the output file fieldset and in exactly one of the input file(s)' fieldsets, ADB will automatically do the "obvious" (i.e. recognize that the field name in the output file fieldset corresponds to the field name in the input file fieldset).

What if you're matching a master and a trans record, and both contain a field that you wish to place in the output file?

Obviously, ADB has to have some way of knowing whether this field is to be taken from the master or the trans input.

Similarly, if you want to put a constant value in the output record or create an output record with empty fields in it, then you need to tell ADB what to put in those fields.
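The decision described above can be sketched in Python as follows.  It's an illustration only: the function name and return values are hypothetical, and it assumes field names are compared exactly as written:

```python
def resolve_source(field, mast_fields, trans_fields):
    """Decide where an output field comes from when no explicit line is coded."""
    in_mast = field in mast_fields
    in_trans = field in trans_fields
    if in_mast and not in_trans:
        return "M"  # the "obvious" case: only the master has it
    if in_trans and not in_mast:
        return "T"  # the "obvious" case: only the trans has it
    # In both inputs (ambiguous) or in neither (constant or empty field).
    return "explicit line needed"
```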


So there's an expanded syntax for output files:

M&T  fieldset  { sort_fieldset  {NODUPS} }  {AWK}  {CQ}
OutField  ( {M.|T.}InField | "value" )  {"padding char"}
. . .
. . .
. . .
END

The second line above is the fifth (and last) type of ADB statement that can appear in your definitions file.

"OutField" is the output file field name.  "InField" is the input file field name.  The "M." or "T." prefix on the input file field name is used to tell ADB whether the source field is drawn from the master or trans input.

Obviously you don't need the "M." or "T." prefix for "M-T" or "T-M" outputs (unmatched masters, or unmatched trans, respectively).

In fact, there's really no reason to ever specify a source field name for these two types of outputs, since by definition, all "source" fields (other than constants) come from the sole eligible input file.

However, for an "M&T" output, you have to have one of these lines for every field name that appears in all three files (i.e. the output as well as both inputs).  There's no "default" input source file for the output field, so you must precede the input file field name with either an "M." or a "T.".  No space occurs after the period.


You may also specify a constant value.  This value will be truncated if it's too long for the output field.

WARNING:
ADB doesn't perform type checking on these values.  If you specify a non-integer value for an integer field (the field name ends in a single underscore), then ADB will go merrily on its way.


So, to review, any of the following formats is valid:


OutField_Name M.InField_Name
OutField_Name M.InField_Name "Padding_Character"
OutField_Name T.InField_Name
OutField_Name T.InField_Name "Padding_Character"
OutField_Name "constant"
OutField_Name "constant"     "Padding_Character"
OutField_Name InField_Name
OutField_Name InField_Name   "Padding_Character"
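To make those eight shapes concrete, here's a small Python sketch that picks such a line apart.  It is not ADB's parser.  The names are hypothetical, and it assumes that constants contain no double quotes, that the padding character is a single character, and that a trailing "END" is written in upper case:

```python
import re

# OutField, then either "constant" or {M.|T.}InField, then optional "pad", optional END.
FIELD_LINE = re.compile(
    r'^\s*(?P<out>\S+)\s+'
    r'(?:"(?P<const>[^"]*)"|(?:(?P<src>[MT])\.)?(?P<infield>[^\s"]+))'
    r'(?:\s+"(?P<pad>[^"])")?'
    r'(?:\s+END)?\s*$'
)

def parse_field_line(line):
    """Split an output field line into its parts (None where not coded)."""
    m = FIELD_LINE.match(line)
    if m is None:
        raise ValueError(f"not a valid output field line: {line!r}")
    return {
        "out": m["out"],
        "source": m["src"],        # "M", "T", or None
        "in_field": m["infield"],  # None when a constant was coded
        "constant": m["const"],    # None when a field name was coded
        "pad": m["pad"] or " ",    # the default padding is a space
    }
```

For instance, the line Name T.Name "*" yields an output field "Name" taken from the trans field "Name" and padded with "*".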


You may specify as many different lines as you need to cover all of the output fields.

If the output field name is in only one of the input files, and the output field value is identical to the input file's field value, then you need not list it.

Output fields can be specified in any order.  However, they must immediately follow the output file (M&T, M-T, or T-M) line, and must precede any END.

You may put END on a line by itself after all output fields have been defined, or you can put it on the end of the last line that defines an output field.


What if the output field is shorter than the input field?

For right-justified fields (those with names ending in exactly one or two underscores), ADB will truncate ("shorten") these fields on the left.  Other fields get truncated on the right. ADB will also issue a warning message when scanning your definitions (i.e. before any records have been read). Note that warning messages result in a return code of 1.

Finally, you may specify a padding character.  It has to be enclosed in double quotes, and no "escape" sequences for unprintable values are currently supported (but if you can enter them in your line editor, and the character isn't a CR, LF, or control-Z, then it will probably work).

Note that all spaces are stripped from the right and left of all input file fields prior to moving them into the output file, even if you haven't requested any form of input file preprocessing.  Unless you specify a padding character, the default of a space is always used.  Fields whose names end in exactly one or two underscores will be right-justified; all others will be left-justified.
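The stripping, truncation, and padding rules above can be pulled together in one Python sketch.  This is illustration only, not ADB's code.  The function names are hypothetical, and it assumes a single padding character:

```python
def is_right_justified(field_name: str) -> bool:
    """True if the name ends in exactly one or two underscores."""
    trailing = len(field_name) - len(field_name.rstrip("_"))
    return trailing in (1, 2)

def fit_field(value: str, width: int, field_name: str, pad: str = " ") -> str:
    """Strip spaces, then truncate or pad the value to the output width."""
    value = value.strip(" ")
    if is_right_justified(field_name):
        # Truncate on the left, pad on the left.
        value = value[max(0, len(value) - width):]
        return value.rjust(width, pad)
    # Truncate on the right, pad on the right.
    return value[:width].ljust(width, pad)
```

For example, fit_field(" 42 ", 5, "Qty_", "0") gives "00042", while fit_field("Smithsonian", 5, "Name") gives "Smith".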


The "END" can go either on the last of your output field specification lines, or on a separate line by itself.

However, it must appear, because this is ADB's only way of knowing that you've completed your field specifications.  Although I could've eliminated this requirement and assumed that they end as soon as a reserved word is read as the first token of the next definitions line, I've opted for an approach that's more likely to detect unexpected definitions file truncation or other coding errors.


5:  Checking file definitions with the "-P" parm

As with input files, ADB doesn't "care" if you code an M&T, M-T, or T-M statement in the definitions file without a corresponding command line parm.

If you want to make sure that your output file definitions are correct, you can code just the "-P" parm.  ADB will then "pretty print" the output file definitions.

However: if you try to specify M&T without a preceding MAST and a preceding TRANS statement, ADB will consider that to be an error (because it has no way of knowing how to determine the origin of the output files).  The same goes for specifying an M-T statement without a MAST statement, or a T-M statement without a TRANS statement.
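That ordering rule amounts to a simple check, sketched here in Python.  This is an illustration under stated assumptions, not ADB's code: the function name and error wording are made up, and the definitions file is reduced to a list of statement keywords:

```python
# Which input statements each output statement needs to have seen first.
REQUIRED = {"M&T": {"MAST", "TRANS"}, "M-T": {"MAST"}, "T-M": {"TRANS"}}

def check_statement_order(statements):
    """Return an error message for each output statement missing its inputs."""
    seen, errors = set(), []
    for stmt in statements:
        if stmt in ("MAST", "TRANS"):
            seen.add(stmt)
        elif stmt in REQUIRED:
            missing = REQUIRED[stmt] - seen
            if missing:
                needed = " and ".join(sorted(missing))
                errors.append(f"{stmt} needs a preceding {needed} statement")
    return errors
```

A file with MAST followed by M-T passes, while MAST followed by M&T is flagged because no TRANS statement precedes it.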
