3. Preview and Critical Facts

This section gives you a taste of how ADB works, and lists the most important facts that typical programmers need to know about its behavior.

Feel free to mail your comments to me at: rog@NOSPAM_rs-freeware.org.

Return to the global table of contents
Frames mode, or No frames


1:  Introduction

2:  Putting it all together (part 1 of 4)

3:  Putting it all together (part 2 of 4)

4:  Putting it all together (part 3 of 4)

5:  Putting it all together (part 4 of 4)


1:  Introduction

Aside from the data files themselves, ADB's behavior is determined from three sources:

1. The project file suffix
2. The definitions file(s)
3. The command line

Keep in mind that everything about ADB is *case INsensitive*, except for the field names in Awk (Awk's case-sensitive by definition) and the DOS environmental variable names (case-sensitive in DOS 6.0 and above).

Return to local table of contents
Return to global table of contents
Frames mode, or No frames

2:  Putting it all together (part 1 of 4)

Here's a short example that will demonstrate how all the notions in this section work together.

It will also give you an excellent idea of how simple it is to use ADB by comparison to SQL or a compiled excutable.  I'll be building on this example after I describe ADB in detail.

(Think of this as a "preview.")

Let's assume that the project file suffix is "xx".

We have two input files in this example: Cust.xx and Invoice.xx.

The first is our customer master, and the second is for invoices.

Here's the global defintions file, which will define all the fields and the layouts.

Note that both the pound sign and the exclamation point are for "line comments."  (You can also use a slash.)

I've simply chosen to use them in a stylistic manner.

Similarly, I've decided to use semicolons to separate fields in a file layout, and commas to do so for sort keys.

However, this is also a stylistic decision that I've made. Spaces and/or tabs may also be used.

To distinguish reserved words from values, I've capitalized the reserved words.


# Customer Master fields: customer name, balance and #
FLD Cust_No_           9    ! Integer (8/9 digits)
FLD Cust_Name         25    ! A regular string field
FLD Cust_Balance__    12 2  ! Fixed decimal: -00000000.00

# Layout of the customer master file, and the sort key
FSET Cust_Mast  Cust_No_; Cust_Name; Cust_Balance__ END
FSET Cust_Srt   Cust_No_                            END

# Invoice fields: invoice #, date, amount and cust#
FLD Inv_No            10    ! A regular string field
FLD Inv_Date___       10    ! Date field from the DMBS
FLD Inv_Amnt__        12 2  ! Fixed decimal: -00000000.00
FLD Inv_Cust_No_       9    ! Integer (8/9 digits)

# Layout of the invoice sort key, and invoice file:
FSET Inv_Srt Inv_Cust_No, Inv_No          END
FSET Invoice @Inv_Srt; Inv_Amt; Inv_Date; END

Note that @-sign preceeding "Inv_Srt" in the "Invoice" FSET.

What this means is that the fields from the field set "Inv_Srt" have been included in the field set "Invoice". I.e. the field set "Invoice" contains the fields: Inv_Cust_No, Inv_No, Inv_Amt, and Inv_Date (in that order).

Because you can use the "@" prefix on a field set name to incorporate an entire group of fields, you'll have very little trouble maintaining consistent file layouts (when field lengths or types change).  If you're a C/C++ programmer, you can think of "@" as a kind of "#include" (except one that's useful for fields).

Why do some fields have underscores after their names?  These actually determine the field type.  For now, don't worry about their significance: just use the remarks on the right as your guide.

Now if you're used to programming in any language, you'll recognize FSET as a kind of record layout.  It's not a file name in the sense that it's associated with a specific physical file.

Instead, FSET defines structure.

But because ADB always sorts in an increasing and a case-INsensitive manner, an FSET can also be used to declare a sort sequence.

So I've named the sort FSET in each pair with the suffix "_Srt" and used commas instead of semicolons to delimit the fields (in order to remind the reader that I do intend to use the second FSET in each pair in a different way.)

Again, that's a purely "stylistic" decision on my part.  ADB won't know the difference between an FSET used as a file record declaration (AKA "layout" or "struct") and one used as a sort key declaration--not until I actually put those FSETs to work.

And in fact there's no reason why I can't sort a file on every field within it (provided it has no more than 15 fields).

Let me remind you again that everything above would probably go in your global definitions file.

Why?  Because chances are that the customer master and the invoice file are both used for many other jobs (processes).

Return to local table of contents
Return to global table of contents
Frames mode, or No frames

3:  Putting it all together (part 2 of 4)

Now let's take a look at the step definitions file.

What we're going to be doing here is to compute new customer balances (i.e. we'll be "applying" the invoice amounts to the customer balances).

At this point, I can't show you exactly how that's done without describing the Awk interface, but at least the definitions for the "join" of the customer and invoice files should be clear:


FSET IC_Join Cust_No_; Cust_Name; Cust_Balance__; Inv_Amnt__; END

This is the list of customer master fields, _plus_ the invoice amount.

To specify that the customer master is the "master" file (no duplicates allowed), we'll code this definition:


MAST Cust_Mast Cust_Srt SORT CQ

That says that the "master" file (referenced by the "-MAST" argument on the command line) has a structure which is determined by the fieldset (FSET) called "Cust_Mast" and is to be sorted in a sequence determined by the FSET called "Cust_Srt".

The CQ at the end tells ADB that the data will be arriving in "commas-n-'quotes" format, and must therefore be converted to fixed-length format, prior to processing.

To specify that the invoices are the "transaction" file (duplicates are allowed, but the sort key must be at least as long as the master file's sort key), we specify:


TRANS Invoice Inv_Srt SORT CQ

This says that the incoming "transaction" file for this step will have the format (structure) referenced by the fieldset named "Invoice," that it will be sorted according to the fieldset named "Inv_Srt" (customer number, then invoice number), and that it will arrive in commas-n'-quotes format.

Note that our other input file (the customer master), is keyed by customer number only.

ADB will deem the two to "match" on the shortest of the two keys (although it will sort each file by their respective keys).

This gives you the flexibility to associate a group of "transaction" records with a single "master" record--in fact that's precisely what we're doing here.

There are other reasons for this flexibility as well.  I'll cover this later in the section on removing duplicates within a given file: ADB lets you remove either all-but-the-first or all-but-the-last such duplicate with ease.

To specify that the output file will consist of matching records, we code:


M&T IC_Join AWK END

In this case, the "IC_Join" specifies the record layout.  I.e. it's the name of an FSET.

As you can see, fields are drawn from both files (look at the definition of the FSET called IC_Join, above).  The "M&T" is suggestive of "and," AKA "intersection," AKA "join."

The "Awk" tells ADB that we have our own Awk script.  This script will make sure to output only one record for each customer master (which will have the updated Cust_Balance in it for all invoices of this customer).

What about invoices that have "bad" customer numbers?

We'll put this additional statement in the step definitions file:


T-M Invoice END

That says that unmatched "transactions" have a file layout determined by the FSET called "Invoice".  The minus sign between T and M is suggestive of "set difference".

We're going to use the special utility FILESTAT to make sure that the "T-M" file is empty (of length 0).

Why empty?  Because any records in it will be invoices that don't have a corresponding customer number.  Something's probably wrong with the data in this application if any such invoices exist.  (Perhaps someone mistakenly deleted that customer?)

Return to local table of contents
Return to global table of contents
Frames mode, or No frames

4:  Putting it all together (part 3 of 4)

If you start with the global definitions file as given above, it takes just a few lines of a DOS batch file to do a lot of work.

(If you don't know much about DOS batch files, there's a tutorial in this documentation.)

Suffice to say that Echo is a way of writing a line to a file: the single greater-than means create a new file; and the double-greater-than means append to end-of-file.

Set is how you define a variable value, and variable values are referenced by enclosing them in percent signs.  String concatenation of variable values and constants occurs when you physically run them together.  The "If ErrorLevel . . . " construct does things based on the return code of the previous program.

Labels begin with colons, and obviously the GOTO statement branches to a lable.  The @-sign before a statement prevents it from showing at the DOS command line.  REM is for remarks.


@Rem Define project file suffix
@Rem
@Set ADBPROJ=xx
@Rem
@Rem Define inputs and outputs
@Rem
@Echo FSET  IC_Join Cust_No_; Cust_Name;         >StepDefs.%ADBPROJ%
@Echo         Cust_Balance__; Inv_Amnt__  END   >>StepDefs.%ADBPROJ%
@Echo MAST  Cust_Mast Cust_Srt Sort CQ          >>StepDefs.%ADBPROJ%
@Echo TRANS Invoice Inv_Srt Sort CQ             >>StepDefs.%ADBPROJ%
@Echo M&T   IC_Join Awk                   END   >>StepDefs.%ADBPROJ%
@Echo T-M   Invoice                       END   >>StepDefs.%ADBPROJ%
@Rem
@Rem Run ADB, which returns 1 on failure
@Rem
@ADB -MASTcust -TRANSinvoice -M&Tnewcust -T-Mbadinv -STEPStepDefs
@If ErrorLevel 1 goto FAIL
@Rem
@Rem Any bad invoices (illegal customer numbers)?
@Rem
@FileStat /BadInv.%ADBPROJ% >>ADBLOG.%ADBPROJ% <NUL
@If ErrorLevel 1 goto FAIL
@Rem
@Rem Tell user that job went okay
@Rem
@Echo This job went okay!!
@Goto DONE
@Rem
@Rem Label FAIL: zip up the bad stuff and have it e-mailed
@Rem
:FAIL
@PkZip BadStuff.Zip *.%ADBPROJ%*
@ErrMsg Please e-mail BadStuff.Zip to your programmer!!!
@Rem
:DONE

Now there are 35 lines of code here.

But let's start with the fact that half are remarks  :-)

Of the remaining 17, 6 are there just to tell the user to zip up the file (etc.) if there were problems.

(The utility ErrMsg will print its arguments and force the user to press the space bar to acknowledge them.  It makes your batch files shorter, and I describe it and all of ADB's suite of utilities in the utilities section.)

It took 6 lines to define the input and output files (it would've taken 5 without breaking the first FSET into two lines), 2 to run ADB (and check for errors), and 2 to make sure there were no unmatched invoices.  Plus there was 1 line to set the project file suffix.

11 lines isn't too bad to get these two files properly sorted and matched (along with making sure that there are no bad customer numbers in the invoice files).

Note how the command line used the -MAST argument to specify that the "master" file was Cust.xx, and the -TRANS argument to name the "transaction" file (Invoice.xx).

Similarly for the step definitions file (StepDefs.xx), the unmatched invoice file (BadInv.xx), and the matched file (NewCust.xx).

Return to local table of contents
Return to global table of contents
Frames mode, or No frames

5:  Putting it all together (part 4 of 4)

Of course this isn't a "finished" task here.  We still have to write the Awk script that will actually increment the customer balances with the invoice amounts (and write out just one record per customer).  And eventually those new customer master records are going to have to go back into the database.

But if you think about the amount of code you'd have to write to handle this all in a compiled environment, you'll probably suspect that ADB can save you a substantial amount of time.

Of course some DBMSs will handle the matching and sorting process for you with ease.  But they'll fall down when you decide that you need to do something tricky with the "joined" records, such as selectively accumulate totals.

(Or: they have any of the following defects: (1) they're expensive and time-consuming to install on multiple machines; (2) they're likely to break when you upgrade your OS or network software or encounter registry corruption errors; (3) they're slow for large volumes of data; (4) they don't let you integrate your own executable programs easily into a jobstream and/or don't support any form of "batch" processing); or (5) they print inscrutable error messages that force you to retrace all your steps from "scratch" and/or wait for hours in telephone hold queues.)

That's why ADB offers GNU Awk as an "inner" language for processing input files (before matching) or output files (after matching).

Awk is a very simple C-like programming language that's not too different from JavaScript.  It gives you much of a compiled language's power in script form.

In fact, when we get to the point of fleshing out this example in its full glory, we'll discover that there's an easy way to write several output files.

In the example, we're not only going to compute customer balances, but we'll use an expanded customer master that includes a separate customer balance that's over 1 month old.  We'll import as well as export global data, and output several special-purpose files in addition to the revised customer master.

All of this will be done with a surprisingly small amount of Awk code, because the "heavy lifting" of sorting, matching, and moving fields around has already been done by ADB.  Yet the flexibility of Awk rivals that of a fully-featured compiled language in many ways.

Return to local table of contents
Return to global table of contents
Frames mode, or No frames

Next documentation section