Project Cleansweep

From Pickwiki
Jump to navigationJump to search

Users >> Rex Gozar >> Project Cleansweep

Project Cleansweep

2008-03-24 by Rex Gozar


Brief

I firmly believe that shoddy programming practices have hurt the credibility of Pick applications. I discuss active steps to conform your legacy application to modern coding standards.

Note that this document is written with an IBM Universe slant, but these concepts are general enough to apply to any Pick-variant.

Summary

I've actually done this. It takes relatively little time compared to the hours wasted trying to decipher and modify bad code. In fact, your team has already wasted more time than it takes to do this; having not enough time or money is not an excuse. If you do a little bit each week, you'd be done in no time.

A former boss of mine used to say, "software changes should be evolutionary, not revolutionary". You'll start with seemingly trivial tasks like gettting all the programs formatted and spaced consistently. You'll commit your changes to source code control so you'll get used to using it. You'll then move on to more serious changes like deleting unused programs and fixing the ones that are left. During the process, you'll actually start to notice bugs that haven't been reported yet and quickly fix them. All the while, your source code is progressively moving closer and closer to your goal of clean code.


Prerequisites

You must have source code control.
\ First off, you really need to use source code control. Any kind will do. If you don't have it, get it. Currently I'm using CVS, which works just fine:

If I had to do it all over again, I'd use Subversion (for the atomic commit feature):

Making dated copies of your account or programs is not source code control. "Having a way" to get older copies of programs is not source code control.

You must be able to compile everything.
\ Second, you need to be able to compile EVERYTHING. I know some people are like, "oh my god, I can't compile EVERYTHING; the whole system will break!". I've heard it before, and frankly it's the most retarded thing I've ever heard a developer say. Basically it's an admission that they like living in chaos, and can't be bothered with the details of knowing what actually goes into their application.

Everything may not compile without errors, but that's not what we're after now. We need to know we have all the source code that's currently in production. When we compile it before and after cleanup, we should have fewer errors not more.

You must devise a coding standard
\ Some people think that a coding standard is just an excuse to play "code-Nazi". The reality is that they uncover bugs so you can squash them, among other wonderful things. You need to devise your own coding standard.


Formatting

Before you can enforce a clean coding standard, all your programs should be consistently formatted.

This should be the easiest step; just type:

>FORMAT BP *

The FORMAT command will take care of it. Or you could use a program like BFORMAT. Or you could use a prestored editor command to format everything.

Identify all files that contain programs and code fragments, including EQUATE'd attribute constants, and format them. If your program files contain "non-code" items, e.g. README files or other text, be careful NOT to format them; ideally, they shouldn't be kept in the same file with programs.

Commit all these changes to source code control.


Spacing

Formatting will cleanup most of the whitespace inconsistencies. You will also need to make the spacing between operands and subroutine arguments consistent using a utility like SPACEOUT. Don't decide to do it manually; it's too easy to screw up (e.g. adding extra whitespace within a quoted string) or forget a line. You want a utility that can be run repeatedly as people make changes.

Enforcing consistent formatting and spacing will help us later when we programmatically rename variables and attribute constants.

Commit all these changes to source code control.


Adding source code headers

Take the time now to get all source code items to have the same header (comment) format. If you are using source code control, add the appropriate "revision" keyword to the header so you can easily identify the source item's revision level.

You can add headers to just about all of your source items, not just programs. There are a few exceptions, like dictionary and customized VOC items.

Commit all these changes to source code control.


Removing unused source code items

So far you've only done some superficial changes to the appearance of the source. Now is the time to cut deeper into the code and trim off some of the excess. To do this you'll need to be able to "see" into all the source code items using a single utility.

I keep my source code in a directory structure named "src", so if I'm looking to see where BOBTEST3 is being used in my application, I can type:

find src -type f -exec grep -H BOBTEST3 {} \;

I can use the find and grep commands to see where BOBTEST3 is called from. (Windows users: I've installed cygwin on my windows machines. I use the same command scripts on unix, linux, and windows.)

If you don't have your source code in a convenient structure, you will have to develop your own utility to look for program usage within your source code "tree"; the only caveat is that it must look in ALL source code items. Using FIND or SEARCH is not acceptable since you can only search one file at a time (it's too easy to skip a file) and you don't see the context (did it find BOBTEST3 or BOBTEST33).

Unused source code should be deleted. If you really need it again, rely on source code control to have a copy. Don't clutter up your application with unused "backup" copies.

Every source code item should be used somewhere. If you don't reference that program / paragraph / dictionary item, why keep it? Again, you rely on source code control to have a copy if you need it.

Commit your changes to source code control. You will probably have to do this step in several iterations; as you remove an item, you may find other items become "unused".


Making sure ALL programs compile

Now's the time to eliminate compilation errors. Unassigned variables, incorrect statement syntax, every error and warning should be eliminated. This should be SOP (standard operating procedure) from now on. Compilation errors are programming defects that are guaranteed to generate support calls.

Program files should only contain programs; no source code fragments, no INCLUDE'd items, no README text, etc.. The goal is to have a program file where everything compiles AND everything can be cataloged. It's okay to have more than one program file, but the same "everything gets compiled and cataloged" rule applies.

If you have more than a dozen programs, you really can't just type BASIC BP * and expect to catch every error that scrolls up the screen. If you look away for even a second, you're likely to miss an error. Use a utility like BC or create your own to compile all programs in a file and log the errors separately. Use the resulting error log to correct the errors, then compile everything again; repeat until everything compiles cleanly.

Commit all your changes to source code control.


Making sure ALL subroutines are correctly called

There are some types of programming defects that the compiler doesn't catch, e.g. calling a subroutine with the wrong number of arguments. Use a utility like BC or create your own to check each and every CALL statement in your programs.

To keep this from being a problem in new development, one strategy to consider is writing functions instead of subroutines. Function arguments get checked at compile time, so a programmer's mistake is more likely to be caught early.

Commit all your changes to source code control.


Externalizing common code

You may have a similar problem. The former programmers here had the peculiar practice of copying reused initialization and COMMON area declaration logic directly into their programs, rather than just $INCLUDE it. Once copied into their program, they proceeded to rename variables and otherwise "de-standardize" it. That left me with hundreds of programs where the code looked similar to the naked eye, but it was impossible to write a single utility to "standardize" it.

It's really not as bad as it looks (but yes, it could have been easier). COMMON areas must have the same number of arguments to work, so the original programmers couldn't deviate too far from the standard. By treating the all the programs that use a particular COMMON as a "set", it's easy to externalize the COMMON in an INCLUDE file and adjust all the programs in the "set" to use the same variables.

In externalizing reused initialization code, I found that the former programmers would "clone" their programs. So even with hundreds of programs with "de-standardized" logic, I found "families" of programs where the logic was identical and I was able to programmatically discover and fix them.

When you look at your source code , undoubtedly you'll find similar code patterns and you'll be able to create utilities to fix them. The good news is that this is a one-time fix -- with coding standards you'll never have to deal with this issue again.

Another strategy is to only externalize code in the programs you modify. If you do any mods or fixes to a program, make it your SOP to externalize COMMON declarations and initialization logic. You'll distribute the workload over a longer period, but eventually you'll get it done.

Compile everything and fix any errors. Commit your changes to source code control.

Renaming file variables

Your coding standard should define a file variable naming convention. You'll want to make sure ALL programs conform to it.

My standard is that the file variable should be named the same as the filename. Unfortunately, I inherited files named "CUST-MAST" (with a minus separator) and the like, so the best I can use is a variable named CUST.MAST.

Changing standalone programs is easy enough. You can write a oneshot (i.e. one-time, single use program) to scan the file OPEN statements to get the actual filename and variable, then perform the necessary substitutions or generate an editor script to do them. For example:

OPEN "CUST-MAST" TO F2 ELSE ABORT 201, "CUST-MAST"
READ BILLTO FROM F2, BILLTO.ID ELSE
   * blah, blah, blah

It would be simple to create an editor prestore that did:

0001: E
0002: C/ FROM F2,/ FROM CUST.MAST,[[/G99999]]
0003: T
0004: C/ TO F2,/ TO CUST.MAST,[[/G99999]]
0005: T
0006: C/ ON F2,/ ON CUST.MAST,[[/G99999]]

Because we enforced consistent formatting and spacing in previous cleanup steps, building an editor prestore is easy and effective. And you could just as easily execute CHANGE functions programmatically.

Subroutines with an ambiguous file variable passed as an argument will be a little tougher to fix; you'll have to check all the programs that call it to get the correct filename. This may not be as easy as it sounds: some subroutines may intentionally work against different files based on the argument passed in. For example, you may have a subroutine:

SUBROUTINE PRINT.INVOICE(INVOICENO, F3)

When called from order entry F3 might be "INVOICE.MASTER", but when called from customer history F3 might be "INVOICE.HISTORY". In this case, it might be better to choose a file variable name that reflects its dual role, e.g. instead of F3 use EITHER.INVOICE.FILE or even just INVOICE.FILE.

Programs that rely on COMMON area declarations will be the trickiest. First you'll want to make sure the COMMON is declared outside of the programs in an INCLUDE file. Find all the programs that $INCLUDE it and perform your conversions on the entire set (programs and INCLUDE file.)

Be on the lookout for programs that reuse file variables for different files. It's a poor programming practice that you'll probably have to fix manually. It may be more expedient to refactor the program into a driver/function architecture.

As always everything must compile without errors or warnings. Commit your changes to source code control.


Defining file attribute constants

This is typically a one-time activity, so write a oneshot to go through each production file dictionary and create an INCLUDE record. Each INCLUDE record should define ONE constant (i.e. EQUATE) for each field number. Every attribute constant must be unique throughout your entire application.

Use your coding standard to revise the attribute names. Yes, they should match the dictionary ID as best as possible, but keep in mind the "standardized" name should take precedent. Also cross check related files so the same attribute gets the same name, not ORD$PO.NUMBER in one file and INV$PONO in another.

I know many people have the firm belief that the dictionary dictates field usage. So if the dict item ENTRY.DATE is field 4, then all programs must be made to conform to the dictionary. If you're one of these people, that's okay. My experience is that programs write data, not dictionaries, and if the dictionary says "4", but all the programs use "3" for the entry date, then the production file records all have their entry date in "3" and the dict item must be changed to reflect this reality. You'll want to "test" convert your data entry and posting programs first, making sure the data is getting stuffed into the correct attribute. You may have to do this a few times as you resolve program-dict mismatches.

Don't convert all your programs just yet. First commit all your INCLUDE items and dictionary changes to source code control. They are now an integral part of your application.

When you're ready, use a utility like PARSER to replace the attribute numbers with names, or write your own oneshot. PARSER reads the source line by line, making a map of variables and their respective files, substituting the correct attribute constant for each magic number. I guess you could do this manually one program at a time, but that'll take forever and be more error prone.

Before you commit any program changes to source code control, everything must compile.


Applying readability standards

We spend much more time reading programs than we do actually writing them. Even while writing, we often read and reread our code. Instead of devising ways to write code faster, we need ways to read and comprehend it faster. The goal is to simplify the code so a junior programmer can be productive, even if that means we have to type more.

If you've completed all the steps to this point, your code is looking 100% better. You can easily find programs that update particular files or fields. You can quickly scan over a program and spot "wrong-looking" code and fix it.

You may even choose to stop your cleanup project at this point, but if you want some ideas what to cleanup next, I have some suggestions:

  • Write a oneshot to break single line IF-THEN-ELSE logic into multiple lines.
  • Write a oneshot to rename file record variables to match your standard.
  • Remove old commented-out logic (it's already in source code control).

This pretty much puts us at the limit of what you can do programmatically. The rest of the cleanup needs to be done manually; every time you change a program, you can refactor it to aid programmer comprehension:

  • Eliminate deep nesting; allow only 2 levels at most. Modularize and abstract the code with GOSUB's or external functions.
  • Allow the formatting to reveal the program's structure. Loops should look loops. Conditional code should be clearly blocked. The "natural" logic flow should proceed from the top to the bottom, with error handling "breaking" the flow.
  • Adopt a "self-documenting" coding style that reveals your business rules.


http://www.autopower.com/rgozar/pixel.gif


Comments

2008-03-31 by Rex Gozar
Simplified original 2008-03-07 version; still need to add utilities.

[[[add a comment|How to add a comment.]]]