validate

Commands ››
Parent Previous Next

The validate command is used to validate CSV data against a number of validation rules. This command does not validate the basic CSV syntax (the input to it must be syntactically  correct CSV) - it's intended to be used to validate business rules.


See also: check

Flag

Req'd?

Description

-vf file

Yes

Specifies the file containing the validation rules.

-om mode

No

Specifies the output mode. Possible values are:


report, which displays the filename, line number and validation error message for each failure. This is the default


pass, which displays all rows that pass validation  


fail, which displays all rows that fail the validation  


-ec

No

If this option is specified, the validate command returns a value of 2 to the host operating system on validation failure. Without this option, validation failure does not return an error value to the OS.



The validation rules are contained in a text file. Here's an example:

# val_names.txt
#
# check that:
#   each row in the file has at least three fields
#   all fields contain some non-whitespace data
#   the third field contains only the values "M" and "F"

required    1,2,3
notempty    1,2,3
values      3        "M" "F"

The format of the validation file is fairly straightforward. Lines starting with '#' are comments, other lines consists of two or more fields, separated by spaces. The fields are as follows:


Once you have a validation file you, you can use it to validate CSV data:

csvfix validate -vf val_names.txt data/names.csv

This will produce no output, because all the data in names.csv pass all the validation rules. However, if you try it with another file, like bad_names.csv:

csvfix validate -vf val_names.txt data/bad_names.csv

you get a list of rows and fields that fail the validation rules:

data/bad_names.csv (2): Jane,,F
    field: 2 - field is empty
data/bad_names.csv (4): Flann,O'Brien,X
    field: 3 - "X" is invalid value
data/bad_names.csv (5): George,Elliot
    field: 3 - required field missing
data/bad_names.csv (6): Virginia,
    field: 3 - required field missing

The following rules are currently available:

Rule

Description

required

The field(s) must exist in the CSV, though they may be empty. For all other rules, it is not required that the field exists, so if a field must exist, it must be tested with this rule.

notempty

The field(s) must not contain only white space.

fields

Specifies a minimum and maximum number of fields in each row. This doers not require a field list:

fields * 2:4

says we need a minimum of 2 and a maximum of 4 fields. If you don't need a range, make the minimum and maximum values the same.

length

The length of the field(s) must be between specified minimum and maximum values:

length 1,2,3  10:20

numeric

The field must contain a numeric value. Additionally, you can specify a number of ranges as parameters. For example

numeric 1 1:1000 -1,-1

specifies that field 1 must be numeric and in the range 1 to 1000 (inclusive) or in the range -1 to -1 (i.e. it may also have the value -1)

values

The field must contain one of a number of values. For example:

values 2 'EUR' 'USD' 'GPP'

says that field 2 must contain one of EUR, USD or GBP. Values may be contained in single or double quotes, or simply be space-separated.

notvalues

As above, but the field must not contain the listed values.

lookup

Lookup one or more fields in a second CSV file (actually, you can use the same CSV file as the one you are validating, which is useful in some recondite circumstances). For example:

lookup * 1:4,2:7 data/lookupfile.csv

Here, the field list is not needed, so an asterisk is used as a place holder. The first parameter is a comma-separated list of field number pairs. The first value of each pair indicates the field in the current file and the second the field in the lookup file

date

Check that field is a valid date. The format of the date is specified by a mask value that must be supplied. See the date_iso command for details of mask format. An optional date range, consisting of two dates in ISO format separated by a colon can also be provided. For example:

date 1,2 'd/m/y' 2000-1-1:2010-12-31

validates dates in dd/mm/yyyy format and checks that they are in the first decade of the 21st century.



The following example validates the cities.csv file against countries,csv. The validation file looks like this:

# val_country.txt
# lookup second field in cities.csv against 
# the first field in countries.csv
required    1,2
lookup      *      2:1  data/countries.csv

The command line to use it is:

csvfix validate -vf rules/val_country.txt data/cities

which produces the following output (because Greece is not in the countries.csv file):

data/cities.csv (6): Athens,GR
    lookup of 'GR' in data/countries.csv failed


Created with the Personal Edition of HelpNDoc: Easy CHM and documentation editor