join

Commands ››
Parent Previous Next

The join command performs a relational join between two sets of CSV data. This command is intended to be used for enriching one CSV data set with data from another. There are two important points to note when using join. Firstly, This command treats files somewhat differently from other CSVfix commands. Each file from the command line except the last one will be joined, in turn against the last one. So in the following command line (from which the flags have been elided):


csvfix join ... file1.csv file2.csv file3.csv


file1.csv will be joined with file3.csv and output produced, then file2.csv will be joined with file3.csv and output produced.  It follows from this that the join command requires at least two input streams. Note that either one of these may be the standard input stream, if required.  Secondly, in the output from the command, fields that partook in the join are removed from the fields in the last file.




Flag

Req'd?

Description

-f fields

Yes

Specifies a comma-separated list of fields to join. Each entry in the list consists of two colon-separated field indices:

-f 1:2,3:3

This would specify that you wish to join field 1 from the first file with field 2 from the second, and field 3 from the first file with field 3 from the second.

Note that the join command does not currently support joins between more than two files.

-oj

No

Specifies that an outer join should be performed. This will retain any rows in the first file that cannot be joined to rows in the second - by default, such rows are removed.

-inv

No

Inverts the sense of the join so that only rows from the left hand side of the join with fields that do not match rows on the right are output. This can be used to create exclusion lists. This flag is mutually exclusive with the -oj flag.

-ic

No

Ignore case for fields being joined, so a join on fields containing 'foo' and 'FOO' would succeed. Default is to respect case.



The following example joins the cities.csv and countries.csv files to produce a list of cities with long country names:

csvfix join -f 2:1 data/cities.csv  data/countries.csv

which produces:

"London","GB","United Kingdom"
"Paris","FR","France"
"Edinburgh","GB","United Kingdom"
"Amsterdam","NL","Netherlands"
"Rome","IT","Italy"
"Berlin","DE","Germany"



Created with the Personal Edition of HelpNDoc: Produce Kindle eBooks easily