Example 41: Arrays
Arrays in awk are associative.
Each array element is identified by its index.
Awk arrays differ from arrays in most other languages:
1) there is no need to specify the size of an array before using it
2) any number or string can be an index.
array1["CAT"]="meoww"
array2["DOG"]="barks"
The above assignments are valid even though the indices are not numeric.
We can also add elements at any position.
a[1]="Sukul"
a[2]="uma"
a[20]="shushant"
Note that we can add an element at index 20 regardless of whether elements 3, 4, 5, ... have been added.
Compare the two for loops below to understand why for (i in array) is used with awk arrays.
awk '{ a[1]="sukul";a[2]="uma";a[5]="bhanu";
for (i=1;i<=5;i++)
{ print a[i]
}
}' testx
sukul
uma


bhanu
Note that since we had not assigned values to a[3] and a[4], the above for loop printed blank lines for them.
Ideally nothing should be printed for them, because they do not exist.
Thus the above for loop cannot tell whether an element exists or not.
The for loop below makes more sense:
awk '{ a[1]="sukul";a[2]="uma";a[5]="bhanu";
for (i in a )
{ print a[i]
}
}' testx
uma
bhanu
sukul
Note that this for loop visits only the elements that actually exist.
The order in which the elements are visited is not defined, which is why the output is not in index order.
This is the reason why we use the for (i in array) syntax when working with arrays in awk.
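There is one more point about the counted loop above: simply referring to a[3] creates that element with an empty value. A minimal sketch of how existence can be tested without creating anything, using the (index in array) test, and how an element can be removed with delete. The BEGIN block and the shell variable out are only there so the example runs without an input file.

```shell
out=$(awk 'BEGIN {
    a[1] = "sukul"; a[2] = "uma"; a[5] = "bhanu"
    # (i in a) tests for existence without creating the element
    for (i = 1; i <= 5; i++)
        if (i in a)
            print i, a[i]
    delete a[2]            # remove one element
    n = 0
    for (i in a) n++       # count the elements that remain
    print "remaining:", n
}')
echo "$out"
```

This prints only indices 1, 2 and 5, in index order, followed by the count of elements left after the delete.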
Example 42: numeric built in functions
awk ' {
print int(17.23) #gives integer part
print sqrt(900) #gives square root
print exp(2) # exponential
print log(10) # natural log
print sin(30) # sine. (x in radians)
print cos(30) # cosine. (x in radians)
} ' testx
17
30
7.38906
2.30259
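30
30
Note that the last two lines show 30 rather than the sine and cosine of 30 radians, which suggests that this awk installation did not treat sin and cos as functions.
A POSIX-compliant awk prints -0.988032 and 0.154251 for these two lines.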
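Since sin and cos expect radians, an angle given in degrees has to be converted first. A small sketch, assuming a POSIX awk where sin, cos and atan2 are available; the variable names pi, deg and rad are only illustrative.

```shell
out=$(awk 'BEGIN {
    pi = atan2(0, -1)               # atan2(0, -1) evaluates to pi
    deg = 30
    rad = deg * pi / 180            # convert degrees to radians
    printf "%.4f %.4f\n", sin(rad), cos(rad)
}')
echo "$out"
```

For 30 degrees this prints 0.5000 0.8660.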
Example 43: String built in function- index
index(string1,string2) : searches string1 for the 1st occurrence of string2 and returns the position at which string2 begins.
If string2 is not found, it returns zero.
The command below prints the position of the 1st "u" in each record of the data file.
awk '{ print index($0,"u")}' test1
2
1
5
3
8
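A short sketch of the three cases that can come up with index: a single character, a longer substring, and a string that is not present. The sample string is taken from the data file; the shell variable out is only for illustration.

```shell
out=$(awk 'BEGIN {
    s = "himanshu"
    # single character, multi-character substring, and a miss
    print index(s, "u"), index(s, "an"), index(s, "z")
}')
echo "$out"
```

This prints 8 4 0. Positions are counted from 1, so a result of 0 can safely be used to mean "not found".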
Example 44: String built in function- length
Returns the length of the input string.
#prints the lengths of names
awk '{ print length($1)}' test1
5
3
5
8
8
Example 45: String built in function- match
match(string,regexp): searches the string for regexp
and returns the position where the matching substring begins.
If no match is found, it returns 0.
It also sets two built in variables:
1) RSTART: the index where the matching substring begins
2) RLENGTH: the length of the matched substring
Note: this did not work on my installation.
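For reference, here is a minimal sketch of how match is typically used, assuming a POSIX-compliant awk (for example nawk, gawk or mawk) rather than the installation mentioned above. The sample string is the first record of the data file without its last field.

```shell
out=$(awk 'BEGIN {
    s = "sukul 8149158828 mumbai"
    # find the first run of digits and extract it with substr
    if (match(s, /[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    # when nothing matches, match() returns 0 and RLENGTH is -1
    print match(s, /xyz/), RSTART, RLENGTH
}')
echo "$out"
```

The first line of output is 7 10 8149158828 and the second is 0 0 -1.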
Example 46: String built in function- split
split(string,arrayname,separator)
awk splits the string 'string' into the array 'arrayname' based on the separator we provide.
split returns the number of array elements that it created.
If the separator is omitted, the value of FS is used.
awk '{ numberofelements=split($0,array1,"u")
print "Record no:" NR
print "Number of array elements created:" numberofelements
print array1[1],"|",array1[2],"|",array1[3]}' test1
Record no:1
Number of array elements created:4
s | k | l 8149158828 m
Record no:2
Number of array elements created:2
| ma 8149122222 chennai 100/800/300 |
Record no:3
Number of array elements created:2
bhan | 8097123451 Jhansi 200/1000/500 |
Record no:4
Number of array elements created:2
sh | shant 7798977047 nepal 200/9000/100 |
Record no:5
Number of array elements created:2
himansh | 9090909090 bokharo 100/800/300 |
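A more typical use of split is to break up a single field. The sketch below splits a value shaped like the last column of the data file on "/" and adds up the pieces, and then shows the call without a separator. The array names parts and words are only illustrative.

```shell
out=$(awk 'BEGIN {
    n = split("100/900/200", parts, "/")
    total = 0
    for (i = 1; i <= n; i++) total += parts[i]
    print n, total
    m = split("sukul 8149158828 mumbai", words)   # no separator: FS is used
    print m, words[3]
}')
echo "$out"
```

This prints 3 1200 and then 3 mumbai.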
Example 47: String built in function- sub
sub stands for substitute.
sub(regexp,replacement,target)
sub replaces the 1st occurrence of regexp in the target with the replacement text.
If the target is omitted, $0 is used.
It returns the number of substitutions made, which is 0 or 1.
awk '{str = "water, water, everywhere"
sub(/at/, "ith", str);
print str}' test1
wither, water, everywhere
wither, water, everywhere
wither, water, everywhere
wither, water, everywhere
wither, water, everywhere
awk '{ sub(/uma/,"shri",$0);print $0}' test1
sukul 8149158828 mumbai 100/900/200
shri 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
Note that the 1st occurrence of "uma" is replaced by "shri".
Another variant uses & in the replacement text.
The & stands for the matched text, so the original string is kept and the new data is appended to it.
awk '{ noofrep=sub(/uma/,"& shri",$0);print "Replace cnt:" noofrep, "|", $0}' test1
Replace cnt:0 | sukul 8149158828 mumbai 100/900/200
Replace cnt:1 | uma shri 8149122222 chennai 100/800/300
Replace cnt:0 | bhanu 8097123451 Jhansi 200/1000/500
Replace cnt:0 | shushant 7798977047 nepal 200/9000/100
Replace cnt:0 | himanshu 9090909090 bokharo 100/800/300
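The target does not have to be $0. As a small sketch, the substitution can be restricted to a single field, and & can be wrapped in other text. The record is assigned by hand here so that the example runs without the data file.

```shell
out=$(awk 'BEGIN {
    $0 = "uma 8149122222 chennai"
    # only the third field is searched; & is the matched text
    n = sub(/chennai/, "[&]", $3)
    print n, $0
}')
echo "$out"
```

This prints 1 uma 8149122222 [chennai].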
Example 48: String built in function- global sub
gsub works like sub, but it replaces all occurrences in the target instead of only the 1st one, and it returns the number of replacements made.
awk '{ noofrep=gsub(/u/,"A",$0);print "Replace cnt:" noofrep, "|", $0}' test1
Replace cnt:3 | sAkAl 8149158828 mAmbai 100/900/200
Replace cnt:1 | Ama 8149122222 chennai 100/800/300
Replace cnt:1 | bhanA 8097123451 Jhansi 200/1000/500
Replace cnt:1 | shAshant 7798977047 nepal 200/9000/100
Replace cnt:1 | himanshA 9090909090 bokharo 100/800/300
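Because gsub returns the number of replacements, it can also be used just to count matches. A sketch of that idea: replacing every match with & (the match itself) leaves the string unchanged.

```shell
out=$(awk 'BEGIN {
    s = "sukul 8149158828 mumbai"
    n = gsub(/u/, "&", s)    # replace each u with itself: s is unchanged
    print n, s
}')
echo "$out"
```

This prints 3 sukul 8149158828 mumbai, which agrees with the replace count for the first record above.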
Example 49: String built in function- substr
substr is used to extract a part of a string.
substr(string,start,length)
awk '{ s1=substr($0,5,10);print s1}' test1
l 81491588
8149122222
u 80971234
hant 77989
nshu 90909
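The length argument is optional. A short sketch showing both forms on one of the names from the data file:

```shell
out=$(awk 'BEGIN {
    s = "shushant"
    print substr(s, 1, 3)    # first three characters
    print substr(s, 4)       # length omitted: rest of the string
}')
echo "$out"
```

This prints shu and then shant.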
Example 50: String built in function- toupper, tolower
toupper converts a string to upper case and tolower converts it to lower case.
awk '{ record=toupper($0);print record}' test1
SUKUL 8149158828 MUMBAI 100/900/200
UMA 8149122222 CHENNAI 100/800/300
BHANU 8097123451 JHANSI 200/1000/500
SHUSHANT 7798977047 NEPAL 200/9000/100
HIMANSHU 9090909090 BOKHARO 100/800/300
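A common use of these functions is a comparison that ignores case. A minimal sketch, with the variable name city chosen only for illustration:

```shell
out=$(awk 'BEGIN {
    city = "Jhansi"
    print tolower(city), toupper(city)
    # compare in one case so that Jhansi, JHANSI and jhansi all match
    if (tolower(city) == "jhansi") print "match"
}')
echo "$out"
```

This prints jhansi JHANSI and then match.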
Example 51: System built in function- system
Used to execute any system command from within awk.
The command is run and then control returns to awk.
awk '{ record=toupper($0);print record}
END { system("ls -lrt test*")}' test1
SUKUL 8149158828 MUMBAI 100/900/200
UMA 8149122222 CHENNAI 100/800/300
BHANU 8097123451 JHANSI 200/1000/500
SHUSHANT 7798977047 NEPAL 200/9000/100
HIMANSHU 9090909090 BOKHARO 100/800/300
-rw-r----- 1 sm017r nemo_dev 187 Aug 8 05:31 test1
Note the last line of the output. It contains the result of ls -lrt test*, which was run
from within awk.
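system also returns the exit status of the command, so the awk program can check whether the command succeeded. A small sketch using the standard true and false commands; the variable names ok and bad are only illustrative.

```shell
out=$(awk 'BEGIN {
    ok = system("true")      # exits with status 0
    bad = system("false")    # exits with a non-zero status
    print ok, (bad != 0)
}')
echo "$out"
```

This prints 0 1: the first command reported success and the second did not.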
Example 52: Understanding ARGV and ARGC
The command line arguments that we pass to an awk program are stored in an array called ARGV.
ARGC contains the number of elements in ARGV.
ARGV is indexed from 0 to ARGC-1. ARGV[0] holds the command name; awk options and the program text are not included.
awk '{print ARGC;
print ARGV[0]
print ARGV[1]}' test1
This prints all three values for each line in the input file.
Note that ARGV[1] is the name of the input file.
2
awk
test1
2
awk
test1
2
awk
test1
2
awk
test1
2
awk
test1
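To print the arguments only once, the loop can be placed in a BEGIN block. A sketch, assuming two file names on the command line; test2 is a hypothetical second file, and since there is only a BEGIN block the files are never opened.

```shell
out=$(awk 'BEGIN {
    # start at 1 to skip the command name in ARGV[0]
    for (i = 1; i < ARGC; i++)
        print i, ARGV[i]
    print "file arguments:", ARGC - 1
}' test1 test2)
echo "$out"
```

This lists test1 and test2 with their positions and then reports two file arguments.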
Example 52: Built in variables ENVIRON and FILENAME
awk also has an array ENVIRON which contains the values of the environment variables.
The index for this array is the name of the variable.
The FILENAME variable gives the name of the current input file.
If the data is read from standard input, the value is typically "-" or an empty string, depending on the awk implementation.
awk '{print ENVIRON["HOME"], ENVIRON["SHELL"], FILENAME }' test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
We can see that ENVIRON["HOME"] prints the value of the HOME
environment variable, and the same applies to ENVIRON["SHELL"].
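Since ENVIRON is an ordinary awk array, the (index in array) test can be used to check whether a variable is set at all. A minimal sketch; GREETING and NO_SUCH_VAR_12345 are made-up variable names used only for this example.

```shell
out=$(GREETING=hello awk 'BEGIN {
    print ENVIRON["GREETING"]
    # test for existence instead of comparing with an empty string
    state = ("NO_SUCH_VAR_12345" in ENVIRON) ? "set" : "unset"
    print state
}')
echo "$out"
```

This prints hello and then unset.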