ECG Bash selection tool
Problem
I made the following bash script for extracting a group of ECG signals from ECG files. I would like to know if there are any mistakes and/or weaknesses. I have had difficulty turning it into a function that takes bash parameters because of the AWK part.
I think it would be better not to use so many separate tools because of such problems, but I am not sure how to replace, for instance, the AWK part with something more stable that works together with bash.
Each ECG file contains two columns: the first column is the original signal and the second column is the improved ECG signal.
The database is the AAMI MIT-BIH Arrhythmia database. The script must be stable and valid, so I have not used wildcard characters. The users give the IDs they want. They also specify which ECG signal they want (`1` or `2`). For now, the type of ECG signal has to be changed manually because I cannot integrate `$ecg` into the awk one-liner.
Logic of the script:
- Get a list of the wanted ECG columns into `ECGs`; the ID 118 is repeated because repetition should be allowed and duplicate IDs should not be removed
- Create and/or empty temporary files; keep the individual ECG of each iteration in /tmp/test.csv and the combined result in result.csv
- Loop through `ECGs` to have them in result.csv
- Add a header to the beginning of the file from `ids`
getEcgs.bash
```
#!/bin/bash
ids=(101 118 201 103 118)
dir="/home/masi/Documents/CSV/"
#Ecgs=()
index=0
ecg=2 # ecg=1 ecg; ecg=2 improved ecg # change AWK line $2/$1 to corresponding number manually for change; buggy AWK with bash params
#printf '%s\n' "${#ids[@]}"
#printf '%s\n' "${ids[0]}"
#printf '%s\n' "${ids[1]}"
for id in "${ids[@]}";
do
    input=$(echo "${dir}P${id}C1.csv")
    # take second column of the file here
    file=$(awk -F "\",\"" '{print $2}' $input) # http://stackoverflow.com/a/19602188/54964 # http://stackoverflow.com/a/19075707/54964
    # printf '%s\n' "${id}"
    # printf '%s\n' "$index"
    Ecgs[${index}]="${file}"
    index=
```
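The sticking point described above, getting the value of `$ecg` into the awk program, is commonly handled with awk's `-v` option, which assigns an awk variable before the program runs. The following is only a minimal sketch under assumptions: the helper name `get_column` is hypothetical, and the input is assumed to be plain comma-separated text without quoted fields.

```bash
#!/bin/bash
# Hypothetical helper: print column number $1 of the CSV file $2.
# Assumes plain comma-separated fields without quoting.
get_column() {
    local col=$1 file=$2
    # -v copies the shell value into the awk variable "c" before the
    # program runs, so the awk program itself can stay single-quoted.
    awk -F, -v c="$col" '{ print $c }' "$file"
}

# Usage sketch with a throwaway two-column file.
tmp=$(mktemp)
printf '%s\n' '1.0,1.61' '0.9,0.67' > "$tmp"
get_column 2 "$tmp"
rm -f "$tmp"
```

Because the column number arrives as an awk variable, the single-quoted program never needs shell expansion inside it, which avoids the quoting problems of splicing `$ecg` into the one-liner.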
Solution
Shell scripts that do complex line-oriented text processing using Awk and other tools are usually better done using Awk alone. Not only would the script be more efficient, it would also be more coherent and have fewer quoting issues. Consider the following script, which I'll call `ecg`:
```
#!/usr/bin/gawk -f
# https://www.gnu.org/software/gawk/manual/html_node/Join-Function.html
@include "join.awk"
BEGIN {
    FS = "\"*,\"*";
    last_row = 0;
}
BEGINFILE {
    rows[0][ARGIND] = gensub(".*P([0-9]*)C.*", "\\1", "g", FILENAME);
}
{
    rows[FNR][ARGIND] = $col;
    if (FNR > last_row) { last_row = FNR; }
}
END {
    for (r = 0; r <= last_row; r++) {
        print join(rows[r], 1, ARGC - 1, ",");
    }
}
```
Observe what happens when you run it:
```
$ ./ecg -v col=2 P{101,118,201,118}C1.csv
101,118,201,118
1.61,-1.84,-0.245,-1.84
0.67,-0.71,-0.22,-0.71
0.695,-0.49,-0.2,-0.49
0.38,-0.26,-0.2,-0.26
0.43,0.07,-0.195,0.07
```
Note that `$col` extracts the column specified by the parameter `col`.
Since you are using GNU/Linux, I have taken advantage of some features specific to GNU Awk in the script above:
- Multidimensional arrays. Traditional Awk only has one-dimensional arrays, which can be indexed using tuples to simulate extra dimensions.
- The `BEGINFILE` special pattern and the `ARGIND` special variable.
- The `gensub()` function to extract the ID from the filename.
- The `join()` function.
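For comparison, the tuple-indexing approach mentioned in the first bullet can be sketched without the GNU-specific features. This is an illustration only, not the script above: the function name `paste_column` is hypothetical, the input is assumed to be plain comma-separated text without quoted fields, files are counted by watching for `FNR == 1` (so an empty file would be skipped), and the ID is cut out of the filename with two `sub()` calls instead of `gensub()`.

```bash
#!/bin/bash
# Hypothetical helper: paste_column COL FILE... prints column COL of each
# file side by side, one output column per file argument, with a header
# row of IDs taken from file names shaped like P<ID>C1.csv.
# Assumes plain comma-separated input without quoting.
paste_column() {
    local col=$1
    shift
    awk -F, -v col="$col" '
        # FNR == 1 marks the start of each non-empty file; count files
        # here instead of relying on the gawk-only ARGIND.
        FNR == 1 {
            nfiles++
            name = FILENAME
            sub(/.*P/, "", name)   # drop everything up to the last "P"
            sub(/C.*/, "", name)   # drop the first "C" and what follows
            rows[0, nfiles] = name
        }
        {
            # rows[FNR, nfiles] is one string key joined by SUBSEP,
            # which simulates a two-dimensional array.
            rows[FNR, nfiles] = $col
            if (FNR > last_row) last_row = FNR
        }
        END {
            for (r = 0; r <= last_row; r++) {
                line = rows[r, 1]
                for (k = 2; k <= nfiles; k++) line = line "," rows[r, k]
                print line
            }
        }
    ' "$@"
}
```

A call such as `paste_column 2 P101C1.csv P118C1.csv` would then play the role of `./ecg -v col=2 ...`, with the column number passed as an ordinary shell argument.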
Context
StackExchange Code Review Q#146360, answer score: 3