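Each record in the data ends with \r, so any \n that appears before it sits inside a record. You can therefore replace all the \n with spaces and then replace all the \r with \n, which leaves one record per line.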
#!/bin/bash
if [ -z "$1" ] ; then
    echo "First replaces all the \\n with spaces then replaces all the \\r with \\n"
    echo "usage: $0 input.csv output.csv"
    exit 1
fi
tr '\n' ' ' < "$1" | tr '\r' '\n' > "$2"
<post from kaggle forum>
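For readers who would rather stay in Python, the same two substitutions can be sketched as follows. This is a minimal illustration, not part of the forum post: the function names `normalize_record_breaks` and `convert_file` and the 1 MiB chunk size are assumptions made for the example. Working on bytes keeps it independent of the file's text encoding, and because each byte is mapped on its own, processing the file in chunks gives the same result as processing it whole.

```python
def normalize_record_breaks(data: bytes) -> bytes:
    """Replace every \\n with a space, then every \\r with \\n."""
    return data.replace(b"\n", b" ").replace(b"\r", b"\n")


def convert_file(src_path, dst_path, chunk_size=1 << 20):
    """Stream src_path to dst_path, normalizing record breaks chunk by chunk.

    chunk_size is an arbitrary choice for this sketch (1 MiB).
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(normalize_record_breaks(chunk))
```

Calling `convert_file("input.csv", "output.csv")` would play the role of the shell script above, with the file names standing in for its two arguments.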
To create a smaller sample file from the first lines of Train.csv:
head -n [number of lines] Train.csv > sample_train.csv
Python script to parse the data:
import csv, sys

if len(sys.argv) != 3:
    print('Wrong number of arguments. This tool will print first n records from a comma separated CSV file.', file=sys.stderr)
    print('Usage:', file=sys.stderr)
    print(' python', sys.argv[0], '<file> <number-of-lines>', file=sys.stderr)
    sys.exit(1)

fileName = sys.argv[1]
n = int(sys.argv[2])
i = 0
out = csv.writer(sys.stdout, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
# newline='' leaves line breaks untouched so the csv module can handle
# newlines that occur inside quoted fields
with open(fileName, newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=',', quotechar='"'):
        i += 1
        # stop once n records (not physical lines) have been written
        if i > n:
            break
        out.writerow(row)
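The record-counting loop above can also be written with `itertools.islice`. The sketch below assumes Python 3. The helper name `first_records` and the sample data, including the column names `Id` and `Body`, are made up for illustration. It shows that a quoted field containing a newline still counts as a single record, which is the difference between this approach and counting physical lines.

```python
import csv
import io
from itertools import islice


def first_records(lines, n):
    """Return the first n parsed CSV records from an iterable of lines."""
    reader = csv.reader(lines, delimiter=",", quotechar='"')
    return list(islice(reader, n))


# Hypothetical sample: the second record spans two physical lines.
sample = io.StringIO(
    '"Id","Body"\r\n"1","line one\nline two"\r\n"2","short"\r\n',
    newline="",
)
rows = first_records(sample, 2)
```

Here `rows` holds the header row and the complete two-line record, even though those two records occupy three physical lines in the input.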