Friday, December 13, 2013

Data Preparation: Kaggle Facebook Recruiting Competition III

Each record in the data ends with \r, while \n only appears inside records. So you can replace every \n with a space and then replace every \r with \n, which leaves exactly one record per line.
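#!/bin/bash
# Require both an input and an output path.
if [ "$#" -ne 2 ] ; then
    echo "First replaces all the \\n with spaces, then replaces all the \\r with \\n" >&2
    echo "usage: $0 input.csv output.csv" >&2
    exit 1
fi
# Newlines inside records become spaces; the \r record terminators become newlines.
tr '\n' ' ' < "$1" | tr '\r' '\n' > "$2"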

<post from kaggle forum>
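If tr is not available (for example on Windows), the same substitution can be sketched in Python. This is only a rough equivalent of the shell pipeline above, not part of the forum post; the function name normalize_records and the chunk_size parameter are made up for illustration.

```python
def normalize_records(src_path, dst_path, chunk_size=1 << 20):
    # Translation table: \n becomes a space, \r becomes \n.
    # Both substitutions happen in one pass, which gives the same
    # result as the two chained tr calls.
    table = bytes.maketrans(b"\n\r", b" \n")
    # Work on raw bytes in fixed-size chunks so a large file never
    # has to fit in memory and no text decoding is needed.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk.translate(table))
```

Calling normalize_records("input.csv", "output.csv") should then produce the same output file as the shell script.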

To take a sample from the top of the file:
head -n [number of lines] Train.csv > sample_train.csv

Python script to print the first n records of the CSV file (written for Python 3):
import csv
import sys

if len(sys.argv) != 3:
    print('Wrong number of arguments. This tool prints the first n records from a comma-separated CSV file.', file=sys.stderr)
    print('Usage:', file=sys.stderr)
    print('       python', sys.argv[0], '<file> <number-of-records>', file=sys.stderr)
    sys.exit(1)

file_name = sys.argv[1]
n = int(sys.argv[2])

# Quote every non-numeric field on output.
out = csv.writer(sys.stdout, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
# newline='' lets the csv module handle line breaks inside quoted fields itself.
with open(file_name, newline='') as csvfile:
    for i, row in enumerate(csv.reader(csvfile, delimiter=',', quotechar='"')):
        if i >= n:
            break
        out.writerow(row)
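The reason to go through the csv module rather than counting lines is that a quoted field may contain a line break, so one record can span several physical lines. The small sketch below illustrates the difference on made-up data; the helper first_records and the sample rows are hypothetical and not taken from the competition files.

```python
import csv
import io
from itertools import islice

def first_records(lines, n):
    # csv.reader joins physical lines that belong to one quoted field,
    # so islice counts whole records rather than lines.
    return list(islice(csv.reader(lines, delimiter=',', quotechar='"'), n))

# Made-up sample: the record after the header has a line break
# inside its last quoted field.
sample = 'Id,Title,Body\r\n1,"t1","line one\nline two"\r\n2,"t2","b2"\r\n'

# Record-aware: header plus the complete first record.
records = first_records(io.StringIO(sample, newline=''), 2)

# Line-based (what head -n 2 would see): the record is cut in half.
physical_lines = sample.splitlines()[:2]

print(records)
print(physical_lines)
```

Here records holds the header and the whole first record, while physical_lines ends in the middle of the quoted field.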
