Tuesday, February 15, 2011

Use Python to normalize a database

On many occasions, data needs to be normalized before it enters a database so that query operations run faster. For large text files, Python is a good choice, given its excellent row-wise data manipulation ability.
The first thought is to use a nested list to hold all the data, as in the code below.
import csv, sqlite3

infile = open('mtcars.csv', 'r')
f = csv.reader(infile)
header = next(f)
header.pop(0)

data = [] 
for r in f:
    name = r.pop(0)
    for i in range(0, len(r)):
        data.append([name, header[i], r[i]])
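As a quick check, the same reshaping logic can be run on a small inline sample; the two rows below are made up to stand in for mtcars.csv:

```python
import csv
import io

# A tiny stand-in for mtcars.csv: a row-name column plus two measure columns
sample = io.StringIO(
    "model,mpg,cyl\n"
    "Mazda RX4,21.0,6\n"
    "Datsun 710,22.8,4\n"
)

f = csv.reader(sample)
header = next(f)
header.pop(0)          # drop the row-name column label

data = []
for r in f:
    name = r.pop(0)    # the car model keys every long row
    for i in range(len(r)):
        data.append([name, header[i], r[i]])

# Each wide row becomes one long row per (name, variable) pair
print(data[0])  # ['Mazda RX4', 'mpg', '21.0']
```

Two wide rows with two measures each yield four long rows.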
However, a dictionary will be much more convenient given its built-in iteration tools.
import csv, sqlite3

infile = open('mtcars.csv', 'r')
f = csv.DictReader(infile)

data = [] 
for r in f:
    # popitem() order is not guaranteed, so pop the row-name
    # column by its known key from the header instead
    name = r.pop(f.fieldnames[0])
    for key, value in r.items():
        data.append([name, key, float(value)])

infile.close()
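The DictReader route can be verified the same way on an inline sample (again a stand-in for mtcars.csv); f.fieldnames[0] identifies the row-name column without relying on dictionary ordering:

```python
import csv
import io

# Stand-in for mtcars.csv
sample = io.StringIO(
    "model,mpg,cyl\n"
    "Mazda RX4,21.0,6\n"
)

f = csv.DictReader(sample)
first_col = f.fieldnames[0]   # 'model', the row-name column

data = []
for r in f:
    name = r.pop(first_col)
    for key, value in r.items():
        data.append([name, key, float(value)])

print(sorted(data))  # [['Mazda RX4', 'cyl', 6.0], ['Mazda RX4', 'mpg', 21.0]]
```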

# Create an empty table to hold the normalized data
db = sqlite3.connect('mtcars.db')
c = db.cursor()
c.execute('create table if not exists display(name text, varname text, varvalue real) ')
db.commit()

# Insert the normalized data into SQLite database 
c.executemany('insert into display values (?, ?, ?)', data)
db.commit()

# Show the result
for row in c.execute('select * from display'):
    print(row)
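The payoff of the long layout is fast per-variable queries. An index on varname (a hypothetical addition, not in the original post) keeps single-variable scans cheap; the sketch below uses an in-memory database and made-up rows in place of mtcars.db:

```python
import sqlite3

db = sqlite3.connect(':memory:')   # in-memory stand-in for mtcars.db
c = db.cursor()
c.execute('create table display(name text, varname text, varvalue real)')
c.executemany('insert into display values (?, ?, ?)',
              [['Mazda RX4', 'mpg', 21.0],
               ['Mazda RX4', 'cyl', 6.0],
               ['Datsun 710', 'mpg', 22.8]])

# Index the variable-name column so per-variable lookups avoid a full scan
c.execute('create index idx_varname on display(varname)')

# Pull a single variable across all cars
rows = c.execute(
    "select name, varvalue from display where varname = 'mpg' order by name"
).fetchall()
print(rows)  # [('Datsun 710', 22.8), ('Mazda RX4', 21.0)]
```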
