I decided to absorb the data into a database. The first draft of the
code I wrote to do so informed me that it would take 25 days of
computing processing to complete. That was too long. Also I was out of
hard drive space. So I went to a store and bought a computer, a big,
boxy, unfashionable PC with a 4-GHz quad-core processor and ten
terabytes of extra hard-drive space, installed Linux on it, and got the
most recent version of the PostgreSQL database.

With the help of that machine and quite a few database tricks to massage
and extract the data, I got 25 days down to one, with searchable titles,
descriptions, and reviews. Seven days of programming and one day of
absorption to beat one day of programming and 25 days of absorption: a
pretty familiar set of trade-offs. You're always trying to balance your
time against the computer's, but there's also the challenge of the
thing. I probably should have just let it run for four weeks.
-- Paul Ford. "Does Amazon's Data Speak for Itself?"
New Republic (Feb 17, 2016).
https://newrepublic.com/article/129026/amazons-data-speak-itself