How we came across the need for Big Data way back in Y2K
We were working on a batch job that had to finish processing all the transactions in a database within a maintenance window of less than an hour. The requirement was to read hundreds of thousands of records, call a service, and validate and enrich those records using reference data returned by the service (back then we had no web services; we invoked services over CORBA/IIOP, i.e., Java RMI over IIOP). The challenge was that the time window was so brief. We figured out that reading the records sequentially was not an option; we had to process those records in parallel. So I proposed that we run batch jobs in multiple processes spread across multiple nodes to read and process the records from the database: the slaves, or yes, the "Mappers", as in MapReduce.
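To make the shape of one of those mapper processes concrete, here is a minimal sketch, not our original code: it assumes a JDBC source, an illustrative TRANSACTIONS table with TXN_ID, PAYLOAD, and STATUS columns, and a hypothetical EnrichmentService interface standing in for the CORBA/IIOP stub. How the parallel copies of this process divide the rows between themselves is exactly the question that comes next.

import java.sql.*;

// One "mapper" batch process: reads pending transactions, calls the
// remote enrichment service, and writes the enriched records back.
// All names here are illustrative, not from the original system.
public class TransactionMapper {

    interface EnrichmentService {              // stand-in for the RMI/IIOP stub
        String validateAndEnrich(String payload);
    }

    void processPending(Connection conn, EnrichmentService svc) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT TXN_ID, PAYLOAD FROM TRANSACTIONS WHERE STATUS = 'PENDING'")) {
            while (rs.next()) {
                long id = rs.getLong("TXN_ID");
                // The remote call that validates and enriches the record
                String enriched = svc.validateAndEnrich(rs.getString("PAYLOAD"));
                try (PreparedStatement upd = conn.prepareStatement(
                         "UPDATE TRANSACTIONS SET PAYLOAD = ?, STATUS = 'DONE' WHERE TXN_ID = ?")) {
                    upd.setString(1, enriched);
                    upd.setLong(2, id);
                    upd.executeUpdate();
                }
            }
        }
    }
}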