As I’ve written earlier, many of the open pools of big data are actually government data sources. Big data firms process this information and produce analyses and conclusions that they can then sell, repackage, tweak, or fold into other data products. This has been the model for quite some time now, and it represents, for the most part, the full extent of open data.
The real future of any fundamental progress in big data lies as much in the sources of data available as in the sifting and processing software needed to handle that data. A lot of the hype around big data startups focuses primarily on analytical tools, data processing tools, and software for handling data. This is all well and good, but as I’ve mentioned earlier, the real issue is getting new sources of data. Large corporations that have spent a lot of money acquiring customers, servicing those customers, developing a brand, and building relationships aren’t exactly jumping at the opportunity to open their data to potential competition. This is always going to be the sticking point.
The good news is that there appear to be areas of agreement where certain subsets of corporate data can be made available to the public, either through exchanges or as freely accessible information. The more pools become available, the more profound the development we can expect to see in big data. As I’ve mentioned earlier, big data is driven primarily by software. That is fine, but the actual fuel for its ability to mutate into something that can truly reinvent key areas of the technology landscape is the data it crunches. So far, that supply has been limited to certain pools. This is the big frontier. If industry can come together and build an open sharing system in which otherwise privately collected information is shared, either through a quid pro quo arrangement or some sort of value-exchange system, the whole idea of big data innovation can truly take off.