Data is all around us: from the constant stream of data that our smartphones generate to the data that results from our online browsing; from the information thermostats absorb in our homes to the data collected by our cars.
This “data exhaust,” or “the Internet of Things,” as it is sometimes called, comes from everything being connected to the Internet and creating some kind of data imprint.
Capture Higher Ed’s Thom Golden, vice president of Data Science, and Brad Weiner, director of Data Science, briefly discussed data exhaust during a recent episode of their podcast, The WeightList. In most episodes, they like to introduce a number that in some way cuts through the “exhaust,” especially as it relates to higher education and enrollment management.
Last year, Katherine Noyes, senior U.S. correspondent for IDG News Service, wrote a piece for the online IT magazine Computerworld that discussed “5 things you need to know about data exhaust.” In it, she describes it as “in some ways, the evil twin brother” of big data and lays out some of its pros and cons.
What should we know?
First, data exhaust is essentially all the big data that is not core to our business or, in the case of enrollment management, to our recruitment efforts. If big data is “primary” data that relates to the core function of your efforts, data exhaust is “secondary” data: “everything else that’s created along the way,” Noyes writes.
Data exhaust is also typically “bigger than big,” she writes. “Big data” is itself a relative term, boiling down essentially to “anything that’s so large that you couldn’t manually inspect or work with it record by record,” said Tye Rattenbury, director of data science and solutions engineering at Trifacta, which makes software for data preparation. He was interviewed for Noyes’ story.
In general, data exhaust tends to be even bigger, primarily because there are few limits on what a company can collect.
“Google is the leader here,” Rattenbury said. “They literally collect everything, even before they know what they will do with it.”
This means data exhaust (or secondary data) can become primary data once a use is found for it.
The story goes on to discuss the great potential, the real risks and the strong need for selectivity related to data exhaust. Read the full story for a good primer on this byproduct of big data.
By Kevin Hyde, Content Writer, Capture Higher Ed