This is a guest post by Lucas Baker, Andrea Duque, and Viet Yen Nguyen of Hypefactors.
At Hypefactors, we build tech for media intelligence and reputation management. Our solution is a software-as-a-service (SaaS) product that monitors social media, news sites, TV, radio, and reviews across the world at large scale. The tracked data is streamed continuously and enriched in real time, yielding insights that can reveal early business opportunities (for example, the GameStop hype), track the success of product launches, and preempt disasters.
To this end, our data pipelines make over a hundred million network requests daily for web crawling, social media firehoses, and other REST-based media data integrations, producing millions of new articles and posts each day. This data can be segmented into three classes (as illustrated with the following examples):
Owned – Articles or posts written by a company and