For companies leveraging data to drive decisions, product development and more, keeping pace with the latest data collection trends is essential. As technology and regulations evolve, strategies for acquiring, managing and applying data must adapt.
In 2023, organizations seeking to become truly data-driven will need to align their practices with five key developments:
1. Exponentially More Data Needed for Cutting-Edge AI
State-of-the-art artificial intelligence and machine learning models have become astoundingly capable. However, with greater sophistication comes a voracious appetite for data.
Consider autonomous vehicles. Training algorithms to handle diverse driving scenarios requires massive datasets, on the order of billions of miles' worth of driving footage [7].
Or examine large language models like Google's LaMDA, which can generate remarkably human-like text. LaMDA was pre-trained on over a trillion words of public dialog data and web text [8].
Clearly, cutting-edge AI is extremely data-hungry. Recommendations include:
- Plan for scale: Data pipelines must support terabyte/petabyte volumes.
- Seek variety: Train models on diverse, representative data samples.
- Enable velocity: Rapidly accumulate quality training data.
- Consider partnerships: Complement internal data with vendor datasets.
- Evaluate synthetic data: Generate missing samples algorithmically.
- Audit continuously: Ensure data quality and minimize bias.
With careful strategy, sufficient data can fuel AI leadership.
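The "audit continuously" recommendation can be sketched as a small check over a batch of training records. This is a minimal illustration only: the record layout (`text` and `label` fields) and the function name `audit_dataset` are assumptions made for the example, and a production audit would cover far more than missing values, exact duplicates and label balance.

```python
from collections import Counter

def audit_dataset(records, label_key="label", required_keys=("text", "label")):
    """Return simple quality metrics for a list of dict records (hypothetical schema)."""
    total = len(records)
    # Count records where any required field is missing or empty.
    missing = sum(
        1 for r in records
        if any(r.get(k) in (None, "") for k in required_keys)
    )
    # Count exact duplicates: records sharing the same values for all required fields.
    seen = Counter(tuple(r.get(k) for k in required_keys) for r in records)
    duplicates = sum(count - 1 for count in seen.values())
    # Share of each label among the records that carry a label.
    labels = Counter(
        r[label_key] for r in records if r.get(label_key) not in (None, "")
    )
    labeled = sum(labels.values())
    label_share = {lab: n / labeled for lab, n in labels.items()} if labeled else {}
    return {
        "total": total,
        "missing": missing,
        "duplicates": duplicates,
        "label_share": label_share,
    }

sample = [
    {"text": "stop sign at dusk", "label": "sign"},
    {"text": "stop sign at dusk", "label": "sign"},    # exact duplicate
    {"text": "pedestrian crossing", "label": "person"},
    {"text": "", "label": "sign"},                      # missing text
]
report = audit_dataset(sample)
print(report)
```

Running a check like this on every new batch, rather than once, is what makes the audit continuous; a skewed `label_share` is an early hint that more varied data is needed.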
2. Expanding Data Privacy Regulations Worldwide
As data breaches proliferate, governments are enacting stricter regulations around data privacy and security. Major developments include:
- GDPR: Europe's far-reaching rules for handling the personal data of people in the EU. Fines can reach €20 million or 4% of global annual revenue, whichever is higher [9].
- CCPA/CPRA: California laws granting consumers new data access rights. Potential fines of $2,500 per violation [10].
- DMA: The EU's Digital Markets Act limits how large "gatekeeper" platforms collect, combine and share personal data outside their core services.
- PIPL: China's Personal Information Protection Law restricts cross-border data transfers and increases government oversight of data handlers [11].
- India data protection bill: Still evolving, but will likely impose data localization.
Data regulations continue expanding worldwide. [3]
For global enterprises, monitoring policies across regions is mandatory. Steps to take include:
- Appoint regional data governance teams.
- Conduct periodic data privacy audits.
- Enable automated redaction and anonymization.
- Support data subject rights fulfillment.
- Establish lawful data collection and compliance processes.
Being prepared to comply proactively is crucial, as scrutiny and penalties intensify.
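As an illustration of the automated redaction step listed above, here is a minimal sketch using only the Python standard library. The regular expressions, the placeholder tokens and the function names `redact` and `pseudonymize` are assumptions made for this example; they will miss many real-world formats. Note also that a keyed hash is pseudonymization rather than anonymization, so the output may still count as personal data under rules such as GDPR.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def pseudonymize(user_id, secret_key):
    """Map an identifier to a stable token with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

message = "Contact jane.doe@example.com or +1 415-555-0100."
print(redact(message))
print(pseudonymize("user-42", b"example-key-not-for-production"))
```

Keeping the key outside the dataset lets the same user map to the same token across tables without exposing the original identifier.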
3. The Unstructured Data Deluge
Unstructured data like images, video, audio and text now dominates many datasets. By 2025, the global volume of data is projected to reach 175 zettabytes (one zettabyte is a trillion gigabytes), much of it unstructured [4].
With AI/ML, this explosion of unstructured data presents a goldmine of potential insights. Forward-thinking companies are capitalizing through:
- Computer vision: Extracting visual details from images, video and documents.
- Natural language processing: Making sense of masses of unstructured text.
- Speech recognition: Transcribing audio into searchable text.
But working at this scale requires data strategies tailored to unstructured data's unique aspects:
- Metadata tagging for findability and meaning.
- Multi-cloud storage to balance cost and data-residency needs.
- Specialized data lakes to organize mixed data types.
- Streaming analytics to process high-volume data as it arrives.
- Data transformation tools to prepare unstructured data for analysis.
The companies that master unstructured data will gain a competitive edge.
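Metadata tagging, the first item in the list above, can be sketched with the standard library alone. The field names in the returned record, the `tags` parameter and the function name `build_metadata` are assumptions for illustration, not a standard schema; the demo writes a small temporary file so the example runs anywhere.

```python
import hashlib
import mimetypes
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_metadata(path, tags=()):
    """Build a small metadata record for one file (hypothetical schema)."""
    path = Path(path)
    # guess_type infers the MIME type from the file extension only.
    mime, _ = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "mime_type": mime or "application/octet-stream",
        "modality": mime.split("/")[0] if mime else "unknown",
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "tags": list(tags),
    }

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_text("hello", encoding="utf-8")
    meta = build_metadata(sample, tags=["demo"])
print(meta)
```

Records like this can be stored in a catalog alongside the raw files, so that images, audio and text in a data lake remain searchable by type, size and tag.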
4. Evolving Toward Intelligent Data Tiering
As data volumes explode, storage costs become exorbitant. Data tiering provides a strategic solution by organizing data into levels based on:
- Volume: Total amount and growth rate.
- Velocity: Speed of generation and change.
- Variety: Types, formats, and criticality.
- Value: Business impact and analytics utility.
Common tiers include:
- Hot: High-value data needed in real-time.
- Warm: Frequently accessed, medium-priority data.
- Cool: Important but rarely used archival data.
- Cold/Deep: Low-value data only kept for compliance.
Leading organizations are also implementing intelligent data tiering using ML to:
- Automatically classify data based on access patterns.
- Dynamically route data to optimal tiers.
- Identify "cold" data to archive or delete.
- Save costs by shrinking expensive hot storage.
As data proliferates, intelligent tiering is imperative.
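To make the idea of classifying data by access pattern concrete, here is a minimal rule-based sketch. It is a stand-in for the ML-driven classification described above, not an implementation of it: the thresholds, the `ObjectStats` fields and the tier boundaries are all assumptions chosen for the example, and a learned model would replace the fixed rules in `assign_tier`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ObjectStats:
    key: str
    last_access: datetime
    reads_last_30d: int

def assign_tier(stats, now):
    """Pick a storage tier from recency and read frequency (illustrative thresholds)."""
    idle_days = (now - stats.last_access).days
    if idle_days <= 1 and stats.reads_last_30d >= 100:
        return "hot"
    if idle_days <= 30:
        return "warm"
    if idle_days <= 365:
        return "cool"
    return "cold"

now = datetime(2023, 6, 1)
objects = [
    ObjectStats("orders/today.parquet", now - timedelta(hours=2), 500),
    ObjectStats("reports/may.csv", now - timedelta(days=10), 12),
    ObjectStats("logs/2022-11.gz", now - timedelta(days=200), 0),
    ObjectStats("audit/2020.tar", now - timedelta(days=800), 0),
]
tiers = {obj.key: assign_tier(obj, now) for obj in objects}
print(tiers)
```

Objects that land in the cold tier are the candidates to archive or delete, which is where most of the storage savings come from.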
5. Promoting Data Diversity, Expanding Representation
There is growing concern about biased datasets producing discriminatory ML systems. Some examples:
- Facial recognition tools misidentifying people of color and women [6].
- Hiring algorithms disadvantaging certain demographics [12].
- Medical diagnosis AI underserving minority populations [13].
Many organizations now prioritize expanding data diversity and representation. Key steps include:
- Perform extensive bias audits on training datasets.
- Broaden data collection to underrepresented groups.
- Synthetically generate missing data samples.
- Apply techniques like oversampling minority data.
- Provide transparency into datasets, algorithms and results.
Though challenging, reducing data bias is a prerequisite for fairer, more ethical and more useful AI systems.
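Two of the steps above, auditing representation and oversampling minority data, can be sketched in a few lines. This is a minimal illustration using random oversampling with replacement; the `group` field and the function names are assumptions for the example. Oversampling only repeats existing rows and adds no new information, so it complements broader data collection rather than replacing it, and it is usually applied to the training split only.

```python
import random
from collections import Counter

def group_counts(records, group_key):
    """Count how many records fall into each group (a basic representation audit)."""
    return Counter(r[group_key] for r in records)

def oversample(records, group_key, seed=0):
    """Randomly duplicate rows from smaller groups until every group matches the largest."""
    if not records:
        return []
    rng = random.Random(seed)
    by_group = {}
    for r in records:
        by_group.setdefault(r[group_key], []).append(r)
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        balanced.extend(rows)
        # Draw extra rows with replacement until the group reaches the target size.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    return balanced

records = (
    [{"id": i, "group": "A"} for i in range(8)]
    + [{"id": 100 + i, "group": "B"} for i in range(2)]
)
before = group_counts(records, "group")
after = group_counts(oversample(records, "group"), "group")
print(before, after)
```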
Key Takeaways
- Develop rigorous data strategies to feed increasingly data-hungry AI.
- Closely track the patchwork of global data regulations.
- Mine valuable insights from unstructured data through specialized tools.
- Architect intelligent multi-tier data workflows to balance cost and performance.
- Promote diversity and inclusion in datasets to mitigate algorithmic bias.
Aligning data practices with these critical trends will strengthen competitiveness as a data-driven business.