How to Ensure Your Data Science Is Inclusive
October 16, 2019
The potential of data science to support, measure, and amplify sustainable development is undeniable. And as public, private, and civic institutions around the world come to recognize the role that data science can play in advancing growth, an increasingly robust array of efforts aimed at fostering data science in lower-income countries has emerged.
This phenomenon is particularly salient in sub-Saharan Africa, where foundations are investing millions in building data literacy and data science skills; multilaterals and national governments are pioneering new investments in data science, artificial intelligence, and smart cities; private and public donors are investing in data science centers and local data science talent; and local universities are launching graduate-level data science courses.
Behind this progress (and the attendant hype) lurks an inconvenient truth: As a new generation of data scientists emerges in Africa, relatively little trusted, accurate, and accessible data is available to them.
We often hear how data science can be used to help teachers tailor curricula according to student performance, but the fact remains that many school systems on the continent don't collect or track performance data with enough accuracy and timeliness to perform data science–enabled tweaks. Many firmly believe that data science can help us identify disease outbreaks early, but healthcare facilities often lack the patient data and digital capabilities needed to surface those clues.
Fundamental data gaps like these invite a question: Precisely what data do data scientists need to advance sustainable development?
There are, of course, compelling examples of data science being put to use for the public good. Emerging use cases include exploring call detail records to improve mobility and urban planning, using remote sensors to measure agricultural or economic growth, and mining online content to monitor election violence. These and other examples prove beyond a doubt that data science has a role to play in advancing sustainable development.
But obtaining call detail records requires time, money, and (often) political connections. Online content (like tweets) typically reflects the views of the relatively small number of people in lower-income countries who have Internet access and use social media platforms. Even though we're working hard to make data science accessible to everyone, data scientists are left to work with information that is either inaccessible to most technologists or unrepresentative of the most marginalized populations.
The lack of good data has consequences. As leaders and influencers increasingly rely on data science to guide their decision-making, they risk making decisions that ignore the needs, perspectives, and values of the people they serve who are not online (more than half the world's population) or who don't use a mobile device (mobile devices are used more by men than by women).
They also risk disenfranchising a new generation of African data scientists who lack the financial resources to access large and reliable datasets, or who must watch as better-funded organizations an ocean away (universities in the Global North, for example) conduct data science and analytics focused on their communities.
The good news? There are steps we can take that will help data science achieve its full potential in the realm of sustainable development. Here are three:
1. Be wary of encouraging a generation of data scientists who must rely on expensive, hard-to-access data in order to meaningfully apply their skills. We should couple our data science training with efforts that build data collection skills through methods such as community mapping or data-sharing initiatives like data collaboratives.
2. Be conscious of the risk of reinforcing dependencies on companies whose technologies, platforms, and datasets make up the bulk of data science case studies. We should intentionally pair our investments in data science with investments in indigenous innovations that produce data for data science. Low-cost, locally built technologies such as unmanned aerial vehicles (UAVs) and initiatives that produce locally relevant training datasets can help mitigate such dependencies.
3. Be mindful of focusing too much on data science and not enough on data literacy. We should double down on building fundamental data skills — collecting, cleaning, analyzing, sharing — within health clinics, schools, and local government agencies, where so much valuable information is actually produced. Doing so will improve the availability and reliability of large datasets for use by homegrown data scientists.
Fortunately, momentum is beginning to shift in favor of indigenous data science. Entrepreneurs are rolling out innovations designed to address language gaps. Initiatives such as Data Science Africa and Deep Learning Indaba are nurturing communities of machine-learning experts. These are steps in the right direction.
Five years from now, a new generation of socially conscious, impact-driven African data scientists will have emerged, and many of them will want to use their skills to address sustainable development challenges. We must ensure that the information that powers their efforts isn't limited to expensive, inaccessible, or unrepresentative data that sits primarily in the hands of a few mobile operators, banks, or tech companies.
Getting there means complementing the hype of data science for global good with the long, difficult work of improving data quality at the local level, investing in indigenous technology and content, and building fundamental data skills. Only then will the data science revolution be primed to achieve its full potential.
Samhir Vasdev is an advisor for digital development at IREX's Center for Applied Learning and Impact. A version of this post originally appeared on the IREX website.