This study analyzes over 2,500 publicly disclosed African startup deals to uncover what drives investment sizes. After cleaning, merging, and engineering data from africathebigdeal.com, features were grouped into founding-team, company, and investment categories. Exploratory Data Analysis and four machine learning models (Linear Regression, SVR, Random Forest, and Gradient Boosting) were used to predict deal amounts. The best-performing model, validated via cross-validation, forms the basis for data-backed insights and policy recommendations aimed at strengthening Africa’s startup funding ecosystem.

Inside the Data: What Shapes Startup Deal Sizes in Africa

2025/11/05 14:00

ABSTRACT

INTRODUCTION

LITERATURE REVIEW

DATA AND METHODS

RESULTS

DISCUSSION

CONCLUSION AND REFERENCES

DATA AND METHODS

This section describes the data sources and the methods used to investigate the key factors affecting deal amounts in African startup investments. It covers data collection, cleaning, and preparation, followed by feature grouping, exploratory data analysis, and the machine learning models used to build a predictive model. Careful data handling combined with cross-validated models is intended to make the study's findings robust.

Data

This study employs a dataset sourced from africathebigdeal.com to systematically investigate the key factors influencing deal amounts in African startup investments and to formulate policy recommendations that bolster the growth of the startup ecosystem on the continent. The dataset compilation adhered to the following methodology:

● Inclusion criteria stipulated that startups must either operate in Africa with their headquarters situated within the continent or possess African founders despite having their headquarters located outside Africa.

● The database exclusively captures deals that have been publicly disclosed or openly shared by investors or founders themselves.

● Minimum deal sizes apply by year: at least $100,000 for 2021, 2022, and 2023; $500,000 for 2020; and $1,000,000 for 2019.

The principal dataset, the primary focus of this investigation, comprises 2,521 startup deals with 34 attributes, including the deal amount. The secondary dataset covers 1,792 investors who each made at least one investment in African startups.
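The year-dependent size thresholds can be expressed as a simple lookup. The sketch below is illustrative only: the record layout, the field names (`year`, `amount_usd`), and the choice to exclude years outside the stated table are assumptions, not details from the source.

```python
# Minimum disclosed deal size (USD) per year, as stated in the inclusion criteria.
MIN_DEAL_USD = {
    2019: 1_000_000,
    2020: 500_000,
    2021: 100_000,
    2022: 100_000,
    2023: 100_000,
}


def meets_size_threshold(year: int, amount_usd: float) -> bool:
    """Return True if a deal is large enough to be included for its year."""
    threshold = MIN_DEAL_USD.get(year)
    # Assumption: years outside the stated table are excluded.
    return threshold is not None and amount_usd >= threshold


# Hypothetical records, for illustration only.
deals = [
    {"startup": "A", "year": 2019, "amount_usd": 750_000},
    {"startup": "B", "year": 2020, "amount_usd": 750_000},
    {"startup": "C", "year": 2022, "amount_usd": 100_000},
]

included = [
    d["startup"] for d in deals if meets_size_threshold(d["year"], d["amount_usd"])
]
print(included)  # ['B', 'C']
```

The same $750,000 deal is excluded in 2019 but included in 2020, which is worth keeping in mind when comparing deal counts across years.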

Data Preprocessing

The process of preparing and cleaning the data involved several sequential steps, as outlined below:

  1. Checking both datasets for missing or erroneous data points, imputing or removing them as appropriate, and rechecking the integrity of the data against media releases.
  2. Merging the two datasets on the investor's name, which produced a single dataset containing both startup and investor information.
  3. Transforming categorical variables such as sector and deal type into binary (dummy) variables for use in the analysis.
  4. Standardizing numerical variables such as amount raised and valuation to make them comparable across different scales.
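Steps 2 to 4 above can be sketched with the standard library alone. This is a minimal illustration, not the study's actual pipeline: the field names, the sample records, the one-investor-per-deal layout, and the use of an inner join and population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def merge_on_investor(deals, investors):
    """Inner join: attach investor attributes to each deal by investor name."""
    by_name = {inv["investor"]: inv for inv in investors}
    return [
        {**deal, **by_name[deal["investor"]]}
        for deal in deals
        if deal["investor"] in by_name
    ]


def one_hot(rows, column):
    """Replace a categorical column with 0/1 dummy columns '<column>_<value>'."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


def standardize(rows, column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    # Assumes the column is not constant (sigma > 0).
    return [{**row, column: (row[column] - mu) / sigma} for row in rows]


# Hypothetical records, for illustration only.
deals = [
    {"startup": "A", "investor": "X Capital", "sector": "Fintech", "amount_usd": 1_000_000},
    {"startup": "B", "investor": "Y Ventures", "sector": "Energy", "amount_usd": 3_000_000},
    {"startup": "C", "investor": "X Capital", "sector": "Fintech", "amount_usd": 5_000_000},
]
investors = [
    {"investor": "X Capital", "investor_hq": "Kenya"},
    {"investor": "Y Ventures", "investor_hq": "Nigeria"},
]

merged = merge_on_investor(deals, investors)
prepared = standardize(one_hot(merged, "sector"), "amount_usd")
for row in prepared:
    print(row)
```

Joining on a free-text name is sensitive to spelling variants, so in practice the investor names would likely need normalizing before the merge.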

Feature Grouping

To further understand the implications of the different key factors, the features extracted from the primary and secondary datasets were organized into three distinct categories, as outlined below:

● Founding team features (F): This group includes attributes related to the startup's founding team, such as the number of founders, gender-mix, presence of a woman co-founder or CEO, the CEO's university, country and continent of the university, graduation year, and the years elapsed between graduation and the startup's launch.

● Company-related features (C): This category encompasses variables associated with the startup itself, such as the name, website, country, and region of operation, launch date, sector, number of employees, and a brief description of the business.

● Investment-related features (I): This group consists of variables related to the investment deals, including the deal year and date, type of investment, valuation, exit status, investor details, and whether the startup is a Y Combinator alumnus.

Through diligent feature grouping, the study ensured that the variables are organized in a coherent manner, ultimately enhancing the clarity and interpretability of the analysis.
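One way to make the grouping concrete is a mapping from group label to column names, which also makes it easy to select several groups at once. The column names below are hypothetical placeholders chosen to mirror the descriptions above; they are not the dataset's actual headers.

```python
# Hypothetical column names; the real dataset headers may differ.
FEATURE_GROUPS = {
    "F": ["num_founders", "gender_mix", "woman_cofounder_or_ceo", "years_since_graduation"],
    "C": ["country", "region", "launch_year", "sector", "num_employees"],
    "I": ["deal_year", "deal_type", "valuation", "yc_alumnus"],
}


def columns_for(combo: str) -> list[str]:
    """Expand a label such as 'F+C' into a flat list of column names."""
    columns = []
    for group in combo.split("+"):
        columns.extend(FEATURE_GROUPS[group])
    return columns


print(columns_for("F+C"))
```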

Exploratory Data Analysis (EDA)

In this study, Exploratory Data Analysis (EDA) was conducted to examine the dataset and identify key factors that affect deal amounts in African startup investments. A critical aspect of EDA was assessing correlations between variables. Using Pearson's correlation coefficient, we measured the linear association between the dependent variable (deal amount) and the independent variables (founding team, company, and investment-related features). To investigate these correlations more closely, we used the three feature groups discussed earlier in five combinations: F, C, I, F+C, and F+C+I. This approach aimed to uncover relationships between variables and to better understand the importance of each feature.
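For reference, Pearson's coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it directly; the feature name and the sample values are made up for illustration and do not come from the dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither sequence is constant (non-zero variance).
    return cov / sqrt(var_x * var_y)


# Hypothetical values: number of founders vs. deal amount in USD millions.
num_founders = [1, 2, 2, 3, 4]
deal_amount_musd = [0.2, 0.5, 0.4, 0.9, 1.5]

r = pearson_r(num_founders, deal_amount_musd)
print(f"r = {r:.3f}")
```

Because the coefficient captures only linear association, a value near zero does not rule out a nonlinear relationship between a feature and the deal amount.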

Models

Using the same combinations of feature groups discussed in the EDA section, four machine learning algorithms were employed: Linear Regression (LR), Support Vector Regression (SVR), Random Forest (RF), and Distributed Gradient Boosting (DGB). Each model was trained and tested using cross-validation. To evaluate the prediction models, the Mean Squared Error (MSE) metric was employed. During the cross-validation process, the performance of each model was assessed by averaging the MSE values obtained from each fold. The comparison of these averaged MSE values facilitated the selection of the most accurate and reliable algorithm for predicting funding amounts in African startups. The chosen model's performance, along with insights gained from the EDA process, served as the foundation for policy recommendations aimed at supporting the growth of the African startup ecosystem.
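The selection procedure described above (average the per-fold MSE, then pick the model with the lowest average) can be sketched as follows. This is a simplified illustration: the two toy models stand in for the four algorithms used in the study, the data are invented, and the fold count `k=3` and the unshuffled contiguous folds are assumptions, since the source does not state them.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean(xs, ys):
    """Baseline stand-in: always predict the training mean."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Stand-in model: one-feature ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cross_val_mse(fit, xs, ys, k=3):
    """Average the held-out MSE over k contiguous folds (no shuffling)."""
    n = len(xs)
    fold_scores = []
    for i in range(k):
        start, stop = i * n // k, (i + 1) * n // k
        train_x, train_y = xs[:start] + xs[stop:], ys[:start] + ys[stop:]
        model = fit(train_x, train_y)
        preds = [model(x) for x in xs[start:stop]]
        fold_scores.append(mse(ys[start:stop], preds))
    return sum(fold_scores) / k


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

scores = {
    "mean": cross_val_mse(fit_mean, xs, ys),
    "line": cross_val_mse(fit_line, xs, ys),
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Each model is refit on the training portion of every fold, so the averaged score reflects performance on data the model has not seen.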

:::info Author:

Khalil Liouane

:::

:::info This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.

:::
