This page measures the concentration of user data on the Fediverse, the Atmosphere, and public git forges according to the Herfindahl–Hirschman Index (HHI) and the Shannon Index. User data is only one way to measure centralization: others include network structure, legal exposure, and concentration of social and technical power.
HHI is an indicator from economics used to measure competition between firms in an industry. Mathematically, HHI is the sum of the squares of the market shares of all servers, with each share expressed as a percentage. Values close to zero indicate perfectly competitive markets (e.g. many servers, with users spread evenly), while values close to 10000 indicate highly concentrated monopolies (e.g. most users on a single server). In economics, values below 100 are considered "Highly Competitive", values below 1500 "Unconcentrated", and values above 2500 "Highly Concentrated".
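As a rough illustration of the definition above, the calculation can be sketched in a few lines of Python. This is a minimal sketch, not the code used by this site: the function name `hhi` and the input `user_counts` (a list of active-user counts, one per server) are assumptions made for the example.

```python
def hhi(user_counts):
    """Herfindahl-Hirschman Index on the 0-10000 scale.

    user_counts: hypothetical list of active-user counts, one per server.
    """
    total = sum(user_counts)
    if total <= 0:
        raise ValueError("need at least one user to compute market shares")
    # Each share is expressed as a percentage (0-100) before squaring,
    # which is what puts the maximum at 100 ** 2 = 10000.
    return sum((100.0 * n / total) ** 2 for n in user_counts)
```

With made-up inputs, a single server holding every user gives `hhi([100])` = 10000, while four equally sized servers give `hhi([25, 25, 25, 25])` = 2500.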
The Shannon Index is an entropy-based measure used in ecological studies. Because it is logarithmic, it is more responsive than HHI to changes in the "smaller players". It is computed as Shannon entropy using the natural log: the negative sum over all servers of the "market share" (as a fraction between 0 and 1) times the log of the market share. Lower values indicate lower entropy (a high concentration of one species), while higher values indicate a more even population. In this context, the maximum value is the natural log of the number of servers, reached when users are spread perfectly evenly across them.
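The same kind of sketch works for the Shannon Index. Again, this is an illustration under stated assumptions rather than this site's implementation: `shannon_index` and `user_counts` are names chosen for the example, and servers with zero users are skipped because they contribute nothing to the sum.

```python
import math


def shannon_index(user_counts):
    """Shannon entropy (natural log) of the per-server user shares.

    user_counts: hypothetical list of active-user counts, one per server.
    """
    total = sum(user_counts)
    if total <= 0:
        raise ValueError("need at least one user to compute market shares")
    # Shares are fractions in (0, 1]; empty servers are skipped because
    # p * log(p) tends to 0 as p tends to 0.
    shares = [n / total for n in user_counts if n > 0]
    return -sum(p * math.log(p) for p in shares)
```

For four equally sized servers the result equals `math.log(4)`, the maximum for four servers; for a single server holding everyone it is 0.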
For the social networks, this site measures the concentration of user data for active users: in the Fediverse, this data is on servers (also known as instances); in the Atmosphere, it is on the PDSes that host users' data repos. All PDSes run by the company Bluesky Social PBC are aggregated in this dataset, since they are under the control of a single entity. Similarly, mastodon.social and mastodon.online are combined as they are run by the same company.
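The aggregation step described above could look something like the following. This is only a sketch: the `aliases` mapping, the function name, and all user counts are hypothetical, and the site's actual grouping logic may differ.

```python
from collections import Counter


def aggregate_by_operator(counts, aliases):
    """Merge per-host user counts for hosts run by the same entity.

    counts:  hypothetical dict mapping hostname -> active-user count.
    aliases: hypothetical dict mapping a hostname to the name under
             which its users should be counted.
    """
    merged = Counter()
    for host, users in counts.items():
        # Hosts without an alias are counted under their own name.
        merged[aliases.get(host, host)] += users
    return dict(merged)


# Illustrative use with made-up numbers: the two servers named in the text
# are combined, and every other host is left alone.
example = aggregate_by_operator(
    {"mastodon.social": 10, "mastodon.online": 5, "example.org": 3},
    {"mastodon.online": "mastodon.social"},
)
```

The merged counts are what a concentration measure would then be computed over, so that one operator running several servers is treated as a single "firm".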
For the public git forges, this site uses the number of "origins" of type "git" archived by Software Heritage; this is roughly equivalent to the number of git repositories they crawl. Software Heritage's archive coverage can be found on their coverage page.
Code and data are available on Codeberg. Comments and pull requests, including other metrics for measuring distribution and resiliency, are welcome!
By Rob Ricci: @ricci@discuss.systems / @ricci.io