This page measures the concentration of user data on the Fediverse and the Atmosphere according to the Herfindahl–Hirschman Index (HHI) and the Shannon Index.
HHI is an indicator from economics used to measure competition between firms in an industry. Mathematically, HHI is the sum of the squares of the market shares of all servers, with each share expressed as a percentage. Values close to zero indicate perfectly competitive markets (e.g. many servers, with users spread evenly), while values close to 10000 indicate highly concentrated monopolies (e.g. most users on a single server). In economics, values below 100 are considered "Highly Competitive", values below 1500 "Unconcentrated", and values above 2500 "Highly Concentrated".
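As an illustration of the HHI definition, here is a minimal Python sketch. The function name `hhi` and the sample user counts are assumptions made for this example; they are not taken from the site's code or data.

```python
def hhi(user_counts):
    """Herfindahl-Hirschman Index on the 0..10000 scale.

    user_counts: iterable of non-negative user counts, one per server.
    Each share is expressed as a percentage before squaring.
    """
    counts = list(user_counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one user to compute market shares")
    return sum((100.0 * n / total) ** 2 for n in counts)


# Hypothetical user counts, for illustration only.
print(hhi([100]))         # one server holds everyone: 10000.0
print(hhi([50, 30, 20]))  # 50^2 + 30^2 + 20^2 = 3800.0
print(hhi([1] * 1000))    # 1000 equal servers: close to 10
```

With these made-up numbers, the three-server case lands well above the 2500 threshold, while a thousand equally sized servers fall below 100.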
The Shannon Index is an entropy-based measure used in ecological studies. It is computed the same way as Shannon entropy using the natural log: the negative sum over all servers of the "market share" times the log of the market share. Lower values indicate lower entropy (a high concentration of one species), while higher values indicate a more even population. In this context, the maximum value is the natural log of the number of servers, which is reached when all servers have equal population.
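The Shannon Index calculation can be sketched in Python as follows. The function name `shannon_index` and the sample counts are assumptions made for this example, not the site's actual code or data. Servers with zero users are skipped, following the usual convention that 0 times log 0 is treated as 0.

```python
import math


def shannon_index(user_counts):
    """Shannon Index (natural log) over per-server user counts."""
    counts = [n for n in user_counts if n > 0]  # 0 * log(0) is treated as 0
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one user to compute market shares")
    shares = [n / total for n in counts]
    return -sum(p * math.log(p) for p in shares)


# Hypothetical user counts, for illustration only.
single = shannon_index([100])            # everyone on one server: zero entropy
uneven = shannon_index([50, 30, 20])     # about 1.03
even = shannon_index([25, 25, 25, 25])   # equals math.log(4), about 1.386
print(single, uneven, even)
```

For N equally sized servers the result is ln N, so the value grows slowly as servers are added: doubling the number of equal servers adds only ln 2 (about 0.69).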
This site currently measures the concentration of user data for active users: in the Fediverse, this data is on servers (also known as instances); in the Atmosphere, it is on the PDSes that host users' data repos. All PDSes run by the company Bluesky Social PBC are aggregated in this dataset, since they are under the control of a single entity. Similarly, mastodon.social and mastodon.online are combined as they are run by the same company.
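The aggregation step might look like the following Python sketch, which merges per-host counts under a single operator label before any index is computed. The function name, the operator labels, the PDS hostnames, and all user counts are hypothetical and chosen only for illustration; the site's real mapping and data may differ.

```python
from collections import defaultdict


def aggregate_by_operator(counts, operator_of):
    """Sum user counts for hosts that are run by the same entity.

    counts: dict mapping hostname -> active user count.
    operator_of: dict mapping hostname -> operator label; hosts that are
    not listed are treated as independently operated.
    """
    totals = defaultdict(int)
    for host, n in counts.items():
        totals[operator_of.get(host, host)] += n
    return dict(totals)


# Hypothetical mapping and counts, for illustration only.
operator_of = {
    "mastodon.social": "mastodon.social + mastodon.online",
    "mastodon.online": "mastodon.social + mastodon.online",
    "pds-a.example": "Bluesky Social PBC",  # placeholder PDS hostnames
    "pds-b.example": "Bluesky Social PBC",
}
counts = {
    "mastodon.social": 300,
    "mastodon.online": 50,
    "discuss.systems": 10,
    "pds-a.example": 700,
    "pds-b.example": 600,
}
print(aggregate_by_operator(counts, operator_of))
```

Aggregating first matters because both indices depend on each entity's share: counting co-operated hosts separately would make the network look more evenly distributed than it is.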
The location of user data is not the only interesting measure of centralization. On a technical level, there is the network structure (peer to peer, relays, etc.), identity management, and the infrastructure on which the network is hosted. On a legal level, there are issues regarding the jurisdictions where servers and companies are located. On a social level, there are issues around where human power is concentrated in and on the platform, and whether that power is disproportionately held by certain groups. If you would like to contribute other measures of decentralization, get in touch.
Code and data are available on GitHub. Comments and pull requests, including other metrics for measuring distribution and resiliency, are welcome!
By Rob Ricci: @ricci@discuss.systems / @ricci.io