Mapping languages ultimately depends on years, even decades, of painstaking ethnographic and linguistic study and on the cooperation of many small communities across the world.
SIL (Summer Institute of Linguistics, Inc.) has investigated over 2,590 languages spoken by over 1.7 billion people in nearly 100 countries. SIL makes its data and publications available via Ethnologue: Languages of the World. Other organizations, such as the United Nations Educational, Scientific and Cultural Organization (UNESCO), track language statistics gathered by census and other means.
The sites listed below present language data in a geospatial context. For the most part, these sites aim toward one of several purposes:
- advancing academic understanding
- promoting the awareness and preservation of cultural diversity
- Christian evangelism
Some sites present data to answer specific questions:
- Where is a language spoken?
- What languages are spoken in a given region?
- For which languages has evangelical contact been made?
For simple questions, a search interface may suffice. A few of the sites below provide exploratory interfaces in which the user can examine multivariate data through user-selected layers and other map features.
The language mapping sites listed below are probably not a comprehensive set. The first group consists of focal data providers that also publish language maps. Next are sites that combine various kinds of information from multiple data providers to enable some form of data exploration. Finally, I list a few data providers that have no native mapping capability but clearly belong in this space.
Focal language mapping data providers
UNESCO Language Atlas
“There is no perfect way to reflect the complexities of languages and their communities on a map. The print edition of the Atlas seeks to provide global coverage, dividing the world somewhat arbitrarily into regions; those with the greatest linguistic diversity are presented at smaller scale than those with less diversity. For the online edition, users determine the zoom level themselves, allowing a panoramic view or a very detailed one. No attempt is made to show population density or the area in which a language is spoken; we have instead selected a central point for each language.”
Essentially, this site is a Google Maps search interface for endangered languages. It uses pushpins to mark a central point for the region where each language is spoken. Detailed information for each language is available by clicking the associated pushpin.
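The Atlas does not say how its central points were chosen. As a minimal sketch of one possible approach, assuming a language's extent is available as a simple polygon of (longitude, latitude) pairs, the area-weighted centroid could be computed and wrapped as a GeoJSON point for use as a pushpin. The function names and the sample polygon are hypothetical, and treating degrees as planar coordinates is only a rough approximation that degrades for large or high-latitude regions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Coordinates are treated as planar, which is an approximation.
    """
    # Close the ring if the first vertex is not repeated at the end.
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    return (cx / (3 * area2), cy / (3 * area2))


def pushpin_feature(name: str, ring: List[Point]) -> Dict:
    """Build a GeoJSON Point feature located at the polygon centroid."""
    lon, lat = polygon_centroid(ring)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


# Hypothetical rectangular extent, used only for illustration.
example = pushpin_feature("Language A", [(0, 0), (4, 0), (4, 2), (0, 2)])
```

A centroid can fall outside a concave or multi-part region, which is one reason a hand-selected central point may be preferable in practice.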
The World Atlas of Language Structures Online (WALS) is a… “large database of structural (phonological, grammatical, lexical) properties of languages gathered from descriptive materials (such as reference grammars) by a team of more than 40 authors (many of them the leading authorities on the subject).
WALS consists of 141 maps with accompanying texts on diverse features (such as vowel inventory size, noun-genitive order, passive constructions, and “hand”/“arm” polysemy), each of which is the responsibility of a single author (or team of authors). Each map shows between 120 and 1370 languages, each language being represented by a symbol, and different symbols showing different values of the feature. Altogether 2,650 languages are shown on the maps, and more than 58,000 datapoints give information on features in particular languages.”
Wals.info also provides a Google Maps interface to language data. The user can display either language “location” via pushpins or particular linguistic features (e.g., consonant inventories) via colored pushpins. Only one feature can be viewed at a time (univariate data), and unfortunately not all languages are represented.
The purpose of the Ethnologue is to provide a comprehensive listing of the known living languages of the world. “The Ethnologue is intended more as a catalog than as an encyclopedia and so provides summary data rather than more extensive descriptions of identified languages. Information comes from numerous sources and is confirmed by consulting both reliable published sources and a network of field correspondents. Much of the focus of Ethnologue is on the less commonly known languages. Greater detail and depth of description of many of the languages, especially the larger, more commonly studied languages, can be found in other works such as the International Encyclopedia of Linguistics (Frawley 2003), The World’s Major Languages (Comrie 1987), and The Atlas of Languages (Comrie, Matthews, and Polinsky 1997).”
Ethnologue has no interactive mapping; it displays static language maps on pages separate from the content about those languages. However, in conjunction with Ethnologue, Global Mapping International’s (GMI) World Language Mapping System makes available Geographic Information System (GIS) data that maps language locations as both points and polygons and includes attribute information from Ethnologue. “The World Language Mapping System (WLMS) is the result of over 20 years of collaborative work between GMI and the SIL International (SIL), to map the over 6,800 languages described in SIL’s 16th edition Ethnologue.”
Here is an example of such a map, as posted on the Ethnologue website.
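Polygon data with attached attributes lends itself to the question raised earlier: what languages are spoken at a given location? The following is a minimal sketch of such a lookup using the standard ray-casting point-in-polygon test. The record layout, language names, and coordinates are hypothetical placeholders and do not reflect the actual WLMS schema or data.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(x: float, y: float, ring: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray running toward +x."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Consider only edges that straddle the horizontal line through y.
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def languages_at(lon: float, lat: float, records: List[Dict]) -> List[str]:
    """Return the names of all languages whose polygon contains the point."""
    return [
        rec["name"]
        for rec in records
        if point_in_polygon(lon, lat, rec["polygon"])
    ]


# Hypothetical records: two overlapping language areas.
records = [
    {"name": "Language A", "polygon": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"name": "Language B", "polygon": [(3, 3), (6, 3), (6, 6), (3, 6)]},
]

both = languages_at(3.5, 3.5, records)   # inside the overlap
only_a = languages_at(1.0, 1.0, records)
none = languages_at(10.0, 10.0, records)
```

Allowing a point to match several polygons matters here, since language areas routinely overlap. A production system would more likely rely on a GIS library or spatial database than on a hand-written test.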
Sites that consolidate and present language data across multiple data sources
LL-MAP “…is a project designed to integrate language information with data from the physical and social sciences by means of a Geographical Information System (GIS).” Data sources include genetic relationships between languages, topography, political boundaries, demographics, climate, vegetation, and wildlife, “…thus providing a basis upon which to build hypotheses about language movement across territory. Some cultural information, e.g., on religion, ethnicity, and economics, will also be included.”
Recently, an Australian researcher, Quentin D. Atkinson, published a paper presenting evidence for the theory that language originated in Africa. He traced a theoretical migration path corresponding to human migratory paths using data from WALS (number of phonemes), Ethnologue (population data), and GMI (geographic extents). Perhaps LL-MAP will combine data sources in ways that give rise to hypotheses about other, more modern migration paths.
LL-MAP is an interactive viewer that uses the OpenLayers and Ext JS libraries for dynamic presentation. The viewer presents data layers in a panel to the left of the map. Users navigate a tree structure in that panel to choose any number of layers to display. To select a layer, the user drags it onto the map. Transparency can be set by right-clicking in the “active layers” pane. In addition, LL-MAP can harvest and display data from any WMS (Web Map Service) compliant server.
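To give a sense of what WMS compliance involves, here is a minimal sketch of how a client might build a WMS 1.1.1 GetMap request URL. The query parameters are those defined by the WMS standard, but the server address and layer names are hypothetical, and this is not a description of how LL-MAP itself issues requests.

```python
from typing import Sequence
from urllib.parse import urlencode


def wms_getmap_url(
    base_url: str,
    layers: Sequence[str],
    bbox: Sequence[float],  # (min_lon, min_lat, max_lon, max_lat)
    width: int = 800,
    height: int = 400,
) -> str:
    """Build a WMS 1.1.1 GetMap URL for a transparent PNG in EPSG:4326."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # empty value requests each layer's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",  # lets the image overlay a base map
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer names, for illustration only.
url = wms_getmap_url(
    "https://example.org/wms",
    layers=["languages", "countries"],
    bbox=(-20, -40, 60, 40),
)
```

Because every compliant server answers the same request shape, a viewer can overlay layers from unrelated providers simply by pointing the same request at different base URLs.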
One of the more sophisticated mapping applications listed in this post, LL-MAP intends to link data to graphs of language trees generated by the MultiTree project. MultiTree is an NSF-funded project conducted by the Institute for Language Information and Technology (The LINGUIST List) at Eastern Michigan University. Relating the genetic relationships among languages to their geographic dispersion invites linked visualizations of the kind LL-MAP proposes.
Joshua Project is a Christian research initiative highlighting ethnic groups around the world. The site includes data from the World Christian Database, International Mission Board, and Ethnologue.
Joshua Project uses interactive Flash maps from AmMap to provide limited geospatial context. Most information is provided in separate tabbed panes below the map. A great deal of information is available from this site, including audio clips, pie charts displaying statistics about religions, ministries and contact, etc. The maps provide only a simple geospatial context, showing country boundaries alone; there are no markers, and the map is not linked to the associated text.
“The World Missions Atlas Project contains various forms of information including maps, tabular data sets, and written descriptions. The information is helpful in assessing the current status of Missions progress throughout the world. It is a constantly expanding site that seeks to produce a strategically significant World Missions Atlas.”
The WorldMap viewer is built on ESRI ArcGIS and Adobe Flex. It affords a great deal of interactivity through semi-transparent layers, search, drawing tools, etc. Language mapping is provided by a GMI data layer. The map has a very polished design and accommodates both multivariate data exploration and search.
Sites offering related data, but without custom mapping
The World Christian Database (WCD) includes “detailed information on 9,000 Christian denominations and on religions in every country of the world. Extensive data are available on 232 countries and 13,000 ethnolinguistic peoples, as well as on 5,000 cities and 3,000 provinces.”
The International Mission Board offers downloadable data (spreadsheets) consolidating information from several sources regarding peoples, languages, and the status of evangelism.
Design considerations
Language maps are largely produced for either reference (answering very specific questions) or data exploration. In both cases, frequently used contextual features include population, related languages, geospatial extent, geophysical features, ethnicity, and religion. Cartographers often make explicit choices about which features are presented. On more dynamic sites, however, maps tend to be more exploratory, and users are presented with an array of data sources to choose from. In either case, most rely on the same set of original data providers, such as Ethnologue, GMI, and official government statistics (surveys and censuses).
Language mapping sites range from static map images produced with traditional desktop GIS, to pushpin layers on web maps, to multivariate data representations built with OpenLayers and JavaScript or with Flash/Flex-based rich interactive visualizations. Generally, it appears that map content is designed separately from the textual narrative: maps are explored independently of other text on the site.
In future posts, I will examine how this paradigm might be shifted toward content-driven maps that are coordinated with, and embedded in, page content. Though not discussed here, Wikipedia contains language maps at varying levels of detail. In contrast to the sites listed above, a language map in Wikipedia provides specific geospatial context for the article in which it is embedded. These maps vary greatly in style, detail, and quality. This is the sort of content that might benefit from more automated map generation. Ethnologue, which currently makes a clear separation between textual content and maps, might also benefit.














