core.util package¶
Subpackages¶
- core.util.webpub_manifest_parser package
- Subpackages
- core.util.webpub_manifest_parser.core package
- Submodules
- core.util.webpub_manifest_parser.core.ast module
- core.util.webpub_manifest_parser.core.errors module
- core.util.webpub_manifest_parser.core.parsers module
- core.util.webpub_manifest_parser.core.properties module
- core.util.webpub_manifest_parser.core.registry module
- core.util.webpub_manifest_parser.core.semantic module
- core.util.webpub_manifest_parser.core.syntax module
- Module contents
- core.util.webpub_manifest_parser.epub package
- core.util.webpub_manifest_parser.opds2 package
- core.util.webpub_manifest_parser.rwpm package
- core.util.webpub_manifest_parser.core package
- Submodules
- core.util.webpub_manifest_parser.errors module
- core.util.webpub_manifest_parser.utils module
- Module contents
- Subpackages
Submodules¶
core.util.accept_language module¶
A package to parse Accept-Language headers.
This is based on accept_language.py. Here is the original licensing information for accept_language.py:
Copyright [2017] [Chatbot Developers]
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- class core.util.accept_language.Lang(language, locale, quality)¶
Bases:
tuple
- language¶
Alias for field number 0
- locale¶
Alias for field number 1
- quality¶
Alias for field number 2
- core.util.accept_language.parse_accept_language(accept_language_str, default_quality=None)[source]¶
Parse an RFC 2616 Accept-Language string. See https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14
- Parameters:
accept_language_str (str) – A string in RFC 2616 format.
- Returns:
List of Lang namedtuples.
- Return type:
list
- Example:
>>> parse_accept_language('en-US,el;q=0.8')
[
    Lang(locale='en_US', language='en', quality=1.0),
    Lang(locale=None, language='el', quality=0.8),
]
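The parsing described for parse_accept_language can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name parse_accept_language_sketch, the default quality of 1.0, and the sort by descending quality are all assumptions made for the example. Only the Lang field order (language, locale, quality) is taken from the class documented above.

```python
from collections import namedtuple

# Field order mirrors the Lang class documented above.
Lang = namedtuple("Lang", ("language", "locale", "quality"))


def parse_accept_language_sketch(header, default_quality=1.0):
    """Hypothetical parser for an Accept-Language header value."""
    if not header:
        return []
    parsed = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        # Split "el;q=0.8" into the language tag and its parameter.
        tag, _, params = item.partition(";")
        tag = tag.strip()
        params = params.strip()
        quality = default_quality
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                # Skip entries whose quality value is not a number.
                continue
        # Split "en-US" into a language and an optional region.
        language, _, region = tag.partition("-")
        language = language.lower()
        locale = f"{language}_{region.upper()}" if region else None
        parsed.append(Lang(language, locale, quality))
    # Highest quality first; sorted() is stable, so ties keep header order.
    return sorted(parsed, key=lambda lang: lang.quality, reverse=True)
```

Because Lang is a namedtuple, callers can read results either by position or by field name, for example `result[0].locale`.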
core.util.authentication_for_opds module¶
- class core.util.authentication_for_opds.AuthenticationForOPDSDocument(id=None, title=None, authentication_flows=[], links=[])[source]¶
Bases:
object
A data structure that can become an Authentication For OPDS document.
- LINK_RELATION = 'http://opds-spec.org/auth/document'¶
- MEDIA_TYPE = 'application/vnd.opds.authentication.v1.0+json'¶
core.util.datetime_helpers module¶
- core.util.datetime_helpers.datetime_utc(*args, **kwargs)[source]¶
Return a datetime object, but with UTC information from pytz.
- Returns:
datetime object
- core.util.datetime_helpers.from_timestamp(ts)[source]¶
Return a UTC datetime object from a timestamp.
- Returns:
datetime object
- core.util.datetime_helpers.strptime_utc(date_string, format)[source]¶
Parse a string that describes a time but includes no timezone, into a timezone-aware datetime object set to UTC.
- Raises:
ValueError – If format expects timezone information to be present in date_string.
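A minimal sketch of the behaviour described for strptime_utc and from_timestamp is shown below. It uses the standard library's datetime.timezone.utc rather than pytz, and the function names carry a _sketch suffix to make clear they are illustrations, not the module's code. Rejecting formats that contain %z or %Z is an assumption about how "format expects timezone information" is detected.

```python
import datetime


def strptime_utc_sketch(date_string, fmt):
    """Parse a timezone-less string and mark the result as UTC."""
    # Assumption: a format is "timezone-aware" if it uses %z or %Z.
    if "%z" in fmt or "%Z" in fmt:
        raise ValueError(
            f"Format {fmt!r} expects timezone information in the input"
        )
    naive = datetime.datetime.strptime(date_string, fmt)
    # Attach UTC without shifting the clock time.
    return naive.replace(tzinfo=datetime.timezone.utc)


def from_timestamp_sketch(ts):
    """Convert a POSIX timestamp to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
```

Attaching the timezone with replace() keeps the parsed wall-clock value unchanged, which matches the stated contract that the input string itself carries no timezone.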
core.util.epub module¶
- class core.util.epub.EpubAccessor[source]¶
Bases:
object
- CONTAINER_FILE = 'META-INF/container.xml'¶
- IDPF_NAMESPACE = 'http://www.idpf.org/2007/opf'¶
- classmethod get_element_from_package(zip_file, package_document_path, element_tag)[source]¶
Pulls one or more elements from the package_document
- classmethod get_elements_from_package(zip_file, package_document_path, element_tags)[source]¶
Pulls one or more elements from the package_document
- classmethod open_epub(url, content=None)[source]¶
Cracks open an EPUB to expose its contents
- Parameters:
url – A URL representing the EPUB; used only in error messages and when the content parameter is absent
content – A string representing the compressed EPUB
- Returns:
A tuple containing a ZipFile of the EPUB and the path to its package
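The steps behind open_epub can be illustrated with the standard library alone. The sketch below is an assumption about the general technique (read META-INF/container.xml from the archive and take the rootfile's full-path attribute), not a copy of EpubAccessor. The names open_epub_sketch and build_minimal_epub are hypothetical, and the container namespace is the one defined by the OCF specification.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_FILE = "META-INF/container.xml"
# Namespace used by container.xml in the OCF specification.
CONTAINER_NAMESPACE = "urn:oasis:names:tc:opendocument:xmlns:container"


def open_epub_sketch(content):
    """Return (ZipFile, package document path) for EPUB bytes."""
    zip_file = zipfile.ZipFile(io.BytesIO(content))
    container = ET.fromstring(zip_file.read(CONTAINER_FILE))
    rootfile = container.find(f".//{{{CONTAINER_NAMESPACE}}}rootfile")
    if rootfile is None:
        raise ValueError("container.xml does not declare a rootfile")
    return zip_file, rootfile.get("full-path")


def build_minimal_epub():
    """Build a tiny in-memory archive for trying the sketch out."""
    container_xml = (
        '<container version="1.0" '
        f'xmlns="{CONTAINER_NAMESPACE}">'
        "<rootfiles>"
        '<rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/>'
        "</rootfiles>"
        "</container>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(CONTAINER_FILE, container_xml)
        archive.writestr(
            "OEBPS/content.opf",
            '<package xmlns="http://www.idpf.org/2007/opf"/>',
        )
    return buffer.getvalue()
```

Calling `open_epub_sketch(build_minimal_epub())` yields the open archive together with the path of the package document, which is the shape of the tuple described above.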
core.util.flask_util module¶
Utilities for Flask applications.
- class core.util.flask_util.OPDSEntryResponse(response=None, **kwargs)[source]¶
Bases:
Response
A convenience specialization of Response for typical OPDS entries.
- class core.util.flask_util.OPDSFeedResponse(response=None, status=None, headers=None, mimetype=None, content_type=None, direct_passthrough=False, max_age=None, private=None)[source]¶
Bases:
Response
A convenience specialization of Response for typical OPDS feeds.
- class core.util.flask_util.Response(response=None, status=None, headers=None, mimetype=None, content_type=None, direct_passthrough=False, max_age=0, private=None)[source]¶
Bases:
Response
A Flask Response object with some conveniences added.
The conveniences:
It’s easy to calculate header values such as Cache-Control.
A response can be easily converted into a string for use in tests.
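As an illustration of calculating a Cache-Control value from the max_age and private constructor arguments, here is a small sketch. The policy it encodes (no caching when max_age is zero or invalid, otherwise public or private with a max-age directive) is an assumption chosen for the example, and cache_control_sketch is a hypothetical helper, not a method of this class.

```python
def cache_control_sketch(max_age=0, private=None):
    """Build a Cache-Control header value under an assumed policy."""
    try:
        max_age = int(max_age)
    except (TypeError, ValueError):
        # Treat unusable values as "do not cache".
        max_age = 0
    if max_age <= 0:
        return "private, no-cache"
    # Assumption: responses are public unless explicitly marked private.
    visibility = "private" if private else "public"
    return f"{visibility}, no-transform, max-age={max_age}"
```

For instance, `cache_control_sketch(3600)` produces a public directive with a one-hour lifetime, while the default arguments disable caching.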
core.util.http module¶
- exception core.util.http.BadResponseException(url_or_service, message, debug_message=None, status_code=None)[source]¶
Bases:
RemoteIntegrationException
The request seemingly went okay, but we got a bad response.
- BAD_STATUS_CODE_MESSAGE = 'Got status code %s from external server, cannot continue.'¶
- classmethod bad_status_code(url, response)[source]¶
The response is bad because the status code is wrong.
- detail = l'The server made a request to %(service)s, and got an unexpected or invalid response.'¶
- classmethod from_response(url, message, response)[source]¶
Helper method to turn a requests Response object into a BadResponseException.
- internal_message = 'Bad response from %s: %s'¶
- title = l'Bad response'¶
- class core.util.http.HTTP[source]¶
Bases:
object
A helper for the requests module.
- classmethod debuggable_get(url, **kwargs)[source]¶
Make a GET request that returns a detailed problem detail document on error.
- classmethod debuggable_post(url, payload, **kwargs)[source]¶
Make a POST request that returns a detailed problem detail document on error.
- classmethod debuggable_request(http_method, url, make_request_with=None, **kwargs)[source]¶
Make a request that returns a detailed problem detail document on error, rather than a generic “an integration error occurred” message.
- Parameters:
http_method – HTTP method to use when making the request.
url – Make the request to this URL.
make_request_with – A function that actually makes the HTTP request.
kwargs – Keyword arguments for the make_request_with function.
- classmethod get_with_timeout(url, *args, **kwargs)[source]¶
Make a GET request with timeout handling.
- classmethod post_with_timeout(url, payload, *args, **kwargs)[source]¶
Make a POST request with timeout handling.
- classmethod process_debuggable_response(url, response, disallowed_response_codes=None, allowed_response_codes=None, expected_encoding='utf-8')[source]¶
If there was a problem with an integration request, return an appropriate ProblemDetail. Otherwise, return the response to the original request.
- Parameters:
response – A Response object from the requests library.
expected_encoding – Typically we expect HTTP responses to be UTF-8 encoded, but for certain requests we can change the encoding type.
- classmethod put_with_timeout(url, payload, *args, **kwargs)[source]¶
Make a PUT request with timeout handling.
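The allowed_response_codes and disallowed_response_codes parameters of process_debuggable_response suggest a status-code filter. The sketch below shows one plausible way such a filter could work. Everything except BAD_STATUS_CODE_MESSAGE (taken from BadResponseException above) is an assumption: the helper names, the "5xx"-style series notation, the precedence of allowed over disallowed codes, and the rule that server errors are rejected by default.

```python
BAD_STATUS_CODE_MESSAGE = (
    "Got status code %s from external server, cannot continue."
)


def status_series(status_code):
    """Map 404 to '4xx', 503 to '5xx', and so on."""
    return f"{int(status_code) // 100}xx"


def check_status_sketch(
    status_code, allowed_response_codes=None, disallowed_response_codes=None
):
    """Return None if the status is acceptable, else an error message."""
    allowed = {str(code) for code in (allowed_response_codes or [])}
    disallowed = {str(code) for code in (disallowed_response_codes or [])}
    code = str(status_code)
    series = status_series(status_code)
    # Assumption: an explicit allowance wins over every other rule.
    if code in allowed or series in allowed:
        return None
    if code in disallowed or series in disallowed:
        return BAD_STATUS_CODE_MESSAGE % code
    # Assumption: server errors are rejected unless explicitly allowed.
    if series == "5xx":
        return BAD_STATUS_CODE_MESSAGE % code
    if allowed:
        expected = ", ".join(sorted(allowed))
        return f"Got status code {code}, but can only continue on: {expected}."
    return None
```

A caller could turn a non-None result into a problem detail document or an exception, depending on whether the debuggable or the plain request path is in use.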
- exception core.util.http.IntegrationException(message, debug_message=None)[source]¶
Bases:
Exception
An exception that happens when the site’s connection to a third-party service is broken.
This may be because communication failed (RemoteIntegrationException), or because local configuration is missing or obviously wrong (CannotLoadConfiguration).
- exception core.util.http.RemoteIntegrationException(url_or_service, message, debug_message=None)[source]¶
Bases:
IntegrationException
An exception that happens when we try and fail to communicate with a third-party service over HTTP.
- detail = l'The server tried to access %(service)s but the third-party service experienced an error.'¶
- internal_message = 'Error accessing %s: %s'¶
- title = l'Failure contacting external service'¶
- exception core.util.http.RequestNetworkException(url_or_service, message, debug_message=None)[source]¶
Bases:
RemoteIntegrationException, RequestException
An exception from the requests module that can be represented as a problem detail document.
- detail = l'The server experienced a network error while contacting %(service)s.'¶
- internal_message = 'Network error contacting %s: %s'¶
- title = l'Network failure contacting third-party service'¶
- exception core.util.http.RequestTimedOut(url_or_service, message, debug_message=None)[source]¶
Bases:
RequestNetworkException, Timeout
A timeout exception that can be represented as a problem detail document.
- detail = l'The server made a request to %(service)s, and that request timed out.'¶
- internal_message = 'Timeout accessing %s: %s'¶
- title = l'Timeout'¶
core.util.languages module¶
Data and functions for dealing with language names and codes.
- class core.util.languages.LanguageCodes[source]¶
Bases:
object
Convert between ISO-639-2 and ISO-639-1 language codes.
The data file comes from http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
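The Library of Congress data file uses one pipe-delimited record per line: the three-letter bibliographic code, the three-letter terminologic code, the two-letter code, the English names, and the French names. A minimal sketch of turning such records into lookup tables follows. The function name build_language_tables and the returned tuple are hypothetical, and indexing names by the bibliographic and two-letter codes only is an assumption inferred from the english_names attribute of this class.

```python
SAMPLE_DATA = "\n".join(
    [
        "alb|sqi|sq|Albanian|albanais",
        "eng||en|English|anglais",
        "ace|||Achinese|aceh",
        "cat||ca|Catalan; Valencian|catalan; valencien",
    ]
)


def build_language_tables(raw_data):
    """Return (three_to_two, two_to_three, english_names) dictionaries."""
    three_to_two = {}
    two_to_three = {}
    english_names = {}
    for line in raw_data.splitlines():
        if not line.strip():
            continue
        # The terminologic code and French names are not used here.
        alpha_3, _terminologic, alpha_2, names, _french = line.split("|", 4)
        # Multiple English names are separated by semicolons.
        name_list = [name.strip() for name in names.split(";")]
        english_names[alpha_3] = name_list
        if alpha_2:
            three_to_two[alpha_3] = alpha_2
            two_to_three[alpha_2] = alpha_3
            english_names[alpha_2] = name_list
    return three_to_two, two_to_three, english_names
```

Languages without a two-letter code, such as Achinese in the sample, appear only in the name table and are absent from both conversion tables.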
- NATIVE_NAMES_RAW_DATA = [{'code': 'en', 'name': 'English', 'nativeName': 'English'}, {'code': 'fr', 'name': 'French', 'nativeName': 'français'}, {'code': 'de', 'name': 'German', 'nativeName': 'Deutsch'}, {'code': 'el', 'name': 'Greek, Modern', 'nativeName': 'Ελληνικά'}, {'code': 'hu', 'name': 'Hungarian', 'nativeName': 'Magyar'}, {'code': 'it', 'name': 'Italian', 'nativeName': 'Italiano'}, {'code': 'no', 'name': 'Norwegian', 'nativeName': 'Norsk'}, {'code': 'pl', 'name': 'Polish', 'nativeName': 'polski'}, {'code': 'pt', 'name': 'Portuguese', 'nativeName': 'Português'}, {'code': 'ru', 'name': 'Russian', 'nativeName': 'русский'}, {'code': 'es', 'name': 'Spanish, Castilian', 'nativeName': 'español, castellano'}, {'code': 'sv', 'name': 'Swedish', 'nativeName': 'svenska'}]¶
- RAW_DATA = "aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\nace|||Achinese|aceh\nach|||Acoli|acoli\nada|||Adangme|adangme\nady|||Adyghe; Adygei|adyghé\nafa|||Afro-Asiatic languages|afro-asiatiques, langues\nafh|||Afrihili|afrihili\nafr||af|Afrikaans|afrikaans\nain|||Ainu|aïnou\naka||ak|Akan|akan\nakk|||Akkadian|akkadien\nalb|sqi|sq|Albanian|albanais\nale|||Aleut|aléoute\nalg|||Algonquian languages|algonquines, langues\nalt|||Southern Altai|altai du Sud\namh||am|Amharic|amharique\nang|||English, Old (ca.450-1100)|anglo-saxon (ca.450-1100)\nanp|||Angika|angika\napa|||Apache languages|apaches, langues\nara||ar|Arabic|arabe\narc|||Official Aramaic (700-300 BCE); Imperial Aramaic (700-300 BCE)|araméen d'empire (700-300 BCE)\narg||an|Aragonese|aragonais\narm|hye|hy|Armenian|arménien\narn|||Mapudungun; Mapuche|mapudungun; mapuche; mapuce\narp|||Arapaho|arapaho\nart|||Artificial languages|artificielles, langues\narw|||Arawak|arawak\nasm||as|Assamese|assamais\nast|||Asturian; Bable; Leonese; Asturleonese|asturien; bable; léonais; asturoléonais\nath|||Athapascan languages|athapascanes, langues\naus|||Australian languages|australiennes, langues\nava||av|Avaric|avar\nave||ae|Avestan|avestique\nawa|||Awadhi|awadhi\naym||ay|Aymara|aymara\naze||az|Azerbaijani|azéri\nbad|||Banda languages|banda, langues\nbai|||Bamileke languages|bamiléké, langues\nbak||ba|Bashkir|bachkir\nbal|||Baluchi|baloutchi\nbam||bm|Bambara|bambara\nban|||Balinese|balinais\nbaq|eus|eu|Basque|basque\nbas|||Basa|basa\nbat|||Baltic languages|baltes, langues\nbej|||Beja; Bedawiyet|bedja\nbel||be|Belarusian|biélorusse\nbem|||Bemba|bemba\nben||bn|Bengali|bengali\nber|||Berber languages|berbères, langues\nbho|||Bhojpuri|bhojpuri\nbih||bh|Bihari languages|langues biharis\nbik|||Bikol|bikol\nbin|||Bini; Edo|bini; edo\nbis||bi|Bislama|bichlamar\nbla|||Siksika|blackfoot\nbnt|||Bantu (Other)|bantoues, autres langues\nbos||bs|Bosnian|bosniaque\nbra|||Braj|braj\nbre||br|Breton|breton\nbtk|||Batak languages|batak, 
langues\nbua|||Buriat|bouriate\nbug|||Buginese|bugi\nbul||bg|Bulgarian|bulgare\nbur|mya|my|Burmese|birman\nbyn|||Blin; Bilin|blin; bilen\ncad|||Caddo|caddo\ncai|||Central American Indian languages|amérindiennes de L'Amérique centrale, langues\ncar|||Galibi Carib|karib; galibi; carib\ncat||ca|Catalan; Valencian|catalan; valencien\ncau|||Caucasian languages|caucasiennes, langues\nceb|||Cebuano|cebuano\ncel|||Celtic languages|celtiques, langues; celtes, langues\ncha||ch|Chamorro|chamorro\nchb|||Chibcha|chibcha\nche||ce|Chechen|tchétchène\nchg|||Chagatai|djaghataï\nchi|zho|zh|Chinese|chinois\nchk|||Chuukese|chuuk\nchm|||Mari|mari\nchn|||Chinook jargon|chinook, jargon\ncho|||Choctaw|choctaw\nchp|||Chipewyan; Dene Suline|chipewyan\nchr|||Cherokee|cherokee\nchu||cu|Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic|slavon d'église; vieux slave; slavon liturgique; vieux bulgare\nchv||cv|Chuvash|tchouvache\nchy|||Cheyenne|cheyenne\ncmc|||Chamic languages|chames, langues\ncop|||Coptic|copte\ncor||kw|Cornish|cornique\ncos||co|Corsican|corse\ncpe|||Creoles and pidgins, English based|créoles et pidgins basés sur l'anglais\ncpf|||Creoles and pidgins, French-based |créoles et pidgins basés sur le français\ncpp|||Creoles and pidgins, Portuguese-based |créoles et pidgins basés sur le portugais\ncre||cr|Cree|cree\ncrh|||Crimean Tatar; Crimean Turkish|tatar de Crimé\ncrp|||Creoles and pidgins |créoles et pidgins\ncsb|||Kashubian|kachoube\ncus|||Cushitic languages|couchitiques, langues\ncze|ces|cs|Czech|tchèque\ndak|||Dakota|dakota\ndan||da|Danish|danois\ndar|||Dargwa|dargwa\nday|||Land Dayak languages|dayak, langues\ndel|||Delaware|delaware\nden|||Slave (Athapascan)|esclave (athapascan)\ndgr|||Dogrib|dogrib\ndin|||Dinka|dinka\ndiv||dv|Divehi; Dhivehi; Maldivian|maldivien\ndoi|||Dogri|dogri\ndra|||Dravidian languages|dravidiennes, langues\ndsb|||Lower Sorbian|bas-sorabe\ndua|||Duala|douala\ndum|||Dutch, Middle (ca.1050-1350)|néerlandais moyen (ca. 
1050-1350)\ndut|nld|nl|Dutch; Flemish|néerlandais; flamand\ndyu|||Dyula|dioula\ndzo||dz|Dzongkha|dzongkha\nefi|||Efik|efik\negy|||Egyptian (Ancient)|égyptien\neka|||Ekajuk|ekajuk\nelx|||Elamite|élamite\neng||en|English|anglais\nenm|||English, Middle (1100-1500)|anglais moyen (1100-1500)\nepo||eo|Esperanto|espéranto\nest||et|Estonian|estonien\newe||ee|Ewe|éwé\newo|||Ewondo|éwondo\nfan|||Fang|fang\nfao||fo|Faroese|féroïen\nfat|||Fanti|fanti\nfij||fj|Fijian|fidjien\nfil|||Filipino; Pilipino|filipino; pilipino\nfin||fi|Finnish|finnois\nfiu|||Finno-Ugrian languages|finno-ougriennes, langues\nfon|||Fon|fon\nfre|fra|fr|French|français\nfrm|||French, Middle (ca.1400-1600)|français moyen (1400-1600)\nfro|||French, Old (842-ca.1400)|français ancien (842-ca.1400)\nfrr|||Northern Frisian|frison septentrional\nfrs|||Eastern Frisian|frison oriental\nfry||fy|Western Frisian|frison occidental\nful||ff|Fulah|peul\nfur|||Friulian|frioulan\ngaa|||Ga|ga\ngay|||Gayo|gayo\ngba|||Gbaya|gbaya\ngem|||Germanic languages|germaniques, langues\ngeo|kat|ka|Georgian|géorgien\nger|deu|de|German|allemand\ngez|||Geez|guèze\ngil|||Gilbertese|kiribati\ngla||gd|Gaelic; Scottish Gaelic|gaélique; gaélique écossais\ngle||ga|Irish|irlandais\nglg||gl|Galician|galicien\nglv||gv|Manx|manx; mannois\ngmh|||German, Middle High (ca.1050-1500)|allemand, moyen haut (ca. 1050-1500)\ngoh|||German, Old High (ca.750-1050)|allemand, vieux haut (ca. 
750-1050)\ngon|||Gondi|gond\ngor|||Gorontalo|gorontalo\ngot|||Gothic|gothique\ngrb|||Grebo|grebo\ngrc|||Greek, Ancient (to 1453)|grec ancien (jusqu'à 1453)\ngre|ell|el|Greek, Modern (1453-)|grec moderne (après 1453)\ngrn||gn|Guarani|guarani\ngsw|||Swiss German; Alemannic; Alsatian|suisse alémanique; alémanique; alsacien\nguj||gu|Gujarati|goudjrati\ngwi|||Gwich'in|gwich'in\nhai|||Haida|haida\nhat||ht|Haitian; Haitian Creole|haïtien; créole haïtien\nhau||ha|Hausa|haoussa\nhaw|||Hawaiian|hawaïen\nheb||he|Hebrew|hébreu\nher||hz|Herero|herero\nhil|||Hiligaynon|hiligaynon\nhim|||Himachali languages; Western Pahari languages|langues himachalis; langues paharis occidentales\nhin||hi|Hindi|hindi\nhit|||Hittite|hittite\nhmn|||Hmong; Mong|hmong\nhmo||ho|Hiri Motu|hiri motu\nhrv||hr|Croatian|croate\nhsb|||Upper Sorbian|haut-sorabe\nhun||hu|Hungarian|hongrois\nhup|||Hupa|hupa\niba|||Iban|iban\nibo||ig|Igbo|igbo\nice|isl|is|Icelandic|islandais\nido||io|Ido|ido\niii||ii|Sichuan Yi; Nuosu|yi de Sichuan\nijo|||Ijo languages|ijo, langues\niku||iu|Inuktitut|inuktitut\nile||ie|Interlingue; Occidental|interlingue\nilo|||Iloko|ilocano\nina||ia|Interlingua (International Auxiliary Language Association)|interlingua (langue auxiliaire internationale)\ninc|||Indic languages|indo-aryennes, langues\nind||id|Indonesian|indonésien\nine|||Indo-European languages|indo-européennes, langues\ninh|||Ingush|ingouche\nipk||ik|Inupiaq|inupiaq\nira|||Iranian languages|iraniennes, langues\niro|||Iroquoian languages|iroquoises, langues\nita||it|Italian|italien\njav||jv|Javanese|javanais\njbo|||Lojban|lojban\njpn||ja|Japanese|japonais\njpr|||Judeo-Persian|judéo-persan\njrb|||Judeo-Arabic|judéo-arabe\nkaa|||Kara-Kalpak|karakalpak\nkab|||Kabyle|kabyle\nkac|||Kachin; Jingpho|kachin; jingpho\nkal||kl|Kalaallisut; Greenlandic|groenlandais\nkam|||Kamba|kamba\nkan||kn|Kannada|kannada\nkar|||Karen languages|karen, 
langues\nkas||ks|Kashmiri|kashmiri\nkau||kr|Kanuri|kanouri\nkaw|||Kawi|kawi\nkaz||kk|Kazakh|kazakh\nkbd|||Kabardian|kabardien\nkha|||Khasi|khasi\nkhi|||Khoisan languages|khoïsan, langues\nkhm||km|Central Khmer|khmer central\nkho|||Khotanese; Sakan|khotanais; sakan\nkik||ki|Kikuyu; Gikuyu|kikuyu\nkin||rw|Kinyarwanda|rwanda\nkir||ky|Kirghiz; Kyrgyz|kirghiz\nkmb|||Kimbundu|kimbundu\nkok|||Konkani|konkani\nkom||kv|Komi|kom\nkon||kg|Kongo|kongo\nkor||ko|Korean|coréen\nkos|||Kosraean|kosrae\nkpe|||Kpelle|kpellé\nkrc|||Karachay-Balkar|karatchai balkar\nkrl|||Karelian|carélien\nkro|||Kru languages|krou, langues\nkru|||Kurukh|kurukh\nkua||kj|Kuanyama; Kwanyama|kuanyama; kwanyama\nkum|||Kumyk|koumyk\nkur||ku|Kurdish|kurde\nkut|||Kutenai|kutenai\nlad|||Ladino|judéo-espagnol\nlah|||Lahnda|lahnda\nlam|||Lamba|lamba\nlao||lo|Lao|lao\nlat||la|Latin|latin\nlav||lv|Latvian|letton\nlez|||Lezghian|lezghien\nlim||li|Limburgan; Limburger; Limburgish|limbourgeois\nlin||ln|Lingala|lingala\nlit||lt|Lithuanian|lituanien\nlol|||Mongo|mongo\nloz|||Lozi|lozi\nltz||lb|Luxembourgish; Letzeburgesch|luxembourgeois\nlua|||Luba-Lulua|luba-lulua\nlub||lu|Luba-Katanga|luba-katanga\nlug||lg|Ganda|ganda\nlui|||Luiseno|luiseno\nlun|||Lunda|lunda\nluo|||Luo (Kenya and Tanzania)|luo (Kenya et Tanzanie)\nlus|||Lushai|lushai\nmac|mkd|mk|Macedonian|macédonien\nmad|||Madurese|madourais\nmag|||Magahi|magahi\nmah||mh|Marshallese|marshall\nmai|||Maithili|maithili\nmak|||Makasar|makassar\nmal||ml|Malayalam|malayalam\nman|||Mandingo|mandingue\nmao|mri|mi|Maori|maori\nmap|||Austronesian languages|austronésiennes, langues\nmar||mr|Marathi|marathe\nmas|||Masai|massaï\nmay|msa|ms|Malay|malais\nmdf|||Moksha|moksa\nmdr|||Mandar|mandar\nmen|||Mende|mendé\nmga|||Irish, Middle (900-1200)|irlandais moyen (900-1200)\nmic|||Mi'kmaq; Micmac|mi'kmaq; micmac\nmin|||Minangkabau|minangkabau\nmis|||Uncoded languages|langues non codées\nmkh|||Mon-Khmer languages|môn-khmer, 
langues\nmlg||mg|Malagasy|malgache\nmlt||mt|Maltese|maltais\nmnc|||Manchu|mandchou\nmni|||Manipuri|manipuri\nmno|||Manobo languages|manobo, langues\nmoh|||Mohawk|mohawk\nmon||mn|Mongolian|mongol\nmos|||Mossi|moré\nmul|||Multiple languages|multilingue\nmun|||Munda languages|mounda, langues\nmus|||Creek|muskogee\nmwl|||Mirandese|mirandais\nmwr|||Marwari|marvari\nmyn|||Mayan languages|maya, langues\nmyv|||Erzya|erza\nnah|||Nahuatl languages|nahuatl, langues\nnai|||North American Indian languages|nord-amérindiennes, langues\nnap|||Neapolitan|napolitain\nnau||na|Nauru|nauruan\nnav||nv|Navajo; Navaho|navaho\nnbl||nr|Ndebele, South; South Ndebele|ndébélé du Sud\nnde||nd|Ndebele, North; North Ndebele|ndébélé du Nord\nndo||ng|Ndonga|ndonga\nnds|||Low German; Low Saxon; German, Low; Saxon, Low|bas allemand; bas saxon; allemand, bas; saxon, bas\nnep||ne|Nepali|népalais\nnew|||Nepal Bhasa; Newari|nepal bhasa; newari\nnia|||Nias|nias\nnic|||Niger-Kordofanian languages|nigéro-kordofaniennes, langues\nniu|||Niuean|niué\nnno||nn|Norwegian Nynorsk; Nynorsk, Norwegian|norvégien nynorsk; nynorsk, norvégien\nnob||nb|Bokmål, Norwegian; Norwegian Bokmål|norvégien bokmål\nnog|||Nogai|nogaï; nogay\nnon|||Norse, Old|norrois, vieux\nnor||no|Norwegian|norvégien\nnqo|||N'Ko|n'ko\nnso|||Pedi; Sepedi; Northern Sotho|pedi; sepedi; sotho du Nord\nnub|||Nubian languages|nubiennes, langues\nnwc|||Classical Newari; Old Newari; Classical Nepal Bhasa|newari classique\nnya||ny|Chichewa; Chewa; Nyanja|chichewa; chewa; nyanja\nnym|||Nyamwezi|nyamwezi\nnyn|||Nyankole|nyankolé\nnyo|||Nyoro|nyoro\nnzi|||Nzima|nzema\noci||oc|Occitan (post 1500); Provençal|occitan (après 1500); provençal\noji||oj|Ojibwa|ojibwa\nori||or|Oriya|oriya\norm||om|Oromo|galla\nosa|||Osage|osage\noss||os|Ossetian; Ossetic|ossète\nota|||Turkish, Ottoman (1500-1928)|turc ottoman (1500-1928)\noto|||Otomian languages|otomi, langues\npaa|||Papuan languages|papoues, 
langues\npag|||Pangasinan|pangasinan\npal|||Pahlavi|pahlavi\npam|||Pampanga; Kapampangan|pampangan\npan||pa|Panjabi; Punjabi|pendjabi\npap|||Papiamento|papiamento\npau|||Palauan|palau\npeo|||Persian, Old (ca.600-400 B.C.)|perse, vieux (ca. 600-400 av. J.-C.)\nper|fas|fa|Persian|persan\nphi|||Philippine languages|philippines, langues\nphn|||Phoenician|phénicien\npli||pi|Pali|pali\npol||pl|Polish|polonais\npon|||Pohnpeian|pohnpei\npor||pt|Portuguese|portugais\npra|||Prakrit languages|prâkrit, langues\npro|||Provençal, Old (to 1500)|provençal ancien (jusqu'à 1500)\npus||ps|Pushto; Pashto|pachto\nqaa-qtz|||Reserved for local use|réservée à l'usage local\nque||qu|Quechua|quechua\nraj|||Rajasthani|rajasthani\nrap|||Rapanui|rapanui\nrar|||Rarotongan; Cook Islands Maori|rarotonga; maori des îles Cook\nroa|||Romance languages|romanes, langues\nroh||rm|Romansh|romanche\nrom|||Romany|tsigane\nrum|ron|ro|Romanian; Moldavian; Moldovan|roumain; moldave\nrun||rn|Rundi|rundi\nrup|||Aromanian; Arumanian; Macedo-Romanian|aroumain; macédo-roumain\nrus||ru|Russian|russe\nsad|||Sandawe|sandawe\nsag||sg|Sango|sango\nsah|||Yakut|iakoute\nsai|||South American Indian (Other)|indiennes d'Amérique du Sud, autres langues\nsal|||Salishan languages|salishennes, langues\nsam|||Samaritan Aramaic|samaritain\nsan||sa|Sanskrit|sanskrit\nsas|||Sasak|sasak\nsat|||Santali|santal\nscn|||Sicilian|sicilien\nsco|||Scots|écossais\nsel|||Selkup|selkoupe\nsem|||Semitic languages|sémitiques, langues\nsga|||Irish, Old (to 900)|irlandais ancien (jusqu'à 900)\nsgn|||Sign Languages|langues des signes\nshn|||Shan|chan\nsid|||Sidamo|sidamo\nsin||si|Sinhala; Sinhalese|singhalais\nsio|||Siouan languages|sioux, langues\nsit|||Sino-Tibetan languages|sino-tibétaines, langues\nsla|||Slavic languages|slaves, langues\nslo|slk|sk|Slovak|slovaque\nslv||sl|Slovenian|slovène\nsma|||Southern Sami|sami du Sud\nsme||se|Northern Sami|sami du Nord\nsmi|||Sami languages|sames, langues\nsmj|||Lule Sami|sami de Lule\nsmn|||Inari 
Sami|sami d'Inari\nsmo||sm|Samoan|samoan\nsms|||Skolt Sami|sami skolt\nsna||sn|Shona|shona\nsnd||sd|Sindhi|sindhi\nsnk|||Soninke|soninké\nsog|||Sogdian|sogdien\nsom||so|Somali|somali\nson|||Songhai languages|songhai, langues\nsot||st|Sotho, Southern|sotho du Sud\nspa||es|Spanish; Castilian|espagnol; castillan\nsrd||sc|Sardinian|sarde\nsrn|||Sranan Tongo|sranan tongo\nsrp||sr|Serbian|serbe\nsrr|||Serer|sérère\nssa|||Nilo-Saharan languages|nilo-sahariennes, langues\nssw||ss|Swati|swati\nsuk|||Sukuma|sukuma\nsun||su|Sundanese|soundanais\nsus|||Susu|soussou\nsux|||Sumerian|sumérien\nswa||sw|Swahili|swahili\nswe||sv|Swedish|suédois\nsyc|||Classical Syriac|syriaque classique\nsyr|||Syriac|syriaque\ntah||ty|Tahitian|tahitien\ntai|||Tai languages|tai, langues\ntam||ta|Tamil|tamoul\ntat||tt|Tatar|tatar\ntel||te|Telugu|télougou\ntem|||Timne|temne\nter|||Tereno|tereno\ntet|||Tetum|tetum\ntgk||tg|Tajik|tadjik\ntgl||tl|Tagalog|tagalog\ntha||th|Thai|thaï\ntib|bod|bo|Tibetan|tibétain\ntig|||Tigre|tigré\ntir||ti|Tigrinya|tigrigna\ntiv|||Tiv|tiv\ntkl|||Tokelau|tokelau\ntlh|||Klingon; tlhIngan-Hol|klingon\ntli|||Tlingit|tlingit\ntmh|||Tamashek|tamacheq\ntog|||Tonga (Nyasa)|tonga (Nyasa)\nton||to|Tonga (Tonga Islands)|tongan (Îles Tonga)\ntpi|||Tok Pisin|tok pisin\ntsi|||Tsimshian|tsimshian\ntsn||tn|Tswana|tswana\ntso||ts|Tsonga|tsonga\ntuk||tk|Turkmen|turkmène\ntum|||Tumbuka|tumbuka\ntup|||Tupi languages|tupi, langues\ntur||tr|Turkish|turc\ntut|||Altaic languages|altaïques, langues\ntvl|||Tuvalu|tuvalu\ntwi||tw|Twi|twi\ntyv|||Tuvinian|touva\nudm|||Udmurt|oudmourte\nuga|||Ugaritic|ougaritique\nuig||ug|Uighur; Uyghur|ouïgour\nukr||uk|Ukrainian|ukrainien\numb|||Umbundu|umbundu\nund|||Undetermined|indéterminée\nurd||ur|Urdu|ourdou\nuzb||uz|Uzbek|ouszbek\nvai|||Vai|vaï\nven||ve|Venda|venda\nvie||vi|Vietnamese|vietnamien\nvol||vo|Volapük|volapük\nvot|||Votic|vote\nwak|||Wakashan languages|wakashanes, 
langues\nwal|||Walamo|walamo\nwar|||Waray|waray\nwas|||Washo|washo\nwel|cym|cy|Welsh|gallois\nwen|||Sorbian languages|sorabes, langues\nwln||wa|Walloon|wallon\nwol||wo|Wolof|wolof\nxal|||Kalmyk; Oirat|kalmouk; oïrat\nxho||xh|Xhosa|xhosa\nyao|||Yao|yao\nyap|||Yapese|yapois\nyid||yi|Yiddish|yiddish\nyor||yo|Yoruba|yoruba\nypk|||Yupik languages|yupik, langues\nzap|||Zapotec|zapotèque\nzbl|||Blissymbols; Blissymbolics; Bliss|symboles Bliss; Bliss\nzen|||Zenaga|zenaga\nzgh|||Standard Moroccan Tamazight|amazighe standard marocain\nzha||za|Zhuang; Chuang|zhuang; chuang\nznd|||Zande languages|zandé, langues\nzul||zu|Zulu|zoulou\nzun|||Zuni|zuni\nzxx|||No linguistic content; Not applicable|pas de contenu linguistique; non applicable\nzza|||Zaza; Dimili; Dimli; Kirdki; Kirmanjki; Zazaki|zaza; dimili; dimli; kirdki; kirmanjki; zazaki"¶
- alpha_2 = 'sv'¶
- alpha_3 = 'swe'¶
- english_names = {'aa': ['Afar'], 'aar': ['Afar'], 'ab': ['Abkhazian'], 'abk': ['Abkhazian'], 'ace': ['Achinese'], 'ach': ['Acoli'], 'ada': ['Adangme'], 'ady': ['Adyghe', 'Adygei'], 'ae': ['Avestan'], 'af': ['Afrikaans'], 'afa': ['Afro-Asiatic languages'], 'afh': ['Afrihili'], 'afr': ['Afrikaans'], 'ain': ['Ainu'], 'ak': ['Akan'], 'aka': ['Akan'], 'akk': ['Akkadian'], 'alb': ['Albanian'], 'ale': ['Aleut'], 'alg': ['Algonquian languages'], 'alt': ['Southern Altai'], 'am': ['Amharic'], 'amh': ['Amharic'], 'an': ['Aragonese'], 'ang': ['English, Old (ca.450-1100)'], 'anp': ['Angika'], 'apa': ['Apache languages'], 'ar': ['Arabic'], 'ara': ['Arabic'], 'arc': ['Official Aramaic (700-300 BCE)', 'Imperial Aramaic (700-300 BCE)'], 'arg': ['Aragonese'], 'arm': ['Armenian'], 'arn': ['Mapudungun', 'Mapuche'], 'arp': ['Arapaho'], 'art': ['Artificial languages'], 'arw': ['Arawak'], 'as': ['Assamese'], 'asm': ['Assamese'], 'ast': ['Asturian', 'Bable', 'Leonese', 'Asturleonese'], 'ath': ['Athapascan languages'], 'aus': ['Australian languages'], 'av': ['Avaric'], 'ava': ['Avaric'], 'ave': ['Avestan'], 'awa': ['Awadhi'], 'ay': ['Aymara'], 'aym': ['Aymara'], 'az': ['Azerbaijani'], 'aze': ['Azerbaijani'], 'ba': ['Bashkir'], 'bad': ['Banda languages'], 'bai': ['Bamileke languages'], 'bak': ['Bashkir'], 'bal': ['Baluchi'], 'bam': ['Bambara'], 'ban': ['Balinese'], 'baq': ['Basque'], 'bas': ['Basa'], 'bat': ['Baltic languages'], 'be': ['Belarusian'], 'bej': ['Beja', 'Bedawiyet'], 'bel': ['Belarusian'], 'bem': ['Bemba'], 'ben': ['Bengali'], 'ber': ['Berber languages'], 'bg': ['Bulgarian'], 'bh': ['Bihari languages'], 'bho': ['Bhojpuri'], 'bi': ['Bislama'], 'bih': ['Bihari languages'], 'bik': ['Bikol'], 'bin': ['Bini', 'Edo'], 'bis': ['Bislama'], 'bla': ['Siksika'], 'bm': ['Bambara'], 'bn': ['Bengali'], 'bnt': ['Bantu (Other)'], 'bo': ['Tibetan'], 'bos': ['Bosnian'], 'br': ['Breton'], 'bra': ['Braj'], 'bre': ['Breton'], 'bs': ['Bosnian'], 'btk': ['Batak languages'], 'bua': ['Buriat'], 
'bug': ['Buginese'], 'bul': ['Bulgarian'], 'bur': ['Burmese'], 'byn': ['Blin', 'Bilin'], 'ca': ['Catalan', 'Valencian'], 'cad': ['Caddo'], 'cai': ['Central American Indian languages'], 'car': ['Galibi Carib'], 'cat': ['Catalan', 'Valencian'], 'cau': ['Caucasian languages'], 'ce': ['Chechen'], 'ceb': ['Cebuano'], 'cel': ['Celtic languages'], 'ch': ['Chamorro'], 'cha': ['Chamorro'], 'chb': ['Chibcha'], 'che': ['Chechen'], 'chg': ['Chagatai'], 'chi': ['Chinese'], 'chk': ['Chuukese'], 'chm': ['Mari'], 'chn': ['Chinook jargon'], 'cho': ['Choctaw'], 'chp': ['Chipewyan', 'Dene Suline'], 'chr': ['Cherokee'], 'chu': ['Church Slavic', 'Old Slavonic', 'Church Slavonic', 'Old Bulgarian', 'Old Church Slavonic'], 'chv': ['Chuvash'], 'chy': ['Cheyenne'], 'cmc': ['Chamic languages'], 'co': ['Corsican'], 'cop': ['Coptic'], 'cor': ['Cornish'], 'cos': ['Corsican'], 'cpe': ['Creoles and pidgins, English based'], 'cpf': ['Creoles and pidgins, French-based'], 'cpp': ['Creoles and pidgins, Portuguese-based'], 'cr': ['Cree'], 'cre': ['Cree'], 'crh': ['Crimean Tatar', 'Crimean Turkish'], 'crp': ['Creoles and pidgins'], 'cs': ['Czech'], 'csb': ['Kashubian'], 'cu': ['Church Slavic', 'Old Slavonic', 'Church Slavonic', 'Old Bulgarian', 'Old Church Slavonic'], 'cus': ['Cushitic languages'], 'cv': ['Chuvash'], 'cy': ['Welsh'], 'cze': ['Czech'], 'da': ['Danish'], 'dak': ['Dakota'], 'dan': ['Danish'], 'dar': ['Dargwa'], 'day': ['Land Dayak languages'], 'de': ['German'], 'del': ['Delaware'], 'den': ['Slave (Athapascan)'], 'dgr': ['Dogrib'], 'din': ['Dinka'], 'div': ['Divehi', 'Dhivehi', 'Maldivian'], 'doi': ['Dogri'], 'dra': ['Dravidian languages'], 'dsb': ['Lower Sorbian'], 'dua': ['Duala'], 'dum': ['Dutch, Middle (ca.1050-1350)'], 'dut': ['Dutch', 'Flemish'], 'dv': ['Divehi', 'Dhivehi', 'Maldivian'], 'dyu': ['Dyula'], 'dz': ['Dzongkha'], 'dzo': ['Dzongkha'], 'ee': ['Ewe'], 'efi': ['Efik'], 'egy': ['Egyptian (Ancient)'], 'eka': ['Ekajuk'], 'el': ['Greek, Modern (1453-)'], 'elx': ['Elamite'], 'en': 
['English'], 'eng': ['English'], 'enm': ['English, Middle (1100-1500)'], 'eo': ['Esperanto'], 'epo': ['Esperanto'], 'es': ['Spanish', 'Castilian'], 'est': ['Estonian'], 'et': ['Estonian'], 'eu': ['Basque'], 'ewe': ['Ewe'], 'ewo': ['Ewondo'], 'fa': ['Persian'], 'fan': ['Fang'], 'fao': ['Faroese'], 'fat': ['Fanti'], 'ff': ['Fulah'], 'fi': ['Finnish'], 'fij': ['Fijian'], 'fil': ['Filipino', 'Pilipino'], 'fin': ['Finnish'], 'fiu': ['Finno-Ugrian languages'], 'fj': ['Fijian'], 'fo': ['Faroese'], 'fon': ['Fon'], 'fr': ['French'], 'fre': ['French'], 'frm': ['French, Middle (ca.1400-1600)'], 'fro': ['French, Old (842-ca.1400)'], 'frr': ['Northern Frisian'], 'frs': ['Eastern Frisian'], 'fry': ['Western Frisian'], 'ful': ['Fulah'], 'fur': ['Friulian'], 'fy': ['Western Frisian'], 'ga': ['Irish'], 'gaa': ['Ga'], 'gay': ['Gayo'], 'gba': ['Gbaya'], 'gd': ['Gaelic', 'Scottish Gaelic'], 'gem': ['Germanic languages'], 'geo': ['Georgian'], 'ger': ['German'], 'gez': ['Geez'], 'gil': ['Gilbertese'], 'gl': ['Galician'], 'gla': ['Gaelic', 'Scottish Gaelic'], 'gle': ['Irish'], 'glg': ['Galician'], 'glv': ['Manx'], 'gmh': ['German, Middle High (ca.1050-1500)'], 'gn': ['Guarani'], 'goh': ['German, Old High (ca.750-1050)'], 'gon': ['Gondi'], 'gor': ['Gorontalo'], 'got': ['Gothic'], 'grb': ['Grebo'], 'grc': ['Greek, Ancient (to 1453)'], 'gre': ['Greek, Modern (1453-)'], 'grn': ['Guarani'], 'gsw': ['Swiss German', 'Alemannic', 'Alsatian'], 'gu': ['Gujarati'], 'guj': ['Gujarati'], 'gv': ['Manx'], 'gwi': ["Gwich'in"], 'ha': ['Hausa'], 'hai': ['Haida'], 'hat': ['Haitian', 'Haitian Creole'], 'hau': ['Hausa'], 'haw': ['Hawaiian'], 'he': ['Hebrew'], 'heb': ['Hebrew'], 'her': ['Herero'], 'hi': ['Hindi'], 'hil': ['Hiligaynon'], 'him': ['Himachali languages', 'Western Pahari languages'], 'hin': ['Hindi'], 'hit': ['Hittite'], 'hmn': ['Hmong', 'Mong'], 'hmo': ['Hiri Motu'], 'ho': ['Hiri Motu'], 'hr': ['Croatian'], 'hrv': ['Croatian'], 'hsb': ['Upper Sorbian'], 'ht': ['Haitian', 'Haitian Creole'], 'hu': 
['Hungarian'], 'hun': ['Hungarian'], 'hup': ['Hupa'], 'hy': ['Armenian'], 'hz': ['Herero'], 'ia': ['Interlingua (International Auxiliary Language Association)'], 'iba': ['Iban'], 'ibo': ['Igbo'], 'ice': ['Icelandic'], 'id': ['Indonesian'], 'ido': ['Ido'], 'ie': ['Interlingue', 'Occidental'], 'ig': ['Igbo'], 'ii': ['Sichuan Yi', 'Nuosu'], 'iii': ['Sichuan Yi', 'Nuosu'], 'ijo': ['Ijo languages'], 'ik': ['Inupiaq'], 'iku': ['Inuktitut'], 'ile': ['Interlingue', 'Occidental'], 'ilo': ['Iloko'], 'ina': ['Interlingua (International Auxiliary Language Association)'], 'inc': ['Indic languages'], 'ind': ['Indonesian'], 'ine': ['Indo-European languages'], 'inh': ['Ingush'], 'io': ['Ido'], 'ipk': ['Inupiaq'], 'ira': ['Iranian languages'], 'iro': ['Iroquoian languages'], 'is': ['Icelandic'], 'it': ['Italian'], 'ita': ['Italian'], 'iu': ['Inuktitut'], 'ja': ['Japanese'], 'jav': ['Javanese'], 'jbo': ['Lojban'], 'jpn': ['Japanese'], 'jpr': ['Judeo-Persian'], 'jrb': ['Judeo-Arabic'], 'jv': ['Javanese'], 'ka': ['Georgian'], 'kaa': ['Kara-Kalpak'], 'kab': ['Kabyle'], 'kac': ['Kachin', 'Jingpho'], 'kal': ['Kalaallisut', 'Greenlandic'], 'kam': ['Kamba'], 'kan': ['Kannada'], 'kar': ['Karen languages'], 'kas': ['Kashmiri'], 'kau': ['Kanuri'], 'kaw': ['Kawi'], 'kaz': ['Kazakh'], 'kbd': ['Kabardian'], 'kg': ['Kongo'], 'kha': ['Khasi'], 'khi': ['Khoisan languages'], 'khm': ['Central Khmer'], 'kho': ['Khotanese', 'Sakan'], 'ki': ['Kikuyu', 'Gikuyu'], 'kik': ['Kikuyu', 'Gikuyu'], 'kin': ['Kinyarwanda'], 'kir': ['Kirghiz', 'Kyrgyz'], 'kj': ['Kuanyama', 'Kwanyama'], 'kk': ['Kazakh'], 'kl': ['Kalaallisut', 'Greenlandic'], 'km': ['Central Khmer'], 'kmb': ['Kimbundu'], 'kn': ['Kannada'], 'ko': ['Korean'], 'kok': ['Konkani'], 'kom': ['Komi'], 'kon': ['Kongo'], 'kor': ['Korean'], 'kos': ['Kosraean'], 'kpe': ['Kpelle'], 'kr': ['Kanuri'], 'krc': ['Karachay-Balkar'], 'krl': ['Karelian'], 'kro': ['Kru languages'], 'kru': ['Kurukh'], 'ks': ['Kashmiri'], 'ku': ['Kurdish'], 'kua': ['Kuanyama', 'Kwanyama'], 
'kum': ['Kumyk'], 'kur': ['Kurdish'], 'kut': ['Kutenai'], 'kv': ['Komi'], 'kw': ['Cornish'], 'ky': ['Kirghiz', 'Kyrgyz'], 'la': ['Latin'], 'lad': ['Ladino'], 'lah': ['Lahnda'], 'lam': ['Lamba'], 'lao': ['Lao'], 'lat': ['Latin'], 'lav': ['Latvian'], 'lb': ['Luxembourgish', 'Letzeburgesch'], 'lez': ['Lezghian'], 'lg': ['Ganda'], 'li': ['Limburgan', 'Limburger', 'Limburgish'], 'lim': ['Limburgan', 'Limburger', 'Limburgish'], 'lin': ['Lingala'], 'lit': ['Lithuanian'], 'ln': ['Lingala'], 'lo': ['Lao'], 'lol': ['Mongo'], 'loz': ['Lozi'], 'lt': ['Lithuanian'], 'ltz': ['Luxembourgish', 'Letzeburgesch'], 'lu': ['Luba-Katanga'], 'lua': ['Luba-Lulua'], 'lub': ['Luba-Katanga'], 'lug': ['Ganda'], 'lui': ['Luiseno'], 'lun': ['Lunda'], 'luo': ['Luo (Kenya and Tanzania)'], 'lus': ['Lushai'], 'lv': ['Latvian'], 'mac': ['Macedonian'], 'mad': ['Madurese'], 'mag': ['Magahi'], 'mah': ['Marshallese'], 'mai': ['Maithili'], 'mak': ['Makasar'], 'mal': ['Malayalam'], 'man': ['Mandingo'], 'mao': ['Maori'], 'map': ['Austronesian languages'], 'mar': ['Marathi'], 'mas': ['Masai'], 'may': ['Malay'], 'mdf': ['Moksha'], 'mdr': ['Mandar'], 'men': ['Mende'], 'mg': ['Malagasy'], 'mga': ['Irish, Middle (900-1200)'], 'mh': ['Marshallese'], 'mi': ['Maori'], 'mic': ["Mi'kmaq", 'Micmac'], 'min': ['Minangkabau'], 'mis': ['Uncoded languages'], 'mk': ['Macedonian'], 'mkh': ['Mon-Khmer languages'], 'ml': ['Malayalam'], 'mlg': ['Malagasy'], 'mlt': ['Maltese'], 'mn': ['Mongolian'], 'mnc': ['Manchu'], 'mni': ['Manipuri'], 'mno': ['Manobo languages'], 'moh': ['Mohawk'], 'mon': ['Mongolian'], 'mos': ['Mossi'], 'mr': ['Marathi'], 'ms': ['Malay'], 'mt': ['Maltese'], 'mul': ['Multiple languages'], 'mun': ['Munda languages'], 'mus': ['Creek'], 'mwl': ['Mirandese'], 'mwr': ['Marwari'], 'my': ['Burmese'], 'myn': ['Mayan languages'], 'myv': ['Erzya'], 'na': ['Nauru'], 'nah': ['Nahuatl languages'], 'nai': ['North American Indian languages'], 'nap': ['Neapolitan'], 'nau': ['Nauru'], 'nav': ['Navajo', 'Navaho'], 'nb': 
['Bokmål, Norwegian', 'Norwegian Bokmål'], 'nbl': ['Ndebele, South', 'South Ndebele'], 'nd': ['Ndebele, North', 'North Ndebele'], 'nde': ['Ndebele, North', 'North Ndebele'], 'ndo': ['Ndonga'], 'nds': ['Low German', 'Low Saxon', 'German, Low', 'Saxon, Low'], 'ne': ['Nepali'], 'nep': ['Nepali'], 'new': ['Nepal Bhasa', 'Newari'], 'ng': ['Ndonga'], 'nia': ['Nias'], 'nic': ['Niger-Kordofanian languages'], 'niu': ['Niuean'], 'nl': ['Dutch', 'Flemish'], 'nn': ['Norwegian Nynorsk', 'Nynorsk, Norwegian'], 'nno': ['Norwegian Nynorsk', 'Nynorsk, Norwegian'], 'no': ['Norwegian'], 'nob': ['Bokmål, Norwegian', 'Norwegian Bokmål'], 'nog': ['Nogai'], 'non': ['Norse, Old'], 'nor': ['Norwegian'], 'nqo': ["N'Ko"], 'nr': ['Ndebele, South', 'South Ndebele'], 'nso': ['Pedi', 'Sepedi', 'Northern Sotho'], 'nub': ['Nubian languages'], 'nv': ['Navajo', 'Navaho'], 'nwc': ['Classical Newari', 'Old Newari', 'Classical Nepal Bhasa'], 'ny': ['Chichewa', 'Chewa', 'Nyanja'], 'nya': ['Chichewa', 'Chewa', 'Nyanja'], 'nym': ['Nyamwezi'], 'nyn': ['Nyankole'], 'nyo': ['Nyoro'], 'nzi': ['Nzima'], 'oc': ['Occitan (post 1500)', 'Provençal'], 'oci': ['Occitan (post 1500)', 'Provençal'], 'oj': ['Ojibwa'], 'oji': ['Ojibwa'], 'om': ['Oromo'], 'or': ['Oriya'], 'ori': ['Oriya'], 'orm': ['Oromo'], 'os': ['Ossetian', 'Ossetic'], 'osa': ['Osage'], 'oss': ['Ossetian', 'Ossetic'], 'ota': ['Turkish, Ottoman (1500-1928)'], 'oto': ['Otomian languages'], 'pa': ['Panjabi', 'Punjabi'], 'paa': ['Papuan languages'], 'pag': ['Pangasinan'], 'pal': ['Pahlavi'], 'pam': ['Pampanga', 'Kapampangan'], 'pan': ['Panjabi', 'Punjabi'], 'pap': ['Papiamento'], 'pau': ['Palauan'], 'peo': ['Persian, Old (ca.600-400 B.C.)'], 'per': ['Persian'], 'phi': ['Philippine languages'], 'phn': ['Phoenician'], 'pi': ['Pali'], 'pl': ['Polish'], 'pli': ['Pali'], 'pol': ['Polish'], 'pon': ['Pohnpeian'], 'por': ['Portuguese'], 'pra': ['Prakrit languages'], 'pro': ['Provençal, Old (to 1500)'], 'ps': ['Pushto', 'Pashto'], 'pt': ['Portuguese'], 'pus': 
['Pushto', 'Pashto'], 'qaa-qtz': ['Reserved for local use'], 'qu': ['Quechua'], 'que': ['Quechua'], 'raj': ['Rajasthani'], 'rap': ['Rapanui'], 'rar': ['Rarotongan', 'Cook Islands Maori'], 'rm': ['Romansh'], 'rn': ['Rundi'], 'ro': ['Romanian', 'Moldavian', 'Moldovan'], 'roa': ['Romance languages'], 'roh': ['Romansh'], 'rom': ['Romany'], 'ru': ['Russian'], 'rum': ['Romanian', 'Moldavian', 'Moldovan'], 'run': ['Rundi'], 'rup': ['Aromanian', 'Arumanian', 'Macedo-Romanian'], 'rus': ['Russian'], 'rw': ['Kinyarwanda'], 'sa': ['Sanskrit'], 'sad': ['Sandawe'], 'sag': ['Sango'], 'sah': ['Yakut'], 'sai': ['South American Indian (Other)'], 'sal': ['Salishan languages'], 'sam': ['Samaritan Aramaic'], 'san': ['Sanskrit'], 'sas': ['Sasak'], 'sat': ['Santali'], 'sc': ['Sardinian'], 'scn': ['Sicilian'], 'sco': ['Scots'], 'sd': ['Sindhi'], 'se': ['Northern Sami'], 'sel': ['Selkup'], 'sem': ['Semitic languages'], 'sg': ['Sango'], 'sga': ['Irish, Old (to 900)'], 'sgn': ['Sign Languages'], 'shn': ['Shan'], 'si': ['Sinhala', 'Sinhalese'], 'sid': ['Sidamo'], 'sin': ['Sinhala', 'Sinhalese'], 'sio': ['Siouan languages'], 'sit': ['Sino-Tibetan languages'], 'sk': ['Slovak'], 'sl': ['Slovenian'], 'sla': ['Slavic languages'], 'slo': ['Slovak'], 'slv': ['Slovenian'], 'sm': ['Samoan'], 'sma': ['Southern Sami'], 'sme': ['Northern Sami'], 'smi': ['Sami languages'], 'smj': ['Lule Sami'], 'smn': ['Inari Sami'], 'smo': ['Samoan'], 'sms': ['Skolt Sami'], 'sn': ['Shona'], 'sna': ['Shona'], 'snd': ['Sindhi'], 'snk': ['Soninke'], 'so': ['Somali'], 'sog': ['Sogdian'], 'som': ['Somali'], 'son': ['Songhai languages'], 'sot': ['Sotho, Southern'], 'spa': ['Spanish', 'Castilian'], 'sq': ['Albanian'], 'sr': ['Serbian'], 'srd': ['Sardinian'], 'srn': ['Sranan Tongo'], 'srp': ['Serbian'], 'srr': ['Serer'], 'ss': ['Swati'], 'ssa': ['Nilo-Saharan languages'], 'ssw': ['Swati'], 'st': ['Sotho, Southern'], 'su': ['Sundanese'], 'suk': ['Sukuma'], 'sun': ['Sundanese'], 'sus': ['Susu'], 'sux': ['Sumerian'], 'sv': 
['Swedish'], 'sw': ['Swahili'], 'swa': ['Swahili'], 'swe': ['Swedish'], 'syc': ['Classical Syriac'], 'syr': ['Syriac'], 'ta': ['Tamil'], 'tah': ['Tahitian'], 'tai': ['Tai languages'], 'tam': ['Tamil'], 'tat': ['Tatar'], 'te': ['Telugu'], 'tel': ['Telugu'], 'tem': ['Timne'], 'ter': ['Tereno'], 'tet': ['Tetum'], 'tg': ['Tajik'], 'tgk': ['Tajik'], 'tgl': ['Tagalog'], 'th': ['Thai'], 'tha': ['Thai'], 'ti': ['Tigrinya'], 'tib': ['Tibetan'], 'tig': ['Tigre'], 'tir': ['Tigrinya'], 'tiv': ['Tiv'], 'tk': ['Turkmen'], 'tkl': ['Tokelau'], 'tl': ['Tagalog'], 'tlh': ['Klingon', 'tlhIngan-Hol'], 'tli': ['Tlingit'], 'tmh': ['Tamashek'], 'tn': ['Tswana'], 'to': ['Tonga (Tonga Islands)'], 'tog': ['Tonga (Nyasa)'], 'ton': ['Tonga (Tonga Islands)'], 'tpi': ['Tok Pisin'], 'tr': ['Turkish'], 'ts': ['Tsonga'], 'tsi': ['Tsimshian'], 'tsn': ['Tswana'], 'tso': ['Tsonga'], 'tt': ['Tatar'], 'tuk': ['Turkmen'], 'tum': ['Tumbuka'], 'tup': ['Tupi languages'], 'tur': ['Turkish'], 'tut': ['Altaic languages'], 'tvl': ['Tuvalu'], 'tw': ['Twi'], 'twi': ['Twi'], 'ty': ['Tahitian'], 'tyv': ['Tuvinian'], 'udm': ['Udmurt'], 'ug': ['Uighur', 'Uyghur'], 'uga': ['Ugaritic'], 'uig': ['Uighur', 'Uyghur'], 'uk': ['Ukrainian'], 'ukr': ['Ukrainian'], 'umb': ['Umbundu'], 'und': ['Undetermined'], 'ur': ['Urdu'], 'urd': ['Urdu'], 'uz': ['Uzbek'], 'uzb': ['Uzbek'], 'vai': ['Vai'], 've': ['Venda'], 'ven': ['Venda'], 'vi': ['Vietnamese'], 'vie': ['Vietnamese'], 'vo': ['Volapük'], 'vol': ['Volapük'], 'vot': ['Votic'], 'wa': ['Walloon'], 'wak': ['Wakashan languages'], 'wal': ['Walamo'], 'war': ['Waray'], 'was': ['Washo'], 'wel': ['Welsh'], 'wen': ['Sorbian languages'], 'wln': ['Walloon'], 'wo': ['Wolof'], 'wol': ['Wolof'], 'xal': ['Kalmyk', 'Oirat'], 'xh': ['Xhosa'], 'xho': ['Xhosa'], 'yao': ['Yao'], 'yap': ['Yapese'], 'yi': ['Yiddish'], 'yid': ['Yiddish'], 'yo': ['Yoruba'], 'yor': ['Yoruba'], 'ypk': ['Yupik languages'], 'za': ['Zhuang', 'Chuang'], 'zap': ['Zapotec'], 'zbl': ['Blissymbols', 'Blissymbolics', 'Bliss'], 
'zen': ['Zenaga'], 'zgh': ['Standard Moroccan Tamazight'], 'zh': ['Chinese'], 'zha': ['Zhuang', 'Chuang'], 'znd': ['Zande languages'], 'zu': ['Zulu'], 'zul': ['Zulu'], 'zun': ['Zuni'], 'zxx': ['No linguistic content', 'Not applicable'], 'zza': ['Zaza', 'Dimili', 'Dimli', 'Kirdki', 'Kirmanjki', 'Zazaki']}¶
- english_names_to_three = {'abkhazian': 'abk', 'achinese': 'ace', 'acoli': 'ach', 'adangme': 'ada', 'adygei': 'ady', 'adyghe': 'ady', 'afar': 'aar', 'afrihili': 'afh', 'afrikaans': 'afr', 'afro-asiatic languages': 'afa', 'ainu': 'ain', 'akan': 'aka', 'akkadian': 'akk', 'albanian': 'alb', 'alemannic': 'gsw', 'aleut': 'ale', 'algonquian languages': 'alg', 'alsatian': 'gsw', 'altaic languages': 'tut', 'amharic': 'amh', 'angika': 'anp', 'apache languages': 'apa', 'arabic': 'ara', 'aragonese': 'arg', 'arapaho': 'arp', 'arawak': 'arw', 'armenian': 'arm', 'aromanian': 'rup', 'artificial languages': 'art', 'arumanian': 'rup', 'assamese': 'asm', 'asturian': 'ast', 'asturleonese': 'ast', 'athapascan languages': 'ath', 'australian languages': 'aus', 'austronesian languages': 'map', 'avaric': 'ava', 'avestan': 'ave', 'awadhi': 'awa', 'aymara': 'aym', 'azerbaijani': 'aze', 'bable': 'ast', 'balinese': 'ban', 'baltic languages': 'bat', 'baluchi': 'bal', 'bambara': 'bam', 'bamileke languages': 'bai', 'banda languages': 'bad', 'bantu (other)': 'bnt', 'basa': 'bas', 'bashkir': 'bak', 'basque': 'baq', 'batak languages': 'btk', 'bedawiyet': 'bej', 'beja': 'bej', 'belarusian': 'bel', 'bemba': 'bem', 'bengali': 'ben', 'berber languages': 'ber', 'bhojpuri': 'bho', 'bihari languages': 'bih', 'bikol': 'bik', 'bilin': 'byn', 'bini': 'bin', 'bislama': 'bis', 'blin': 'byn', 'bliss': 'zbl', 'blissymbolics': 'zbl', 'blissymbols': 'zbl', 'bokmål, norwegian': 'nob', 'bosnian': 'bos', 'braj': 'bra', 'breton': 'bre', 'buginese': 'bug', 'bulgarian': 'bul', 'buriat': 'bua', 'burmese': 'bur', 'caddo': 'cad', 'castilian': 'spa', 'catalan': 'cat', 'caucasian languages': 'cau', 'cebuano': 'ceb', 'celtic languages': 'cel', 'central american indian languages': 'cai', 'central khmer': 'khm', 'chagatai': 'chg', 'chamic languages': 'cmc', 'chamorro': 'cha', 'chechen': 'che', 'cherokee': 'chr', 'chewa': 'nya', 'cheyenne': 'chy', 'chibcha': 'chb', 'chichewa': 'nya', 'chinese': 'chi', 'chinook jargon': 'chn', 
'chipewyan': 'chp', 'choctaw': 'cho', 'chuang': 'zha', 'church slavic': 'chu', 'church slavonic': 'chu', 'chuukese': 'chk', 'chuvash': 'chv', 'classical nepal bhasa': 'nwc', 'classical newari': 'nwc', 'classical syriac': 'syc', 'cook islands maori': 'rar', 'coptic': 'cop', 'cornish': 'cor', 'corsican': 'cos', 'cree': 'cre', 'creek': 'mus', 'creoles and pidgins': 'crp', 'creoles and pidgins, english based': 'cpe', 'creoles and pidgins, french-based': 'cpf', 'creoles and pidgins, portuguese-based': 'cpp', 'crimean tatar': 'crh', 'crimean turkish': 'crh', 'croatian': 'hrv', 'cushitic languages': 'cus', 'czech': 'cze', 'dakota': 'dak', 'danish': 'dan', 'dargwa': 'dar', 'delaware': 'del', 'dene suline': 'chp', 'dhivehi': 'div', 'dimili': 'zza', 'dimli': 'zza', 'dinka': 'din', 'divehi': 'div', 'dogri': 'doi', 'dogrib': 'dgr', 'dravidian languages': 'dra', 'duala': 'dua', 'dutch': 'dut', 'dutch, middle (ca.1050-1350)': 'dum', 'dyula': 'dyu', 'dzongkha': 'dzo', 'eastern frisian': 'frs', 'edo': 'bin', 'efik': 'efi', 'egyptian (ancient)': 'egy', 'ekajuk': 'eka', 'elamite': 'elx', 'english': 'eng', 'english, middle (1100-1500)': 'enm', 'english, old (ca.450-1100)': 'ang', 'erzya': 'myv', 'esperanto': 'epo', 'estonian': 'est', 'ewe': 'ewe', 'ewondo': 'ewo', 'fang': 'fan', 'fanti': 'fat', 'faroese': 'fao', 'fijian': 'fij', 'filipino': 'fil', 'finnish': 'fin', 'finno-ugrian languages': 'fiu', 'flemish': 'dut', 'fon': 'fon', 'french': 'fre', 'french, middle (ca.1400-1600)': 'frm', 'french, old (842-ca.1400)': 'fro', 'friulian': 'fur', 'fulah': 'ful', 'ga': 'gaa', 'gaelic': 'gla', 'galibi carib': 'car', 'galician': 'glg', 'ganda': 'lug', 'gayo': 'gay', 'gbaya': 'gba', 'geez': 'gez', 'georgian': 'geo', 'german': 'ger', 'german, low': 'nds', 'german, middle high (ca.1050-1500)': 'gmh', 'german, old high (ca.750-1050)': 'goh', 'germanic languages': 'gem', 'gikuyu': 'kik', 'gilbertese': 'gil', 'gondi': 'gon', 'gorontalo': 'gor', 'gothic': 'got', 'grebo': 'grb', 'greek, ancient (to 
1453)': 'grc', 'greek, modern (1453-)': 'gre', 'greenlandic': 'kal', 'guarani': 'grn', 'gujarati': 'guj', "gwich'in": 'gwi', 'haida': 'hai', 'haitian': 'hat', 'haitian creole': 'hat', 'hausa': 'hau', 'hawaiian': 'haw', 'hebrew': 'heb', 'herero': 'her', 'hiligaynon': 'hil', 'himachali languages': 'him', 'hindi': 'hin', 'hiri motu': 'hmo', 'hittite': 'hit', 'hmong': 'hmn', 'hungarian': 'hun', 'hupa': 'hup', 'iban': 'iba', 'icelandic': 'ice', 'ido': 'ido', 'igbo': 'ibo', 'ijo languages': 'ijo', 'iloko': 'ilo', 'imperial aramaic (700-300 bce)': 'arc', 'inari sami': 'smn', 'indic languages': 'inc', 'indo-european languages': 'ine', 'indonesian': 'ind', 'ingush': 'inh', 'interlingua (international auxiliary language association)': 'ina', 'interlingue': 'ile', 'inuktitut': 'iku', 'inupiaq': 'ipk', 'iranian languages': 'ira', 'irish': 'gle', 'irish, middle (900-1200)': 'mga', 'irish, old (to 900)': 'sga', 'iroquoian languages': 'iro', 'italian': 'ita', 'japanese': 'jpn', 'javanese': 'jav', 'jingpho': 'kac', 'judeo-arabic': 'jrb', 'judeo-persian': 'jpr', 'kabardian': 'kbd', 'kabyle': 'kab', 'kachin': 'kac', 'kalaallisut': 'kal', 'kalmyk': 'xal', 'kamba': 'kam', 'kannada': 'kan', 'kanuri': 'kau', 'kapampangan': 'pam', 'kara-kalpak': 'kaa', 'karachay-balkar': 'krc', 'karelian': 'krl', 'karen languages': 'kar', 'kashmiri': 'kas', 'kashubian': 'csb', 'kawi': 'kaw', 'kazakh': 'kaz', 'khasi': 'kha', 'khoisan languages': 'khi', 'khotanese': 'kho', 'kikuyu': 'kik', 'kimbundu': 'kmb', 'kinyarwanda': 'kin', 'kirdki': 'zza', 'kirghiz': 'kir', 'kirmanjki': 'zza', 'klingon': 'tlh', 'komi': 'kom', 'kongo': 'kon', 'konkani': 'kok', 'korean': 'kor', 'kosraean': 'kos', 'kpelle': 'kpe', 'kru languages': 'kro', 'kuanyama': 'kua', 'kumyk': 'kum', 'kurdish': 'kur', 'kurukh': 'kru', 'kutenai': 'kut', 'kwanyama': 'kua', 'kyrgyz': 'kir', 'ladino': 'lad', 'lahnda': 'lah', 'lamba': 'lam', 'land dayak languages': 'day', 'lao': 'lao', 'latin': 'lat', 'latvian': 'lav', 'leonese': 'ast', 
'letzeburgesch': 'ltz', 'lezghian': 'lez', 'limburgan': 'lim', 'limburger': 'lim', 'limburgish': 'lim', 'lingala': 'lin', 'lithuanian': 'lit', 'lojban': 'jbo', 'low german': 'nds', 'low saxon': 'nds', 'lower sorbian': 'dsb', 'lozi': 'loz', 'luba-katanga': 'lub', 'luba-lulua': 'lua', 'luiseno': 'lui', 'lule sami': 'smj', 'lunda': 'lun', 'luo (kenya and tanzania)': 'luo', 'lushai': 'lus', 'luxembourgish': 'ltz', 'macedo-romanian': 'rup', 'macedonian': 'mac', 'madurese': 'mad', 'magahi': 'mag', 'maithili': 'mai', 'makasar': 'mak', 'malagasy': 'mlg', 'malay': 'may', 'malayalam': 'mal', 'maldivian': 'div', 'maltese': 'mlt', 'manchu': 'mnc', 'mandar': 'mdr', 'mandingo': 'man', 'manipuri': 'mni', 'manobo languages': 'mno', 'manx': 'glv', 'maori': 'mao', 'mapuche': 'arn', 'mapudungun': 'arn', 'marathi': 'mar', 'mari': 'chm', 'marshallese': 'mah', 'marwari': 'mwr', 'masai': 'mas', 'mayan languages': 'myn', 'mende': 'men', "mi'kmaq": 'mic', 'micmac': 'mic', 'minangkabau': 'min', 'mirandese': 'mwl', 'mohawk': 'moh', 'moksha': 'mdf', 'moldavian': 'rum', 'moldovan': 'rum', 'mon-khmer languages': 'mkh', 'mong': 'hmn', 'mongo': 'lol', 'mongolian': 'mon', 'mossi': 'mos', 'multiple languages': 'mul', 'munda languages': 'mun', "n'ko": 'nqo', 'nahuatl languages': 'nah', 'nauru': 'nau', 'navaho': 'nav', 'navajo': 'nav', 'ndebele, north': 'nde', 'ndebele, south': 'nbl', 'ndonga': 'ndo', 'neapolitan': 'nap', 'nepal bhasa': 'new', 'nepali': 'nep', 'newari': 'new', 'nias': 'nia', 'niger-kordofanian languages': 'nic', 'nilo-saharan languages': 'ssa', 'niuean': 'niu', 'no linguistic content': 'zxx', 'nogai': 'nog', 'norse, old': 'non', 'north american indian languages': 'nai', 'north ndebele': 'nde', 'northern frisian': 'frr', 'northern sami': 'sme', 'northern sotho': 'nso', 'norwegian': 'nor', 'norwegian bokmål': 'nob', 'norwegian nynorsk': 'nno', 'not applicable': 'zxx', 'nubian languages': 'nub', 'nuosu': 'iii', 'nyamwezi': 'nym', 'nyanja': 'nya', 'nyankole': 'nyn', 'nynorsk, norwegian': 
'nno', 'nyoro': 'nyo', 'nzima': 'nzi', 'occidental': 'ile', 'occitan (post 1500)': 'oci', 'official aramaic (700-300 bce)': 'arc', 'oirat': 'xal', 'ojibwa': 'oji', 'old bulgarian': 'chu', 'old church slavonic': 'chu', 'old newari': 'nwc', 'old slavonic': 'chu', 'oriya': 'ori', 'oromo': 'orm', 'osage': 'osa', 'ossetian': 'oss', 'ossetic': 'oss', 'otomian languages': 'oto', 'pahlavi': 'pal', 'palauan': 'pau', 'pali': 'pli', 'pampanga': 'pam', 'pangasinan': 'pag', 'panjabi': 'pan', 'papiamento': 'pap', 'papuan languages': 'paa', 'pashto': 'pus', 'pedi': 'nso', 'persian': 'per', 'persian, old (ca.600-400 b.c.)': 'peo', 'philippine languages': 'phi', 'phoenician': 'phn', 'pilipino': 'fil', 'pohnpeian': 'pon', 'polish': 'pol', 'portuguese': 'por', 'prakrit languages': 'pra', 'provençal': 'oci', 'provençal, old (to 1500)': 'pro', 'punjabi': 'pan', 'pushto': 'pus', 'quechua': 'que', 'rajasthani': 'raj', 'rapanui': 'rap', 'rarotongan': 'rar', 'reserved for local use': 'qaa-qtz', 'romance languages': 'roa', 'romanian': 'rum', 'romansh': 'roh', 'romany': 'rom', 'rundi': 'run', 'russian': 'rus', 'sakan': 'kho', 'salishan languages': 'sal', 'samaritan aramaic': 'sam', 'sami languages': 'smi', 'samoan': 'smo', 'sandawe': 'sad', 'sango': 'sag', 'sanskrit': 'san', 'santali': 'sat', 'sardinian': 'srd', 'sasak': 'sas', 'saxon, low': 'nds', 'scots': 'sco', 'scottish gaelic': 'gla', 'selkup': 'sel', 'semitic languages': 'sem', 'sepedi': 'nso', 'serbian': 'srp', 'serer': 'srr', 'shan': 'shn', 'shona': 'sna', 'sichuan yi': 'iii', 'sicilian': 'scn', 'sidamo': 'sid', 'sign languages': 'sgn', 'siksika': 'bla', 'sindhi': 'snd', 'sinhala': 'sin', 'sinhalese': 'sin', 'sino-tibetan languages': 'sit', 'siouan languages': 'sio', 'skolt sami': 'sms', 'slave (athapascan)': 'den', 'slavic languages': 'sla', 'slovak': 'slo', 'slovenian': 'slv', 'sogdian': 'sog', 'somali': 'som', 'songhai languages': 'son', 'soninke': 'snk', 'sorbian languages': 'wen', 'sotho, southern': 'sot', 'south american indian 
(other)': 'sai', 'south ndebele': 'nbl', 'southern altai': 'alt', 'southern sami': 'sma', 'spanish': 'spa', 'sranan tongo': 'srn', 'standard moroccan tamazight': 'zgh', 'sukuma': 'suk', 'sumerian': 'sux', 'sundanese': 'sun', 'susu': 'sus', 'swahili': 'swa', 'swati': 'ssw', 'swedish': 'swe', 'swiss german': 'gsw', 'syriac': 'syr', 'tagalog': 'tgl', 'tahitian': 'tah', 'tai languages': 'tai', 'tajik': 'tgk', 'tamashek': 'tmh', 'tamil': 'tam', 'tatar': 'tat', 'telugu': 'tel', 'tereno': 'ter', 'tetum': 'tet', 'thai': 'tha', 'tibetan': 'tib', 'tigre': 'tig', 'tigrinya': 'tir', 'timne': 'tem', 'tiv': 'tiv', 'tlhingan-hol': 'tlh', 'tlingit': 'tli', 'tok pisin': 'tpi', 'tokelau': 'tkl', 'tonga (nyasa)': 'tog', 'tonga (tonga islands)': 'ton', 'tsimshian': 'tsi', 'tsonga': 'tso', 'tswana': 'tsn', 'tumbuka': 'tum', 'tupi languages': 'tup', 'turkish': 'tur', 'turkish, ottoman (1500-1928)': 'ota', 'turkmen': 'tuk', 'tuvalu': 'tvl', 'tuvinian': 'tyv', 'twi': 'twi', 'udmurt': 'udm', 'ugaritic': 'uga', 'uighur': 'uig', 'ukrainian': 'ukr', 'umbundu': 'umb', 'uncoded languages': 'mis', 'undetermined': 'und', 'upper sorbian': 'hsb', 'urdu': 'urd', 'uyghur': 'uig', 'uzbek': 'uzb', 'vai': 'vai', 'valencian': 'cat', 'venda': 'ven', 'vietnamese': 'vie', 'volapük': 'vol', 'votic': 'vot', 'wakashan languages': 'wak', 'walamo': 'wal', 'walloon': 'wln', 'waray': 'war', 'washo': 'was', 'welsh': 'wel', 'western frisian': 'fry', 'western pahari languages': 'him', 'wolof': 'wol', 'xhosa': 'xho', 'yakut': 'sah', 'yao': 'yao', 'yapese': 'yap', 'yiddish': 'yid', 'yoruba': 'yor', 'yupik languages': 'ypk', 'zande languages': 'znd', 'zapotec': 'zap', 'zaza': 'zza', 'zazaki': 'zza', 'zenaga': 'zen', 'zhuang': 'zha', 'zulu': 'zul', 'zuni': 'zun'}¶
- french_names = 'zaza; dimili; dimli; kirdki; kirmanjki; zazaki'¶
- i = {'code': 'sv', 'name': 'Swedish', 'nativeName': 'svenska'}¶
- classmethod iso_639_2_for_locale(locale)[source]¶
Turn a locale code into an ISO-639-2 alpha-3 language code.
- name = 'Zazaki'¶
- names = ['svenska']¶
- native_names = {'de': ['Deutsch'], 'el': ['Ελληνικά'], 'en': ['English'], 'eng': ['English'], 'es': ['español', 'castellano'], 'fr': ['français'], 'fre': ['français'], 'ger': ['Deutsch'], 'gre': ['Ελληνικά'], 'hu': ['Magyar'], 'hun': ['Magyar'], 'it': ['Italiano'], 'ita': ['Italiano'], 'no': ['Norsk'], 'nor': ['Norsk'], 'pl': ['polski'], 'pol': ['polski'], 'por': ['Português'], 'pt': ['Português'], 'ru': ['русский'], 'rus': ['русский'], 'spa': ['español', 'castellano'], 'sv': ['svenska'], 'swe': ['svenska']}¶
- classmethod string_to_alpha_3(s)[source]¶
Try really hard to convert a string to an ISO-639-2 alpha-3 language code.
- terminologic_code = ''¶
- three_to_two = {'aar': 'aa', 'abk': 'ab', 'afr': 'af', 'aka': 'ak', 'alb': 'sq', 'amh': 'am', 'ara': 'ar', 'arg': 'an', 'arm': 'hy', 'asm': 'as', 'ava': 'av', 'ave': 'ae', 'aym': 'ay', 'aze': 'az', 'bak': 'ba', 'bam': 'bm', 'baq': 'eu', 'bel': 'be', 'ben': 'bn', 'bih': 'bh', 'bis': 'bi', 'bos': 'bs', 'bre': 'br', 'bul': 'bg', 'bur': 'my', 'cat': 'ca', 'cha': 'ch', 'che': 'ce', 'chi': 'zh', 'chu': 'cu', 'chv': 'cv', 'cor': 'kw', 'cos': 'co', 'cre': 'cr', 'cze': 'cs', 'dan': 'da', 'div': 'dv', 'dut': 'nl', 'dzo': 'dz', 'eng': 'en', 'epo': 'eo', 'est': 'et', 'ewe': 'ee', 'fao': 'fo', 'fij': 'fj', 'fin': 'fi', 'fre': 'fr', 'fry': 'fy', 'ful': 'ff', 'geo': 'ka', 'ger': 'de', 'gla': 'gd', 'gle': 'ga', 'glg': 'gl', 'glv': 'gv', 'gre': 'el', 'grn': 'gn', 'guj': 'gu', 'hat': 'ht', 'hau': 'ha', 'heb': 'he', 'her': 'hz', 'hin': 'hi', 'hmo': 'ho', 'hrv': 'hr', 'hun': 'hu', 'ibo': 'ig', 'ice': 'is', 'ido': 'io', 'iii': 'ii', 'iku': 'iu', 'ile': 'ie', 'ina': 'ia', 'ind': 'id', 'ipk': 'ik', 'ita': 'it', 'jav': 'jv', 'jpn': 'ja', 'kal': 'kl', 'kan': 'kn', 'kas': 'ks', 'kau': 'kr', 'kaz': 'kk', 'khm': 'km', 'kik': 'ki', 'kin': 'rw', 'kir': 'ky', 'kom': 'kv', 'kon': 'kg', 'kor': 'ko', 'kua': 'kj', 'kur': 'ku', 'lao': 'lo', 'lat': 'la', 'lav': 'lv', 'lim': 'li', 'lin': 'ln', 'lit': 'lt', 'ltz': 'lb', 'lub': 'lu', 'lug': 'lg', 'mac': 'mk', 'mah': 'mh', 'mal': 'ml', 'mao': 'mi', 'mar': 'mr', 'may': 'ms', 'mlg': 'mg', 'mlt': 'mt', 'mon': 'mn', 'nau': 'na', 'nav': 'nv', 'nbl': 'nr', 'nde': 'nd', 'ndo': 'ng', 'nep': 'ne', 'nno': 'nn', 'nob': 'nb', 'nor': 'no', 'nya': 'ny', 'oci': 'oc', 'oji': 'oj', 'ori': 'or', 'orm': 'om', 'oss': 'os', 'pan': 'pa', 'per': 'fa', 'pli': 'pi', 'pol': 'pl', 'por': 'pt', 'pus': 'ps', 'que': 'qu', 'roh': 'rm', 'rum': 'ro', 'run': 'rn', 'rus': 'ru', 'sag': 'sg', 'san': 'sa', 'sin': 'si', 'slo': 'sk', 'slv': 'sl', 'sme': 'se', 'smo': 'sm', 'sna': 'sn', 'snd': 'sd', 'som': 'so', 'sot': 'st', 'spa': 'es', 'srd': 'sc', 'srp': 'sr', 'ssw': 'ss', 'sun': 'su', 
'swa': 'sw', 'swe': 'sv', 'tah': 'ty', 'tam': 'ta', 'tat': 'tt', 'tel': 'te', 'tgk': 'tg', 'tgl': 'tl', 'tha': 'th', 'tib': 'bo', 'tir': 'ti', 'ton': 'to', 'tsn': 'tn', 'tso': 'ts', 'tuk': 'tk', 'tur': 'tr', 'twi': 'tw', 'uig': 'ug', 'ukr': 'uk', 'urd': 'ur', 'uzb': 'uz', 'ven': 've', 'vie': 'vi', 'vol': 'vo', 'wel': 'cy', 'wln': 'wa', 'wol': 'wo', 'xho': 'xh', 'yid': 'yi', 'yor': 'yo', 'zha': 'za', 'zul': 'zu'}¶
- two_to_three = {'aa': 'aar', 'ab': 'abk', 'ae': 'ave', 'af': 'afr', 'ak': 'aka', 'am': 'amh', 'an': 'arg', 'ar': 'ara', 'as': 'asm', 'av': 'ava', 'ay': 'aym', 'az': 'aze', 'ba': 'bak', 'be': 'bel', 'bg': 'bul', 'bh': 'bih', 'bi': 'bis', 'bm': 'bam', 'bn': 'ben', 'bo': 'tib', 'br': 'bre', 'bs': 'bos', 'ca': 'cat', 'ce': 'che', 'ch': 'cha', 'co': 'cos', 'cr': 'cre', 'cs': 'cze', 'cu': 'chu', 'cv': 'chv', 'cy': 'wel', 'da': 'dan', 'de': 'ger', 'dv': 'div', 'dz': 'dzo', 'ee': 'ewe', 'el': 'gre', 'en': 'eng', 'eo': 'epo', 'es': 'spa', 'et': 'est', 'eu': 'baq', 'fa': 'per', 'ff': 'ful', 'fi': 'fin', 'fj': 'fij', 'fo': 'fao', 'fr': 'fre', 'fy': 'fry', 'ga': 'gle', 'gd': 'gla', 'gl': 'glg', 'gn': 'grn', 'gu': 'guj', 'gv': 'glv', 'ha': 'hau', 'he': 'heb', 'hi': 'hin', 'ho': 'hmo', 'hr': 'hrv', 'ht': 'hat', 'hu': 'hun', 'hy': 'arm', 'hz': 'her', 'ia': 'ina', 'id': 'ind', 'ie': 'ile', 'ig': 'ibo', 'ii': 'iii', 'ik': 'ipk', 'io': 'ido', 'is': 'ice', 'it': 'ita', 'iu': 'iku', 'ja': 'jpn', 'jv': 'jav', 'ka': 'geo', 'kg': 'kon', 'ki': 'kik', 'kj': 'kua', 'kk': 'kaz', 'kl': 'kal', 'km': 'khm', 'kn': 'kan', 'ko': 'kor', 'kr': 'kau', 'ks': 'kas', 'ku': 'kur', 'kv': 'kom', 'kw': 'cor', 'ky': 'kir', 'la': 'lat', 'lb': 'ltz', 'lg': 'lug', 'li': 'lim', 'ln': 'lin', 'lo': 'lao', 'lt': 'lit', 'lu': 'lub', 'lv': 'lav', 'mg': 'mlg', 'mh': 'mah', 'mi': 'mao', 'mk': 'mac', 'ml': 'mal', 'mn': 'mon', 'mr': 'mar', 'ms': 'may', 'mt': 'mlt', 'my': 'bur', 'na': 'nau', 'nb': 'nob', 'nd': 'nde', 'ne': 'nep', 'ng': 'ndo', 'nl': 'dut', 'nn': 'nno', 'no': 'nor', 'nr': 'nbl', 'nv': 'nav', 'ny': 'nya', 'oc': 'oci', 'oj': 'oji', 'om': 'orm', 'or': 'ori', 'os': 'oss', 'pa': 'pan', 'pi': 'pli', 'pl': 'pol', 'ps': 'pus', 'pt': 'por', 'qu': 'que', 'rm': 'roh', 'rn': 'run', 'ro': 'rum', 'ru': 'rus', 'rw': 'kin', 'sa': 'san', 'sc': 'srd', 'sd': 'snd', 'se': 'sme', 'sg': 'sag', 'si': 'sin', 'sk': 'slo', 'sl': 'slv', 'sm': 'smo', 'sn': 'sna', 'so': 'som', 'sq': 'alb', 'sr': 'srp', 'ss': 'ssw', 'st': 'sot', 'su': 
'sun', 'sv': 'swe', 'sw': 'swa', 'ta': 'tam', 'te': 'tel', 'tg': 'tgk', 'th': 'tha', 'ti': 'tir', 'tk': 'tuk', 'tl': 'tgl', 'tn': 'tsn', 'to': 'ton', 'tr': 'tur', 'ts': 'tso', 'tt': 'tat', 'tw': 'twi', 'ty': 'tah', 'ug': 'uig', 'uk': 'ukr', 'ur': 'urd', 'uz': 'uzb', 've': 'ven', 'vi': 'vie', 'vo': 'vol', 'wa': 'wln', 'wo': 'wol', 'xh': 'xho', 'yi': 'yid', 'yo': 'yor', 'za': 'zha', 'zh': 'chi', 'zu': 'zul'}¶
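The conversion classmethods listed above (iso_639_2_for_locale and string_to_alpha_3) are documented only by one-line summaries. As a rough illustration of how a lookup over the two_to_three, three_to_two and english_names_to_three tables could work, here is a minimal sketch. The function name to_alpha_3 and the trimmed-down tables are assumptions made for this example; this is not the library's implementation, and the real methods may handle more cases.

```python
# Tiny excerpts of the tables documented above; the real tables are far larger.
TWO_TO_THREE = {"en": "eng", "fr": "fre", "sv": "swe", "zh": "chi"}
THREE_TO_TWO = {three: two for two, three in TWO_TO_THREE.items()}
ENGLISH_NAMES_TO_THREE = {
    "english": "eng",
    "french": "fre",
    "swedish": "swe",
    "chinese": "chi",
}


def to_alpha_3(s):
    """Hypothetical helper: best-effort conversion of a string to an alpha-3 code.

    Accepts an English language name, an alpha-2 or alpha-3 code, or a
    locale-style string such as "en-US" or "zh_CN". Returns None when
    nothing matches.
    """
    if not s:
        return None
    s = s.strip().lower()

    # An English name such as "Swedish".
    if s in ENGLISH_NAMES_TO_THREE:
        return ENGLISH_NAMES_TO_THREE[s]

    # Drop any region subtag: "en-US" and "en_US" both become "en".
    primary = s.replace("_", "-").split("-")[0]

    # Already an alpha-3 code.
    if primary in THREE_TO_TWO:
        return primary

    # An alpha-2 code, or None if the string is unknown.
    return TWO_TO_THREE.get(primary)
```

Called as to_alpha_3("en-US"), to_alpha_3("Swedish") or to_alpha_3("fre"), the helper resolves all three kinds of input to an alpha-3 code. Checking names before codes is a design choice in this sketch: a few English names in the real table (for example 'ga') are also valid alpha-2 codes, so the order of the checks decides which interpretation wins.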
- class core.util.languages.LanguageNames[source]¶
Bases:
object
Utilities for converting between human-readable language names and codes.
LanguageNames.name_re is a regular expression that matches the English or native-language name of nearly any language known to LanguageCodes.
LanguageNames.name_to_codes is a dictionary mapping lowercase human-readable names to sets of ISO-639-2 language codes.
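To make the relationship between name_re and name_to_codes concrete, the following is a minimal sketch of how a word-boundary regular expression can be built from a name table and used to pull language codes out of free text. The names NAME_TO_CODES, NAME_RE and codes_in, and the five-entry table, are assumptions for illustration; how LanguageNames actually builds its attributes is not shown in this documentation.

```python
import re

# A small excerpt in the same shape as name_to_codes: lowercase name -> set of codes.
NAME_TO_CODES = {
    "deutsch": {"ger"},
    "english": {"eng"},
    "german": {"ger"},
    "norwegian": {"nor"},
    "norwegian nynorsk": {"nno"},
}

# Try longer names first, so that "norwegian nynorsk" is preferred over
# "norwegian" when both alternatives could match at the same position.
_names = sorted(NAME_TO_CODES, key=len, reverse=True)
NAME_RE = re.compile(
    "(" + "|".join(r"\b" + re.escape(name) + r"\b" for name in _names) + ")",
    re.IGNORECASE,
)


def codes_in(text):
    """Hypothetical helper: collect the codes of every language named in text."""
    found = set()
    for match in NAME_RE.finditer(text):
        # Keys are lowercase, so normalize the matched text before the lookup.
        found |= NAME_TO_CODES[match.group(0).lower()]
    return found
```

With this table, codes_in("Text in Norwegian Nynorsk and Deutsch") finds the two names and returns their codes as one set, while a string that names no known language yields an empty set.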
- ignore = {'No linguistic content', 'Not applicable', 'Uncoded'}¶
- irrelevant_suffixes = [' languages']¶
- name_re = re.compile("(\\bafar\\b|\\babkhazian\\b|\\bachinese\\b|\\bacoli\\b|\\badangme\\b|\\badyghe\\b|\\badygei\\b|\\bafro-asiatic\\b|\\bafrihili\\b|\\bafrikaans\\b|\\bainu\\b|\\bakan\\b|\\bakkadian\\b|\\balbanian\\b|\\, re.IGNORECASE)¶
- name_to_codes = {'abkhazian': {'abk'}, 'achinese': {'ace'}, 'acoli': {'ach'}, 'adangme': {'ada'}, 'adygei': {'ady'}, 'adyghe': {'ady'}, 'afar': {'aar'}, 'afrihili': {'afh'}, 'afrikaans': {'afr'}, 'afro-asiatic': {'afa'}, 'ainu': {'ain'}, 'akan': {'aka'}, 'akkadian': {'akk'}, 'albanian': {'alb'}, 'alemannic': {'gsw'}, 'aleut': {'ale'}, 'algonquian': {'alg'}, 'alsatian': {'gsw'}, 'altaic': {'tut'}, 'amharic': {'amh'}, 'angika': {'anp'}, 'apache': {'apa'}, 'arabic': {'ara'}, 'aragonese': {'arg'}, 'arapaho': {'arp'}, 'arawak': {'arw'}, 'armenian': {'arm'}, 'aromanian': {'rup'}, 'artificial': {'art'}, 'arumanian': {'rup'}, 'assamese': {'asm'}, 'asturian': {'ast'}, 'asturleonese': {'ast'}, 'athapascan': {'ath'}, 'australian': {'aus'}, 'austronesian': {'map'}, 'avaric': {'ava'}, 'avestan': {'ave'}, 'awadhi': {'awa'}, 'aymara': {'aym'}, 'azerbaijani': {'aze'}, 'bable': {'ast'}, 'balinese': {'ban'}, 'baltic': {'bat'}, 'baluchi': {'bal'}, 'bambara': {'bam'}, 'bamileke': {'bai'}, 'banda': {'bad'}, 'bantu': {'bnt'}, 'basa': {'bas'}, 'bashkir': {'bak'}, 'basque': {'baq'}, 'batak': {'btk'}, 'bedawiyet': {'bej'}, 'beja': {'bej'}, 'belarusian': {'bel'}, 'bemba': {'bem'}, 'bengali': {'ben'}, 'berber': {'ber'}, 'bhojpuri': {'bho'}, 'bihari': {'bih'}, 'bikol': {'bik'}, 'bilin': {'byn'}, 'bini': {'bin'}, 'bislama': {'bis'}, 'blin': {'byn'}, 'bliss': {'zbl'}, 'blissymbolics': {'zbl'}, 'blissymbols': {'zbl'}, 'bokmål, norwegian': {'nob'}, 'bosnian': {'bos'}, 'braj': {'bra'}, 'breton': {'bre'}, 'buginese': {'bug'}, 'bulgarian': {'bul'}, 'buriat': {'bua'}, 'burmese': {'bur'}, 'caddo': {'cad'}, 'castellano': {'spa'}, 'castilian': {'spa'}, 'catalan': {'cat'}, 'caucasian': {'cau'}, 'cebuano': {'ceb'}, 'celtic': {'cel'}, 'central american indian': {'cai'}, 'central khmer': {'khm'}, 'chagatai': {'chg'}, 'chamic': {'cmc'}, 'chamorro': {'cha'}, 'chechen': {'che'}, 'cherokee': {'chr'}, 'chewa': {'nya'}, 'cheyenne': {'chy'}, 'chibcha': {'chb'}, 'chichewa': {'nya'}, 'chinese': {'chi'}, 'chinook 
jargon': {'chn'}, 'chipewyan': {'chp'}, 'choctaw': {'cho'}, 'chuang': {'zha'}, 'church slavic': {'chu'}, 'church slavonic': {'chu'}, 'chuukese': {'chk'}, 'chuvash': {'chv'}, 'classical nepal bhasa': {'nwc'}, 'classical newari': {'nwc'}, 'classical syriac': {'syc'}, 'cook islands maori': {'rar'}, 'coptic': {'cop'}, 'cornish': {'cor'}, 'corsican': {'cos'}, 'cree': {'cre'}, 'creek': {'mus'}, 'creoles and pidgins': {'crp'}, 'creoles and pidgins, english based': {'cpe'}, 'creoles and pidgins, french-based': {'cpf'}, 'creoles and pidgins, portuguese-based': {'cpp'}, 'crimean tatar': {'crh'}, 'crimean turkish': {'crh'}, 'croatian': {'hrv'}, 'cushitic': {'cus'}, 'czech': {'cze'}, 'dakota': {'dak'}, 'danish': {'dan'}, 'dargwa': {'dar'}, 'delaware': {'del'}, 'dene suline': {'chp'}, 'deutsch': {'ger'}, 'dhivehi': {'div'}, 'dimili': {'zza'}, 'dimli': {'zza'}, 'dinka': {'din'}, 'divehi': {'div'}, 'dogri': {'doi'}, 'dogrib': {'dgr'}, 'dravidian': {'dra'}, 'duala': {'dua'}, 'dutch': {'dut'}, 'dyula': {'dyu'}, 'dzongkha': {'dzo'}, 'eastern frisian': {'frs'}, 'edo': {'bin'}, 'efik': {'efi'}, 'egyptian': {'egy'}, 'ekajuk': {'eka'}, 'elamite': {'elx'}, 'english': {'eng'}, 'erzya': {'myv'}, 'espanol': {'spa'}, 'español, castellano': {'spa'}, 'esperanto': {'epo'}, 'estonian': {'est'}, 'ewe': {'ewe'}, 'ewondo': {'ewo'}, 'fang': {'fan'}, 'fanti': {'fat'}, 'faroese': {'fao'}, 'fijian': {'fij'}, 'filipino': {'fil'}, 'finnish': {'fin'}, 'finno-ugrian': {'fiu'}, 'flemish': {'dut'}, 'fon': {'fon'}, 'francais': {'fre'}, 'français': {'fre'}, 'french': {'fre'}, 'friulian': {'fur'}, 'fulah': {'ful'}, 'ga': {'gaa'}, 'gaelic': {'gla'}, 'galibi carib': {'car'}, 'galician': {'glg'}, 'ganda': {'lug'}, 'gayo': {'gay'}, 'gbaya': {'gba'}, 'geez': {'gez'}, 'georgian': {'geo'}, 'german': {'ger'}, 'german, low': {'nds'}, 'germanic': {'gem'}, 'gikuyu': {'kik'}, 'gilbertese': {'gil'}, 'gondi': {'gon'}, 'gorontalo': {'gor'}, 'gothic': {'got'}, 'grebo': {'grb'}, 'greek': {'gre'}, 'greenlandic': {'kal'}, 
'guarani': {'grn'}, 'gujarati': {'guj'}, "gwich'in": {'gwi'}, 'haida': {'hai'}, 'haitian': {'hat'}, 'haitian creole': {'hat'}, 'hausa': {'hau'}, 'hawaiian': {'haw'}, 'hebrew': {'heb'}, 'herero': {'her'}, 'hiligaynon': {'hil'}, 'himachali': {'him'}, 'hindi': {'hin'}, 'hiri motu': {'hmo'}, 'hittite': {'hit'}, 'hmong': {'hmn'}, 'hungarian': {'hun'}, 'hupa': {'hup'}, 'iban': {'iba'}, 'icelandic': {'ice'}, 'ido': {'ido'}, 'igbo': {'ibo'}, 'ijo': {'ijo'}, 'iloko': {'ilo'}, 'inari sami': {'smn'}, 'indic': {'inc'}, 'indo-european': {'ine'}, 'indonesian': {'ind'}, 'ingush': {'inh'}, 'interlingua': {'ina'}, 'interlingue': {'ile'}, 'inuktitut': {'iku'}, 'inupiaq': {'ipk'}, 'iranian': {'ira'}, 'irish': {'gle'}, 'iroquoian': {'iro'}, 'italian': {'ita'}, 'italiano': {'ita'}, 'japanese': {'jpn'}, 'javanese': {'jav'}, 'jingpho': {'kac'}, 'judeo-arabic': {'jrb'}, 'judeo-persian': {'jpr'}, 'kabardian': {'kbd'}, 'kabyle': {'kab'}, 'kachin': {'kac'}, 'kalaallisut': {'kal'}, 'kalmyk': {'xal'}, 'kamba': {'kam'}, 'kannada': {'kan'}, 'kanuri': {'kau'}, 'kapampangan': {'pam'}, 'kara-kalpak': {'kaa'}, 'karachay-balkar': {'krc'}, 'karelian': {'krl'}, 'karen': {'kar'}, 'kashmiri': {'kas'}, 'kashubian': {'csb'}, 'kawi': {'kaw'}, 'kazakh': {'kaz'}, 'khasi': {'kha'}, 'khoisan': {'khi'}, 'khotanese': {'kho'}, 'kikuyu': {'kik'}, 'kimbundu': {'kmb'}, 'kinyarwanda': {'kin'}, 'kirdki': {'zza'}, 'kirghiz': {'kir'}, 'kirmanjki': {'zza'}, 'klingon': {'tlh'}, 'komi': {'kom'}, 'kongo': {'kon'}, 'konkani': {'kok'}, 'korean': {'kor'}, 'kosraean': {'kos'}, 'kpelle': {'kpe'}, 'kru': {'kro'}, 'kuanyama': {'kua'}, 'kumyk': {'kum'}, 'kurdish': {'kur'}, 'kurukh': {'kru'}, 'kutenai': {'kut'}, 'kwanyama': {'kua'}, 'kyrgyz': {'kir'}, 'ladino': {'lad'}, 'lahnda': {'lah'}, 'lamba': {'lam'}, 'land dayak': {'day'}, 'lao': {'lao'}, 'latin': {'lat'}, 'latvian': {'lav'}, 'leonese': {'ast'}, 'letzeburgesch': {'ltz'}, 'lezghian': {'lez'}, 'limburgan': {'lim'}, 'limburger': {'lim'}, 'limburgish': {'lim'}, 'lingala': {'lin'}, 
'lithuanian': {'lit'}, 'lojban': {'jbo'}, 'low german': {'nds'}, 'low saxon': {'nds'}, 'lower sorbian': {'dsb'}, 'lozi': {'loz'}, 'luba-katanga': {'lub'}, 'luba-lulua': {'lua'}, 'luiseno': {'lui'}, 'lule sami': {'smj'}, 'lunda': {'lun'}, 'luo': {'luo'}, 'lushai': {'lus'}, 'luxembourgish': {'ltz'}, 'macedo-romanian': {'rup'}, 'macedonian': {'mac'}, 'madurese': {'mad'}, 'magahi': {'mag'}, 'magyar': {'hun'}, 'maithili': {'mai'}, 'makasar': {'mak'}, 'malagasy': {'mlg'}, 'malay': {'may'}, 'malayalam': {'mal'}, 'maldivian': {'div'}, 'maltese': {'mlt'}, 'manchu': {'mnc'}, 'mandar': {'mdr'}, 'mandingo': {'man'}, 'manipuri': {'mni'}, 'manobo': {'mno'}, 'manx': {'glv'}, 'maori': {'mao'}, 'mapuche': {'arn'}, 'mapudungun': {'arn'}, 'marathi': {'mar'}, 'mari': {'chm'}, 'marshallese': {'mah'}, 'marwari': {'mwr'}, 'masai': {'mas'}, 'mayan': {'myn'}, 'mende': {'men'}, "mi'kmaq": {'mic'}, 'micmac': {'mic'}, 'minangkabau': {'min'}, 'mirandese': {'mwl'}, 'mohawk': {'moh'}, 'moksha': {'mdf'}, 'moldavian': {'rum'}, 'moldovan': {'rum'}, 'mon-khmer': {'mkh'}, 'mong': {'hmn'}, 'mongo': {'lol'}, 'mongolian': {'mon'}, 'mossi': {'mos'}, 'multiple': {'mul'}, 'munda': {'mun'}, "n'ko": {'nqo'}, 'nahuatl': {'nah'}, 'nauru': {'nau'}, 'navaho': {'nav'}, 'navajo': {'nav'}, 'ndebele, north': {'nde'}, 'ndebele, south': {'nbl'}, 'ndonga': {'ndo'}, 'neapolitan': {'nap'}, 'nepal bhasa': {'new'}, 'nepali': {'nep'}, 'newari': {'new'}, 'nias': {'nia'}, 'niger-kordofanian': {'nic'}, 'nilo-saharan': {'ssa'}, 'niuean': {'niu'}, 'nogai': {'nog'}, 'norse, old': {'non'}, 'norsk': {'nor'}, 'north american indian': {'nai'}, 'north ndebele': {'nde'}, 'northern frisian': {'frr'}, 'northern sami': {'sme'}, 'northern sotho': {'nso'}, 'norwegian': {'nor'}, 'norwegian bokmål': {'nob'}, 'norwegian nynorsk': {'nno'}, 'nubian': {'nub'}, 'nuosu': {'iii'}, 'nyamwezi': {'nym'}, 'nyanja': {'nya'}, 'nyankole': {'nyn'}, 'nynorsk, norwegian': {'nno'}, 'nyoro': {'nyo'}, 'nzima': {'nzi'}, 'occidental': {'ile'}, 'occitan': {'oci'}, 
'oirat': {'xal'}, 'ojibwa': {'oji'}, 'old bulgarian': {'chu'}, 'old church slavonic': {'chu'}, 'old newari': {'nwc'}, 'old slavonic': {'chu'}, 'oriya': {'ori'}, 'oromo': {'orm'}, 'osage': {'osa'}, 'ossetian': {'oss'}, 'ossetic': {'oss'}, 'otomian': {'oto'}, 'pahlavi': {'pal'}, 'palauan': {'pau'}, 'pali': {'pli'}, 'pampanga': {'pam'}, 'pangasinan': {'pag'}, 'panjabi': {'pan'}, 'papiamento': {'pap'}, 'papuan': {'paa'}, 'pashto': {'pus'}, 'pedi': {'nso'}, 'persian': {'per'}, 'philippine': {'phi'}, 'phoenician': {'phn'}, 'pilipino': {'fil'}, 'pohnpeian': {'pon'}, 'polish': {'pol'}, 'polski': {'pol'}, 'portugues': {'por'}, 'portuguese': {'por'}, 'português': {'por'}, 'prakrit': {'pra'}, 'provençal': {'oci'}, 'punjabi': {'pan'}, 'pushto': {'pus'}, 'quechua': {'que'}, 'rajasthani': {'raj'}, 'rapanui': {'rap'}, 'rarotongan': {'rar'}, 'reserved for local use': {'qaa-qtz'}, 'romance': {'roa'}, 'romanian': {'rum'}, 'romansh': {'roh'}, 'romany': {'rom'}, 'rundi': {'run'}, 'russian': {'rus'}, 'sakan': {'kho'}, 'salishan': {'sal'}, 'samaritan aramaic': {'sam'}, 'sami': {'smi'}, 'samoan': {'smo'}, 'sandawe': {'sad'}, 'sango': {'sag'}, 'sanskrit': {'san'}, 'santali': {'sat'}, 'sardinian': {'srd'}, 'sasak': {'sas'}, 'saxon, low': {'nds'}, 'scots': {'sco'}, 'scottish gaelic': {'gla'}, 'selkup': {'sel'}, 'semitic': {'sem'}, 'sepedi': {'nso'}, 'serbian': {'srp'}, 'serer': {'srr'}, 'shan': {'shn'}, 'shona': {'sna'}, 'sichuan yi': {'iii'}, 'sicilian': {'scn'}, 'sidamo': {'sid'}, 'sign languages': {'sgn'}, 'siksika': {'bla'}, 'sindhi': {'snd'}, 'sinhala': {'sin'}, 'sinhalese': {'sin'}, 'sino-tibetan': {'sit'}, 'siouan': {'sio'}, 'skolt sami': {'sms'}, 'slave': {'den'}, 'slavic': {'sla'}, 'slovak': {'slo'}, 'slovenian': {'slv'}, 'sogdian': {'sog'}, 'somali': {'som'}, 'songhai': {'son'}, 'soninke': {'snk'}, 'sorbian': {'wen'}, 'sotho, southern': {'sot'}, 'south american indian': {'sai'}, 'south ndebele': {'nbl'}, 'southern altai': {'alt'}, 'southern sami': {'sma'}, 'spanish': {'spa'}, 
'sranan tongo': {'srn'}, 'standard moroccan tamazight': {'zgh'}, 'sukuma': {'suk'}, 'sumerian': {'sux'}, 'sundanese': {'sun'}, 'susu': {'sus'}, 'svenska': {'swe'}, 'swahili': {'swa'}, 'swati': {'ssw'}, 'swedish': {'swe'}, 'swiss german': {'gsw'}, 'syriac': {'syr'}, 'tagalog': {'tgl'}, 'tahitian': {'tah'}, 'tai': {'tai'}, 'tajik': {'tgk'}, 'tamashek': {'tmh'}, 'tamil': {'tam'}, 'tatar': {'tat'}, 'telugu': {'tel'}, 'tereno': {'ter'}, 'tetum': {'tet'}, 'thai': {'tha'}, 'tibetan': {'tib'}, 'tigre': {'tig'}, 'tigrinya': {'tir'}, 'timne': {'tem'}, 'tiv': {'tiv'}, 'tlhingan-hol': {'tlh'}, 'tlingit': {'tli'}, 'tok pisin': {'tpi'}, 'tokelau': {'tkl'}, 'tonga': {'tog', 'ton'}, 'tsimshian': {'tsi'}, 'tsonga': {'tso'}, 'tswana': {'tsn'}, 'tumbuka': {'tum'}, 'tupi': {'tup'}, 'turkish': {'tur'}, 'turkmen': {'tuk'}, 'tuvalu': {'tvl'}, 'tuvinian': {'tyv'}, 'twi': {'twi'}, 'udmurt': {'udm'}, 'ugaritic': {'uga'}, 'uighur': {'uig'}, 'ukrainian': {'ukr'}, 'umbundu': {'umb'}, 'uncoded': {'mis'}, 'undetermined': {'und'}, 'upper sorbian': {'hsb'}, 'urdu': {'urd'}, 'uyghur': {'uig'}, 'uzbek': {'uzb'}, 'vai': {'vai'}, 'valencian': {'cat'}, 'venda': {'ven'}, 'vietnamese': {'vie'}, 'volapük': {'vol'}, 'votic': {'vot'}, 'wakashan': {'wak'}, 'walamo': {'wal'}, 'walloon': {'wln'}, 'waray': {'war'}, 'washo': {'was'}, 'welsh': {'wel'}, 'western frisian': {'fry'}, 'western pahari': {'him'}, 'wolof': {'wol'}, 'xhosa': {'xho'}, 'yakut': {'sah'}, 'yao': {'yao'}, 'yapese': {'yap'}, 'yiddish': {'yid'}, 'yoruba': {'yor'}, 'yupik': {'ypk'}, 'zande': {'znd'}, 'zapotec': {'zap'}, 'zaza': {'zza'}, 'zazaki': {'zza'}, 'zenaga': {'zen'}, 'zhuang': {'zha'}, 'zulu': {'zul'}, 'zuni': {'zun'}, 'ελληνικά': {'gre'}, 'русский': {'rus'}}¶
- number = re.compile('[0-9]')¶
- parentheses = re.compile('\\([^)]+\\)')¶
core.util.median module¶
core.util.opds_writer module¶
- class core.util.opds_writer.AtomFeed(title, url, **kwargs)[source]¶
Bases:
object
- APP_NS = 'http://www.w3.org/2007/app'¶
- ATOM_LIKE_TYPES = ['application/atom+xml', 'application/xml']¶
- ATOM_NS = 'http://www.w3.org/2005/Atom'¶
- ATOM_TYPE = 'application/atom+xml'¶
- BIBFRAME_NS = 'http://bibframe.org/vocab/'¶
- BIB_SCHEMA_NS = 'http://bib.schema.org/'¶
- DCTERMS_NS = 'http://purl.org/dc/terms/'¶
- DRM_NS = 'http://librarysimplified.org/terms/drm'¶
- E¶
A helper object for creating etree elements.
- LCP_NS = 'http://readium.org/lcp-specs/ns'¶
- OPDS_NS = 'http://opds-spec.org/2010/catalog'¶
- OPENSEARCH_NS = 'http://a9.com/-/spec/opensearch/1.1/'¶
- OPF_NS = 'http://www.idpf.org/2007/opf'¶
- SCHEMA¶
A helper object for creating etree elements.
- SCHEMA_NS = 'http://schema.org/'¶
- SIMPLIFIED¶
A helper object for creating etree elements.
- SIMPLIFIED_NS = 'http://librarysimplified.org/terms/'¶
- TIME_FORMAT_NAIVE = '%Y-%m-%dT%H:%M:%SZ'¶
- TIME_FORMAT_UTC = '%Y-%m-%dT%H:%M:%S+00:00'¶
- default_typemap = {<module 'datetime' from '/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/datetime.py'>: <function AtomFeed.<lambda>>}¶
- nsmap = {None: 'http://www.w3.org/2005/Atom', 'app': 'http://www.w3.org/2007/app', 'dcterms': 'http://purl.org/dc/terms/', 'opds': 'http://opds-spec.org/2010/catalog', 'opf': 'http://www.idpf.org/2007/opf', 'drm': 'http://librarysimplified.org/terms/drm', 'schema': 'http://schema.org/', 'simplified': 'http://librarysimplified.org/terms/', 'bibframe': 'http://bibframe.org/vocab/', 'bib': 'http://bib.schema.org/', 'opensearch': 'http://a9.com/-/spec/opensearch/1.1/', 'lcp': 'http://readium.org/lcp-specs/ns'}¶
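The two time-format constants above can be exercised with nothing but the standard library. The following is a minimal sketch, not AtomFeed's own code: the helper name `sketch_format_timestamp` is hypothetical, and the rule of picking the format by whether the datetime carries a timezone is an assumption made for illustration.

```python
from datetime import datetime, timezone

# Copied from the AtomFeed attribute listing above.
TIME_FORMAT_NAIVE = "%Y-%m-%dT%H:%M:%SZ"
TIME_FORMAT_UTC = "%Y-%m-%dT%H:%M:%S+00:00"


def sketch_format_timestamp(value):
    """Hypothetical helper: format a datetime for use in a feed.

    Assumption: naive datetimes already hold UTC and get the 'Z' suffix;
    aware datetimes are converted to UTC and get an explicit offset.
    """
    if value.tzinfo is None:
        return value.strftime(TIME_FORMAT_NAIVE)
    return value.astimezone(timezone.utc).strftime(TIME_FORMAT_UTC)


print(sketch_format_timestamp(datetime(2024, 1, 2, 3, 4, 5)))
print(sketch_format_timestamp(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
```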
- class core.util.opds_writer.ElementMaker[source]¶
Bases:
ElementMaker
A helper object for creating etree elements.
- class core.util.opds_writer.OPDSFeed(title, url)[source]¶
Bases:
AtomFeed
- ACQUISITION_FEED_TYPE = 'application/atom+xml;profile=opds-catalog;kind=acquisition'¶
- ACQUISITION_REL = 'http://opds-spec.org/acquisition'¶
- BORROW_REL = 'http://opds-spec.org/acquisition/borrow'¶
- DEFAULT_MAX_AGE = 600¶
- ENTRY_TYPE = 'application/atom+xml;type=entry;profile=opds-catalog'¶
- EPUB_MEDIA_TYPE = 'application/epub+zip'¶
- FEATURED_REL = 'http://opds-spec.org/featured'¶
- FULL_IMAGE_REL = 'http://opds-spec.org/image'¶
- GROUP_REL = 'collection'¶
- NAVIGATION_FEED_TYPE = 'application/atom+xml;profile=opds-catalog;kind=navigation'¶
- NO_TITLE = 'http://librarysimplified.org/terms/problem/no-title'¶
- OPEN_ACCESS_REL = 'http://opds-spec.org/acquisition/open-access'¶
- POPULAR_REL = 'http://opds-spec.org/sort/popular'¶
- RECOMMENDED_REL = 'http://opds-spec.org/recommended'¶
- REVOKE_LOAN_REL = 'http://librarysimplified.org/terms/rel/revoke'¶
core.util.permanent_work_id module¶
- class core.util.permanent_work_id.WorkIDCalculator[source]¶
Bases:
object
- apostropheStrip = re.compile("'s")¶
- authorExtract1 = re.compile('^(.+?)\\spresents.*$')¶
- authorExtract2 = re.compile('^(?:(?:a|an)\\s)?(.+?)\\spresentation.*$')¶
- bracketedCharacterStrip = re.compile('\\[(.*?)\\]')¶
- commonAuthorPrefixPattern = re.compile('^(?:edited by|by the editors of|by|chosen by|translated by|prepared by|translated and edited by|completely rev by|pictures by|selected and adapted by|with a foreword by|with a new foreword by|introd )¶
- commonAuthorSuffixPattern = re.compile('^(.+?)\\s(?:general editor|editor|editor in chief|etc|inc|inc\\setc|co|corporation|llc|partners|company|home entertainment)$')¶
- commonSubtitlesPattern = re.compile('^(.*?)((a|una)\\s(.*)novel(a|la)?|a(.*)memoir|a(.*)mystery|a(.*)thriller|by\\s(.+)|a novel of .*|stories|an autobiography|a biography|a memoir in books|\\d+\\S*\\s*ed(ition)?|\\d+\\S*\\s*update|1st\\)¶
- consecutiveCharacterStrip = re.compile('\\s{2,}')¶
- distributedByRemoval = re.compile('^distributed (?:in.*\\s)?by\\s(.+)$')¶
- find = '10th'¶
- format_to_grouping_category = {'Adobe_EPUB_eBook': 'ebook', 'Adobe_PDF_eBook': 'ebook', 'Atlas': 'other', 'Blu-ray': 'movie', 'Book': 'book', 'Braille': 'book', 'CDROM': 'other', 'Chart': 'other', 'ChipCartridge': 'other', 'Collage': 'other', 'CompactDisc': 'audio', 'DVD': 'movie', 'DiscCartridge': 'other', 'Disney_Online_Book': 'ebook', 'Drawing': 'other', 'Electronic': 'other', 'Filmstrip': 'movie', 'FlashCard': 'other', 'FloppyDisk': 'other', 'Globe': 'other', 'Journal': 'book', 'Kindle_Book': 'ebook', 'Kit': 'other', 'LargePrint': 'book', 'Manuscript': 'book', 'Map': 'other', 'Microfilm': 'other', 'Microsoft_eBook': 'ebook', 'Mobipocket_eBook': 'ebook', 'MotionPicture': 'movie', 'MusicRecording': 'music', 'MusicalScore': 'book', 'Newspaper': 'book', 'Open_EPUB_eBook': 'ebook', 'Open_PDF_eBook': 'ebook', 'OverDrive_MP3_Audiobook': 'audio', 'OverDrive_Music': 'music', 'OverDrive_Read': 'ebook', 'OverDrive_Video': 'movie', 'OverDrive_WMA_Audiobook': 'audio', 'Painting': 'other', 'Palm': 'ebook', 'Phonograph': 'audio', 'Photo': 'other', 'Photonegative': 'other', 'PhysicalObject': 'other', 'Playaway': 'audio', 'Print': 'other', 'SeedPacket': 'other', 'SensorImage': 'other', 'Serial': 'book', 'Slide': 'other', 'Software': 'other', 'SoundCassette': 'audio', 'SoundDisc': 'audio', 'SoundRecording': 'audio', 'TapeCartridge': 'other', 'TapeCassette': 'other', 'TapeRecording': 'audio', 'TapeReel': 'other', 'Transparency': 'other', 'Unknown': 'other', 'VerticalFile': 'other', 'Video': 'movie', 'VideoCartridge': 'movie', 'VideoCassette': 'movie', 'VideoDisc': 'movie', 'VideoReel': 'movie', 'eBook': 'ebook', 'eContent': 'ebook', 'epub': 'ebook', 'externalLink': 'ebook', 'externalMP3': 'audio', 'external_eaudio': 'audio', 'external_ebook': 'ebook', 'external_emusic': 'music', 'external_evideo': 'movie', 'external_web': 'ebook', 'gif': 'other', 'gifs': 'other', 'interactiveBook': 'ebook', 'itunes': 'audio', 'jpg': 'other', 'kindle': 'ebook', 'mp3': 'audio', 'overdrive': 
'ebook', 'pdf': 'ebook', 'plucker': 'ebook', 'text': 'ebook'}¶
- initialsFix = re.compile('(?<=[A-Z])\\.(?=(\\s|[A-Z]|$))')¶
- classmethod normalize_author(author)[source]¶
Converts to NFKD unicode. Strips brackets, special characters, and dots. Collapses runs of whitespace to a single space and strips trailing spaces. Strips movie-studio language surrounding the possible author’s name. Lowercases.
Returns de-linted author’s name.
- classmethod normalize_title(full_title, num_non_filing_characters=0)[source]¶
Converts to NFKD unicode. Strips brackets and special characters. Splits into title and subtitle portions (normalizes subtitle). Lowercases.
- numerics = [(re.compile('1st'), 'first'), (re.compile('2nd'), 'second'), (re.compile('3rd'), 'third'), (re.compile('4th'), 'fourth'), (re.compile('5th'), 'fifth'), (re.compile('6th'), 'sixth'), (re.compile('7th'), 'seventh'), (re.compile('8th'), 'eighth'), (re.compile('9th'), 'ninth'), (re.compile('10th'), 'tenth')]¶
- replace = 'tenth'¶
- sortTrimmingPattern = re.compile('(?i)^(?:(?:a|an|the|el|la|"|\')\\s)(.*)$', re.IGNORECASE)¶
- specialCharacterStrip = re.compile('[^\\w\\d\\s]')¶
- subtitleIndicator = re.compile('[:;/=]')¶
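The cleanup steps that normalize_author and normalize_title describe (NFKD conversion, bracket and special-character stripping, whitespace collapsing, lowercasing) can be sketched roughly as follows. This is an illustration only, not the WorkIDCalculator implementation: the function name `sketch_normalize` is hypothetical, the three patterns are copied from the attributes listed above, and removing bracketed text entirely (rather than only the brackets) is an assumption.

```python
import re
import unicodedata

# Patterns copied from the WorkIDCalculator attribute listing above.
bracketed_character_strip = re.compile(r"\[(.*?)\]")
special_character_strip = re.compile(r"[^\w\d\s]")
consecutive_character_strip = re.compile(r"\s{2,}")


def sketch_normalize(text):
    """Hypothetical helper: NFKD-normalize, strip bracketed text and
    special characters, collapse whitespace, and lowercase."""
    text = unicodedata.normalize("NFKD", text)
    text = bracketed_character_strip.sub("", text)   # assumption: drop contents too
    text = special_character_strip.sub("", text)
    text = consecutive_character_strip.sub(" ", text)
    return text.strip().lower()


print(sketch_normalize("Hello,  World! [sound recording]"))  # hello world
```

The real methods also handle subtitles, author prefixes and suffixes, and ordinal words, none of which this sketch attempts.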
core.util.personal_names module¶
- core.util.personal_names.contributor_name_match_ratio(name1, name2, normalize_names=True)[source]¶
Returns a number between 0 and 100, representing the percent match (Levenshtein Distance) between name1 and name2, after each has been normalized.
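To make the idea of a 0 to 100 Levenshtein match concrete, here is a self-contained sketch. It is not the function's implementation: `levenshtein_distance` and `sketch_match_ratio` are hypothetical names, and the scaling formula (one minus distance over the longer length) is an assumption; the library the module actually relies on may compute its ratio differently.

```python
def levenshtein_distance(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def sketch_match_ratio(name1, name2):
    """Hypothetical stand-in: 100 for identical strings, 0 when the edit
    distance equals the length of the longer string."""
    longest = max(len(name1), len(name2))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein_distance(name1, name2) / longest))


print(sketch_match_ratio("kitten", "sitting"))  # 57
```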
- core.util.personal_names.display_name_to_sort_name(display_name)[source]¶
Take the “First Name Last Name”-formatted display_name, and convert it to a “Last Name, First Name” format appropriate for searching and sorting by.
Checks first if the display_name fits what we know of corporate entity business names. If yes, uses the whole name without re-converting it.
Uses the HumanName library to try to parse the name into parts, and rearrange the parts into desired order and format.
- core.util.personal_names.is_corporate_name(display_name)[source]¶
Does this display name look like a corporate name?
- core.util.personal_names.name_tidy(name)[source]¶
Converts to NFKD unicode.
Strips excessive whitespace and trailing punctuation.
Normalizes PhD/MD suffixes.
- Does not perform any potentially name-altering business logic, such as
running HumanName parser or any other name part reorganization.
- Does not perform any cleaning that would later need to be reversed,
such as lowercasing.
- core.util.personal_names.normalize_contributor_name_for_matching(name)[source]¶
Used to standardize author names before matching them to each other to identify best results in VIAF author search feeds.
Split the name into title, first, middle, last name, suffix, nickname, and set the parts in that order. Remove spacing around abbreviated initials, so ‘George RR Martin’ matches ‘George R R Martin’ (treat two-letter words as initials).
Run WorkIDCalculator.normalize_author on the name, which will convert to NFKD unicode, de-lint special characters and spaces, and lowercase.
TODO: Consider further removing periods, commas, dashes, and all non-word characters.
TODO: Consider what to do for multiple authors, such as “et al.” or the Brothers Grimm.
- core.util.personal_names.sort_name_to_display_name(sort_name)[source]¶
Take the “Last Name, First Name”-formatted sort_name, and convert it to a “First Name Last Name” format appropriate for displaying to patrons in a catalog listing.
While the code attempts to do the best it can, name recognition gets complicated really fast when there’s more than one plain-format first name and one plain-format last name. This code is meant to serve as first line of approximation. If we later on can find better human librarian-checked sort and display names in the Metadata Wrangler, we use those.
- Parameters:
sort_name – for example, “Doe, Jane”
- Returns:
display_name, for example “Jane Doe”
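The simplest case of this conversion can be sketched in a few lines. This is only a first approximation under stated assumptions: `sketch_sort_name_to_display_name` is a hypothetical name, it splits on the first comma only, and it ignores titles, suffixes, and corporate names, all of which the real function has to deal with.

```python
def sketch_sort_name_to_display_name(sort_name):
    """Hypothetical helper: 'Doe, Jane' -> 'Jane Doe'."""
    if "," not in sort_name:
        # Nothing to reorder (single name or already in display order).
        return sort_name.strip()
    last, _, rest = sort_name.partition(",")
    return f"{rest.strip()} {last.strip()}".strip()


print(sketch_sort_name_to_display_name("Doe, Jane"))  # Jane Doe
```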
core.util.problem_detail module¶
Simple helper library for generating problem detail documents.
As per http://datatracker.ietf.org/doc/draft-ietf-appsawg-http-problem/
- class core.util.problem_detail.ProblemDetail(uri, status_code=None, title=None, detail=None, instance=None, debug_message=None)[source]¶
Bases:
object
A common type of problem.
- JSON_MEDIA_TYPE = 'application/api-problem+json'¶
- detailed(detail, status_code=None, title=None, instance=None, debug_message=None)[source]¶
Create a ProblemDetail for a more specific occurrence of an existing ProblemDetail.
The detailed error message will be shown to patrons.
- property response¶
Create a Flask-style response.
- with_debug(debug_message, detail=None, status_code=None, title=None, instance=None)[source]¶
Insert debugging information into a ProblemDetail.
The original ProblemDetail’s error message will be shown to patrons, but a more specific error message will be visible to those who inspect the problem document.
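For orientation, the kind of JSON document a problem detail serializes to can be sketched as below. This is not the ProblemDetail class: `sketch_problem_document` is a hypothetical function, and the member names (`type`, `title`, `status`, `detail`, `instance`) are taken from the IETF problem-details specification rather than from this module's source.

```python
import json

# Copied from ProblemDetail.JSON_MEDIA_TYPE above.
JSON_MEDIA_TYPE = "application/api-problem+json"


def sketch_problem_document(uri, status_code=None, title=None,
                            detail=None, instance=None):
    """Hypothetical helper: build a problem-detail JSON string,
    leaving out any member that was not supplied."""
    members = {
        "type": uri,
        "title": title,
        "status": status_code,
        "detail": detail,
        "instance": instance,
    }
    return json.dumps({k: v for k, v in members.items() if v is not None})


print(sketch_problem_document(
    "http://example.org/problem/not-found", 404, "Not found"))
```

The example URI is made up; a server would send the resulting string with the `JSON_MEDIA_TYPE` content type.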
- exception core.util.problem_detail.ProblemError(problem_detail)[source]¶
Bases:
BaseError
Exception class that allows ProblemDetail objects to be raised and caught.
- property problem_detail¶
Return the ProblemDetail object associated with this exception.
- Returns:
ProblemDetail object associated with this exception
- Return type:
core.util.stopwords module¶
Sets of stopwords.
core.util.string_helpers module¶
- class core.util.string_helpers.UnicodeAwareBase64(encoding)[source]¶
Bases:
object
Simulate the interface of the base64 module, but make it look as though base64-encoding and -decoding works on Unicode strings.
Behind the scenes, Unicode strings are encoded to a particular encoding, then base64-encoded or -decoded, then decoded from that encoding.
Since we get Unicode strings out of the database, this lets us base64-encode and -decode strings based on those strings, without worrying about encoding to bytes and then decoding.
- b64decode(s, *args, **kwargs)¶
- b64encode(s, *args, **kwargs)¶
- decodebytes(s, *args, **kwargs)¶
- encodebytes(s, *args, **kwargs)¶
- standard_b64decode(s, *args, **kwargs)¶
- standard_b64encode(s, *args, **kwargs)¶
- urlsafe_b64decode(s, *args, **kwargs)¶
- urlsafe_b64encode(s, *args, **kwargs)¶
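The encode, base64, decode round trip described above can be sketched with the standard base64 module. This is a simplified stand-in, not the class itself: `SketchUnicodeBase64` is a hypothetical name, only two of the listed methods are shown, and the extra positional and keyword arguments are omitted.

```python
import base64


class SketchUnicodeBase64:
    """Hypothetical stand-in that accepts and returns str instead of bytes."""

    def __init__(self, encoding):
        self.encoding = encoding

    def b64encode(self, s):
        # str -> bytes in the chosen encoding -> base64 bytes -> str
        raw = s.encode(self.encoding)
        return base64.b64encode(raw).decode(self.encoding)

    def b64decode(self, s):
        # str -> bytes -> base64-decoded bytes -> str in the chosen encoding
        raw = base64.b64decode(s.encode(self.encoding))
        return raw.decode(self.encoding)


codec = SketchUnicodeBase64("utf8")
encoded = codec.b64encode("héllo wörld")
print(encoded, codec.b64decode(encoded))
```

Callers never see a bytes object, which is the convenience the class is described as providing.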
core.util.summary module¶
- class core.util.summary.SummaryEvaluator(optimal_number_of_sentences=4, noun_phrases_to_consider=10, bad_phrases=None)[source]¶
Bases:
object
Evaluate summaries of a book to find a usable summary.
A usable summary will have good coverage of the popular noun phrases found across all summaries of the book, will have an approximate length of four sentences (this is customizable), and will not mention words that indicate it’s a summary of a specific edition of the book.
All else being equal, a shorter summary is better.
A summary is penalized for apparently not being in English.
- bad_res = {re.compile('This is'), re.compile('the [^ ]+ Collection'), re.compile('Includes')}¶
- default_bad_phrases = {'--container', '--original container', 'abridged', 'adaptation of', 'all rights reserved', 'complete novels', 'complete texts', 'condensed', 'contains', 'edition', 'excerpts', 'in one volume', 'look for', 'new edition', 'playaway', 'retelling', 'retelling of', 'selections', 'version', 'version of'}¶
- log = <Logger Summary Evaluator (WARNING)>¶
core.util.titles module¶
- core.util.titles.normalize_title_for_matching(title)[source]¶
Used to standardize book titles before matching them to each other to identify best results in VIAF author search feeds.
Run WorkIDCalculator.normalize_title on the title, which will convert to NFKD unicode, de-lint special characters, and lowercase.
- core.util.titles.title_match_ratio(title1, title2)[source]¶
Returns a number between 0 and 100, representing the percent match (Levenshtein Distance) between book title1 and book title2, after each has been normalized.
- core.util.titles.unfluff_title(title)[source]¶
Removes parts of the title that are deemed to be add-ons, like imprint information, inserted subtitles and corporate names. For example, in:
Hello World, edited by Bob Bobbinson
Hello World: The True and Amazing Adventures of Bob
Hello World (Unabridged)
(TODO: later add logic for something like Hello World, Harvard University, publisher)
we want to return “Hello World”.
core.util.web_publication_manifest module¶
Helper classes for the Readium Web Publication Manifest format (https://github.com/readium/webpub-manifest) and its audiobook profile (https://github.com/HadrienGardeur/audiobook-manifest).
- class core.util.web_publication_manifest.AudiobookManifest(context=None, type=None)[source]¶
Bases:
Manifest
A Python object corresponding to a Readium Web Publication Manifest.
- DEFAULT_TYPE = 'http://bib.schema.org/Audiobook'¶
- class core.util.web_publication_manifest.JSONable[source]¶
Bases:
object
An object whose Unicode representation is a JSON dump of a dictionary.
- property as_dict¶
- class core.util.web_publication_manifest.Manifest(context=None, type=None)[source]¶
Bases:
JSONable
A Python object corresponding to a Readium Web Publication Manifest.
- AUDIOBOOK_TYPE = 'http://bib.schema.org/Audiobook'¶
- BOOK_TYPE = 'http://schema.org/Book'¶
- DEFAULT_CONTEXT = 'http://readium.org/webpub/default.jsonld'¶
- DEFAULT_TYPE = 'http://schema.org/Book'¶
- property as_dict¶
- property component_lists¶
- update_bibliographic_metadata(license_pool)[source]¶
Update this Manifest with basic bibliographic metadata taken from a LicensePool object.
Currently this assumes that there is no other source of bibliographic metadata, so it will overwrite any metadata that is already present and add a cover link even if the manifest already has one.
core.util.worker_pools module¶
- class core.util.worker_pools.DatabasePool(size, session_factory, worker_factory=None)[source]¶
Bases:
Pool
A pool of DatabaseWorker threads and a job queue to keep them busy.
- class core.util.worker_pools.DatabaseWorker(jobs, _db)[source]¶
Bases:
Worker
A worker Thread that performs jobs with a database session
- class core.util.worker_pools.Job[source]¶
Bases:
object
Abstract parent class for a bit o’ work that can be run in a Thread. For use with Worker.
- class core.util.worker_pools.Pool(size, worker_factory=None)[source]¶
Bases:
object
A pool of Worker threads and a job queue to keep them busy.
- log = <Logger core.util.worker_pools (WARNING)>¶
- property success_rate¶
- class core.util.worker_pools.Worker(jobs)[source]¶
Bases:
Thread
A Thread that performs jobs
- property log¶
- run()[source]¶
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
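The pool, worker, and job-queue arrangement described for this module can be sketched with the standard library alone. Everything here is illustrative: `SketchWorker` and `sketch_run_pool` are hypothetical names, jobs are assumed to be plain callables, the None sentinel used for shutdown is an assumption, and no database session is involved.

```python
import queue
import threading


class SketchWorker(threading.Thread):
    """Hypothetical worker: runs callables from a shared queue until it
    receives the None sentinel."""

    def __init__(self, jobs, results):
        super().__init__(daemon=True)
        self.jobs = jobs
        self.results = results

    def run(self):
        while True:
            job = self.jobs.get()
            try:
                if job is None:
                    return  # sentinel: no more work for this worker
                self.results.put(("ok", job()))
            except Exception as error:
                # Record the failure so one bad job does not kill the thread.
                self.results.put(("error", error))
            finally:
                self.jobs.task_done()


def sketch_run_pool(job_list, size=2):
    """Run every job on `size` workers; return (results, success rate)."""
    jobs, results = queue.Queue(), queue.Queue()
    workers = [SketchWorker(jobs, results) for _ in range(size)]
    for worker in workers:
        worker.start()
    for job in job_list:
        jobs.put(job)
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for worker in workers:
        worker.join()
    outcomes = [results.get() for _ in job_list]
    successes = [value for status, value in outcomes if status == "ok"]
    rate = len(successes) / len(job_list) if job_list else 1.0
    return successes, rate


values, rate = sketch_run_pool([lambda: 1, lambda: 2, lambda: 1 / 0])
print(sorted(values), rate)
```

The returned rate loosely mirrors the idea behind the `success_rate` property listed on Pool, though how that property is actually computed is not shown in this reference.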
core.util.xmlparser module¶
Module contents¶
Miscellaneous utilities
- class core.util.MetadataSimilarity[source]¶
Bases:
object
Estimate how similar two bits of metadata are.
- SEPARATOR = re.compile('\\W')¶
- classmethod author_name_similarity(authors1, authors2)[source]¶
What percentage of the total number of authors in the two sets are present in both sets?
- classmethod author_similarity(authors1, authors2)[source]¶
What percentage of the total number of authors in the two sets are present in both sets?
- classmethod histogram(strings, stopwords=None)[source]¶
Create a histogram of word frequencies across the given list of strings.
- classmethod histogram_distance(strings_1, strings_2, stopwords=None)[source]¶
Calculate the histogram distance between two sets of strings.
The histogram distance is the sum of the word distance for every word that occurs in either histogram.
If a word appears in one histogram but not the other, its word distance is its frequency of appearance. If a word appears in both histograms, its word distance is the absolute value of the difference between that word’s frequency of appearance in histogram A, and its frequency of appearance in histogram B.
If the strings use the same words at exactly the same frequency, the difference will be 0. If the strings use completely different words, the difference will be 1.
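The histogram distance described above can be worked through in a short sketch. This is not the MetadataSimilarity code: the function names are hypothetical, frequencies are assumed to be relative (each histogram sums to 1), and the final division by 2 is an assumption chosen so that completely different vocabularies give the stated maximum of 1.

```python
import re
from collections import Counter

# Same pattern as MetadataSimilarity.SEPARATOR above.
SEPARATOR = re.compile(r"\W")


def sketch_histogram(strings, stopwords=None):
    """Relative frequency of each lowercased word across all strings."""
    stopwords = stopwords or set()
    counts = Counter()
    for string in strings:
        for word in SEPARATOR.split(string.lower()):
            if word and word not in stopwords:
                counts[word] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def sketch_histogram_distance(strings_1, strings_2, stopwords=None):
    """Sum of per-word frequency differences, scaled into the range 0..1."""
    h1 = sketch_histogram(strings_1, stopwords)
    h2 = sketch_histogram(strings_2, stopwords)
    total = sum(abs(h1.get(w, 0.0) - h2.get(w, 0.0)) for w in set(h1) | set(h2))
    return total / 2  # assumption: halve so disjoint vocabularies give 1


print(sketch_histogram_distance(["red fish"], ["blue fish"]))  # 0.5
```

In the example, "red" and "blue" each contribute 0.5 and "fish" contributes 0, so the halved sum is 0.5.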
- class core.util.TitleProcessor[source]¶
Bases:
object
- classmethod extract_subtitle(main_title, subtitled_title)[source]¶
Extracts a subtitle given a shorter and a longer version of the title.
- Returns:
subtitle or None
- title_stopwords = ['The ', 'A ', 'An ']¶
- core.util.fast_query_count(query)[source]¶
Counts the results of a query without using a super-slow subquery.
- core.util.first_or_default(collection, default=None)[source]¶
Return first element of the specified collection or the default value if the collection is empty.
- Parameters:
collection (Iterable) – Collection
default (Any) – Default value