core.util package

Subpackages

Submodules

core.util.accept_language module

A package to parse Accept-Language headers.

This is based on accept_language.py. Here is the original licensing information for accept_language.py:

Copyright [2017] [Chatbot Developers]

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

class core.util.accept_language.Lang(language, locale, quality)

Bases: tuple

language

Alias for field number 0

locale

Alias for field number 1

quality

Alias for field number 2

core.util.accept_language.parse_accept_language(accept_language_str, default_quality=None)

Parse an RFC 2616 Accept-Language string. https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14

Parameters:

accept_language_str (str) – A string in RFC 2616 format.

Returns:

List of Lang namedtuples.

Return type:

list

Example:
>>> parse_accept_language('en-US,el;q=0.8')
[Lang(language='en', locale='en_US', quality=1.0), Lang(language='el', locale=None, quality=0.8)]
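The parsing rules can be illustrated with a small, self-contained sketch. It is not this module's implementation: the name `parse_accept_language_sketch`, the default quality of 1.0, the `en_US`-style normalization of `locale`, and skipping entries with a malformed q-value are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in with the same fields as the documented Lang tuple.
Lang = namedtuple("Lang", ["language", "locale", "quality"])


def parse_accept_language_sketch(header, default_quality=1.0):
    """Parse an Accept-Language header into Lang tuples, best match first."""
    langs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = default_quality
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed q-value
        tag = tag.strip()
        if "-" in tag:
            language, region = tag.split("-", 1)
            locale = f"{language.lower()}_{region.upper()}"
        else:
            language, locale = tag, None
        langs.append(Lang(language.lower(), locale, quality))
    # sorted() is stable, so entries with equal quality keep header order.
    return sorted(langs, key=lambda lang: lang.quality, reverse=True)
```

Entries without an explicit `q` parameter get `default_quality`, which is why `en-US` sorts ahead of `el;q=0.8`.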

core.util.authentication_for_opds module

class core.util.authentication_for_opds.AuthenticationForOPDSDocument(id=None, title=None, authentication_flows=[], links=[])

Bases: object

A data structure that can become an Authentication For OPDS document.

MEDIA_TYPE = 'application/vnd.opds.authentication.v1.0+json'
to_dict(_db)

Convert this data structure to a dictionary that becomes an Authentication For OPDS document when serialized to JSON.

Parameters:

_db – Database connection or other argument to pass into OPDSAuthenticationFlow.authentication_flow_document().

class core.util.authentication_for_opds.OPDSAuthenticationFlow

Bases: object

An object that can be represented as an Authentication Flow in an Authentication For OPDS document.

FLOW_TYPE = None
authentication_flow_document(_db)

Convert this object into a dictionary or a list of dictionaries that can be used in the authentication list of an Authentication For OPDS document.
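A rough sketch of how flows and the document might fit together, assuming flows expose `authentication_flow_document(_db)` as documented above. The classes `OPDSAuthenticationFlowSketch` and `BasicAuthFlow`, the `_flow_details` hook, the helper `authentication_document`, and the top-level keys (`id`, `title`, `authentication`, `links`) are hypothetical, modeled on the Authentication for OPDS 1.0 draft rather than taken from this module.

```python
import json


class OPDSAuthenticationFlowSketch:
    """Hypothetical stand-in for the OPDSAuthenticationFlow pattern."""

    FLOW_TYPE = None  # subclasses set this to their flow type URI

    def _flow_details(self, _db):
        # Hypothetical hook: subclasses return their flow-specific fields.
        return {}

    def authentication_flow_document(self, _db):
        data = dict(self._flow_details(_db))
        # Fill in the flow type unless the subclass already supplied one.
        if self.FLOW_TYPE and "type" not in data:
            data["type"] = self.FLOW_TYPE
        return data


class BasicAuthFlow(OPDSAuthenticationFlowSketch):
    # Assumed type URI for HTTP Basic authentication.
    FLOW_TYPE = "http://opds-spec.org/auth/basic"

    def _flow_details(self, _db):
        return {"labels": {"login": "Barcode", "password": "PIN"}}


def authentication_document(doc_id, title, flows, links, _db=None):
    """Assemble the top-level dictionary of an Authentication For OPDS document."""
    authentication = []
    for flow in flows:
        doc = flow.authentication_flow_document(_db)
        # A flow may describe itself with one dictionary or a list of them.
        if isinstance(doc, list):
            authentication.extend(doc)
        else:
            authentication.append(doc)
    return {"id": doc_id, "title": title, "authentication": authentication, "links": links}


doc = authentication_document("https://example.org/auth", "Example Library", [BasicAuthFlow()], [])
print(json.dumps(doc, indent=2))
```

Keeping the type-filling logic in the base class means each flow only has to describe its own fields.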

core.util.datetime_helpers module

core.util.datetime_helpers.datetime_utc(*args, **kwargs)

Return a datetime object with UTC timezone information from pytz.

Returns:

datetime object

core.util.datetime_helpers.from_timestamp(ts)

Return a UTC datetime object from a timestamp.

Returns:

datetime object
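The first two helpers can be approximated with the standard library alone. This sketch swaps pytz for `datetime.timezone.utc`, and the `_sketch` names are hypothetical.

```python
from datetime import datetime, timezone


def datetime_utc_sketch(*args, **kwargs):
    """Like datetime(*args, **kwargs), but the result is attached to UTC."""
    return datetime(*args, tzinfo=timezone.utc, **kwargs)


def from_timestamp_sketch(ts):
    """Interpret a POSIX timestamp (seconds since the epoch) as an aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

For example, `datetime_utc_sketch(2020, 1, 2, 3, 4).isoformat()` returns `'2020-01-02T03:04:00+00:00'`.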

core.util.datetime_helpers.strptime_utc(date_string, format)

Parse a string that describes a time but includes no timezone into a timezone-aware datetime object set to UTC.

Raises:

ValueError – If format expects timezone information to be present in date_string.
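One way to get the documented behavior, sketched under the assumption that rejecting the `%z` and `%Z` directives up front is an acceptable stand-in for the module's own check (`strptime_utc_sketch` is a hypothetical name):

```python
from datetime import datetime, timezone


def strptime_utc_sketch(date_string, fmt):
    """Parse a timezone-less string and mark the result as UTC."""
    # Assumption: any timezone directive in the format is an error, because
    # the string is required to carry no timezone of its own.
    if "%z" in fmt or "%Z" in fmt:
        raise ValueError("format must not contain timezone directives: %r" % fmt)
    parsed = datetime.strptime(date_string, fmt)
    return parsed.replace(tzinfo=timezone.utc)
```

Checking the format rather than the parsed result also catches `%Z`, which `datetime.strptime` accepts but still returns a naive datetime for.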

core.util.datetime_helpers.to_utc(dt)

Convert a naive datetime object that represents UTC into an aware datetime object.

Returns:

datetime object, or None if dt was None.

core.util.datetime_helpers.utc_now()

Get the current time in UTC.

Returns:

datetime object
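A sketch of the remaining two helpers with the standard library. How an already-aware datetime should be handled is not stated above; converting it with `astimezone` is an assumption, as are the `_sketch` names.

```python
from datetime import datetime, timedelta, timezone


def to_utc_sketch(dt):
    """Attach UTC to a naive datetime; pass None through unchanged."""
    if dt is None:
        return None
    if dt.tzinfo is None:
        # Naive input is taken to already represent UTC.
        return dt.replace(tzinfo=timezone.utc)
    # Assumption: aware input is converted to UTC rather than relabelled.
    return dt.astimezone(timezone.utc)


def utc_now_sketch():
    """Return the current time as an aware UTC datetime."""
    return datetime.now(timezone.utc)
```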

core.util.epub module

class core.util.epub.EpubAccessor

Bases: object

CONTAINER_FILE = 'META-INF/container.xml'
IDPF_NAMESPACE = 'http://www.idpf.org/2007/opf'
classmethod get_element_from_package(zip_file, package_document_path, element_tag)

Pull a single element from the package document.

classmethod get_elements_from_package(zip_file, package_document_path, element_tags)

Pull one or more elements from the package document.

classmethod open_epub(url, content=None)

Open an EPUB and expose its contents.

Parameters:
  • url – A URL identifying the EPUB. Used only in error messages, or to obtain the EPUB when content is not provided.

  • content – A string containing the compressed EPUB.

Returns:

A tuple containing a ZipFile of the EPUB and the path to its package document.
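The EPUB container format (OCF) records the package document's location in `META-INF/container.xml`. The sketch below shows that lookup with `zipfile` and `xml.etree.ElementTree`; it takes the EPUB as in-memory bytes rather than a URL, and the function names and the dictionary returned by `get_elements_from_package_sketch` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_FILE = "META-INF/container.xml"
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
IDPF_NAMESPACE = "http://www.idpf.org/2007/opf"


def open_epub_sketch(content):
    """Return (ZipFile, package document path) for an EPUB held as bytes."""
    zip_file = zipfile.ZipFile(io.BytesIO(content))
    container = ET.fromstring(zip_file.read(CONTAINER_FILE))
    # The first <rootfile> names the package document via its full-path attribute.
    rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
    if rootfile is None:
        raise ValueError("container.xml lists no rootfile")
    return zip_file, rootfile.get("full-path")


def get_elements_from_package_sketch(zip_file, package_document_path, element_tags):
    """Map each requested OPF tag name to its first matching element, or None."""
    package = ET.fromstring(zip_file.read(package_document_path))
    return {
        tag: package.find(f".//{{{IDPF_NAMESPACE}}}{tag}")
        for tag in element_tags
    }
```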

core.util.flask_util module

Utilities for Flask applications.

class core.util.flask_util.OPDSEntryResponse(response=None, **kwargs)

Bases: Response

A convenience specialization of Response for typical OPDS entries.

class core.util.flask_util.OPDSFeedResponse(response=None, status=None, headers=None, mimetype=None, content_type=None, direct_passthrough=False, max_age=None, private=None)

Bases: Response

A convenience specialization of Response for typical OPDS feeds.

class core.util.flask_util.Response(response=None, status=None, headers=None, mimetype=None, content_type=None, direct_passthrough=False, max_age=0, private=None)

Bases: Response

A Flask Response object with some conveniences added.

The conveniences:

  • It’s easy to calculate header values such as Cache-Control.

  • A response can be easily converted into a string for use in tests.
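As an illustration of the first convenience, here is a hypothetical way to derive a Cache-Control value from `max_age` and `private`. The exact directives this class emits are not documented here, so the policy in `cache_control_sketch` is an assumption.

```python
def cache_control_sketch(max_age=0, private=None):
    """Return a Cache-Control header value (hypothetical policy).

    max_age is in seconds. A falsy max_age disables caching entirely;
    otherwise the response is public unless private is truthy.
    """
    if not max_age:
        return "private, no-cache"
    visibility = "private" if private else "public"
    return f"{visibility}, max-age={int(max_age)}"
```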

core.util.flask_util.problem(type, status, title, detail=None, instance=None, headers={})

Create a Response that includes a Problem Detail Document.

core.util.flask_util.problem_raw(type, status, title, detail=None, instance=None, headers={})
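A Problem Detail Document is the JSON format defined in RFC 7807 and served as `application/problem+json`. The sketch below builds one without Flask and returns a `(body, status, headers)` tuple. `problem_raw_sketch` is a hypothetical name, `problem_type` stands in for the `type` parameter, and `headers` defaults to `None` to avoid the shared mutable default `{}` seen in the documented signatures.

```python
import json


def problem_raw_sketch(problem_type, status, title, detail=None, instance=None, headers=None):
    """Return (body, status, headers) for an RFC 7807 Problem Detail Document."""
    data = {"type": problem_type, "title": title, "status": status}
    # Optional members are omitted rather than serialized as null.
    if detail is not None:
        data["detail"] = detail
    if instance is not None:
        data["instance"] = instance
    # Copy so the caller's dictionary is never mutated.
    headers = dict(headers or {})
    headers["Content-Type"] = "application/problem+json"
    return json.dumps(data), status, headers
```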

core.util.http module

exception core.util.http.BadResponseException(url_or_service, message, debug_message=None, status_code=None)

Bases: RemoteIntegrationException

The request seemingly went okay, but we got a bad response.

BAD_STATUS_CODE_MESSAGE = 'Got status code %s from external server, cannot continue.'
classmethod bad_status_code(url, response)

The response is bad because the status code is wrong.

detail = l'The server made a request to %(service)s, and got an unexpected or invalid response.'
document_debug_message(debug=True)
classmethod from_response(url, message, response)

Helper method to turn a requests Response object into a BadResponseException.

internal_message = 'Bad response from %s: %s'
title = l'Bad response'
class core.util.http.HTTP

Bases: object

A helper for the requests module.

classmethod debuggable_get(url, **kwargs)

Make a GET request that returns a detailed problem detail document on error.

classmethod debuggable_post(url, payload, **kwargs)

Make a POST request that returns a detailed problem detail document on error.

classmethod debuggable_request(http_method, url, make_request_with=None, **kwargs)

Make a request that returns a detailed problem detail document on error, rather than a generic “an integration error occurred” message.

Parameters:
  • http_method – HTTP method to use when making the request.

  • url – Make the request to this URL.

  • make_request_with – A function that actually makes the HTTP request.

  • kwargs – Keyword arguments for the make_request_with function.

classmethod get_with_timeout(url, *args, **kwargs)

Make a GET request with timeout handling.

classmethod post_with_timeout(url, payload, *args, **kwargs)

Make a POST request with timeout handling.

classmethod process_debuggable_response(url, response, disallowed_response_codes=None, allowed_response_codes=None, expected_encoding='utf-8')

If there was a problem with an integration request, return an appropriate ProblemDetail. Otherwise, return the response to the original request.

Parameters:
  • response – A Response object from the requests library.

  • expected_encoding – Typically we expect HTTP responses to be UTF-8 encoded, but for certain requests we can change the encoding type.

classmethod put_with_timeout(url, payload, *args, **kwargs)

Make a PUT request with timeout handling.

classmethod request_with_timeout(http_method, url, *args, **kwargs)

Call requests.request and turn a timeout into a RequestTimedOut exception.
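The timeout translation, together with the injectable `make_request_with` function that `debuggable_request` accepts, can be sketched without the requests library. Here the built-in `TimeoutError` stands in for `requests.exceptions.Timeout`, and `RequestTimedOutStub`, `request_with_timeout_sketch`, and the 20-second default timeout are assumptions; only the message format is copied from `RequestTimedOut.internal_message`.

```python
class RequestTimedOutStub(Exception):
    """Hypothetical stand-in for core.util.http.RequestTimedOut."""

    def __init__(self, url, message):
        super().__init__("Timeout accessing %s: %s" % (url, message))
        self.url = url


def request_with_timeout_sketch(http_method, url, make_request_with, **kwargs):
    """Call make_request_with, translating a timeout into RequestTimedOutStub."""
    # Assumption: a default timeout (in seconds) applies when none is given.
    kwargs.setdefault("timeout", 20)
    try:
        return make_request_with(http_method, url, **kwargs)
    except TimeoutError as e:  # stands in for requests.exceptions.Timeout
        raise RequestTimedOutStub(url, str(e)) from e
```

Injecting the request function makes it easy to substitute a fake in tests, so timeout handling can be exercised without network access.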

classmethod series(status_code)

Return the HTTP series for the given status code.
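A minimal sketch of `series`, and of how allowed and disallowed response codes could be combined in `process_debuggable_response`, assuming both lists may hold exact codes or series strings such as `'5xx'`. The `'%sxx'` format, the function names, and the policy in `is_bad_response` (including treating 5xx as bad by default) are assumptions.

```python
def series_sketch(status_code):
    """Return the HTTP series of a status code, e.g. 404 -> '4xx'."""
    return "%sxx" % (int(status_code) // 100)


def is_bad_response(status_code, allowed=None, disallowed=None):
    """Decide whether a status code should be treated as a failure.

    Hypothetical policy: with an allow-list, anything not on it is bad;
    otherwise only disallowed codes are bad, and 5xx is disallowed by default.
    """
    code_series = series_sketch(status_code)
    if allowed:
        return not (status_code in allowed or code_series in allowed)
    if disallowed is None:
        disallowed = ["5xx"]
    return status_code in disallowed or code_series in disallowed
```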

exception core.util.http.IntegrationException(message, debug_message=None)

Bases: Exception

An exception that happens when the site’s connection to a third-party service is broken.

This may be because communication failed (RemoteIntegrationException), or because local configuration is missing or obviously wrong (CannotLoadConfiguration).

exception core.util.http.RemoteIntegrationException(url_or_service, message, debug_message=None)

Bases: IntegrationException

An exception that happens when we try and fail to communicate with a third-party service over HTTP.

as_problem_detail_document(debug)
detail = l'The server tried to access %(service)s but the third-party service experienced an error.'
document_debug_message(debug=True)
document_detail(debug=True)
internal_message = 'Error accessing %s: %s'
title = l'Failure contacting external service'
exception core.util.http.RequestNetworkException(url_or_service, message, debug_message=None)

Bases: RemoteIntegrationException, RequestException

An exception from the requests module that can be represented as a problem detail document.

detail = l'The server experienced a network error while contacting %(service)s.'
internal_message = 'Network error contacting %s: %s'
title = l'Network failure contacting third-party service'
exception core.util.http.RequestTimedOut(url_or_service, message, debug_message=None)

Bases: RequestNetworkException, Timeout

A timeout exception that can be represented as a problem detail document.

detail = l'The server made a request to %(service)s, and that request timed out.'
internal_message = 'Timeout accessing %s: %s'
title = l'Timeout'
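The `internal_message` and `detail` attributes above are format strings. This stand-in hierarchy, with hypothetical `*Sketch` class names, shows how they might be filled in with the URL or service name. The message strings are copied from the attributes documented above; the real classes wrap `detail` in a lazy translation string and may derive the service name differently.

```python
class IntegrationExceptionSketch(Exception):
    def __init__(self, message, debug_message=None):
        super().__init__(message)
        self.debug_message = debug_message


class RemoteIntegrationExceptionSketch(IntegrationExceptionSketch):
    internal_message = "Error accessing %s: %s"
    detail = "The server tried to access %(service)s but the third-party service experienced an error."

    def __init__(self, url_or_service, message, debug_message=None):
        super().__init__(message, debug_message)
        self.url = url_or_service

    def __str__(self):
        # Message for logs: positional %s slots take the URL and the message.
        return self.internal_message % (self.url, self.args[0])

    def document_detail(self):
        # Message for end users: the named %(service)s slot takes the URL.
        return self.detail % {"service": self.url}


class RequestTimedOutSketch(RemoteIntegrationExceptionSketch):
    internal_message = "Timeout accessing %s: %s"
    detail = "The server made a request to %(service)s, and that request timed out."
```

Subclasses only override the two format strings; the formatting logic stays in the base class.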

core.util.languages module

Data and functions for dealing with language names and codes.

class core.util.languages.LanguageCodes

Bases: object

Convert between ISO-639-2 and ISO-639-1 language codes.

The data file comes from http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
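RAW_DATA uses the LOC file's pipe-delimited layout: bibliographic alpha-3 code, terminologic alpha-3 code, alpha-2 code, English names, French names, with multiple names separated by `; `. A sketch of turning such rows into lookup tables, using a three-row excerpt; `build_tables`, its return shape, and mapping terminologic codes to alpha-2 are assumptions.

```python
# Three rows in the same layout as RAW_DATA:
# bibliographic alpha-3 | terminologic alpha-3 | alpha-2 | English names | French names
SAMPLE = "eng||en|English|anglais\nfre|fra|fr|French|français\nspa||es|Spanish; Castilian|espagnol; castillan"


def build_tables(raw):
    """Build alpha-2 <-> alpha-3 maps and an English-name index from raw rows."""
    two_to_three, three_to_two, english_names = {}, {}, {}
    for line in raw.strip().split("\n"):
        alpha_3, terminologic, alpha_2, english, _french = line.split("|")
        names = [name.strip() for name in english.split(";")]
        # Index names under the bibliographic code and, if present, the alpha-2 code.
        for code in (alpha_3, alpha_2):
            if code:
                english_names[code] = names
        if alpha_2:
            two_to_three[alpha_2] = alpha_3
            three_to_two[alpha_3] = alpha_2
            if terminologic:
                # Assumption: terminologic codes also map back to alpha-2.
                three_to_two[terminologic] = alpha_2
    return two_to_three, three_to_two, english_names
```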

NATIVE_NAMES_RAW_DATA = [{'code': 'en', 'name': 'English', 'nativeName': 'English'}, {'code': 'fr', 'name': 'French', 'nativeName': 'français'}, {'code': 'de', 'name': 'German', 'nativeName': 'Deutsch'}, {'code': 'el', 'name': 'Greek, Modern', 'nativeName': 'Ελληνικά'}, {'code': 'hu', 'name': 'Hungarian', 'nativeName': 'Magyar'}, {'code': 'it', 'name': 'Italian', 'nativeName': 'Italiano'}, {'code': 'no', 'name': 'Norwegian', 'nativeName': 'Norsk'}, {'code': 'pl', 'name': 'Polish', 'nativeName': 'polski'}, {'code': 'pt', 'name': 'Portuguese', 'nativeName': 'Português'}, {'code': 'ru', 'name': 'Russian', 'nativeName': 'русский'}, {'code': 'es', 'name': 'Spanish, Castilian', 'nativeName': 'español, castellano'}, {'code': 'sv', 'name': 'Swedish', 'nativeName': 'svenska'}]
RAW_DATA = "aar||aa|Afar|afar\nabk||ab|Abkhazian|abkhaze\nace|||Achinese|aceh\nach|||Acoli|acoli\nada|||Adangme|adangme\nady|||Adyghe; Adygei|adyghé\nafa|||Afro-Asiatic languages|afro-asiatiques, langues\nafh|||Afrihili|afrihili\nafr||af|Afrikaans|afrikaans\nain|||Ainu|aïnou\naka||ak|Akan|akan\nakk|||Akkadian|akkadien\nalb|sqi|sq|Albanian|albanais\nale|||Aleut|aléoute\nalg|||Algonquian languages|algonquines, langues\nalt|||Southern Altai|altai du Sud\namh||am|Amharic|amharique\nang|||English, Old (ca.450-1100)|anglo-saxon (ca.450-1100)\nanp|||Angika|angika\napa|||Apache languages|apaches, langues\nara||ar|Arabic|arabe\narc|||Official Aramaic (700-300 BCE); Imperial Aramaic (700-300 BCE)|araméen d'empire (700-300 BCE)\narg||an|Aragonese|aragonais\narm|hye|hy|Armenian|arménien\narn|||Mapudungun; Mapuche|mapudungun; mapuche; mapuce\narp|||Arapaho|arapaho\nart|||Artificial languages|artificielles, langues\narw|||Arawak|arawak\nasm||as|Assamese|assamais\nast|||Asturian; Bable; Leonese; Asturleonese|asturien; bable; léonais; asturoléonais\nath|||Athapascan languages|athapascanes, langues\naus|||Australian languages|australiennes, langues\nava||av|Avaric|avar\nave||ae|Avestan|avestique\nawa|||Awadhi|awadhi\naym||ay|Aymara|aymara\naze||az|Azerbaijani|azéri\nbad|||Banda languages|banda, langues\nbai|||Bamileke languages|bamiléké, langues\nbak||ba|Bashkir|bachkir\nbal|||Baluchi|baloutchi\nbam||bm|Bambara|bambara\nban|||Balinese|balinais\nbaq|eus|eu|Basque|basque\nbas|||Basa|basa\nbat|||Baltic languages|baltes, langues\nbej|||Beja; Bedawiyet|bedja\nbel||be|Belarusian|biélorusse\nbem|||Bemba|bemba\nben||bn|Bengali|bengali\nber|||Berber languages|berbères, langues\nbho|||Bhojpuri|bhojpuri\nbih||bh|Bihari languages|langues biharis\nbik|||Bikol|bikol\nbin|||Bini; Edo|bini; edo\nbis||bi|Bislama|bichlamar\nbla|||Siksika|blackfoot\nbnt|||Bantu (Other)|bantoues, autres langues\nbos||bs|Bosnian|bosniaque\nbra|||Braj|braj\nbre||br|Breton|breton\nbtk|||Batak languages|batak, 
langues\nbua|||Buriat|bouriate\nbug|||Buginese|bugi\nbul||bg|Bulgarian|bulgare\nbur|mya|my|Burmese|birman\nbyn|||Blin; Bilin|blin; bilen\ncad|||Caddo|caddo\ncai|||Central American Indian languages|amérindiennes de L'Amérique centrale, langues\ncar|||Galibi Carib|karib; galibi; carib\ncat||ca|Catalan; Valencian|catalan; valencien\ncau|||Caucasian languages|caucasiennes, langues\nceb|||Cebuano|cebuano\ncel|||Celtic languages|celtiques, langues; celtes, langues\ncha||ch|Chamorro|chamorro\nchb|||Chibcha|chibcha\nche||ce|Chechen|tchétchène\nchg|||Chagatai|djaghataï\nchi|zho|zh|Chinese|chinois\nchk|||Chuukese|chuuk\nchm|||Mari|mari\nchn|||Chinook jargon|chinook, jargon\ncho|||Choctaw|choctaw\nchp|||Chipewyan; Dene Suline|chipewyan\nchr|||Cherokee|cherokee\nchu||cu|Church Slavic; Old Slavonic; Church Slavonic; Old Bulgarian; Old Church Slavonic|slavon d'église; vieux slave; slavon liturgique; vieux bulgare\nchv||cv|Chuvash|tchouvache\nchy|||Cheyenne|cheyenne\ncmc|||Chamic languages|chames, langues\ncop|||Coptic|copte\ncor||kw|Cornish|cornique\ncos||co|Corsican|corse\ncpe|||Creoles and pidgins, English based|créoles et pidgins basés sur l'anglais\ncpf|||Creoles and pidgins, French-based |créoles et pidgins basés sur le français\ncpp|||Creoles and pidgins, Portuguese-based |créoles et pidgins basés sur le portugais\ncre||cr|Cree|cree\ncrh|||Crimean Tatar; Crimean Turkish|tatar de Crimé\ncrp|||Creoles and pidgins |créoles et pidgins\ncsb|||Kashubian|kachoube\ncus|||Cushitic languages|couchitiques, langues\ncze|ces|cs|Czech|tchèque\ndak|||Dakota|dakota\ndan||da|Danish|danois\ndar|||Dargwa|dargwa\nday|||Land Dayak languages|dayak, langues\ndel|||Delaware|delaware\nden|||Slave (Athapascan)|esclave (athapascan)\ndgr|||Dogrib|dogrib\ndin|||Dinka|dinka\ndiv||dv|Divehi; Dhivehi; Maldivian|maldivien\ndoi|||Dogri|dogri\ndra|||Dravidian languages|dravidiennes, langues\ndsb|||Lower Sorbian|bas-sorabe\ndua|||Duala|douala\ndum|||Dutch, Middle (ca.1050-1350)|néerlandais moyen (ca. 
1050-1350)\ndut|nld|nl|Dutch; Flemish|néerlandais; flamand\ndyu|||Dyula|dioula\ndzo||dz|Dzongkha|dzongkha\nefi|||Efik|efik\negy|||Egyptian (Ancient)|égyptien\neka|||Ekajuk|ekajuk\nelx|||Elamite|élamite\neng||en|English|anglais\nenm|||English, Middle (1100-1500)|anglais moyen (1100-1500)\nepo||eo|Esperanto|espéranto\nest||et|Estonian|estonien\newe||ee|Ewe|éwé\newo|||Ewondo|éwondo\nfan|||Fang|fang\nfao||fo|Faroese|féroïen\nfat|||Fanti|fanti\nfij||fj|Fijian|fidjien\nfil|||Filipino; Pilipino|filipino; pilipino\nfin||fi|Finnish|finnois\nfiu|||Finno-Ugrian languages|finno-ougriennes, langues\nfon|||Fon|fon\nfre|fra|fr|French|français\nfrm|||French, Middle (ca.1400-1600)|français moyen (1400-1600)\nfro|||French, Old (842-ca.1400)|français ancien (842-ca.1400)\nfrr|||Northern Frisian|frison septentrional\nfrs|||Eastern Frisian|frison oriental\nfry||fy|Western Frisian|frison occidental\nful||ff|Fulah|peul\nfur|||Friulian|frioulan\ngaa|||Ga|ga\ngay|||Gayo|gayo\ngba|||Gbaya|gbaya\ngem|||Germanic languages|germaniques, langues\ngeo|kat|ka|Georgian|géorgien\nger|deu|de|German|allemand\ngez|||Geez|guèze\ngil|||Gilbertese|kiribati\ngla||gd|Gaelic; Scottish Gaelic|gaélique; gaélique écossais\ngle||ga|Irish|irlandais\nglg||gl|Galician|galicien\nglv||gv|Manx|manx; mannois\ngmh|||German, Middle High (ca.1050-1500)|allemand, moyen haut (ca. 1050-1500)\ngoh|||German, Old High (ca.750-1050)|allemand, vieux haut (ca. 
750-1050)\ngon|||Gondi|gond\ngor|||Gorontalo|gorontalo\ngot|||Gothic|gothique\ngrb|||Grebo|grebo\ngrc|||Greek, Ancient (to 1453)|grec ancien (jusqu'à 1453)\ngre|ell|el|Greek, Modern (1453-)|grec moderne (après 1453)\ngrn||gn|Guarani|guarani\ngsw|||Swiss German; Alemannic; Alsatian|suisse alémanique; alémanique; alsacien\nguj||gu|Gujarati|goudjrati\ngwi|||Gwich'in|gwich'in\nhai|||Haida|haida\nhat||ht|Haitian; Haitian Creole|haïtien; créole haïtien\nhau||ha|Hausa|haoussa\nhaw|||Hawaiian|hawaïen\nheb||he|Hebrew|hébreu\nher||hz|Herero|herero\nhil|||Hiligaynon|hiligaynon\nhim|||Himachali languages; Western Pahari languages|langues himachalis; langues paharis occidentales\nhin||hi|Hindi|hindi\nhit|||Hittite|hittite\nhmn|||Hmong; Mong|hmong\nhmo||ho|Hiri Motu|hiri motu\nhrv||hr|Croatian|croate\nhsb|||Upper Sorbian|haut-sorabe\nhun||hu|Hungarian|hongrois\nhup|||Hupa|hupa\niba|||Iban|iban\nibo||ig|Igbo|igbo\nice|isl|is|Icelandic|islandais\nido||io|Ido|ido\niii||ii|Sichuan Yi; Nuosu|yi de Sichuan\nijo|||Ijo languages|ijo, langues\niku||iu|Inuktitut|inuktitut\nile||ie|Interlingue; Occidental|interlingue\nilo|||Iloko|ilocano\nina||ia|Interlingua (International Auxiliary Language Association)|interlingua (langue auxiliaire internationale)\ninc|||Indic languages|indo-aryennes, langues\nind||id|Indonesian|indonésien\nine|||Indo-European languages|indo-européennes, langues\ninh|||Ingush|ingouche\nipk||ik|Inupiaq|inupiaq\nira|||Iranian languages|iraniennes, langues\niro|||Iroquoian languages|iroquoises, langues\nita||it|Italian|italien\njav||jv|Javanese|javanais\njbo|||Lojban|lojban\njpn||ja|Japanese|japonais\njpr|||Judeo-Persian|judéo-persan\njrb|||Judeo-Arabic|judéo-arabe\nkaa|||Kara-Kalpak|karakalpak\nkab|||Kabyle|kabyle\nkac|||Kachin; Jingpho|kachin; jingpho\nkal||kl|Kalaallisut; Greenlandic|groenlandais\nkam|||Kamba|kamba\nkan||kn|Kannada|kannada\nkar|||Karen languages|karen, 
langues\nkas||ks|Kashmiri|kashmiri\nkau||kr|Kanuri|kanouri\nkaw|||Kawi|kawi\nkaz||kk|Kazakh|kazakh\nkbd|||Kabardian|kabardien\nkha|||Khasi|khasi\nkhi|||Khoisan languages|khoïsan, langues\nkhm||km|Central Khmer|khmer central\nkho|||Khotanese; Sakan|khotanais; sakan\nkik||ki|Kikuyu; Gikuyu|kikuyu\nkin||rw|Kinyarwanda|rwanda\nkir||ky|Kirghiz; Kyrgyz|kirghiz\nkmb|||Kimbundu|kimbundu\nkok|||Konkani|konkani\nkom||kv|Komi|kom\nkon||kg|Kongo|kongo\nkor||ko|Korean|coréen\nkos|||Kosraean|kosrae\nkpe|||Kpelle|kpellé\nkrc|||Karachay-Balkar|karatchai balkar\nkrl|||Karelian|carélien\nkro|||Kru languages|krou, langues\nkru|||Kurukh|kurukh\nkua||kj|Kuanyama; Kwanyama|kuanyama; kwanyama\nkum|||Kumyk|koumyk\nkur||ku|Kurdish|kurde\nkut|||Kutenai|kutenai\nlad|||Ladino|judéo-espagnol\nlah|||Lahnda|lahnda\nlam|||Lamba|lamba\nlao||lo|Lao|lao\nlat||la|Latin|latin\nlav||lv|Latvian|letton\nlez|||Lezghian|lezghien\nlim||li|Limburgan; Limburger; Limburgish|limbourgeois\nlin||ln|Lingala|lingala\nlit||lt|Lithuanian|lituanien\nlol|||Mongo|mongo\nloz|||Lozi|lozi\nltz||lb|Luxembourgish; Letzeburgesch|luxembourgeois\nlua|||Luba-Lulua|luba-lulua\nlub||lu|Luba-Katanga|luba-katanga\nlug||lg|Ganda|ganda\nlui|||Luiseno|luiseno\nlun|||Lunda|lunda\nluo|||Luo (Kenya and Tanzania)|luo (Kenya et Tanzanie)\nlus|||Lushai|lushai\nmac|mkd|mk|Macedonian|macédonien\nmad|||Madurese|madourais\nmag|||Magahi|magahi\nmah||mh|Marshallese|marshall\nmai|||Maithili|maithili\nmak|||Makasar|makassar\nmal||ml|Malayalam|malayalam\nman|||Mandingo|mandingue\nmao|mri|mi|Maori|maori\nmap|||Austronesian languages|austronésiennes, langues\nmar||mr|Marathi|marathe\nmas|||Masai|massaï\nmay|msa|ms|Malay|malais\nmdf|||Moksha|moksa\nmdr|||Mandar|mandar\nmen|||Mende|mendé\nmga|||Irish, Middle (900-1200)|irlandais moyen (900-1200)\nmic|||Mi'kmaq; Micmac|mi'kmaq; micmac\nmin|||Minangkabau|minangkabau\nmis|||Uncoded languages|langues non codées\nmkh|||Mon-Khmer languages|môn-khmer, 
langues\nmlg||mg|Malagasy|malgache\nmlt||mt|Maltese|maltais\nmnc|||Manchu|mandchou\nmni|||Manipuri|manipuri\nmno|||Manobo languages|manobo, langues\nmoh|||Mohawk|mohawk\nmon||mn|Mongolian|mongol\nmos|||Mossi|moré\nmul|||Multiple languages|multilingue\nmun|||Munda languages|mounda, langues\nmus|||Creek|muskogee\nmwl|||Mirandese|mirandais\nmwr|||Marwari|marvari\nmyn|||Mayan languages|maya, langues\nmyv|||Erzya|erza\nnah|||Nahuatl languages|nahuatl, langues\nnai|||North American Indian languages|nord-amérindiennes, langues\nnap|||Neapolitan|napolitain\nnau||na|Nauru|nauruan\nnav||nv|Navajo; Navaho|navaho\nnbl||nr|Ndebele, South; South Ndebele|ndébélé du Sud\nnde||nd|Ndebele, North; North Ndebele|ndébélé du Nord\nndo||ng|Ndonga|ndonga\nnds|||Low German; Low Saxon; German, Low; Saxon, Low|bas allemand; bas saxon; allemand, bas; saxon, bas\nnep||ne|Nepali|népalais\nnew|||Nepal Bhasa; Newari|nepal bhasa; newari\nnia|||Nias|nias\nnic|||Niger-Kordofanian languages|nigéro-kordofaniennes, langues\nniu|||Niuean|niué\nnno||nn|Norwegian Nynorsk; Nynorsk, Norwegian|norvégien nynorsk; nynorsk, norvégien\nnob||nb|Bokmål, Norwegian; Norwegian Bokmål|norvégien bokmål\nnog|||Nogai|nogaï; nogay\nnon|||Norse, Old|norrois, vieux\nnor||no|Norwegian|norvégien\nnqo|||N'Ko|n'ko\nnso|||Pedi; Sepedi; Northern Sotho|pedi; sepedi; sotho du Nord\nnub|||Nubian languages|nubiennes, langues\nnwc|||Classical Newari; Old Newari; Classical Nepal Bhasa|newari classique\nnya||ny|Chichewa; Chewa; Nyanja|chichewa; chewa; nyanja\nnym|||Nyamwezi|nyamwezi\nnyn|||Nyankole|nyankolé\nnyo|||Nyoro|nyoro\nnzi|||Nzima|nzema\noci||oc|Occitan (post 1500); Provençal|occitan (après 1500); provençal\noji||oj|Ojibwa|ojibwa\nori||or|Oriya|oriya\norm||om|Oromo|galla\nosa|||Osage|osage\noss||os|Ossetian; Ossetic|ossète\nota|||Turkish, Ottoman (1500-1928)|turc ottoman (1500-1928)\noto|||Otomian languages|otomi, langues\npaa|||Papuan languages|papoues, 
langues\npag|||Pangasinan|pangasinan\npal|||Pahlavi|pahlavi\npam|||Pampanga; Kapampangan|pampangan\npan||pa|Panjabi; Punjabi|pendjabi\npap|||Papiamento|papiamento\npau|||Palauan|palau\npeo|||Persian, Old (ca.600-400 B.C.)|perse, vieux (ca. 600-400 av. J.-C.)\nper|fas|fa|Persian|persan\nphi|||Philippine languages|philippines, langues\nphn|||Phoenician|phénicien\npli||pi|Pali|pali\npol||pl|Polish|polonais\npon|||Pohnpeian|pohnpei\npor||pt|Portuguese|portugais\npra|||Prakrit languages|prâkrit, langues\npro|||Provençal, Old (to 1500)|provençal ancien (jusqu'à 1500)\npus||ps|Pushto; Pashto|pachto\nqaa-qtz|||Reserved for local use|réservée à l'usage local\nque||qu|Quechua|quechua\nraj|||Rajasthani|rajasthani\nrap|||Rapanui|rapanui\nrar|||Rarotongan; Cook Islands Maori|rarotonga; maori des îles Cook\nroa|||Romance languages|romanes, langues\nroh||rm|Romansh|romanche\nrom|||Romany|tsigane\nrum|ron|ro|Romanian; Moldavian; Moldovan|roumain; moldave\nrun||rn|Rundi|rundi\nrup|||Aromanian; Arumanian; Macedo-Romanian|aroumain; macédo-roumain\nrus||ru|Russian|russe\nsad|||Sandawe|sandawe\nsag||sg|Sango|sango\nsah|||Yakut|iakoute\nsai|||South American Indian (Other)|indiennes d'Amérique du Sud, autres langues\nsal|||Salishan languages|salishennes, langues\nsam|||Samaritan Aramaic|samaritain\nsan||sa|Sanskrit|sanskrit\nsas|||Sasak|sasak\nsat|||Santali|santal\nscn|||Sicilian|sicilien\nsco|||Scots|écossais\nsel|||Selkup|selkoupe\nsem|||Semitic languages|sémitiques, langues\nsga|||Irish, Old (to 900)|irlandais ancien (jusqu'à 900)\nsgn|||Sign Languages|langues des signes\nshn|||Shan|chan\nsid|||Sidamo|sidamo\nsin||si|Sinhala; Sinhalese|singhalais\nsio|||Siouan languages|sioux, langues\nsit|||Sino-Tibetan languages|sino-tibétaines, langues\nsla|||Slavic languages|slaves, langues\nslo|slk|sk|Slovak|slovaque\nslv||sl|Slovenian|slovène\nsma|||Southern Sami|sami du Sud\nsme||se|Northern Sami|sami du Nord\nsmi|||Sami languages|sames, langues\nsmj|||Lule Sami|sami de Lule\nsmn|||Inari 
Sami|sami d'Inari\nsmo||sm|Samoan|samoan\nsms|||Skolt Sami|sami skolt\nsna||sn|Shona|shona\nsnd||sd|Sindhi|sindhi\nsnk|||Soninke|soninké\nsog|||Sogdian|sogdien\nsom||so|Somali|somali\nson|||Songhai languages|songhai, langues\nsot||st|Sotho, Southern|sotho du Sud\nspa||es|Spanish; Castilian|espagnol; castillan\nsrd||sc|Sardinian|sarde\nsrn|||Sranan Tongo|sranan tongo\nsrp||sr|Serbian|serbe\nsrr|||Serer|sérère\nssa|||Nilo-Saharan languages|nilo-sahariennes, langues\nssw||ss|Swati|swati\nsuk|||Sukuma|sukuma\nsun||su|Sundanese|soundanais\nsus|||Susu|soussou\nsux|||Sumerian|sumérien\nswa||sw|Swahili|swahili\nswe||sv|Swedish|suédois\nsyc|||Classical Syriac|syriaque classique\nsyr|||Syriac|syriaque\ntah||ty|Tahitian|tahitien\ntai|||Tai languages|tai, langues\ntam||ta|Tamil|tamoul\ntat||tt|Tatar|tatar\ntel||te|Telugu|télougou\ntem|||Timne|temne\nter|||Tereno|tereno\ntet|||Tetum|tetum\ntgk||tg|Tajik|tadjik\ntgl||tl|Tagalog|tagalog\ntha||th|Thai|thaï\ntib|bod|bo|Tibetan|tibétain\ntig|||Tigre|tigré\ntir||ti|Tigrinya|tigrigna\ntiv|||Tiv|tiv\ntkl|||Tokelau|tokelau\ntlh|||Klingon; tlhIngan-Hol|klingon\ntli|||Tlingit|tlingit\ntmh|||Tamashek|tamacheq\ntog|||Tonga (Nyasa)|tonga (Nyasa)\nton||to|Tonga (Tonga Islands)|tongan (Îles Tonga)\ntpi|||Tok Pisin|tok pisin\ntsi|||Tsimshian|tsimshian\ntsn||tn|Tswana|tswana\ntso||ts|Tsonga|tsonga\ntuk||tk|Turkmen|turkmène\ntum|||Tumbuka|tumbuka\ntup|||Tupi languages|tupi, langues\ntur||tr|Turkish|turc\ntut|||Altaic languages|altaïques, langues\ntvl|||Tuvalu|tuvalu\ntwi||tw|Twi|twi\ntyv|||Tuvinian|touva\nudm|||Udmurt|oudmourte\nuga|||Ugaritic|ougaritique\nuig||ug|Uighur; Uyghur|ouïgour\nukr||uk|Ukrainian|ukrainien\numb|||Umbundu|umbundu\nund|||Undetermined|indéterminée\nurd||ur|Urdu|ourdou\nuzb||uz|Uzbek|ouszbek\nvai|||Vai|vaï\nven||ve|Venda|venda\nvie||vi|Vietnamese|vietnamien\nvol||vo|Volapük|volapük\nvot|||Votic|vote\nwak|||Wakashan languages|wakashanes, 
langues\nwal|||Walamo|walamo\nwar|||Waray|waray\nwas|||Washo|washo\nwel|cym|cy|Welsh|gallois\nwen|||Sorbian languages|sorabes, langues\nwln||wa|Walloon|wallon\nwol||wo|Wolof|wolof\nxal|||Kalmyk; Oirat|kalmouk; oïrat\nxho||xh|Xhosa|xhosa\nyao|||Yao|yao\nyap|||Yapese|yapois\nyid||yi|Yiddish|yiddish\nyor||yo|Yoruba|yoruba\nypk|||Yupik languages|yupik, langues\nzap|||Zapotec|zapotèque\nzbl|||Blissymbols; Blissymbolics; Bliss|symboles Bliss; Bliss\nzen|||Zenaga|zenaga\nzgh|||Standard Moroccan Tamazight|amazighe standard marocain\nzha||za|Zhuang; Chuang|zhuang; chuang\nznd|||Zande languages|zandé, langues\nzul||zu|Zulu|zoulou\nzun|||Zuni|zuni\nzxx|||No linguistic content; Not applicable|pas de contenu linguistique; non applicable\nzza|||Zaza; Dimili; Dimli; Kirdki; Kirmanjki; Zazaki|zaza; dimili; dimli; kirdki; kirmanjki; zazaki"
alpha_2 = 'sv'
alpha_3 = 'swe'
english_names = {'aa': ['Afar'], 'aar': ['Afar'], 'ab': ['Abkhazian'], 'abk': ['Abkhazian'], 'ace': ['Achinese'], 'ach': ['Acoli'], 'ada': ['Adangme'], 'ady': ['Adyghe', 'Adygei'], 'ae': ['Avestan'], 'af': ['Afrikaans'], 'afa': ['Afro-Asiatic languages'], 'afh': ['Afrihili'], 'afr': ['Afrikaans'], 'ain': ['Ainu'], 'ak': ['Akan'], 'aka': ['Akan'], 'akk': ['Akkadian'], 'alb': ['Albanian'], 'ale': ['Aleut'], 'alg': ['Algonquian languages'], 'alt': ['Southern Altai'], 'am': ['Amharic'], 'amh': ['Amharic'], 'an': ['Aragonese'], 'ang': ['English, Old (ca.450-1100)'], 'anp': ['Angika'], 'apa': ['Apache languages'], 'ar': ['Arabic'], 'ara': ['Arabic'], 'arc': ['Official Aramaic (700-300 BCE)', 'Imperial Aramaic (700-300 BCE)'], 'arg': ['Aragonese'], 'arm': ['Armenian'], 'arn': ['Mapudungun', 'Mapuche'], 'arp': ['Arapaho'], 'art': ['Artificial languages'], 'arw': ['Arawak'], 'as': ['Assamese'], 'asm': ['Assamese'], 'ast': ['Asturian', 'Bable', 'Leonese', 'Asturleonese'], 'ath': ['Athapascan languages'], 'aus': ['Australian languages'], 'av': ['Avaric'], 'ava': ['Avaric'], 'ave': ['Avestan'], 'awa': ['Awadhi'], 'ay': ['Aymara'], 'aym': ['Aymara'], 'az': ['Azerbaijani'], 'aze': ['Azerbaijani'], 'ba': ['Bashkir'], 'bad': ['Banda languages'], 'bai': ['Bamileke languages'], 'bak': ['Bashkir'], 'bal': ['Baluchi'], 'bam': ['Bambara'], 'ban': ['Balinese'], 'baq': ['Basque'], 'bas': ['Basa'], 'bat': ['Baltic languages'], 'be': ['Belarusian'], 'bej': ['Beja', 'Bedawiyet'], 'bel': ['Belarusian'], 'bem': ['Bemba'], 'ben': ['Bengali'], 'ber': ['Berber languages'], 'bg': ['Bulgarian'], 'bh': ['Bihari languages'], 'bho': ['Bhojpuri'], 'bi': ['Bislama'], 'bih': ['Bihari languages'], 'bik': ['Bikol'], 'bin': ['Bini', 'Edo'], 'bis': ['Bislama'], 'bla': ['Siksika'], 'bm': ['Bambara'], 'bn': ['Bengali'], 'bnt': ['Bantu (Other)'], 'bo': ['Tibetan'], 'bos': ['Bosnian'], 'br': ['Breton'], 'bra': ['Braj'], 'bre': ['Breton'], 'bs': ['Bosnian'], 'btk': ['Batak languages'], 'bua': ['Buriat'], 'bug': 
['Buginese'], 'bul': ['Bulgarian'], 'bur': ['Burmese'], 'byn': ['Blin', 'Bilin'], 'ca': ['Catalan', 'Valencian'], 'cad': ['Caddo'], 'cai': ['Central American Indian languages'], 'car': ['Galibi Carib'], 'cat': ['Catalan', 'Valencian'], 'cau': ['Caucasian languages'], 'ce': ['Chechen'], 'ceb': ['Cebuano'], 'cel': ['Celtic languages'], 'ch': ['Chamorro'], 'cha': ['Chamorro'], 'chb': ['Chibcha'], 'che': ['Chechen'], 'chg': ['Chagatai'], 'chi': ['Chinese'], 'chk': ['Chuukese'], 'chm': ['Mari'], 'chn': ['Chinook jargon'], 'cho': ['Choctaw'], 'chp': ['Chipewyan', 'Dene Suline'], 'chr': ['Cherokee'], 'chu': ['Church Slavic', 'Old Slavonic', 'Church Slavonic', 'Old Bulgarian', 'Old Church Slavonic'], 'chv': ['Chuvash'], 'chy': ['Cheyenne'], 'cmc': ['Chamic languages'], 'co': ['Corsican'], 'cop': ['Coptic'], 'cor': ['Cornish'], 'cos': ['Corsican'], 'cpe': ['Creoles and pidgins, English based'], 'cpf': ['Creoles and pidgins, French-based'], 'cpp': ['Creoles and pidgins, Portuguese-based'], 'cr': ['Cree'], 'cre': ['Cree'], 'crh': ['Crimean Tatar', 'Crimean Turkish'], 'crp': ['Creoles and pidgins'], 'cs': ['Czech'], 'csb': ['Kashubian'], 'cu': ['Church Slavic', 'Old Slavonic', 'Church Slavonic', 'Old Bulgarian', 'Old Church Slavonic'], 'cus': ['Cushitic languages'], 'cv': ['Chuvash'], 'cy': ['Welsh'], 'cze': ['Czech'], 'da': ['Danish'], 'dak': ['Dakota'], 'dan': ['Danish'], 'dar': ['Dargwa'], 'day': ['Land Dayak languages'], 'de': ['German'], 'del': ['Delaware'], 'den': ['Slave (Athapascan)'], 'dgr': ['Dogrib'], 'din': ['Dinka'], 'div': ['Divehi', 'Dhivehi', 'Maldivian'], 'doi': ['Dogri'], 'dra': ['Dravidian languages'], 'dsb': ['Lower Sorbian'], 'dua': ['Duala'], 'dum': ['Dutch, Middle (ca.1050-1350)'], 'dut': ['Dutch', 'Flemish'], 'dv': ['Divehi', 'Dhivehi', 'Maldivian'], 'dyu': ['Dyula'], 'dz': ['Dzongkha'], 'dzo': ['Dzongkha'], 'ee': ['Ewe'], 'efi': ['Efik'], 'egy': ['Egyptian (Ancient)'], 'eka': ['Ekajuk'], 'el': ['Greek, Modern (1453-)'], 'elx': ['Elamite'], 'en': 
['English'], 'eng': ['English'], 'enm': ['English, Middle (1100-1500)'], 'eo': ['Esperanto'], 'epo': ['Esperanto'], 'es': ['Spanish', 'Castilian'], 'est': ['Estonian'], 'et': ['Estonian'], 'eu': ['Basque'], 'ewe': ['Ewe'], 'ewo': ['Ewondo'], 'fa': ['Persian'], 'fan': ['Fang'], 'fao': ['Faroese'], 'fat': ['Fanti'], 'ff': ['Fulah'], 'fi': ['Finnish'], 'fij': ['Fijian'], 'fil': ['Filipino', 'Pilipino'], 'fin': ['Finnish'], 'fiu': ['Finno-Ugrian languages'], 'fj': ['Fijian'], 'fo': ['Faroese'], 'fon': ['Fon'], 'fr': ['French'], 'fre': ['French'], 'frm': ['French, Middle (ca.1400-1600)'], 'fro': ['French, Old (842-ca.1400)'], 'frr': ['Northern Frisian'], 'frs': ['Eastern Frisian'], 'fry': ['Western Frisian'], 'ful': ['Fulah'], 'fur': ['Friulian'], 'fy': ['Western Frisian'], 'ga': ['Irish'], 'gaa': ['Ga'], 'gay': ['Gayo'], 'gba': ['Gbaya'], 'gd': ['Gaelic', 'Scottish Gaelic'], 'gem': ['Germanic languages'], 'geo': ['Georgian'], 'ger': ['German'], 'gez': ['Geez'], 'gil': ['Gilbertese'], 'gl': ['Galician'], 'gla': ['Gaelic', 'Scottish Gaelic'], 'gle': ['Irish'], 'glg': ['Galician'], 'glv': ['Manx'], 'gmh': ['German, Middle High (ca.1050-1500)'], 'gn': ['Guarani'], 'goh': ['German, Old High (ca.750-1050)'], 'gon': ['Gondi'], 'gor': ['Gorontalo'], 'got': ['Gothic'], 'grb': ['Grebo'], 'grc': ['Greek, Ancient (to 1453)'], 'gre': ['Greek, Modern (1453-)'], 'grn': ['Guarani'], 'gsw': ['Swiss German', 'Alemannic', 'Alsatian'], 'gu': ['Gujarati'], 'guj': ['Gujarati'], 'gv': ['Manx'], 'gwi': ["Gwich'in"], 'ha': ['Hausa'], 'hai': ['Haida'], 'hat': ['Haitian', 'Haitian Creole'], 'hau': ['Hausa'], 'haw': ['Hawaiian'], 'he': ['Hebrew'], 'heb': ['Hebrew'], 'her': ['Herero'], 'hi': ['Hindi'], 'hil': ['Hiligaynon'], 'him': ['Himachali languages', 'Western Pahari languages'], 'hin': ['Hindi'], 'hit': ['Hittite'], 'hmn': ['Hmong', 'Mong'], 'hmo': ['Hiri Motu'], 'ho': ['Hiri Motu'], 'hr': ['Croatian'], 'hrv': ['Croatian'], 'hsb': ['Upper Sorbian'], 'ht': ['Haitian', 'Haitian Creole'], 'hu': 
['Hungarian'], 'hun': ['Hungarian'], 'hup': ['Hupa'], 'hy': ['Armenian'], 'hz': ['Herero'], 'ia': ['Interlingua (International Auxiliary Language Association)'], 'iba': ['Iban'], 'ibo': ['Igbo'], 'ice': ['Icelandic'], 'id': ['Indonesian'], 'ido': ['Ido'], 'ie': ['Interlingue', 'Occidental'], 'ig': ['Igbo'], 'ii': ['Sichuan Yi', 'Nuosu'], 'iii': ['Sichuan Yi', 'Nuosu'], 'ijo': ['Ijo languages'], 'ik': ['Inupiaq'], 'iku': ['Inuktitut'], 'ile': ['Interlingue', 'Occidental'], 'ilo': ['Iloko'], 'ina': ['Interlingua (International Auxiliary Language Association)'], 'inc': ['Indic languages'], 'ind': ['Indonesian'], 'ine': ['Indo-European languages'], 'inh': ['Ingush'], 'io': ['Ido'], 'ipk': ['Inupiaq'], 'ira': ['Iranian languages'], 'iro': ['Iroquoian languages'], 'is': ['Icelandic'], 'it': ['Italian'], 'ita': ['Italian'], 'iu': ['Inuktitut'], 'ja': ['Japanese'], 'jav': ['Javanese'], 'jbo': ['Lojban'], 'jpn': ['Japanese'], 'jpr': ['Judeo-Persian'], 'jrb': ['Judeo-Arabic'], 'jv': ['Javanese'], 'ka': ['Georgian'], 'kaa': ['Kara-Kalpak'], 'kab': ['Kabyle'], 'kac': ['Kachin', 'Jingpho'], 'kal': ['Kalaallisut', 'Greenlandic'], 'kam': ['Kamba'], 'kan': ['Kannada'], 'kar': ['Karen languages'], 'kas': ['Kashmiri'], 'kau': ['Kanuri'], 'kaw': ['Kawi'], 'kaz': ['Kazakh'], 'kbd': ['Kabardian'], 'kg': ['Kongo'], 'kha': ['Khasi'], 'khi': ['Khoisan languages'], 'khm': ['Central Khmer'], 'kho': ['Khotanese', 'Sakan'], 'ki': ['Kikuyu', 'Gikuyu'], 'kik': ['Kikuyu', 'Gikuyu'], 'kin': ['Kinyarwanda'], 'kir': ['Kirghiz', 'Kyrgyz'], 'kj': ['Kuanyama', 'Kwanyama'], 'kk': ['Kazakh'], 'kl': ['Kalaallisut', 'Greenlandic'], 'km': ['Central Khmer'], 'kmb': ['Kimbundu'], 'kn': ['Kannada'], 'ko': ['Korean'], 'kok': ['Konkani'], 'kom': ['Komi'], 'kon': ['Kongo'], 'kor': ['Korean'], 'kos': ['Kosraean'], 'kpe': ['Kpelle'], 'kr': ['Kanuri'], 'krc': ['Karachay-Balkar'], 'krl': ['Karelian'], 'kro': ['Kru languages'], 'kru': ['Kurukh'], 'ks': ['Kashmiri'], 'ku': ['Kurdish'], 'kua': ['Kuanyama', 'Kwanyama'], 
'kum': ['Kumyk'], 'kur': ['Kurdish'], 'kut': ['Kutenai'], 'kv': ['Komi'], 'kw': ['Cornish'], 'ky': ['Kirghiz', 'Kyrgyz'], 'la': ['Latin'], 'lad': ['Ladino'], 'lah': ['Lahnda'], 'lam': ['Lamba'], 'lao': ['Lao'], 'lat': ['Latin'], 'lav': ['Latvian'], 'lb': ['Luxembourgish', 'Letzeburgesch'], 'lez': ['Lezghian'], 'lg': ['Ganda'], 'li': ['Limburgan', 'Limburger', 'Limburgish'], 'lim': ['Limburgan', 'Limburger', 'Limburgish'], 'lin': ['Lingala'], 'lit': ['Lithuanian'], 'ln': ['Lingala'], 'lo': ['Lao'], 'lol': ['Mongo'], 'loz': ['Lozi'], 'lt': ['Lithuanian'], 'ltz': ['Luxembourgish', 'Letzeburgesch'], 'lu': ['Luba-Katanga'], 'lua': ['Luba-Lulua'], 'lub': ['Luba-Katanga'], 'lug': ['Ganda'], 'lui': ['Luiseno'], 'lun': ['Lunda'], 'luo': ['Luo (Kenya and Tanzania)'], 'lus': ['Lushai'], 'lv': ['Latvian'], 'mac': ['Macedonian'], 'mad': ['Madurese'], 'mag': ['Magahi'], 'mah': ['Marshallese'], 'mai': ['Maithili'], 'mak': ['Makasar'], 'mal': ['Malayalam'], 'man': ['Mandingo'], 'mao': ['Maori'], 'map': ['Austronesian languages'], 'mar': ['Marathi'], 'mas': ['Masai'], 'may': ['Malay'], 'mdf': ['Moksha'], 'mdr': ['Mandar'], 'men': ['Mende'], 'mg': ['Malagasy'], 'mga': ['Irish, Middle (900-1200)'], 'mh': ['Marshallese'], 'mi': ['Maori'], 'mic': ["Mi'kmaq", 'Micmac'], 'min': ['Minangkabau'], 'mis': ['Uncoded languages'], 'mk': ['Macedonian'], 'mkh': ['Mon-Khmer languages'], 'ml': ['Malayalam'], 'mlg': ['Malagasy'], 'mlt': ['Maltese'], 'mn': ['Mongolian'], 'mnc': ['Manchu'], 'mni': ['Manipuri'], 'mno': ['Manobo languages'], 'moh': ['Mohawk'], 'mon': ['Mongolian'], 'mos': ['Mossi'], 'mr': ['Marathi'], 'ms': ['Malay'], 'mt': ['Maltese'], 'mul': ['Multiple languages'], 'mun': ['Munda languages'], 'mus': ['Creek'], 'mwl': ['Mirandese'], 'mwr': ['Marwari'], 'my': ['Burmese'], 'myn': ['Mayan languages'], 'myv': ['Erzya'], 'na': ['Nauru'], 'nah': ['Nahuatl languages'], 'nai': ['North American Indian languages'], 'nap': ['Neapolitan'], 'nau': ['Nauru'], 'nav': ['Navajo', 'Navaho'], 'nb': 
['Bokmål, Norwegian', 'Norwegian Bokmål'], 'nbl': ['Ndebele, South', 'South Ndebele'], 'nd': ['Ndebele, North', 'North Ndebele'], 'nde': ['Ndebele, North', 'North Ndebele'], 'ndo': ['Ndonga'], 'nds': ['Low German', 'Low Saxon', 'German, Low', 'Saxon, Low'], 'ne': ['Nepali'], 'nep': ['Nepali'], 'new': ['Nepal Bhasa', 'Newari'], 'ng': ['Ndonga'], 'nia': ['Nias'], 'nic': ['Niger-Kordofanian languages'], 'niu': ['Niuean'], 'nl': ['Dutch', 'Flemish'], 'nn': ['Norwegian Nynorsk', 'Nynorsk, Norwegian'], 'nno': ['Norwegian Nynorsk', 'Nynorsk, Norwegian'], 'no': ['Norwegian'], 'nob': ['Bokmål, Norwegian', 'Norwegian Bokmål'], 'nog': ['Nogai'], 'non': ['Norse, Old'], 'nor': ['Norwegian'], 'nqo': ["N'Ko"], 'nr': ['Ndebele, South', 'South Ndebele'], 'nso': ['Pedi', 'Sepedi', 'Northern Sotho'], 'nub': ['Nubian languages'], 'nv': ['Navajo', 'Navaho'], 'nwc': ['Classical Newari', 'Old Newari', 'Classical Nepal Bhasa'], 'ny': ['Chichewa', 'Chewa', 'Nyanja'], 'nya': ['Chichewa', 'Chewa', 'Nyanja'], 'nym': ['Nyamwezi'], 'nyn': ['Nyankole'], 'nyo': ['Nyoro'], 'nzi': ['Nzima'], 'oc': ['Occitan (post 1500)', 'Provençal'], 'oci': ['Occitan (post 1500)', 'Provençal'], 'oj': ['Ojibwa'], 'oji': ['Ojibwa'], 'om': ['Oromo'], 'or': ['Oriya'], 'ori': ['Oriya'], 'orm': ['Oromo'], 'os': ['Ossetian', 'Ossetic'], 'osa': ['Osage'], 'oss': ['Ossetian', 'Ossetic'], 'ota': ['Turkish, Ottoman (1500-1928)'], 'oto': ['Otomian languages'], 'pa': ['Panjabi', 'Punjabi'], 'paa': ['Papuan languages'], 'pag': ['Pangasinan'], 'pal': ['Pahlavi'], 'pam': ['Pampanga', 'Kapampangan'], 'pan': ['Panjabi', 'Punjabi'], 'pap': ['Papiamento'], 'pau': ['Palauan'], 'peo': ['Persian, Old (ca.600-400 B.C.)'], 'per': ['Persian'], 'phi': ['Philippine languages'], 'phn': ['Phoenician'], 'pi': ['Pali'], 'pl': ['Polish'], 'pli': ['Pali'], 'pol': ['Polish'], 'pon': ['Pohnpeian'], 'por': ['Portuguese'], 'pra': ['Prakrit languages'], 'pro': ['Provençal, Old (to 1500)'], 'ps': ['Pushto', 'Pashto'], 'pt': ['Portuguese'], 'pus': 
['Pushto', 'Pashto'], 'qaa-qtz': ['Reserved for local use'], 'qu': ['Quechua'], 'que': ['Quechua'], 'raj': ['Rajasthani'], 'rap': ['Rapanui'], 'rar': ['Rarotongan', 'Cook Islands Maori'], 'rm': ['Romansh'], 'rn': ['Rundi'], 'ro': ['Romanian', 'Moldavian', 'Moldovan'], 'roa': ['Romance languages'], 'roh': ['Romansh'], 'rom': ['Romany'], 'ru': ['Russian'], 'rum': ['Romanian', 'Moldavian', 'Moldovan'], 'run': ['Rundi'], 'rup': ['Aromanian', 'Arumanian', 'Macedo-Romanian'], 'rus': ['Russian'], 'rw': ['Kinyarwanda'], 'sa': ['Sanskrit'], 'sad': ['Sandawe'], 'sag': ['Sango'], 'sah': ['Yakut'], 'sai': ['South American Indian (Other)'], 'sal': ['Salishan languages'], 'sam': ['Samaritan Aramaic'], 'san': ['Sanskrit'], 'sas': ['Sasak'], 'sat': ['Santali'], 'sc': ['Sardinian'], 'scn': ['Sicilian'], 'sco': ['Scots'], 'sd': ['Sindhi'], 'se': ['Northern Sami'], 'sel': ['Selkup'], 'sem': ['Semitic languages'], 'sg': ['Sango'], 'sga': ['Irish, Old (to 900)'], 'sgn': ['Sign Languages'], 'shn': ['Shan'], 'si': ['Sinhala', 'Sinhalese'], 'sid': ['Sidamo'], 'sin': ['Sinhala', 'Sinhalese'], 'sio': ['Siouan languages'], 'sit': ['Sino-Tibetan languages'], 'sk': ['Slovak'], 'sl': ['Slovenian'], 'sla': ['Slavic languages'], 'slo': ['Slovak'], 'slv': ['Slovenian'], 'sm': ['Samoan'], 'sma': ['Southern Sami'], 'sme': ['Northern Sami'], 'smi': ['Sami languages'], 'smj': ['Lule Sami'], 'smn': ['Inari Sami'], 'smo': ['Samoan'], 'sms': ['Skolt Sami'], 'sn': ['Shona'], 'sna': ['Shona'], 'snd': ['Sindhi'], 'snk': ['Soninke'], 'so': ['Somali'], 'sog': ['Sogdian'], 'som': ['Somali'], 'son': ['Songhai languages'], 'sot': ['Sotho, Southern'], 'spa': ['Spanish', 'Castilian'], 'sq': ['Albanian'], 'sr': ['Serbian'], 'srd': ['Sardinian'], 'srn': ['Sranan Tongo'], 'srp': ['Serbian'], 'srr': ['Serer'], 'ss': ['Swati'], 'ssa': ['Nilo-Saharan languages'], 'ssw': ['Swati'], 'st': ['Sotho, Southern'], 'su': ['Sundanese'], 'suk': ['Sukuma'], 'sun': ['Sundanese'], 'sus': ['Susu'], 'sux': ['Sumerian'], 'sv': 
['Swedish'], 'sw': ['Swahili'], 'swa': ['Swahili'], 'swe': ['Swedish'], 'syc': ['Classical Syriac'], 'syr': ['Syriac'], 'ta': ['Tamil'], 'tah': ['Tahitian'], 'tai': ['Tai languages'], 'tam': ['Tamil'], 'tat': ['Tatar'], 'te': ['Telugu'], 'tel': ['Telugu'], 'tem': ['Timne'], 'ter': ['Tereno'], 'tet': ['Tetum'], 'tg': ['Tajik'], 'tgk': ['Tajik'], 'tgl': ['Tagalog'], 'th': ['Thai'], 'tha': ['Thai'], 'ti': ['Tigrinya'], 'tib': ['Tibetan'], 'tig': ['Tigre'], 'tir': ['Tigrinya'], 'tiv': ['Tiv'], 'tk': ['Turkmen'], 'tkl': ['Tokelau'], 'tl': ['Tagalog'], 'tlh': ['Klingon', 'tlhIngan-Hol'], 'tli': ['Tlingit'], 'tmh': ['Tamashek'], 'tn': ['Tswana'], 'to': ['Tonga (Tonga Islands)'], 'tog': ['Tonga (Nyasa)'], 'ton': ['Tonga (Tonga Islands)'], 'tpi': ['Tok Pisin'], 'tr': ['Turkish'], 'ts': ['Tsonga'], 'tsi': ['Tsimshian'], 'tsn': ['Tswana'], 'tso': ['Tsonga'], 'tt': ['Tatar'], 'tuk': ['Turkmen'], 'tum': ['Tumbuka'], 'tup': ['Tupi languages'], 'tur': ['Turkish'], 'tut': ['Altaic languages'], 'tvl': ['Tuvalu'], 'tw': ['Twi'], 'twi': ['Twi'], 'ty': ['Tahitian'], 'tyv': ['Tuvinian'], 'udm': ['Udmurt'], 'ug': ['Uighur', 'Uyghur'], 'uga': ['Ugaritic'], 'uig': ['Uighur', 'Uyghur'], 'uk': ['Ukrainian'], 'ukr': ['Ukrainian'], 'umb': ['Umbundu'], 'und': ['Undetermined'], 'ur': ['Urdu'], 'urd': ['Urdu'], 'uz': ['Uzbek'], 'uzb': ['Uzbek'], 'vai': ['Vai'], 've': ['Venda'], 'ven': ['Venda'], 'vi': ['Vietnamese'], 'vie': ['Vietnamese'], 'vo': ['Volapük'], 'vol': ['Volapük'], 'vot': ['Votic'], 'wa': ['Walloon'], 'wak': ['Wakashan languages'], 'wal': ['Walamo'], 'war': ['Waray'], 'was': ['Washo'], 'wel': ['Welsh'], 'wen': ['Sorbian languages'], 'wln': ['Walloon'], 'wo': ['Wolof'], 'wol': ['Wolof'], 'xal': ['Kalmyk', 'Oirat'], 'xh': ['Xhosa'], 'xho': ['Xhosa'], 'yao': ['Yao'], 'yap': ['Yapese'], 'yi': ['Yiddish'], 'yid': ['Yiddish'], 'yo': ['Yoruba'], 'yor': ['Yoruba'], 'ypk': ['Yupik languages'], 'za': ['Zhuang', 'Chuang'], 'zap': ['Zapotec'], 'zbl': ['Blissymbols', 'Blissymbolics', 'Bliss'], 
'zen': ['Zenaga'], 'zgh': ['Standard Moroccan Tamazight'], 'zh': ['Chinese'], 'zha': ['Zhuang', 'Chuang'], 'znd': ['Zande languages'], 'zu': ['Zulu'], 'zul': ['Zulu'], 'zun': ['Zuni'], 'zxx': ['No linguistic content', 'Not applicable'], 'zza': ['Zaza', 'Dimili', 'Dimli', 'Kirdki', 'Kirmanjki', 'Zazaki']}
english_names_to_three = {'abkhazian': 'abk', 'achinese': 'ace', 'acoli': 'ach', 'adangme': 'ada', 'adygei': 'ady', 'adyghe': 'ady', 'afar': 'aar', 'afrihili': 'afh', 'afrikaans': 'afr', 'afro-asiatic languages': 'afa', 'ainu': 'ain', 'akan': 'aka', 'akkadian': 'akk', 'albanian': 'alb', 'alemannic': 'gsw', 'aleut': 'ale', 'algonquian languages': 'alg', 'alsatian': 'gsw', 'altaic languages': 'tut', 'amharic': 'amh', 'angika': 'anp', 'apache languages': 'apa', 'arabic': 'ara', 'aragonese': 'arg', 'arapaho': 'arp', 'arawak': 'arw', 'armenian': 'arm', 'aromanian': 'rup', 'artificial languages': 'art', 'arumanian': 'rup', 'assamese': 'asm', 'asturian': 'ast', 'asturleonese': 'ast', 'athapascan languages': 'ath', 'australian languages': 'aus', 'austronesian languages': 'map', 'avaric': 'ava', 'avestan': 'ave', 'awadhi': 'awa', 'aymara': 'aym', 'azerbaijani': 'aze', 'bable': 'ast', 'balinese': 'ban', 'baltic languages': 'bat', 'baluchi': 'bal', 'bambara': 'bam', 'bamileke languages': 'bai', 'banda languages': 'bad', 'bantu (other)': 'bnt', 'basa': 'bas', 'bashkir': 'bak', 'basque': 'baq', 'batak languages': 'btk', 'bedawiyet': 'bej', 'beja': 'bej', 'belarusian': 'bel', 'bemba': 'bem', 'bengali': 'ben', 'berber languages': 'ber', 'bhojpuri': 'bho', 'bihari languages': 'bih', 'bikol': 'bik', 'bilin': 'byn', 'bini': 'bin', 'bislama': 'bis', 'blin': 'byn', 'bliss': 'zbl', 'blissymbolics': 'zbl', 'blissymbols': 'zbl', 'bokmål, norwegian': 'nob', 'bosnian': 'bos', 'braj': 'bra', 'breton': 'bre', 'buginese': 'bug', 'bulgarian': 'bul', 'buriat': 'bua', 'burmese': 'bur', 'caddo': 'cad', 'castilian': 'spa', 'catalan': 'cat', 'caucasian languages': 'cau', 'cebuano': 'ceb', 'celtic languages': 'cel', 'central american indian languages': 'cai', 'central khmer': 'khm', 'chagatai': 'chg', 'chamic languages': 'cmc', 'chamorro': 'cha', 'chechen': 'che', 'cherokee': 'chr', 'chewa': 'nya', 'cheyenne': 'chy', 'chibcha': 'chb', 'chichewa': 'nya', 'chinese': 'chi', 'chinook jargon': 'chn', 
'chipewyan': 'chp', 'choctaw': 'cho', 'chuang': 'zha', 'church slavic': 'chu', 'church slavonic': 'chu', 'chuukese': 'chk', 'chuvash': 'chv', 'classical nepal bhasa': 'nwc', 'classical newari': 'nwc', 'classical syriac': 'syc', 'cook islands maori': 'rar', 'coptic': 'cop', 'cornish': 'cor', 'corsican': 'cos', 'cree': 'cre', 'creek': 'mus', 'creoles and pidgins': 'crp', 'creoles and pidgins, english based': 'cpe', 'creoles and pidgins, french-based': 'cpf', 'creoles and pidgins, portuguese-based': 'cpp', 'crimean tatar': 'crh', 'crimean turkish': 'crh', 'croatian': 'hrv', 'cushitic languages': 'cus', 'czech': 'cze', 'dakota': 'dak', 'danish': 'dan', 'dargwa': 'dar', 'delaware': 'del', 'dene suline': 'chp', 'dhivehi': 'div', 'dimili': 'zza', 'dimli': 'zza', 'dinka': 'din', 'divehi': 'div', 'dogri': 'doi', 'dogrib': 'dgr', 'dravidian languages': 'dra', 'duala': 'dua', 'dutch': 'dut', 'dutch, middle (ca.1050-1350)': 'dum', 'dyula': 'dyu', 'dzongkha': 'dzo', 'eastern frisian': 'frs', 'edo': 'bin', 'efik': 'efi', 'egyptian (ancient)': 'egy', 'ekajuk': 'eka', 'elamite': 'elx', 'english': 'eng', 'english, middle (1100-1500)': 'enm', 'english, old (ca.450-1100)': 'ang', 'erzya': 'myv', 'esperanto': 'epo', 'estonian': 'est', 'ewe': 'ewe', 'ewondo': 'ewo', 'fang': 'fan', 'fanti': 'fat', 'faroese': 'fao', 'fijian': 'fij', 'filipino': 'fil', 'finnish': 'fin', 'finno-ugrian languages': 'fiu', 'flemish': 'dut', 'fon': 'fon', 'french': 'fre', 'french, middle (ca.1400-1600)': 'frm', 'french, old (842-ca.1400)': 'fro', 'friulian': 'fur', 'fulah': 'ful', 'ga': 'gaa', 'gaelic': 'gla', 'galibi carib': 'car', 'galician': 'glg', 'ganda': 'lug', 'gayo': 'gay', 'gbaya': 'gba', 'geez': 'gez', 'georgian': 'geo', 'german': 'ger', 'german, low': 'nds', 'german, middle high (ca.1050-1500)': 'gmh', 'german, old high (ca.750-1050)': 'goh', 'germanic languages': 'gem', 'gikuyu': 'kik', 'gilbertese': 'gil', 'gondi': 'gon', 'gorontalo': 'gor', 'gothic': 'got', 'grebo': 'grb', 'greek, ancient (to 
1453)': 'grc', 'greek, modern (1453-)': 'gre', 'greenlandic': 'kal', 'guarani': 'grn', 'gujarati': 'guj', "gwich'in": 'gwi', 'haida': 'hai', 'haitian': 'hat', 'haitian creole': 'hat', 'hausa': 'hau', 'hawaiian': 'haw', 'hebrew': 'heb', 'herero': 'her', 'hiligaynon': 'hil', 'himachali languages': 'him', 'hindi': 'hin', 'hiri motu': 'hmo', 'hittite': 'hit', 'hmong': 'hmn', 'hungarian': 'hun', 'hupa': 'hup', 'iban': 'iba', 'icelandic': 'ice', 'ido': 'ido', 'igbo': 'ibo', 'ijo languages': 'ijo', 'iloko': 'ilo', 'imperial aramaic (700-300 bce)': 'arc', 'inari sami': 'smn', 'indic languages': 'inc', 'indo-european languages': 'ine', 'indonesian': 'ind', 'ingush': 'inh', 'interlingua (international auxiliary language association)': 'ina', 'interlingue': 'ile', 'inuktitut': 'iku', 'inupiaq': 'ipk', 'iranian languages': 'ira', 'irish': 'gle', 'irish, middle (900-1200)': 'mga', 'irish, old (to 900)': 'sga', 'iroquoian languages': 'iro', 'italian': 'ita', 'japanese': 'jpn', 'javanese': 'jav', 'jingpho': 'kac', 'judeo-arabic': 'jrb', 'judeo-persian': 'jpr', 'kabardian': 'kbd', 'kabyle': 'kab', 'kachin': 'kac', 'kalaallisut': 'kal', 'kalmyk': 'xal', 'kamba': 'kam', 'kannada': 'kan', 'kanuri': 'kau', 'kapampangan': 'pam', 'kara-kalpak': 'kaa', 'karachay-balkar': 'krc', 'karelian': 'krl', 'karen languages': 'kar', 'kashmiri': 'kas', 'kashubian': 'csb', 'kawi': 'kaw', 'kazakh': 'kaz', 'khasi': 'kha', 'khoisan languages': 'khi', 'khotanese': 'kho', 'kikuyu': 'kik', 'kimbundu': 'kmb', 'kinyarwanda': 'kin', 'kirdki': 'zza', 'kirghiz': 'kir', 'kirmanjki': 'zza', 'klingon': 'tlh', 'komi': 'kom', 'kongo': 'kon', 'konkani': 'kok', 'korean': 'kor', 'kosraean': 'kos', 'kpelle': 'kpe', 'kru languages': 'kro', 'kuanyama': 'kua', 'kumyk': 'kum', 'kurdish': 'kur', 'kurukh': 'kru', 'kutenai': 'kut', 'kwanyama': 'kua', 'kyrgyz': 'kir', 'ladino': 'lad', 'lahnda': 'lah', 'lamba': 'lam', 'land dayak languages': 'day', 'lao': 'lao', 'latin': 'lat', 'latvian': 'lav', 'leonese': 'ast', 
'letzeburgesch': 'ltz', 'lezghian': 'lez', 'limburgan': 'lim', 'limburger': 'lim', 'limburgish': 'lim', 'lingala': 'lin', 'lithuanian': 'lit', 'lojban': 'jbo', 'low german': 'nds', 'low saxon': 'nds', 'lower sorbian': 'dsb', 'lozi': 'loz', 'luba-katanga': 'lub', 'luba-lulua': 'lua', 'luiseno': 'lui', 'lule sami': 'smj', 'lunda': 'lun', 'luo (kenya and tanzania)': 'luo', 'lushai': 'lus', 'luxembourgish': 'ltz', 'macedo-romanian': 'rup', 'macedonian': 'mac', 'madurese': 'mad', 'magahi': 'mag', 'maithili': 'mai', 'makasar': 'mak', 'malagasy': 'mlg', 'malay': 'may', 'malayalam': 'mal', 'maldivian': 'div', 'maltese': 'mlt', 'manchu': 'mnc', 'mandar': 'mdr', 'mandingo': 'man', 'manipuri': 'mni', 'manobo languages': 'mno', 'manx': 'glv', 'maori': 'mao', 'mapuche': 'arn', 'mapudungun': 'arn', 'marathi': 'mar', 'mari': 'chm', 'marshallese': 'mah', 'marwari': 'mwr', 'masai': 'mas', 'mayan languages': 'myn', 'mende': 'men', "mi'kmaq": 'mic', 'micmac': 'mic', 'minangkabau': 'min', 'mirandese': 'mwl', 'mohawk': 'moh', 'moksha': 'mdf', 'moldavian': 'rum', 'moldovan': 'rum', 'mon-khmer languages': 'mkh', 'mong': 'hmn', 'mongo': 'lol', 'mongolian': 'mon', 'mossi': 'mos', 'multiple languages': 'mul', 'munda languages': 'mun', "n'ko": 'nqo', 'nahuatl languages': 'nah', 'nauru': 'nau', 'navaho': 'nav', 'navajo': 'nav', 'ndebele, north': 'nde', 'ndebele, south': 'nbl', 'ndonga': 'ndo', 'neapolitan': 'nap', 'nepal bhasa': 'new', 'nepali': 'nep', 'newari': 'new', 'nias': 'nia', 'niger-kordofanian languages': 'nic', 'nilo-saharan languages': 'ssa', 'niuean': 'niu', 'no linguistic content': 'zxx', 'nogai': 'nog', 'norse, old': 'non', 'north american indian languages': 'nai', 'north ndebele': 'nde', 'northern frisian': 'frr', 'northern sami': 'sme', 'northern sotho': 'nso', 'norwegian': 'nor', 'norwegian bokmål': 'nob', 'norwegian nynorsk': 'nno', 'not applicable': 'zxx', 'nubian languages': 'nub', 'nuosu': 'iii', 'nyamwezi': 'nym', 'nyanja': 'nya', 'nyankole': 'nyn', 'nynorsk, norwegian': 
'nno', 'nyoro': 'nyo', 'nzima': 'nzi', 'occidental': 'ile', 'occitan (post 1500)': 'oci', 'official aramaic (700-300 bce)': 'arc', 'oirat': 'xal', 'ojibwa': 'oji', 'old bulgarian': 'chu', 'old church slavonic': 'chu', 'old newari': 'nwc', 'old slavonic': 'chu', 'oriya': 'ori', 'oromo': 'orm', 'osage': 'osa', 'ossetian': 'oss', 'ossetic': 'oss', 'otomian languages': 'oto', 'pahlavi': 'pal', 'palauan': 'pau', 'pali': 'pli', 'pampanga': 'pam', 'pangasinan': 'pag', 'panjabi': 'pan', 'papiamento': 'pap', 'papuan languages': 'paa', 'pashto': 'pus', 'pedi': 'nso', 'persian': 'per', 'persian, old (ca.600-400 b.c.)': 'peo', 'philippine languages': 'phi', 'phoenician': 'phn', 'pilipino': 'fil', 'pohnpeian': 'pon', 'polish': 'pol', 'portuguese': 'por', 'prakrit languages': 'pra', 'provençal': 'oci', 'provençal, old (to 1500)': 'pro', 'punjabi': 'pan', 'pushto': 'pus', 'quechua': 'que', 'rajasthani': 'raj', 'rapanui': 'rap', 'rarotongan': 'rar', 'reserved for local use': 'qaa-qtz', 'romance languages': 'roa', 'romanian': 'rum', 'romansh': 'roh', 'romany': 'rom', 'rundi': 'run', 'russian': 'rus', 'sakan': 'kho', 'salishan languages': 'sal', 'samaritan aramaic': 'sam', 'sami languages': 'smi', 'samoan': 'smo', 'sandawe': 'sad', 'sango': 'sag', 'sanskrit': 'san', 'santali': 'sat', 'sardinian': 'srd', 'sasak': 'sas', 'saxon, low': 'nds', 'scots': 'sco', 'scottish gaelic': 'gla', 'selkup': 'sel', 'semitic languages': 'sem', 'sepedi': 'nso', 'serbian': 'srp', 'serer': 'srr', 'shan': 'shn', 'shona': 'sna', 'sichuan yi': 'iii', 'sicilian': 'scn', 'sidamo': 'sid', 'sign languages': 'sgn', 'siksika': 'bla', 'sindhi': 'snd', 'sinhala': 'sin', 'sinhalese': 'sin', 'sino-tibetan languages': 'sit', 'siouan languages': 'sio', 'skolt sami': 'sms', 'slave (athapascan)': 'den', 'slavic languages': 'sla', 'slovak': 'slo', 'slovenian': 'slv', 'sogdian': 'sog', 'somali': 'som', 'songhai languages': 'son', 'soninke': 'snk', 'sorbian languages': 'wen', 'sotho, southern': 'sot', 'south american indian 
(other)': 'sai', 'south ndebele': 'nbl', 'southern altai': 'alt', 'southern sami': 'sma', 'spanish': 'spa', 'sranan tongo': 'srn', 'standard moroccan tamazight': 'zgh', 'sukuma': 'suk', 'sumerian': 'sux', 'sundanese': 'sun', 'susu': 'sus', 'swahili': 'swa', 'swati': 'ssw', 'swedish': 'swe', 'swiss german': 'gsw', 'syriac': 'syr', 'tagalog': 'tgl', 'tahitian': 'tah', 'tai languages': 'tai', 'tajik': 'tgk', 'tamashek': 'tmh', 'tamil': 'tam', 'tatar': 'tat', 'telugu': 'tel', 'tereno': 'ter', 'tetum': 'tet', 'thai': 'tha', 'tibetan': 'tib', 'tigre': 'tig', 'tigrinya': 'tir', 'timne': 'tem', 'tiv': 'tiv', 'tlhingan-hol': 'tlh', 'tlingit': 'tli', 'tok pisin': 'tpi', 'tokelau': 'tkl', 'tonga (nyasa)': 'tog', 'tonga (tonga islands)': 'ton', 'tsimshian': 'tsi', 'tsonga': 'tso', 'tswana': 'tsn', 'tumbuka': 'tum', 'tupi languages': 'tup', 'turkish': 'tur', 'turkish, ottoman (1500-1928)': 'ota', 'turkmen': 'tuk', 'tuvalu': 'tvl', 'tuvinian': 'tyv', 'twi': 'twi', 'udmurt': 'udm', 'ugaritic': 'uga', 'uighur': 'uig', 'ukrainian': 'ukr', 'umbundu': 'umb', 'uncoded languages': 'mis', 'undetermined': 'und', 'upper sorbian': 'hsb', 'urdu': 'urd', 'uyghur': 'uig', 'uzbek': 'uzb', 'vai': 'vai', 'valencian': 'cat', 'venda': 'ven', 'vietnamese': 'vie', 'volapük': 'vol', 'votic': 'vot', 'wakashan languages': 'wak', 'walamo': 'wal', 'walloon': 'wln', 'waray': 'war', 'washo': 'was', 'welsh': 'wel', 'western frisian': 'fry', 'western pahari languages': 'him', 'wolof': 'wol', 'xhosa': 'xho', 'yakut': 'sah', 'yao': 'yao', 'yapese': 'yap', 'yiddish': 'yid', 'yoruba': 'yor', 'yupik languages': 'ypk', 'zande languages': 'znd', 'zapotec': 'zap', 'zaza': 'zza', 'zazaki': 'zza', 'zenaga': 'zen', 'zhuang': 'zha', 'zulu': 'zul', 'zuni': 'zun'}
french_names = 'zaza; dimili; dimli; kirdki; kirmanjki; zazaki'
i = {'code': 'sv', 'name': 'Swedish', 'nativeName': 'svenska'}
classmethod iso_639_2_for_locale(locale)

Turn a locale code into an ISO-639-2 alpha-3 language code.
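The lookup rules of the real method are not documented here. The following is a minimal sketch, assuming the locale is a two-letter language code optionally followed by a region (the `en_US` form produced by `parse_accept_language`), resolved through a small excerpt of `two_to_three`. The function body and the `TWO_TO_THREE` excerpt are illustrative, not the library's implementation.

```python
# Hypothetical excerpt of LanguageCodes.two_to_three (ISO 639-1 -> ISO 639-2/B).
TWO_TO_THREE = {"en": "eng", "de": "ger", "fr": "fre", "el": "gre", "sv": "swe"}


def iso_639_2_for_locale(locale):
    """Hypothetical sketch: map a locale such as 'en_US' or 'en-US' to an alpha-3 code.

    Returns None when the language part is not a known two-letter code.
    """
    if not locale:
        return None
    # Normalize the separator, then drop the region: 'en_US' -> 'en', 'de-AT' -> 'de'.
    language = locale.replace("-", "_").split("_")[0].lower()
    return TWO_TO_THREE.get(language)


print(iso_639_2_for_locale("en_US"))  # eng
print(iso_639_2_for_locale("de-AT"))  # ger
print(iso_639_2_for_locale("xx"))     # None
```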

name = 'Zazaki'
classmethod name_for_languageset(languages)
names = ['svenska']
native_names = {'de': ['Deutsch'], 'el': ['Ελληνικά'], 'en': ['English'], 'eng': ['English'], 'es': ['español', 'castellano'], 'fr': ['français'], 'fre': ['français'], 'ger': ['Deutsch'], 'gre': ['Ελληνικά'], 'hu': ['Magyar'], 'hun': ['Magyar'], 'it': ['Italiano'], 'ita': ['Italiano'], 'no': ['Norsk'], 'nor': ['Norsk'], 'pl': ['polski'], 'pol': ['polski'], 'por': ['Português'], 'pt': ['Português'], 'ru': ['русский'], 'rus': ['русский'], 'spa': ['español', 'castellano'], 'sv': ['svenska'], 'swe': ['svenska']}
classmethod string_to_alpha_3(s)

Make a best-effort attempt to convert a string to an ISO-639-2 alpha-3 language code.
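The order of checks inside `string_to_alpha_3` is not documented. One plausible best-effort strategy, sketched below, consults the tables this class exposes (`two_to_three`, `english_names_to_three`, `native_names`) in turn and returns `None` when nothing matches. The order of the steps and the small table excerpts are assumptions made for illustration.

```python
# Hypothetical excerpts of the LanguageCodes tables documented above.
TWO_TO_THREE = {"en": "eng", "es": "spa", "de": "ger"}
ENGLISH_NAMES_TO_THREE = {"english": "eng", "spanish": "spa", "castilian": "spa", "german": "ger"}
NATIVE_NAMES = {"es": ["español", "castellano"], "de": ["Deutsch"]}
KNOWN_ALPHA_3 = set(ENGLISH_NAMES_TO_THREE.values())


def string_to_alpha_3(s):
    """Hypothetical best-effort lookup; returns None if nothing matches."""
    if not s:
        return None
    s = s.strip().lower()
    # 1. Already a known alpha-3 code.
    if s in KNOWN_ALPHA_3:
        return s
    # 2. A two-letter ISO 639-1 code, possibly with a region ('es-MX').
    two = s.replace("_", "-").split("-")[0]
    if two in TWO_TO_THREE:
        return TWO_TO_THREE[two]
    # 3. An English language name.
    if s in ENGLISH_NAMES_TO_THREE:
        return ENGLISH_NAMES_TO_THREE[s]
    # 4. A native-language name: resolve via its two-letter key.
    for code, names in NATIVE_NAMES.items():
        if s in (name.lower() for name in names):
            return string_to_alpha_3(code)
    return None
```

For example, `"es-MX"`, `"Spanish"` and `"Castellano"` all resolve to `"spa"` under these assumptions.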

terminologic_code = ''
three_to_two = {'aar': 'aa', 'abk': 'ab', 'afr': 'af', 'aka': 'ak', 'alb': 'sq', 'amh': 'am', 'ara': 'ar', 'arg': 'an', 'arm': 'hy', 'asm': 'as', 'ava': 'av', 'ave': 'ae', 'aym': 'ay', 'aze': 'az', 'bak': 'ba', 'bam': 'bm', 'baq': 'eu', 'bel': 'be', 'ben': 'bn', 'bih': 'bh', 'bis': 'bi', 'bos': 'bs', 'bre': 'br', 'bul': 'bg', 'bur': 'my', 'cat': 'ca', 'cha': 'ch', 'che': 'ce', 'chi': 'zh', 'chu': 'cu', 'chv': 'cv', 'cor': 'kw', 'cos': 'co', 'cre': 'cr', 'cze': 'cs', 'dan': 'da', 'div': 'dv', 'dut': 'nl', 'dzo': 'dz', 'eng': 'en', 'epo': 'eo', 'est': 'et', 'ewe': 'ee', 'fao': 'fo', 'fij': 'fj', 'fin': 'fi', 'fre': 'fr', 'fry': 'fy', 'ful': 'ff', 'geo': 'ka', 'ger': 'de', 'gla': 'gd', 'gle': 'ga', 'glg': 'gl', 'glv': 'gv', 'gre': 'el', 'grn': 'gn', 'guj': 'gu', 'hat': 'ht', 'hau': 'ha', 'heb': 'he', 'her': 'hz', 'hin': 'hi', 'hmo': 'ho', 'hrv': 'hr', 'hun': 'hu', 'ibo': 'ig', 'ice': 'is', 'ido': 'io', 'iii': 'ii', 'iku': 'iu', 'ile': 'ie', 'ina': 'ia', 'ind': 'id', 'ipk': 'ik', 'ita': 'it', 'jav': 'jv', 'jpn': 'ja', 'kal': 'kl', 'kan': 'kn', 'kas': 'ks', 'kau': 'kr', 'kaz': 'kk', 'khm': 'km', 'kik': 'ki', 'kin': 'rw', 'kir': 'ky', 'kom': 'kv', 'kon': 'kg', 'kor': 'ko', 'kua': 'kj', 'kur': 'ku', 'lao': 'lo', 'lat': 'la', 'lav': 'lv', 'lim': 'li', 'lin': 'ln', 'lit': 'lt', 'ltz': 'lb', 'lub': 'lu', 'lug': 'lg', 'mac': 'mk', 'mah': 'mh', 'mal': 'ml', 'mao': 'mi', 'mar': 'mr', 'may': 'ms', 'mlg': 'mg', 'mlt': 'mt', 'mon': 'mn', 'nau': 'na', 'nav': 'nv', 'nbl': 'nr', 'nde': 'nd', 'ndo': 'ng', 'nep': 'ne', 'nno': 'nn', 'nob': 'nb', 'nor': 'no', 'nya': 'ny', 'oci': 'oc', 'oji': 'oj', 'ori': 'or', 'orm': 'om', 'oss': 'os', 'pan': 'pa', 'per': 'fa', 'pli': 'pi', 'pol': 'pl', 'por': 'pt', 'pus': 'ps', 'que': 'qu', 'roh': 'rm', 'rum': 'ro', 'run': 'rn', 'rus': 'ru', 'sag': 'sg', 'san': 'sa', 'sin': 'si', 'slo': 'sk', 'slv': 'sl', 'sme': 'se', 'smo': 'sm', 'sna': 'sn', 'snd': 'sd', 'som': 'so', 'sot': 'st', 'spa': 'es', 'srd': 'sc', 'srp': 'sr', 'ssw': 'ss', 'sun': 'su', 'swa': 
'sw', 'swe': 'sv', 'tah': 'ty', 'tam': 'ta', 'tat': 'tt', 'tel': 'te', 'tgk': 'tg', 'tgl': 'tl', 'tha': 'th', 'tib': 'bo', 'tir': 'ti', 'ton': 'to', 'tsn': 'tn', 'tso': 'ts', 'tuk': 'tk', 'tur': 'tr', 'twi': 'tw', 'uig': 'ug', 'ukr': 'uk', 'urd': 'ur', 'uzb': 'uz', 'ven': 've', 'vie': 'vi', 'vol': 'vo', 'wel': 'cy', 'wln': 'wa', 'wol': 'wo', 'xho': 'xh', 'yid': 'yi', 'yor': 'yo', 'zha': 'za', 'zul': 'zu'}
two_to_three = {'aa': 'aar', 'ab': 'abk', 'ae': 'ave', 'af': 'afr', 'ak': 'aka', 'am': 'amh', 'an': 'arg', 'ar': 'ara', 'as': 'asm', 'av': 'ava', 'ay': 'aym', 'az': 'aze', 'ba': 'bak', 'be': 'bel', 'bg': 'bul', 'bh': 'bih', 'bi': 'bis', 'bm': 'bam', 'bn': 'ben', 'bo': 'tib', 'br': 'bre', 'bs': 'bos', 'ca': 'cat', 'ce': 'che', 'ch': 'cha', 'co': 'cos', 'cr': 'cre', 'cs': 'cze', 'cu': 'chu', 'cv': 'chv', 'cy': 'wel', 'da': 'dan', 'de': 'ger', 'dv': 'div', 'dz': 'dzo', 'ee': 'ewe', 'el': 'gre', 'en': 'eng', 'eo': 'epo', 'es': 'spa', 'et': 'est', 'eu': 'baq', 'fa': 'per', 'ff': 'ful', 'fi': 'fin', 'fj': 'fij', 'fo': 'fao', 'fr': 'fre', 'fy': 'fry', 'ga': 'gle', 'gd': 'gla', 'gl': 'glg', 'gn': 'grn', 'gu': 'guj', 'gv': 'glv', 'ha': 'hau', 'he': 'heb', 'hi': 'hin', 'ho': 'hmo', 'hr': 'hrv', 'ht': 'hat', 'hu': 'hun', 'hy': 'arm', 'hz': 'her', 'ia': 'ina', 'id': 'ind', 'ie': 'ile', 'ig': 'ibo', 'ii': 'iii', 'ik': 'ipk', 'io': 'ido', 'is': 'ice', 'it': 'ita', 'iu': 'iku', 'ja': 'jpn', 'jv': 'jav', 'ka': 'geo', 'kg': 'kon', 'ki': 'kik', 'kj': 'kua', 'kk': 'kaz', 'kl': 'kal', 'km': 'khm', 'kn': 'kan', 'ko': 'kor', 'kr': 'kau', 'ks': 'kas', 'ku': 'kur', 'kv': 'kom', 'kw': 'cor', 'ky': 'kir', 'la': 'lat', 'lb': 'ltz', 'lg': 'lug', 'li': 'lim', 'ln': 'lin', 'lo': 'lao', 'lt': 'lit', 'lu': 'lub', 'lv': 'lav', 'mg': 'mlg', 'mh': 'mah', 'mi': 'mao', 'mk': 'mac', 'ml': 'mal', 'mn': 'mon', 'mr': 'mar', 'ms': 'may', 'mt': 'mlt', 'my': 'bur', 'na': 'nau', 'nb': 'nob', 'nd': 'nde', 'ne': 'nep', 'ng': 'ndo', 'nl': 'dut', 'nn': 'nno', 'no': 'nor', 'nr': 'nbl', 'nv': 'nav', 'ny': 'nya', 'oc': 'oci', 'oj': 'oji', 'om': 'orm', 'or': 'ori', 'os': 'oss', 'pa': 'pan', 'pi': 'pli', 'pl': 'pol', 'ps': 'pus', 'pt': 'por', 'qu': 'que', 'rm': 'roh', 'rn': 'run', 'ro': 'rum', 'ru': 'rus', 'rw': 'kin', 'sa': 'san', 'sc': 'srd', 'sd': 'snd', 'se': 'sme', 'sg': 'sag', 'si': 'sin', 'sk': 'slo', 'sl': 'slv', 'sm': 'smo', 'sn': 'sna', 'so': 'som', 'sq': 'alb', 'sr': 'srp', 'ss': 'ssw', 'st': 'sot', 'su': 
'sun', 'sv': 'swe', 'sw': 'swa', 'ta': 'tam', 'te': 'tel', 'tg': 'tgk', 'th': 'tha', 'ti': 'tir', 'tk': 'tuk', 'tl': 'tgl', 'tn': 'tsn', 'to': 'ton', 'tr': 'tur', 'ts': 'tso', 'tt': 'tat', 'tw': 'twi', 'ty': 'tah', 'ug': 'uig', 'uk': 'ukr', 'ur': 'urd', 'uz': 'uzb', 've': 'ven', 'vi': 'vie', 'vo': 'vol', 'wa': 'wln', 'wo': 'wol', 'xh': 'xho', 'yi': 'yid', 'yo': 'yor', 'za': 'zha', 'zh': 'chi', 'zu': 'zul'}
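`two_to_three` maps ISO 639-1 two-letter codes to ISO 639-2 three-letter codes in their bibliographic (B) form, for example `'de'` to `'ger'` rather than the terminologic `'deu'`, and `three_to_two` goes the other way. The short check below uses entries copied from the two tables above to show that, for these entries, each table is the inverse of the other.

```python
# Entries copied from LanguageCodes.two_to_three above.
two_to_three = {"de": "ger", "fr": "fre", "nl": "dut", "zh": "chi", "en": "eng"}

# Inverting the mapping reproduces the matching three_to_two entries.
three_to_two = {three: two for two, three in two_to_three.items()}

print(three_to_two["ger"])  # de

# Every code survives a round trip in both directions.
assert all(three_to_two[two_to_three[c]] == c for c in two_to_three)
assert all(two_to_three[three_to_two[c]] == c for c in three_to_two)
```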
class core.util.languages.LanguageNames

Bases: object

Utilities for converting between human-readable language names and codes.

LanguageNames.name_re is a regular expression that matches the English or native-language name of nearly any language known to LanguageCodes.

LanguageNames.name_to_codes is a dictionary mapping lowercase human-readable names to ISO-639-2 language codes.
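The sketch below illustrates how a pattern of the same overall shape as `name_re` (a group of `\b`-delimited name alternatives compiled with `re.IGNORECASE`) can be combined with `name_to_codes` to find the languages mentioned in free text. The excerpt, the longest-first ordering, and the helper `codes_in` are hypothetical additions, not part of the documented class.

```python
import re

# Hypothetical excerpt of LanguageNames.name_to_codes.
NAME_TO_CODES = {
    "afar": {"aar"},
    "english": {"eng"},
    "castilian": {"spa"},
    "haitian creole": {"hat"},
}

# Try longer names first so a longer name wins over any shorter name that is its prefix.
names = sorted(NAME_TO_CODES, key=len, reverse=True)
name_re = re.compile(
    "(" + "|".join(r"\b" + re.escape(n) + r"\b" for n in names) + ")",
    re.IGNORECASE,
)


def codes_in(text):
    """Hypothetical helper: return the alpha-3 codes whose names occur in text."""
    found = set()
    for match in name_re.finditer(text):
        found |= NAME_TO_CODES[match.group(1).lower()]
    return found


print(sorted(codes_in("Books in English and Afar")))  # ['aar', 'eng']
```

The word boundaries keep a name from matching inside a longer word, so `"Castilians"` yields no code.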

ignore = {'No linguistic content', 'Not applicable', 'Uncoded'}
irrelevant_suffixes = [' languages']
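A simplified sketch of how `ignore` and `irrelevant_suffixes` could feed the construction of `name_to_codes`, assuming suffixes are stripped before the `ignore` check (which is what lets `'Uncoded'` match the name `'Uncoded languages'`). Values are sets because one name can map to several codes. The real table evidently applies further normalization (for instance `'egyptian'` appears without its `(Ancient)` qualifier, and accent-free spellings such as `'francais'` are present) that this sketch does not reproduce; `build_name_to_codes` and the excerpt are hypothetical.

```python
from collections import defaultdict

# Hypothetical excerpt of a code -> English names table (three-letter keys only).
CODE_TO_NAMES = {
    "afa": ["Afro-Asiatic languages"],
    "mis": ["Uncoded languages"],
    "spa": ["Spanish", "Castilian"],
    "zxx": ["No linguistic content", "Not applicable"],
}
IGNORE = {"No linguistic content", "Not applicable", "Uncoded"}
IRRELEVANT_SUFFIXES = [" languages"]


def build_name_to_codes(code_to_names):
    """Hypothetical sketch: map lowercase names to sets of alpha-3 codes."""
    name_to_codes = defaultdict(set)
    for code, names in code_to_names.items():
        for name in names:
            # Strip suffixes such as " languages" first,
            # so "Uncoded languages" becomes "Uncoded".
            for suffix in IRRELEVANT_SUFFIXES:
                if name.endswith(suffix):
                    name = name[: -len(suffix)]
            # Skip placeholder entries that name no actual language.
            if name in IGNORE:
                continue
            name_to_codes[name.lower()].add(code)
    return dict(name_to_codes)


print(build_name_to_codes(CODE_TO_NAMES))
# {'afro-asiatic': {'afa'}, 'spanish': {'spa'}, 'castilian': {'spa'}}
```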
name_re = re.compile("(\\bafar\\b|\\babkhazian\\b|\\bachinese\\b|\\bacoli\\b|\\badangme\\b|\\badyghe\\b|\\badygei\\b|\\bafro-asiatic\\b|\\bafrihili\\b|\\bafrikaans\\b|\\bainu\\b|\\bakan\\b|\\bakkadian\\b|\\balbanian\\b|\\, re.IGNORECASE)
name_to_codes = {'abkhazian': {'abk'}, 'achinese': {'ace'}, 'acoli': {'ach'}, 'adangme': {'ada'}, 'adygei': {'ady'}, 'adyghe': {'ady'}, 'afar': {'aar'}, 'afrihili': {'afh'}, 'afrikaans': {'afr'}, 'afro-asiatic': {'afa'}, 'ainu': {'ain'}, 'akan': {'aka'}, 'akkadian': {'akk'}, 'albanian': {'alb'}, 'alemannic': {'gsw'}, 'aleut': {'ale'}, 'algonquian': {'alg'}, 'alsatian': {'gsw'}, 'altaic': {'tut'}, 'amharic': {'amh'}, 'angika': {'anp'}, 'apache': {'apa'}, 'arabic': {'ara'}, 'aragonese': {'arg'}, 'arapaho': {'arp'}, 'arawak': {'arw'}, 'armenian': {'arm'}, 'aromanian': {'rup'}, 'artificial': {'art'}, 'arumanian': {'rup'}, 'assamese': {'asm'}, 'asturian': {'ast'}, 'asturleonese': {'ast'}, 'athapascan': {'ath'}, 'australian': {'aus'}, 'austronesian': {'map'}, 'avaric': {'ava'}, 'avestan': {'ave'}, 'awadhi': {'awa'}, 'aymara': {'aym'}, 'azerbaijani': {'aze'}, 'bable': {'ast'}, 'balinese': {'ban'}, 'baltic': {'bat'}, 'baluchi': {'bal'}, 'bambara': {'bam'}, 'bamileke': {'bai'}, 'banda': {'bad'}, 'bantu': {'bnt'}, 'basa': {'bas'}, 'bashkir': {'bak'}, 'basque': {'baq'}, 'batak': {'btk'}, 'bedawiyet': {'bej'}, 'beja': {'bej'}, 'belarusian': {'bel'}, 'bemba': {'bem'}, 'bengali': {'ben'}, 'berber': {'ber'}, 'bhojpuri': {'bho'}, 'bihari': {'bih'}, 'bikol': {'bik'}, 'bilin': {'byn'}, 'bini': {'bin'}, 'bislama': {'bis'}, 'blin': {'byn'}, 'bliss': {'zbl'}, 'blissymbolics': {'zbl'}, 'blissymbols': {'zbl'}, 'bokmål, norwegian': {'nob'}, 'bosnian': {'bos'}, 'braj': {'bra'}, 'breton': {'bre'}, 'buginese': {'bug'}, 'bulgarian': {'bul'}, 'buriat': {'bua'}, 'burmese': {'bur'}, 'caddo': {'cad'}, 'castellano': {'spa'}, 'castilian': {'spa'}, 'catalan': {'cat'}, 'caucasian': {'cau'}, 'cebuano': {'ceb'}, 'celtic': {'cel'}, 'central american indian': {'cai'}, 'central khmer': {'khm'}, 'chagatai': {'chg'}, 'chamic': {'cmc'}, 'chamorro': {'cha'}, 'chechen': {'che'}, 'cherokee': {'chr'}, 'chewa': {'nya'}, 'cheyenne': {'chy'}, 'chibcha': {'chb'}, 'chichewa': {'nya'}, 'chinese': {'chi'}, 'chinook 
jargon': {'chn'}, 'chipewyan': {'chp'}, 'choctaw': {'cho'}, 'chuang': {'zha'}, 'church slavic': {'chu'}, 'church slavonic': {'chu'}, 'chuukese': {'chk'}, 'chuvash': {'chv'}, 'classical nepal bhasa': {'nwc'}, 'classical newari': {'nwc'}, 'classical syriac': {'syc'}, 'cook islands maori': {'rar'}, 'coptic': {'cop'}, 'cornish': {'cor'}, 'corsican': {'cos'}, 'cree': {'cre'}, 'creek': {'mus'}, 'creoles and pidgins': {'crp'}, 'creoles and pidgins, english based': {'cpe'}, 'creoles and pidgins, french-based': {'cpf'}, 'creoles and pidgins, portuguese-based': {'cpp'}, 'crimean tatar': {'crh'}, 'crimean turkish': {'crh'}, 'croatian': {'hrv'}, 'cushitic': {'cus'}, 'czech': {'cze'}, 'dakota': {'dak'}, 'danish': {'dan'}, 'dargwa': {'dar'}, 'delaware': {'del'}, 'dene suline': {'chp'}, 'deutsch': {'ger'}, 'dhivehi': {'div'}, 'dimili': {'zza'}, 'dimli': {'zza'}, 'dinka': {'din'}, 'divehi': {'div'}, 'dogri': {'doi'}, 'dogrib': {'dgr'}, 'dravidian': {'dra'}, 'duala': {'dua'}, 'dutch': {'dut'}, 'dyula': {'dyu'}, 'dzongkha': {'dzo'}, 'eastern frisian': {'frs'}, 'edo': {'bin'}, 'efik': {'efi'}, 'egyptian': {'egy'}, 'ekajuk': {'eka'}, 'elamite': {'elx'}, 'english': {'eng'}, 'erzya': {'myv'}, 'espanol': {'spa'}, 'español, castellano': {'spa'}, 'esperanto': {'epo'}, 'estonian': {'est'}, 'ewe': {'ewe'}, 'ewondo': {'ewo'}, 'fang': {'fan'}, 'fanti': {'fat'}, 'faroese': {'fao'}, 'fijian': {'fij'}, 'filipino': {'fil'}, 'finnish': {'fin'}, 'finno-ugrian': {'fiu'}, 'flemish': {'dut'}, 'fon': {'fon'}, 'francais': {'fre'}, 'français': {'fre'}, 'french': {'fre'}, 'friulian': {'fur'}, 'fulah': {'ful'}, 'ga': {'gaa'}, 'gaelic': {'gla'}, 'galibi carib': {'car'}, 'galician': {'glg'}, 'ganda': {'lug'}, 'gayo': {'gay'}, 'gbaya': {'gba'}, 'geez': {'gez'}, 'georgian': {'geo'}, 'german': {'ger'}, 'german, low': {'nds'}, 'germanic': {'gem'}, 'gikuyu': {'kik'}, 'gilbertese': {'gil'}, 'gondi': {'gon'}, 'gorontalo': {'gor'}, 'gothic': {'got'}, 'grebo': {'grb'}, 'greek': {'gre'}, 'greenlandic': {'kal'}, 
'guarani': {'grn'}, 'gujarati': {'guj'}, "gwich'in": {'gwi'}, 'haida': {'hai'}, 'haitian': {'hat'}, 'haitian creole': {'hat'}, 'hausa': {'hau'}, 'hawaiian': {'haw'}, 'hebrew': {'heb'}, 'herero': {'her'}, 'hiligaynon': {'hil'}, 'himachali': {'him'}, 'hindi': {'hin'}, 'hiri motu': {'hmo'}, 'hittite': {'hit'}, 'hmong': {'hmn'}, 'hungarian': {'hun'}, 'hupa': {'hup'}, 'iban': {'iba'}, 'icelandic': {'ice'}, 'ido': {'ido'}, 'igbo': {'ibo'}, 'ijo': {'ijo'}, 'iloko': {'ilo'}, 'inari sami': {'smn'}, 'indic': {'inc'}, 'indo-european': {'ine'}, 'indonesian': {'ind'}, 'ingush': {'inh'}, 'interlingua': {'ina'}, 'interlingue': {'ile'}, 'inuktitut': {'iku'}, 'inupiaq': {'ipk'}, 'iranian': {'ira'}, 'irish': {'gle'}, 'iroquoian': {'iro'}, 'italian': {'ita'}, 'italiano': {'ita'}, 'japanese': {'jpn'}, 'javanese': {'jav'}, 'jingpho': {'kac'}, 'judeo-arabic': {'jrb'}, 'judeo-persian': {'jpr'}, 'kabardian': {'kbd'}, 'kabyle': {'kab'}, 'kachin': {'kac'}, 'kalaallisut': {'kal'}, 'kalmyk': {'xal'}, 'kamba': {'kam'}, 'kannada': {'kan'}, 'kanuri': {'kau'}, 'kapampangan': {'pam'}, 'kara-kalpak': {'kaa'}, 'karachay-balkar': {'krc'}, 'karelian': {'krl'}, 'karen': {'kar'}, 'kashmiri': {'kas'}, 'kashubian': {'csb'}, 'kawi': {'kaw'}, 'kazakh': {'kaz'}, 'khasi': {'kha'}, 'khoisan': {'khi'}, 'khotanese': {'kho'}, 'kikuyu': {'kik'}, 'kimbundu': {'kmb'}, 'kinyarwanda': {'kin'}, 'kirdki': {'zza'}, 'kirghiz': {'kir'}, 'kirmanjki': {'zza'}, 'klingon': {'tlh'}, 'komi': {'kom'}, 'kongo': {'kon'}, 'konkani': {'kok'}, 'korean': {'kor'}, 'kosraean': {'kos'}, 'kpelle': {'kpe'}, 'kru': {'kro'}, 'kuanyama': {'kua'}, 'kumyk': {'kum'}, 'kurdish': {'kur'}, 'kurukh': {'kru'}, 'kutenai': {'kut'}, 'kwanyama': {'kua'}, 'kyrgyz': {'kir'}, 'ladino': {'lad'}, 'lahnda': {'lah'}, 'lamba': {'lam'}, 'land dayak': {'day'}, 'lao': {'lao'}, 'latin': {'lat'}, 'latvian': {'lav'}, 'leonese': {'ast'}, 'letzeburgesch': {'ltz'}, 'lezghian': {'lez'}, 'limburgan': {'lim'}, 'limburger': {'lim'}, 'limburgish': {'lim'}, 'lingala': {'lin'}, 
'lithuanian': {'lit'}, 'lojban': {'jbo'}, 'low german': {'nds'}, 'low saxon': {'nds'}, 'lower sorbian': {'dsb'}, 'lozi': {'loz'}, 'luba-katanga': {'lub'}, 'luba-lulua': {'lua'}, 'luiseno': {'lui'}, 'lule sami': {'smj'}, 'lunda': {'lun'}, 'luo': {'luo'}, 'lushai': {'lus'}, 'luxembourgish': {'ltz'}, 'macedo-romanian': {'rup'}, 'macedonian': {'mac'}, 'madurese': {'mad'}, 'magahi': {'mag'}, 'magyar': {'hun'}, 'maithili': {'mai'}, 'makasar': {'mak'}, 'malagasy': {'mlg'}, 'malay': {'may'}, 'malayalam': {'mal'}, 'maldivian': {'div'}, 'maltese': {'mlt'}, 'manchu': {'mnc'}, 'mandar': {'mdr'}, 'mandingo': {'man'}, 'manipuri': {'mni'}, 'manobo': {'mno'}, 'manx': {'glv'}, 'maori': {'mao'}, 'mapuche': {'arn'}, 'mapudungun': {'arn'}, 'marathi': {'mar'}, 'mari': {'chm'}, 'marshallese': {'mah'}, 'marwari': {'mwr'}, 'masai': {'mas'}, 'mayan': {'myn'}, 'mende': {'men'}, "mi'kmaq": {'mic'}, 'micmac': {'mic'}, 'minangkabau': {'min'}, 'mirandese': {'mwl'}, 'mohawk': {'moh'}, 'moksha': {'mdf'}, 'moldavian': {'rum'}, 'moldovan': {'rum'}, 'mon-khmer': {'mkh'}, 'mong': {'hmn'}, 'mongo': {'lol'}, 'mongolian': {'mon'}, 'mossi': {'mos'}, 'multiple': {'mul'}, 'munda': {'mun'}, "n'ko": {'nqo'}, 'nahuatl': {'nah'}, 'nauru': {'nau'}, 'navaho': {'nav'}, 'navajo': {'nav'}, 'ndebele, north': {'nde'}, 'ndebele, south': {'nbl'}, 'ndonga': {'ndo'}, 'neapolitan': {'nap'}, 'nepal bhasa': {'new'}, 'nepali': {'nep'}, 'newari': {'new'}, 'nias': {'nia'}, 'niger-kordofanian': {'nic'}, 'nilo-saharan': {'ssa'}, 'niuean': {'niu'}, 'nogai': {'nog'}, 'norse, old': {'non'}, 'norsk': {'nor'}, 'north american indian': {'nai'}, 'north ndebele': {'nde'}, 'northern frisian': {'frr'}, 'northern sami': {'sme'}, 'northern sotho': {'nso'}, 'norwegian': {'nor'}, 'norwegian bokmål': {'nob'}, 'norwegian nynorsk': {'nno'}, 'nubian': {'nub'}, 'nuosu': {'iii'}, 'nyamwezi': {'nym'}, 'nyanja': {'nya'}, 'nyankole': {'nyn'}, 'nynorsk, norwegian': {'nno'}, 'nyoro': {'nyo'}, 'nzima': {'nzi'}, 'occidental': {'ile'}, 'occitan': {'oci'}, 
'oirat': {'xal'}, 'ojibwa': {'oji'}, 'old bulgarian': {'chu'}, 'old church slavonic': {'chu'}, 'old newari': {'nwc'}, 'old slavonic': {'chu'}, 'oriya': {'ori'}, 'oromo': {'orm'}, 'osage': {'osa'}, 'ossetian': {'oss'}, 'ossetic': {'oss'}, 'otomian': {'oto'}, 'pahlavi': {'pal'}, 'palauan': {'pau'}, 'pali': {'pli'}, 'pampanga': {'pam'}, 'pangasinan': {'pag'}, 'panjabi': {'pan'}, 'papiamento': {'pap'}, 'papuan': {'paa'}, 'pashto': {'pus'}, 'pedi': {'nso'}, 'persian': {'per'}, 'philippine': {'phi'}, 'phoenician': {'phn'}, 'pilipino': {'fil'}, 'pohnpeian': {'pon'}, 'polish': {'pol'}, 'polski': {'pol'}, 'portugues': {'por'}, 'portuguese': {'por'}, 'português': {'por'}, 'prakrit': {'pra'}, 'provençal': {'oci'}, 'punjabi': {'pan'}, 'pushto': {'pus'}, 'quechua': {'que'}, 'rajasthani': {'raj'}, 'rapanui': {'rap'}, 'rarotongan': {'rar'}, 'reserved for local use': {'qaa-qtz'}, 'romance': {'roa'}, 'romanian': {'rum'}, 'romansh': {'roh'}, 'romany': {'rom'}, 'rundi': {'run'}, 'russian': {'rus'}, 'sakan': {'kho'}, 'salishan': {'sal'}, 'samaritan aramaic': {'sam'}, 'sami': {'smi'}, 'samoan': {'smo'}, 'sandawe': {'sad'}, 'sango': {'sag'}, 'sanskrit': {'san'}, 'santali': {'sat'}, 'sardinian': {'srd'}, 'sasak': {'sas'}, 'saxon, low': {'nds'}, 'scots': {'sco'}, 'scottish gaelic': {'gla'}, 'selkup': {'sel'}, 'semitic': {'sem'}, 'sepedi': {'nso'}, 'serbian': {'srp'}, 'serer': {'srr'}, 'shan': {'shn'}, 'shona': {'sna'}, 'sichuan yi': {'iii'}, 'sicilian': {'scn'}, 'sidamo': {'sid'}, 'sign languages': {'sgn'}, 'siksika': {'bla'}, 'sindhi': {'snd'}, 'sinhala': {'sin'}, 'sinhalese': {'sin'}, 'sino-tibetan': {'sit'}, 'siouan': {'sio'}, 'skolt sami': {'sms'}, 'slave': {'den'}, 'slavic': {'sla'}, 'slovak': {'slo'}, 'slovenian': {'slv'}, 'sogdian': {'sog'}, 'somali': {'som'}, 'songhai': {'son'}, 'soninke': {'snk'}, 'sorbian': {'wen'}, 'sotho, southern': {'sot'}, 'south american indian': {'sai'}, 'south ndebele': {'nbl'}, 'southern altai': {'alt'}, 'southern sami': {'sma'}, 'spanish': {'spa'}, 
'sranan tongo': {'srn'}, 'standard moroccan tamazight': {'zgh'}, 'sukuma': {'suk'}, 'sumerian': {'sux'}, 'sundanese': {'sun'}, 'susu': {'sus'}, 'svenska': {'swe'}, 'swahili': {'swa'}, 'swati': {'ssw'}, 'swedish': {'swe'}, 'swiss german': {'gsw'}, 'syriac': {'syr'}, 'tagalog': {'tgl'}, 'tahitian': {'tah'}, 'tai': {'tai'}, 'tajik': {'tgk'}, 'tamashek': {'tmh'}, 'tamil': {'tam'}, 'tatar': {'tat'}, 'telugu': {'tel'}, 'tereno': {'ter'}, 'tetum': {'tet'}, 'thai': {'tha'}, 'tibetan': {'tib'}, 'tigre': {'tig'}, 'tigrinya': {'tir'}, 'timne': {'tem'}, 'tiv': {'tiv'}, 'tlhingan-hol': {'tlh'}, 'tlingit': {'tli'}, 'tok pisin': {'tpi'}, 'tokelau': {'tkl'}, 'tonga': {'tog', 'ton'}, 'tsimshian': {'tsi'}, 'tsonga': {'tso'}, 'tswana': {'tsn'}, 'tumbuka': {'tum'}, 'tupi': {'tup'}, 'turkish': {'tur'}, 'turkmen': {'tuk'}, 'tuvalu': {'tvl'}, 'tuvinian': {'tyv'}, 'twi': {'twi'}, 'udmurt': {'udm'}, 'ugaritic': {'uga'}, 'uighur': {'uig'}, 'ukrainian': {'ukr'}, 'umbundu': {'umb'}, 'uncoded': {'mis'}, 'undetermined': {'und'}, 'upper sorbian': {'hsb'}, 'urdu': {'urd'}, 'uyghur': {'uig'}, 'uzbek': {'uzb'}, 'vai': {'vai'}, 'valencian': {'cat'}, 'venda': {'ven'}, 'vietnamese': {'vie'}, 'volapük': {'vol'}, 'votic': {'vot'}, 'wakashan': {'wak'}, 'walamo': {'wal'}, 'walloon': {'wln'}, 'waray': {'war'}, 'washo': {'was'}, 'welsh': {'wel'}, 'western frisian': {'fry'}, 'western pahari': {'him'}, 'wolof': {'wol'}, 'xhosa': {'xho'}, 'yakut': {'sah'}, 'yao': {'yao'}, 'yapese': {'yap'}, 'yiddish': {'yid'}, 'yoruba': {'yor'}, 'yupik': {'ypk'}, 'zande': {'znd'}, 'zapotec': {'zap'}, 'zaza': {'zza'}, 'zazaki': {'zza'}, 'zenaga': {'zen'}, 'zhuang': {'zha'}, 'zulu': {'zul'}, 'zuni': {'zun'}, 'ελληνικά': {'gre'}, 'русский': {'rus'}}
number = re.compile('[0-9]')
parentheses = re.compile('\\([^)]+\\)')
class core.util.languages.LookupTable[source]

Bases: dict

Return None for x[key] when ‘key’ isn’t in the dictionary, rather than raising a KeyError.
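A minimal sketch of how such a dict subclass can behave, assuming it relies on the standard `__missing__` hook that `dict.__getitem__` calls on subclasses (the actual implementation is not shown here; `LookupTableSketch` is a hypothetical name):

```python
class LookupTableSketch(dict):
    """A dict that returns None for missing keys instead of raising KeyError."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ on subclasses when a key is absent;
        # whatever it returns becomes the result of x[key].
        return None


codes = LookupTableSketch(eng="English")
print(codes["eng"])  # English
print(codes["xyz"])  # None
```

Membership tests are unaffected: `"xyz" in codes` is still False, because `__missing__` does not insert the key.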

core.util.median module

core.util.median.median(numbers)[source]

core.util.opds_writer module

class core.util.opds_writer.AtomFeed(title, url, **kwargs)[source]

Bases: object

APP_NS = 'http://www.w3.org/2007/app'
ATOM_LIKE_TYPES = ['application/atom+xml', 'application/xml']
ATOM_NS = 'http://www.w3.org/2005/Atom'
ATOM_TYPE = 'application/atom+xml'
BIBFRAME_NS = 'http://bibframe.org/vocab/'
BIB_SCHEMA_NS = 'http://bib.schema.org/'
DCTERMS_NS = 'http://purl.org/dc/terms/'
DRM_NS = 'http://librarysimplified.org/terms/drm'
E

A helper object for creating etree elements.

LCP_NS = 'http://readium.org/lcp-specs/ns'
OPDS_NS = 'http://opds-spec.org/2010/catalog'
OPENSEARCH_NS = 'http://a9.com/-/spec/opensearch/1.1/'
OPF_NS = 'http://www.idpf.org/2007/opf'
SCHEMA

A helper object for creating etree elements.

SCHEMA_NS = 'http://schema.org/'
SIMPLIFIED

A helper object for creating etree elements.

SIMPLIFIED_NS = 'http://librarysimplified.org/terms/'
TIME_FORMAT_NAIVE = '%Y-%m-%dT%H:%M:%SZ'
TIME_FORMAT_UTC = '%Y-%m-%dT%H:%M:%S+00:00'
classmethod add_link_to_entry(entry, children=None, **kwargs)[source]
classmethod author(*args, **kwargs)[source]
classmethod category(*args, **kwargs)[source]
classmethod contributor(*args, **kwargs)[source]
default_typemap = {<module 'datetime'>: <function AtomFeed.<lambda>>}
classmethod entry(*args, **kwargs)[source]
classmethod id(*args, **kwargs)[source]
classmethod makeelement(*args, **kwargs)[source]
classmethod name(*args, **kwargs)[source]
nsmap = {None: 'http://www.w3.org/2005/Atom', 'app': 'http://www.w3.org/2007/app', 'dcterms': 'http://purl.org/dc/terms/', 'opds': 'http://opds-spec.org/2010/catalog', 'opf': 'http://www.idpf.org/2007/opf', 'drm': 'http://librarysimplified.org/terms/drm', 'schema': 'http://schema.org/', 'simplified': 'http://librarysimplified.org/terms/', 'bibframe': 'http://bibframe.org/vocab/', 'bib': 'http://bib.schema.org/', 'opensearch': 'http://a9.com/-/spec/opensearch/1.1/', 'lcp': 'http://readium.org/lcp-specs/ns'}
classmethod schema_(field_name)[source]
classmethod summary(*args, **kwargs)[source]
classmethod title(*args, **kwargs)[source]
classmethod update(*args, **kwargs)[source]
classmethod updated(*args, **kwargs)[source]
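The two time formats above differ only in how they spell the UTC designator. The sketch below, which simply reuses the format strings listed for AtomFeed, shows what each produces. Neither format converts time zones, so this assumes the caller already has a UTC datetime.

```python
from datetime import datetime, timezone

# Format strings copied from the AtomFeed attributes above.
TIME_FORMAT_NAIVE = "%Y-%m-%dT%H:%M:%SZ"
TIME_FORMAT_UTC = "%Y-%m-%dT%H:%M:%S+00:00"

moment = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
naive_text = moment.strftime(TIME_FORMAT_NAIVE)
utc_text = moment.strftime(TIME_FORMAT_UTC)
print(naive_text)  # 2024-01-02T03:04:05Z
print(utc_text)    # 2024-01-02T03:04:05+00:00
```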
class core.util.opds_writer.ElementMaker[source]

Bases: ElementMaker

A helper object for creating etree elements.
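The ElementMaker base class appears to be an lxml-style element builder, though the listing does not say which package it comes from. As a rough, standard-library-only illustration of the idea, the sketch below builds namespaced Atom elements with xml.etree.ElementTree using Clark notation (`{namespace}tag`). `make_element` is a hypothetical helper, not the ElementMaker API.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"  # value of AtomFeed.ATOM_NS


def make_element(namespace, tag, text=None, **attrib):
    """Hypothetical helper: build an element whose tag is in `namespace`."""
    element = ET.Element(f"{{{namespace}}}{tag}", attrib)
    if text is not None:
        element.text = text
    return element


entry = make_element(ATOM_NS, "entry")
entry.append(make_element(ATOM_NS, "title", "Hello World"))
entry.append(make_element(ATOM_NS, "link", rel="alternate", href="https://example.com/"))

# Serialize with Atom as the default namespace, as in AtomFeed.nsmap.
ET.register_namespace("", ATOM_NS)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```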

class core.util.opds_writer.OPDSFeed(title, url)[source]

Bases: AtomFeed

ACQUISITION_FEED_TYPE = 'application/atom+xml;profile=opds-catalog;kind=acquisition'
ACQUISITION_REL = 'http://opds-spec.org/acquisition'
BORROW_REL = 'http://opds-spec.org/acquisition/borrow'
DEFAULT_MAX_AGE = 600
ENTRY_TYPE = 'application/atom+xml;type=entry;profile=opds-catalog'
EPUB_MEDIA_TYPE = 'application/epub+zip'
FEATURED_REL = 'http://opds-spec.org/featured'
FULL_IMAGE_REL = 'http://opds-spec.org/image'
GROUP_REL = 'collection'
NAVIGATION_FEED_TYPE = 'application/atom+xml;profile=opds-catalog;kind=navigation'
NO_TITLE = 'http://librarysimplified.org/terms/problem/no-title'
OPEN_ACCESS_REL = 'http://opds-spec.org/acquisition/open-access'
POPULAR_REL = 'http://opds-spec.org/sort/popular'
RECOMMENDED_REL = 'http://opds-spec.org/recommended'
REVOKE_LOAN_REL = 'http://librarysimplified.org/terms/rel/revoke'
class core.util.opds_writer.OPDSMessage(urn, status_code, message)[source]

Bases: object

An indication that an <entry> could not be created for an identifier.

Inserted into an OPDS feed as an extension tag.

property tag

core.util.permanent_work_id module

class core.util.permanent_work_id.WorkIDCalculator[source]

Bases: object

apostropheStrip = re.compile("'s")
authorExtract1 = re.compile('^(.+?)\\spresents.*$')
authorExtract2 = re.compile('^(?:(?:a|an)\\s)?(.+?)\\spresentation.*$')
bracketedCharacterStrip = re.compile('\\[(.*?)\\]')
commonAuthorPrefixPattern = re.compile('^(?:edited by|by the editors of|by|chosen by|translated by|prepared by|translated and edited by|completely rev by|pictures by|selected and adapted by|with a foreword by|with a new foreword by|introd )
commonAuthorSuffixPattern = re.compile('^(.+?)\\s(?:general editor|editor|editor in chief|etc|inc|inc\\setc|co|corporation|llc|partners|company|home entertainment)$')
commonSubtitlesPattern = re.compile('^(.*?)((a|una)\\s(.*)novel(a|la)?|a(.*)memoir|a(.*)mystery|a(.*)thriller|by\\s(.+)|a novel of .*|stories|an autobiography|a biography|a memoir in books|\\d+\\S*\\s*ed(ition)?|\\d+\\S*\\s*update|1st\\)
consecutiveCharacterStrip = re.compile('\\s{2,}')
distributedByRemoval = re.compile('^distributed (?:in.*\\s)?by\\s(.+)$')
find = '10th'
format_to_grouping_category = {'Adobe_EPUB_eBook': 'ebook', 'Adobe_PDF_eBook': 'ebook', 'Atlas': 'other', 'Blu-ray': 'movie', 'Book': 'book', 'Braille': 'book', 'CDROM': 'other', 'Chart': 'other', 'ChipCartridge': 'other', 'Collage': 'other', 'CompactDisc': 'audio', 'DVD': 'movie', 'DiscCartridge': 'other', 'Disney_Online_Book': 'ebook', 'Drawing': 'other', 'Electronic': 'other', 'Filmstrip': 'movie', 'FlashCard': 'other', 'FloppyDisk': 'other', 'Globe': 'other', 'Journal': 'book', 'Kindle_Book': 'ebook', 'Kit': 'other', 'LargePrint': 'book', 'Manuscript': 'book', 'Map': 'other', 'Microfilm': 'other', 'Microsoft_eBook': 'ebook', 'Mobipocket_eBook': 'ebook', 'MotionPicture': 'movie', 'MusicRecording': 'music', 'MusicalScore': 'book', 'Newspaper': 'book', 'Open_EPUB_eBook': 'ebook', 'Open_PDF_eBook': 'ebook', 'OverDrive_MP3_Audiobook': 'audio', 'OverDrive_Music': 'music', 'OverDrive_Read': 'ebook', 'OverDrive_Video': 'movie', 'OverDrive_WMA_Audiobook': 'audio', 'Painting': 'other', 'Palm': 'ebook', 'Phonograph': 'audio', 'Photo': 'other', 'Photonegative': 'other', 'PhysicalObject': 'other', 'Playaway': 'audio', 'Print': 'other', 'SeedPacket': 'other', 'SensorImage': 'other', 'Serial': 'book', 'Slide': 'other', 'Software': 'other', 'SoundCassette': 'audio', 'SoundDisc': 'audio', 'SoundRecording': 'audio', 'TapeCartridge': 'other', 'TapeCassette': 'other', 'TapeRecording': 'audio', 'TapeReel': 'other', 'Transparency': 'other', 'Unknown': 'other', 'VerticalFile': 'other', 'Video': 'movie', 'VideoCartridge': 'movie', 'VideoCassette': 'movie', 'VideoDisc': 'movie', 'VideoReel': 'movie', 'eBook': 'ebook', 'eContent': 'ebook', 'epub': 'ebook', 'externalLink': 'ebook', 'externalMP3': 'audio', 'external_eaudio': 'audio', 'external_ebook': 'ebook', 'external_emusic': 'music', 'external_evideo': 'movie', 'external_web': 'ebook', 'gif': 'other', 'gifs': 'other', 'interactiveBook': 'ebook', 'itunes': 'audio', 'jpg': 'other', 'kindle': 'ebook', 'mp3': 'audio', 'overdrive': 
'ebook', 'pdf': 'ebook', 'plucker': 'ebook', 'text': 'ebook'}
initialsFix = re.compile('(?<=[A-Z])\\.(?=(\\s|[A-Z]|$))')
classmethod make_value_sortable(curtitle)[source]
classmethod normalize_author(author)[source]

Converts to NFKD unicode. Strips brackets, special characters, and dots. Collapses whitespace to single spaces and strips trailing spaces. Strips movie studio language surrounding the possible author’s name. Lowercases.

Returns de-linted author’s name.
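The steps listed for normalize_author can be approximated with the regular expressions shown as WorkIDCalculator attributes. This is a sketch, not the actual method: the order of the steps and the use of only `authorExtract1` for studio language are assumptions, and `normalize_author_sketch` is a hypothetical name.

```python
import re
import unicodedata

# Patterns copied from the WorkIDCalculator attributes listed above.
bracketedCharacterStrip = re.compile(r"\[(.*?)\]")
initialsFix = re.compile(r"(?<=[A-Z])\.(?=(\s|[A-Z]|$))")
specialCharacterStrip = re.compile(r"[^\w\d\s]")
consecutiveCharacterStrip = re.compile(r"\s{2,}")
authorExtract1 = re.compile(r"^(.+?)\spresents.*$")


def normalize_author_sketch(author):
    """Hypothetical re-creation of the documented normalize_author steps."""
    author = unicodedata.normalize("NFKD", author)  # "É" -> "E" + combining accent
    author = bracketedCharacterStrip.sub("", author)  # drop "[...]" annotations
    author = initialsFix.sub("", author)  # "J.R.R." -> "JRR"
    author = specialCharacterStrip.sub("", author)  # drop punctuation and accents
    author = consecutiveCharacterStrip.sub(" ", author).strip()
    studio = authorExtract1.match(author)  # "Disney presents ..." -> "Disney"
    if studio:
        author = studio.group(1)
    return author.lower()


print(normalize_author_sketch("J.R.R. Tolkien [Author]"))  # jrr tolkien
```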

classmethod normalize_subtitle(original_title)[source]
classmethod normalize_title(full_title, num_non_filing_characters=0)[source]

Converts to NFKD unicode. Strips brackets and special characters. Splits the title into title and subtitle portions (normalizing the subtitle). Lowercases.
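A minimal sketch of the title/subtitle split, assuming it happens at the first character matched by `subtitleIndicator` and that `num_non_filing_characters` follows the MARC convention of skipping a leading article. The real method also runs normalize_subtitle (with patterns such as `commonSubtitlesPattern`), which is omitted here; `split_title_sketch` is hypothetical.

```python
import re
import unicodedata

# Patterns copied from the WorkIDCalculator attributes listed above.
subtitleIndicator = re.compile(r"[:;/=]")
bracketedCharacterStrip = re.compile(r"\[(.*?)\]")
specialCharacterStrip = re.compile(r"[^\w\d\s]")
consecutiveCharacterStrip = re.compile(r"\s{2,}")


def _clean(text):
    text = specialCharacterStrip.sub("", text)
    return consecutiveCharacterStrip.sub(" ", text).strip().lower()


def split_title_sketch(full_title, num_non_filing_characters=0):
    """Hypothetical: return (title, subtitle) after normalization."""
    # Assumed MARC-style meaning: skip a leading article such as "The ".
    text = full_title[num_non_filing_characters:]
    text = unicodedata.normalize("NFKD", text)
    text = bracketedCharacterStrip.sub("", text)
    # Split once, at the first subtitle separator.
    parts = subtitleIndicator.split(text, maxsplit=1)
    title = _clean(parts[0])
    subtitle = _clean(parts[1]) if len(parts) > 1 else None
    return title, subtitle


print(split_title_sketch("The Hobbit: or, There and Back Again [electronic resource]", 4))
```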

numerics = [(re.compile('1st'), 'first'), (re.compile('2nd'), 'second'), (re.compile('3rd'), 'third'), (re.compile('4th'), 'fourth'), (re.compile('5th'), 'fifth'), (re.compile('6th'), 'sixth'), (re.compile('7th'), 'seventh'), (re.compile('8th'), 'eighth'), (re.compile('9th'), 'ninth'), (re.compile('10th'), 'tenth')]
classmethod permanent_id(normalized_title, normalized_author, grouping_category)[source]
replace = 'tenth'
sortTrimmingPattern = re.compile('(?i)^(?:(?:a|an|the|el|la|"|\')\\s)(.*)$', re.IGNORECASE)
specialCharacterStrip = re.compile('[^\\w\\d\\s]')
subtitleIndicator = re.compile('[:;/=]')

core.util.personal_names module

core.util.personal_names.contributor_name_match_ratio(name1, name2, normalize_names=True)[source]

Returns a number between 0 and 100, representing the percent match (Levenshtein Distance) between name1 and name2, after each has been normalized.
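The docstring does not say how the Levenshtein distance is scaled to a 0–100 score, and libraries differ on this. The sketch below uses one common choice, `100 * (1 - distance / length of the longer string)`, applied to names that are assumed to be normalized already; `levenshtein` and `match_ratio` are hypothetical helpers.

```python
def levenshtein(a, b):
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def match_ratio(name1, name2):
    """Percent similarity in [0, 100]; 100 means identical (assumed scaling)."""
    longest = max(len(name1), len(name2))
    if longest == 0:
        return 100
    return round(100 * (1 - levenshtein(name1, name2) / longest))


print(levenshtein("kitten", "sitting"), match_ratio("kitten", "sitting"))  # 3 57
```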

core.util.personal_names.display_name_to_sort_name(display_name)[source]

Take the “First Name Last Name”-formatted display_name, and convert it to a “Last Name, First Name” format appropriate for searching and sorting by.

First checks whether display_name fits what we know of corporate business names. If it does, the whole name is used as-is, without conversion.

Uses the HumanName library to try to parse the name into parts, and rearrange the parts into desired order and format.

core.util.personal_names.is_corporate_name(display_name)[source]

Does this display name look like a corporate name?

core.util.personal_names.is_one_name(human_name)[source]

Examples: ‘Pope Francis’, ‘Prince’.

core.util.personal_names.name_tidy(name)[source]
  • Converts to NFKD unicode.

  • Strips excessive whitespace and trailing punctuation.

  • Normalizes PhD/MD suffixes.

  • Does not perform any potentially name-altering business logic, such as running the HumanName parser or any other reorganization of name parts.

  • Does not perform any cleaning that would later need to be reversed, such as lowercasing.

core.util.personal_names.normalize_contributor_name_for_matching(name)[source]

Used to standardize author names before matching them to each other, to identify the best results in VIAF author search feeds.

Split the name into title, first, middle, last name, suffix, nickname, and set the parts in that order. Remove spacing around abbreviated initials, so ‘George RR Martin’ matches ‘George R R Martin’ (treat two-letter words as initials).

Run WorkIDCalculator.normalize_author on the name, which will convert to NFKD unicode, de-lint special characters and spaces, and lowercase.

TODO: Consider further removing periods, commas, dashes, and all non-word characters.

TODO: Consider what to do for multiple authors, such as “et al.” or the Brothers Grimm.
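The initials rule can be illustrated in isolation. This sketch splits two-letter words into separate initials so that ‘George RR Martin’ and ‘George R R Martin’ normalize to the same string. The extra requirement that the word be all uppercase is an assumption added here so that names like “Le Guin” are left alone; `space_initials` is a hypothetical helper, and the HumanName reordering step is omitted.

```python
def space_initials(name):
    """Hypothetical: treat two-letter all-caps words as pairs of initials."""
    words = []
    for word in name.split():
        if len(word) == 2 and word.isalpha() and word.isupper():
            words.extend(word)  # "RR" -> "R", "R"
        else:
            words.append(word)
    return " ".join(words).lower()


print(space_initials("George RR Martin"))   # george r r martin
print(space_initials("George R R Martin"))  # george r r martin
```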

core.util.personal_names.sort_name_to_display_name(sort_name)[source]

Take the “Last Name, First Name”-formatted sort_name, and convert it to a “First Name Last Name” format appropriate for displaying to patrons in a catalog listing.

While the code attempts to do the best it can, name recognition gets complicated quickly when a name has more than one plain-format first name and one plain-format last name. This code is meant to serve as a first approximation. If we later find better sort and display names checked by human librarians in the Metadata Wrangler, we use those.

Parameters:

sort_name – A name in “Last Name, First Name” format, e.g. Doe, Jane.

Returns:

The name in “First Name Last Name” format, e.g. Jane Doe.
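For the simple two-part case, the conversions between display names and sort names reduce to moving the last word to the front (or back). The sketch below covers only that case. The documented functions also check for corporate names and use the HumanName library to handle titles, suffixes, and particles, none of which is modeled here; both function names are hypothetical.

```python
def display_to_sort_sketch(display_name):
    """'Jane Doe' -> 'Doe, Jane'; single names are returned unchanged."""
    parts = display_name.split()
    if len(parts) < 2:
        return display_name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def sort_to_display_sketch(sort_name):
    """'Doe, Jane' -> 'Jane Doe'; names without a comma are returned unchanged."""
    last, sep, first = sort_name.partition(",")
    if not sep:
        return sort_name
    return f"{first.strip()} {last.strip()}".strip()


print(display_to_sort_sketch("Mary Ann Evans"))  # Evans, Mary Ann
print(sort_to_display_sketch("Doe, Jane"))       # Jane Doe
```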

core.util.problem_detail module

Simple helper library for generating problem detail documents.

As per http://datatracker.ietf.org/doc/draft-ietf-appsawg-http-problem/ (the draft was later published as RFC 7807).

class core.util.problem_detail.ProblemDetail(uri, status_code=None, title=None, detail=None, instance=None, debug_message=None)[source]

Bases: object

A common type of problem.

JSON_MEDIA_TYPE = 'application/api-problem+json'
detailed(detail, status_code=None, title=None, instance=None, debug_message=None)[source]

Create a ProblemDetail for a more specific occurrence of an existing ProblemDetail.

The detailed error message will be shown to patrons.

property response

Create a Flask-style response.

with_debug(debug_message, detail=None, status_code=None, title=None, instance=None)[source]

Insert debugging information into a ProblemDetail.

The original ProblemDetail’s error message will be shown to patrons, but a more specific error message will be visible to those who inspect the problem document.
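The relationship between a base problem, `detailed()`, and `with_debug()` can be sketched with an immutable dataclass. This is a hypothetical stand-in, not the ProblemDetail class: the JSON member names follow RFC 7807 (`type`, `title`, `status`, `detail`), while the `debug_message` key, the omitted `instance` field, and the example URI are assumptions.

```python
import dataclasses
import json
from typing import Optional


@dataclasses.dataclass(frozen=True)
class ProblemDetailSketch:
    """Hypothetical stand-in for ProblemDetail, using RFC 7807 member names."""

    uri: str
    status_code: Optional[int] = None
    title: Optional[str] = None
    detail: Optional[str] = None
    debug_message: Optional[str] = None

    def detailed(self, detail, **changes):
        # Same problem type, with a more specific message shown to patrons.
        return dataclasses.replace(self, detail=detail, **changes)

    def with_debug(self, debug_message, **changes):
        # Patron-facing text is unchanged; extra detail is for developers.
        return dataclasses.replace(self, debug_message=debug_message, **changes)

    def to_json(self):
        document = {"type": self.uri, "title": self.title, "status": self.status_code}
        if self.detail is not None:
            document["detail"] = self.detail
        if self.debug_message is not None:
            document["debug_message"] = self.debug_message
        return json.dumps(document)


NOT_FOUND = ProblemDetailSketch("http://example.com/problems/not-found", 404, "Not found")
problem = NOT_FOUND.detailed("No book with that ID.").with_debug("lookup failed for id=123")
print(problem.to_json())
```

Because each call returns a new object, the shared NOT_FOUND definition is never modified.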

exception core.util.problem_detail.ProblemError(problem_detail)[source]

Bases: BaseError

Exception class that allows ProblemDetail objects to be raised and caught.

property problem_detail

Return the ProblemDetail object associated with this exception.

Returns:

ProblemDetail object associated with this exception

Return type:

ProblemDetail

core.util.problem_detail.json(type, status, title, detail=None, instance=None, debug_message=None)[source]

core.util.stopwords module

Sets of stopwords.

core.util.string_helpers module

class core.util.string_helpers.UnicodeAwareBase64(encoding)[source]

Bases: object

Simulate the interface of the base64 module, but make it look as though base64-encoding and -decoding works on Unicode strings.

Behind the scenes, Unicode strings are encoded to a particular encoding, then base64-encoded or -decoded, then decoded from that encoding.

Since we get Unicode strings out of the database, this lets us base64-encode and -decode those strings without worrying about encoding them to bytes and then decoding the result.

b64decode(s, *args, **kwargs)
b64encode(s, *args, **kwargs)
decodebytes(s, *args, **kwargs)
encodebytes(s, *args, **kwargs)
standard_b64decode(s, *args, **kwargs)
standard_b64encode(s, *args, **kwargs)
urlsafe_b64decode(s, *args, **kwargs)
urlsafe_b64encode(s, *args, **kwargs)
wrap()[source]
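A minimal sketch of the encode/base64/decode round trip described above, covering only `b64encode` and `b64decode`; `UnicodeBase64Sketch` is hypothetical. One design choice differs from a literal reading of the docstring: base64 output is always ASCII, so this sketch converts it with "ascii" rather than the configured encoding, which keeps non-ASCII-compatible encodings such as UTF-16 working.

```python
import base64


class UnicodeBase64Sketch:
    """Hypothetical: base64 helpers that accept and return str."""

    def __init__(self, encoding="utf8"):
        self.encoding = encoding

    def b64encode(self, s):
        # str -> bytes in self.encoding -> base64 bytes -> ASCII str
        return base64.b64encode(s.encode(self.encoding)).decode("ascii")

    def b64decode(self, s):
        # ASCII str -> base64 bytes -> raw bytes -> str in self.encoding
        return base64.b64decode(s.encode("ascii")).decode(self.encoding)


utf8 = UnicodeBase64Sketch("utf8")
encoded = utf8.b64encode("héllo")
print(encoded)                  # aMOpbGxv
print(utf8.b64decode(encoded))  # héllo
```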
core.util.string_helpers.random_string(size)[source]

Generate a random binary string and encode it as hex digits.

Parameters:

size – Size of the binary string, in bytes.

Returns:

A Unicode string.
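In the standard library, `secrets.token_hex` does the same job: it draws `size` random bytes from a cryptographically secure source and hex-encodes them. Whether random_string uses this function internally is not stated; the sketch only shows the expected shape of the result.

```python
import secrets


def random_string_sketch(size):
    # `size` random bytes, hex-encoded: two hex digits per byte.
    return secrets.token_hex(size)


token = random_string_sketch(16)
print(token, len(token))  # 32 hex characters
```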

core.util.summary module

class core.util.summary.SummaryEvaluator(optimal_number_of_sentences=4, noun_phrases_to_consider=10, bad_phrases=None)[source]

Bases: object

Evaluate summaries of a book to find a usable summary.

A usable summary will have good coverage of the popular noun phrases found across all summaries of the book, will have an approximate length of four sentences (this is customizable), and will not mention words that indicate it’s a summary of a specific edition of the book.

All else being equal, a shorter summary is better.

A summary is penalized for apparently not being in English.

add(summary, parser=None)[source]
bad_res = {re.compile('This is'), re.compile('the [^ ]+ Collection'), re.compile('Includes')}
best_choice()[source]
best_choices(n=3)[source]

Choose the best n choices among the current summaries.

default_bad_phrases = {'--container', '--original container', 'abridged', 'adaptation of', 'all rights reserved', 'complete novels', 'complete texts', 'condensed', 'contains', 'edition', 'excerpts', 'in one volume', 'look for', 'new edition', 'playaway', 'retelling', 'retelling of', 'selections', 'version', 'version of'}
log = <Logger Summary Evaluator (WARNING)>
ready()[source]

We are done adding to the corpus and ready to start evaluating.

score(summary, apply_language_penalty=True)[source]

Score a summary relative to our current view of the dataset.

core.util.titles module

core.util.titles.normalize_title_for_matching(title)[source]

Used to standardize book titles before matching them to each other, to identify the best results in VIAF author search feeds.

Run WorkIDCalculator.normalize_title on the title, which will convert to NFKD unicode, de-lint special characters, and lowercase.

core.util.titles.title_match_ratio(title1, title2)[source]

Returns a number between 0 and 100, representing the percent match (Levenshtein Distance) between book title1 and book title2, after each has been normalized.

core.util.titles.unfluff_title(title)[source]

Removes parts of the title that are deemed to be add-ons, like imprint information, inserted subtitles, and corporate names. For example, for each of:

  • Hello World, edited by Bob Bobbinson

  • Hello World: The True and Amazing Adventures of Bob

  • Hello World (Unabridged)

we want to return “Hello World”. (TODO: later add logic for something like “Hello World, Harvard University, publisher”.)
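The three examples above can be handled with three regular expressions: an “edited by” style clause after a comma, a parenthetical, and a colon-introduced subtitle. The patterns below are assumptions chosen to cover those examples only, not the ones unfluff_title uses.

```python
import re

# Hypothetical patterns, one per kind of add-on in the examples.
EDITOR_CLAUSE = re.compile(r",\s*(?:edited|translated|illustrated) by\b.*$", re.IGNORECASE)
PARENTHETICAL = re.compile(r"\s*\([^)]*\)")
SUBTITLE = re.compile(r"\s*:.*$")


def unfluff_title_sketch(title):
    title = EDITOR_CLAUSE.sub("", title)  # ", edited by Bob Bobbinson"
    title = PARENTHETICAL.sub("", title)  # " (Unabridged)"
    title = SUBTITLE.sub("", title)       # ": The True and Amazing ..."
    return title.strip()


for raw in ["Hello World, edited by Bob Bobbinson",
            "Hello World: The True and Amazing Adventures of Bob",
            "Hello World (Unabridged)"]:
    print(unfluff_title_sketch(raw))  # Hello World
```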

core.util.web_publication_manifest module

Helper classes for the Readium Web Publication Manifest format (https://github.com/readium/webpub-manifest) and its audiobook profile (https://github.com/HadrienGardeur/audiobook-manifest).

class core.util.web_publication_manifest.AudiobookManifest(context=None, type=None)[source]

Bases: Manifest

A Python object corresponding to a Readium Web Publication Manifest.

DEFAULT_TYPE = 'http://bib.schema.org/Audiobook'
class core.util.web_publication_manifest.JSONable[source]

Bases: object

An object whose Unicode representation is a JSON dump of a dictionary.

property as_dict
classmethod json_ready(value)[source]
class core.util.web_publication_manifest.Manifest(context=None, type=None)[source]

Bases: JSONable

A Python object corresponding to a Readium Web Publication Manifest.

AUDIOBOOK_TYPE = 'http://bib.schema.org/Audiobook'
BOOK_TYPE = 'http://schema.org/Book'
DEFAULT_CONTEXT = 'http://readium.org/webpub/default.jsonld'
DEFAULT_TYPE = 'http://schema.org/Book'
add_reading_order(href, type, title, **kwargs)[source]
add_resource(href, type, **kwargs)[source]
property as_dict
property component_lists
update_bibliographic_metadata(license_pool)[source]

Update this Manifest with basic bibliographic metadata taken from a LicensePool object.

Currently this assumes that there is no other source of bibliographic metadata, so it will overwrite any metadata that is already present and add a cover link even if the manifest already has one.
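A minimal sketch of how a manifest object like this can collect reading-order items and resources and serialize itself to JSON, in the spirit of JSONable. `ManifestSketch` is hypothetical; the defaults are copied from the Manifest attributes above, while the `readingOrder` and `resources` key names follow the current Readium specification and are an assumption about what as_dict emits.

```python
import json


class ManifestSketch:
    """Hypothetical minimal model of a Readium Web Publication Manifest."""

    DEFAULT_CONTEXT = "http://readium.org/webpub/default.jsonld"
    DEFAULT_TYPE = "http://schema.org/Book"

    def __init__(self, context=None, type=None):
        self.context = context or self.DEFAULT_CONTEXT
        self.metadata = {"@type": type or self.DEFAULT_TYPE}
        self.reading_order = []
        self.resources = []

    def add_reading_order(self, href, type, title, **kwargs):
        self.reading_order.append(dict(href=href, type=type, title=title, **kwargs))

    def add_resource(self, href, type, **kwargs):
        self.resources.append(dict(href=href, type=type, **kwargs))

    @property
    def as_dict(self):
        data = {"@context": self.context, "metadata": self.metadata}
        # Assumed key names; empty lists are left out.
        if self.reading_order:
            data["readingOrder"] = self.reading_order
        if self.resources:
            data["resources"] = self.resources
        return data

    def __str__(self):
        return json.dumps(self.as_dict)


manifest = ManifestSketch(type="http://bib.schema.org/Audiobook")
manifest.metadata["title"] = "Moby Dick"
manifest.add_reading_order("chapter1.mp3", "audio/mpeg", "Chapter 1", duration=600)
print(manifest)
```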

core.util.worker_pools module

class core.util.worker_pools.DatabaseJob[source]

Bases: Job

do_run()[source]

Does the work

finalize(_db)[source]

Finalizes the task if it is successful

rollback(_db)[source]

Cleans up the task if it errors

class core.util.worker_pools.DatabasePool(size, session_factory, worker_factory=None)[source]

Bases: Pool

A pool of DatabaseWorker threads and a job queue to keep them busy.

create_worker()[source]
class core.util.worker_pools.DatabaseWorker(jobs, _db)[source]

Bases: Worker

A worker Thread that performs jobs with a database session

do_job()[source]
classmethod factory(worker_pool, _db)[source]
class core.util.worker_pools.Job[source]

Bases: object

Abstract parent class for a bit o’ work that can be run in a Thread. For use with Worker.

do_run()[source]

Does the work

finalize(*args, **kwargs)[source]

Finalizes the task if it is successful

rollback(*args, **kwargs)[source]

Cleans up the task if it errors

run(*args, **kwargs)[source]
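The Job methods suggest a template-method pattern: `run()` calls `do_run()`, then `finalize()` on success or `rollback()` on failure, passing along any extra arguments (such as the `_db` session that DatabaseJob expects). The sketch below is hypothetical; in particular, re-raising the exception after rollback, so that a worker can count it, is an assumption.

```python
class JobSketch:
    """Hypothetical Job: run() calls do_run(), then finalize() or rollback()."""

    def __init__(self):
        self.events = []

    def do_run(self):
        raise NotImplementedError  # subclasses do the actual work here

    def finalize(self, *args, **kwargs):
        self.events.append("finalize")  # e.g. commit results

    def rollback(self, *args, **kwargs):
        self.events.append("rollback")  # e.g. undo partial changes

    def run(self, *args, **kwargs):
        try:
            self.do_run()
        except Exception:
            self.rollback(*args, **kwargs)
            raise  # assumed: let the worker see (and count) the error
        else:
            self.finalize(*args, **kwargs)


class Succeeds(JobSketch):
    def do_run(self):
        self.events.append("work")


class Fails(JobSketch):
    def do_run(self):
        raise ValueError("boom")


ok = Succeeds()
ok.run()
bad = Fails()
try:
    bad.run()
except ValueError:
    pass
print(ok.events, bad.events)  # ['work', 'finalize'] ['rollback']
```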
class core.util.worker_pools.Pool(size, worker_factory=None)[source]

Bases: object

A pool of Worker threads and a job queue to keep them busy.

create_worker()[source]
get()[source]
inc_error()[source]
join()[source]
log = <Logger core.util.worker_pools (WARNING)>
put(job)[source]
restart()[source]
property success_rate
task_done()[source]
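A pool of worker threads fed by a job queue can be sketched with the standard `queue` and `threading` modules. `PoolSketch` is hypothetical: jobs are plain callables here rather than Job objects, and reporting a success rate of 1.0 before any job has run is an assumption.

```python
import queue
import threading


class PoolSketch:
    """Hypothetical pool: `size` daemon threads pull callables from one queue."""

    def __init__(self, size):
        self.jobs = queue.Queue()
        self.successes = 0
        self.errors = 0
        self._lock = threading.Lock()
        for _ in range(size):
            threading.Thread(target=self._work, daemon=True).start()

    def _work(self):
        while True:
            job = self.jobs.get()
            try:
                job()
                succeeded = True
            except Exception:
                succeeded = False
            with self._lock:  # counters are shared between threads
                if succeeded:
                    self.successes += 1
                else:
                    self.errors += 1
            self.jobs.task_done()  # lets join() return once all jobs finish

    def put(self, job):
        self.jobs.put(job)

    def join(self):
        self.jobs.join()

    @property
    def success_rate(self):
        total = self.successes + self.errors
        return self.successes / total if total else 1.0  # assumed: empty = 1.0


pool = PoolSketch(size=3)
squares = []
for n in range(10):
    pool.put(lambda n=n: squares.append(n * n))
pool.put(lambda: 1 / 0)  # one failing job
pool.join()
print(pool.successes, pool.errors)  # 10 1
```

Counters are updated before `task_done()`, so by the time `join()` returns every job's outcome has been recorded.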
class core.util.worker_pools.Worker(jobs)[source]

Bases: Thread

A Thread that performs jobs

do_job(*args, **kwargs)[source]
classmethod factory(worker_pool)[source]
property log
run()[source]

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.

core.util.xmlparser module

class core.util.xmlparser.XMLParser[source]

Bases: object

Helper functions to process XML data.

NAMESPACES = {}
int_of_optional_subtag(tag, name, namespaces=None)[source]
int_of_subtag(tag, name, namespaces=None)[source]
process_all(xml, xpath, namespaces=None, handler=None, parser=None)[source]
process_one(tag, namespaces)[source]
text_of_optional_subtag(tag, name, namespaces=None)[source]
text_of_subtag(tag, name, namespaces=None)[source]
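The subtag helpers can be sketched as free functions over the standard library's ElementTree, which resolves prefixed names through a `namespaces` mapping. These are hypothetical stand-ins for the XMLParser methods (which may be built on a different XML library), and raising ValueError when a required subtag is missing is an assumption.

```python
import xml.etree.ElementTree as ET


def text_of_optional_subtag(tag, name, namespaces=None):
    """Text of the first matching child, or None if there is no such child."""
    child = tag.find(name, namespaces or {})
    return child.text if child is not None else None


def text_of_subtag(tag, name, namespaces=None):
    """Like text_of_optional_subtag, but the child must exist (assumed)."""
    child = tag.find(name, namespaces or {})
    if child is None:
        raise ValueError(f"missing required subtag {name!r}")
    return child.text


def int_of_optional_subtag(tag, name, namespaces=None):
    text = text_of_optional_subtag(tag, name, namespaces)
    return int(text) if text is not None else None


book = ET.fromstring(
    '<b:book xmlns:b="urn:example:book"><b:title>Dune</b:title>'
    "<b:pages>412</b:pages></b:book>"
)
ns = {"b": "urn:example:book"}
print(text_of_subtag(book, "b:title", ns))  # Dune
```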

Module contents

Miscellaneous utilities

class core.util.Bigrams(bigrams)[source]

Bases: object

all_letters = re.compile('^[a-z]+$')
difference_from(other_bigrams)[source]
classmethod from_string(string)[source]
classmethod from_text_files(paths)[source]
classmethod process_data(data, bigrams)[source]
class core.util.MetadataSimilarity[source]

Bases: object

Estimate how similar two bits of metadata are.

SEPARATOR = re.compile('\\W')
classmethod author_name_similarity(authors1, authors2)[source]

What percentage of the total number of authors in the two sets are present in both sets?

classmethod author_similarity(authors1, authors2)[source]

What percentage of the total number of authors in the two sets are present in both sets?
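The question in these docstrings describes Jaccard similarity: the number of authors in both sets divided by the number of distinct authors overall. The sketch returns a fraction between 0 and 1; whether the real methods return a fraction or a 0–100 percentage, and what they return for two empty sets, is not stated, so both choices here are assumptions.

```python
def author_overlap(authors1, authors2):
    """Jaccard similarity: shared authors / all distinct authors."""
    set1, set2 = set(authors1), set(authors2)
    union = set1 | set2
    if not union:
        return 0.0  # assumed value when both lists are empty
    return len(set1 & set2) / len(union)


print(author_overlap(["Austen", "Bronte"], ["Bronte", "Eliot"]))  # 0.333...
```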

classmethod counter_distance(counter1, counter2)[source]
classmethod histogram(strings, stopwords=None)[source]

Create a histogram of word frequencies across the given list of strings.

classmethod histogram_distance(strings_1, strings_2, stopwords=None)[source]

Calculate the histogram distance between two sets of strings.

The histogram distance is the sum of the word distance for every word that occurs in either histogram.

If a word appears in one histogram but not the other, its word distance is its frequency of appearance. If a word appears in both histograms, its word distance is the absolute value of the difference between that word’s frequency of appearance in histogram A, and its frequency of appearance in histogram B.

If the strings use the same words at exactly the same frequency, the difference will be 0. If the strings use completely different words, the difference will be 1.
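A sketch of the histogram distance, assuming each histogram holds relative frequencies that sum to 1. Under that assumption, summing the per-word distances gives 2 (not 1) for completely different vocabularies, so this sketch halves the sum to match the 0-to-1 range described above. Lowercasing and splitting on `SEPARATOR` are also assumptions; `histogram` and `histogram_distance` here are hypothetical stand-ins.

```python
import re
from collections import Counter

SEPARATOR = re.compile(r"\W")  # value of MetadataSimilarity.SEPARATOR


def histogram(strings, stopwords=None):
    """Relative word frequencies across all strings (frequencies sum to 1)."""
    stopwords = stopwords or set()
    counts = Counter(
        word
        for s in strings
        for word in SEPARATOR.split(s.lower())
        if word and word not in stopwords
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def histogram_distance(strings_1, strings_2, stopwords=None):
    h1, h2 = histogram(strings_1, stopwords), histogram(strings_2, stopwords)
    words = h1.keys() | h2.keys()
    # A word missing from one histogram contributes its full frequency.
    raw = sum(abs(h1.get(w, 0) - h2.get(w, 0)) for w in words)
    # raw ranges over [0, 2]; halving maps "completely different" to 1.
    return raw / 2


print(histogram_distance(["red fish"], ["blue whale"]))  # 1.0
```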

classmethod most_common(maximum_size, *items)[source]

Return the most common item that is not longer than maximum_size.
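With `collections.Counter`, this amounts to filtering by length and taking the top entry. The sketch treats maximum_size as inclusive and returns None when nothing qualifies; both are assumptions. Ties go to the item seen first, which is how `Counter.most_common` orders equal counts.

```python
from collections import Counter


def most_common_sketch(maximum_size, *items):
    """Most frequent item whose length is <= maximum_size, else None."""
    counts = Counter(item for item in items if len(item) <= maximum_size)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(most_common_sketch(5, "cat", "horse", "cat", "elephant", "elephant", "elephant"))  # cat
```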

classmethod normalize_histogram(histogram, total=None)[source]
classmethod title_similarity(title1, title2)[source]
class core.util.MoneyUtility[source]

Bases: object

DEFAULT_CURRENCY = 'USD'
classmethod parse(amount)[source]

Attempt to turn a string into a Money object.

class core.util.TitleProcessor[source]

Bases: object

classmethod extract_subtitle(main_title, subtitled_title)[source]

Extracts a subtitle, given shorter and longer versions of a title.

Returns:

subtitle or None
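One plausible reading of extract_subtitle: if the longer title starts with the shorter one, the subtitle is whatever follows, minus leading separators. The case-insensitive prefix check and the separator set are assumptions; `extract_subtitle_sketch` is hypothetical.

```python
import re


def extract_subtitle_sketch(main_title, subtitled_title):
    """Hypothetical: return what follows main_title, minus separators."""
    if not subtitled_title.lower().startswith(main_title.lower()):
        return None
    remainder = subtitled_title[len(main_title):]
    # Drop leading separators such as ":", ";", ",", "-", or whitespace.
    subtitle = re.sub(r"^[\s:;,\-]+", "", remainder).strip()
    return subtitle or None


print(extract_subtitle_sketch("Hello World", "Hello World: A Novel"))  # A Novel
```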

classmethod sort_title_for(title)[source]
title_stopwords = ['The ', 'A ', 'An ']
core.util.batch(iterable, size=1)[source]

Split iterable into batches of length size.
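A common way to batch an arbitrary iterable (including generators, which cannot be sliced) is to pull `size` items at a time with `itertools.islice`. Returning lists, and letting the final batch be shorter, are assumptions about batch's behavior.

```python
from itertools import islice


def batch_sketch(iterable, size=1):
    """Yield lists of up to `size` items; the last batch may be shorter."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


print(list(batch_sketch(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```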

core.util.chunks(lst, chunk_size, start_index=0)[source]

Yield successive chunk_size-sized chunks from lst.
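Because chunks takes a list, it can slice directly. The sketch assumes start_index is the position of the first chunk; the docstring does not say so explicitly.

```python
def chunks_sketch(lst, chunk_size, start_index=0):
    """Yield slices of lst of length chunk_size, beginning at start_index."""
    for i in range(start_index, len(lst), chunk_size):
        yield lst[i:i + chunk_size]


print(list(chunks_sketch([1, 2, 3, 4, 5], 2)))                 # [[1, 2], [3, 4], [5]]
print(list(chunks_sketch([1, 2, 3, 4, 5], 2, start_index=1)))  # [[2, 3], [4, 5]]
```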

core.util.fast_query_count(query)[source]

Counts the results of a query without using a slow subquery.

core.util.first_or_default(collection, default=None)[source]

Return first element of the specified collection or the default value if the collection is empty.

Parameters:
  • collection (Iterable) – Collection

  • default (Any) – Default value
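The behavior described fits in one line of Python, because `next()` accepts a fallback value for an exhausted iterator. This is a sketch of the documented behavior, not necessarily how first_or_default is written.

```python
def first_or_default_sketch(collection, default=None):
    """First element of any iterable, or `default` if it is empty."""
    return next(iter(collection), default)


print(first_or_default_sketch([3, 4]))           # 3
print(first_or_default_sketch([], "fallback"))   # fallback
```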

core.util.is_session(value)[source]

Return a boolean value indicating whether the value is a valid SQLAlchemy session.

Parameters:

value (Any) – Value

Returns:

Boolean value indicating whether the value is a valid SQLAlchemy session or not

Return type:

bool

core.util.slugify(text, length_limit=None)[source]

Takes a string and turns it into a slug.

Example:

>>> slugify('Some (???) Title Somewhere')
'some-title-somewhere'
>>> slugify('Sly & the Family Stone')
'sly-and-the-family-stone'
>>> slugify('Happy birthday!', length_limit=4)
'happ'
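A sketch that reproduces the three examples above: lowercase, spell out “&” as “and”, drop punctuation, join words with hyphens, then truncate. How slugify treats accented characters, underscores, or a trailing hyphen left by truncation is not documented, so this sketch's handling of those cases (keeping them) is an assumption.

```python
import re


def slugify_sketch(text, length_limit=None):
    """Approximate slugify: lowercase, '&' -> 'and', hyphen-join the words."""
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation such as (???) and !
    slug = "-".join(text.split())  # collapse whitespace runs into single hyphens
    if length_limit is not None:
        slug = slug[:length_limit]
    return slug


print(slugify_sketch("Sly & the Family Stone"))  # sly-and-the-family-stone
```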