core package¶
Subpackages¶
- core.classifier package
- Submodules
- core.classifier.age module
- core.classifier.bic module
- core.classifier.bisac module
- core.classifier.ddc module
- core.classifier.gutenberg module
- core.classifier.keyword module
- core.classifier.lcc module
- core.classifier.overdrive module
- core.classifier.rbdigital module
- core.classifier.simplified module
- Module contents
- core.lcp package
- core.model package
- Submodules
- core.model.admin module
- core.model.cachedfeed module
- core.model.circulationevent module
- core.model.classification module
- core.model.collection module
- core.model.complaint module
- core.model.configuration module
- core.model.constants module
- core.model.contributor module
- core.model.coverage module
- core.model.credential module
- core.model.customlist module
- core.model.datasource module
- core.model.edition module
- core.model.hasfulltablecache module
- core.model.identifier module
- core.model.integrationclient module
- core.model.library module
- core.model.licensing module
- core.model.listeners module
- core.model.measurement module
- core.model.patron module
- core.model.resource module
- core.model.work module
- Module contents
- core.python_expression_dsl package
- core.resources package
- core.util package
- Subpackages
- Submodules
- core.util.accept_language module
- core.util.authentication_for_opds module
- core.util.datetime_helpers module
- core.util.epub module
- core.util.flask_util module
- core.util.http module
- core.util.languages module
- core.util.median module
- core.util.opds_writer module
- core.util.permanent_work_id module
- core.util.personal_names module
- core.util.problem_detail module
- core.util.stopwords module
- core.util.string_helpers module
- core.util.summary module
- core.util.titles module
- core.util.web_publication_manifest module
- core.util.worker_pools module
- core.util.xmlparser module
- Module contents
Submodules¶
core.analytics module¶
core.app_server module¶
Implement logic common to more than one of the Simplified applications.
- class core.app_server.ComplaintController[source]¶
Bases:
object
A controller to register complaints against objects.
- class core.app_server.HeartbeatController[source]¶
Bases:
object
- HEALTH_CHECK_TYPE = 'application/vnd.health+json'¶
- VERSION_FILENAME = '.version'¶
- class core.app_server.URNLookupController(_db)[source]¶
Bases:
object
A controller for looking up OPDS entries for specific books, identified in terms of their Identifier URNs.
- permalink(urn, annotator, route_name='work')[source]¶
Look up a single identifier and generate an OPDS feed.
TODO: This method is tested, but it seems unused and it should be possible to remove it.
- process_urns(urns, **process_urn_kwargs)[source]¶
Process a number of URNs by instantiating a URNLookupHandler and having it do the work.
The information gathered by the URNLookupHandler can be used by the caller to generate an OPDS feed.
- Returns:
A URNLookupHandler, or a ProblemDetail if there’s a problem with the request.
- class core.app_server.URNLookupHandler(_db)[source]¶
Bases:
object
A helper for URNLookupController that takes URNs as input and looks up their OPDS entries.
This is a separate class from URNLookupController because URNLookupController is designed to not keep state.
- UNRECOGNIZED_IDENTIFIER = 'This work is not in the collection.'¶
- WORK_NOT_CREATED = 'Identifier resolved but work not yet created.'¶
- WORK_NOT_PRESENTATION_READY = 'Work created but not yet presentation-ready.'¶
- add_message(urn, status_code, message)[source]¶
An identifier lookup resulted in the creation of a message.
- core.app_server.compressible(f)[source]¶
Decorate a function to make it transparently handle whatever compression the client has announced it supports.
Currently the only form of compression supported is representation-level gzip compression requested through the Accept-Encoding header.
This code was modified from http://kb.sites.apiit.edu.my/knowledge-base/how-to-gzip-response-in-flask/, though I don’t know if that’s the original source; it shows up in a lot of places.
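For illustration, a minimal sketch of such a decorator is shown below. This is not the module's actual implementation, only the general idea, using Flask's request/response objects and the standard library's gzip module:

```python
import gzip
from functools import wraps

import flask


def gzip_compressible(f):
    """Sketch of a view decorator that gzips the response body when the
    client's Accept-Encoding header announces gzip support. Illustrative
    only; the real core.app_server.compressible differs in detail."""

    @wraps(f)
    def wrapper(*args, **kwargs):
        response = flask.make_response(f(*args, **kwargs))
        accept_encoding = flask.request.headers.get("Accept-Encoding", "")
        if "gzip" in accept_encoding.lower():
            body = gzip.compress(response.get_data())
            response.set_data(body)
            response.headers["Content-Encoding"] = "gzip"
            response.headers["Vary"] = "Accept-Encoding"
            response.headers["Content-Length"] = str(len(body))
        return response

    return wrapper
```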
- core.app_server.load_facets_from_request(facet_config=None, worklist=None, base_class=<class 'core.lane.Facets'>, base_class_constructor_kwargs=None, default_entrypoint=None)[source]¶
Figure out which faceting object this request is asking for.
The active request must have the library member set to a Library object.
- Parameters:
worklist – The WorkList, if any, associated with the request.
facet_config – An object containing the currently configured facet groups, if different from the request library.
base_class – The faceting class to instantiate.
base_class_constructor_kwargs – Keyword arguments to pass into the faceting class constructor, other than those obtained from the request.
- Returns:
A faceting object if possible; otherwise a ProblemDetail.
- core.app_server.load_pagination_from_request(base_class=<class 'core.lane.Pagination'>, base_class_constructor_kwargs=None, default_size=None)[source]¶
Figure out which Pagination object this request is asking for.
- Parameters:
base_class – A subclass of Pagination to instantiate.
base_class_constructor_kwargs – Extra keyword arguments to use when instantiating the Pagination subclass.
default_size – The default page size.
- Returns:
An instance of base_class.
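Taken together with load_facets_from_request, a controller might use these helpers as in the following sketch. The helper function shown here is illustrative, not part of the documented API:

```python
from core.app_server import load_facets_from_request, load_pagination_from_request
from core.util.problem_detail import ProblemDetail


def load_feed_parameters(worklist):
    """Return (facets, pagination) for the active Flask request, or a
    ProblemDetail the caller should convert into an error response."""
    facets = load_facets_from_request(worklist=worklist)
    if isinstance(facets, ProblemDetail):
        return facets
    pagination = load_pagination_from_request(default_size=50)
    if isinstance(pagination, ProblemDetail):
        return pagination
    return facets, pagination
```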
core.cdn module¶
Turn local URLs into CDN URLs.
core.config module¶
- exception core.config.CannotLoadConfiguration(message, debug_message=None)[source]¶
Bases:
IntegrationException
The current configuration of an external integration, or of the site as a whole, is in an incomplete or inconsistent state.
This is more specific than a base IntegrationException because it assumes the problem is evident just by looking at the current configuration, with no need to actually talk to the foreign server.
- class core.config.Configuration[source]¶
Bases:
ConfigurationConstants
- ALLOW_HOLDS = 'allow_holds'¶
- ANALYTICS_POLICY = 'analytics'¶
- APP_VERSION = 'app_version'¶
- AXIS_INTEGRATION = 'Axis 360'¶
- BASE_URL_KEY = 'base_url'¶
- CDNS_LOADED_FROM_DATABASE = 'loaded_from_database'¶
- CDN_MIRRORED_DOMAIN_KEY = 'mirrored_domain'¶
- CONTENT_SERVER_INTEGRATION = 'Content Server'¶
- DATABASE_INTEGRATION = 'Postgres'¶
- DATABASE_LOG_LEVEL = 'database_log_level'¶
- DATABASE_PRODUCTION_ENVIRONMENT_VARIABLE = 'SIMPLIFIED_PRODUCTION_DATABASE'¶
- DATABASE_PRODUCTION_URL = 'production_url'¶
- DATABASE_TEST_ENVIRONMENT_VARIABLE = 'SIMPLIFIED_TEST_DATABASE'¶
- DATABASE_TEST_URL = 'test_url'¶
- DATA_DIRECTORY = 'data_directory'¶
- DEBUG = 'DEBUG'¶
- DEFAULT_APP_NAME = 'simplified'¶
- DEFAULT_MINIMUM_FEATURED_QUALITY = 0.65¶
- DEFAULT_OPDS_FORMAT = 'simple_opds_entry'¶
- ERROR = 'ERROR'¶
- EXCLUDED_AUDIO_DATA_SOURCES = 'excluded_audio_data_sources'¶
- EXTERNAL_TYPE_REGULAR_EXPRESSION = 'external_type_regular_expression'¶
- FEATURED_LANE_SIZE = 'featured_lane_size'¶
- INFO = 'INFO'¶
- INTEGRATIONS = 'integrations'¶
- LANES_POLICY = 'lanes'¶
- LAST_CHECKED_FOR_SITE_CONFIGURATION_UPDATE = 'last_checked_for_site_configuration_update'¶
- LIBRARY_SETTINGS = [
{'key': 'name', 'label': 'Name', 'description': 'The human-readable name of this library.', 'category': 'Basic Information', 'level': 3, 'required': True},
{'key': 'short_name', 'label': 'Short name', 'description': "A short name of this library, to use when identifying it in scripts or URLs, e.g. 'NYPL'.", 'category': 'Basic Information', 'level': 3, 'required': True},
{'key': 'website', 'label': "URL of the library's website", 'description': 'The library\'s main website, e.g. "https://www.nypl.org/" (not this Circulation Manager\'s URL).', 'required': True, 'format': 'url', 'level': 3, 'category': 'Basic Information'},
{'key': 'allow_holds', 'label': 'Allow books to be put on hold', 'type': 'select', 'options': [{'key': 'true', 'label': 'Allow holds'}, {'key': 'false', 'label': 'Disable holds'}], 'default': 'true', 'category': 'Loans, Holds, & Fines', 'level': 3},
{'key': 'enabled_entry_points', 'label': 'Enabled entry points', 'description': 'Patrons will see the selected entry points at the top level and in search results. <p>Currently supported audiobook vendors: Bibliotheca, Axis 360', 'type': 'list', 'options': [{'key': 'All', 'label': 'All'}, {'key': 'Book', 'label': 'eBooks'}, {'key': 'Audio', 'label': 'Audiobooks'}], 'default': ['Book'], 'category': 'Lanes & Filters', 'format': 'narrow', 'readOnly': True, 'level': 3},
{'key': 'featured_lane_size', 'label': "Maximum number of books in the 'featured' lanes", 'type': 'number', 'default': 15, 'category': 'Lanes & Filters', 'level': 1},
{'key': 'minimum_featured_quality', 'label': "Minimum quality for books that show up in 'featured' lanes", 'description': 'Between 0 and 1.', 'type': 'number', 'max': 1, 'default': 0.65, 'category': 'Lanes & Filters', 'level': 1},
{'key': 'facets_enabled_order', 'label': 'Allow patrons to sort by', 'type': 'list', 'options': [{'key': 'title', 'label': 'Title'}, {'key': 'author', 'label': 'Author'}, {'key': 'added', 'label': 'Recently Added'}, {'key': 'random', 'label': 'Random'}, {'key': 'relevance', 'label': 'Relevance'}], 'default': ['title', 'author', 'added', 'random', 'relevance'], 'category': 'Lanes & Filters', 'paired': 'facets_default_order', 'level': 2},
{'key': 'facets_enabled_available', 'label': 'Allow patrons to filter availability to', 'type': 'list', 'options': [{'key': 'now', 'label': 'Available now'}, {'key': 'all', 'label': 'All'}, {'key': 'always', 'label': 'Yours to keep'}], 'default': ['now', 'all', 'always'], 'category': 'Lanes & Filters', 'paired': 'facets_default_available', 'level': 2},
{'key': 'facets_enabled_collection', 'label': 'Allow patrons to filter collection to', 'type': 'list', 'options': [{'key': 'full', 'label': 'Everything'}, {'key': 'featured', 'label': 'Popular Books'}], 'default': ['full', 'featured'], 'category': 'Lanes & Filters', 'paired': 'facets_default_collection', 'level': 2},
{'key': 'facets_default_order', 'label': 'Default Sort by', 'type': 'select', 'options': [{'key': 'title', 'label': 'Title'}, {'key': 'author', 'label': 'Author'}, {'key': 'added', 'label': 'Recently Added'}, {'key': 'random', 'label': 'Random'}, {'key': 'relevance', 'label': 'Relevance'}], 'default': 'author', 'category': 'Lanes & Filters', 'skip': True},
{'key': 'facets_default_available', 'label': 'Default Availability', 'type': 'select', 'options': [{'key': 'now', 'label': 'Available now'}, {'key': 'all', 'label': 'All'}, {'key': 'always', 'label': 'Yours to keep'}], 'default': 'all', 'category': 'Lanes & Filters', 'skip': True},
{'key': 'facets_default_collection', 'label': 'Default Collection', 'type': 'select', 'options': [{'key': 'full', 'label': 'Everything'}, {'key': 'featured', 'label': 'Popular Books'}], 'default': 'full', 'category': 'Lanes & Filters', 'skip': True}]¶
- LOCALIZATION_LANGUAGES = 'localization_languages'¶
- LOGGING = 'logging'¶
- LOGGING_FORMAT = 'format'¶
- LOGGING_LEVEL = 'level'¶
- LOG_APP_NAME = 'log_app'¶
- LOG_DATA_FORMAT = 'format'¶
- LOG_FORMAT_JSON = 'json'¶
- LOG_FORMAT_TEXT = 'text'¶
- LOG_LEVEL = 'log_level'¶
- LOG_LEVEL_UI = [{'key': 'DEBUG', 'label': l'Debug'}, {'key': 'INFO', 'label': l'Info'}, {'key': 'WARN', 'label': l'Warn'}, {'key': 'ERROR', 'label': l'Error'}]¶
- LOG_OUTPUT_TYPE = 'output'¶
- MEASUREMENT_REAPER = 'measurement_reaper_enabled'¶
- MINIMUM_FEATURED_QUALITY = 'minimum_featured_quality'¶
- NAME = 'name'¶
- NO_APP_VERSION_FOUND = <object object>¶
- OVERDRIVE_INTEGRATION = 'Overdrive'¶
- POLICIES = 'policies'¶
- RBDIGITAL_INTEGRATION = 'RBDigital'¶
- SHORT_NAME = 'short_name'¶
- SITEWIDE_SETTINGS = [
{'key': 'base_url', 'label': 'Base url of the application', 'required': True, 'format': 'url'},
{'key': 'log_level', 'label': 'Log Level', 'type': 'select', 'options': [{'key': 'DEBUG', 'label': 'Debug'}, {'key': 'INFO', 'label': 'Info'}, {'key': 'WARN', 'label': 'Warn'}, {'key': 'ERROR', 'label': 'Error'}], 'default': 'INFO'},
{'key': 'log_app', 'label': 'Application name', 'description': 'Log messages originating from this application will be tagged with this name. If you run multiple instances, giving each one a different application name will help you determine which instance is having problems.', 'default': 'simplified', 'required': True},
{'key': 'database_log_level', 'label': 'Database Log Level', 'type': 'select', 'options': [{'key': 'DEBUG', 'label': 'Debug'}, {'key': 'INFO', 'label': 'Info'}, {'key': 'WARN', 'label': 'Warn'}, {'key': 'ERROR', 'label': 'Error'}], 'description': "Database logs are extremely verbose, so unless you're diagnosing a database-related problem, it's a good idea to set a higher log level for database messages.", 'default': 'WARN'},
{'key': 'excluded_audio_data_sources', 'label': 'Excluded audiobook sources', 'description': 'Audiobooks from these data sources will be hidden from the collection, even if they would otherwise show up as available.', 'default': None, 'required': True},
{'key': 'measurement_reaper_enabled', 'label': 'Cleanup old measurement data', 'type': 'select', 'description': "If this setting is 'true', old book measurement data will be cleaned out of the database. Some sites may want to keep this data for later analysis.", 'options': {'true': 'true', 'false': 'false'}, 'default': 'true'}]¶
- SITE_CONFIGURATION_CHANGED = 'Site Configuration Changed'¶
- SITE_CONFIGURATION_LAST_UPDATE = 'site_configuration_last_update'¶
- SITE_CONFIGURATION_TIMEOUT = 'site_configuration_timeout'¶
- THREEM_INTEGRATION = '3M'¶
- TYPE = 'type'¶
- URL = 'url'¶
- VERSION_FILENAME = '.version'¶
- WARN = 'WARN'¶
- WEBSITE_URL = 'website'¶
- classmethod cdns_loaded_from_database()[source]¶
Has the site configuration been loaded from the database yet?
- classmethod database_url()[source]¶
Find the database URL configured for this site.
For compatibility with old configurations, we will look in the site configuration first.
If it’s not there, we will look in the appropriate environment variable.
- instance = {}¶
- classmethod last_checked_for_site_configuration_update()[source]¶
When was the last time we actually checked whether the database had been updated?
- classmethod load(_db=None)[source]¶
Load configuration information from the filesystem, and (optionally) from the database.
- classmethod load_from_file()[source]¶
Load additional site configuration from a config file.
This is being phased out in favor of taking all configuration from a database.
- log = <Logger Configuration file loader (WARNING)>¶
- classmethod policy(name, default=None, required=False)[source]¶
Find a policy configuration by name.
- classmethod site_configuration_last_update(_db, known_value=None, timeout=0)[source]¶
Check when the site configuration was last updated.
Updates Configuration.instance[Configuration.SITE_CONFIGURATION_LAST_UPDATE]. It’s the application’s responsibility to periodically check this value and reload the configuration if appropriate.
- Parameters:
known_value – We know when the site configuration was last updated–it’s this timestamp. Use it instead of checking with the database.
timeout – We will only call out to the database once in this number of seconds. If we are asked again before this number of seconds elapses, we will assume site configuration has not changed. By default, we call out to the database every time.
- Returns:
a datetime object.
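A sketch of the intended polling pattern follows; the caching variable and the 60-second timeout are illustrative assumptions:

```python
from core.config import Configuration

# Timestamp of the site configuration this process last acted on.
_last_seen_update = None


def reload_configuration_if_changed(_db):
    """Reload configuration when the database says it has changed."""
    global _last_seen_update
    # Ask the database at most once every 60 seconds.
    last_update = Configuration.site_configuration_last_update(_db, timeout=60)
    if _last_seen_update is None or (last_update and last_update > _last_seen_update):
        Configuration.load(_db)
        _last_seen_update = last_update
```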
core.coverage module¶
- class core.coverage.BaseCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
object
Run certain objects through an algorithm. If the algorithm returns success, add a coverage record for that object, so the object doesn’t need to be processed again. If the algorithm returns a CoverageFailure, that failure may itself be memorialized as a coverage record.
Instead of instantiating this class directly, subclass one of its subclasses: either IdentifierCoverageProvider or WorkCoverageProvider.
In IdentifierCoverageProvider the ‘objects’ are Identifier objects and the coverage records are CoverageRecord objects. In WorkCoverageProvider the ‘objects’ are Work objects and the coverage records are WorkCoverageRecord objects.
- DEFAULT_BATCH_SIZE = 100¶
- OPERATION = None¶
- SERVICE_NAME = None¶
- add_coverage_record_for(item)[source]¶
Add a coverage record for the given item.
Implemented in IdentifierCoverageProvider and WorkCoverageProvider.
- add_coverage_records_for(items)[source]¶
Add CoverageRecords for a group of items from a batch, each of which was successful.
- property collection¶
Retrieve the Collection object associated with this CoverageProvider.
- failure_for_ignored_item(work)[source]¶
Create a CoverageFailure recording the coverage provider’s failure to even try to process an item.
Implemented in IdentifierCoverageProvider and WorkCoverageProvider.
- finalize_batch()[source]¶
Do whatever is necessary to complete this batch before moving on to the next one.
e.g. committing the database session or uploading a bunch of assets to S3.
- finalize_timestampdata(timestamp, **kwargs)[source]¶
Finalize the given TimestampData and write it to the database.
- handle_success(item)[source]¶
Do something special to mark the successful coverage of the given item.
- items_that_need_coverage(identifiers=None, **kwargs)[source]¶
Create a database query returning only those items that need coverage.
- Parameters:
subset – A list of Identifier objects. If present, return only items that need coverage and are associated with one of these identifiers.
Implemented in CoverageProvider and WorkCoverageProvider.
- property log¶
- property operation¶
Which operation should this CoverageProvider use to distinguish between multiple CoverageRecords from the same data source?
- process_batch(batch)[source]¶
Do what it takes to give coverage records to a batch of items.
- Returns:
A mixed list of coverage records and CoverageFailures.
- process_batch_and_handle_results(batch)[source]¶
- Returns:
A 2-tuple (counts, records).
counts is a 3-tuple (successes, transient failures, persistent_failures).
records is a mixed list of coverage record objects (for successes and persistent failures) and CoverageFailure objects (for transient failures).
- process_item(item)[source]¶
Do the work necessary to give coverage to one specific item.
Since this is where the actual work happens, this is not implemented in IdentifierCoverageProvider or WorkCoverageProvider, and must be handled in a subclass.
- record_failure_as_coverage_record(failure)[source]¶
Convert the given CoverageFailure to a coverage record.
Implemented in IdentifierCoverageProvider and WorkCoverageProvider.
- run_once(progress, count_as_covered=None)[source]¶
Try to grant coverage to a number of uncovered items.
NOTE: If you override this method, it's very important that your implementation eventually do one of the following:
- Set progress.finish
- Set progress.exception
- Raise an exception
If you don’t do any of these things, run() will assume you still have work to do, and will keep calling run_once() forever.
- Parameters:
progress – A CoverageProviderProgress representing the progress made so far, and the number of records that need to be ignored for the rest of the run.
count_as_covered – Which values for CoverageRecord.status should count as meaning ‘already covered’.
- Returns:
A CoverageProviderProgress representing whatever additional progress has been made.
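A sketch of an override that honors this contract is shown below; the subclass and its processing loop are illustrative assumptions, not part of the documented API:

```python
from core.coverage import BaseCoverageProvider
from core.util.datetime_helpers import utc_now


class ExampleProvider(BaseCoverageProvider):
    """Illustrative subclass showing the run_once() contract."""

    SERVICE_NAME = "Example provider"

    def run_once(self, progress, count_as_covered=None):
        try:
            for item in self.items_that_need_coverage():
                self.process_item(item)
            # Setting progress.finish tells run() the work is done.
            progress.finish = utc_now()
        except Exception as e:
            # Alternatively, record the failure; run() will still stop looping.
            progress.exception = str(e)
        return progress
```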
- property timestamp¶
Look up the Timestamp object for this CoverageProvider.
- class core.coverage.BibliographicCoverageProvider(collection, **kwargs)[source]¶
Bases:
CollectionCoverageProvider
Fill in bibliographic metadata for all books in a Collection.
e.g. ensures that we get Overdrive coverage for all Overdrive IDs in a collection.
Although a BibliographicCoverageProvider may gather CirculationData for a book, it cannot guarantee equal coverage for all Collections that contain that book. CirculationData should be limited to things like formats that don’t vary between Collections, and you should use a CollectionMonitor to make sure your circulation information is up-to-date for each Collection.
- class core.coverage.CatalogCoverageProvider(collection, **kwargs)[source]¶
Bases:
CollectionCoverageProvider
Most CollectionCoverageProviders provide coverage to Identifiers that are licensed through a given Collection.
A CatalogCoverageProvider provides coverage to Identifiers that are present in a given Collection’s catalog.
- class core.coverage.CollectionCoverageProvider(collection, **kwargs)[source]¶
Bases:
IdentifierCoverageProvider
A CoverageProvider that covers all the Identifiers currently licensed to a given Collection.
You should subclass this CoverageProvider if you want to create Works (as opposed to operating on existing Works) or update the circulation information for LicensePools. You can’t use it to create new LicensePools, since it only operates on Identifiers that already have a LicensePool in the given Collection.
If a book shows up in multiple Collections, the first Collection to process it takes care of it for the others. Any books that were processed through their membership in another Collection will be left alone.
For this reason it’s important that subclasses of this CoverageProvider only deal with bibliographic information and format availability information (such as links to open-access downloads). You’ll have problems if you try to use CollectionCoverageProvider to keep track of information like the number of licenses available for a book.
In addition to defining the class variables defined by CoverageProvider, you must define the class variable PROTOCOL when subclassing this class. This is the entity that provides the licenses for this Collection. It should be one of the collection-type provider constants defined in the ExternalIntegration class, such as ExternalIntegration.OPDS_IMPORT or ExternalIntegration.OVERDRIVE.
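For example, a subclass might define the required class variables like this (the specific names and values are illustrative assumptions):

```python
from core.coverage import CollectionCoverageProvider
from core.model import ExternalIntegration, Identifier


class ExampleOPDSCoverageProvider(CollectionCoverageProvider):
    """Illustrative subclass; not part of the documented API."""

    SERVICE_NAME = "Example OPDS coverage provider"
    DATA_SOURCE_NAME = "Example data source"  # assumption: any registered DataSource name
    PROTOCOL = ExternalIntegration.OPDS_IMPORT
    INPUT_IDENTIFIER_TYPES = [Identifier.ISBN]

    def process_item(self, identifier):
        # Look up and apply bibliographic data here; returning the identifier
        # itself records success, a CoverageFailure records failure.
        return identifier
```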
- DEFAULT_BATCH_SIZE = 10¶
- EXCLUDE_SEARCH_INDEX = False¶
- INPUT_IDENTIFIER_TYPES = None¶
- PROTOCOL = None¶
- classmethod all(_db, **kwargs)[source]¶
Yield a sequence of CollectionCoverageProvider instances, one for every Collection that gets its licenses from cls.PROTOCOL.
CollectionCoverageProviders will be yielded in a random order.
- Parameters:
kwargs – Keyword arguments passed into the constructor for CollectionCoverageProvider (or, more likely, one of its subclasses).
- classmethod collections(_db)[source]¶
Returns a randomly sorted list of the Collections covered by this provider.
- items_that_need_coverage(identifiers=None, **kwargs)[source]¶
Find all Identifiers associated with this Collection but lacking coverage through this CoverageProvider.
- license_pool(identifier, data_source=None)[source]¶
Finds this Collection’s LicensePool for the given Identifier, creating one if necessary.
- Parameters:
data_source – If it’s necessary to create a LicensePool, the new LicensePool will have this DataSource. The default is to use the DataSource associated with the CoverageProvider. This should only be needed by the metadata wrangler.
- run_once(*args, **kwargs)[source]¶
Try to grant coverage to a number of uncovered items.
NOTE: If you override this method, it's very important that your implementation eventually do one of the following:
- Set progress.finish
- Set progress.exception
- Raise an exception
If you don’t do any of these things, run() will assume you still have work to do, and will keep calling run_once() forever.
- Parameters:
progress – A CoverageProviderProgress representing the progress made so far, and the number of records that need to be ignored for the rest of the run.
count_as_covered – Which values for CoverageRecord.status should count as meaning ‘already covered’.
- Returns:
A CoverageProviderProgress representing whatever additional progress has been made.
- set_metadata_and_circulation_data(identifier, metadata, circulationdata)[source]¶
Makes sure that the given Identifier has a Work, Edition (in the context of this Collection), and LicensePool (ditto), and that all the information is up to date.
- Returns:
The Identifier (if successful) or an appropriate CoverageFailure (if not).
- work(identifier, license_pool=None, **calculate_work_kwargs)[source]¶
Finds or creates a Work for this Identifier as licensed through this Collection.
If the given Identifier already has a Work associated with it, that Work will always be used, since an Identifier can only have one Work associated with it.
However, if there is no current Work, a Work will only be created if the given Identifier already has a LicensePool in the Collection associated with this CoverageProvider (or if a LicensePool to use is provided.) This method will not create new LicensePools.
If the Work is newly created, or if the existing Work is not presentation-ready, it will be (re)calculated by calling LicensePool.calculate_work(). If there is an existing presentation-ready work, calculate_work() will not be called; instead, the work will be slated for recalculation when its metadata changes through Metadata.apply().
- Parameters:
calculate_work_kwargs – Keyword arguments to pass into calculate_work() if and when it is called.
- Returns:
A Work, if possible. Otherwise, a CoverageFailure explaining why no Work could be created.
- class core.coverage.CollectionCoverageProviderJob(collection, provider_class, progress, **provider_kwargs)[source]¶
Bases:
DatabaseJob
- class core.coverage.CoverageFailure(obj, exception, data_source=None, transient=True, collection=None)[source]¶
Bases:
object
Object representing the failure to provide coverage.
- class core.coverage.CoverageProviderProgress(*args, **kwargs)[source]¶
Bases:
TimestampData
A TimestampData optimized for the special needs of CoverageProviders.
- property achievements¶
Represent the achievements of a CoverageProvider as a human-readable string.
- class core.coverage.IdentifierCoverageProvider(_db, collection=None, input_identifiers=None, replacement_policy=None, **kwargs)[source]¶
Bases:
BaseCoverageProvider
Run Identifiers of certain types (ISBN, Overdrive, OCLC Number, etc.) through an algorithm associated with a certain DataSource.
This class is designed to be subclassed rather than instantiated directly. Subclasses should define SERVICE_NAME, OPERATION (optional), DATA_SOURCE_NAME, and INPUT_IDENTIFIER_TYPES. SERVICE_NAME and OPERATION are described in BaseCoverageProvider; the rest are described in appropriate comments in this class. A minimal subclass is sketched below.
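In the following sketch, the lookup helper is a placeholder, and the DataSource and identifier type are illustrative assumptions:

```python
from core.coverage import IdentifierCoverageProvider
from core.model import DataSource, Identifier


def lookup_isbn_metadata(identifier):
    """Placeholder for whatever external lookup this provider performs."""
    raise NotImplementedError


class ExampleISBNCoverageProvider(IdentifierCoverageProvider):
    """Illustrative subclass; not part of the documented API."""

    SERVICE_NAME = "Example ISBN lookup"
    OPERATION = "example-isbn-lookup"  # optional
    DATA_SOURCE_NAME = DataSource.OCLC
    INPUT_IDENTIFIER_TYPES = [Identifier.ISBN]

    def process_item(self, identifier):
        try:
            lookup_isbn_metadata(identifier)
        except Exception as e:
            # A transient CoverageFailure means the item will be retried later.
            return self.failure(identifier, repr(e), transient=True)
        # Returning the item itself records successful coverage.
        return identifier
```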
- COVERAGE_COUNTS_FOR_EVERY_COLLECTION = True¶
- DATA_SOURCE_NAME = None¶
- INPUT_IDENTIFIER_TYPES = <object object>¶
- NO_SPECIFIED_TYPES = <object object>¶
- add_coverage_record_for(item)[source]¶
Record this CoverageProvider’s coverage for the given Edition/Identifier, as a CoverageRecord.
- classmethod bulk_register(identifiers, data_source=None, collection=None, force=False, autocreate=False)[source]¶
Registers identifiers for future coverage.
This method is primarily for use with CoverageProviders that use the registered_only flag to process items. It’s currently only in use on the Metadata Wrangler.
- Parameters:
data_source – DataSource object or basestring representing a DataSource name.
collection – Collection object to be associated with the CoverageRecords.
force – When True, even existing CoverageRecords will have their status reset to CoverageRecord.REGISTERED.
autocreate – When True, a basestring provided by data_source will be autocreated in the database if it didn’t previously exist.
- Returns:
A tuple of two lists: the first contains newly created REGISTERED CoverageRecords, and the second contains Identifiers that were ignored because they already had coverage.
TODO: Take identifier eligibility into account when registering.
- can_cover(identifier)[source]¶
Can this IdentifierCoverageProvider do anything with the given Identifier?
This is not needed in the normal course of events, but a caller may need to decide whether to pass an Identifier into ensure_coverage() or register().
- property collection_or_not¶
If this CoverageProvider needs to be run multiple times on the same identifier in different collections, this returns the collection. Otherwise, this returns None.
- property data_source¶
Look up the DataSource object corresponding to the service we’re running this data through.
Out of an excess of caution, we look up the DataSource every time, rather than storing it, in case a CoverageProvider is ever used in an environment where the database session is scoped (e.g. the circulation manager).
- edition(identifier)[source]¶
Finds or creates an Edition representing this coverage provider’s view of a given Identifier.
- ensure_coverage(item, force=False)[source]¶
Ensure coverage for one specific item.
- Parameters:
item – This should always be an Identifier, but this code will also work if it’s an Edition. (The Edition’s .primary_identifier will be covered.)
force – Run the coverage code even if an existing coverage record for this item was created after self.cutoff_time.
- Returns:
Either a coverage record or a CoverageFailure.
TODO: This could be abstracted and moved to BaseCoverageProvider.
- failure(identifier, error, transient=True)[source]¶
Create a CoverageFailure object to memorialize an error.
- failure_for_ignored_item(item)[source]¶
Create a CoverageFailure recording the CoverageProvider’s failure to even try to process an item.
- items_that_need_coverage(identifiers=None, **kwargs)[source]¶
Find all items lacking coverage from this CoverageProvider.
Items should be Identifiers, though Editions should also work.
By default, all identifiers of the INPUT_IDENTIFIER_TYPES which don’t already have coverage are chosen.
- Parameters:
identifiers – The batch of identifier objects to test for coverage. identifiers and self.input_identifiers can intersect – if this provider was created for the purpose of running specific Identifiers, and within those Identifiers you want to batch, you can use both parameters.
- record_failure_as_coverage_record(failure)[source]¶
Turn a CoverageFailure into a CoverageRecord object.
- classmethod register(identifier, data_source=None, collection=None, force=False, autocreate=False)[source]¶
Registers an identifier for future coverage.
See CoverageProvider.bulk_register for more information about using this method.
- class core.coverage.MARCRecordWorkCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
WorkPresentationProvider
Make sure all presentation-ready works have an up-to-date MARC record.
- DEFAULT_BATCH_SIZE = 1000¶
- OPERATION = 'generate-marc'¶
- SERVICE_NAME = 'MARC Record Work Coverage Provider'¶
- class core.coverage.OPDSEntryWorkCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
WorkPresentationProvider
Make sure all presentation-ready works have an up-to-date OPDS entry.
This is different from the OPDSEntryCacheMonitor, which sweeps over all presentation-ready works, even ones which are already covered.
- DEFAULT_BATCH_SIZE = 1000¶
- OPERATION = 'generate-opds'¶
- SERVICE_NAME = 'OPDS Entry Work Coverage Provider'¶
- class core.coverage.PresentationReadyWorkCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
WorkCoverageProvider
A WorkCoverageProvider that only covers presentation-ready works.
- class core.coverage.WorkClassificationCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
WorkPresentationEditionCoverageProvider
Calculates the ‘expensive’ parts of a work’s presentation: classifications, summary, and quality.
We do all three at once because each of them requires gathering together all equivalent identifiers for the work, which can be, by far, the most expensive step.
This is called ‘classification’ because that’s the most likely use of this coverage provider. If you want to make sure a batch of works gets its summaries recalculated, remember that the coverage record to delete is the one for CLASSIFY_OPERATION.
- DEFAULT_BATCH_SIZE = 20¶
- OPERATION = 'classify'¶
- POLICY = <core.model.PresentationCalculationPolicy object>¶
- SERVICE_NAME = 'Work classification coverage provider'¶
- class core.coverage.WorkCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
BaseCoverageProvider
Perform coverage operations on Works rather than Identifiers.
- add_coverage_record_for(work)[source]¶
Record this CoverageProvider’s coverage for the given Edition/Identifier, as a WorkCoverageRecord.
- add_coverage_records_for(works)[source]¶
Add WorkCoverageRecords for a group of works from a batch, each of which was successful.
- failure_for_ignored_item(work)[source]¶
Create a CoverageFailure recording the WorkCoverageProvider’s failure to even try to process a Work.
- items_that_need_coverage(identifiers=None, **kwargs)[source]¶
Find all Works lacking coverage from this CoverageProvider.
By default, all Works which don’t already have coverage are chosen.
- Parameters:
identifiers – If present, only Works connected with one of the given identifiers are chosen.
- record_failure_as_coverage_record(failure)[source]¶
Turn a CoverageFailure into a WorkCoverageRecord object.
- classmethod register(work, force=False)[source]¶
Registers a work for future coverage.
This method is primarily for use with CoverageProviders that use the registered_only flag to process items. It’s currently only in use on the Metadata Wrangler.
- Parameters:
force – Set to True to reset an existing CoverageRecord’s status to “registered”, regardless of its current status.
- class core.coverage.WorkPresentationEditionCoverageProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
WorkPresentationProvider
Make sure each Work has an up-to-date presentation edition.
This basically means comparing all the Editions associated with the Work and building a composite Edition.
Expensive operations – calculating work quality, summary, and genre classification – are reserved for WorkClassificationCoverageProvider
- OPERATION = 'choose-edition'¶
- POLICY = <core.model.PresentationCalculationPolicy object>¶
- SERVICE_NAME = 'Calculated presentation coverage provider'¶
- class core.coverage.WorkPresentationProvider(_db, batch_size=None, cutoff_time=None, registered_only=False)[source]¶
Bases:
PresentationReadyWorkCoverageProvider
Recalculate some part of presentation for works that are presentation-ready.
A Work’s presentation is set when it’s made presentation-ready (thus the name). When that happens, a number of WorkCoverageRecords are set for that Work.
A migration script may remove a coverage record if it knows a work needs to have some aspect of its presentation recalculated. These providers give back the ‘missing’ coverage.
- DEFAULT_BATCH_SIZE = 100¶
core.entrypoint module¶
- class core.entrypoint.AudiobooksEntryPoint[source]¶
Bases:
MediumEntryPoint
- INTERNAL_NAME = 'Audio'¶
- URI = 'http://bib.schema.org/Audiobook'¶
- class core.entrypoint.EbooksEntryPoint[source]¶
Bases:
MediumEntryPoint
- INTERNAL_NAME = 'Book'¶
- URI = 'http://schema.org/EBook'¶
- class core.entrypoint.EntryPoint[source]¶
Bases:
object
An EntryPoint is a top-level entry point into a library’s Lane structure that may apply additional filters to the Lane structure.
The “Books” and “Audiobooks” entry points (defined in the EbooksEntryPoint and AudiobooksEntryPoint classes) are different views on a library’s Lane structure; each applies an additional filter against Edition.medium.
Each individual EntryPoint should be represented as a subclass of EntryPoint, and should be registered with the overall EntryPoint class by calling EntryPoint.register.
The list of entry points shows up as a facet group in a library’s top-level grouped feed, and in search results. The SimplyE client renders entry points as a set of tabs.
- BY_INTERNAL_NAME = {'All': <class 'core.entrypoint.EverythingEntryPoint'>, 'Audio': <class 'core.entrypoint.AudiobooksEntryPoint'>, 'Book': <class 'core.entrypoint.EbooksEntryPoint'>}¶
- DEFAULT_ENABLED = [<class 'core.entrypoint.EbooksEntryPoint'>]¶
- DISPLAY_TITLES = {<class 'core.entrypoint.EverythingEntryPoint'>: 'All', <class 'core.entrypoint.EbooksEntryPoint'>: 'eBooks', <class 'core.entrypoint.AudiobooksEntryPoint'>: 'Audiobooks'}¶
- ENABLED_SETTING = 'enabled_entry_points'¶
- ENTRY_POINTS = [<class 'core.entrypoint.EverythingEntryPoint'>, <class 'core.entrypoint.EbooksEntryPoint'>, <class 'core.entrypoint.AudiobooksEntryPoint'>]¶
- URI = None¶
- classmethod modify_database_query(_db, qu)[source]¶
If necessary, modify a database query so that it restricts results to items shown through this entry point.
The default behavior is to not change a database query at all.
- classmethod modify_search_filter(filter)[source]¶
If necessary, modify an ElasticSearch Filter object so that it restricts results to items shown through this entry point.
The default behavior is not to change the Filter object at all.
- Parameters:
filter – An external_search.Filter object.
- classmethod register(entrypoint_class, display_title, default_enabled=False)[source]¶
Register the given subclass with the master registry kept in the EntryPoint class.
- Parameters:
entrypoint_class – A subclass of EntryPoint.
display_title – The title to use when displaying this entry point to patrons.
default_enabled – New libraries should have this entry point enabled by default.
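A sketch of defining and registering a new entry point; the subclass, its URI, and its medium value are illustrative assumptions:

```python
from core.entrypoint import EntryPoint
from core.model import Edition


class ComicsEntryPoint(EntryPoint):
    """Illustrative entry point; not part of the documented API."""

    INTERNAL_NAME = "Comic"
    URI = "http://example.org/ComicBook"

    @classmethod
    def modify_database_query(cls, _db, qu):
        # Restrict a database query to the medium this entry point represents.
        return qu.filter(Edition.medium == cls.INTERNAL_NAME)


# Make the new entry point available as a facet, disabled by default.
EntryPoint.register(ComicsEntryPoint, "Comics", default_enabled=False)
```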
- class core.entrypoint.EverythingEntryPoint[source]¶
Bases:
EntryPoint
An entry point that has everything.
- INTERNAL_NAME = 'All'¶
- URI = 'http://schema.org/CreativeWork'¶
- class core.entrypoint.MediumEntryPoint[source]¶
Bases:
EntryPoint
An entry point that creates a view on one specific medium.
The medium is expected to be the entry point’s INTERNAL_NAME.
The URI is expected to be the one in Edition.schema_to_additional_type[INTERNAL_NAME].
core.exceptions module¶
core.external_list module¶
- class core.external_list.ClassificationBasedMembershipManager(custom_list, subject_fragments)[source]¶
Bases:
MembershipManager
Manage a custom list containing all Editions whose primary Identifier is classified under one of the given subject fragments.
- property new_membership¶
Iterate over the new membership of the list.
- Yield:
a sequence of Edition objects
- class core.external_list.CustomListFromCSV(data_source_name, list_name, metadata_client=None, overwrite_old_data=False, annotation_field='text', annotation_author_name_field='name', annotation_author_affiliation_field='location', first_appearance_field='timestamp', **kwargs)[source]¶
Bases:
CSVMetadataImporter
Create a CustomList, with entries, from a CSV file.
- metadata_to_list_entry(custom_list, data_source, now, metadata)[source]¶
Convert a Metadata object to a CustomListEntry.
- class core.external_list.MembershipManager(custom_list, log=None)[source]¶
Bases:
object
Manage the membership of a custom list based on some criteria.
- property new_membership¶
Iterate over the new membership of the list.
- Yield:
a sequence of Edition objects
- class core.external_list.TitleFromExternalList(metadata, first_appearance, most_recent_appearance, annotation)[source]¶
Bases:
object
This class helps you convert data from external lists into Simplified Edition and CustomListEntry objects.
- to_custom_list_entry(custom_list, metadata_client, overwrite_old_data=False)[source]¶
Turn this object into a CustomListEntry with associated Edition.
- to_edition(_db, metadata_client, overwrite_old_data=False)[source]¶
Create or update an Edition object for this list item.
We have two goals here:
1. Make sure there is an Edition representing the list’s view of the data.
2. If at all possible, connect the Edition’s primary identifier to other identifiers in the system, identifiers which may have associated LicensePools. This can happen in two ways:
2a. The Edition’s primary identifier, or other identifiers associated with the Edition, may be directly associated with LicensePools. This can happen if a book’s list entry includes (e.g.) an Overdrive ID.
2b. The Edition’s permanent work ID may identify it as the same work as other Editions in the system. In that case this Edition’s primary identifier may be associated with the other Editions’ primary identifiers. (p=0.85)
core.external_search module¶
- class core.external_search.CurrentMapping[source]¶
Bases:
Mapping
The first mapping to support only Elasticsearch 6.
The body of this mapping looks for bibliographic information in the core document, primarily used for matching search requests. It also has nested documents, which are used for filtering and ranking Works when generating other types of feeds:
licensepools – the Work has these LicensePools (includes current availability as a boolean, but not detailed availability information)
customlists – the Work is on these CustomLists
contributors – these Contributors worked on the Work
- AUTHOR_CHAR_FILTER_NAMES = ['unknown_author', 'primary_author_only', 'strip_parentheticals', 'strip_periods', 'collapse_three_initials', 'collapse_two_initials']¶
- CHAR_FILTERS = {'collapse_three_initials': {'pattern': ' ([A-Z]) ([A-Z]) ([A-Z])$', 'replacement': ' $1$2$3', 'type': 'pattern_replace'}, 'collapse_two_initials': {'pattern': ' ([A-Z]) ([A-Z])$', 'replacement': ' $1$2', 'type': 'pattern_replace'}, 'primary_author_only': {'pattern': '\\s+;.*', 'replacement': '', 'type': 'pattern_replace'}, 'remove_apostrophes': {'pattern': "'", 'replacement': '', 'type': 'pattern_replace'}, 'strip_parentheticals': {'pattern': '\\s+\\([^)]+\\)', 'replacement': '', 'type': 'pattern_replace'}, 'strip_periods': {'pattern': '\\.', 'replacement': '', 'type': 'pattern_replace'}, 'unknown_author': {'pattern': '\\[Unknown\\]', 'replacement': '�', 'type': 'pattern_replace'}}¶
- VERSION_NAME = 'v4'¶
- WORK_LAST_UPDATE_SCRIPT = "\ndouble champion = -1;\n// Start off by looking at the work's last update time.\nfor (candidate in doc['last_update_time']) {\n if (champion == -1 || candidate > champion) { champion = candidate; }\n}\nif (params.collection_ids != null && params.collection_ids.length > 0) {\n // Iterate over all licensepools looking for a pool in a collection\n // relevant to this filter. When one is found, check its\n // availability time to see if it's later than the last update time.\n for (licensepool in params._source.licensepools) {\n if (!params.collection_ids.contains(licensepool['collection_id'])) { continue; }\n double candidate = licensepool['availability_time'];\n if (champion == -1 || candidate > champion) { champion = candidate; }\n }\n}\nif (params.list_ids != null && params.list_ids.length > 0) {\n\n // Iterate over all customlists looking for a list relevant to\n // this filter. When one is found, check the previous work's first\n // appearance on that list to see if it's later than the last\n // update time.\n for (customlist in params._source.customlists) {\n if (!params.list_ids.contains(customlist['list_id'])) { continue; }\n double candidate = customlist['first_appearance'];\n if (champion == -1 || candidate > champion) { champion = candidate; }\n }\n}\n\nreturn champion;\n"¶
- name = 'collapse_two_initials'¶
- normalizer = {'pattern': ' ([A-Z]) ([A-Z])$', 'replacement': ' $1$2', 'type': 'pattern_replace'}¶
- pattern = ' ([A-Z]) ([A-Z])$'¶
- replacement = ' $1$2'¶
- class core.external_search.ExternalSearchIndex(_db, url=None, works_index=None, test_search_term=None, in_testing=False, mapping=None)[source]¶
Bases:
HasSelfTests
- CURRENT_ALIAS_SUFFIX = 'current'¶
- DEFAULT_TEST_SEARCH_TERM = 'test'¶
- DEFAULT_WORKS_INDEX_PREFIX = 'circulation-works'¶
- MOCK_IMPLEMENTATION = None¶
- NAME = 'Elasticsearch'¶
- SETTINGS = [{'key': 'url', 'label': l'URL', 'required': True, 'format': 'url'}, {'key': 'works_index_prefix', 'label': l'Index prefix', 'default': 'circulation-works', 'required': True, 'description': l'Any Elasticsearch indexes needed for this application will be created with this unique prefix. In most cases, the default will work fine. You may need to change this if you have multiple application servers using a single Elasticsearch server.'}, {'key': 'test_search_term', 'label': l'Test search term', 'default': 'test', 'description': l'Self tests will use this value as the search term.'}]¶
- SITEWIDE = True¶
- TEST_SEARCH_TERM_KEY = 'test_search_term'¶
- VERSION_RE = re.compile('-v([0-9]+)$')¶
- WORKS_INDEX_PREFIX_KEY = 'works_index_prefix'¶
- bulk_update(works, retry_on_batch_failure=True)[source]¶
Upload a batch of works to the search index at once.
- query_works(query_string, filter=None, pagination=None, debug=False)[source]¶
Run a search query.
This works by calling query_works_multi().
- Parameters:
query_string – The string to search for.
filter – A Filter object, used to filter out works that would otherwise match the query string.
pagination – A Pagination object, used to get a subset of the search results.
debug – If this is True, debugging information will be gathered and logged. The search query will ask ElasticSearch for all available fields, not just the fields known to be used by the feed generation code. This all comes at a slight performance cost.
- Returns:
A list of Hit objects containing information about the search results. This will include the values of any script fields calculated by ElasticSearch during the search process.
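A usage sketch, assuming an active database session _db and assuming that result documents expose title and sort_author fields:

```python
from core.external_search import ExternalSearchIndex, Filter
from core.lane import Pagination

search = ExternalSearchIndex(_db)  # _db: an active database session
filter = Filter(languages=["eng"], fiction=False)
pagination = Pagination(size=20)

hits = search.query_works("asteroid mining", filter=filter, pagination=pagination)
for hit in hits:
    print(hit.title, hit.sort_author)
```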
- query_works_multi(queries, debug=False)[source]¶
Run several queries simultaneously and return the results as a big list.
- Parameters:
queries – A list of (query string, Filter, Pagination) 3-tuples, each representing an Elasticsearch query to be run.
- Yield:
A sequence of lists, one per item in queries, each containing the search results from that (query string, Filter, Pagination) 3-tuple.
- classmethod reset()[source]¶
Resets the __client object to None so a new configuration can be applied during object initialization.
This method is only intended for use in testing.
- set_works_index_and_alias(_db)[source]¶
Finds or creates the works_index and works_alias based on the current configuration.
- setup_current_alias(_db)[source]¶
Finds or creates the works_alias as named by the current site settings.
If the resulting alias exists and is affixed to a different index, or if it can’t be generated for any reason, the alias will not be created or moved. Instead, the search client will use the works_index directly for search queries.
- setup_index(new_index=None, **index_settings)[source]¶
Create the search index with appropriate mapping.
This will destroy the search index, and all works will need to be indexed again. In production, don’t use this on an existing index. Use it to create a new index, then change the alias to point to the new index.
- work_document_type = 'work-type'¶
- class core.external_search.Filter(collections=None, media=None, languages=None, fiction=None, audiences=None, target_age=None, genre_restriction_sets=None, customlist_restriction_sets=None, facets=None, script_fields=None, **kwargs)[source]¶
Bases:
SearchBase
A filter for search results.
This covers every reason you might want to exclude a search result that would otherwise match the query string – wrong media, wrong language, not available in the patron’s library, etc.
This also covers every way you might want to order the search results: either by relevance to the search query (the default), or by a specific field (e.g. author) as described by a Facets object.
It also covers additional calculated values you might need when presenting the search results.
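For example, a Filter restricting results to English-language ebook fiction might be constructed like this (a sketch; any combination of the constructor’s keyword arguments can be used):

```python
from core.external_search import Filter
from core.model import Edition

# Keep only English-language fiction published as ebooks.
ebook_fiction = Filter(
    media=Edition.BOOK_MEDIUM,
    languages=["eng"],
    fiction=True,
)
```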
- AUTHOR_MATCH_ROLES = ['Author', 'Primary Author', 'Narrator', 'Editor', 'Director', 'Actor']¶
- DETERMINISTIC = <object object>¶
- FEATURABLE_SCRIPT = "Math.pow(Math.min(%(cutoff).5f, doc['quality'].value), %(exponent).5f) * 5"¶
- KNOWN_SCRIPT_FIELDS = ['last_update']¶
- property asc¶
Convert order_ascending to Elasticsearch-speak.
- property audiences¶
Return the appropriate audiences for this query.
This will be whatever audiences were provided, but it will probably also include the ‘All Ages’ audience.
- property author_filter¶
Build a filter that matches a ‘contributors’ subdocument only if it represents an author-level contribution by self.author.
- build(_chain_filters=None)[source]¶
Convert this object to an Elasticsearch Filter object.
- Returns:
A 2-tuple (filter, nested_filters). Filters on fields within nested documents (such as ‘licensepools.collection_id’) must be applied as subqueries to the query that will eventually be created from this filter. nested_filters is a dictionary that maps a path to a list of filters to apply to that path.
- Parameters:
_chain_filters – Mock function to use instead of Filter._chain_filters
- featurability_scoring_functions(random_seed)[source]¶
Generate scoring functions that weight works randomly, but with ‘more featurable’ works tending to be at the top.
- classmethod from_worklist(_db, worklist, facets)[source]¶
Create a Filter that finds only works that belong in the given WorkList and EntryPoint.
- Parameters:
worklist – A WorkList
facets – A SearchFacets object.
- property last_update_time_script_field¶
Return the configuration for a script field that calculates the ‘last update’ time of a work. An ‘update’ happens when the work’s metadata is changed, when it’s added to a collection used by this Filter, or when it’s added to one of the lists used by this Filter.
- property sort_order¶
Create a description, for use in an Elasticsearch document, explaining how search results should be ordered.
- Returns:
A list of dictionaries, each dictionary mapping a field name to an explanation of how to sort that field. Usually the explanation is a simple string, either ‘asc’ or ‘desc’.
- property target_age_filter¶
Helper method to generate the target age subfilter.
It’s complicated because it has to handle cases where the upper or lower bound on target age is missing (indicating there is no upper or lower bound).
- class core.external_search.Mapping[source]¶
Bases:
MappingDocument
A class that defines the mapping for a particular version of the search index.
Code that won’t change between versions can go here. (Or code that can change between versions without affecting anything.)
- VERSION_NAME = None¶
- create(search_client, base_index_name)[source]¶
Ensure that an index exists in search_client for this Mapping.
- Returns:
True or False, indicating whether the index was created new.
- classmethod script_name(base_name)[source]¶
Scope a script name with “simplified” (to avoid confusion with other applications on the Elasticsearch server), and the version number (to avoid confusion with other versions of this application, which may implement the same script differently, on this Elasticsearch server).
- class core.external_search.MappingDocument[source]¶
Bases:
object
This class knows a lot about how the ‘properties’ section of an Elasticsearch mapping document (or one of its subdocuments) is created.
- add_properties(properties_by_type)[source]¶
Turn a dictionary mapping types to field names into a bunch of add_property() calls.
Useful when you have a lot of fields that don’t need any customization.
- add_property(name, type, **description)[source]¶
Add a field to the list of properties.
- Parameters:
name – Name of the field as found in search documents.
type – Type of the field. This may be a custom type, so long as a hook method is defined for that type.
description – Description of the field.
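A sketch of grouping fields by type with add_properties, then adding one field that needs its own type. The field names and the 'keyword', 'boolean', and 'date' types are illustrative assumptions:

```python
from core.external_search import MappingDocument

doc = MappingDocument()
# Each key is a property type; each value lists the fields of that type.
doc.add_properties(
    {
        "basic_text": ["title", "subtitle", "summary"],
        "filterable_text": ["series"],
        "keyword": ["sort_title", "sort_author"],
        "boolean": ["presentation_ready"],
    }
)
# Fields that need customization can still be added one at a time.
doc.add_property("published", "date")
```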
- basic_text_property_hook(description)[source]¶
Hook method to handle the custom ‘basic_text’ property type.
This type does not exist in Elasticsearch. It’s our name for a text field that is indexed three times: once using our default English analyzer (“title”), once using an analyzer with minimal stemming (“title.minimal”) for close matches, and once using an analyzer that leaves stopwords in place, for searches that rely on stopwords.
- filterable_text_property_hook(description)[source]¶
Hook method to handle the custom ‘filterable_text’ property type.
This type does not exist in Elasticsearch. It’s our name for a text field that can be used in both queries and filters.
This field is indexed _four_ times – the three ways a normal text field is indexed, plus again as an unparsed keyword that can be used in filters.
- class core.external_search.MockExternalSearchIndex(url=None)[source]¶
Bases:
ExternalSearchIndex
- query_works(query_string, filter, pagination, debug=False)[source]¶
Run a search query.
This works by calling query_works_multi().
- Parameters:
query_string – The string to search for.
filter – A Filter object, used to filter out works that would otherwise match the query string.
pagination – A Pagination object, used to get a subset of the search results.
debug – If this is True, debugging information will be gathered and logged. The search query will ask ElasticSearch for all available fields, not just the fields known to be used by the feed generation code. This all comes at a slight performance cost.
- Returns:
A list of Hit objects containing information about the search results. This will include the values of any script fields calculated by ElasticSearch during the search process.
- query_works_multi(queries, debug=False)[source]¶
Run several queries simultaneously and return the results as a big list.
- Parameters:
queries – A list of (query string, Filter, Pagination) 3-tuples, each representing an Elasticsearch query to be run.
- Yield:
A sequence of lists, one per item in queries, each containing the search results from that (query string, Filter, Pagination) 3-tuple.
- work_document_type = 'work-type'¶
- class core.external_search.MockMeta[source]¶
Bases:
dict
Mock the .meta object associated with an Elasticsearch search result. This is necessary to get SortKeyPagination to work with MockExternalSearchIndex.
- property sort¶
- class core.external_search.MockSearchResult(sort_title, sort_author, meta, id)[source]¶
Bases:
object
- class core.external_search.Query(query_string, filter=None, use_query_parser=True)[source]¶
Bases:
SearchBase
An attempt to find something in the search index.
- BASELINE_COEFFICIENT = 1¶
- DEFAULT_KEYWORD_MATCH_COEFFICIENT = 1000¶
- KEYWORD_MATCH_COEFFICIENT_FOR_FIELD = {'imprint': 2, 'publisher': 2}¶
- MULTI_MATCH_FIELDS = ['subtitle', 'series', 'author']¶
- QUERY_WAS_A_FILTER_WEIGHT = 600¶
- SEARCH_RELEVANT_ROLES = ['Primary Author', 'Author', 'Narrator']¶
- SIMPLE_MATCH_FIELDS = ['title', 'subtitle', 'series', 'publisher', 'imprint']¶
- SLIGHTLY_ABOVE_BASELINE = 1.1¶
- SPELLCHECKER = <spellchecker.spellchecker.SpellChecker object>¶
- STEMMABLE_FIELDS = ['title', 'subtitle', 'series']¶
- STOPWORD_FIELDS = ['title', 'subtitle', 'series']¶
- WEIGHT_FOR_FIELD = {'author': 120.0, 'contributors.display_name': 120.0, 'contributors.sort_name': 120.0, 'imprint': 40.0, 'publisher': 40.0, 'series': 120.0, 'subtitle': 130.0, 'summary': 80.0, 'title': 140.0}¶
- build(elasticsearch, pagination=None)[source]¶
Make an Elasticsearch-DSL Search object out of this query.
- Parameters:
elasticsearch – An Elasticsearch-DSL Search object. This object is ready to run a search against an Elasticsearch server, but it doesn’t represent any particular Elasticsearch query.
pagination – A Pagination object indicating a slice of results to pull from the search index.
- Returns:
An Elasticsearch-DSL Search object that’s prepared to run this specific query.
- property elasticsearch_query¶
Build an Elasticsearch-DSL Query object for this query string.
- field = 'contributors.display_name'¶
- property match_author_hypotheses¶
Yield a sequence of query objects representing possible ways in which a query string might represent a book’s author.
- Parameters:
query_string – The query string that might be the name of an author.
- Yield:
A sequence of Elasticsearch-DSL query objects to be considered as hypotheses.
- match_one_field_hypotheses(base_field, query_string=None)[source]¶
Yield a number of hypotheses representing different ways in which the query string might be an attempt to match a given field.
- Parameters:
base_field – The name of the field to search, e.g. “title” or “contributors.sort_name”.
query_string – The query string to use, if different from self.query_string.
- Yield:
A sequence of (hypothesis, weight) 2-tuples.
- property match_topic_hypotheses¶
Yield a number of hypotheses representing different ways in which the query string might be a topic match.
Currently there is only one such hypothesis.
TODO: We probably want to introduce a fuzzy version of this hypothesis.
- property parsed_query_matches¶
Deal with a query string that contains information that should be exactly matched against a controlled vocabulary (e.g. “nonfiction” or “grade 5”) along with information that is more search-like (such as a title or author).
The match information is pulled out of the query string and used to make a series of match_phrase queries. The rest of the information is used in a simple query that matches basic fields.
- title_multi_match_for(other_field)[source]¶
Helper method to create a MultiMatch hypothesis that crosses multiple fields.
This strategy only works if everything is spelled correctly, since we can’t combine a “cross_fields” Multimatch query with a fuzzy search.
- Yield:
At most one (hypothesis, weight) 2-tuple.
- class core.external_search.QueryParser(query_string, query_class=<class 'core.external_search.Query'>)[source]¶
Bases:
object
Attempt to parse filter information out of a query string.
This class is where we make sense of queries like the following:
asteroids nonfiction
grade 5 dogs
young adult romance
divorce age 10 and up
These queries contain information that can best be thought of in terms of a filter against specific fields (“nonfiction”, “grade 5”, “romance”). Books either match these criteria or they don’t.
These queries may also contain information that can be thought of in terms of a search (“asteroids”, “dogs”) – books may match these criteria to a greater or lesser extent.
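Example (a minimal sketch): parsing filter-like terms out of a query string. Only the documented constructor is used; how the parsed filters are consumed afterwards is left to Query.
    from core.external_search import QueryParser

    parser = QueryParser("nonfiction grade 5 dogs")
    # "nonfiction" and "grade 5" are treated as filters against specific fields,
    # while "dogs" remains as free-text search input.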
- add_match_term_filter(query, field, query_string, matched_portion)[source]¶
Create a match query that finds documents whose value for field matches query.
Add it to self.filters, and remove the relevant portion of query_string so it doesn’t get reused.
- add_target_age_filter(query, query_string, matched_portion)[source]¶
Create a query that finds documents whose value for target_age matches query.
Add a filter version of this query to .match_queries (so that all documents outside the target age are filtered out).
Add a boosted version of this query to .match_queries (so that documents that cluster tightly around the target age are boosted over documents that span a huge age range).
Remove the relevant portion of query_string so it doesn’t get reused.
- class core.external_search.SearchBase[source]¶
Bases:
object
A superclass containing helper methods for creating and modifying Elasticsearch-dsl Query-type objects.
- classmethod make_target_age_query(target_age, boost=1.1)[source]¶
Create an Elasticsearch query object for a boolean query that matches works whose target ages overlap (partially or entirely) the given age range.
- Parameters:
target_age – A 2-tuple (lower limit, upper limit)
boost – Boost works that fit precisely into the target age range by this amount, vis-a-vis works that don’t.
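Example: building a target-age query for readers aged 9 to 12, boosting works that fit the range precisely.
    from core.external_search import SearchBase

    # Matches works whose target age overlaps 9-12; works that fit the range
    # precisely are boosted slightly above the baseline.
    query = SearchBase.make_target_age_query((9, 12), boost=1.1)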
- class core.external_search.SearchIndexCoverageProvider(*args, **kwargs)[source]¶
Bases:
WorkPresentationProvider
Make sure all Works have up-to-date representation in the search index.
- DEFAULT_BATCH_SIZE = 500¶
- OPERATION = 'update-search-index'¶
- SERVICE_NAME = 'Search index coverage provider'¶
- class core.external_search.SortKeyPagination(last_item_on_previous_page=None, size=50)[source]¶
Bases:
Pagination
An Elasticsearch-specific implementation of Pagination that paginates search results by tracking where in a sorted list the previous page left off, rather than using a numeric index into the list.
- classmethod from_request(get_arg, default_size=None)[source]¶
Instantiate a SortKeyPagination object from a Flask request.
- modify_search_query(search)[source]¶
Modify the given Search object so that it starts picking up items immediately after the previous page.
- Parameters:
search – An elasticsearch-dsl Search object.
- property next_page¶
If possible, create a new SortKeyPagination representing the next page of results.
- property offset¶
- page_loaded(page)[source]¶
An actual page of results has been fetched. Keep any internal state that would be useful to know when reasoning about earlier or later pages.
Specifically, keep track of the sort value of the last item on this page, so that self.next_page will create a SortKeyPagination object capable of generating the subsequent page.
- Parameters:
page – A list of elasticsearch-dsl Hit objects.
- property pagination_key¶
Create the pagination key for this page.
- property previous_page¶
- property total_size¶
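Example (a minimal sketch): paging through search results by sort key. The search execution itself is elided; hits stands for a page of elasticsearch-dsl Hit objects.
    from core.external_search import SortKeyPagination

    pagination = SortKeyPagination(size=25)
    # ... run a search, obtaining a page of elasticsearch-dsl Hit objects ...
    # pagination.page_loaded(hits)          # remember the sort key of the last item
    # next_page = pagination.next_page      # a SortKeyPagination for the following page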
- class core.external_search.WorkSearchResult(work, hit)[source]¶
Bases:
object
Wraps a Work object to give extra information obtained from ElasticSearch.
This object acts just like a Work (though isinstance(x, Work) will fail), with one exception: you can access the raw ElasticSearch Hit result as ._hit.
This is useful when a Work needs to be ‘tagged’ with information obtained through Elasticsearch, such as its ‘last modified’ date in the context of a specific lane.

core.facets module¶
- class core.facets.FacetConfig(enabled_facets, default_facets, entrypoints=[])[source]¶
Bases:
FacetConstants
A class that implements the facet-related methods of Library, and allows modifications to the enabled and default facets. For use when a controller needs to use a facet configuration different from the site-wide facets.
- class core.facets.FacetConstants[source]¶
Bases:
object
- AVAILABILITY_FACETS = ['now', 'all', 'always']¶
- AVAILABILITY_FACET_GROUP_NAME = 'available'¶
- AVAILABLE_ALL = 'all'¶
- AVAILABLE_NOT_NOW = 'not_now'¶
- AVAILABLE_NOW = 'now'¶
- AVAILABLE_OPEN_ACCESS = 'always'¶
- COLLECTION_FACETS = ['full', 'featured']¶
- COLLECTION_FACET_GROUP_NAME = 'collection'¶
- COLLECTION_FEATURED = 'featured'¶
- COLLECTION_FULL = 'full'¶
- DEFAULT_ENABLED_FACETS = {'available': ['all', 'now', 'always'], 'collection': ['full', 'featured'], 'order': ['author', 'title', 'added']}¶
- DEFAULT_FACET = {'available': 'all', 'collection': 'full', 'order': 'author'}¶
- ENTRY_POINT_FACET_GROUP_NAME = 'entrypoint'¶
- ENTRY_POINT_REL = 'http://librarysimplified.org/terms/rel/entrypoint'¶
- FACETS_BY_GROUP = {'available': ['now', 'all', 'always'], 'collection': ['full', 'featured'], 'order': ['title', 'author', 'added', 'random', 'relevance']}¶
- FACET_DISPLAY_TITLES = {'added': l'Recently Added', 'all': l'All', 'always': l'Yours to keep', 'author': l'Author', 'featured': l'Popular Books', 'full': l'Everything', 'last_update': l'Last Update', 'now': l'Available now', 'random': l'Random', 'relevance': l'Relevance', 'series': l'Series Position', 'title': l'Title', 'work_id': l'Work ID'}¶
- GROUP_DESCRIPTIONS = {'available': l'Allow patrons to filter availability to', 'collection': l'Allow patrons to filter collection to', 'order': l'Allow patrons to sort by'}¶
- GROUP_DISPLAY_TITLES = {'available': l'Availability', 'collection': l'Collection', 'order': l'Sort by'}¶
- MAX_CACHE_AGE_NAME = 'max_age'¶
- ORDER_ADDED_TO_COLLECTION = 'added'¶
- ORDER_ASCENDING = 'asc'¶
- ORDER_AUTHOR = 'author'¶
- ORDER_DESCENDING = 'desc'¶
- ORDER_DESCENDING_BY_DEFAULT = ['added', 'last_update']¶
- ORDER_FACETS = ['title', 'author', 'added', 'random', 'relevance']¶
- ORDER_FACET_GROUP_NAME = 'order'¶
- ORDER_LAST_UPDATE = 'last_update'¶
- ORDER_RANDOM = 'random'¶
- ORDER_RELEVANCE = 'relevance'¶
- ORDER_SERIES_POSITION = 'series'¶
- ORDER_TITLE = 'title'¶
- ORDER_WORK_ID = 'work_id'¶
- SORT_ORDER_TO_ELASTICSEARCH_FIELD_NAME = {'added': 'licensepools.availability_time', 'author': 'sort_author', 'last_update': 'last_update_time', 'random': 'random', 'series': ['series_position', 'sort_title'], 'title': 'sort_title', 'work_id': '_id'}¶
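Example: building a FacetConfig from the documented defaults, for use when a controller needs facet settings that differ from the site-wide configuration.
    from core.facets import FacetConfig, FacetConstants

    config = FacetConfig(
        enabled_facets=FacetConstants.DEFAULT_ENABLED_FACETS,
        default_facets=FacetConstants.DEFAULT_FACET,
    )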
core.lane module¶
- class core.lane.BaseFacets[source]¶
Bases:
FacetConstants
Basic faceting class that doesn’t modify a search filter at all.
This is intended solely for use as a base class.
- CACHED_FEED_TYPE = None¶
- property cached¶
This faceting object’s opinion on whether feeds should be cached.
- Returns:
A boolean, or None for ‘no opinion’.
- property facet_groups¶
Yield a list of 4-tuples (facet group, facet value, new Facets object, selected) for use in building OPDS facets.
This does not include the ‘entry point’ facet group, which must be handled separately.
- items()[source]¶
Yields a 2-tuple for every active facet setting.
These tuples are used to generate URLs that can identify specific facet settings, and to distinguish between CachedFeed objects that represent the same feed with different facet settings.
- max_cache_age = None¶
- modify_database_query(_db, qu)[source]¶
If necessary, modify a database query so that the resulting items conform to the constraints of this faceting object.
The default behavior is to not modify the query.
- modify_search_filter(filter)[source]¶
Modify an external_search.Filter object to filter out works excluded by the business logic of this faceting class.
- property query_string¶
A query string fragment that propagates all active facet settings.
- class core.lane.DatabaseBackedFacets(library, collection, availability, order, order_ascending=None, enabled_facets=None, entrypoint=None, entrypoint_is_default=False, **constructor_kwargs)[source]¶
Bases:
Facets
A generic faceting object designed for managing queries against the database. (Other faceting objects are designed for managing Elasticsearch searches.)
- ORDER_FACET_TO_DATABASE_FIELD = {'author': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'last_update': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'title': <sqlalchemy.orm.attributes.InstrumentedAttribute object>, 'work_id': <sqlalchemy.orm.attributes.InstrumentedAttribute object>}¶
- classmethod available_facets(config, facet_group_name)[source]¶
Exclude search orders not available through database queries.
- classmethod default_facet(config, facet_group_name)[source]¶
Exclude search orders not available through database queries.
- class core.lane.DatabaseBackedWorkList[source]¶
Bases:
WorkList
A WorkList that can get its works from the database in addition to (or possibly instead of) the search index.
Even when works _are_ obtained through the search index, a DatabaseBackedWorkList is then created to look up the Work objects for use in an OPDS feed.
- age_range_filter_clauses()[source]¶
Create a clause that filters out all books not classified as suitable for this DatabaseBackedWorkList’s age range.
- audience_filter_clauses(_db, qu)[source]¶
Create a SQLAlchemy filter that excludes books whose intended audience doesn’t match what we’re looking for.
- classmethod base_query(_db)[source]¶
Return a query that contains the joins set up as necessary to create OPDS feeds.
- bibliographic_filter_clauses(_db, qu)[source]¶
Create a SQLAlchemy filter that excludes books whose bibliographic metadata doesn’t match what we’re looking for.
query is either qu, or a new query that has been modified to join against additional tables.
- Returns:
A 2-tuple (query, clauses).
- customlist_filter_clauses(qu)[source]¶
Create a filter clause that includes only books that are on one of the CustomLists allowed by Lane configuration.
- Returns:
A 2-tuple (query, clauses).
query is the same query as qu, possibly extended with additional table joins.
clauses is a list of SQLAlchemy statements for use in a filter() or case() statement.
- modify_database_query_hook(_db, qu)[source]¶
A hook method allowing subclasses to modify a database query that’s about to find all the works in this WorkList.
This can avoid the need for complex subclasses of DatabaseBackedFacets.
- only_show_ready_deliverable_works(_db, query, show_suppressed=False)[source]¶
Restrict a query to show only presentation-ready works present in an appropriate collection which the default client can fulfill.
Note that this assumes the query has an active join against LicensePool.
- works_from_database(_db, facets=None, pagination=None, **kwargs)[source]¶
Create a query against the works table that finds Work objects corresponding to all the Works that belong in this WorkList.
The apply_filters() implementation defines which Works qualify for membership in a WorkList of this type.
This tends to be slower than WorkList.works, but not all lanes can be generated through search engine queries.
- Parameters:
_db – A database connection.
facets – A faceting object, which may place additional constraints on WorkList membership.
pagination – A Pagination object indicating which part of the WorkList the caller is looking at.
kwargs – Ignored – only included for compatibility with works().
- Returns:
A Query.
- class core.lane.DefaultSortOrderFacets(library, collection, availability, order, order_ascending=None, enabled_facets=None, entrypoint=None, entrypoint_is_default=False, **constructor_kwargs)[source]¶
Bases:
Facets
A faceting object that changes the default sort order.
Subclasses must set DEFAULT_SORT_ORDER
- class core.lane.Facets(library, collection, availability, order, order_ascending=None, enabled_facets=None, entrypoint=None, entrypoint_is_default=False, **constructor_kwargs)[source]¶
Bases:
FacetsWithEntryPoint
A full-fledged facet class that supports complex navigation between multiple facet groups.
Despite the generic name, this is only used in ‘page’ type OPDS feeds that list all the works in some WorkList.
- ORDER_BY_RELEVANCE = 'relevance'¶
- classmethod available_facets(config, facet_group_name)[source]¶
Which facets are enabled for the given facet group?
You can override this to forcibly enable or disable facets that might not be enabled in library configuration, but you can’t make up totally new facets.
TODO: This system would make more sense if you _could_ make up totally new facets, maybe because each facet was represented as a policy object rather than a key to code implemented elsewhere in this class. Right now this method implies more flexibility than actually exists.
- classmethod default(library, collection=None, availability=None, order=None, entrypoint=None)[source]¶
- classmethod default_facet(config, facet_group_name)[source]¶
The default value for the given facet group.
The default value must be one of the values returned by available_facets() above.
- property enabled_facets¶
Yield a 3-tuple of lists (order, availability, collection) representing facet values enabled via initialization or configuration
The ‘entry point’ facet group is handled separately, since it is not always used.
- property facet_groups¶
Yield a list of 4-tuples (facet group, facet value, new Facets object, selected) for use in building OPDS facets.
This does not yield anything for the ‘entry point’ facet group, which must be handled separately.
- classmethod from_request(library, config, get_argument, get_header, worklist, default_entrypoint=None, **extra)[source]¶
Load a faceting object from an HTTP request.
- items()[source]¶
Yields a 2-tuple for every active facet setting.
In this class that just means the entrypoint and any max_cache_age.
- modify_database_query(_db, qu)[source]¶
Restrict a query against Work+LicensePool+Edition so that it matches only works that fit the criteria of this Faceting object.
The sort order facet cannot be handled in this method, but it can be handled in subclasses that override this method.
- modify_search_filter(filter)[source]¶
Modify the given external_search.Filter object so that it reflects the settings of this Facets object.
This is the Elasticsearch equivalent of apply(). However, the Elasticsearch implementation of (e.g.) the meaning of the different availability statuses is kept in Filter.build().
Create a slightly different Facets object from this one.
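Example (a minimal sketch): constructing a Facets object from the documented constants. library is a placeholder for a core.model.Library object.
    from core.lane import Facets

    facets = Facets(
        library,                              # placeholder: a core.model.Library
        collection=Facets.COLLECTION_FULL,
        availability=Facets.AVAILABLE_ALL,
        order=Facets.ORDER_TITLE,
    )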
- class core.lane.FacetsWithEntryPoint(entrypoint=None, entrypoint_is_default=False, max_cache_age=None, **kwargs)[source]¶
Bases:
BaseFacets
Basic Facets class that knows how to filter a query based on a selected EntryPoint.
- classmethod from_request(library, facet_config, get_argument, get_header, worklist, default_entrypoint=None, **extra_kwargs)[source]¶
Load a faceting object from an HTTP request.
- Parameters:
facet_config – A Library (or mock of one) that knows which subset of the available facets are configured.
get_argument – A callable that takes one argument and retrieves (or pretends to retrieve) a query string parameter of that name from an incoming HTTP request.
get_header – A callable that takes one argument and retrieves (or pretends to retrieve) an HTTP header of that name from an incoming HTTP request.
worklist – A WorkList associated with the current request, if any.
default_entrypoint – Select this EntryPoint if the incoming request does not specify an enabled EntryPoint. If this is None, the first enabled EntryPoint will be used as the default.
extra_kwargs – A dictionary of keyword arguments to pass into the constructor when a faceting object is instantiated.
- Returns:
A FacetsWithEntryPoint, or a ProblemDetail if there’s a problem with the input from the request.
- items()[source]¶
Yields a 2-tuple for every active facet setting.
In this class that just means the entrypoint and any max_cache_age.
- classmethod load_entrypoint(name, valid_entrypoints, default=None)[source]¶
Look up an EntryPoint by name, assuming it’s valid in the given WorkList.
- Parameters:
valid_entrypoints – The EntryPoints that might be valid. This is probably not the value of WorkList.selectable_entrypoints, because an EntryPoint selected in a WorkList remains valid (but not selectable) for all of its children.
default – A class to use as the default EntryPoint if none is specified. If no default is specified, the first enabled EntryPoint will be used.
- Returns:
A 2-tuple (EntryPoint class, is_default).
- classmethod load_max_cache_age(value)[source]¶
Convert a value for the MAX_CACHE_AGE_NAME parameter to a value that CachedFeed will understand.
- Parameters:
value – A string.
- Returns:
For now, either CachedFeed.IGNORE_CACHE or None.
- modify_database_query(_db, qu)[source]¶
Modify the given database query so that it reflects this set of facets.
- modify_search_filter(filter)[source]¶
Modify the given external_search.Filter object so that it reflects this set of facets.
Create a very similar FacetsWithEntryPoint that points to a different EntryPoint.
- classmethod selectable_entrypoints(worklist)[source]¶
Which EntryPoints can be selected for these facets on this WorkList?
In most cases, there are no selectable EntryPoints; this generally happens only at the top level.
By default, this is completely determined by the WorkList. See SearchFacets for an example that changes this.
- class core.lane.FeaturedFacets(minimum_featured_quality, entrypoint=None, random_seed=None, **kwargs)[source]¶
Bases:
FacetsWithEntryPoint
A simple faceting object that configures a query so that the ‘most featurable’ items are at the front.
This is mainly a convenient thing to pass into AcquisitionFeed.groups().
- CACHED_FEED_TYPE = 'groups'¶
- modify_search_filter(filter)[source]¶
Modify the given external_search.Filter object so that it reflects this set of facets.
Create a slightly different FeaturedFacets object based on this one.
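Example: a FeaturedFacets object suitable for passing into AcquisitionFeed.groups(). The quality threshold shown is illustrative; in practice it comes from library configuration.
    from core.lane import FeaturedFacets

    facets = FeaturedFacets(minimum_featured_quality=0.65)   # illustrative threshold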
- class core.lane.HierarchyWorkList[source]¶
Bases:
WorkList
A WorkList representing part of a hierarchical view of a library’s collection. (As opposed to a non-hierarchical view such as search results or “books by author X”.)
- class core.lane.Lane(**kwargs)[source]¶
Bases:
Base
,DatabaseBackedWorkList
,HierarchyWorkList
A WorkList that draws its search criteria from a row in a database table.
A Lane corresponds roughly to a section in a branch library or bookstore. Lanes are the primary means by which patrons discover books.
- MAX_CACHE_AGE = 1200¶
- add_genre(genre, inclusive=True, recursive=True)[source]¶
Create a new LaneGenre for the given genre and associate it with this Lane.
Mainly used in tests.
- classmethod affected_by_customlist(customlist)[source]¶
Find all Lanes whose membership is partially derived from the membership of the given CustomList.
- audiences¶
- cachedfeeds¶
- cachedmarcfiles¶
- property children¶
- property collection_ids¶
- property customlist_ids¶
Find the database ID of every CustomList such that a Work filed in that List should be in this Lane.
- Returns:
A list of CustomList IDs, possibly empty.
- customlists¶
- property depth¶
How deep is this lane in this site’s hierarchy? i.e. how many times do we have to follow .parent before we get None?
- display_name¶
- property entrypoints¶
Lanes cannot currently have EntryPoints.
- fiction¶
- property genre_ids¶
Find the database ID of every Genre such that a Work classified in that Genre should be in this Lane.
- Returns:
A list of genre IDs, or None if this Lane does not consider genres at all.
- genres = ObjectAssociationProxyInstance(AssociationProxy('lane_genres', 'genre'))¶
- groups(_db, include_sublanes=True, pagination=None, facets=None, search_engine=None, debug=False)[source]¶
Return a list of (Work, Lane) 2-tuples describing a sequence of featured items for this lane and (optionally) its children.
- Parameters:
pagination – A Pagination object which may affect how many works each child of this WorkList may contribute.
facets – A FeaturedFacets object.
- id¶
- include_self_in_grouped_feed¶
- inherit_parent_restrictions¶
- is_self_or_descendant(ancestor)[source]¶
Is this WorkList the given WorkList or one of its descendants?
- Parameters:
ancestor – A WorkList.
- Returns:
A boolean.
- lane_genres¶
- languages¶
- library_id¶
- license_datasource_id¶
- list_datasource¶
- property list_datasource_id¶
- list_seen_in_previous_days¶
- max_cache_age(type)[source]¶
Determine how long a feed for this WorkList should be cached internally.
- Parameters:
type – The type of feed.
- media¶
- parent¶
- parent_id¶
- property parentage¶
Yield the parent, grandparent, etc. of this Lane.
The Lane may be inside one or more non-Lane WorkLists, but those WorkLists are not counted in the parentage.
- priority¶
- root_for_patron_type¶
- search(_db, query_string, search_client, pagination=None, facets=None)[source]¶
Find works in this lane that also match a search query.
- Parameters:
_db – A database connection.
query_string – Search for this string.
search_client – An ExternalSearchIndex object.
pagination – A Pagination object.
facets – A faceting object, probably a SearchFacets.
- property search_target¶
Obtain the WorkList that should be searched when someone initiates a search from this Lane.
- size¶
- size_by_entrypoint¶
- sublanes¶
- target_age¶
- update_size(_db, search_engine=None)[source]¶
Update the stored estimate of the number of Works in this Lane.
- property url_name¶
Return the name of this lane to be used in URLs.
Since most aspects of the lane can change through administrative action, we use the internal database ID of the lane in URLs.
- property uses_customlists¶
Does the works() implementation for this Lane look for works on CustomLists?
- visible¶
- property visible_children¶
A WorkList’s children can be used to create a grouped acquisition feed for that WorkList.
- class core.lane.LaneGenre(**kwargs)[source]¶
Bases:
Base
Relationship object between Lane and Genre.
- genre¶
- genre_id¶
- id¶
- inclusive¶
- lane¶
- lane_id¶
- recursive¶
- class core.lane.Pagination(offset=0, size=50)[source]¶
Bases:
object
- DEFAULT_CRAWLABLE_SIZE = 100¶
- DEFAULT_FEATURED_SIZE = 10¶
- DEFAULT_SEARCH_SIZE = 10¶
- DEFAULT_SIZE = 50¶
- MAX_SIZE = 100¶
- property first_page¶
- classmethod from_request(get_arg, default_size=None)[source]¶
Instantiate a Pagination object from a Flask request.
- property has_next_page¶
Returns a boolean reporting whether pagination is done for a query.
Either total_size or this_page_size must be set for this method to be accurate.
- modify_search_query(search)[source]¶
Modify a Search object so that it retrieves only a single ‘page’ of results.
- Returns:
A Search object.
- property next_page¶
- page_loaded(page)[source]¶
An actual page of results has been fetched. Keep any internal state that would be useful to know when reasoning about earlier or later pages.
- property previous_page¶
- property query_string¶
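Example: basic offset/size pagination over the first two pages of default-sized results.
    from core.lane import Pagination

    first = Pagination(offset=0, size=Pagination.DEFAULT_SIZE)
    second = Pagination(offset=Pagination.DEFAULT_SIZE, size=Pagination.DEFAULT_SIZE)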
- class core.lane.SearchFacets(**kwargs)[source]¶
Bases:
Facets
A Facets object designed to filter search results.
Most search result filtering is handled by WorkList, but this allows someone to, e.g., search a multi-lingual WorkList in their preferred language.
- DEFAULT_MIN_SCORE = 500¶
- classmethod default_facet(ignore, group_name)[source]¶
The default facet settings for SearchFacets are hard-coded.
By default, we will search the full collection and all availabilities, and order by match quality rather than any bibliographic field.
- classmethod from_request(library, config, get_argument, get_header, worklist, default_entrypoint=<class 'core.entrypoint.EverythingEntryPoint'>, **extra)[source]¶
Load a faceting object from an HTTP request.
- items()[source]¶
Yields a 2-tuple for every active facet setting.
This means the EntryPoint (handled by the superclass) as well as possible settings for ‘media’ and “min_score”.
- modify_search_filter(filter)[source]¶
Modify the given external_search.Filter object so that it reflects this SearchFacets object.
Create a slightly different Facets object from this one.
- class core.lane.SpecificWorkList(work_ids)[source]¶
Bases:
DatabaseBackedWorkList
A WorkList that only finds specific works, identified by ID.
- class core.lane.TopLevelWorkList[source]¶
Bases:
HierarchyWorkList
A special WorkList representing the top-level view of a library’s collection.
- class core.lane.WorkList[source]¶
Bases:
object
An object that can obtain a list of Work objects for use in generating an OPDS feed.
By default, these Work objects come from a search index.
- CACHED_FEED_TYPE = None¶
- MAX_CACHE_AGE = 600¶
- accessible_to(patron)[source]¶
As a matter of library policy, is the given Patron allowed to access this WorkList?
- append_child(child)[source]¶
Add one child to the list of children in this WorkList.
This hook method can be overridden to modify the child’s configuration so as to make it fit with what the parent is offering.
- property audience_key¶
Translates the audiences list into a URL-safe string.
- property customlist_ids¶
Return the custom list IDs.
- property display_name_for_all¶
The display name to use when referring to the set of all books in this WorkList, as opposed to the WorkList itself.
- filter(_db, facets)[source]¶
Helper method to instantiate a Filter object for this WorkList.
Using this ensures that modify_search_filter_hook() is always called.
- property full_identifier¶
A human-readable identifier for this WorkList that captures its position within the hierarchy.
- groups(_db, include_sublanes=True, pagination=None, facets=None, search_engine=None, debug=False)[source]¶
Extract a list of samples from each child of this WorkList. This can be used to create a grouped acquisition feed for the WorkList.
- Parameters:
pagination – A Pagination object which may affect how many works each child of this WorkList may contribute.
facets – A FeaturedFacets object that may restrict the works on view.
search_engine – An ExternalSearchIndex to use when asking for the featured works in a given WorkList.
debug – A debug argument passed into search_engine when running the search.
- Yield:
A sequence of (Work, WorkList) 2-tuples, with each WorkList representing the child WorkList in which the Work is found.
- property has_visible_children¶
- property hierarchy¶
The portion of the WorkList hierarchy that culminates in this WorkList.
- property inherit_parent_restrictions¶
Since a WorkList has no parent, it cannot inherit any restrictions from its parent. This method is defined for compatibility with Lane.
- inherited_value(k)[source]¶
Try to find this WorkList’s value for the given key (e.g. ‘fiction’ or ‘audiences’).
If it’s not set, try to inherit a value from the WorkList’s parent. This only works if this WorkList has a parent and is configured to inherit values from its parent.
Note that inheritance works differently for genre_ids and customlist_ids – use inherited_values() for that.
- inherited_values(k)[source]¶
Find the values for the given key (e.g. ‘genre_ids’ or ‘customlist_ids’) imposed by this WorkList and its parentage.
This is for values like .genre_ids and .customlist_ids, where each member of the WorkList hierarchy can impose a restriction on query results, and the effects of the restrictions are additive.
- initialize(library, display_name=None, genres=None, audiences=None, languages=None, media=None, customlists=None, list_datasource=None, list_seen_in_previous_days=None, children=None, priority=None, entrypoints=None, fiction=None, license_datasource=None, target_age=None)[source]¶
Initialize with basic data.
This is not a constructor, to avoid conflicts with Lane, an ORM object that subclasses this object but does not use this initialization code.
- Parameters:
library – Only Works available in this Library will be included in lists.
display_name – Name to display for this WorkList in the user interface.
genres – Only Works classified under one of these Genres will be included in lists.
audiences – Only Works classified under one of these audiences will be included in lists.
languages – Only Works in one of these languages will be included in lists.
media – Only Works in one of these media will be included in lists.
fiction – Only Works with this fiction status will be included in lists.
target_age – Only Works targeted at readers in this age range will be included in lists.
license_datasource – Only Works with a LicensePool from this DataSource will be included in lists.
customlists – Only Works included on one of these CustomLists will be included in lists.
list_datasource – Only Works included on a CustomList associated with this DataSource will be included in lists. This overrides any specific CustomLists provided in customlists.
list_seen_in_previous_days – Only Works that were added to a matching CustomList within this number of days will be included in lists.
children – This WorkList has children, which are also WorkLists.
priority – A number indicating where this WorkList should show up in relation to its siblings when it is the child of some other WorkList.
entrypoints – A list of EntryPoint classes representing different ways of slicing up this WorkList.
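Example (a minimal sketch): configuring a WorkList outside the ORM. library is a placeholder for a core.model.Library; only a few of the documented keyword arguments are shown.
    from core.lane import WorkList

    worklist = WorkList()
    worklist.initialize(
        library,                      # placeholder: a core.model.Library
        display_name="Science Fiction",
        languages=["eng"],
        media=["Book"],
        fiction=True,
    )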
- is_self_or_descendant(ancestor)[source]¶
Is this WorkList the given WorkList or one of its descendants?
- Parameters:
ancestor – A WorkList.
- Returns:
A boolean.
- property language_key¶
Return a string identifying the languages used in this WorkList. This will usually be in the form of ‘eng,spa’ (English and Spanish).
- max_cache_age(type)[source]¶
Determine how long a feed for this WorkList should be cached internally.
- modify_search_filter_hook(filter)[source]¶
A hook method allowing subclasses to modify a Filter object that’s about to find all the works in this WorkList.
This can avoid the need for complex subclasses of Facets.
- overview_facets(_db, facets)[source]¶
Convert a generic FeaturedFacets to some other faceting object, suitable for showing an overview of this WorkList in a grouped feed.
- property parent¶
A WorkList has no parent. This method is defined for compatibility with Lane.
- property parentage¶
WorkLists have no parentage. This method is defined for compatibility with Lane.
- search(_db, query, search_client, pagination=None, facets=None, debug=False)[source]¶
Find works in this WorkList that match a search query.
- Parameters:
_db – A database connection.
query – Search for this string.
search_client – An ExternalSearchIndex object.
pagination – A Pagination object.
facets – A faceting object, probably a SearchFacets.
debug – Pass in True to see a summary of results returned from the search index.
- property search_target¶
By default, a WorkList is searchable.
- classmethod top_level_for_library(_db, library)[source]¶
Create a WorkList representing this library’s collection as a whole.
If no top-level visible lanes are configured, the WorkList will be configured to show every book in the collection.
If a single top-level Lane is configured, it will be returned as the WorkList.
Otherwise, a WorkList containing the visible top-level lanes is returned.
- property unique_key¶
A string key that uniquely describes this WorkList within its Library.
This is used when caching feeds for this WorkList. For Lanes, the lane_id is used instead.
- property uses_customlists¶
Does the works() implementation for this WorkList look for works on CustomLists?
- visible = True¶
- property visible_children¶
A WorkList’s children can be used to create a grouped acquisition feed for that WorkList.
- works(_db, facets=None, pagination=None, search_engine=None, debug=False, **kwargs)[source]¶
Use a search engine to obtain Work or Work-like objects that belong in this WorkList.
Compare DatabaseBackedWorkList.works_from_database, which uses a database query to obtain the same Work objects.
- Parameters:
_db – A database connection.
facets – A Facets object which may put additional constraints on WorkList membership.
pagination – A Pagination object indicating which part of the WorkList the caller is looking at, and/or a limit on the number of works to fetch.
kwargs – Different implementations may fetch the list of works from different sources and may need different keyword arguments.
- Returns:
A list of Work or Work-like objects, or a database query that generates such a list when executed.
- works_for_hits(_db, hits, facets=None)[source]¶
Convert a list of search results into Work objects.
This works by calling works_for_resultsets() on a list containing a single list of search results.
- Parameters:
_db – A database connection
hits – A list of Hit objects from ElasticSearch.
- Returns:
A list of Work or (if the search results include script fields) WorkSearchResult objects.
core.local_analytics_provider module¶
- class core.local_analytics_provider.LocalAnalyticsProvider(integration, library=None)[source]¶
Bases:
object
- CARDINALITY = 1¶
- DESCRIPTION = l'Store analytics events in the 'circulationevents' database table.'¶
- LOCATION_SOURCE = 'location_source'¶
- LOCATION_SOURCE_DISABLED = ''¶
- LOCATION_SOURCE_NEIGHBORHOOD = 'neighborhood'¶
- NAME = l'Local Analytics'¶
- SETTINGS = [{'key': 'location_source', 'label': l'Geographic location of events', 'description': l'Local analytics events may have a geographic location associated with them. How should the location be determined?<p>Note: to use the patron's neighborhood as the event location, you must also tell your patron authentication mechanism how to <i>gather</i> a patron's neighborhood information.', 'default': '', 'type': 'select', 'options': [{'key': '', 'label': l'Disable this feature.'}, {'key': 'neighborhood', 'label': l'Use the patron's neighborhood as the event location.'}]}]¶
- core.local_analytics_provider.Provider¶
alias of
LocalAnalyticsProvider
core.log module¶
- class core.log.CloudwatchLogs[source]¶
Bases:
Logger
- CREATE_GROUP = 'create_group'¶
- DEFAULT_CREATE_GROUP = 'TRUE'¶
- DEFAULT_INTERVAL = 60¶
- DEFAULT_REGION = 'us-west-2'¶
- GROUP = 'group'¶
- INTERVAL = 'interval'¶
- NAME = 'AWS Cloudwatch Logs'¶
- REGION = 'region'¶
- REGIONS = [{'key': 'us-east-2', 'label': l'US East (Ohio)'}, {'key': 'us-east-1', 'label': l'US East (N. Virginia)'}, {'key': 'us-west-1', 'label': l'US West (N. California)'}, {'key': 'us-west-2', 'label': l'US West (Oregon)'}, {'key': 'ap-south-1', 'label': l'Asia Pacific (Mumbai)'}, {'key': 'ap-northeast-3', 'label': l'Asia Pacific (Osaka-Local)'}, {'key': 'ap-northeast-2', 'label': l'Asia Pacific (Seoul)'}, {'key': 'ap-southeast-1', 'label': l'Asia Pacific (Singapore)'}, {'key': 'ap-southeast-2', 'label': l'Asia Pacific (Sydney)'}, {'key': 'ap-northeast-1', 'label': l'Asia Pacific (Tokyo)'}, {'key': 'ca-central-1', 'label': l'Canada (Central)'}, {'key': 'cn-north-1', 'label': l'China (Beijing)'}, {'key': 'cn-northwest-1', 'label': l'China (Ningxia)'}, {'key': 'eu-central-1', 'label': l'EU (Frankfurt)'}, {'key': 'eu-west-1', 'label': l'EU (Ireland)'}, {'key': 'eu-west-2', 'label': l'EU (London)'}, {'key': 'eu-west-3', 'label': l'EU (Paris)'}, {'key': 'sa-east-1', 'label': l'South America (Sao Paulo)'}]¶
- SETTINGS = [{'key': 'group', 'label': l'Log Group', 'default': 'simplified', 'required': True}, {'key': 'stream', 'label': l'Log Stream', 'default': 'simplified', 'required': True}, {'key': 'interval', 'label': l'Update Interval Seconds', 'default': 60, 'required': True}, {'key': 'region', 'label': l'AWS Region', 'type': 'select', 'options': [{'key': 'us-east-2', 'label': l'US East (Ohio)'}, {'key': 'us-east-1', 'label': l'US East (N. Virginia)'}, {'key': 'us-west-1', 'label': l'US West (N. California)'}, {'key': 'us-west-2', 'label': l'US West (Oregon)'}, {'key': 'ap-south-1', 'label': l'Asia Pacific (Mumbai)'}, {'key': 'ap-northeast-3', 'label': l'Asia Pacific (Osaka-Local)'}, {'key': 'ap-northeast-2', 'label': l'Asia Pacific (Seoul)'}, {'key': 'ap-southeast-1', 'label': l'Asia Pacific (Singapore)'}, {'key': 'ap-southeast-2', 'label': l'Asia Pacific (Sydney)'}, {'key': 'ap-northeast-1', 'label': l'Asia Pacific (Tokyo)'}, {'key': 'ca-central-1', 'label': l'Canada (Central)'}, {'key': 'cn-north-1', 'label': l'China (Beijing)'}, {'key': 'cn-northwest-1', 'label': l'China (Ningxia)'}, {'key': 'eu-central-1', 'label': l'EU (Frankfurt)'}, {'key': 'eu-west-1', 'label': l'EU (Ireland)'}, {'key': 'eu-west-2', 'label': l'EU (London)'}, {'key': 'eu-west-3', 'label': l'EU (Paris)'}, {'key': 'sa-east-1', 'label': l'South America (Sao Paulo)'}], 'default': 'us-west-2', 'required': True}, {'key': 'create_group', 'label': l'Automatically Create Log Group', 'type': 'select', 'options': [{'key': 'TRUE', 'label': l'Yes'}, {'key': 'FALSE', 'label': l'No'}], 'default': True, 'required': True}]¶
- SITEWIDE = True¶
- STREAM = 'stream'¶
- class core.log.JSONFormatter(app_name)[source]¶
Bases:
Formatter
- format(record)[source]¶
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
- fqdn = 'fv-az1493-85.mbrrm1wwqy3eneuh0nkjogyspa.dx.internal.cloudapp.net'¶
- hostname = 'fv-az1493-85.mbrrm1wwqy3eneuh0nkjogyspa.dx.internal.cloudapp.net'¶
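Example: attaching the JSON formatter to a standard library logging handler. The application name tags each emitted record.
    import logging

    from core.log import JSONFormatter

    handler = logging.StreamHandler()
    handler.setFormatter(JSONFormatter("simplified"))
    logging.getLogger().addHandler(handler)
    logging.getLogger().warning("hello")   # emitted as a JSON document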
- class core.log.LogConfiguration[source]¶
Bases:
object
Configures the active Python logging handlers based on logging configuration from the database.
- DATABASE_LOG_LEVEL = 'database_log_level'¶
- DEBUG = 'DEBUG'¶
- DEFAULT_APP_NAME = 'simplified'¶
- DEFAULT_DATABASE_LOG_LEVEL = 'WARN'¶
- DEFAULT_LOG_LEVEL = 'INFO'¶
- ERROR = 'ERROR'¶
- INFO = 'INFO'¶
- LOG_APP_NAME = 'log_app'¶
- LOG_LEVEL = 'log_level'¶
- LOG_LEVEL_UI = [{'key': 'DEBUG', 'value': l'Debug'}, {'key': 'INFO', 'value': l'Info'}, {'key': 'WARN', 'value': l'Warn'}, {'key': 'ERROR', 'value': l'Error'}]¶
- SITEWIDE_SETTINGS = [{'key': 'log_level', 'label': l'Log Level', 'type': 'select', 'options': [{'key': 'DEBUG', 'value': l'Debug'}, {'key': 'INFO', 'value': l'Info'}, {'key': 'WARN', 'value': l'Warn'}, {'key': 'ERROR', 'value': l'Error'}], 'default': 'INFO'}, {'key': 'log_app', 'label': l'Log Application name', 'description': l'Log messages originating from this application will be tagged with this name. If you run multiple instances, giving each one a different application name will help you determine which instance is having problems.', 'default': 'simplified'}, {'key': 'database_log_level', 'label': l'Database Log Level', 'type': 'select', 'options': [{'key': 'DEBUG', 'value': l'Debug'}, {'key': 'INFO', 'value': l'Info'}, {'key': 'WARN', 'value': l'Warn'}, {'key': 'ERROR', 'value': l'Error'}], 'description': l'Database logs are extremely verbose, so unless you're diagnosing a database-related problem, it's a good idea to set a higher log level for database messages.', 'default': 'WARN'}]¶
- WARN = 'WARN'¶
- classmethod from_configuration(_db, testing=False)[source]¶
Return the logging policy as configured in the database.
- Parameters:
_db – A database connection. If None, the default logging policy will be used.
testing – A boolean indicating whether a unit test is happening right now. If True, the database configuration will be ignored in favor of a known test-friendly policy. (It’s okay to pass in False during a test of this method.)
- Returns:
A 3-tuple (internal_log_level, database_log_level, handlers). internal_log_level is the log level to be used for most log messages. database_log_level is the log level to be applied to the loggers for the database connector and other verbose third-party libraries. handlers is a list of Handler objects that will be associated with the top-level logger.
- classmethod initialize(_db, testing=False)[source]¶
Make the logging handlers reflect the current logging rules as configured in the database.
- Parameters:
_db – A database connection. If this is None, the default logging configuration will be used.
testing – True if unit tests are currently running; otherwise False.
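Example: applying the default logging policy without consulting the database, per the documented behaviour when _db is None.
    from core.log import LogConfiguration

    # With _db=None the default policy is used; testing=True forces the
    # test-friendly configuration regardless of the database.
    internal_level, database_level, handlers = LogConfiguration.from_configuration(None, testing=True)
    LogConfiguration.initialize(None, testing=True)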
- class core.log.Logger[source]¶
Bases:
object
Abstract base class for logging
- DEFAULT_APP_NAME = 'simplified'¶
- DEFAULT_MESSAGE_TEMPLATE = '%(asctime)s:%(name)s:%(levelname)s:%(filename)s:%(message)s'¶
- JSON_LOG_FORMAT = 'json'¶
- TEXT_LOG_FORMAT = 'text'¶
- class core.log.Loggly[source]¶
Bases:
Logger
- DEFAULT_LOGGLY_URL = 'https://logs-01.loggly.com/inputs/%(token)s/tag/python/'¶
- NAME = 'Loggly'¶
- PASSWORD = 'password'¶
- SETTINGS = [{'key': 'user', 'label': l'Username', 'required': True}, {'key': 'password', 'label': l'Password', 'required': True}, {'key': 'url', 'label': l'URL', 'required': True, 'format': 'url'}]¶
- SITEWIDE = True¶
- URL = 'url'¶
- USER = 'user'¶
- classmethod from_configuration(_db, testing=False)[source]¶
Should be implemented in each logging class.
- class core.log.StringFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)[source]¶
Bases:
Formatter
Encode all output as a string.
- format(record)[source]¶
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
- class core.log.SysLogger[source]¶
Bases:
Logger
- LOG_FORMAT = 'log_format'¶
- LOG_MESSAGE_TEMPLATE = 'message_template'¶
- NAME = 'sysLog'¶
- SETTINGS = [{'key': 'log_format', 'label': l'Log Format', 'type': 'select', 'options': [{'key': 'json', 'label': l'json'}, {'key': 'text', 'label': l'text'}]}, {'key': 'message_template', 'label': l'template', 'default': '%(asctime)s:%(name)s:%(levelname)s:%(filename)s:%(message)s', 'required': True}]¶
- SITEWIDE = True¶
core.marc module¶
- class core.marc.Annotator[source]¶
Bases:
object
The Annotator knows how to add information about a Work to a MARC record.
- AUDIENCE_TERMS = {'Adult': 'General', 'Adults Only': 'Adult', 'Children': 'Juvenile', 'Young Adult': 'Adolescent'}¶
- FORMAT_TERMS = {('application/epub+zip', None): 'EPUB eBook', ('application/epub+zip', 'application/vnd.adobe.adept+xml'): 'Adobe EPUB eBook', ('application/pdf', None): 'PDF eBook', ('application/pdf', 'application/vnd.adobe.adept+xml'): 'Adobe PDF eBook'}¶
- classmethod add_contributors(record, edition)[source]¶
Create contributor fields for this edition.
TODO: Use canonical names from LoC.
- annotate_work_record(work, active_license_pool, edition, identifier, record, integration=None, updated=None)[source]¶
Add metadata from this work to a MARC record.
- Parameters:
work – The Work whose record is being annotated.
active_license_pool – Of all the LicensePools associated with this Work, the client has expressed interest in this one.
edition – The Edition to use when associating bibliographic metadata with this entry.
identifier – Of all the Identifiers associated with this Work, the client has expressed interest in this one.
record – A MARCRecord object to be annotated.
- marc_cache_field = 'marc_record'¶
- class core.marc.MARCExporter(_db, library, integration)[source]¶
Bases:
object
Turn a work into a record for a MARC file.
- DEFAULT_MIRROR_INTEGRATION = {'key': 'NO_MIRROR', 'label': l'None - Do not mirror MARC files'}¶
- DEFAULT_UPDATE_FREQUENCY = 30¶
- DESCRIPTION = l'Export metadata into MARC files that can be imported into an ILS manually.'¶
- INCLUDE_SIMPLIFIED_GENRES = 'include_simplified_genres'¶
- INCLUDE_SUMMARY = 'include_summary'¶
- LIBRARY_SETTINGS = [{'key': 'marc_update_frequency', 'label': l'Update frequency (in days)', 'description': l'The circulation manager will wait this number of days between generating MARC files.', 'type': 'number', 'default': 30}, {'key': 'marc_organization_code', 'label': l'The MARC organization code for this library (003 field).', 'description': l'MARC organization codes are assigned by the Library of Congress.'}, {'key': 'marc_web_client_url', 'label': l'The base URL for the web catalog for this library, for the 856 field.', 'description': l'If using a library registry that provides a web catalog, this can be left blank.'}, {'key': 'include_summary', 'label': l'Include summaries in MARC records (520 field)', 'type': 'select', 'options': [{'key': 'false', 'label': l'Do not include summaries'}, {'key': 'true', 'label': l'Include summaries'}], 'default': 'false'}, {'key': 'include_simplified_genres', 'label': l'Include Library Simplified genres in MARC records (650 fields)', 'type': 'select', 'options': [{'key': 'false', 'label': l'Do not include Library Simplified genres'}, {'key': 'true', 'label': l'Include Library Simplified genres'}], 'default': 'false'}]¶
- MARC_ORGANIZATION_CODE = 'marc_organization_code'¶
- NAME = 'MARC Export'¶
- NO_MIRROR_INTEGRATION = 'NO_MIRROR'¶
- SETTING = {'description': l'Storage protocol to use for uploading generated MARC files. The service must already be configured under 'Storage Services'.', 'key': 'mirror_integration_id', 'label': l'MARC Mirror', 'options': [{'key': 'NO_MIRROR', 'label': l'None - Do not mirror MARC files'}], 'type': 'select'}¶
- UPDATE_FREQUENCY = 'marc_update_frequency'¶
- WEB_CLIENT_URL = 'marc_web_client_url'¶
- classmethod create_record(work, annotator, force_create=False, integration=None)[source]¶
Build a complete MARC record for a given work.
- records(lane, annotator, mirror_integration, start_time=None, force_refresh=False, mirror=None, search_engine=None, query_batch_size=500, upload_batch_size=7500)[source]¶
Create and export a MARC file for the books in a lane.
- Parameters:
lane – The Lane to export books from.
annotator – The Annotator to use when creating MARC records.
mirror_integration – The mirror integration to use for MARC files.
start_time – Only include records that were created or modified after this time.
force_refresh – Create new records even when cached records are available.
mirror – Optional mirror to use instead of loading one from configuration.
query_batch_size – Number of works to retrieve with a single Elasticsearch query.
upload_batch_size – Number of records to mirror at a time. This is different from query_batch_size because S3 enforces a minimum size of 5MB for all parts of a multipart upload except the last, but 5MB of records would be too many works for a single query.
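Example (a minimal sketch): building a single MARC record. work is a placeholder for a core.model.Work; a full exporter additionally needs a database session, a Library, and an ExternalIntegration.
    from core.marc import Annotator, MARCExporter

    record = MARCExporter.create_record(work, Annotator())   # placeholder: work is a core.model.Work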
- class core.marc.MARCExporterFacets(start_time)[source]¶
Bases:
BaseFacets
A faceting object used to configure the search engine so that it only finds works updated since a certain time.
core.metadata_layer module¶
An abstract way of representing incoming metadata and applying it to Identifiers and Editions.
This acts as an intermediary between the third-party integrations (which have this information in idiosyncratic formats) and the model. Doing a third-party integration should be as simple as putting the information into this format.
- class core.metadata_layer.CSVMetadataImporter(data_source_name, title_field='title', language_field='language', default_language='eng', medium_field='medium', default_medium='Book', series_field='series', publisher_field='publisher', imprint_field='imprint', issued_field='issued', published_field=['published', 'publication year'], identifier_fields={'Axis 360 ID': ('axis 360 id', 0.75), 'Bibliotheca ID': ('3m id', 0.75), 'ISBN': ('isbn', 0.75), 'Overdrive ID': ('overdrive id', 0.75)}, subject_fields={'age': ('schema:typicalAgeRange', 100.0), 'audience': ('schema:audience', 100.0), 'tags': ('tag', 100.0)}, sort_author_field='file author as', display_author_field=['author', 'display author as'])[source]¶
Bases:
object
Turn a CSV file into a list of Metadata objects.
- DEFAULT_IDENTIFIER_FIELD_NAMES = {'Axis 360 ID': ('axis 360 id', 0.75), 'Bibliotheca ID': ('3m id', 0.75), 'ISBN': ('isbn', 0.75), 'Overdrive ID': ('overdrive id', 0.75)}¶
- DEFAULT_SUBJECT_FIELD_NAMES = {'age': ('schema:typicalAgeRange', 100.0), 'audience': ('schema:audience', 100.0), 'tags': ('tag', 100.0)}¶
- IDENTIFIER_PRECEDENCE = ['Axis 360 ID', 'Overdrive ID', 'Bibliotheca ID', 'ISBN']¶
- property identifier_field_names¶
All potential field names that would identify an identifier.
- log = <Logger CSV metadata importer (WARNING)>¶
- class core.metadata_layer.CirculationData(data_source, primary_identifier, licenses_owned=None, licenses_available=None, licenses_reserved=None, patrons_in_hold_queue=None, formats=None, default_rights_uri=None, links=None, licenses=None, last_checked=None)[source]¶
Bases:
MetaToModelUtility
Information about actual copies of a book that can be delivered to patrons.
As distinct from Metadata, which is a container for information about a book.
- Basically,
Metadata : Edition :: CirculationData : Licensepool
- apply(_db, collection, replace=None)[source]¶
Update the title with this CirculationData’s information.
- Parameters:
collection – A Collection representing actual copies of this title. Availability information (e.g. number of copies) will be associated with a LicensePool in this Collection. If this is not present, only delivery information (e.g. format information and open-access downloads) will be processed.
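Example (a minimal sketch): describing availability for one title. data_source, identifier, _db and collection are placeholders for model objects.
    from core.metadata_layer import CirculationData

    circulation = CirculationData(
        data_source,            # placeholder: a DataSource
        identifier,             # placeholder: the title's primary identifier
        licenses_owned=5,
        licenses_available=3,
        licenses_reserved=0,
        patrons_in_hold_queue=0,
    )
    # circulation.apply(_db, collection)   # update the corresponding LicensePool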
- property has_open_access_link¶
Does this Circulation object have an associated open-access link?
- license_pool(_db, collection, analytics=None)[source]¶
Find or create a LicensePool object for this CirculationData.
- Parameters:
collection – The LicensePool object will be associated with the given Collection.
analytics – If the LicensePool is newly created, the event will be tracked with this.
- property links¶
- log = <Logger Abstract metadata layer - Circulation data (WARNING)>¶
- class core.metadata_layer.ContributorData(sort_name=None, display_name=None, family_name=None, wikipedia_name=None, roles=None, lc=None, viaf=None, biography=None, aliases=None, extra=None)[source]¶
Bases:
object
- apply(destination, replace=None)[source]¶
Update the passed-in Contributor-type object with this ContributorData’s information.
- Parameters:
destination – The Contributor or ContributorData object to write this ContributorData object’s metadata to.
replace – Replacement policy (not currently used).
- Returns:
the possibly changed Contributor object and a flag of whether it’s been changed.
- classmethod display_name_to_sort_name_from_existing_contributor(_db, display_name)[source]¶
Find the sort name for this book’s author, assuming it’s easy.
‘Easy’ means we already have an established sort name for a Contributor with this exact display name.
If it’s not easy, this will be taken care of later with a call to the metadata wrangler’s author canonicalization service.
If we have a copy of this book in our collection (the only time an external list item is relevant), this will probably be easy.
- find_sort_name(_db, identifiers, metadata_client)[source]¶
Try as hard as possible to find this person’s sort name.
- class core.metadata_layer.FormatData(content_type, drm_scheme, link=None, rights_uri=None)[source]¶
Bases:
object
- class core.metadata_layer.LicenseData(identifier, checkout_url, status_url, expires=None, remaining_checkouts=None, concurrent_checkouts=None)[source]¶
Bases:
object
- class core.metadata_layer.LinkData(rel, href=None, media_type=None, content=None, thumbnail=None, rights_uri=None, rights_explanation=None, original=None, transformation_settings=None)[source]¶
Bases:
object
- property guessed_media_type¶
If the media type of a link is unknown, take a guess.
- class core.metadata_layer.MARCExtractor[source]¶
Bases:
object
Transform a MARC file into a list of Metadata objects.
This is not totally general, but it’s a good start.
- END_OF_AUTHOR_NAME_RES = [re.compile(',\\s+[0-9]+-'), re.compile(',\\s+active '), re.compile(',\\s+graf,'), re.compile(',\\s+author.')]¶
- class core.metadata_layer.MeasurementData(quantity_measured, value, weight=1, taken_at=None)[source]¶
Bases:
object
- class core.metadata_layer.MetaToModelUtility[source]¶
Bases:
object
Contains functionality common to both CirculationData and Metadata.
- log = <Logger Abstract metadata layer - mirror code (WARNING)>¶
- class core.metadata_layer.Metadata(data_source, title=None, subtitle=None, sort_title=None, language=None, medium=None, series=None, series_position=None, publisher=None, imprint=None, issued=None, published=None, primary_identifier=None, identifiers=None, recommendations=None, subjects=None, contributors=None, measurements=None, links=None, data_source_last_updated=None, circulation=None, **kwargs)[source]¶
Bases:
MetaToModelUtility
A (potentially partial) set of metadata for a published work.
- BASIC_EDITION_FIELDS = ['title', 'sort_title', 'subtitle', 'language', 'medium', 'series', 'series_position', 'publisher', 'imprint', 'issued', 'published']¶
- REL_REQUIRES_FULL_RECALCULATION = ['http://schema.org/description']¶
- REL_REQUIRES_NEW_PRESENTATION_EDITION = ['http://opds-spec.org/image', 'http://opds-spec.org/image/thumbnail']¶
- apply(edition, collection, metadata_client=None, replace=None, replace_identifiers=False, replace_subjects=False, replace_contributions=False, replace_links=False, replace_formats=False, replace_rights=False, force=False)[source]¶
Apply this metadata to the given edition.
- Returns:
(edition, made_core_changes), where edition is the newly-updated object, and made_core_changes answers the question: were any edition core fields harmed in the making of this update? So, if title changed, return True. New: If contributors changed, this is now considered a core change, so work.simple_opds_feed refresh can be triggered.
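Example (a minimal sketch): assembling a Metadata object from the documented constructor and applying it. data_source, edition and collection are placeholders for model objects.
    from core.metadata_layer import ContributorData, Metadata, SubjectData

    metadata = Metadata(
        data_source,            # placeholder: a DataSource
        title="The Moonstone",
        language="eng",
        medium="Book",
        contributors=[ContributorData(sort_name="Collins, Wilkie", roles=["Author"])],
        subjects=[SubjectData(type="tag", identifier="mystery", weight=1)],
    )
    # metadata.apply(edition, collection)   # placeholder edition: a core.model.Edition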
- associate_with_identifiers_based_on_permanent_work_id(_db)[source]¶
Try to associate this object’s primary identifier with the primary identifiers of Editions in the database which share a permanent work ID.
- calculate_permanent_work_id(_db, metadata_client)[source]¶
Try to calculate a permanent work ID from this metadata.
This may require asking a metadata wrangler to turn a display name into a sort name–thus the metadata_client argument.
- edition(_db, create_if_not_exists=True)[source]¶
Find or create the edition described by this Metadata object.
- filter_recommendations(_db)[source]¶
Filters out recommended identifiers that don’t exist in the db. Any IdentifierData objects will be replaced with Identifiers.
- classmethod from_edition(edition)[source]¶
Create a basic Metadata object for the given Edition.
This doesn’t contain everything but it contains enough information to run guess_license_pools.
- guess_license_pools(_db, metadata_client)[source]¶
Try to find existing license pools for this Metadata.
- property links¶
- log = <Logger Abstract metadata layer (WARNING)>¶
- make_thumbnail(data_source, link, link_obj)[source]¶
Make sure a Hyperlink representing an image is connected to its thumbnail.
- normalize_contributors(metadata_client)[source]¶
Make sure that all contributors without a .sort_name get one.
- property primary_author¶
- class core.metadata_layer.ReplacementPolicy(identifiers=False, subjects=False, contributions=False, links=False, formats=False, rights=False, link_content=False, mirrors=None, content_modifier=None, analytics=None, http_get=None, even_if_not_apparently_updated=False, presentation_calculation_policy=None)[source]¶
Bases:
object
How serious should we be about overwriting old metadata with this new metadata?
- classmethod append_only(**args)[source]¶
Don’t overwrite any information, just append it.
This should probably never be used.
- classmethod from_license_source(_db, **args)[source]¶
When gathering data from the license source, overwrite all old data from this source with new data from the same source. Also overwrite an old rights status with an updated status and update the list of available formats. Log availability changes to the configured analytics services.
- classmethod from_metadata_source(**args)[source]¶
When gathering data from a metadata source, overwrite all old data from this source, but do not overwrite the rights status or the available formats. License sources are the authority on rights and formats, and metadata sources have no say in the matter.
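Example (a minimal sketch): two documented ways to build a replacement policy. _db is a placeholder for a database session.
    from core.metadata_layer import ReplacementPolicy

    # Overwrite old data from the license source, including rights and formats.
    from_license = ReplacementPolicy.from_license_source(_db)   # placeholder _db

    # A hand-rolled policy that only replaces subjects and links.
    selective = ReplacementPolicy(subjects=True, links=True)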
- class core.metadata_layer.SubjectData(type, identifier, name=None, weight=1)[source]¶
Bases:
object
- property key¶
- class core.metadata_layer.TimestampData(start=None, finish=None, achievements=None, counter=None, exception=None)[source]¶
Bases:
object
- CLEAR_VALUE = <object object>¶
- finalize(service, service_type, collection, start=None, finish=None, achievements=None, counter=None, exception=None)[source]¶
Finalize any values that were not set during the constructor.
This is intended to be run by the code that originally ran the service.
The given values for start, finish, achievements, counter, and exception will be used only if the service did not specify its own values for those fields.
- property is_complete¶
Does this TimestampData represent an operation that has completed?
An operation is completed if it has failed, or if the time of its completion is known.
- property is_failure¶
Does this TimestampData represent an unrecoverable failure?
core.mirror module¶
- class core.mirror.MirrorUploader(integration, host)[source]¶
Bases:
object
Handles the job of uploading a representation’s content to a mirror that we control.
- IMPLEMENTATION_REGISTRY = {'Amazon S3': <class 'core.s3.S3Uploader'>, 'LCP': <class 'api.lcp.mirror.LCPMirror'>, 'MinIO': <class 'core.s3.MinIOUploader'>}¶
- STORAGE_GOAL = 'storage'¶
- book_url(identifier, extension='.epub', open_access=True, data_source=None, title=None)[source]¶
The URL of the hosted EPUB file for the given identifier.
This does not upload anything to the URL, but it is expected that calling mirror() on a certain Representation object will make that representation end up at that URL.
- cover_image_url(data_source, identifier, filename=None, scaled_size=None)[source]¶
The URL of the hosted cover image for the given identifier.
This does not upload anything to the URL, but it is expected that calling mirror() on a certain Representation object will make that representation end up at that URL.
- classmethod for_collection(collection, purpose)[source]¶
Create a MirrorUploader for the given Collection.
- Parameters:
collection – Use the mirror configuration for this Collection.
purpose – Use the purpose of the mirror configuration.
- Returns:
A MirrorUploader, or None if the Collection has no mirror integration.
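A hedged sketch of the typical flow: obtain an uploader for a collection, compute the destination URL, and mirror a representation there (the collection, purpose, identifier, and representation variables are assumed to exist in the caller’s scope):

    from core.mirror import MirrorUploader

    uploader = MirrorUploader.for_collection(collection, purpose)
    if uploader is not None:
        # Where the hosted EPUB is expected to end up; nothing is uploaded yet.
        mirror_to = uploader.book_url(identifier)
        # Upload the representation's content to that URL.
        uploader.mirror_one(representation, mirror_to, collection=collection)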
- classmethod implementation(integration)[source]¶
Instantiate the appropriate implementation of MirrorUploader for the given ExternalIntegration.
- classmethod integration_by_name(_db, storage_name=None)[source]¶
Find the ExternalIntegration for the mirror by storage name.
- is_self_url(url)[source]¶
Determines whether the URL has the mirror’s host or a custom domain
- Parameters:
url (string) – The URL
- Returns:
Boolean value indicating whether the URL has the mirror’s host or a custom domain
- Return type:
bool
- classmethod mirror(_db, storage_name=None, integration=None)[source]¶
Create a MirrorUploader from an integration or storage name.
- Parameters:
storage_name – The name of the storage integration.
integration – The external integration.
- Returns:
A MirrorUploader.
- Raise:
CannotLoadConfiguration if no integration with goal==STORAGE_GOAL is configured.
- mirror_one(representation, mirror_to, collection=None)[source]¶
Mirror a single Representation.
- Parameters:
representation (Representation) – Book’s representation
mirror_to (string) – Mirror URL
collection (Optional[core.model.collection.Collection]) – Collection
- sign_url(url, expiration=None)[source]¶
Signs a URL and makes it expirable
- Parameters:
url (string) – URL
expiration (int) – (Optional) Time in seconds for the presigned URL to remain valid. The default value depends on the specific implementation.
- Returns:
Signed expirable link
- Return type:
string
- abstract split_url(url, unquote=True)[source]¶
Splits the URL into the components: container (bucket) and file path
- Parameters:
url (string) – URL
unquote (bool) – Boolean value indicating whether it’s required to unquote URL elements
- Returns:
Tuple (bucket, file path)
- Return type:
Tuple[string, string]
core.mock_analytics_provider module¶
- class core.mock_analytics_provider.MockAnalyticsProvider(integration=None, library=None)[source]¶
Bases:
object
A mock analytics provider that keeps track of how many times it’s called.
- core.mock_analytics_provider.Provider¶
alias of
MockAnalyticsProvider
core.monitor module¶
- class core.monitor.CachedFeedReaper(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Remove cached feeds older than thirty days.
- MAX_AGE = 30¶
- MODEL_CLASS¶
alias of
CachedFeed
- TIMESTAMP_FIELD = 'timestamp'¶
- class core.monitor.CirculationEventLocationScrubber(*args, **kwargs)[source]¶
Bases:
ScrubberMonitor
Scrub location information from old CirculationEvents.
- MAX_AGE = 365¶
- MODEL_CLASS¶
alias of
CirculationEvent
- SCRUB_FIELD = 'location'¶
- TIMESTAMP_FIELD = 'start'¶
- class core.monitor.CollectionMonitor(_db, collection)[source]¶
Bases:
Monitor
A Monitor that does something for all Collections that come from a certain provider.
This class is designed to be subclassed rather than instantiated directly. Subclasses must define SERVICE_NAME and PROTOCOL. Subclasses may define replacement values for DEFAULT_START_TIME and DEFAULT_COUNTER.
- PROTOCOL = None¶
- classmethod all(_db, collections=None, **constructor_kwargs)[source]¶
Yield a sequence of CollectionMonitor objects: one for every Collection associated with cls.PROTOCOL.
If collections is specified, then there must be a Monitor for each one and Monitors will be yielded in the same order that the collections are specified. Otherwise, Monitors will be yielded as follows…
Monitors that have no Timestamp will be yielded first. After that, Monitors with older values for Timestamp.start will be yielded before Monitors with newer values.
- Parameters:
_db – Database session object.
collections (List[core.model.collection.Collection]) – An optional list of collections. If None, we’ll process all collections.
constructor_kwargs – These keyword arguments will be passed into the CollectionMonitor constructor.
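A sketch of the subclassing pattern described above; the service name and protocol are invented for illustration, and run() is assumed to be the standard entry point that wraps run_once() with timestamp handling:

    from core.monitor import CollectionMonitor

    class ExampleProtocolMonitor(CollectionMonitor):
        SERVICE_NAME = "Example protocol monitor"   # hypothetical
        PROTOCOL = "Example Protocol"               # hypothetical protocol name

        def run_once(self, progress):
            # Do per-collection work against self.collection here.
            pass

    # One monitor per Collection with the matching protocol; never-run and
    # least-recently-run collections are yielded first.
    for monitor in ExampleProtocolMonitor.all(_db):
        monitor.run()   # assumed entry point; see the Monitor entry below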
- class core.monitor.CollectionMonitorLogger(logger, extra)[source]¶
Bases:
LoggerAdapter
Prefix log messages with a collection, if one is present.
- process(msg, kwargs)[source]¶
Process the logging message and keyword arguments passed in to a logging call to insert contextual information. You can either manipulate the message itself, the keyword args or both. Return the message and kwargs modified (or not) to suit your needs.
Normally, you’ll only need to override this one method in a LoggerAdapter subclass for your specific needs.
- class core.monitor.CollectionReaper(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Remove collections that have been marked for deletion.
- MODEL_CLASS¶
alias of
Collection
- delete(collection)[source]¶
Delete a Collection from the database.
Database deletion of a Collection might take a really long time, so we call a special method that will do the deletion incrementally and can pick up where it left off if there’s a failure.
- property where_clause¶
A SQLAlchemy clause that identifies the database rows to be reaped.
- exception core.monitor.CoverageProvidersFailed(failed_providers)[source]¶
Bases:
Exception
We tried to run CoverageProviders on a Work’s identifier, but some of the providers failed.
- class core.monitor.CredentialReaper(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Remove Credentials that expired more than a day ago.
- MAX_AGE = 1¶
- MODEL_CLASS¶
alias of
Credential
- TIMESTAMP_FIELD = 'expires'¶
- class core.monitor.CustomListEntrySweepMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
SweepMonitor
A Monitor that does something to every CustomListEntry.
- MODEL_CLASS¶
alias of
CustomListEntry
- class core.monitor.CustomListEntryWorkUpdateMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
CustomListEntrySweepMonitor
Set or reset the Work associated with each custom list entry.
- DEFAULT_BATCH_SIZE = 100¶
- SERVICE_NAME = 'Update Works for custom list entries'¶
- class core.monitor.EditionSweepMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
SweepMonitor
A Monitor that does something to every Edition.
- class core.monitor.IdentifierSweepMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
SweepMonitor
A Monitor that does some work for every Identifier.
- MODEL_CLASS¶
alias of
Identifier
- class core.monitor.MakePresentationReadyMonitor(_db, coverage_providers, collection=None)[source]¶
Bases:
NotPresentationReadyWorkSweepMonitor
A monitor that makes works presentation ready.
By default this works by passing the work’s active edition into ensure_coverage() for each of a list of CoverageProviders. If all the ensure_coverage() calls succeed, presentation of the work is calculated and the work is marked presentation ready.
- SERVICE_NAME = 'Make Works Presentation Ready'¶
- prepare(work)[source]¶
Try to make a single Work presentation-ready.
- Raises:
CoverageProvidersFailed – If we can’t make a Work presentation-ready because one or more CoverageProviders failed.
- class core.monitor.MeasurementReaper(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Remove measurements that are not the most recent
- MODEL_CLASS¶
alias of
Measurement
- run_once(*args, **kwargs)[source]¶
Do the actual work of the Monitor.
- Parameters:
progress – A TimestampData representing the work done by the Monitor up to this point.
- Returns:
A TimestampData representing how you want the Monitor’s entry in the timestamps table to look like from this point on. NOTE: Modifying the incoming progress and returning it is generally a bad idea, because the incoming progress is full of old data. Instead, return a new TimestampData containing data for only the fields you want to set.
- property where_clause¶
A SQLAlchemy clause that identifies the database rows to be reaped.
- class core.monitor.Monitor(_db, collection=None)[source]¶
Bases:
object
A Monitor is responsible for running some piece of code on a regular basis. A Monitor has an associated Timestamp that tracks the last time it successfully ran; it may use this information on its next run to cover the intervening span of time.
A Monitor will run to completion and then stop. To repeatedly run a Monitor, you’ll need to repeatedly invoke it from some external source such as a cron job.
This class is designed to be subclassed rather than instantiated directly. Subclasses must define SERVICE_NAME. Subclasses may define replacement values for DEFAULT_START_TIME and DEFAULT_COUNTER.
Although any Monitor may be associated with a Collection, it’s most useful to subclass CollectionMonitor if you’re writing code that needs to be run on every Collection of a certain type.
- DEFAULT_COUNTER = None¶
- DEFAULT_START_TIME = datetime.timedelta(seconds=60)¶
- NEVER = <object object>¶
- ONE_MINUTE_AGO = datetime.timedelta(seconds=60)¶
- ONE_YEAR_AGO = datetime.timedelta(days=365)¶
- SERVICE_NAME = None¶
- cleanup()[source]¶
Do any work that needs to be done at the end, once the main work has completed successfully.
- property collection¶
Retrieve the Collection object associated with this Monitor.
- property initial_start_time¶
The time that should be used as the ‘start time’ the first time this Monitor is run.
- property log¶
- run_once(progress)[source]¶
Do the actual work of the Monitor.
- Parameters:
progress – A TimestampData representing the work done by the Monitor up to this point.
- Returns:
A TimestampData representing how you want the Monitor’s entry in the timestamps table to look like from this point on. NOTE: Modifying the incoming progress and returning it is generally a bad idea, because the incoming progress is full of old data. Instead, return a new TimestampData containing data for only the fields you want to set.
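A minimal sketch of a Monitor subclass following the contract above; the service name and the work performed are assumptions for illustration:

    from core.metadata_layer import TimestampData
    from core.monitor import Monitor

    class ExampleCleanupMonitor(Monitor):
        SERVICE_NAME = "Example cleanup monitor"   # required for every subclass

        def run_once(self, progress):
            # Do the actual work here, then describe it in a *new*
            # TimestampData rather than mutating the incoming `progress`.
            items_handled = 0
            return TimestampData(achievements="%d items handled" % items_handled)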
- class core.monitor.NotPresentationReadyWorkSweepMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
WorkSweepMonitor
A Monitor that does something to every Work that is not presentation-ready.
- class core.monitor.OPDSEntryCacheMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
PresentationReadyWorkSweepMonitor
A Monitor that recalculates the OPDS entries for every presentation-ready Work.
This is different from the OPDSEntryWorkCoverageProvider, which only processes works that are missing a WorkCoverageRecord with the ‘generate-opds’ operation.
- SERVICE_NAME = 'ODPS Entry Cache Monitor'¶
- class core.monitor.PatronNeighborhoodScrubber(*args, **kwargs)[source]¶
Bases:
ScrubberMonitor
Scrub cached neighborhood information from patrons who haven’t been seen in a while.
- MAX_AGE = datetime.timedelta(seconds=43200)¶
- SCRUB_FIELD = 'cached_neighborhood'¶
- TIMESTAMP_FIELD = 'last_external_sync'¶
- class core.monitor.PatronRecordReaper(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Remove patron records that expired more than 60 days ago
- MAX_AGE = 60¶
- TIMESTAMP_FIELD = 'authorization_expires'¶
- class core.monitor.PermanentWorkIDRefreshMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
EditionSweepMonitor
A monitor that calculates or recalculates the permanent work ID for every edition.
- SERVICE_NAME = 'Permanent work ID refresh'¶
- class core.monitor.PresentationReadyWorkSweepMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
WorkSweepMonitor
A Monitor that does something to every presentation-ready Work.
- class core.monitor.ReaperMonitor(*args, **kwargs)[source]¶
Bases:
Monitor
A Monitor that deletes database rows that have expired but have no other process to delete them.
A subclass of ReaperMonitor MUST define values for the following constants:
* MODEL_CLASS - The model class this monitor is reaping, e.g. Credential.
* TIMESTAMP_FIELD - Within the model class, the DateTime field to be used when deciding which rows to delete, e.g. ‘expires’. The reaper will be more efficient if there’s an index on this field.
* MAX_AGE - A datetime.timedelta or number of days representing the time that must pass before an item can be safely deleted.
A subclass of ReaperMonitor MAY define values for the following constants:
* BATCH_SIZE - The number of rows to fetch for deletion in a single batch. The default is 1000.
If your model class has fields that might contain a lot of data and aren’t important to the reaping process, put their field names into a list called LARGE_FIELDS and the Reaper will avoid fetching that information, improving performance.
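For illustration, a hypothetical reaper mirroring CredentialReaper above, assuming Credential is importable from core.model:

    from core.model import Credential
    from core.monitor import ReaperMonitor

    class ExampleCredentialReaper(ReaperMonitor):
        # Delete Credentials whose 'expires' timestamp is more than a day old.
        MODEL_CLASS = Credential
        TIMESTAMP_FIELD = "expires"
        MAX_AGE = 1   # days; a datetime.timedelta would also work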
- BATCH_SIZE = 1000¶
- MAX_AGE = None¶
- MODEL_CLASS = None¶
- REGISTRY = [<class 'core.monitor.CachedFeedReaper'>, <class 'core.monitor.CredentialReaper'>, <class 'core.monitor.WorkReaper'>, <class 'core.monitor.CollectionReaper'>, <class 'core.monitor.MeasurementReaper'>, <class 'core.monitor.CirculationEventLocationScrubber'>, <class 'core.monitor.PatronNeighborhoodScrubber'>, <class 'api.monitor.LoanReaper'>, <class 'api.monitor.HoldReaper'>, <class 'api.monitor.IdlingAnnotationReaper'>]¶
- TIMESTAMP_FIELD = None¶
- property cutoff¶
Items with a timestamp earlier than this time will be reaped.
- delete(row)[source]¶
Delete a row from the database.
CAUTION: If you override this method such that it doesn’t actually delete the database row, then run_once() may enter an infinite loop.
- run_once(*args, **kwargs)[source]¶
Do the actual work of the Monitor.
- Parameters:
progress – A TimestampData representing the work done by the Monitor up to this point.
- Returns:
A TimestampData representing how you want the Monitor’s entry in the timestamps table to look like from this point on. NOTE: Modifying the incoming progress and returning it is generally a bad idea, because the incoming progress is full of old data. Instead, return a new TimestampData containing data for only the fields you want to set.
- property timestamp_field¶
- property where_clause¶
A SQLAlchemy clause that identifies the database rows to be reaped.
- class core.monitor.ScrubberMonitor(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Scrub information from the database.
Unlike the other ReaperMonitors, this class doesn’t delete rows from the database – it only clears out specific data fields.
In addition to the constants required for ReaperMonitor, a subclass of ScrubberMonitor MUST define a value for the following constant:
SCRUB_FIELD - The field whose value will be set to None when a row is scrubbed.
- property scrub_field¶
Find the SQLAlchemy representation of the model field to be scrubbed.
- property where_clause¶
Find rows that are older than MAX_AGE _and_ which have a non-null SCRUB_FIELD. If the field is already null, there’s no need to scrub it.
- class core.monitor.SubjectSweepMonitor(_db, subject_type=None, filter_string=None)[source]¶
Bases:
SweepMonitor
A Monitor that does some work for every Subject.
- DEFAULT_BATCH_SIZE = 500¶
- class core.monitor.SweepMonitor(_db, collection=None, batch_size=None)[source]¶
Bases:
CollectionMonitor
A monitor that does some work for every item in a database table, then stops.
Progress through the table is stored in the Timestamp, so that if the Monitor crashes, the next time the Monitor is run, it starts at the item that caused the crash, rather than starting from the beginning of the table.
- COMPLETION_LOG_LEVEL = 20¶
- DEFAULT_BATCH_SIZE = 100¶
- DEFAULT_COUNTER = 0¶
- MODEL_CLASS = None¶
- item_query()[source]¶
Find the items that need to be processed in the sweep.
- Returns:
A query object.
- run_once(*ignore)[source]¶
Do the actual work of the Monitor.
- Parameters:
progress – A TimestampData representing the work done by the Monitor up to this point.
- Returns:
A TimestampData representing how you want the Monitor’s entry in the timestamps table to look like from this point on. NOTE: Modifying the incoming progress and returning it is generally a bad idea, because the incoming progress is full of old data. Instead, return a new TimestampData containing data for only the fields you want to set.
- class core.monitor.TimelineMonitor(_db, collection=None)[source]¶
Bases:
Monitor
A monitor that needs to process everything that happened between two specific times.
This Monitor uses Timestamp.start and Timestamp.finish to describe the span of time covered in the most recent run, not the time it actually took to run.
- OVERLAP = datetime.timedelta(seconds=300)¶
- catch_up_from(start, cutoff, progress)[source]¶
Make sure all events between start and cutoff are covered.
- Parameters:
start – Start looking for events that happened at this time.
cutoff – You’re not responsible for events that happened after this time.
progress – A TimestampData representing the progress so far. Unlike with run_once(), you are encouraged to modify this in place, for instance to set .achievements. However, you cannot change .start and .finish – any changes will be overwritten by run_once().
- run_once(progress)[source]¶
Do the actual work of the Monitor.
- Parameters:
progress – A TimestampData representing the work done by the Monitor up to this point.
- Returns:
A TimestampData representing how you want the Monitor’s entry in the timestamps table to look like from this point on. NOTE: Modifying the incoming progress and returning it is generally a bad idea, because the incoming progress is full of old data. Instead, return a new TimestampData containing data for only the fields you want to set.
- classmethod slice_timespan(start, cutoff, increment)[source]¶
Slice a span of time into segments no larger than [increment].
This lets you divide up a task like “gather the entire circulation history for a collection” into chunks of one day.
- Parameters:
start – A datetime.
cutoff – A datetime.
increment – A timedelta.
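A small sketch of the chunking described above, using arbitrary stdlib datetimes; the exact shape of each yielded segment is not spelled out here, so the loop body is left open:

    import datetime
    from core.monitor import TimelineMonitor

    cutoff = datetime.datetime.now(datetime.timezone.utc)
    start = cutoff - datetime.timedelta(days=7)

    # Cover the last week in segments no larger than one day each.
    for segment in TimelineMonitor.slice_timespan(start, cutoff, datetime.timedelta(days=1)):
        ...   # process one sub-span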
- class core.monitor.WorkReaper(*args, **kwargs)[source]¶
Bases:
ReaperMonitor
Remove Works that have no associated LicensePools.
Unlike other reapers, no timestamp is relevant. As soon as a Work loses its last LicensePool it can be removed.
core.opds module¶
- class core.opds.AcquisitionFeed(_db, title, url, works, annotator=None, precomposed_entries=[])[source]¶
Bases:
OPDSFeed
- CURRENT_ENTRYPOINT_ATTRIBUTE = '{http://librarysimplified.org/terms/}entryPoint'¶
- FACET_REL = 'http://opds-spec.org/facet'¶
- add_breadcrumb_links(lane, entrypoint=None)[source]¶
Add information necessary to find your current place in the site’s navigation.
A link with rel=”start” points to the start of the site
A <simplified:entrypoint> section describes the current entry point.
A <simplified:breadcrumbs> section contains a sequence of breadcrumb links.
- add_breadcrumbs(lane, include_lane=False, entrypoint=None)[source]¶
Add list of ancestor links in a breadcrumbs element.
- Parameters:
lane – Add breadcrumbs from up to this lane.
include_lane – Include lane itself in the breadcrumbs.
entrypoint – The currently selected entrypoint, if any.
TODO: The switchover from “no entry point” to “entry point” needs its own breadcrumb link.
- classmethod add_entrypoint_links(feed, url_generator, entrypoints, selected_entrypoint, group_name='Formats')[source]¶
Add links to a feed forming an OPDS facet group for a set of EntryPoints.
- Parameters:
feed – A lxml Tag object.
url_generator – A callable that returns the entry point URL when passed an EntryPoint.
entrypoints – A list of all EntryPoints in the facet group.
selected_entrypoint – The current EntryPoint, if selected.
- as_error_response(**kwargs)[source]¶
Convert this feed into an OPDSFeedResponse that should be treated by intermediaries as an error – that is, treated as private and not cached.
- create_entry(work, even_if_no_license_pool=False, force_create=False, use_cache=True)[source]¶
Turn a work into an entry for an acquisition feed.
- classmethod error_message(identifier, error_status, error_message)[source]¶
Turn an error result into an OPDSMessage suitable for adding to a feed.
- classmethod facet_link(href, title, facet_group_name, is_active)[source]¶
Build a set of attributes for a facet link.
- Parameters:
href – Destination of the link.
title – Human-readable description of the facet.
facet_group_name – The facet group to which the facet belongs, e.g. “Sort By”.
is_active – True if this is the client’s currently selected facet.
- Returns:
A dictionary of attributes, suitable for passing as keyword arguments into OPDSFeed.add_link_to_feed.
- classmethod facet_links(annotator, facets)[source]¶
Create links for this feed’s navigational facet groups.
This does not create links for the entry point facet group, because those links should only be present in certain circumstances, and this method doesn’t know if those circumstances apply. You need to decide whether to call add_entrypoint_links in addition to calling this method.
- classmethod format_types(delivery_mechanism)[source]¶
Generate a set of types suitable for passing into acquisition_link().
- classmethod from_query(query, _db, feed_name, url, pagination, url_fn, annotator)[source]¶
Build a feed representing one page of a given list. Currently used for creating an OPDS feed for a custom list and not cached.
TODO: This is used by the circulation manager admin interface. Investigate changing the code that uses this to use the search index – this is inefficient and creates an alternate code path that may harbor bugs.
TODO: This cannot currently return OPDSFeedResponse because the admin interface modifies the feed after it’s generated.
- classmethod groups(_db, title, url, worklist, annotator, pagination=None, facets=None, max_age=None, search_engine=None, search_debug=False, **response_kwargs)[source]¶
The acquisition feed for ‘featured’ items from a given lane’s sublanes, organized into per-lane groups.
NOTE: If the lane has no sublanes, a grouped feed will probably be unsatisfying. Call page() instead with an appropriate Facets object.
- Parameters:
pagination – A Pagination object. No single child of this lane will contain more than pagination.size items.
facets – A GroupsFacet object.
response_kwargs – Extra keyword arguments to pass into the OPDSFeedResponse constructor.
- Returns:
An OPDSFeedResponse containing the feed.
- classmethod minimal_opds_entry(identifier, cover, description, quality, most_recent_update=None)[source]¶
- classmethod page(_db, title, url, worklist, annotator, facets=None, pagination=None, max_age=None, search_engine=None, search_debug=False, **response_kwargs)[source]¶
Create a feed representing one page of works from a given lane.
- Parameters:
response_kwargs – Extra keyword arguments to pass into the OPDSFeedResponse constructor.
- Returns:
An OPDSFeedResponse containing the feed.
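A hedged sketch of serving one page of a lane as a feed; the database session, lane, annotator, and URL are placeholders assumed to exist in an application view:

    from core.opds import AcquisitionFeed

    response = AcquisitionFeed.page(
        _db,
        title="Example Lane",
        url="https://cm.example.org/feeds/example-lane",   # placeholder URL
        worklist=lane,            # a Lane or WorkList object
        annotator=annotator,      # an application-specific Annotator subclass
    )
    # `response` is an OPDSFeedResponse, ready to be returned from a Flask view.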
- classmethod search(_db, title, url, lane, search_engine, query, pagination=None, facets=None, annotator=None, **response_kwargs)[source]¶
Run a search against the given search engine and return the results as a Flask Response.
- Parameters:
_db – A database connection
title – The title of the resulting OPDS feed.
url – The URL from which the feed will be served.
search_engine – An ExternalSearchIndex.
query – The search query
pagination – A Pagination
facets – A Facets
annotator – An Annotator
response_kwargs – Keyword arguments to pass into the OPDSFeedResponse constructor.
- Returns:
An OPDSFeedResponse
- show_current_entrypoint(entrypoint)[source]¶
Annotate this given feed with a simplified:entryPoint attribute pointing to the current entrypoint’s TYPE_URI.
This gives clients an overall picture of the type of works in the feed, and a way to distinguish between one EntryPoint and another.
- Parameters:
entrypoint – An EntryPoint.
- classmethod single_entry(_db, work, annotator, force_create=False, raw=False, use_cache=True, **response_kwargs)[source]¶
Create a single-entry OPDS document for one specific work.
- Parameters:
_db – A database connection.
work – A Work
annotator – An Annotator
force_create – Create the OPDS entry from scratch even if there’s already a cached one.
raw – If this is False (the default), a Flask Response will be returned, ready to be sent over the network. Otherwise an object representing the underlying OPDS entry will be returned.
use_cache – Boolean value determining whether the OPDS cache shall be used.
response_kwargs – These keyword arguments will be passed into the Response constructor, if it is invoked.
- Returns:
A Response, if raw is false. Otherwise, an OPDSMessage or an etree._Element – whatever was returned by OPDSFeed.create_entry.
- class core.opds.Annotator[source]¶
Bases:
object
The Annotator knows how to present an OPDS feed in a specific application context.
- classmethod active_licensepool_for(work)[source]¶
Which license pool would be/has been used to issue a license for this work?
- classmethod annotate_feed(feed, lane, list=None)[source]¶
Make any custom modifications necessary to integrate this OPDS feed into the application’s workflow.
- annotate_work_entry(work, active_license_pool, edition, identifier, feed, entry, updated=None)[source]¶
Make any custom modifications necessary to integrate this OPDS entry into the application’s workflow.
- Work:
The Work whose OPDS entry is being annotated.
- Active_license_pool:
Of all the LicensePools associated with this Work, the client has expressed interest in this one.
- Edition:
The Edition to use when associating bibliographic metadata with this entry. You will probably not need to use this, because bibliographic metadata was associated with the entry when it was created.
- Identifier:
Of all the Identifiers associated with this Work, the client has expressed interest in this one.
- Parameters:
feed – An OPDSFeed – the feed in which this entry will be situated.
entry – An lxml Element object, the entry that will be added to the feed.
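A sketch of a minimal Annotator subclass that decorates every entry with an extra link; the add_link_to_feed call is an assumption based on the helper mentioned under facet_link above, and the URL is purely illustrative:

    from core.opds import Annotator, OPDSFeed

    class ExampleAnnotator(Annotator):
        def annotate_work_entry(self, work, active_license_pool, edition,
                                identifier, feed, entry, updated=None):
            # Attach one extra, application-specific link to the entry.
            OPDSFeed.add_link_to_feed(
                entry,
                rel="related",
                href="https://example.org/about/%s" % identifier.identifier,
            )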
- classmethod authors(work, edition)[source]¶
Create one or more <author> and <contributor> tags for the given Work.
- Parameters:
work – The Work under consideration.
edition – The Edition to use as a reference for bibliographic information, including the list of Contributions.
- classmethod categories(work)[source]¶
Return all relevant classifications of this work.
- Returns:
A dictionary mapping ‘scheme’ URLs to dictionaries of attribute-value pairs.
Notable attributes: ‘term’, ‘label’, ‘http://schema.org/ratingValue’
- classmethod contributor_tag(contribution, state)[source]¶
Build an <author> or <contributor> tag for a Contribution.
- Parameters:
contribution – A Contribution.
state – A defaultdict of sets, which may be used to keep track of what happened during previous calls to contributor_tag for a given Work.
- Returns:
A Tag, or None if creating a Tag for this Contribution would be redundant or of low value.
- classmethod cover_links(work)[source]¶
Return all links to be used as cover links for this work.
In a distribution application, each work will have only one link. In a content server-type application, each work may have a large number of links.
- Returns:
A 2-tuple (thumbnail_links, full_links)
- classmethod group_uri(work, license_pool, identifier)[source]¶
The URI to be associated with this Work when making it part of a grouped feed.
By default, this does nothing. See circulation/LibraryAnnotator for a subclass that does something.
- Returns:
A 2-tuple (URI, title)
- is_work_entry_solo(work)[source]¶
Return a boolean value indicating whether the work’s OPDS catalog entry is served by itself, rather than as part of the feed.
- Parameters:
work (core.model.work.Work) – Work object
- Returns:
Boolean value indicating whether the work’s OPDS catalog entry is served by itself, rather than as a part of the feed
- Return type:
bool
- opds_cache_field = 'simple_opds_entry'¶
- classmethod permalink_for(work, license_pool, identifier)[source]¶
Generate a permanent link a client can follow for information about this entry, and only this entry.
Note that permalink is distinct from the Atom <id>, which is always the identifier’s URN.
- Returns:
A 2-tuple (URL, media type). If a single value is returned, the media type will be presumed to be that of an OPDS entry.
- classmethod rating_tag(type_uri, value)[source]¶
Generate a schema:Rating tag for the given type and value.
- class core.opds.LookupAcquisitionFeed(_db, title, url, works, annotator=None, precomposed_entries=[])[source]¶
Bases:
AcquisitionFeed
Used when the user has requested a lookup of a specific identifier, which may be different from the identifier used by the Work’s default LicensePool.
Bases:
FeaturedFacets
Bases:
OPDSFeed
Create an OPDS navigation entry for a URL.
The navigation feed with links to a given lane’s sublanes.
- Parameters:
response_kwargs – Extra keyword arguments to pass into the OPDSFeedResponse constructor.
- Returns:
A Response
- class core.opds.TestAnnotatorWithGroup[source]¶
Bases:
TestAnnotator
- class core.opds.TestUnfulfillableAnnotator[source]¶
Bases:
TestAnnotator
Raise an UnfulfillableWork exception when asked to annotate an entry.
- annotate_work_entry(*args, **kwargs)[source]¶
Make any custom modifications necessary to integrate this OPDS entry into the application’s workflow.
- Work:
The Work whose OPDS entry is being annotated.
- Active_license_pool:
Of all the LicensePools associated with this Work, the client has expressed interest in this one.
- Edition:
The Edition to use when associating bibliographic metadata with this entry. You will probably not need to use this, because bibliographic metadata was associated with the entry when it was created.
- Identifier:
Of all the Identifiers associated with this Work, the client has expressed interest in this one.
- Parameters:
feed – An OPDSFeed – the feed in which this entry will be situated.
entry – An lxml Element object, the entry that will be added to the feed.
- exception core.opds.UnfulfillableWork[source]¶
Bases:
Exception
Raise this exception when it turns out a Work currently cannot be fulfilled through any means, and this is a problem sufficient to cancel the creation of an <entry> for the Work.
For commercial works, this might be because the collection contains no licenses. For open-access works, it might be because none of the delivery mechanisms could be mirrored.
- class core.opds.VerboseAnnotator[source]¶
Bases:
Annotator
The default Annotator for machine-to-machine integration.
This Annotator describes all categories and authors for the book in great detail.
- annotate_work_entry(work, active_license_pool, edition, identifier, feed, entry)[source]¶
Make any custom modifications necessary to integrate this OPDS entry into the application’s workflow.
- Work:
The Work whose OPDS entry is being annotated.
- Active_license_pool:
Of all the LicensePools associated with this Work, the client has expressed interest in this one.
- Edition:
The Edition to use when associating bibliographic metadata with this entry. You will probably not need to use this, because bibliographic metadata was associated with the entry when it was created.
- Identifier:
Of all the Identifiers associated with this Work, the client has expressed interest in this one.
- Parameters:
feed – An OPDSFeed – the feed in which this entry will be situated.
entry – An lxml Element object, the entry that will be added to the feed.
- classmethod categories(work, policy=None)[source]¶
Send out _all_ categories for the work.
(So long as the category type has a URI associated with it in Subject.uri_lookup.)
- Parameters:
policy – A PresentationCalculationPolicy to use when deciding how deep to go when finding equivalent identifiers for the work.
- opds_cache_field = 'verbose_opds_entry'¶
core.opds2_import module¶
- class core.opds2_import.OPDS2ImportMonitor(_db, collection, import_class, force_reimport=False, **import_class_kwargs)[source]¶
Bases:
OPDSImportMonitor
- MEDIA_TYPE = ('application/opds+json', 'application/json')¶
- PROTOCOL = 'OPDS 2.0 Import'¶
- class core.opds2_import.OPDS2Importer(db, collection, data_source_name=None, identifier_mapping=None, http_get=None, metadata_client=None, content_modifier=None, map_from_collection=None, mirrors=None)[source]¶
Bases:
OPDSImporter
Imports editions and license pools from an OPDS 2.0 feed.
- DESCRIPTION = l'Import books from a publicly-accessible OPDS 2.0 feed.'¶
- NAME = 'OPDS 2.0 Import'¶
- NEXT_LINK_RELATION = 'next'¶
- extract_feed_data(feed, feed_url=None)[source]¶
Turn an OPDS 2.0 feed into lists of Metadata and CirculationData objects.
- Parameters:
feed (Union[str, opds2_ast.OPDS2Feed]) – OPDS 2.0 feed
feed_url (Optional[str]) – Feed URL used to resolve relative links
core.opds_import module¶
- exception core.opds_import.AccessNotAuthenticated[source]¶
Bases:
Exception
No authentication is configured for this service
- class core.opds_import.MetadataWranglerOPDSLookup(url, shared_secret=None, collection=None)[source]¶
Bases:
SimplifiedOPDSLookup, HasSelfTests
- ADD_ENDPOINT = 'add'¶
- ADD_WITH_METADATA_ENDPOINT = 'add_with_metadata'¶
- CANONICALIZE_ENDPOINT = 'canonical-author-name'¶
- CARDINALITY = 1¶
- METADATA_NEEDED_ENDPOINT = 'metadata_needed'¶
- NAME = l'Library Simplified Metadata Wrangler'¶
- PROTOCOL = 'Metadata Wrangler'¶
- REMOVE_ENDPOINT = 'remove'¶
- SETTINGS = [{'key': 'url', 'label': l'URL', 'default': 'http://metadata.librarysimplified.org/', 'required': True, 'format': 'url'}]¶
- SITEWIDE = True¶
- UPDATES_ENDPOINT = 'updates'¶
- add_with_metadata(feed)[source]¶
Add a feed of items with metadata to an authenticated Metadata Wrangler Collection.
- property authenticated¶
- property authorization¶
- canonicalize_author_name(identifier, working_display_name)[source]¶
Attempt to find the canonical name for the author of a book.
- Parameters:
identifier – an ISBN-type Identifier.
working_display_name – The display name of the author (i.e. the name as a human being would write it, as opposed to the name that goes into library records).
- classmethod external_integration(_db)[source]¶
Locate the ExternalIntegration associated with this object. The status of the self-tests will be stored as a ConfigurationSetting on this ExternalIntegration.
By default, there is no way to get from an object to its ExternalIntegration, and self-test status will not be stored.
- property lookup_endpoint¶
- class core.opds_import.MockSimplifiedOPDSLookup(*args, **kwargs)[source]¶
Bases:
SimplifiedOPDSLookup
- class core.opds_import.OPDSImportMonitor(_db, collection, import_class, force_reimport=False, **import_class_kwargs)[source]¶
Bases:
CollectionMonitor, HasSelfTests
Periodically monitor a Collection’s OPDS archive feed and import every title it mentions.
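A sketch of wiring the monitor up for a collection whose protocol is ‘OPDS Import’; the database session and collection are assumed to exist, and run() is assumed to be the standard Monitor entry point:

    from core.opds_import import OPDSImporter, OPDSImportMonitor

    monitor = OPDSImportMonitor(_db, collection, import_class=OPDSImporter)
    monitor.run()   # follow the feed's pagination links and import new titles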
- DEFAULT_START_TIME = <object object>¶
- PROTOCOL = 'OPDS Import'¶
- SERVICE_NAME = 'OPDS Import Monitor'¶
- data_source(collection)[source]¶
Returns the data source name for the given collection.
By default, this name is stored as a setting on the collection, but subclasses may hard-code it.
- external_integration(_db)[source]¶
Locate the ExternalIntegration associated with this object. The status of the self-tests will be stored as a ConfigurationSetting on this ExternalIntegration.
By default, there is no way to get from an object to its ExternalIntegration, and self-test status will not be stored.
- feed_contains_new_data(feed)[source]¶
Does the given feed contain any entries that haven’t been imported yet?
- follow_one_link(url, do_get=None)[source]¶
Download a representation of a URL and extract the useful information.
- Returns:
A 2-tuple (next_links, feed). next_links is a list of additional links that need to be followed. feed is the content that needs to be imported.
- identifier_needs_import(identifier, last_updated_remote)[source]¶
Does the remote side have new information about this Identifier?
- Parameters:
identifier – An Identifier.
last_updated_remote – The last time the remote side updated the OPDS entry for this Identifier.
- opds_url(collection)[source]¶
Returns the OPDS import URL for the given collection.
By default, this URL is stored as the external account ID, but subclasses may override this.
- run_once(progress_ignore)[source]¶
Do the actual work of the Monitor.
- Parameters:
progress – A TimestampData representing the work done by the Monitor up to this point.
- Returns:
A TimestampData representing how you want the Monitor’s entry in the timestamps table to look like from this point on. NOTE: Modifying the incoming progress and returning it is generally a bad idea, because the incoming progress is full of old data. Instead, return a new TimestampData containing data for only the fields you want to set.
- class core.opds_import.OPDSImporter(_db, collection, data_source_name=None, identifier_mapping=None, http_get=None, metadata_client=None, content_modifier=None, map_from_collection=None, mirrors=None)[source]¶
Bases:
object
Imports editions and license pools from an OPDS feed. Creates Edition, LicensePool and Work rows in the database, if those don’t already exist.
Should be used when a circulation server asks for data from our internal content server, and also when our content server asks for data from external content servers.
- BASE_SETTINGS = [{'key': 'external_account_id', 'label': l'URL', 'required': True, 'format': 'url'}, {'key': 'data_source', 'label': l'Data source name', 'required': True}, {'key': 'default_audience', 'label': l'Default audience', 'description': l'If the vendor does not specify the target audience for their books, assume the books have this target audience.', 'type': 'select', 'format': 'narrow', 'options': [{'key': '', 'label': l'No default audience'}, {'key': 'Adult', 'label': 'Adult'}, {'key': 'Adults Only', 'label': 'Adults Only'}, {'key': 'All Ages', 'label': 'All Ages'}, {'key': 'Children', 'label': 'Children'}, {'key': 'Research', 'label': 'Research'}, {'key': 'Young Adult', 'label': 'Young Adult'}], 'default': '', 'required': False, 'readOnly': True}]¶
- COULD_NOT_CREATE_LICENSE_POOL = 'No existing license pool for this identifier and no way of creating one.'¶
- DESCRIPTION = l'Import books from a publicly-accessible OPDS feed.'¶
- NAME = 'OPDS Import'¶
- NO_DEFAULT_AUDIENCE = ''¶
- PARSER_CLASS¶
alias of
OPDSXMLParser
- SETTINGS = [{'key': 'external_account_id', 'label': l'URL', 'required': True, 'format': 'url'}, {'key': 'data_source', 'label': l'Data source name', 'required': True}, {'key': 'default_audience', 'label': l'Default audience', 'description': l'If the vendor does not specify the target audience for their books, assume the books have this target audience.', 'type': 'select', 'format': 'narrow', 'options': [{'key': '', 'label': l'No default audience'}, {'key': 'Adult', 'label': 'Adult'}, {'key': 'Adults Only', 'label': 'Adults Only'}, {'key': 'All Ages', 'label': 'All Ages'}, {'key': 'Children', 'label': 'Children'}, {'key': 'Research', 'label': 'Research'}, {'key': 'Young Adult', 'label': 'Young Adult'}], 'default': '', 'required': False, 'readOnly': True}, {'key': 'username', 'label': l'Username', 'description': l'If HTTP Basic authentication is required to access the OPDS feed (it usually isn't), enter the username here.'}, {'key': 'password', 'label': l'Password', 'description': l'If HTTP Basic authentication is required to access the OPDS feed (it usually isn't), enter the password here.'}, {'key': 'custom_accept_header', 'label': l'Custom accept header', 'required': False, 'description': l'Some servers expect an accept header to decide which file to send. You can use */* if the server doesn't expect anything.', 'default': 'application/atom+xml;profile=opds-catalog;kind=acquisition,application/atom+xml;q=0.9,application/xml;q=0.8,*/*;q=0.1'}, {'key': 'primary_identifier_source', 'label': l'Identifer', 'required': False, 'description': l'Which book identifier to use as ID.', 'type': 'select', 'options': [{'key': '', 'label': l'(Default) Use <id>'}, {'key': 'first_dcterms_identifier', 'label': l'Use <dcterms:identifier> first, if not exist use <id>'}]}]¶
- SUCCESS_STATUS_CODES = None¶
- assert_importable_content(feed, feed_url, max_get_attempts=5)[source]¶
Raise an exception if the given feed contains nothing that can, even theoretically, be turned into a LicensePool.
By default, this means the feed must link to open-access content that can actually be retrieved.
- build_identifier_mapping(external_urns)[source]¶
Uses the given Collection and a list of URNs to reverse engineer an identifier mapping.
NOTE: It would be better if .identifier_mapping weren’t instance data, since a single OPDSImporter might import multiple pages of a feed. However, the code as written should work.
- property collection¶
Returns an associated Collection object
- Returns:
Associated Collection object
- Return type:
Optional[core.model.collection.Collection]
- classmethod combine(d1, d2)[source]¶
Combine two dictionaries that can be used as keyword arguments to the Metadata constructor.
- classmethod consolidate_links(links)[source]¶
Try to match up links with their thumbnails.
If link n is an image and link n+1 is a thumbnail, then the thumbnail is assumed to be the thumbnail of the image.
Similarly if link n is a thumbnail and link n+1 is an image.
- classmethod coveragefailure_from_message(data_source, message)[source]¶
Turn a <simplified:message> tag into a CoverageFailure.
- classmethod coveragefailures_from_messages(data_source, parser, feed_tag)[source]¶
Extract CoverageFailure objects from a parsed OPDS document. This allows us to determine the fate of books which could not become <entry> tags.
- classmethod data_detail_for_feedparser_entry(entry, data_source)[source]¶
Turn an entry dictionary created by feedparser into dictionaries of data that can be used as keyword arguments to the Metadata and CirculationData constructors.
- Returns:
A 3-tuple (identifier, kwargs for Metadata constructor, failure)
- property data_source¶
Look up or create a DataSource object representing the source of this OPDS feed.
- classmethod detail_for_elementtree_entry(parser, entry_tag, data_source, feed_url=None, do_get=None)[source]¶
Turn an <atom:entry> tag into a dictionary of metadata that can be used as keyword arguments to the Metadata constructor.
- Returns:
A 2-tuple (identifier, kwargs)
- classmethod extract_contributor(parser, author_tag)[source]¶
Turn an <atom:author> tag into a ContributorData object.
- extract_feed_data(feed, feed_url=None)[source]¶
Turn an OPDS feed into lists of Metadata and CirculationData objects, with associated messages and next_links.
- classmethod extract_identifier(identifier_tag)[source]¶
Turn a <dcterms:identifier> tag into an IdentifierData object.
- classmethod extract_link(link_tag, feed_url=None, entry_rights_uri=None)[source]¶
Convert a <link> tag into a LinkData object.
- Parameters:
feed_url – The URL to the enclosing feed, for use in resolving relative links.
entry_rights_uri – A URI describing the rights advertised in the entry. Unless this specific link says otherwise, we will assume that the representation on the other end of the link is made available on these terms.
- classmethod extract_medium(entry_tag, default='Book')[source]¶
Derive a value for Edition.medium from schema:additionalType or from a <dcterms:format> subtag.
- Parameters:
entry_tag – A <atom:entry> tag.
default – The value to use if nothing is found.
- classmethod extract_messages(parser, feed_tag)[source]¶
Extract <simplified:message> tags from an OPDS feed and convert them into OPDSMessage objects.
- classmethod extract_metadata_from_elementtree(feed, data_source, feed_url=None, do_get=None)[source]¶
Parse the OPDS as XML and extract all author and subject information, as well as ratings and medium.
All the stuff that Feedparser can’t handle so we have to use lxml.
- Returns:
a dictionary mapping IDs to dictionaries. The inner dictionary can be used as keyword arguments to the Metadata constructor.
- classmethod extract_subject(parser, category_tag)[source]¶
Turn an <atom:category> tag into a SubjectData object.
- classmethod get_medium_from_links(links)[source]¶
Get medium if derivable from information in an acquisition link.
- handle_failure(urn, failure)[source]¶
Convert a URN and a failure message that came in through an OPDS feed into an Identifier and a CoverageFailure object.
The Identifier may not be the one designated by urn (if it’s found in self.identifier_mapping) and the ‘failure’ may turn out not to be a CoverageFailure at all – if it’s an Identifier, that means that what a normal OPDSImporter would consider ‘failure’ is considered success.
- import_edition_from_metadata(metadata)[source]¶
For the passed-in Metadata object, see if we can find or create an Edition in the database. Also create a LicensePool if the Metadata has CirculationData in it.
- classmethod make_link_data(rel, href=None, media_type=None, rights_uri=None, content=None)[source]¶
Hook method for creating a LinkData object.
Intended to be overridden in subclasses.
- classmethod rights_uri(rights_string)[source]¶
Determine the URI that best encapsulates the rights status of the downloads associated with this book.
- classmethod rights_uri_from_entry_tag(entry)[source]¶
Extract a rights string from an lxml <entry> tag.
- Returns:
A rights URI.
- class core.opds_import.OPDSXMLParser[source]¶
Bases:
XMLParser
- NAMESPACES = {'app': 'http://www.w3.org/2007/app', 'atom': 'http://www.w3.org/2005/Atom', 'dc': 'http://purl.org/dc/elements/1.1/', 'dcterms': 'http://purl.org/dc/terms/', 'drm': 'http://librarysimplified.org/terms/drm', 'opds': 'http://opds-spec.org/2010/catalog', 'schema': 'http://schema.org/', 'simplified': 'http://librarysimplified.org/terms/'}¶
- class core.opds_import.SimplifiedOPDSLookup(base_url)[source]¶
Bases:
object
Tiny integration class for the Simplified ‘lookup’ protocol.
- LOOKUP_ENDPOINT = 'lookup'¶
- property lookup_endpoint¶
core.opensearch module¶
- class core.opensearch.OpenSearchDocument[source]¶
Bases:
object
Generates OpenSearch documents.
- TEMPLATE = '<?xml version="1.0" encoding="UTF-8"?>\n <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">\n <ShortName>%(name)s</ShortName>\n <Description>%(description)s</Description>\n <Tags>%(tags)s</Tags>\n <Url type="application/atom+xml;profile=opds-catalog" template="%(url_template)s"/>\n </OpenSearchDescription>'¶
core.overdrive module¶
- class core.overdrive.MockOverdriveAPI(_db, collection, *args, **kwargs)[source]¶
Bases:
OverdriveAPI
- classmethod mock_collection(_db, library=None, name='Test Overdrive Collection', client_key='a', client_secret='b', library_id='c', website_id='d', ils_name='e')[source]¶
Create a mock Overdrive collection for use in tests.
- token_post(url, payload, headers={}, **kwargs)[source]¶
Mock the request for an OAuth token.
We mock the method by looking at the access_token_response property, rather than inserting a mock response in the queue, because only the first MockOverdriveAPI instantiation in a given test actually makes this call. By mocking the response to this method separately we remove the need to figure out whether to queue a response in a given test.
- class core.overdrive.OverdriveAPI(_db, collection)[source]¶
Bases:
object
- ADVANTAGE_LIBRARY_ENDPOINT = '%(host)s/v1/libraries/%(parent_library_id)s/advantageAccounts/%(library_id)s'¶
- ALL_PRODUCTS_ENDPOINT = '%(host)s/v1/collections/%(collection_token)s/products?sort=%(sort)s'¶
- AVAILABILITY_ENDPOINT = '%(host)s/v2/collections/%(collection_token)s/products/%(product_id)s/availability'¶
- CHECKOUTS_ENDPOINT = '%(patron_host)s/v1/patrons/me/checkouts'¶
- CHECKOUT_ENDPOINT = '%(patron_host)s/v1/patrons/me/checkouts/%(overdrive_id)s'¶
- DEFAULT_READABLE_FORMATS = {'audiobook-overdrive', 'ebook-epub-adobe', 'ebook-epub-open', 'ebook-pdf-open'}¶
- EVENTS_ENDPOINT = '%(host)s/v1/collections/%(collection_token)s/products?lastUpdateTime=%(lastupdatetime)s&sort=%(sort)s&limit=%(limit)s'¶
- EVENT_DELAY = datetime.timedelta(seconds=7200)¶
- EVENT_SOURCE = 'Overdrive'¶
- FORMATS = ['ebook-epub-open', 'ebook-epub-adobe', 'ebook-pdf-adobe', 'ebook-pdf-open', 'audiobook-overdrive']¶
- FORMATS_ENDPOINT = '%(patron_host)s/v1/patrons/me/checkouts/%(overdrive_id)s/formats'¶
- HOLDS_ENDPOINT = '%(patron_host)s/v1/patrons/me/holds'¶
- HOLD_ENDPOINT = '%(patron_host)s/v1/patrons/me/holds/%(product_id)s'¶
- HOSTS = {'production': {'host': 'https://api.overdrive.com', 'oauth_host': 'https://oauth.overdrive.com', 'oauth_patron_host': 'https://oauth-patron.overdrive.com', 'patron_host': 'https://patron.api.overdrive.com'}, 'testing': {'host': 'https://integration.api.overdrive.com', 'oauth_host': 'https://oauth.overdrive.com', 'oauth_patron_host': 'https://oauth-patron.overdrive.com', 'patron_host': 'https://integration-patron.api.overdrive.com'}}¶
- ILS_NAME_DEFAULT = 'default'¶
- ILS_NAME_KEY = 'ils_name'¶
- INCOMPATIBLE_PLATFORM_FORMATS = {'ebook-kindle'}¶
- LIBRARY_ENDPOINT = '%(host)s/v1/libraries/%(library_id)s'¶
- MAX_CREDENTIAL_AGE = 3000¶
- METADATA_ENDPOINT = '%(host)s/v1/collections/%(collection_token)s/products/%(item_id)s/metadata'¶
- ME_ENDPOINT = '%(patron_host)s/v1/patrons/me'¶
- OVERDRIVE_READ_FORMAT = 'ebook-overdrive'¶
- PAGE_SIZE_LIMIT = 300¶
- PATRON_INFORMATION_ENDPOINT = '%(patron_host)s/v1/patrons/me'¶
- PATRON_TOKEN_ENDPOINT = '%(oauth_patron_host)s/patrontoken'¶
- PRODUCTION_SERVERS = 'production'¶
- SERVER_NICKNAME = 'server_nickname'¶
- TESTING_SERVERS = 'testing'¶
- TIME_FORMAT = '%Y-%m-%dT%H:%M:%SZ'¶
- TOKEN_ENDPOINT = '%(oauth_host)s/token'¶
- WEBSITE_ID = 'website_id'¶
- property advantage_library_id¶
The library ID for this library, as we should look for it in certain API documents served by Overdrive.
For ordinary collections, and for consortial collections shared among libraries, this will be -1.
For Overdrive Advantage accounts, this will be the numeric value of the Overdrive library ID.
- all_ids()[source]¶
Get IDs for every book in the system, with the most recently added ones at the front.
- property collection¶
- property collection_token¶
Get the token representing this particular Overdrive collection.
As a side effect, this will verify that the Overdrive credentials are working.
- credential_object(refresh)[source]¶
Look up the Credential object that allows us to use the Overdrive API.
- endpoint(url, **kwargs)[source]¶
Create the URL to an Overdrive API endpoint.
- Parameters:
url – A template for the URL.
kwargs – Arguments to be interpolated into the template. The server hostname will be interpolated automatically; you don’t have to pass it in.
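A sketch of interpolating one of the documented endpoint templates; the collection and the Overdrive identifier are placeholders:

    from core.overdrive import OverdriveAPI

    api = OverdriveAPI(_db, collection)   # collection configured for Overdrive
    url = api.endpoint(
        api.METADATA_ENDPOINT,
        collection_token=api.collection_token,
        item_id="example-overdrive-id",    # placeholder identifier
    )
    # The %(host)s portion is filled in automatically with the configured server.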
- get(url, extra_headers, exception_on_401=False)[source]¶
Make an HTTP GET request using the active Bearer Token.
- get_advantage_accounts()[source]¶
Find all the Overdrive Advantage accounts managed by this library.
- Yield:
A sequence of OverdriveAdvantageAccount objects.
- get_library()[source]¶
Get basic information about the collection, including a link to the titles in the collection.
- host = {'host': 'https://integration.api.overdrive.com', 'oauth_host': 'https://oauth.overdrive.com', 'oauth_patron_host': 'https://oauth-patron.overdrive.com', 'patron_host': 'https://integration-patron.api.overdrive.com'}¶
- classmethod ils_name_setting(_db, collection, library)[source]¶
Find the ConfigurationSetting controlling the ILS name for the given collection and library.
- lock = <unlocked _thread.RLock object owner=0 count=0>¶
- log = <Logger Overdrive API (WARNING)>¶
- classmethod make_link_safe(url)[source]¶
Turn a server-provided link into a link the server will accept!
The {} part is completely obnoxious and I have complained about it to Overdrive.
The availability part is to make sure we always use v2 of the availability API, even if Overdrive sent us a link to v1.
- recently_changed_ids(start, cutoff)[source]¶
Get IDs of books whose status has changed between the start time and now.
- property source¶
- property token¶
- property token_authorization_header¶
- class core.overdrive.OverdriveAdvantageAccount(parent_library_id, library_id, name)[source]¶
Bases:
object
Holder and parser for data associated with Overdrive Advantage.
- class core.overdrive.OverdriveBibliographicCoverageProvider(collection, api_class=<class 'core.overdrive.OverdriveAPI'>, **kwargs)[source]¶
Bases:
BibliographicCoverageProvider
Fill in bibliographic metadata for Overdrive records.
This will occasionally fill in some availability information for a single Collection, but we rely on Monitors to keep availability information up to date for all Collections.
- DATA_SOURCE_NAME = 'Overdrive'¶
- INPUT_IDENTIFIER_TYPES = 'Overdrive ID'¶
- PROTOCOL = 'Overdrive'¶
- SERVICE_NAME = 'Overdrive Bibliographic Coverage Provider'¶
- class core.overdrive.OverdriveRepresentationExtractor(api)[source]¶
Bases:
object
Extract useful information from Overdrive’s JSON representations.
- DATE_FORMAT = '%Y-%m-%d'¶
- classmethod availability_link_list(book_list)[source]¶
- Returns:
A list of dictionaries with keys id, title, availability_link.
- book_info_to_circulation(book)[source]¶
Note: The json data passed into this method is from a different file/stream from the json data that goes into the book_info_to_metadata() method.
- classmethod book_info_to_metadata(book, include_bibliographic=True, include_formats=True)[source]¶
Turn Overdrive’s JSON representation of a book into a Metadata object.
Note: The json data passed into this method is from a different file/stream from the json data that goes into the book_info_to_circulation() method.
- format_data_for_overdrive_format = {'audiobook-mp3': ('application/x-od-media', 'Overdrive DRM'), 'audiobook-overdrive': [('application/vnd.overdrive.circulation.api+json;profile=audiobook', 'Libby DRM'), ('Streaming Audio', 'Streaming')], 'ebook-epub-adobe': ('application/epub+zip', 'application/vnd.adobe.adept+xml'), 'ebook-epub-open': ('application/epub+zip', None), 'ebook-kindle': ('Kindle via Amazon', 'Kindle DRM'), 'ebook-overdrive': [('application/vnd.overdrive.circulation.api+json;profile=ebook', 'Libby DRM'), ('Streaming Text', 'Streaming')], 'ebook-pdf-adobe': ('application/pdf', 'application/vnd.adobe.adept+xml'), 'ebook-pdf-open': ('application/pdf', None), 'music-mp3': ('application/x-od-media', 'Overdrive DRM'), 'periodicals-nook': ('Nook via B&N', 'Nook DRM'), 'video-streaming': ('Streaming Video', 'Streaming')}¶
- ignorable_overdrive_formats = {}¶
- classmethod internal_formats(overdrive_format)[source]¶
Yield all internal formats for the given Overdrive format.
Some Overdrive formats become multiple internal formats.
- Yield:
A sequence of (content type, DRM system) 2-tuples
- log = <Logger Overdrive representation extractor (WARNING)>¶
- overdrive_medium_to_simplified_medium = {'Audiobook': 'Audio', 'Music': 'Music', 'Periodicals': 'Periodical', 'Video': 'Video', 'eBook': 'Book'}¶
- overdrive_role_to_simplified_role = {'actor': 'Actor', 'adapter': 'Adapter', 'artist': 'Artist', 'associated name': 'Associated name', 'author': 'Author', 'author of afterword': 'Afterword Author', 'author of foreword': 'Foreword Author', 'author of introduction': 'Introduction Author', 'book producer': 'Producer', 'cast member': 'Actor', 'collaborator': 'Collaborator', 'colophon': 'Colophon Author', 'compiler': 'Compiler', 'composer': 'Composer', 'contributor': 'Contributor', 'copyright holder': 'Copyright holder', 'designer': 'Designer', 'director': 'Director', 'editor': 'Editor', 'engineer': 'Engineer', 'etc.': 'Unknown', 'executive producer': 'Executive Producer', 'illustrator': 'Illustrator', 'lyricist': 'Lyricist', 'musician': 'Musician', 'narrator': 'Narrator', 'other': 'Unknown', 'performer': 'Performer', 'photographer': 'Photographer', 'producer': 'Producer', 'transcriber': 'Transcriber', 'translator': 'Translator'}¶
core.problem_details module¶
core.s3 module¶
- class core.s3.MinIOUploader(integration, client_class=None)[source]¶
Bases:
S3Uploader
- NAME = 'MinIO'¶
- SETTINGS = [{'key': 'username', 'label': l'Access Key', 'description': '', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'password', 'label': l'Secret Key', 'description': l'If the <em>Access Key</em> and <em>Secret Key</em> are not given here credentials will be used as outlined in the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#configuring-credentials">Boto3 documenation</a>. If <em>Access Key</em> is given, <em>Secrent Key</em> must also be given.', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'book_covers_bucket', 'label': l'Book Covers Bucket', 'description': l'All book cover images encountered will be mirrored to this S3 bucket. Large images will be scaled down, and the scaled-down copies will also be uploaded to this bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'open_access_content_bucket', 'label': l'Open Access Content Bucket', 'description': l'All open-access books encountered will be uploaded to this S3 bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'protected_content_bucket', 'label': l'Protected Access Content Bucket', 'description': l'Self-hosted books will be uploaded to this S3 bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'marc_bucket', 'label': l'MARC File Bucket', 'description': l'All generated MARC files will be uploaded to this S3 bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 's3_region', 'label': l'S3 region', 'description': l'S3 region which will be used for storing the content.', 'type': 'select', 'required': False, 'default': 'us-east-1', 'options': [{'key': 'af-south-1', 'label': 'af-south-1'}, {'key': 'ap-east-1', 'label': 'ap-east-1'}, {'key': 'ap-northeast-1', 'label': 'ap-northeast-1'}, {'key': 'ap-northeast-2', 'label': 'ap-northeast-2'}, {'key': 'ap-northeast-3', 'label': 'ap-northeast-3'}, {'key': 'ap-south-1', 'label': 'ap-south-1'}, {'key': 'ap-southeast-1', 'label': 'ap-southeast-1'}, {'key': 'ap-southeast-2', 'label': 'ap-southeast-2'}, {'key': 'ap-southeast-3', 'label': 'ap-southeast-3'}, {'key': 'ca-central-1', 'label': 'ca-central-1'}, {'key': 'eu-central-1', 'label': 'eu-central-1'}, {'key': 'eu-north-1', 'label': 'eu-north-1'}, {'key': 'eu-south-1', 'label': 'eu-south-1'}, {'key': 'eu-west-1', 'label': 'eu-west-1'}, {'key': 'eu-west-2', 'label': 'eu-west-2'}, {'key': 'eu-west-3', 'label': 'eu-west-3'}, {'key': 'me-south-1', 'label': 'me-south-1'}, {'key': 'sa-east-1', 'label': 'sa-east-1'}, {'key': 'us-east-1', 'label': 'us-east-1'}, {'key': 'us-east-2', 'label': 'us-east-2'}, {'key': 'us-west-1', 'label': 'us-west-1'}, {'key': 'us-west-2', 'label': 'us-west-2'}], 'category': None}, {'key': 's3_addressing_style', 'label': l'S3 addressing style', 'description': l'Buckets created after September 30, 2020, will support only virtual hosted-style requests. Path-style requests will continue to be supported for buckets created on or before this date. 
For more information, see <a href="https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/">Amazon S3 Path Deprecation Plan - The Rest of the Story</a>.', 'type': 'select', 'required': False, 'default': 'us-east-1', 'options': [{'key': 'virtual', 'label': l'Virtual'}, {'key': 'path', 'label': l'Path'}, {'key': 'auto', 'label': l'Auto'}], 'category': None}, {'key': 's3_presigned_url_expiration', 'label': l'S3 presigned URL expiration', 'description': l'Time in seconds for the presigned URL to remain valid', 'type': 'number', 'required': False, 'default': 3600, 'options': None, 'category': None}, {'key': 'bucket_name_transform', 'label': l'URL format', 'description': l'A file mirrored to S3 is available at <code>http://{bucket}.s3.{region}.amazonaws.com/{filename}</code>. If you've set up your DNS so that http://[bucket]/ or https://[bucket]/ points to the appropriate S3 bucket, you can configure this S3 integration to shorten the URLs. <p>If you haven't set up your S3 buckets, don't change this from the default -- you'll get URLs that don't work.</p>', 'type': 'select', 'required': False, 'default': 'identity', 'options': [{'key': 'identity', 'label': l'S3 Default: https://{bucket}.s3.{region}.amazonaws.com/{file}'}, {'key': 'https', 'label': l'HTTPS: https://{bucket}/{file}'}, {'key': 'http', 'label': l'HTTP: http://{bucket}/{file}'}], 'category': None}, {'key': 'ENDPOINT_URL', 'label': l'Endpoint URL', 'description': l'MinIO's endpoint URL', 'type': None, 'required': True, 'default': None, 'options': None, 'category': None, 'format': None}]¶
- class core.s3.MinIOUploaderConfiguration(configuration_storage, db)[source]¶
Bases:
ConfigurationGrouping
- ENDPOINT_URL = 'ENDPOINT_URL'¶
- endpoint_url¶
Contains configuration metadata
- class core.s3.MockS3Client(service, region_name, aws_access_key_id, aws_secret_access_key, config=None)[source]¶
Bases:
object
This mock client lets us test the real S3Uploader class with a mocked-up boto3 client.
- class core.s3.MockS3Uploader(fail=False, *args, **kwargs)[source]¶
Bases:
S3Uploader
A dummy uploader for use in tests.
- buckets = {'book_covers_bucket': 'test-cover-bucket', 'marc_bucket': 'test-marc-bucket', 'open_access_content_bucket': 'test-content-bucket', 'protected_content_bucket': 'test-content-bucket'}¶
- mirror_one(representation, **kwargs)[source]¶
Mirror a single representation to the given URL.
- Parameters:
representation (Representation) – Book’s representation
mirror_to (string) – Mirror URL
collection (Optional[core.model.collection.Collection]) – Collection
- class core.s3.S3AddressingStyle(value)[source]¶
Bases:
Enum
Enumeration of different addressing styles supported by boto
- AUTO = 'auto'¶
- PATH = 'path'¶
- VIRTUAL = 'virtual'¶
- class core.s3.S3Uploader(integration, client_class=None, host='amazonaws.com')[source]¶
Bases:
MirrorUploader
- NAME = 'Amazon S3'¶
- S3_HOST = 'amazonaws.com'¶
- SETTINGS = [{'key': 'username', 'label': l'Access Key', 'description': '', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'password', 'label': l'Secret Key', 'description': l'If the <em>Access Key</em> and <em>Secret Key</em> are not given here credentials will be used as outlined in the <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#configuring-credentials">Boto3 documenation</a>. If <em>Access Key</em> is given, <em>Secrent Key</em> must also be given.', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'book_covers_bucket', 'label': l'Book Covers Bucket', 'description': l'All book cover images encountered will be mirrored to this S3 bucket. Large images will be scaled down, and the scaled-down copies will also be uploaded to this bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'open_access_content_bucket', 'label': l'Open Access Content Bucket', 'description': l'All open-access books encountered will be uploaded to this S3 bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'protected_content_bucket', 'label': l'Protected Access Content Bucket', 'description': l'Self-hosted books will be uploaded to this S3 bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 'marc_bucket', 'label': l'MARC File Bucket', 'description': l'All generated MARC files will be uploaded to this S3 bucket. <p>The bucket must already exist—it will not be created automatically.</p>', 'type': None, 'required': False, 'default': None, 'options': None, 'category': None}, {'key': 's3_region', 'label': l'S3 region', 'description': l'S3 region which will be used for storing the content.', 'type': 'select', 'required': False, 'default': 'us-east-1', 'options': [{'key': 'af-south-1', 'label': 'af-south-1'}, {'key': 'ap-east-1', 'label': 'ap-east-1'}, {'key': 'ap-northeast-1', 'label': 'ap-northeast-1'}, {'key': 'ap-northeast-2', 'label': 'ap-northeast-2'}, {'key': 'ap-northeast-3', 'label': 'ap-northeast-3'}, {'key': 'ap-south-1', 'label': 'ap-south-1'}, {'key': 'ap-southeast-1', 'label': 'ap-southeast-1'}, {'key': 'ap-southeast-2', 'label': 'ap-southeast-2'}, {'key': 'ap-southeast-3', 'label': 'ap-southeast-3'}, {'key': 'ca-central-1', 'label': 'ca-central-1'}, {'key': 'eu-central-1', 'label': 'eu-central-1'}, {'key': 'eu-north-1', 'label': 'eu-north-1'}, {'key': 'eu-south-1', 'label': 'eu-south-1'}, {'key': 'eu-west-1', 'label': 'eu-west-1'}, {'key': 'eu-west-2', 'label': 'eu-west-2'}, {'key': 'eu-west-3', 'label': 'eu-west-3'}, {'key': 'me-south-1', 'label': 'me-south-1'}, {'key': 'sa-east-1', 'label': 'sa-east-1'}, {'key': 'us-east-1', 'label': 'us-east-1'}, {'key': 'us-east-2', 'label': 'us-east-2'}, {'key': 'us-west-1', 'label': 'us-west-1'}, {'key': 'us-west-2', 'label': 'us-west-2'}], 'category': None}, {'key': 's3_addressing_style', 'label': l'S3 addressing style', 'description': l'Buckets created after September 30, 2020, will support only virtual hosted-style requests. Path-style requests will continue to be supported for buckets created on or before this date. 
For more information, see <a href="https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/">Amazon S3 Path Deprecation Plan - The Rest of the Story</a>.', 'type': 'select', 'required': False, 'default': 'us-east-1', 'options': [{'key': 'virtual', 'label': l'Virtual'}, {'key': 'path', 'label': l'Path'}, {'key': 'auto', 'label': l'Auto'}], 'category': None}, {'key': 's3_presigned_url_expiration', 'label': l'S3 presigned URL expiration', 'description': l'Time in seconds for the presigned URL to remain valid', 'type': 'number', 'required': False, 'default': 3600, 'options': None, 'category': None}, {'key': 'bucket_name_transform', 'label': l'URL format', 'description': l'A file mirrored to S3 is available at <code>http://{bucket}.s3.{region}.amazonaws.com/{filename}</code>. If you've set up your DNS so that http://[bucket]/ or https://[bucket]/ points to the appropriate S3 bucket, you can configure this S3 integration to shorten the URLs. <p>If you haven't set up your S3 buckets, don't change this from the default -- you'll get URLs that don't work.</p>', 'type': 'select', 'required': False, 'default': 'identity', 'options': [{'key': 'identity', 'label': l'S3 Default: https://{bucket}.s3.{region}.amazonaws.com/{file}'}, {'key': 'https', 'label': l'HTTPS: https://{bucket}/{file}'}, {'key': 'http', 'label': l'HTTP: http://{bucket}/{file}'}], 'category': None}]¶
- SITEWIDE = True¶
- book_url(identifier, extension='.epub', open_access=True, data_source=None, title=None)[source]¶
The path to the hosted EPUB file for the given identifier.
- cover_image_root(bucket, data_source, scaled_size=None)[source]¶
The root URL to the S3 location of cover images for the given data source.
- cover_image_url(data_source, identifier, filename, scaled_size=None)[source]¶
The path to the hosted cover image for the given identifier.
- final_mirror_url(bucket, key)[source]¶
Determine the URL to pass into Representation.set_as_mirrored, assuming that it was successfully uploaded to the given bucket as key.
Depending on ExternalIntegration configuration this may be any of the following:
https://{bucket}.s3.{region}.amazonaws.com/{key}
http://{bucket}/{key}
https://{bucket}/{key}
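A hedged sketch (bucket name and key are placeholders; which of the three forms comes back depends on the integration's bucket_name_transform setting):
    uploader = S3Uploader(integration)   # integration: a configured ExternalIntegration
    url = uploader.final_mirror_url("my-bucket", "Overdrive/1234/cover.jpg")
    # With the default 'identity' transform the result looks like
    #   https://my-bucket.s3.<region>.amazonaws.com/Overdrive/1234/cover.jpg
    # while the 'https' and 'http' transforms shorten it to
    #   https://my-bucket/Overdrive/1234/cover.jpg  or  http://my-bucket/Overdrive/1234/cover.jpg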
- classmethod key_join(key, encode=True)[source]¶
Quote the path portions of an S3 key while leaving the path-separator characters themselves alone.
- Parameters:
key – Either a key, or a list of parts to be assembled into a key.
- Returns:
A string that can be used as an S3 key.
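A minimal sketch, with made-up parts; the intent is that each part is URL-quoted while the '/' separators joining them are left alone:
    key = S3Uploader.key_join(["Overdrive", "Overdrive ID", "cover image.jpg"])
    # Roughly 'Overdrive/Overdrive%20ID/cover%20image.jpg': spaces and other unsafe
    # characters are quoted, the separators are not.
    raw = S3Uploader.key_join(["Overdrive", "1234.epub"], encode=False)
    # encode=False presumably skips the quoting step.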
- marc_file_url(library, lane, end_time, start_time=None)[source]¶
The path to the hosted MARC file for the given library, lane, and date range.
- mirror_one(representation, mirror_to, collection=None)[source]¶
Mirror a single representation to the given URL.
- Parameters:
representation (Representation) – Book’s representation
mirror_to (string) – Mirror URL
collection (Optional[core.model.collection.Collection]) – Collection
- multipart_upload(representation, mirror_to, upload_class=<class 'core.s3.MultipartS3Upload'>)[source]¶
- sign_url(url, expiration=None)[source]¶
Sign a URL and make it expirable.
- Parameters:
url (string) – URL
expiration (int) – (Optional) Time in seconds for the presigned URL to remain valid. If it's empty, the S3_PRESIGNED_URL_EXPIRATION configuration setting is used
- Returns:
Signed expirable link
- Return type:
string
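A brief sketch of producing a time-limited link for a mirrored file (the URL is a placeholder):
    uploader = S3Uploader(integration)
    signed = uploader.sign_url(
        "https://my-bucket.s3.us-east-1.amazonaws.com/Overdrive/1234.epub",
        expiration=600,   # ten minutes; omit to fall back to the configured default (3600s)
    )
    # 'signed' is the same URL with presigned query parameters appended; it stops
    # working once the expiration elapses.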
- class core.s3.S3UploaderConfiguration(configuration_storage, db)[source]¶
Bases:
ConfigurationGrouping
- BOOK_COVERS_BUCKET_KEY = 'book_covers_bucket'¶
- MARC_BUCKET_KEY = 'marc_bucket'¶
- OA_CONTENT_BUCKET_KEY = 'open_access_content_bucket'¶
- PROTECTED_CONTENT_BUCKET_KEY = 'protected_content_bucket'¶
- S3_ADDRESSING_STYLE = 's3_addressing_style'¶
- S3_DEFAULT_ADDRESSING_STYLE = 'virtual'¶
- S3_DEFAULT_PRESIGNED_URL_EXPIRATION = 3600¶
- S3_DEFAULT_REGION = 'us-east-1'¶
- S3_PRESIGNED_URL_EXPIRATION = 's3_presigned_url_expiration'¶
- S3_REGION = 's3_region'¶
- URL_TEMPLATES_BY_TEMPLATE = {'http': 'http://%(bucket)s/%(key)s', 'https': 'https://%(bucket)s/%(key)s', 'identity': 'https://%(bucket)s.s3.%(region)s/%(key)s'}¶
- URL_TEMPLATE_DEFAULT = 'identity'¶
- URL_TEMPLATE_HTTP = 'http'¶
- URL_TEMPLATE_HTTPS = 'https'¶
- URL_TEMPLATE_KEY = 'bucket_name_transform'¶
- access_key¶
Contains configuration metadata
- book_covers_bucket¶
Contains configuration metadata
- marc_file_bucket¶
Contains configuration metadata
- open_access_content_bucket¶
Contains configuration metadata
- protected_access_content_bucket¶
Contains configuration metadata
- s3_addressing_style¶
Contains configuration metadata
- s3_presigned_url_expiration¶
Contains configuration metadata
- s3_region¶
Contains configuration metadata
- secret_key¶
Contains configuration metadata
- url_template¶
Contains configuration metadata
core.scripts module¶
- class core.scripts.AddClassificationScript(_db=None, cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>)[source]¶
Bases:
IdentifierInputScript
- name = 'Add a classification to an identifier'¶
- class core.scripts.CheckContributorNamesInDB(_db=None, cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>)[source]¶
Bases:
IdentifierInputScript
Checks that each contributor's sort_name is the display_name rendered in “last name, comma, other names” format.
Contributors are read edition by edition, so that the database query can, if necessary, be restricted to passed-in identifiers, and so that the associated license pools can be found when registering author complaints.
NOTE: There's also CheckContributorNamesOnWeb in the metadata wrangler; it's a child of this script. Use it to check our knowledge against VIAF, with its newer and better sort_name selection and formatting.
TODO: make sure the script doesn't start over from the beginning when it is interrupted while a batch job is running.
- COMPLAINT_SOURCE = 'CheckContributorNamesInDB'¶
- COMPLAINT_TYPE = 'http://librarysimplified.org/terms/problem/wrong-author'¶
- process_local_mismatch(_db, contribution, computed_sort_name, error_message_detail, log=None)[source]¶
Determines whether a problem should be investigated further or recorded as a Complaint to be resolved by a human. In this class it is always recorded as a complaint; in the overridden method in the metadata wrangler's child class, we sometimes perform a web query instead.
- class core.scripts.CollectionArgumentsScript(_db=None)[source]¶
Bases:
CollectionInputScript
- class core.scripts.CollectionInputScript(_db=None)[source]¶
Bases:
Script
A script that takes collection names as command line inputs.
- class core.scripts.CollectionType(value)[source]¶
Bases:
Enum
An enumeration.
- LCP = 'LCP'¶
- OPEN_ACCESS = 'OPEN_ACCESS'¶
- PROTECTED_ACCESS = 'PROTECTED_ACCESS'¶
- class core.scripts.ConfigurationSettingScript(_db=None)[source]¶
Bases:
Script
- class core.scripts.ConfigureCollectionScript(_db=None)[source]¶
Bases:
ConfigurationSettingScript
Create a collection or change its settings.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = "Change a collection's settings"¶
- class core.scripts.ConfigureIntegrationScript(_db=None)[source]¶
Bases:
ConfigurationSettingScript
Create an integration or change its settings.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = "Create a site-wide integration or change an integration's settings"¶
- class core.scripts.ConfigureLaneScript(_db=None)[source]¶
Bases:
ConfigurationSettingScript
Create a lane or change its settings.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = "Change a lane's settings"¶
- class core.scripts.ConfigureLibraryScript(_db=None)[source]¶
Bases:
ConfigurationSettingScript
Create a library or change its settings.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = "Change a library's settings"¶
- class core.scripts.ConfigureSiteScript(_db=None, config=<class 'core.config.Configuration'>)[source]¶
Bases:
ConfigurationSettingScript
View or update site-wide configuration.
- class core.scripts.CustomListManagementScript(manager_class, data_source_name, list_identifier, list_name, primary_language, description, **manager_kwargs)[source]¶
Bases:
Script
Maintain a CustomList whose membership is determined by a MembershipManager.
- class core.scripts.CustomListSweeperScript(_db=None)[source]¶
Bases:
LibraryInputScript
Do something to each custom list in a library.
- class core.scripts.DatabaseMigrationInitializationScript(*args, **kwargs)[source]¶
Bases:
DatabaseMigrationScript
Creates a timestamp to kick off the regular use of DatabaseMigrationScript to manage migrations.
- class core.scripts.DatabaseMigrationScript(*args, **kwargs)[source]¶
Bases:
Script
Runs new migrations.
This script needs to execute without ever loading an ORM object, because the database might be in a state that’s not compatible with the current ORM version.
This is not a TimestampScript because it keeps separate Timestamps for the Python and the SQL migrations, and because Timestamps are ORM objects, which this script can’t touch.
- DO_NOT_EXECUTE = 'SIMPLYE_MIGRATION_DO_NOT_EXECUTE'¶
- MIGRATION_WITH_COUNTER = re.compile('\\d{8}-(\\d+)-(.)+\\.(py|sql)')¶
- PY_TIMESTAMP_SERVICE_NAME = 'Database Migration - Python'¶
- SERVICE_NAME = 'Database Migration'¶
- TRANSACTIONLESS_COMMANDS = ['alter type']¶
- TRANSACTION_PER_STATEMENT = 'SIMPLYE_MIGRATION_TRANSACTION_PER_STATEMENT'¶
- class TimestampInfo(service, finish, counter=None)[source]¶
Bases:
object
Act like an ORM Timestamp object, but with no database connection.
- property directories_by_priority¶
Returns a list containing the migration directory path for core and its container server, organized in priority order (core first)
- fetch_migration_files()[source]¶
Pulls migration files from the expected locations
- Returns:
a tuple with a list of migration filenames and a dictionary of those files separated by their absolute directory location.
- get_new_migrations(timestamp, migrations)[source]¶
Return a list of migration filenames, representing migrations created since the timestamp
- classmethod migratable_files(filelist, extensions)[source]¶
Filter a list of files for migratable file extensions
- property name¶
Returns the appropriate target Timestamp service name for the timestamp, depending on the script parameters.
- property overall_timestamp¶
Returns a TimestampInfo object corresponding to the overall or general “Database Migration” service.
If there is no Timestamp or the Timestamp doesn’t have a timestamp attribute, it returns None.
- property python_timestamp¶
Returns a TimestampInfo object corresponding to the Python migration-specific “Database Migration - Python” Timestamp.
If there is no Timestamp or the Timestamp hasn’t been initialized with a timestamp attribute, it returns None.
- run_migrations(migrations, migrations_by_dir, timestamp)[source]¶
Run each migration, first by timestamp and then by directory priority.
- class core.scripts.DatabaseVacuum[source]¶
Bases:
Script
Script to vacuum all database tables
- do_run(subcommand='')[source]¶
Run the database vacuum
- Args:
- subcommand (str, optional):
Can be any of the PostgreSQL VACUUM options: FULL [ boolean ], FREEZE [ boolean ], VERBOSE [ boolean ], ANALYZE [ boolean ], DISABLE_PAGE_SKIPPING [ boolean ], SKIP_LOCKED [ boolean ], INDEX_CLEANUP { AUTO | ON | OFF }, PROCESS_TOAST [ boolean ], TRUNCATE [ boolean ].
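A minimal sketch of invoking the vacuum from code, assuming a configured database; whether several options can be combined in one subcommand string is not documented here:
    from core.scripts import DatabaseVacuum

    DatabaseVacuum().do_run()                      # plain VACUUM over every table
    DatabaseVacuum().do_run(subcommand="ANALYZE")  # VACUUM ANALYZE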
- class core.scripts.Explain(_db=None)[source]¶
Bases:
IdentifierInputScript
Explain everything known about a given work.
- METADATA_URL_TEMPLATE = 'http://metadata.librarysimplified.org/lookup?urn=%s'¶
- TIME_FORMAT = '%Y-%m-%d %H:%M'¶
- do_run(cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>, stdout=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = 'Explain everything known about a given work'¶
- class core.scripts.IdentifierInputScript(_db=None)[source]¶
Bases:
InputScript
A script that takes identifiers as command line inputs.
- DATABASE_ID = 'Database ID'¶
- classmethod look_up_identifiers(_db, parsed, stdin_identifier_strings, *args, **kwargs)[source]¶
Turn identifiers as specified on the command line into real database Identifier objects.
- classmethod parse_command_line(_db=None, cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>, *args, **kwargs)[source]¶
- classmethod parse_identifier_list(_db, identifier_type, data_source, arguments, autocreate=False)[source]¶
Turn a list of identifiers into a list of Identifier objects.
The list of arguments is probably derived from a command-line parser such as the one defined in IdentifierInputScript.arg_parser().
This makes it easy to identify specific identifiers on the command line. Examples:
1 2
a b c
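A rough sketch of calling the classmethod directly with arguments like those above; the identifier type is a placeholder and _db is an open database session:
    identifiers = IdentifierInputScript.parse_identifier_list(
        _db,
        "Overdrive ID",        # identifier type as stored on Identifier rows (placeholder)
        None,                  # data source (assumed optional for a plain lookup)
        ["12345", "67890"],    # the command-line arguments, as in the examples above
        autocreate=False,      # don't create Identifier rows that don't exist yet
    )
    # 'identifiers' is a list of core.model.Identifier objects.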
- class core.scripts.LaneSweeperScript(_db=None)[source]¶
Bases:
LibraryInputScript
Do something to each lane in a library.
- class core.scripts.LibraryInputScript(_db=None)[source]¶
Bases:
InputScript
A script that operates on one or more Libraries.
- classmethod look_up_libraries(_db, parsed, *args, **kwargs)[source]¶
Turn library names as specified on the command line into real Library objects.
- class core.scripts.ListCollectionMetadataIdentifiersScript(_db=None, output=None)[source]¶
Bases:
CollectionInputScript
List the metadata identifiers for Collections in the database.
This script is helpful for accounting for and tracking collections on the metadata wrangler.
- class core.scripts.MirrorResourcesScript(_db=None)[source]¶
Bases:
CollectionInputScript
Make sure that all mirrorable resources in a collection have in fact been mirrored.
- MIRROR_UTILITY = <core.metadata_layer.MetaToModelUtility object>¶
- collections_with_uploader(collections, collection_type=CollectionType.OPEN_ACCESS)[source]¶
Filter out collections that have no MirrorUploader.
- Yield:
2-tuples (Collection, ReplacementPolicy). The ReplacementPolicy is the appropriate one for this script to use for that Collection.
- classmethod derive_rights_status(license_pool, resource)[source]¶
Make a best guess about the rights status for the given resource.
This relies on the information having been available at one point, but having been stored in the database at a slight remove.
- process_collection(collection, policy, unmirrored=None)[source]¶
Make sure every mirrorable resource in this collection has been mirrored.
- Parameters:
unmirrored – A replacement for Hyperlink.unmirrored, for use in tests.
- class core.scripts.MockStdin(*lines)[source]¶
Bases:
object
Mock a list of identifiers passed in on standard input.
- class core.scripts.OPDSImportScript(_db=None, importer_class=None, monitor_class=None, protocol=None, *args, **kwargs)[source]¶
Bases:
CollectionInputScript
Import all books from the OPDS feed associated with a collection.
- IMPORTER_CLASS¶
alias of
OPDSImporter
- MONITOR_CLASS¶
alias of
OPDSImportMonitor
- PROTOCOL = 'OPDS Import'¶
- name = 'Import all books from the OPDS feed associated with a collection.'¶
- class core.scripts.PatronInputScript(_db=None)[source]¶
Bases:
LibraryInputScript
A script that operates on one or more Patrons.
- classmethod look_up_patrons(_db, parsed, stdin_patron_strings, *args, **kwargs)[source]¶
Turn patron identifiers as specified on the command line into real Patron objects.
- classmethod parse_command_line(_db=None, cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>, *args, **kwargs)[source]¶
- class core.scripts.RebuildSearchIndexScript(*args, **kwargs)[source]¶
Bases:
RunWorkCoverageProviderScript, RemovesSearchCoverage
Completely delete the search index and recreate it.
- class core.scripts.ReclassifyWorksForUncheckedSubjectsScript(_db=None)[source]¶
Bases:
WorkClassificationScript
Reclassify all Works whose current classifications appear to depend on Subjects in the ‘unchecked’ state.
This generally means that some migration script reset those Subjects because the rules for processing them changed.
- batch_size = 100¶
- name = 'Reclassify works that use unchecked subjects.'¶
- policy = <core.model.PresentationCalculationPolicy object>¶
- class core.scripts.RemovesSearchCoverage[source]¶
Bases:
object
Mix-in class for a script that might remove all coverage records for the search engine.
- class core.scripts.RunCollectionCoverageProviderScript(provider_class, _db=None, providers=None, **kwargs)[source]¶
Bases:
RunCoverageProvidersScript
Run the same CoverageProvider code for all Collections that get their licenses from the appropriate place.
- class core.scripts.RunCollectionMonitorScript(monitor_class, _db=None, cmd_args=None, **kwargs)[source]¶
Bases:
RunMultipleMonitorsScript, CollectionArgumentsScript
Run a CollectionMonitor on every Collection that comes through a certain protocol.
- class core.scripts.RunCoverageProviderScript(provider, _db=None, cmd_args=None, *provider_args, **provider_kwargs)[source]¶
Bases:
IdentifierInputScript
Run a single coverage provider.
- class core.scripts.RunCoverageProvidersScript(providers, _db=None)[source]¶
Bases:
Script
Alternate between multiple coverage providers.
- class core.scripts.RunMultipleMonitorsScript(_db=None, **kwargs)[source]¶
Bases:
Script
Run a number of monitors in sequence.
Currently the Monitors are run one at a time. It should be possible to take a command-line argument that runs all the Monitors in batches, each in its own thread. Unfortunately, it’s tough to know in a given situation that this won’t overload the system.
- class core.scripts.RunReaperMonitorsScript(_db=None, **kwargs)[source]¶
Bases:
RunMultipleMonitorsScript
Run all the monitors found in ReaperMonitor.REGISTRY
- monitors(**kwargs)[source]¶
Find all the Monitors that need to be run.
- Returns:
A list of Monitor objects.
- name = 'Run all reaper monitors'¶
- class core.scripts.RunThreadedCollectionCoverageProviderScript(provider_class, worker_size=None, _db=None, **provider_kwargs)[source]¶
Bases:
Script
Run coverage providers in multiple threads.
- DEFAULT_WORKER_SIZE = 5¶
- class core.scripts.RunWorkCoverageProviderScript(provider_class, _db=None, providers=None, **kwargs)[source]¶
Bases:
RunCollectionCoverageProviderScript
Run a WorkCoverageProvider on every relevant Work in the system.
- class core.scripts.Script(_db=None)[source]¶
Bases:
object
- property data_directory¶
- property log¶
- property script_name¶
Find or guess the name of the script.
This is either the .name of the Script object or the name of the class.
- update_timestamp(timestamp_data, start_time, exception)[source]¶
By default scripts have no timestamp of their own.
Most scripts either work through Monitors or CoverageProviders, which have their own logic for creating timestamps, or they are designed to be run interactively from the command-line, so facts about when they last ran are not relevant.
- Parameters:
start_time – The time the script started running.
exception – A stack trace for the exception, if any, that stopped the script from running.
- class core.scripts.SearchIndexCoverageRemover(*args, **kwargs)[source]¶
Bases:
TimestampScript, RemovesSearchCoverage
Script that removes search index coverage for all works.
This guarantees the SearchIndexCoverageProvider will add fresh coverage for every Work the next time it runs.
- class core.scripts.ShowCollectionsScript(_db=None)[source]¶
Bases:
Script
Show information about the collections on a server.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = 'List the collections on this server.'¶
- class core.scripts.ShowIntegrationsScript(_db=None)[source]¶
Bases:
Script
Show information about the external integrations on a server.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = 'List the external integrations on this server.'¶
- class core.scripts.ShowLanesScript(_db=None)[source]¶
Bases:
Script
Show information about the lanes on a server.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = 'List the lanes on this server.'¶
- class core.scripts.ShowLibrariesScript(_db=None)[source]¶
Bases:
Script
Show information about the libraries on a server.
- do_run(_db=None, cmd_args=None, output=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>)[source]¶
- name = 'List the libraries on this server.'¶
- class core.scripts.SubjectInputScript(_db=None)[source]¶
Bases:
Script
A script whose command line filters the set of Subjects.
- Returns:
a 2-tuple (subject type, subject filter) that can be passed into the SubjectSweepMonitor constructor.
- class core.scripts.TimestampScript(*args, **kwargs)[source]¶
Bases:
Script
A script that automatically records a timestamp whenever it runs.
- update_timestamp(timestamp_data, start, exception)[source]¶
Update the appropriate Timestamp for this script.
- Parameters:
timestamp_data – A TimestampData representing what the script itself thinks its timestamp should look like. Data will be filled in where it is missing, but it will not be modified if present.
start – The time at which this script believes the service started running. The script itself may change this value for its own purposes.
exception – The exception with which this script believes the service stopped running. The script itself may change this value for its own purposes.
- class core.scripts.UpdateCustomListSizeScript(_db=None)[source]¶
Bases:
CustomListSweeperScript
- class core.scripts.UpdateLaneSizeScript(_db=None)[source]¶
Bases:
LaneSweeperScript
- class core.scripts.WhereAreMyBooksScript(_db=None, output=None, search=None)[source]¶
Bases:
CollectionInputScript
Try to figure out why Works aren’t showing up.
This is a common problem on a new installation or when a new collection is being configured.
- class core.scripts.WorkClassificationScript(*args, **kwargs)[source]¶
Bases:
WorkPresentationScript
Recalculate the classification (and nothing else) for Work objects.
- name = 'Recalculate the classification for works that need it.'¶
- policy = <core.model.PresentationCalculationPolicy object>¶
- class core.scripts.WorkConsolidationScript(force=False, batch_size=10, _db=None, cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>)[source]¶
Bases:
WorkProcessingScript
Given an Identifier, make sure all the LicensePools for that Identifier are in Works that follow these rules:
a) For a given permanent work ID, there may be at most one Work containing open-access LicensePools.
b) Each non-open-access LicensePool has its own individual Work.
- name = 'Work consolidation script'¶
- class core.scripts.WorkOPDSScript(*args, **kwargs)[source]¶
Bases:
WorkPresentationScript
Recalculate the OPDS entries, MARC record, and search index entries for Work objects.
This is intended to verify that a problem has already been resolved and just needs to be propagated to these three ‘caches’.
- name = 'Recalculate OPDS entries, MARC record, and search index entries for works that need it.'¶
- policy = <core.model.PresentationCalculationPolicy object>¶
- class core.scripts.WorkPresentationScript(*args, **kwargs)[source]¶
Bases:
TimestampScript, WorkProcessingScript
Calculate the presentation for Work objects.
- name = 'Recalculate the presentation for works that need it.'¶
- policy = <core.model.PresentationCalculationPolicy object>¶
- class core.scripts.WorkProcessingScript(force=False, batch_size=10, _db=None, cmd_args=None, stdin=<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>)[source]¶
Bases:
IdentifierInputScript
- name = 'Work processing script'¶
core.selftest module¶
Define the interfaces used by ExternalIntegration self-tests.
- class core.selftest.HasSelfTests[source]¶
Bases:
object
An object capable of verifying its own setup by running a series of self-tests.
- SELF_TEST_RESULTS_SETTING = 'self_test_results'¶
- external_integration(_db)[source]¶
Locate the ExternalIntegration associated with this object. The status of the self-tests will be stored as a ConfigurationSetting on this ExternalIntegration.
By default, there is no way to get from an object to its ExternalIntegration, and self-test status will not be stored.
- classmethod prior_test_results(_db, constructor_method=None, *args, **kwargs)[source]¶
Retrieve the last set of test results from the database.
The arguments here are the same as the arguments to run_self_tests.
- classmethod run_self_tests(_db, constructor_method=None, *args, **kwargs)[source]¶
Instantiate this class and call _run_self_tests on it.
- Parameters:
_db – A database connection. Will be passed into _run_self_tests. This connection may need to be used again in args, if the constructor needs it.
constructor_method – Method to use to instantiate the class, if different from the default constructor.
args – Positional arguments to pass into the constructor.
kwargs – Keyword arguments to pass into the constructor.
- Returns:
A 2-tuple (results_dict, results_list) results_dict is a JSON-serializable dictionary describing the results of the self-test. results_list is a list of SelfTestResult objects.
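A hedged sketch; MyIntegrationAPI is a hypothetical HasSelfTests subclass whose constructor takes a database session and a collection, both placeholders here:
    results_dict, results = MyIntegrationAPI.run_self_tests(_db, None, _db, collection)
    # results_dict is the JSON-serializable summary stored under the
    # 'self_test_results' setting; results is a list of SelfTestResult objects.
    for result in results:
        print(result.name, result.duration, result.debug_message)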
- run_test(name, method, *args, **kwargs)[source]¶
Run a test method, record any exception that happens, and keep track of how long the test takes to run.
- Parameters:
name – The name of the test to be run.
method – A method to call to run the test.
args – Positional arguments to method.
kwargs – Keyword arguments to method.
- Returns:
A filled-in SelfTestResult.
- class core.selftest.SelfTestResult(name)[source]¶
Bases:
object
The result of running a single self-test.
HasSelfTests.run_self_tests() returns a list of these.
- property debug_message¶
The debug message associated with the Exception, if any.
- property duration¶
How long the test took to run.
- property to_dict¶
Convert this SelfTestResult to a dictionary for use in JSON serialization.
core.testing module¶
- class core.testing.AlwaysSuccessfulBibliographicCoverageProvider(collection, **kwargs)[source]¶
Bases:
MockCoverageProvider, BibliographicCoverageProvider
A BibliographicCoverageProvider that does nothing and is always successful.
Note that this only works if you’ve put a working Edition and LicensePool in place beforehand. Otherwise the process will fail during handle_success().
- SERVICE_NAME = 'Always successful (bibliographic)'¶
- class core.testing.AlwaysSuccessfulCollectionCoverageProvider(collection, **kwargs)[source]¶
Bases:
MockCoverageProvider, CollectionCoverageProvider
A CollectionCoverageProvider that does nothing and always succeeds.
- SERVICE_NAME = 'Always successful (collection)'¶
- class core.testing.AlwaysSuccessfulCoverageProvider(*args, **kwargs)[source]¶
Bases:
InstrumentedCoverageProvider
A CoverageProvider that does nothing and always succeeds.
- SERVICE_NAME = 'Always successful'¶
- class core.testing.AlwaysSuccessfulWorkCoverageProvider(_db, *args, **kwargs)[source]¶
Bases:
InstrumentedWorkCoverageProvider
A WorkCoverageProvider that does nothing and always succeeds.
- SERVICE_NAME = 'Always successful (works)'¶
- class core.testing.BrokenBibliographicCoverageProvider(*args, **kwargs)[source]¶
Bases:
BrokenCoverageProvider, BibliographicCoverageProvider
- SERVICE_NAME = 'Broken (bibliographic)'¶
- class core.testing.BrokenCoverageProvider(*args, **kwargs)[source]¶
Bases:
InstrumentedCoverageProvider
- SERVICE_NAME = 'Broken'¶
- class core.testing.DatabaseTest[source]¶
Bases:
object
- connection = None¶
- engine = None¶
- classmethod make_default_library(_db)[source]¶
Ensure that the default library exists in the given database.
This can be called by code intended for use in testing but not actually within a DatabaseTest subclass.
- classmethod print_database_class(db_connection)[source]¶
Prints to the console the entire contents of the database, as the unit test sees it. Exists because unit tests don't persist db information: they create an in-memory representation of the db state and then roll the test-derived transactions back, so we cannot see what's going on by going into Postgres and running selects. This is the in-test alternative to going into Postgres.
Can be called from model and metadata classes as well as tests.
NOTE: The purpose of this method is for debugging. Be careful of leaving it in code and potentially outputting vast tracts of data into your output stream on production.
Call like this:
set_trace()
from testing import DatabaseTest
_db = Session.object_session(self)
DatabaseTest.print_database_class(_db)
TODO: remove before prod
- print_database_instance()[source]¶
Calls the class method that examines the current state of the database model (whether it’s been committed or not).
NOTE: If you set_trace, and hit “continue”, you'll start seeing console output right away, without waiting for the whole test to run and the standard output section to display. You can also use nosetests --nocapture.
I use:
def test_name(self):
    [code...]
    set_trace()
    self.print_database_instance()  # TODO: remove before prod
    [code...]
- class core.testing.DummyHTTPClient[source]¶
Bases:
object
- class core.testing.EndToEndSearchTest[source]¶
Bases:
ExternalSearchTest
Subclasses of this class set up real works in a real search index and run searches against it.
- class core.testing.ExternalSearchTest[source]¶
Bases:
DatabaseTest
These tests require elasticsearch to be running locally. If it’s not, or there’s an error creating the index, the tests will pass without doing anything.
Tests for elasticsearch are useful for ensuring that we haven’t accidentally broken a type of search by changing analyzers or queries, but search needs to be tested manually to ensure that it works well overall, with a realistic index.
- SIMPLIFIED_TEST_ELASTICSEARCH = 'http://localhost:9200'¶
- default_work(*args, **kwargs)[source]¶
Convenience method to create a work with a license pool in the default collection.
- pytestmark = [Mark(name='elasticsearch', args=(), kwargs={})]¶
- class core.testing.InstrumentedCoverageProvider(*args, **kwargs)[source]¶
Bases:
MockCoverageProvider, IdentifierCoverageProvider
A CoverageProvider that keeps track of every item it tried to cover.
- class core.testing.InstrumentedWorkCoverageProvider(_db, *args, **kwargs)[source]¶
Bases:
MockCoverageProvider, WorkCoverageProvider
A WorkCoverageProvider that keeps track of every item it tried to cover.
- class core.testing.LogCaptureHandler(logger, *args, **kwargs)[source]¶
Bases:
Handler
A logging.Handler context manager that captures the messages of emitted log records in the context of the specified logger.
- LEVEL_NAMES = ['critical', 'error', 'warning', 'info', 'debug', 'notset']¶
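A small sketch of wrapping a logger under test; the logger name is a placeholder, and the exact attributes that hold the captured messages are an assumption based on LEVEL_NAMES:
    import logging

    log = logging.getLogger("my.test.logger")   # hypothetical logger under test
    handler = LogCaptureHandler(log)
    with handler:
        log.warning("something looks off")
    # The emitted messages are captured by the handler; LEVEL_NAMES suggests they are
    # grouped per level (e.g. on a 'warning' attribute), but that layout is assumed here.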
- class core.testing.MockCoverageProvider[source]¶
Bases:
object
Mixin class for mock CoverageProviders that defines common constants.
- DATA_SOURCE_NAME = 'Gutenberg'¶
- INPUT_IDENTIFIER_TYPES = None¶
- PROTOCOL = 'OPDS Import'¶
- SERVICE_NAME = 'Generic mock CoverageProvider'¶
- class core.testing.MockRequestsRequest(url, method='GET', headers=None)[source]¶
Bases:
object
A mock object that simulates an HTTP request from the requests library.
- class core.testing.MockRequestsResponse(status_code, headers={}, content=None, url=None, request=None)[source]¶
Bases:
object
A mock object that simulates an HTTP response from the requests library.
- raise_for_status()[source]¶
Null implementation of raise_for_status, a method implemented by real requests Response objects.
- property text¶
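A short sketch of handing a canned response to code under test (payload and URL are placeholders):
    response = MockRequestsResponse(
        200,
        headers={"Content-Type": "application/json"},
        content='{"status": "ok"}',
        url="http://example.com/api",
    )
    response.raise_for_status()   # no-op, for parity with real Response objects
    body = response.text          # assumed to mirror the content passed in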
- class core.testing.NeverSuccessfulBibliographicCoverageProvider(collection, **kwargs)[source]¶
Bases:
MockCoverageProvider, BibliographicCoverageProvider
Simulates a BibliographicCoverageProvider that’s never successful.
- SERVICE_NAME = 'Never successful (bibliographic)'¶
- class core.testing.NeverSuccessfulCoverageProvider(*args, **kwargs)[source]¶
Bases:
InstrumentedCoverageProvider
A CoverageProvider that does nothing and always fails.
- SERVICE_NAME = 'Never successful'¶
- class core.testing.NeverSuccessfulWorkCoverageProvider(_db, *args, **kwargs)[source]¶
Bases:
InstrumentedWorkCoverageProvider
- SERVICE_NAME = 'Never successful (works)'¶
- class core.testing.SearchClientForTesting(_db, url=None, works_index=None, test_search_term=None, in_testing=False, mapping=None)[source]¶
Bases:
ExternalSearchIndex
When creating an index, limit it to a single shard and disable replicas.
This makes search results more predictable.
- class core.testing.TaskIgnoringCoverageProvider(*args, **kwargs)[source]¶
Bases:
InstrumentedCoverageProvider
A coverage provider that ignores all work given to it.
- SERVICE_NAME = 'I ignore all work.'¶
- class core.testing.TransientFailureCoverageProvider(*args, **kwargs)[source]¶
Bases:
InstrumentedCoverageProvider
- SERVICE_NAME = 'Never successful (transient)'¶
- class core.testing.TransientFailureWorkCoverageProvider(_db, *args, **kwargs)[source]¶
Bases:
InstrumentedWorkCoverageProvider
- SERVICE_NAME = 'Never successful (transient, works)'¶
core.user_profile module¶
- class core.user_profile.MockProfileStorage(read_only_settings=None, writable_settings=None)[source]¶
Bases:
ProfileStorage
A profile storage object for use in tests.
Keeps information in in-memory dictionaries rather than in a database.
- property profile_document¶
Create a Profile document representing the current state of the user’s profile.
- Returns:
A dictionary that can be serialized as JSON.
- update(new_values, profile_document)[source]¶
(Try to) change the user’s profile so it looks like the provided Profile document.
- property writable_setting_names¶
Return the subset of fields that are considered writable.
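A brief sketch of exercising the mock in a test; the setting keys reuse the simplified: names listed under ProfileStorage below, and how they split between read-only and writable is an assumption:
    storage = MockProfileStorage(
        read_only_settings={"simplified:fines": "0.00"},
        writable_settings={"simplified:synchronize_annotations": False},
    )
    doc = storage.profile_document   # a JSON-serializable dict of the profile
    assert "simplified:synchronize_annotations" in storage.writable_setting_names
    storage.update({"simplified:synchronize_annotations": True}, doc)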
- class core.user_profile.ProfileController(storage)[source]¶
Bases:
object
Implement the User Profile Management Protocol.
https://github.com/NYPL-Simplified/Simplified/wiki/User-Profile-Management-Protocol
- LINK_RELATION = 'http://librarysimplified.org/terms/rel/user-profile'¶
- MEDIA_TYPE = 'vnd.librarysimplified/user-profile+json'¶
- class core.user_profile.ProfileStorage[source]¶
Bases:
object
An abstract class defining a specific user’s profile.
Subclasses should get profile information from somewhere specific, e.g. a database row.
An instance of this class is responsible for one specific user’s profile, not the set of all profiles.
- AUTHORIZATION_EXPIRES = 'simplified:authorization_expires'¶
- AUTHORIZATION_IDENTIFIER = 'simplified:authorization_identifier'¶
- FINES = 'simplified:fines'¶
- NS = 'simplified:'¶
- SETTINGS_KEY = 'settings'¶
- SYNCHRONIZE_ANNOTATIONS = 'simplified:synchronize_annotations'¶
- property profile_document¶
Create a Profile document representing the current state of the user’s profile.
- Returns:
A dictionary that can be serialized as JSON.
- update(new_values, profile_document)[source]¶
(Try to) change the user’s profile so it looks like the provided Profile document.
- Parameters:
new_values – A dictionary of settings that the client wants to change.
profile_document – The full Profile document as provided by the client. Should not be necessary, but provided in case it’s useful.
- Raises:
Exception – If there’s a problem making the user’s profile look like the provided Profile document.
- property writable_setting_names¶
Return the subset of settings that are considered writable.
An attempt to modify a setting that’s not in this list will fail before update() is called.
- Returns:
An iterable.