Data structures for mapping legacy JobCalculation data to new process attributes.
aiida.backends.general.migrations.calc_state.StateMapping
Bases: tuple
__getnewargs__
Return self as a plain tuple. Used by copy and pickle.
__module__
__new__
Create new instance of StateMapping(state, process_state, exit_status, process_status)
__repr__
Return a nicely formatted representation string
__slots__
_asdict
Return a new dict which maps field names to their values.
_field_defaults
_fields
_fields_defaults
_make
Make a new StateMapping object from a sequence or iterable
_replace
Return a new StateMapping object replacing specified fields with new values
exit_status
Alias for field number 2
process_state
Alias for field number 1
process_status
Alias for field number 3
state
Alias for field number 0
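StateMapping behaves like a standard named tuple, so the attributes listed above can be exercised with a stand-in built from collections.namedtuple using the same four field names. The state values below are hypothetical and are not taken from the real legacy mapping table.

```python
from collections import namedtuple

# Stand-in built with the field names listed above; positions 0-3 match the
# "Alias for field number N" entries.
StateMapping = namedtuple(
    'StateMapping', ['state', 'process_state', 'exit_status', 'process_status']
)

# Hypothetical values for one legacy state, for illustration only.
finished = StateMapping('FINISHED', 'finished', 0, None)

# Fields can be read by name or by position.
print(finished.exit_status, finished[2])

# Tuples are immutable: _replace returns a new instance with some fields swapped.
failed = finished._replace(state='FAILED', exit_status=1)
print(failed._asdict())
```

Because instances are immutable, a mapping table built from them can be shared safely; _replace is the idiomatic way to derive a variant.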
SQL statements to detect invalid or unrecognized links for the provenance redesign migration.
Various utilities for use in migrations and migration tests, where the AiiDA ORM cannot be used.
aiida.backends.general.migrations.utils.LazyFile
Bases: aiida.repository.common.File
Subclass of File where key also allows LazyOpener in addition to a string.
This subclass is necessary because the migration stores instances of LazyOpener as the key, which should normally only be a string. This subclass relaxes the key type check to allow this.
__init__
Construct a new instance.
name – The final element of the file path
file_type – Identifies whether the File is a file or a directory
key – A key to map the file to its contents in the backend repository (file only)
objects – Mapping of child names to child Files (directory only)
ValueError – If a key is defined for a directory, or objects are defined for a file
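The constructor rules above (a key only for files, objects only for directories, and a key that may also be a LazyOpener) can be sketched with hypothetical stand-ins. FileType, LazyOpener and LazyFileSketch here are illustrative classes written for this example, not the aiida implementations, and the error messages are invented.

```python
import enum
from typing import Dict, Optional, Union


class FileType(enum.Enum):
    DIRECTORY = 0
    FILE = 1


class LazyOpener:
    """Hypothetical stand-in: remembers a file path and opens it only on demand."""

    def __init__(self, filepath: str):
        self.filepath = filepath

    def open(self, mode: str = 'rb'):
        return open(self.filepath, mode)


class LazyFileSketch:
    """Reproduces the documented constructor checks; not the aiida implementation."""

    def __init__(
        self,
        name: str = '',
        file_type: FileType = FileType.DIRECTORY,
        key: Union[str, LazyOpener, None] = None,
        objects: Optional[Dict[str, 'LazyFileSketch']] = None,
    ):
        # The relaxed check: a key may be a LazyOpener as well as a plain string.
        if key is not None and not isinstance(key, (str, LazyOpener)):
            raise TypeError('key must be a string or a LazyOpener')
        # A directory has no content of its own, so it cannot have a key.
        if file_type == FileType.DIRECTORY and key is not None:
            raise ValueError('a directory cannot define a key')
        # A file has no children, so it cannot have objects.
        if file_type == FileType.FILE and objects is not None:
            raise ValueError('a file cannot define objects')
        self.name = name
        self.file_type = file_type
        self.key = key
        self.objects = objects or {}


# Constructing the opener does not touch the file system.
entry = LazyFileSketch('data.txt', FileType.FILE, key=LazyOpener('/legacy/repo/data.txt'))
```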
MigrationRepository
Bases: aiida.repository.repository.Repository
Subclass of Repository that uses LazyFile instead of File as its file class.
_file_cls
alias of LazyFile
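Overriding a single class attribute such as _file_cls works when the base class always instantiates entries through that attribute instead of naming the class directly. The minimal File, LazyFile, Repository and MigrationRepository classes below are simplified stand-ins written to show this hook pattern; they are not the aiida classes.

```python
class File:
    """Minimal stand-in for a repository file entry."""

    def __init__(self, name=''):
        self.name = name


class LazyFile(File):
    """Stand-in subclass; the real one relaxes the key type check."""


class Repository:
    # The class to instantiate for every entry, looked up on the instance so that
    # subclasses can override it without touching any method.
    _file_cls = File

    def __init__(self):
        self.entries = {}

    def add(self, name):
        entry = self._file_cls(name)
        self.entries[name] = entry
        return entry


class MigrationRepository(Repository):
    # The only difference from the base class.
    _file_cls = LazyFile


print(type(Repository().add('a.txt')).__name__)           # File
print(type(MigrationRepository().add('a.txt')).__name__)  # LazyFile
```

The design keeps every method of the base class reusable: the subclass changes which objects get built, not how the repository builds its metadata.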
NoopRepositoryBackend
Bases: aiida.repository.backend.abstract.AbstractRepositoryBackend
Implementation of the AbstractRepositoryBackend where all write operations are no-ops.
This backend makes it possible to use the Repository interface to build repository metadata without actually writing the content of the current repository to disk elsewhere; instead, it simply creates a lazy file opener. In a subsequent step, all these streams are passed to the new Disk Object Store, which writes their content directly to pack files for optimal efficiency.
__abstractmethods__
_abc_impl
_put_object_from_filelike
Store the byte contents of a file in the repository.
handle – filelike object with the byte content to be stored.
the generated fully qualified identifier for the object within the repository.
TypeError – if the handle is not a byte stream.
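A no-op "write" can satisfy the _put_object_from_filelike contract by returning a key that merely points back at the source file. The sketch below assumes the handle is a byte stream backed by a named file on disk; LazyOpener and NoopBackendSketch are hypothetical stand-ins, and the isinstance check on io.BufferedIOBase is one plausible way to implement the documented TypeError.

```python
import io
import os
import tempfile


class LazyOpener:
    """Hypothetical stand-in: remembers a file path and opens it only when asked."""

    def __init__(self, filepath):
        self.filepath = filepath

    def open(self, mode='rb'):
        return open(self.filepath, mode)


class NoopBackendSketch:
    """A 'write' copies no bytes; it only returns a lazy opener for the source file."""

    def put_object_from_filelike(self, handle):
        # Mirror the documented TypeError for anything that is not a byte stream.
        if not isinstance(handle, io.BufferedIOBase):
            raise TypeError(f'handle does not seem to be a byte stream: {type(handle)}')
        # Nothing is read here. This assumes the handle is backed by a named file on disk.
        return LazyOpener(handle.name)


# Usage: a temporary file plays the role of a file in the legacy repository.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'legacy content')

backend = NoopBackendSketch()
with open(tmp.name, 'rb') as handle:
    key = backend.put_object_from_filelike(handle)

# The bytes are only read later, e.g. when streaming them into a new object store.
with key.open() as stream:
    content = stream.read()
os.remove(tmp.name)
```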
erase
Delete the repository itself and all its contents.
Note
This should not merely delete the contents of the repository but any resources it created. For example, if the repository is essentially a folder on disk, the folder itself should also be deleted, not just its contents.
has_object
Return whether the repository has an object with the given key.
key – fully qualified identifier for the object within the repository.
True if the object exists, False otherwise.
initialise
Initialise the repository if it hasn’t already been initialised.
kwargs – parameters for the initialisation.
is_initialised
Return whether the repository has been initialised.
uuid
Return the unique identifier of the repository.
This backend has no concept of a unique identifier and so always returns None.
None
apply_new_uuid_mapping
Take a mapping of pks to UUIDs and apply it to the given table.
table – database table with uuid column, e.g. ‘db_dbnode’
mapping – dictionary of UUIDs mapped onto a pk
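Applying a pk-to-UUID mapping amounts to one parameterized UPDATE per entry. The sketch below uses an in-memory SQLite database in place of the PostgreSQL database a real AiiDA profile uses; the ALLOWED_TABLES whitelist is a hypothetical guard, needed because a table name cannot be passed as an SQL parameter.

```python
import sqlite3

# Hypothetical whitelist: SQL parameters cannot stand in for identifiers, so the
# table name is validated before it is interpolated into the statement.
ALLOWED_TABLES = {'db_dbnode', 'db_dbcomputer', 'db_dbgroup'}


def apply_new_uuid_mapping(connection, table, mapping):
    """Set the uuid column of row `pk` to `mapping[pk]` for every entry in the mapping."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f'invalid table: {table}')
    with connection:  # commits on success, rolls back if an exception escapes
        connection.executemany(
            f'UPDATE {table} SET uuid = ? WHERE id = ?',
            [(new_uuid, pk) for pk, new_uuid in mapping.items()],
        )


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE db_dbnode (id INTEGER PRIMARY KEY, uuid TEXT)')
connection.executemany('INSERT INTO db_dbnode VALUES (?, ?)', [(1, 'aaa'), (2, 'aaa')])
apply_new_uuid_mapping(connection, 'db_dbnode', {2: 'bbb'})
rows = connection.execute('SELECT id, uuid FROM db_dbnode ORDER BY id').fetchall()
print(rows)  # [(1, 'aaa'), (2, 'bbb')]
```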
deduplicate_uuids
Detect and solve entities with duplicate UUIDs in a given database table.
Before aiida-core v1.0.0, there was no uniqueness constraint on the UUID column of the node table, nor on those of a few other tables. This made it possible to store multiple entities with identical UUIDs in the same table without the database complaining. The bug was fixed in aiida-core v1.0.0 by adding an explicit uniqueness constraint on UUIDs at the database level. However, databases created before this patch could still contain duplicate UUIDs and would be left in an inconsistent state. This command analyses the given table to detect duplicate UUIDs and resolves them by generating new UUIDs. Note that it does not delete or merge any rows.
list of strings denoting the performed operations
ValueError – if the specified table is invalid
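The resolution step can be sketched in plain Python on rows given as (pk, uuid) pairs. The policy that the row with the lowest pk keeps its UUID is an assumption made for this example, and the message format is invented. The function only plans the new mapping, so no row is deleted or merged.

```python
import uuid
from collections import defaultdict


def plan_uuid_deduplication(rows):
    """Return ({pk: new_uuid}, messages) for rows given as (pk, uuid) pairs.

    Assumption: the row with the lowest pk keeps its UUID and every other row that
    shares it gets a freshly generated one. No row is deleted or merged.
    """
    pks_by_uuid = defaultdict(list)
    for pk, value in rows:
        pks_by_uuid[value].append(pk)

    mapping, messages = {}, []
    for value, pks in pks_by_uuid.items():
        for pk in sorted(pks)[1:]:  # empty slice when the UUID is already unique
            new_uuid = str(uuid.uuid4())
            mapping[pk] = new_uuid
            messages.append(f'updated UUID of row {pk} from {value} to {new_uuid}')
    return mapping, messages


mapping, messages = plan_uuid_deduplication([(1, 'a'), (2, 'a'), (3, 'b'), (4, 'a')])
print(sorted(mapping))  # [2, 4]
```

The resulting mapping has the pk-to-UUID shape that apply_new_uuid_mapping expects.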
delete_numpy_array_from_repository
Delete the numpy array with a given name from the repository corresponding to a node with a given uuid.
uuid – the UUID of the node
name – the name of the numpy array
dumps_json
Transform all datetime objects into ISO format strings and return the resulting JSON string.
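One way to get this behaviour is the default= hook of json.dumps, which is called only for objects the encoder cannot serialize natively. This is a swapped-in technique for illustration; the documented function may instead convert the value up front.

```python
import datetime
import json


def dumps_json(value):
    """Serialize value to a JSON string, writing datetimes as ISO 8601 strings."""

    def default(obj):
        # json.dumps only calls this for objects it cannot serialize natively.
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

    return json.dumps(value, default=default)


print(dumps_json({'ctime': datetime.datetime(2020, 1, 2, 3, 4, 5)}))
# {"ctime": "2020-01-02T03:04:05"}
```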
ensure_repository_folder_created
Ensure that the repository sub folder for the node with the given UUID exists, creating it if necessary.
uuid – UUID of the node
get_duplicate_uuids
Retrieve rows with duplicate UUIDs.
list of tuples of (id, uuid) of rows with duplicate UUIDs
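The lookup can be expressed as a single query: select every row whose UUID appears in the set of UUIDs that occur more than once. The sketch runs against an in-memory SQLite database as a stand-in for the real backend; the exact SQL used by aiida may differ (for example, it could use a window function instead of a subquery).

```python
import sqlite3


def get_duplicate_uuids(connection, table):
    """Return (id, uuid) for every row whose uuid occurs more than once in table."""
    # The table name is interpolated into the SQL, so it must come from a trusted source.
    query = (
        f'SELECT id, uuid FROM {table} '
        f'WHERE uuid IN (SELECT uuid FROM {table} GROUP BY uuid HAVING COUNT(*) > 1) '
        'ORDER BY uuid, id'
    )
    return connection.execute(query).fetchall()


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE db_dbnode (id INTEGER PRIMARY KEY, uuid TEXT)')
connection.executemany('INSERT INTO db_dbnode VALUES (?, ?)', [(1, 'a'), (2, 'b'), (3, 'a')])
print(get_duplicate_uuids(connection, 'db_dbnode'))  # [(1, 'a'), (3, 'a')]
```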
get_node_repository_dirpaths
Return a mapping of node UUIDs onto the path to their current repository folder in the old repository.
basepath – the absolute path of the base folder of the old file repository.
shard – optional shard to define which first shard level to check. If None, all shard levels are checked.
a dictionary mapping node UUIDs onto absolute file paths, and a list of node repository folders that are missing one of the two known sub folders, path or raw_input, which is unexpected.
DatabaseMigrationError – if the repository contains node folders that contain both the path and raw_input subdirectories, which should never happen.
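The scan can be sketched as a walk over the three shard levels of the legacy layout. Several assumptions are made here: basepath is the folder that contains repository/node (matching the example path shown for get_node_repository_sub_folder), node folders are sharded as uuid[:2]/uuid[2:4]/uuid[4:], and a folder counts as "missing" when it has neither sub folder. MigrationErrorSketch is a hypothetical stand-in for DatabaseMigrationError.

```python
import os


class MigrationErrorSketch(Exception):
    """Hypothetical stand-in for the DatabaseMigrationError named above."""


def get_node_repository_dirpaths(basepath, shard=None):
    """Return ({uuid: sub folder path}, [uuids of folders with neither sub folder]).

    Assumes the legacy layout <basepath>/repository/node/<uuid[:2]>/<uuid[2:4]>/<uuid[4:]>.
    """
    node_root = os.path.join(basepath, 'repository', 'node')
    mapping, missing = {}, []
    level_1_names = [shard] if shard is not None else sorted(os.listdir(node_root))
    for level_1 in level_1_names:
        for level_2 in sorted(os.listdir(os.path.join(node_root, level_1))):
            level_2_path = os.path.join(node_root, level_1, level_2)
            for rest in sorted(os.listdir(level_2_path)):
                node_dir = os.path.join(level_2_path, rest)
                uuid = level_1 + level_2 + rest
                path = os.path.join(node_dir, 'path')
                raw_input = os.path.join(node_dir, 'raw_input')
                if os.path.isdir(path) and os.path.isdir(raw_input):
                    raise MigrationErrorSketch(f'node {uuid} has both `path` and `raw_input`')
                if os.path.isdir(path):
                    mapping[uuid] = path
                elif os.path.isdir(raw_input):
                    mapping[uuid] = raw_input
                else:
                    missing.append(uuid)
    return mapping, missing
```

Passing a shard restricts the scan to one first-level folder, which keeps memory use bounded on very large repositories.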
get_node_repository_sub_folder
Return the absolute path to the sub folder path within the repository of the node with the given UUID.
absolute path to node repository folder, e.g. /some/path/repository/node/12/ab/c123134-a123/path
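The example path suggests that the legacy repository shards each node folder as uuid[:2]/uuid[2:4]/uuid[4:]. The sketch below assumes that layout and also covers ensure_repository_folder_created. The explicit repository_path parameter is hypothetical: it stands in for the repository location that the real functions determine themselves.

```python
import os


def get_node_repository_sub_folder(repository_path, uuid, subfolder='path'):
    """Return <repository_path>/repository/node/<uuid[:2]>/<uuid[2:4]>/<uuid[4:]>/<subfolder>."""
    return os.path.join(
        repository_path, 'repository', 'node', uuid[:2], uuid[2:4], uuid[4:], subfolder
    )


def ensure_repository_folder_created(repository_path, uuid):
    """Create the node's sub folder, including missing parents; no-op if it already exists."""
    dirpath = get_node_repository_sub_folder(repository_path, uuid)
    os.makedirs(dirpath, exist_ok=True)
    return dirpath


print(get_node_repository_sub_folder('/some/path', '12abc123134-a123'))
# /some/path/repository/node/12/ab/c123134-a123/path  (on POSIX systems)
```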
get_numpy_array_absolute_path
Return the absolute path of a numpy array with the given name in the repository of the node with the given uuid.
the absolute path of the numpy array file
get_object_from_repository
Return the content of a file with the given name in the repository sub folder of the given node.
name – name to use for the file
get_repository_object
Return the content of an object stored in the disk object store repository for the given hashkey.
load_numpy_array_from_repository
Load and return a numpy array from the repository folder of a node.
uuid – the node UUID
name – the name under which the array is stored
the numpy array
migrate_legacy_repository
Migrate the legacy file repository to the new disk object store and return mapping of repository metadata.
Warning
This method assumes that the new disk object store container has already been initialized.
The return value is a dictionary whose keys are the UUIDs of the nodes whose repository folder contents have been migrated to the disk object store. The values are the repository metadata, which contain the keys of the generated files with which their content can be retrieved from the disk object store. The repository metadata has exactly the same format as that generated by the ORM during normal operation.
This implementation deliberately uses the Repository interface so that the logic that builds the nested repository metadata from the contents of a folder on disk does not have to be rewritten. The advantage is that this guarantees that exactly the same repository metadata is generated as during normal operation. However, if the Repository interface or its implementation ever changes, this solution may have to be adapted, and significant parts of the implementation may have to be copied here.
mapping of node UUIDs onto the new repository metadata.
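The shape of the return value can be illustrated with a hand-rolled walk. This deliberately differs from the documented approach, which reuses the Repository interface rather than reimplementing the walk. The sketch assumes a plain dict in place of the disk object store container, SHA-256 as the content hash, and single-letter 'o' (objects) and 'k' (key) fields for the compact metadata format. Unlike the real method, it reads every file eagerly into memory instead of streaming it.

```python
import hashlib
import os


def build_metadata(dirpath, container):
    """Hash every file below dirpath into container and return nested metadata."""
    objects = {}
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if os.path.isdir(path):
            objects[name] = build_metadata(path, container)
        else:
            with open(path, 'rb') as handle:
                content = handle.read()
            key = hashlib.sha256(content).hexdigest()
            container[key] = content  # content-addressed: identical files share one entry
            objects[name] = {'k': key}
    return {'o': objects} if objects else {}


def migrate_legacy_repository(node_dirpaths, container):
    """Return {uuid: metadata} for every node whose legacy folder has any content."""
    mapping = {}
    for uuid, dirpath in node_dirpaths.items():
        metadata = build_metadata(dirpath, container)
        if metadata:  # nodes with an empty folder are left out of the mapping
            mapping[uuid] = metadata
    return mapping
```

Content addressing means two nodes holding the same file end up sharing a single stored object, which is one of the reasons for moving to an object store.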
put_object_from_string
Write a file with the given content in the repository sub folder of the given node.
content – the content to write to the file
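Writing and reading a named text file in a node's sub folder is a short round trip. In this sketch, the dirpath argument is a hypothetical stand-in for the sub folder that the real functions derive from the node UUID, and UTF-8 encoding is an assumption.

```python
import os


def put_object_from_string(dirpath, name, content):
    """Write content as a UTF-8 text file called name inside the node's sub folder."""
    os.makedirs(dirpath, exist_ok=True)
    with open(os.path.join(dirpath, name), 'w', encoding='utf-8') as handle:
        handle.write(content)


def get_object_from_repository(dirpath, name):
    """Return the text content of the file called name inside the node's sub folder."""
    with open(os.path.join(dirpath, name), encoding='utf-8') as handle:
        return handle.read()
```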
recursive_datetime_to_isoformat
Convert all datetime objects in the given value to string representations in ISO format.
value – a mapping, sequence or single value optionally containing datetime objects
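A recursive conversion handles arbitrarily nested containers. This sketch treats dicts as the mapping type and lists and tuples as the sequence types; a named tuple would come back as a plain tuple.

```python
import datetime


def recursive_datetime_to_isoformat(value):
    """Return a copy of value with every datetime replaced by its ISO 8601 string."""
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: recursive_datetime_to_isoformat(val) for key, val in value.items()}
    if isinstance(value, list):
        return [recursive_datetime_to_isoformat(val) for val in value]
    if isinstance(value, tuple):
        return tuple(recursive_datetime_to_isoformat(val) for val in value)
    return value  # any other scalar is returned unchanged


print(recursive_datetime_to_isoformat({'a': [datetime.datetime(2021, 1, 1)], 'b': 1}))
# {'a': ['2021-01-01T00:00:00'], 'b': 1}
```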
serialize_repository
Serialize the metadata into a JSON-serializable format.
The serialization format is optimized to reduce the size in bytes.
dictionary with the content metadata.
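A size-optimized serialization typically uses short keys and omits fields that carry no information. The sketch assumes single-letter 'k' (key) and 'o' (objects) fields, and that an empty directory serializes to an empty dict. Entry is a hypothetical stand-in for a repository file or directory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a repository file or directory."""

    key: Optional[str] = None  # files only
    objects: Dict[str, 'Entry'] = field(default_factory=dict)  # directories only

    def serialize(self):
        # Single-letter keys, and fields that are unset or empty are omitted entirely.
        if self.key is not None:
            return {'k': self.key}
        if self.objects:
            return {'o': {name: child.serialize() for name, child in self.objects.items()}}
        return {}


tree = Entry(objects={'input.txt': Entry(key='abc123'), 'empty': Entry()})
print(tree.serialize())  # {'o': {'input.txt': {'k': 'abc123'}, 'empty': {}}}
```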
store_numpy_array_in_repository
Store a numpy array in the repository folder of a node.
array – the numpy array to store
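The store, load and delete helpers share one path convention, so they can be sketched together with numpy.save and numpy.load. The sketch assumes one <name>.npy file per array. The dirpath argument is a hypothetical stand-in for the node sub folder that the real functions derive from the UUID.

```python
import os

import numpy as np


def get_numpy_array_absolute_path(dirpath, name):
    # Assumed naming convention: one "<name>.npy" file per array.
    return os.path.join(dirpath, f'{name}.npy')


def store_numpy_array_in_repository(dirpath, name, array):
    os.makedirs(dirpath, exist_ok=True)
    np.save(get_numpy_array_absolute_path(dirpath, name), array)


def load_numpy_array_from_repository(dirpath, name):
    # allow_pickle=False refuses object arrays, which could execute code when loaded.
    return np.load(get_numpy_array_absolute_path(dirpath, name), allow_pickle=False)


def delete_numpy_array_from_repository(dirpath, name):
    os.remove(get_numpy_array_absolute_path(dirpath, name))
```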
verify_uuid_uniqueness
Check whether the database table contains rows with duplicate UUIDs.
table – Database table with uuid column, e.g. ‘db_dbnode’
IntegrityError – if the table contains rows with duplicate UUIDs.
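Uniqueness can be checked without listing the offending rows by comparing the number of UUID values with the number of distinct ones. The sketch uses SQLite and raises its sqlite3.IntegrityError as a stand-in for the database-specific error the real function raises. Counting COUNT(uuid) rather than COUNT(*) keeps NULL values from being reported as duplicates.

```python
import sqlite3


def verify_uuid_uniqueness(connection, table):
    """Raise sqlite3.IntegrityError if any non-NULL uuid occurs more than once in table."""
    # COUNT(uuid) and COUNT(DISTINCT uuid) both skip NULLs, so only real duplicates differ.
    # The table name is interpolated, so it must come from a trusted source.
    total, distinct = connection.execute(
        f'SELECT COUNT(uuid), COUNT(DISTINCT uuid) FROM {table}'
    ).fetchone()
    if total != distinct:
        raise sqlite3.IntegrityError(f'table {table} contains rows with duplicate UUIDs')
```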