Query & search registries

Find & access data using registries.

Setup

!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln

ln.settings.verbosity = "info"
💡 connected lamindb: testuser1/mydata

We’ll need some toy data:

ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'TZGQ170jpeXkHXunNt1l' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/TZGQ170jpeXkHXunNt1l.jpg'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact '0NwklWkatyxLfpYSvO4t' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/0NwklWkatyxLfpYSvO4t.parquet'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact '7rDGkS54txUxBipiOLJC' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/7rDGkS54txUxBipiOLJC.fastq.gz'
Artifact(uid='7rDGkS54txUxBipiOLJC', description='My fastq', suffix='.fastq.gz', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, updated_at='2024-06-05 14:28:02 UTC')

Look up metadata

For registries that store no more than about 100k records, a lookup object is a convenient way to select a record.

Consider the User registry:

users = ln.User.lookup(field="handle")

With auto-complete, we find a user:

user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at='2024-06-05 14:28:00 UTC')

Note

You can also auto-complete in a dictionary:

users_dict = ln.User.lookup().dict()
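
The idea behind a lookup object can be illustrated without LaminDB. The following is a minimal sketch, not LaminDB's implementation: it assumes a small, hypothetical list of handles and builds a namedtuple with one attribute per record, which is what makes tab completion possible.

```python
from collections import namedtuple

# Hypothetical handles standing in for the records of a small registry.
handles = ["testuser1", "testuser2"]

# One attribute per record; editors can auto-complete these attribute names.
Lookup = namedtuple("Lookup", handles)
users_lookup = Lookup(*[{"handle": h} for h in handles])

print(users_lookup.testuser1)  # {'handle': 'testuser1'}

# The same records as a dictionary keyed by handle.
users_as_dict = users_lookup._asdict()
print(users_as_dict["testuser2"])  # {'handle': 'testuser2'}
```

Attribute names must be valid Python identifiers, which is one reason this approach suits small registries with simple names.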

Filter by metadata

Filter for all artifacts created by a user:

ln.Artifact.filter(created_by=user).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 TZGQ170jpeXkHXunNt1l None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.507514+00:00
2 0NwklWkatyxLfpYSvO4t None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.753244+00:00
3 7rDGkS54txUxBipiOLJC None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.761562+00:00

To access the results encoded in a filter statement, execute its return value with one of:

  • .df(): A pandas DataFrame with each record stored as a row.

  • .all(): An indexable Django QuerySet.

  • .list(): A list of records.

  • .one(): Exactly one record. Raises an error if there is none or more than one.

  • .one_or_none(): Either one record or None if there is no query result.
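
The difference between .one() and .one_or_none() can be sketched in plain Python. The functions below are hypothetical stand-ins that operate on a list. They are not LaminDB's code, and ValueError is a placeholder for whatever exception the library actually raises.

```python
def one(records):
    """Return the single record; raise if there are zero or several."""
    if len(records) != 1:
        raise ValueError(f"expected exactly 1 record, got {len(records)}")
    return records[0]


def one_or_none(records):
    """Return the single record, or None if the result is empty."""
    if not records:
        return None
    return one(records)


print(one(["My image"]))  # My image
print(one_or_none([]))    # None
```

Use the stricter variant when an empty result would indicate a bug, and the lenient one when "no match" is an expected outcome.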

Note

filter() returns a QuerySet.

The registries in LaminDB are Django models, so any Django query works. LaminDB extends Django’s API for data scientists.

Under the hood, any .filter() call translates into a SQL select statement.

.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
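
To make the translation into SQL concrete, here is a sketch using Python's built-in sqlite3 module. The table layout is a simplified assumption, not LaminDB's actual schema, and the fourth row is hypothetical. The query shows roughly what a filter on created_by asks of the database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, assumed table layout (not LaminDB's real schema).
con.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, description TEXT, created_by_id INTEGER)"
)
con.executemany(
    "INSERT INTO artifact (description, created_by_id) VALUES (?, ?)",
    [
        ("My image", 1),
        ("The iris collection", 1),
        ("My fastq", 1),
        ("Someone else's file", 2),  # hypothetical record by another user
    ],
)

# Roughly the SELECT behind a filter such as created_by=user (user id 1).
rows = con.execute(
    "SELECT id, description FROM artifact WHERE created_by_id = ? ORDER BY id",
    (1,),
).fetchall()
print(rows)  # [(1, 'My image'), (2, 'The iris collection'), (3, 'My fastq')]
```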

Search for metadata

ln.Artifact.search("iris").df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 0NwklWkatyxLfpYSvO4t None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.753244+00:00

Let us create 500 notebook objects with fake titles and save them:

ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)

We can now search for any combination of terms:

ln.Transform.search("intestine").df().head()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
8 qzl9yrgfMmgjUQhj None Magnocellular Neurosecretory Cells Intermediat... None None notebook None None None None 1 2024-06-05 14:28:07.757826+00:00
21 RFoGl2O58eel0slw None Study study intestine IgG3 Ovaries. None None notebook None None None None 1 2024-06-05 14:28:07.759901+00:00
25 ie2EcifpmLycfIJy None Spleen intestine IgG result classify Intermedi... None None notebook None None None None 1 2024-06-05 14:28:07.760542+00:00
27 xF2t1fXIQeBA9afH None Igg Magnocellular neurosecretory cells candida... None None notebook None None None None 1 2024-06-05 14:28:07.760864+00:00
29 ySSSGJmTROz3Md5t None Intestine Uterus IgY result IgG1 IgG3. None None notebook None None None None 1 2024-06-05 14:28:07.761177+00:00

Leverage relations

Django has a double-underscore syntax to filter based on related tables.

This syntax enables you to traverse several layers of relations:

ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id

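The filter selects artifacts by the user who created the run that generated them. The result is empty here because the toy artifacts were saved without a linked run (see the ln.track() warnings in the setup output).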

(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
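
The join can be sketched with sqlite3. Table and column names below are simplified assumptions (the user table is called app_user here), and all rows are hypothetical. Note that an artifact without a run drops out of the inner join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, assumed tables; all rows are hypothetical.
con.executescript(
    """
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE run (id INTEGER PRIMARY KEY, created_by_id INTEGER);
    CREATE TABLE artifact (id INTEGER PRIMARY KEY, description TEXT, run_id INTEGER);
    INSERT INTO app_user VALUES (1, 'testuser1'), (2, 'someone');
    INSERT INTO run VALUES (1, 1), (2, 2);
    INSERT INTO artifact VALUES
        (1, 'from a testuser1 run', 1),
        (2, 'from another run', 2),
        (3, 'no run linked', NULL);
    """
)

# Roughly what run__created_by__handle__startswith="testuse" expresses.
rows = con.execute(
    """
    SELECT artifact.id, artifact.description
    FROM artifact
    JOIN run ON artifact.run_id = run.id
    JOIN app_user ON run.created_by_id = app_user.id
    WHERE app_user.handle LIKE 'testuse%'
    ORDER BY artifact.id
    """
).fetchall()
print(rows)  # [(1, 'from a testuser1 run')]
```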

Beyond __startswith, Django supports about two dozen field comparators, written as field__comparator=value.

Here are some of them.
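
Before the individual examples, the naming convention can be illustrated with a small parser. This is a hypothetical helper, not Django's implementation: it splits a keyword into the relation path and the comparator, and it knows only a small subset of comparators. When no comparator is given, Django defaults to an exact match.

```python
# Small subset of comparators, for illustration only.
COMPARATORS = {"startswith", "contains", "icontains", "in", "lt", "gt"}


def split_lookup(key):
    """Split 'a__b__comparator' into (['a', 'b'], 'comparator')."""
    parts = key.split("__")
    if parts[-1] in COMPARATORS:
        return parts[:-1], parts[-1]
    return parts, "exact"  # no comparator given: exact match


print(split_lookup("run__created_by__handle__startswith"))
# (['run', 'created_by', 'handle'], 'startswith')
print(split_lookup("suffix"))
# (['suffix'], 'exact')
```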

and

ln.Artifact.filter(suffix=".jpg", created_by=user).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 TZGQ170jpeXkHXunNt1l None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.507514+00:00

less than / greater than

Or subset to artifacts smaller than 10 kB with the __lt (less than) comparator, passed as a keyword argument; __gt works the same way for greater than.

ln.Artifact.filter(created_by=user, size__lt=1e4).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
2 0NwklWkatyxLfpYSvO4t None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.753244+00:00
3 7rDGkS54txUxBipiOLJC None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.761562+00:00

or

from django.db.models import Q

ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 TZGQ170jpeXkHXunNt1l None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.507514+00:00
3 7rDGkS54txUxBipiOLJC None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.761562+00:00

in

ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 TZGQ170jpeXkHXunNt1l None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.507514+00:00
3 7rDGkS54txUxBipiOLJC None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.761562+00:00
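
The or and in filters above return the same rows. A sketch with sqlite3, using a simplified, assumed artifact table, shows the two corresponding SQL conditions side by side.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified, assumed table holding only the suffix column.
con.execute("CREATE TABLE artifact (id INTEGER PRIMARY KEY, suffix TEXT)")
con.executemany(
    "INSERT INTO artifact (suffix) VALUES (?)",
    [(".jpg",), (".parquet",), (".fastq.gz",)],
)

# Two equality conditions combined with OR ...
with_or = con.execute(
    "SELECT id FROM artifact "
    "WHERE suffix = '.jpg' OR suffix = '.fastq.gz' ORDER BY id"
).fetchall()

# ... and the same selection written with IN.
with_in = con.execute(
    "SELECT id FROM artifact "
    "WHERE suffix IN ('.jpg', '.fastq.gz') ORDER BY id"
).fetchall()

print(with_or)             # [(1,), (3,)]
print(with_or == with_in)  # True
```

For several values of one field, the in form is shorter; Q objects become necessary once the alternatives involve different fields.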

order by

ln.Artifact.filter().order_by("-updated_at").df()
uid version description key suffix accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
3 7rDGkS54txUxBipiOLJC None My fastq None .fastq.gz None 20 hi7ZmAzz8sfMd3vIQr-57Q md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.761562+00:00
2 0NwklWkatyxLfpYSvO4t None The iris collection None .parquet DataFrame 5629 ah24lV9Ncc8nPL0MumEsdw md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.753244+00:00
1 TZGQ170jpeXkHXunNt1l None My image None .jpg None 29358 r4tnqmKI_SjrkdLzpuWp4g md5 None None 1 True 1 None None 1 2024-06-05 14:28:02.507514+00:00

contains

ln.Transform.filter(name__contains="search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
2 2IO07Gr0aq27NjLn None Igg1 research IgY IgG4 Ovaries IgG IgG. None None notebook None None None None 1 2024-06-05 14:28:07.756872+00:00
7 aTeLoHFRc5YfVOAR None Research IgY IgY Intermediate skeletal muscle ... None None notebook None None None None 1 2024-06-05 14:28:07.757666+00:00
22 gP53YYnGuGugBPlN None Research IgG3 Eardrum visualize IgG4 investigate. None None notebook None None None None 1 2024-06-05 14:28:07.760056+00:00
24 yssZTFObUztpUoRg None Igy IgG2 IgY Eosinophil granulocyte Intermedia... None None notebook None None None None 1 2024-06-05 14:28:07.760371+00:00
31 m2IT49uor6KQ9x4l None Igg4 study research IgG candidate Eardrum Huxl... None None notebook None None None None 1 2024-06-05 14:28:07.761488+00:00
35 GKo3JmhRM8KgnjqO None Intestinal IgG IgM research. None None notebook None None None None 1 2024-06-05 14:28:07.762108+00:00
37 W4lm5B7WAwxgTGqM None Igg result Spleen research IgG3 IgG3. None None notebook None None None None 1 2024-06-05 14:28:07.762419+00:00
50 xlUfad82sY5Mt0A2 None Eosinophil Granulocyte IgG Eosinophil granuloc... None None notebook None None None None 1 2024-06-05 14:28:07.764443+00:00
54 Ly1MoYqxID74Orev None Igg3 Border cells of organ of Corti investigat... None None notebook None None None None 1 2024-06-05 14:28:07.765087+00:00
55 G9CkoeKBvSmIF2uq None Uterus IgG1 IgG4 Ovaries research cluster Uter... None None notebook None None None None 1 2024-06-05 14:28:07.765240+00:00

And case-insensitive:

ln.Transform.filter(name__icontains="Search").df().head(10)
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
2 2IO07Gr0aq27NjLn None Igg1 research IgY IgG4 Ovaries IgG IgG. None None notebook None None None None 1 2024-06-05 14:28:07.756872+00:00
7 aTeLoHFRc5YfVOAR None Research IgY IgY Intermediate skeletal muscle ... None None notebook None None None None 1 2024-06-05 14:28:07.757666+00:00
22 gP53YYnGuGugBPlN None Research IgG3 Eardrum visualize IgG4 investigate. None None notebook None None None None 1 2024-06-05 14:28:07.760056+00:00
24 yssZTFObUztpUoRg None Igy IgG2 IgY Eosinophil granulocyte Intermedia... None None notebook None None None None 1 2024-06-05 14:28:07.760371+00:00
31 m2IT49uor6KQ9x4l None Igg4 study research IgG candidate Eardrum Huxl... None None notebook None None None None 1 2024-06-05 14:28:07.761488+00:00
35 GKo3JmhRM8KgnjqO None Intestinal IgG IgM research. None None notebook None None None None 1 2024-06-05 14:28:07.762108+00:00
37 W4lm5B7WAwxgTGqM None Igg result Spleen research IgG3 IgG3. None None notebook None None None None 1 2024-06-05 14:28:07.762419+00:00
50 xlUfad82sY5Mt0A2 None Eosinophil Granulocyte IgG Eosinophil granuloc... None None notebook None None None None 1 2024-06-05 14:28:07.764443+00:00
54 Ly1MoYqxID74Orev None Igg3 Border cells of organ of Corti investigat... None None notebook None None None None 1 2024-06-05 14:28:07.765087+00:00
55 G9CkoeKBvSmIF2uq None Uterus IgG1 IgG4 Ovaries research cluster Uter... None None notebook None None None None 1 2024-06-05 14:28:07.765240+00:00
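
The difference between the two comparators can be sketched in plain Python with a few of the titles shown above. This illustrates the intended semantics only, not how the database evaluates them. Django's documentation notes that on SQLite, contains behaves like icontains, so the distinction may not be visible with a default local instance.

```python
names = [
    "Igg1 research IgY IgG4 Ovaries IgG IgG.",
    "Research IgG3 Eardrum visualize IgG4 investigate.",
    "Intestinal IgG IgM research.",
    "Study study intestine IgG3 Ovaries.",
]

# Case-sensitive substring match: "search" also matches inside "Research".
contains = [n for n in names if "search" in n]

# Case-sensitive match on "Search" finds nothing in these titles.
contains_capital = [n for n in names if "Search" in n]

# Case-insensitive match: compare lowercased strings.
icontains = [n for n in names if "Search".lower() in n.lower()]

print(len(contains))          # 3
print(len(contains_capital))  # 0
print(len(icontains))         # 3
```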

startswith

ln.Transform.filter(name__startswith="Research").df()
uid version name key description type reference reference_type latest_report_id source_code_id created_by_id updated_at
id
7 aTeLoHFRc5YfVOAR None Research IgY IgY Intermediate skeletal muscle ... None None notebook None None None None 1 2024-06-05 14:28:07.757666+00:00
22 gP53YYnGuGugBPlN None Research IgG3 Eardrum visualize IgG4 investigate. None None notebook None None None None 1 2024-06-05 14:28:07.760056+00:00
67 Uyj0wBLvGQB2Ca52 None Research IgG3 Eosinophil granulocyte IgG Natur... None None notebook None None None None 1 2024-06-05 14:28:07.767085+00:00
153 MV3y64z9RoJWAl7s None Research IgY IgG1 Uterus. None None notebook None None None None 1 2024-06-05 14:28:07.786865+00:00
204 4u8GAoZdORRJGULW None Research Astrocytes Astrocytes IgY. None None notebook None None None None 1 2024-06-05 14:28:07.794636+00:00
222 7TDj7IQPYUyURwoC None Research IgG3 Lymph node Liver IgY. None None notebook None None None None 1 2024-06-05 14:28:07.797349+00:00
255 msNk0NMGoTpidqez None Research classify IgG1 IgG1 Natural killer T c... None None notebook None None None None 1 2024-06-05 14:28:07.805008+00:00
310 Z468gP8ye3WZb4Jh None Research Uterus Liver Uterus IgG research IgG4. None None notebook None None None None 1 2024-06-05 14:28:07.816080+00:00
324 WTssze8FTr4fPeLe None Research IgG1 Huxley's layer IgG. None None notebook None None None None 1 2024-06-05 14:28:07.818192+00:00
380 1ao0DUkCecjo9d2L None Research IgY Huxley's layer Magnocellular neur... None None notebook None None None None 1 2024-06-05 14:28:07.826599+00:00
406 Ps9oX8BwMiRwiVIZ None Research IgG IgG Border cells of organ of Cort... None None notebook None None None None 1 2024-06-05 14:28:07.833312+00:00
408 W8NsDI1m8hv4brKE None Research IgG4 Border cells of organ of Corti I... None None notebook None None None None 1 2024-06-05 14:28:07.833608+00:00
485 S8zDuuPaXRzHeHUd None Research IgE Liver efficiency Lymph node resea... None None notebook None None None None 1 2024-06-05 14:28:07.847991+00:00
# clean up test instance
# remove the stored files first: the instance can only be deleted once its storage is empty
!rm -r mydata
!lamin delete --force mydata