omero2pandas

A convenience package to download data from OMERO.tables into Pandas dataframes.

Installation

omero2pandas can be installed with pip on Python 3.9+:

pip install omero2pandas

omero2pandas also supports authentication using tokens generated by omero-user-token. Compatible versions can be installed as follows:

pip install omero2pandas[token]

See the omero-user-token documentation for more information.

Usage

import omero2pandas
df = omero2pandas.read_table(file_id=402)
df.head()

Tables can be referenced based on their OriginalFile's ID or their Annotation's ID. These can be easily obtained by hovering over the relevant table in OMERO.web, which shows a tooltip with these IDs.

To avoid loading data directly into a dataframe, you can also download directly into a CSV:

import omero2pandas
omero2pandas.download_table("/path/to/output.csv", file_id=2, chunk_size=1000)

chunk_size can be specified when both reading and downloading tables. It determines how many rows are loaded from the server in a single operation.

Supplying credentials

Multiple modes of connecting to OMERO are supported. If you're already familiar with omero-py, you can supply a premade client:

import omero
import omero2pandas
my_client = omero.client(host="myserver", port=4064)
df = omero2pandas.read_table(file_id=402, omero_connector=my_client)
df.head()

Alternatively, your connection and login details can be provided via arguments:

import omero2pandas
df = omero2pandas.read_table(file_id=402, server="omero.mysite.com", port=4064,
                             username="myuser", password="mypass")
df.head()

If you have omero_user_token installed, an existing token will be automatically detected and used to connect:

import omero2pandas
df = omero2pandas.read_table(file_id=402)
df.head()

You can also generate the connection object separately using the built-in wrapper:

import omero2pandas
connector = omero2pandas.connect_to_omero(server="myserver", port=4064)
# User will be prompted for any missing connection info. 

df = omero2pandas.read_table(file_id=402, omero_connector=connector)
df.head()

When prompting for missing connection information, the package automatically detects whether it is running in a Jupyter environment. If so, you'll get a login widget to complete the details. Otherwise a command-line prompt will be provided.

This behaviour can be disabled by supplying interactive=False to the connect call.
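
How omero2pandas performs this detection isn't described here. A common approach, sketched below as an assumption rather than the package's actual implementation, is to check which IPython shell class is active. The function name running_in_jupyter is hypothetical.

```python
def running_in_jupyter() -> bool:
    """Best-effort check for a Jupyter kernel (hypothetical helper)."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython isn't installed, so this can't be a Jupyter kernel.
        return False
    shell = get_ipython()
    # Jupyter kernels run a ZMQInteractiveShell. A terminal IPython session
    # uses a different shell class, and get_ipython() returns None when
    # IPython isn't running at all.
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


mode = "login widget" if running_in_jupyter() else "command-line prompt"
print(f"Missing details would be requested via a {mode}")
```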

Reading data

Several utility methods are provided for working with OMERO.tables. These all support the full range of connection modes.

Fetch the names of the columns in a table:

import omero2pandas
columns = omero2pandas.get_table_columns(annotation_id=142)
# Returns a list of column names

Fetch the dimensions of a table:

import omero2pandas
num_rows, num_cols = omero2pandas.get_table_size(annotation_id=12)
# Returns a tuple containing row and column count.

You can read out specific rows and/or columns:

import omero2pandas
my_dataframe = omero2pandas.read_table(file_id=10, 
                                       column_names=['object', 'intensity'],
                                       rows=range(0, 100, 10))
my_dataframe.head()
# Returns object and intensity columns for every 10th row among the first 100 rows

Returned dataframes also come with a pandas index column, representing the original row numbers from the OMERO.table.

Non-OMERO.tables Tables

Sometimes users attach a CSV file as a FileAnnotation rather than uploading it as an OMERO.tables object. omero2pandas can still try to read these using dedicated methods:

import omero2pandas
my_dataframe = omero2pandas.read_csv(file_id=101, 
                                     column_names=['object', 'intensity'])
my_dataframe.head()
# Returns dataframe with selected columns

Note that this interface supports fewer features than using full OMERO.tables, so queries and row selection are unavailable. However, it is also possible to load gzip-compressed CSV files (.csv.gz) directly with these methods.

You can also directly download the OriginalFile as follows:

import omero2pandas
omero2pandas.download_csv("/path/to/output.csv", file_id=201)

In both these cases the chunk_size parameter controls the number of bytes loaded in each server call rather than the row count. Take care when specifying this parameter as using small values (e.g. 10) will make the download very slow.
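
To see why a tiny byte-based chunk_size is so costly, the arithmetic can be sketched as below. This is only an illustration: the helper server_calls_needed and the 50 MiB file size are made up for the example, and it assumes each server call returns at most chunk_size bytes.

```python
def server_calls_needed(file_size_bytes: int, chunk_size: int) -> int:
    """Number of fixed-size reads needed to fetch a whole file (illustrative)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division: a final partial chunk still costs one call.
    return (file_size_bytes + chunk_size - 1) // chunk_size


file_size = 50 * 1024 * 1024  # a hypothetical 50 MiB CSV file
for chunk_size in (10, 1024 * 1024):
    calls = server_calls_needed(file_size, chunk_size)
    print(f"chunk_size={chunk_size}: {calls} server calls")
```

Under these assumptions a chunk_size of 10 needs over five million round trips, whereas 1 MiB chunks need 50.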

By default the downloader will only accept csv/csv.gz files, but it can technically be used with most OriginalFile objects. Supply the check_type=False argument to bypass that restriction.

N.b. OMERO.tables cannot be downloaded with this method. Use omero2pandas.download_table instead.

Writing data

Pandas dataframes can also be written back as new OMERO.tables. N.b. It is currently not possible to modify a table on the server.

Connection handling works just as it does with downloading: you can provide credentials, a token or a connection object.

To upload data, the user needs to specify which OMERO object(s) the table will be associated with. This can be achieved with the parent_id and parent_type arguments. Supported objects are Dataset, Well, Plate, Project, Screen and Image.

import pandas
import omero2pandas
my_data = pandas.read_csv("/path/to/my_data.csv")
ann_id = omero2pandas.upload_table(my_data, "Name for table", 
                                   parent_id=142, parent_type="Image")
# Returns the annotation ID of the uploaded FileAnnotation object

Once uploaded, the table will be accessible on OMERO.web under the file annotations panel of the parent object. Using unique table names is advised.

OMERO ID columns

OMERO.tables support some special column types which associate tabular data with objects on the server. These are defined as integer columns with the following names: project, dataset, image, screen, plate, well and roi. These names are case-insensitive. For example, a row with an Image column with the value 1033 will be associated with Image 1033.

To display this in omero-web the table itself should be linked to either the object itself or a parent container. For example, if you have an image column referencing several images in a dataset, attach the table itself to the parent dataset and the relevant row data will be visible when viewing the individual images in omero-web.
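
As a quick way to check which columns of a table would match these special names before uploading, a minimal sketch follows. The helper find_id_columns is hypothetical and not part of omero2pandas. It only checks names case-insensitively and does not verify that the columns hold integers.

```python
# Special column names listed above; matching is case-insensitive.
OMERO_ID_COLUMNS = {"project", "dataset", "image", "screen", "plate", "well", "roi"}


def find_id_columns(column_names):
    """Return the column names that match a special OMERO ID name."""
    return [name for name in column_names if name.lower() in OMERO_ID_COLUMNS]


columns = ["Image", "ROI", "area", "intensity", "image_name"]
print(find_id_columns(columns))  # ['Image', 'ROI']
```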

Linking to multiple objects

To link to multiple objects, you can supply a list of (<type>, <id>) tuples to the links parameter. The resulting table's FileAnnotation will be linked to all objects in the links parameter (plus parent_type:parent_id if provided).

import omero2pandas
ann_id = omero2pandas.upload_table(
    "/path/to/my.csv", "My table", 
    links=[("Image", 101), ("Dataset", 2), ("Roi", 1923)])
# Uploads with Annotation links to Image 101, Dataset 2 and ROI 1923 

Links allow OMERO.web to display the resulting table as an annotation associated with those objects.
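
The way links and the parent arguments combine can be pictured with the sketch below. It is an illustration only, not the package's code: collect_links is a hypothetical helper, the default parent type of "Image" is an assumption, and dropping duplicate targets is a choice made for this example.

```python
def collect_links(links=None, parent_id=None, parent_type="Image"):
    """Merge the links list with an optional parent into (type, id) tuples."""
    combined = list(links or [])
    if parent_id is not None:
        combined.append((parent_type, parent_id))
    # Drop repeated targets while keeping the original order.
    seen = set()
    unique = []
    for link in combined:
        if link not in seen:
            seen.add(link)
            unique.append(link)
    return unique


targets = collect_links(links=[("Image", 101), ("Dataset", 2)], parent_id=142)
print(targets)  # [('Image', 101), ('Dataset', 2), ('Image', 142)]
```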

Large Tables

The first argument to upload_table can be a pandas dataframe or a path to a .csv file containing the table data. In the latter case the table will be read in chunks corresponding to the chunk_size argument. This will allow you to upload tables which are too large to load into system memory.

import omero2pandas
ann_id = omero2pandas.upload_table("/path/to/my.csv", "My table", 
                                   142, chunk_size=100)
# Reads and uploads the file to Image 142, loading 100 lines at a time 

The chunk_size argument sets how many rows to send with each call to the server. If not specified, omero2pandas will attempt to automatically optimise chunk size to send ~2 million table cells per call (up to a max of 50,000 rows per message for narrow tables).
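
The automatic sizing rule can be approximated as follows. This is a sketch built only from the two figures quoted above (about 2 million cells per call, at most 50,000 rows), and the real calculation inside omero2pandas may differ. The function name estimate_chunk_size is hypothetical.

```python
TARGET_CELLS_PER_CALL = 2_000_000  # ~2 million cells, per the text above
MAX_ROWS_PER_CALL = 50_000         # cap for narrow tables, per the text above


def estimate_chunk_size(num_columns: int) -> int:
    """Rows per server call for a table with the given column count."""
    if num_columns < 1:
        raise ValueError("a table needs at least one column")
    rows = TARGET_CELLS_PER_CALL // num_columns
    # Never exceed the row cap, and always send at least one row.
    return max(1, min(rows, MAX_ROWS_PER_CALL))


for n_cols in (4, 40, 400):
    print(f"{n_cols} columns -> {estimate_chunk_size(n_cols)} rows per call")
```

With these numbers, tables of 40 columns or fewer hit the 50,000-row cap, while a 400-column table would be sent 5,000 rows at a time.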

Advanced Usage

This package also contains utility functions for managing an OMERO connection.

omero2pandas.connect_to_omero() takes many of the arguments from the other functions and returns an OMEROConnection object.

The OMEROConnection handles your OMERO login and session, cleaning everything up automatically on exit. This has some accessory methods to access useful API calls:

import omero2pandas
connector = omero2pandas.OMEROConnection()
connector.connect()
client = connector.get_client()
blitz = connector.get_gateway()

When a client is active within the OMEROConnection object, calls to this wrapper class will also be forwarded directly to the client object.

OMEROConnection objects can also be used as a context manager:

import omero2pandas
with omero2pandas.connect_to_omero(server='my.server', port=4064, 
                                   username='test.user',) as connector:
    blitz = connector.get_gateway()
    image = blitz.getObject('Image', 100)
    # Continue using the standard OMERO API.

The context manager will handle session creation and cleanup automatically.

Connection Management

omero2pandas keeps track of any active connector objects and shuts them down safely when Python exits. Deleting all references to a connector will also handle closing the connection to OMERO gracefully. You can also call connector.shutdown() to close a connection manually.

By default omero2pandas also keeps active connections alive by pinging the server once per minute (otherwise the session may time out and require reconnecting). This can be disabled as follows:

omero2pandas.connect_to_omero(keep_alive=False)

Querying tables

You can also supply PyTables condition syntax to the read_table and download_table functions. Returned tables will only include rows which pass this filter.

Basic syntax

Select rows representing objects with area greater than 20:

omero2pandas.read_table(file_id=10, query='(area>20)')

Multiple conditions

Select rows representing objects with an even ID number lower than 50:

omero2pandas.read_table(file_id=10, query='(id%2==0) & (id<50)')

Complex conditions

Select rows representing objects which originated from an ROI named 'Nucleus':

import omero.rtypes
omero2pandas.read_table(file_id=10, query='x=="Nucleus"', variables={'x': omero.rtypes.rstring('Roi Name')})

N.b. Column names containing spaces aren't supported by the native syntax, but can be supplied as variables which are provided by the variables parameter.

The variables map needs to be a dictionary mapping string variables to OMERO rtypes objects rather than raw Python objects. These should match the relevant column type. Mapped variables are substituted into the query during processing.

A variables map usually isn't needed for simple queries. The basic condition string should automatically get converted to a meaningful type, but when this fails, replacing tricky elements with a variable may help.
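
To make the filter semantics concrete, here is the plain-Python equivalent of the "Multiple conditions" query, applied to made-up rows. It does not use PyTables or contact a server. It only illustrates which rows a condition such as '(id%2==0) & (id<50)' is expected to keep.

```python
# Made-up table rows standing in for an OMERO.table with id and area columns.
rows = [{"id": i, "area": i * 1.5} for i in range(100)]

# Plain-Python equivalent of query='(id%2==0) & (id<50)':
# both conditions must hold for a row to be returned.
selected = [row for row in rows if row["id"] % 2 == 0 and row["id"] < 50]

print(len(selected))                        # 25
print([row["id"] for row in selected[:5]])  # [0, 2, 4, 6, 8]
```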

Remote registration [Experimental - OMERO Plus Only]

For OMERO Plus installations which support TileDB as the OMERO.tables backend it is possible to register tables in-place in a similar manner to in-place image imports (otherwise table data is stored in the OMERO Plus server's binary repository).

This is a two-step process:

  1. Convert the dataframe into a TileDB file
  2. Register the remote converted table with OMERO Plus

If you don't know what table backend your OMERO Plus server is using, you probably don't have this feature available. If you have access to the server machine you can check by running omero config get omero.tables.module. If the response is omero_plus.run_tables_pytables_or_tiledb then TileDB is available.

For this mode to be available, extra dependencies must also be installed as follows:

pip install omero2pandas[remote]

To use remote registration supply the local_path argument to omero2pandas.upload_table as follows:

import omero2pandas
db_path = omero2pandas.upload_table("/path/to/my_data.csv", "Name for table", 
                                    local_path="/path/to/mytable.tiledb")
# Returns the path to the created tiledb file

This will convert the table into a TileDB file written to local_path, then attempt to register this table to OMERO Plus in-place. For this to work the local_path needs to be readable via the server machine as well (e.g. a network drive).

If shared storage is mounted differently from the server's point of view, you can also supply the remote_path parameter to declare where OMERO Plus should find the resulting TileDB file.

For example, if registering from a Windows machine with a network drive to an OMERO Plus server on Linux:

omero2pandas.upload_table(
    df, "My Custom Table", links=[("Image", 101)],
    local_path="J:\\data\\tables\\my_omero_table.tiledb",
    remote_path="/network_data/tables/my_omero_table.tiledb"
)

Effectively, local_path is where the current machine should write the data, and remote_path is where that file will be from the OMERO Plus server's point of view. If no remote_path is given, both machines are assumed to see the file at local_path.
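
If the same share is mounted at a known location on both machines, remote_path can be derived from local_path rather than typed by hand. The sketch below is one way to do that with the standard library. The helper to_remote_path is hypothetical, and the mapping of J:\data to /network_data is an assumption based on the example above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_remote_path(local_path: str, local_root: str, remote_root: str) -> str:
    """Translate a Windows path under local_root into the server's POSIX view."""
    # relative_to raises ValueError if local_path is not under local_root.
    relative = PureWindowsPath(local_path).relative_to(PureWindowsPath(local_root))
    return str(PurePosixPath(remote_root).joinpath(*relative.parts))


local_path = "J:\\data\\tables\\my_omero_table.tiledb"
remote_path = to_remote_path(local_path, "J:\\data", "/network_data")
print(remote_path)  # /network_data/tables/my_omero_table.tiledb
```

The pure path classes never touch the filesystem, so this runs the same way on Windows and Linux.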

Note that when a table is registered remotely it is not stored within the binary repository used to store OMERO Plus data. This means that it becomes the user's responsibility to update the table object on the OMERO Plus server if the file is moved/deleted.

Running remote registration steps individually

If your system lacks shared storage you may want to split the TileDB creation and registration steps so that data can be manually copied to the server. It is possible to run steps from the upload_table workflow individually.

The remote registration API requires the exchange of a "SecretToken" metadata key which should be present in the TileDB array metadata. This check verifies that the user does have access to the table they want to register (as they know the token value) and that the file seen by the server is indeed the one the user asked to register.

import omero2pandas
from omero2pandas.remote import create_tiledb, register_table

# df is an existing pandas dataframe
# Create the tiledb and retrieve the SecretToken
secret_token = create_tiledb(df, "/path/to/my.tiledb")

# At this point you could copy the tiledb file to the server

# Register the table remotely
# login_params is a dict of connection arguments (e.g. server, port, username)
omero_connector = omero2pandas.connect_to_omero(**login_params)
ann_id = register_table(omero_connector, "/server/path/to/my.tiledb",
                        table_name="My table", links=[("Image", 101)],
                        token=secret_token)

Note that the register_table function requires an omero2pandas.connect.OMEROConnection object. You can generate one from an existing client using omero2pandas.connect_to_omero(client=client).

While it is possible to manually create and register tables without a SecretToken, this is strongly discouraged as other users could potentially register and access the same table without permission. With that in mind the implementation within omero2pandas could be considered as an example of "best practice" for handling remote table registration.
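
The idea behind the SecretToken check can be illustrated with the standard library alone. This is a conceptual sketch, not how omero2pandas or OMERO Plus implement it: the dictionary stands in for the TileDB array metadata, and the helper names make_secret_token and token_matches are hypothetical.

```python
import hmac
import secrets


def make_secret_token() -> str:
    """Generate an unguessable token to store alongside the table."""
    return secrets.token_urlsafe(32)


def token_matches(stored_token: str, supplied_token: str) -> bool:
    """Compare tokens in constant time to avoid leaking timing information."""
    return hmac.compare_digest(stored_token.encode(), supplied_token.encode())


# Stand-in for the metadata written when the table file is created.
array_metadata = {"SecretToken": make_secret_token()}
user_token = array_metadata["SecretToken"]  # known only to the table's creator

print(token_matches(array_metadata["SecretToken"], user_token))  # True
print(token_matches(array_metadata["SecretToken"], "a-guess"))   # False
```

A user who cannot supply the stored token cannot show that they created the file, which is the property the registration check relies on.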
