The Cancer Imaging Archive All the ideas and discussions
3 votes

Allow direct download of files

The current interface for downloading any data is not very user-friendly and basically impossible to automate.

Landing on a dataset page like this: https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI/ and clicking download results in a .jnlp file being downloaded. As there was no further explanation there, I assumed this was a server misconfiguration on your end. That a download link for a dataset actually downloads a small Java program is very unexpected.

After some emails with someone familiar with the platform I landed here: https://wiki.cancerimagingarchive.net/display/NBIA/Download+Manager which explains that this is the expected behavior and that I need to use a download manager written in Java. As a CS person, this was very confusing to me.

If you choose to continue to provide this interface you should at least include visible instructions on every dataset page explaining that this is actually how it is supposed to work and how to download the file.
I would highly recommend also adding a direct download link as an alternative to the download manager. Currently it's impossible to automatically download any data, as it requires user interactions.
It's also not easily possible to download data to a server that doesn't have a graphical user interface, such as a cloud computing machine.
Finally, the instructions assume that the machine has Oracle Java installed, a commercial product with questionable licensing practices which is not installed by default on all platforms.

Andreas Mueller, 18.04.2017, 09:59
Idea status: under consideration

Comments

Justin, 19.10.2017, 13:30
Hi Andreas, this is great feedback. I am happy to report that we're in the process of migrating to a stand-alone application rather than using the JNLP solution. We'll also be updating the instructions when we roll this out to make it more obvious that the links are meant to be opened in this new TCIA Downloader app.

As for accessing the data on a headless computer, we do have an API which could be used via command line or scripts in this scenario: https://wiki.cancerimagingarchive.net/x/NIIiAQ. You might also check out some of the tools in our "Data analysis center" section: https://wiki.cancerimagingarchive.net/x/x49XAQ. Specifically, I'd suggest you look at the Open Source Community Code Share link, which includes both Python and R clients that might be useful.
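
For the headless scenario, a scripted download could look roughly like the sketch below. It is only an illustration: the base URL, the endpoint names (getSeries, getImage) and the parameter names are assumptions that do not appear in this thread and should be checked against the API guide linked above, which may also require an API key. The helper names build_url and download_series are made up for this example.

```python
import shutil
import urllib.parse
import urllib.request

# Assumption: base URL of the REST API. Confirm it in the API guide before use.
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def build_url(endpoint, **params):
    """Return the query URL for an endpoint name plus its query parameters."""
    url = f"{BASE_URL}/{endpoint}"
    query = urllib.parse.urlencode(params)
    return f"{url}?{query}" if query else url


def download_series(series_uid, dest_path):
    """Save one image series to dest_path without any GUI or user interaction.

    Assumption: the getImage endpoint returns the series as a ZIP archive.
    """
    url = build_url("getImage", SeriesInstanceUID=series_uid)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Stream the response to disk instead of holding it all in memory.
        shutil.copyfileobj(response, out)
    return dest_path


if __name__ == "__main__":
    # Only prints the URL that would list the series of a collection;
    # no network request is made here.
    print(build_url("getSeries", Collection="LIDC-IDRI"))
```

A script like this can run from cron or over SSH on a cloud machine; calling download_series once per series UID returned by the listing endpoint would fetch a whole collection.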
