An interactive and zoomable visualization of your whole dataset. This web-based tool, a modified version of the original PixPlot, is valuable for object detection and classification projects.
Images that look similar are located next to or near each other in the UMAP visualization, making it easy to see where errors occur.
The repo contains:
A PixplotML server. We have added tools that help with labelling: a legend, border colours representing the label, and functionality to update labels or flag images for removal. The original PixPlot uses a classification model trained on ImageNet, but we find that fine-tuning on your own data produces much more accurate visualizations. So we have added:
A preparation step to customise the visualization to your image data. The preparation step uses your images and a metadata.csv file to train a PyTorch classification model and then outputs an image vectors file for clustering by PixplotML (using UMAP). See Fine-tune PixplotML for your own images for more details. The code to do this is in the prep_pixplot_files folder. (We use Pytorch-Accelerated to train the classification model.)
The server requires the following files located in a folder:
metadata.csv - a file containing the image name and category (see below for more details)
images/*.* - a sub folder containing the images to visualize
image_vectors.npy - image vectors from a classification model backbone. (See below for more details)
To create image_vectors.npy for your images, we provide code and instructions. The code is in the prep_pixplot_files folder; see Fine-tune PixplotML for your own images for more details.
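Before starting the server, it can help to confirm that a data folder has the layout listed above. The following is a minimal sketch, not part of the repo: the function name check_pixplot_folder is hypothetical, and it assumes that each value in the filename column of metadata.csv names a file directly inside images/. It only checks that the files exist; it does not inspect the contents of image_vectors.npy.

```python
import csv
from pathlib import Path


def check_pixplot_folder(folder):
    """Return a list of problems found in a PixplotML data folder (empty if none)."""
    folder = Path(folder)
    problems = []

    metadata_path = folder / "metadata.csv"
    images_dir = folder / "images"
    vectors_path = folder / "image_vectors.npy"

    if not vectors_path.is_file():
        problems.append("missing image_vectors.npy")
    if not images_dir.is_dir():
        problems.append("missing images/ sub folder")
    if not metadata_path.is_file():
        problems.append("missing metadata.csv")
        return problems

    with metadata_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if "filename" not in (reader.fieldnames or []):
            problems.append("metadata.csv has no 'filename' column")
            return problems
        # Assumption: each 'filename' value names a file directly inside images/.
        for row in reader:
            if not (images_dir / row["filename"]).is_file():
                problems.append(
                    f"image listed in metadata.csv not found: {row['filename']}"
                )
    return problems


if __name__ == "__main__":
    # ./data/outputs is the folder used by the quickstart example data,
    # relative to the repo root.
    for problem in check_pixplot_folder("./data/outputs"):
        print(problem)
```

An empty list means the three required items are present and every image named in metadata.csv was found.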
Quickstart - visualizing the COCO dataset bounding box images
To quickly see PixplotML running on bounding box images extracted from the COCO dataset, you can follow this pre-created example, which contains all the required files.
First, clone the repo and extract the zip file containing the COCO validation dataset bounding boxes:
git clone https://github.com/alexhock/pixplotml.git
cd pixplotml
unzip ./data/coco_trained.zip -d ./data/
To run pixplotml, there are two options: using Python with a new environment, or using Docker where the environment is managed for you.
Create a Python environment and install dependencies:
conda create --name=pixplotml python=3.9
conda activate pixplotml
cd pixplot_server
pip install -r requirements.txt
Run the pixplot pre-processing. This prepares images and creates the pixplot website in a folder called 'output':
python pixplot/pixplot.py --images "../data/outputs/images/*.jpg" --metadata "../data/outputs/metadata.csv" --image_vectors "../data/outputs/image_vectors.npy"
Start a web server by running:
python -m http.server 8600
Open a browser to: http://localhost:8600/output
Instead of manually creating a Python environment and performing the steps in the Python quickstart, you can use Docker, which manages the environment for you.
Build and tag the image:
cd pixplot_server
docker build -t pixplotml:1.0 .
Run pixplot:
cd data
docker run -v `pwd`/outputs:/data -p 8800:8800 pixplotml:1.0 /data 8800 metadata.csv images/*.jpg
Open a browser to: http://localhost:8800/output
To stop the running docker container:
export CONTAINER_ID=`docker ps -lq`
docker stop $CONTAINER_ID
Note that to avoid re-running the preprocessing step on later runs, you must commit the container to a new image after the first run:
docker ps -a
docker commit <container_id> pixplotml:2.0
Then use the new image name, pixplotml:2.0, when running.
Metadata should be in a comma-separated value file, should contain one row for each input image, and should contain headers specifying the column order. Here is a sample metadata file:
filename   category    tags    description     permalink
bees.jpg   yellow      a|b|c   bees' knees     https://...
cats.jpg   dangerous   b|c|d   cats' pajamas   https://...

The following column labels are accepted:

Column        Description
filename      the filename of the image
category      a categorical label for the image
tags          a pipe-delimited list of categorical tags for the image
description   a plaintext description of the image's contents
permalink     a link to the image hosted on another domain
year          a year timestamp for the image (should be an integer)
label         a categorical label used for supervised UMAP projection
lat           the latitudinal position of the image
lng           the longitudinal position of the image
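As an illustration of the format described above, here is a minimal sketch that builds the text of a metadata file with Python's standard csv module, which handles quoting for values that contain commas. The helper name metadata_csv_text is hypothetical and not part of the repo. Rejecting column labels outside the accepted list is a choice made in this sketch, not documented server behaviour.

```python
import csv
import io

# Column labels accepted in the metadata file, as listed above.
ACCEPTED_COLUMNS = [
    "filename", "category", "tags", "description",
    "permalink", "year", "label", "lat", "lng",
]


def metadata_csv_text(rows):
    """Build metadata CSV text from a list of dicts keyed by column label."""
    unknown = {key for row in rows for key in row} - set(ACCEPTED_COLUMNS)
    if unknown:
        # Sketch-level choice: fail early on labels not in the accepted list.
        raise ValueError(f"unknown column labels: {sorted(unknown)}")

    # Write only the columns that at least one row uses, in a stable order.
    used = [col for col in ACCEPTED_COLUMNS if any(col in row for row in rows)]

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=used, restval="", lineterminator="\n"
    )
    writer.writeheader()  # header row specifies the column order
    for row in rows:
        row = dict(row)
        # Tags are stored as a single pipe-delimited field.
        if isinstance(row.get("tags"), (list, tuple)):
            row["tags"] = "|".join(row["tags"])
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(metadata_csv_text([
        {"filename": "bees.jpg", "category": "yellow",
         "tags": ["a", "b", "c"], "description": "bees' knees"},
        {"filename": "cats.jpg", "category": "dangerous",
         "tags": ["b", "c", "d"], "description": "cats' pajamas"},
    ]))
```

The returned text can be saved as metadata.csv in the data folder. Cells for columns that a row does not supply are left empty.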