Submitted by calebkaiser t3_yx2krb in MachineLearning

Project Link: https://github.com/comet-ml/kangas

My colleagues and I have been working over the last several months on a tool for visualizing and exploring large, multimedia datasets, with a particular emphasis on computer vision. Today, we're open sourcing the repository and sharing it publicly!

The project is called Kangas, and its Python API will be familiar to anyone who's used Pandas, with one major difference: when you call `DataGrid.show()` on a Kangas DataGrid, you see a UI like this:

https://preview.redd.it/f6c1ni8tyc0a1.png?width=1286&format=png&auto=webp&s=a51ec5151e44ae50caf487f2e8c46486d911afa8

We've focused on a handful of features for this first release:

  1. Scalability. Kangas stores your DataGrids as SQLite databases rather than holding them in memory as other tools do, which lets you work with larger datasets and still query them quickly.
  2. Simplicity. We want it to be incredibly easy to build and render a DataGrid. No tinkering with custom showImage() and plotLabels() methods—just load in your DataGrid and the server will handle metadata parsing, asset rendering, and more.
  3. Interoperability. Kangas can run inside a notebook environment, as a standalone app on your local machine, or can even be deployed as a web app (as we've done at kangas.comet.com). It also supports a wide variety of data types, and has more robust multimedia support on the immediate roadmap.
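To make the scalability point a bit more concrete, here is a rough sketch of the general idea of keeping rows in an on-disk SQLite file and letting the database do the aggregation, instead of loading everything into memory first. It uses only Python's standard `sqlite3` module. The table name, the columns, the file name, and both helper functions are made up for illustration; this is not Kangas's actual schema or API.

```python
import os
import sqlite3
import tempfile


def build_grid(path, rows):
    """Write (label, score) rows to a SQLite file (hypothetical schema)."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS datagrid (label TEXT, score REAL)"
            )
            conn.executemany("INSERT INTO datagrid VALUES (?, ?)", rows)
    finally:
        conn.close()


def mean_score_by_label(path):
    """Aggregate inside SQLite so only the summary is pulled into Python."""
    conn = sqlite3.connect(path)
    try:
        cur = conn.execute(
            "SELECT label, AVG(score) FROM datagrid GROUP BY label ORDER BY label"
        )
        return dict(cur.fetchall())
    finally:
        conn.close()


# Illustrative file name in a temporary directory.
db_path = os.path.join(tempfile.mkdtemp(), "example.datagrid")
build_grid(db_path, [("cat", 0.9), ("dog", 0.5), ("cat", 0.7)])
stats = mean_score_by_label(db_path)
print(stats)
```

Because the data lives in a file and the `GROUP BY` runs in the database engine, the Python process only ever holds the small result dictionary, which is the property that matters once a dataset no longer fits comfortably in RAM.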

Under the hood, Kangas is built on SQLite, along with React Server Components and Next.js, which allows it to render performantly. It's still early days, but we're very excited to share the project with the community and get some initial feedback. Please, don't hesitate to open a ticket or a PR—we love community contributions.

I'm happy to answer any questions you may have here or on the repo!

81

Comments


haabilo t1_iwn2ies wrote

Did you know that "kangas" also means "canvas" (or cloth or fabric) in Finnish when picking the name? Or was it just a kangaroo version of pandas?

8

calebkaiser OP t1_iwncxdj wrote

Someone on our team did bring this up a couple of weeks ago, but I have to admit, it was after we'd named the project Kangas.

Originally, Kangas was just the working name of the project (our research team's mascot is the kangaroo). When we finally decided to put our heads together and name the project something "official," we'd all grown a bit attached to Kangas and it stuck.

I do wish I could pass myself off as worldly enough for that reference though :)

7

Spiritual-Reply5896 t1_iwn1g6i wrote

How would you say it compares to FiftyOne, are your goals the same as with their project?

5

calebkaiser OP t1_iwnci0b wrote

We're big admirers of FiftyOne. Kangas is similar in its approach to UI and some other aspects, but our focus is different. In particular, FiftyOne is really laser-focused on computer vision, whereas Kangas is a more general EDA tool that happens to have some CV features.

To give a more concrete example, Kangas handles all kinds of data, and we put a lot of work into autogenerating charts and aggregate statistics from your dataset regardless of the kind of data you log. Our immediate roadmap involves broadening this even further, adding more advanced support for video, audio, and more. But we don't have as deep a feature set specifically for computer vision as FiftyOne does.

9

Otje89 t1_iwpilmf wrote

Very interesting! I’ll check it out :)

3

calebkaiser OP t1_iwpsi4b wrote

Thanks! And feel free to reach out with any questions.

1