jsonathan t1_jbt3hqq wrote

This is really fascinating, thanks for sharing. I'm also working on generating natural language representations of Python packages. My approach is:

  1. Extract a call graph from the package, where each node is a function and two nodes are connected if one contains a call to the other.
  2. Generate natural language summaries of each function by convolving over the graph. This involves generating summaries of the terminal nodes (i.e. functions with no dependencies), then passing those summaries to their dependents to generate summaries, and so on. Very similar to how message passing works in a GNN. The idea here is that summarizing what a function does isn't possible without summaries of what its dependencies do.
  3. Summaries of each function within a file are chained to generate a summary of that file.
  4. Summaries of each file within a directory are chained to generate a summary of that directory, and so on until the root directory is reached.
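
For illustration, steps 1 and 2 could be sketched roughly like this. It is only a sketch under some assumptions: it handles plain top-level functions in a single source string, it only picks up direct calls by name (no methods or imports), and `summarize` is a hypothetical placeholder for whatever model call actually writes the summary. The names `build_call_graph` and `summarize_functions` are made up for this example.

```python
import ast
from graphlib import TopologicalSorter


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the local functions it calls directly."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: dict[str, set[str]] = {}
    for name, node in funcs.items():
        calls = set()
        for sub in ast.walk(node):
            # Only direct calls by bare name, e.g. foo(x); obj.foo(x) is ignored.
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                callee = sub.func.id
                if callee in funcs and callee != name:  # skip self-recursion
                    calls.add(callee)
        graph[name] = calls
    return graph


def summarize(name: str, dep_summaries: dict[str, str]) -> str:
    """Hypothetical placeholder: a real version would prompt a language model
    with the function's source plus the summaries of its dependencies."""
    deps = ", ".join(dep_summaries[d] for d in sorted(dep_summaries))
    return f"{name}({deps})"


def summarize_functions(graph: dict[str, set[str]]) -> dict[str, str]:
    """Summarize leaves first, then pass their summaries up to dependents."""
    summaries: dict[str, str] = {}
    # TopologicalSorter treats the mapped values as predecessors, so callees
    # are yielded before their callers. Mutual recursion raises CycleError.
    for name in TopologicalSorter(graph).static_order():
        deps = {d: summaries[d] for d in graph[name]}
        summaries[name] = summarize(name, deps)
    return summaries


SOURCE = '''
def load(path):
    return open(path).read()

def clean(text):
    return text.strip()

def run(path):
    return clean(load(path))
'''

graph = build_call_graph(SOURCE)
summaries = summarize_functions(graph)
```

Here `load` and `clean` have no local dependencies, so they get summarized first, and `run` only gets summarized once both of their summaries are available. Steps 3 and 4 would repeat the same bottom-up pass over files and directories instead of functions.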

I'd love to learn more about the differences/advantages of your approach compared to something like this. Thanks again for your contribution, this is insanely cool!

NovelspaceOnly OP t1_jbu2xyi wrote

This is awesome. I would be happy to discuss this as well! I was going to add GCNs and GATs pretty soon. If you're up for collaborating, please reach out in DMs!
