
ZestyData t1_isk14sf wrote

I think one aspect that's really crucial and missing is the statistical/mathematical justification for using this. Before using a tool we'd need to be certain its behaviour is valid.

You mention that you use Delaunay Triangulation (which should really be emphasized higher up, since it's the core of how this tool works). But can you cite references that justify Delaunay Triangulation as an effective method for generating data to fit an existing statistical distribution?

I haven't really used Delaunay Triangulation in this manner but by my basic understanding of the algorithm, doesn't it attempt to create an optimal triangulation, and therefore would tend towards outputting rather uniformly distributed internal points, rather than learning the distribution of the input? And the more points you generate, the stronger that trend?

If that hypothesis were the case, it'd be less than useless as an artificial data source, it'd be harmful for the vast majority of use cases! I very well may be wrong, but my main point is that you should definitely make note of the method's performance if you're advertising it as a solution.

45

Liorithiel t1_isk938r wrote

> I haven't really used Delaunay Triangulation in this manner but by my basic understanding of the algorithm, doesn't it attempt to create an optimal triangulation, and therefore would tend towards outputting rather uniformly distributed internal points, rather than learning the distribution of the input?

Delaunay triangulation itself: not really, or at least not in a way that would do much harm. We use it for simulations of mobile networks, e.g. analyses at the boundary between urban (where the density of base stations is high) and rural (less dense) areas. If each triangle creates one additional point, regardless of whether it's a large triangle (rural) or a small one (urban), then denser areas will get more points, simply because they contain more triangles. It won't lead to a smoothly changing density between more and less dense areas, but then, that's an assumption you'd have to add on top of your data, not something you can infer from the data themselves.

Judging from the visualization, though, this algorithm seems to have a stopping condition dependent on the size of a triangle, which breaks this reasoning.
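
For anyone who wants to check this argument numerically, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the point counts, the `min_area` threshold, the helper names, and the simple Bowyer-Watson triangulation (a real test would more likely use a library triangulation). It is not the tool's actual algorithm, and the area cutoff is only a guess at what a size-dependent stopping condition might look like. It puts a dense cluster next to a sparse one, adds one centroid per triangle, and reports what share of the new points lands in the dense half, first for all triangles and then only for triangles above the area threshold.

```python
import random

def in_circumcircle(tri, p, pts):
    """True if p lies strictly inside the circumcircle of triangle tri."""
    ax, ay = pts[tri[0]][0] - p[0], pts[tri[0]][1] - p[1]
    bx, by = pts[tri[1]][0] - p[0], pts[tri[1]][1] - p[1]
    cx, cy = pts[tri[2]][0] - p[0], pts[tri[2]][1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on vertex order, so correct for orientation.
    orient = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return det * orient > 0

def delaunay(points):
    """Naive Bowyer-Watson; returns index triples into `points`."""
    pts = list(points)
    n = len(pts)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    span = max(max(xs) - min(xs), max(ys) - min(ys), 1e-9)
    mx, my = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Large helper triangle enclosing every input point (indices n..n+2).
    pts += [(mx - 20 * span, my - span), (mx + 20 * span, my - span),
            (mx, my + 20 * span)]
    triangles = {(n, n + 1, n + 2)}
    for i in range(n):
        # Triangles whose circumcircle contains the new point must go.
        bad = [t for t in triangles if in_circumcircle(t, pts[i], pts)]
        edge_count = {}
        for t in bad:
            for e in ((t[0], t[1]), (t[1], t[2]), (t[2], t[0])):
                key = tuple(sorted(e))
                edge_count[key] = edge_count.get(key, 0) + 1
        triangles.difference_update(bad)
        # Edges used by exactly one bad triangle bound the cavity;
        # connect each of them to the new point.
        for (a, b), count in edge_count.items():
            if count == 1:
                triangles.add((a, b, i))
    # Drop triangles that still touch the helper triangle.
    return [t for t in triangles if max(t) < n]

def area(tri, pts):
    (ax, ay), (bx, by), (cx, cy) = (pts[i] for i in tri)
    return abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax)) / 2

def new_points(pts, tris, min_area=0.0):
    """One centroid per triangle whose area is at least min_area."""
    return [(sum(pts[i][0] for i in t) / 3, sum(pts[i][1] for i in t) / 3)
            for t in tris if area(t, pts) >= min_area]

def make_points(n_dense=300, n_sparse=30, seed=0):
    """Dense ("urban") cluster in x < 1, sparse ("rural") one in x >= 1."""
    rng = random.Random(seed)
    dense = [(rng.random(), rng.random()) for _ in range(n_dense)]
    sparse = [(1 + rng.random(), rng.random()) for _ in range(n_sparse)]
    return dense + sparse

def dense_share(points):
    """Fraction of points that fall in the dense half (x < 1)."""
    return sum(1 for x, _ in points if x < 1.0) / len(points)

if __name__ == "__main__":
    pts = make_points()
    tris = delaunay(pts)
    print("input, dense share:", dense_share(pts))
    print("one point per triangle:", dense_share(new_points(pts, tris)))
    print("large triangles only:",
          dense_share(new_points(pts, tris, min_area=0.005)))
```

If the reasoning above holds, the first generated share should stay close to the input's, while the thresholded one should drop noticeably, since small (dense-area) triangles stop producing points. The exact numbers depend on the seed and the threshold, so treat them as a rough check rather than a result.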

10