narsilouu

narsilouu t1_j8sckwb wrote

Hijacking highest answer.
Disclaimer, I work at HF.

First of all, thanks for stating things that go wrong. This is the only means we have to get better (we are working with our own tools, but we cannot possibly use them in all the various ways our community does, so we can't fix every issue since we're simply not aware of them all).

For all the issues you mention above, have you tried opening issues when you encountered them? We're usually keen on answering promptly, and while I cannot promise things will move your way (there are many tradeoffs in our libs), at least that helps inform the relevant people.

Just to give you an overview, we have 3 things we're trying to achieve.

- Never introduce breaking changes. (Or very rarely: when something is super new and we realize it's hurting users rather than helping, we feel OK breaking things. If something is really old, we cannot break it, since people rely on it even if it is somewhat buggy.)
- Add SOTA models as fast as possible (and with as many options as possible). That requires help from the community, but also reusing tools that already exist, which sometimes requires creativity on our end to make widely different codebases work in a somewhat consistent way. Most research codebases don't try to support widely different architectures (there's only a handful), so many things are hardcoded and have to be changed, and some bugs in the original code have to be copied into our codebase to stay consistent (like position_ids starting at 2 for RoBERTa, see https://github.com/huggingface/transformers/issues/10736)

- And have a very hackable codebase. Unlike most "beautiful" code where DRY is the dogma, transformers tries to be hackable instead. This comes from our research-heavy user base: people who don't want to spend 2h understanding class inheritance and hunting for the code that does X to the input tensor just to create a new layer. That means transformers, at least, has a lot of duplicated code (we even have an internal cookie-cutter tool to maintain the copies as easily as possible).
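To make the RoBERTa quirk from the second point concrete, here is a minimal plain-Python sketch of how position ids end up starting at 2. It assumes a padding index of 1 (RoBERTa's pad token id); the function name and the example token ids are made up for illustration, and this is not the library's actual code.

```python
from typing import List

PADDING_IDX = 1  # assumed pad token id, as used by RoBERTa


def position_ids_from_input_ids(
    input_ids: List[int], padding_idx: int = PADDING_IDX
) -> List[int]:
    """Pad tokens keep position `padding_idx`; real tokens count up from padding_idx + 1."""
    position_ids = []
    seen = 0  # number of non-pad tokens encountered so far
    for token_id in input_ids:
        if token_id == padding_idx:
            position_ids.append(padding_idx)
        else:
            seen += 1
            position_ids.append(seen + padding_idx)
    return position_ids


# Illustrative ids only: four real tokens followed by two pad tokens.
example = [0, 31414, 232, 2, 1, 1]
print(position_ids_from_input_ids(example))  # [2, 3, 4, 5, 1, 1]
```

Because the offset is tied to the padding index, the first real token lands on position 2 rather than 0, which is the behavior the linked issue discusses.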

The consequence of this is that if you have clever idea X to improve upon Whisper, let's say, you should be able to copy-paste the whisper folder and get going. While it might seem odd to some, it is still a design choice, which comes with pros and cons like any design choice.

And just to set things straight: we don't try to shovel our hub into our tools. We have a lot of testing to make sure local models work all the time, and we actually rely on that in several internal projects.
Breaking changes are a very big concern of ours. Subtle breaking changes are most likely unintentional (please report them!).

As for reinventing things that exist in other libraries, do you have an example in mind? We're very careful about the use of our time, and also about the number of dependencies we rely on. Adding a dependency for an is_pair function is not something we like to do. If a dependency is too large for what we need, we do without it. If we can't get the functionality ourselves in a reasonable time, then it's mostly going to be an optional dependency.
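For readers unfamiliar with the optional-dependency idea, here is a minimal sketch of the general pattern using only the standard library. The helper names `is_available` and `require` are hypothetical and are not taken from any HF library; the point is only that the package is looked up lazily and a clear error is raised at the moment the feature is actually used.

```python
import importlib
import importlib.util
from types import ModuleType


def is_available(package_name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


def require(package_name: str) -> ModuleType:
    """Import an optional package on demand, or fail with an actionable message."""
    if not is_available(package_name):
        raise ImportError(
            f"This feature needs the optional package '{package_name}'. "
            f"Install it with: pip install {package_name}"
        )
    return importlib.import_module(package_name)


# Usage: the core library imports fine either way; only this code path
# needs the extra package.
json_module = require("json")  # stdlib stand-in for an optional package
print(json_module.dumps({"ok": True}))
```

The tradeoff is that a missing package shows up at call time rather than install time, which is the price of keeping the default install small.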

Thanks for reading this to the end.
And for all readers, please rest assured we are continuously trying to have the best code given our 3 constraints above. Any issue or pain, no matter how trivial, please report it; it does help us improve. Our open source and free code may not be the best (we're aware of some warts), but please, please never doubt we're trying to do our best. And do not hesitate to contribute to make it better if you feel like you know better than us (and you could definitely be right!)
