Submitted by drinkingsomuchcoffee t3_113m1ly in MachineLearning

At a glance, HuggingFace seems like a great library. Lots of access to great pretrained models, an easy hub, and a bunch of utilities.

Then you actually try to use their libraries.

Bugs, so many bugs. Configs spanning galaxies. Barely passable documentation. Subtle breaking changes, constantly. I've run the exact same code on two different machines and had the width and height dimensions swapped out from under me, with no warning.

I've tried to create encoders with a custom vocabulary, only to realize the code was mangling data unless I passed a specific flag as a kwarg. Dozens more issues like this.

If you look at the internals, it's a nightmare. A literal nightmare.

Why does this matter? It's clear HuggingFace is trying to shovel in as many features as they can to become ubiquitous and lock people into their hub. They frequently reinvent things that already exist in other libraries (poorly), simply to increase their staying power and lock-in.

This is not OK. It would be OK if the library were solid, just worked, and was a pleasure to use. Instead we're going to be stuck with this mess for years because someone with an ego wanted their library everywhere.

I know HuggingFace devs or management are likely to read this. If you have a large platform, you have a responsibility to do better; otherwise you are burning thousands of other devs' time because you didn't want to write a few unit tests or refactor your barely passable code.

/RANT

59

Comments


tripple13 t1_j8r8f7i wrote

This reads like some of those posts criticising open-source frameworks that don't always behave intuitively.

While I don't disagree that there are bugs, Hugging Face is doing more for Open ML than many large tech companies are doing.

HuggingFace, FastAI and similar frameworks are designed to lower the barrier to ML, such that any person with programming skills can harness the power of SoTA ML progress.

I think that's a great mission tbh, even if there are some inevitable bumps in the road.

152

narsilouu t1_j8sckwb wrote

Hijacking highest answer.
Disclaimer, I work at HF.

First of all, thanks for stating things that go wrong. This is the only means we have to get better (we work with our own tools, but we cannot possibly use them in all the various ways our community uses them, so we can't fix every issue since we're simply not aware of them all).

For all the issues you mention above, have you tried opening issues when you encountered your problems? We're usually keen on answering promptly, and while I cannot promise things will move your way (there are many tradeoffs in our libs), at least it helps inform the relevant people.

Just to give you an overview, there are 3 things we're trying to achieve.

- Never introduce breaking changes. (Or only very rarely: when something is super new and we realize it's hurting users rather than helping, we feel OK breaking it. If something is really old, we cannot break it, since people rely on it even when it's somewhat buggy.)
- Add SotA models as fast as possible (and with as many options as possible). That requires help from the community, but also reusing tools that already exist, which sometimes requires creativity on our end to wrap widely different codebases in a somewhat consistent way. Most research codebases don't try to support widely different architectures (there's only a handful), so many things are hardcoded and have to be changed, and some bugs in the original code have to be copied into ours to stay consistent (like position_ids starting at 2 for RoBERTa: https://github.com/huggingface/transformers/issues/10736).

- And have a very hackable codebase. Contrary to most "beautiful" code where DRY is the dogma, transformers tries to be hackable instead. This comes from its research-heavy users, who don't want to spend 2 hours understanding class inheritance and hunting down the code that does X to the input tensor before they can create a new layer. That means transformers, at least, contains highly duplicated code (we even have an internal cookie-cutter tool to maintain the copies as easily as possible).

The consequence is that if you have clever idea X to improve upon, let's say, Whisper, you should be able to copy-paste the whisper folder and get going. While it might seem odd to some, it is still a design choice, which comes with pros and cons like any design choice.
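To make the single-file philosophy concrete, here is a minimal sketch of the kind of duplicated block and copy marker being described (the module path, class names, and marker are illustrative assumptions, not actual transformers source):

```python
# Illustrative sketch only: a model folder carries its own copy of a common
# building block, with a marker comment so an internal tool can keep the
# copies in sync. Module/class names are made up, not real HF code.
import torch
import torch.nn as nn


# Copied from mylib.models.bert.modeling_bert.BertSelfOutput with Bert->Whisper
class WhisperSelfOutput(nn.Module):
    def __init__(self, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, hidden_states: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
        # Project, apply dropout, add the residual, and normalize.
        hidden_states = self.dropout(self.dense(hidden_states))
        return self.layer_norm(hidden_states + residual)
```

Someone hacking on a copied folder can edit a class like this freely without touching any other model.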

And just to set things straight: we don't try to shovel our hub into our tools. We have a lot of testing to make sure local models work all the time, and we actually rely on them in several internal projects.
Breaking changes are a very big concern of ours. Subtle breaking changes are most likely unintentional (please report them!).

As for reinventing things that exist in other libraries, do you have examples in mind? We're very careful about the use of our time, and also about the number of dependencies we rely on. Adding a dependency for an is_pair function is not something we like to do. If a dependency is too large for what we need, we don't take it. If we can't get the functionality in a reasonable time, then it's mostly going to be an optional dependency.

Thanks for reading this to the end.
And for all readers, please rest assured we are continuously trying to have the best code given our 3 constraints above. Any issue or pain, no matter how trivial, please report it; it does help us improve. Our open-source and free code may not be the best (we're aware of some warts), but please, never doubt that we're trying to do our best. And do not hesitate to contribute to make it better if you feel like you know better than us (and you could definitely be right!)

71

dojoteef t1_j8sqm4i wrote

I commend what Huggingface is trying to do (be the source for the latest models that is consistent and easy to use), but every time I've used the library I've had to tackle bugs that were very time consuming to pinpoint, which is exacerbated by the structure of the code. The worst bugs have been subtle heisenbugs: the code seemed to work most of the time, but failed at other times. The heisenbugs are what made me stop using Huggingface altogether, unless it's my only option.

For example, I ran into a bug that only manifested when downloading a specific pretrained model for a task, which in turn downloaded a config file that had a bug in it. As a user it was super difficult to know where the source of the bug was without extensive spelunking. I've had many similarly difficult-to-diagnose issues each time I've used the Huggingface ecosystem.

I understand that what you're tasked with as a company is a huge undertaking for such a small team. Maybe splitting the package into a "stable" package and a "nightly" package could help (with stable being extensively bug tested more like an Ubuntu LTS release). My guess is that your team is likely too small to support that approach while adding new features at the same speed.

14

drinkingsomuchcoffee OP t1_j8sm1y7 wrote

Thank you for replying. I apologize for the harsh tone; I was hoping to phrase it as a wake-up call that people are reading the code, and they do care about quality.

Do continue to avoid inheritance. In fact, probably ban inheritance unless it's only one layer deep and inheriting from an abstract base class.

But don't misunderstand DRY. DRY is not about compressing code as much as possible; that's code golfing. DRY is about having one place for information to live, that's it. If you see a dev creating a poorly named function or abstraction just to deduplicate 5 lines of code, that's not DRY, that's just bad code.

You can achieve DRY by using code generators as you mention, but splitting things into separate modules is also fine. A code generator is DRY because the generator is the point of truth for the information, even if it creates "duplicate" code. This is what a real understanding of DRY is.

People wanting to "hack" on code don't mind having to copy a few folders. If you have a beautiful module of pure functions for calculating statistics, it is flat out stupid to copy+paste it into every folder to be more "hackable". Don't do this. Instead, factor these out into simple, pure modules.
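As a rough illustration of the distinction being drawn here (a generic sketch, not code from any particular library): the shared statistics helpers live in one pure module and are imported, while the model code itself remains free to be copied and hacked on.

```python
# stats.py -- the single "source of truth" for shared helpers (DRY).
import math
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def stddev(xs: Sequence[float]) -> float:
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


# my_model.py -- hackable model code that simply imports the helpers
# (from stats import mean, stddev) instead of carrying its own copy.
def normalize(xs: Sequence[float]) -> list:
    m, s = mean(xs), stddev(xs)
    return [(x - m) / s for x in xs]
```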

6

fasttosmile t1_j8tudd4 wrote

You don't need to explain what DRY is. You need to understand that there is an unavoidable trade-off between centralizing a codebase (creating shared functions/classes in modules that many other modules import from) and keeping it hackable.

They have a blogpost on this

10

drinkingsomuchcoffee OP t1_j8unlkp wrote

Alright, I have a bit of time so I'll address a few things.

>You need to understand that there is an unavoidable trade-off between centralizing [...] and keeping it hackable.

I don't know what hackable means. You haven't defined it. I'm going to use the most generous interpretation to mean, you can modify it without impacting other places. Well you can do that if it's centralized, just copy paste it into your file and then edit it- that's no excuse to completely ban centralization! Alternatively decompose the centralized function more and only use the pieces you need.

Now onto the blog post.

>If a bug is found in one of the model files, we want to make it as easy as possible for the finder to fix it. There is little that is more demotivating than fixing a bug only to see that it caused 100 failures of other models.

Maybe it should cause 100s of failures if it's a breaking change (a bug). That's a pretty good sign you really did screw something up.

>Similarly, it's easier to add new modeling code and review the corresponding PR if only a single new model file is added.

No it's not. If new code uses a battle-tested core, I don't have to review those parts as thoroughly. If it's copy-pasted, I still have to review it and make sure they didn't copy an old version with bugs, or slightly modify it and break something. Sounds like this is common, as many people have complained about dozens of bugs!

>We assume that a significant amount of users of the Transformers library not only read the documentation, but also look into the actual modeling code and potentially modify it. This hypothesis is backed by the Transformers library being forked over 10,000 times and the Transformers paper being cited over a thousand times.

Maybe you should check your assumptions before you make a fundamental decision (you know, basic engineering). There's plenty of forked libraries that are not modified and are forked for archival purposes. Nor should you cater to a small minority if most people _aren't_ doing this.

> Providing all the necessary logical components in order in a single modeling file helps a lot to achieve improved readability and adaptability.

It can _sometimes_. But not always. Having one massive file named `main.py` is not more readable than a well-split program. This seems like basic common sense to me, but here's a chapter from The Art of Unix Programming on the subject: http://www.catb.org/esr/writings/taoup/html/ch04s01.html

>Every time we would have to have asked ourselves whether the "standard" attention function should be adapted or whether it would have been better to add a new attention function to attention.py. But then how do we name it? attention_with_positional_embd, reformer_attention, deberta_attention?

Yep, you've identified a place where you shouldn't try to fit every idea under a single "Attention" class. That's just common sense programming, not an argument against writing good shared functions or classes.

>Once a machine learning model is published, it is rarely adapted or changed afterward.

Then why has the Bert module had changes as recent as this week, from dozens of authors going back years?

https://github.com/huggingface/transformers/tree/main/src/transformers/models/bert

This is irrefutable hard evidence against your argument.

> Sylvain Gugger, found a great mechanism that respects both the single file policy and keeps maintainability cost in bounds. This mechanism, loosely called "the copying mechanism", allows us to mark logical components, such as an attention layer function, with a # Copied from <predecessor_model>.<function> statement

OK, so the programmer you mentioned before is going to "break 100s of tests" when she changes this ad-hoc C-preprocessor knock-off. You're still doing "DRY", you're just doing it the way C programmers did it 30 years ago, in a much more complicated manner.

If anyone here works at HuggingFace, please forward this to the author of that article.

5

hpstring t1_j8x5iyh wrote

I'm a beginner in this field and I was wondering what it means for code to be "centralized" and "dry". Does "centralized" mean putting a lot of code in a single file, and "dry" mean raw code that is not very easy to read but is efficient or has some other advantages?

1

baffo32 t1_j8zbmua wrote

dry is a very basic software engineering principle that means to include only one copy of every sequence of code. it looks like machine learning people did not learn this, as they weren't trained as software engineers. DRY stands for "don't repeat yourself", and if it's not respected, software gets harder and slower to maintain, improve, or bugfix the larger and older it gets.

2

baffo32 t1_j8zbuup wrote

i think by centralized they mean what they imagine dry looking like, putting code in one place rather than spreading it out. it’s not usually used that way. it’s a reasonable expression though; people usually centralize components so there is one organized place to go to in order to access them.

2

hpstring t1_j8zw7i5 wrote

Lots of thanks! I didn't receive training from a software engineering perspective, which seems to be an important aspect of machine learning.

1

baffo32 t1_j90uucx wrote

it’s important if you’re publishing large software packages, of course. lots of hobbyists also learn in the field

1

fasttosmile t1_j8v03xu wrote

> I don't know what hackable means. You haven't defined it. I'm going to use the most generous interpretation to mean, you can modify it without impacting other places. Well you can do that if it's centralized, just copy paste it into your file and then edit it- that's no excuse to completely ban centralization! Alternatively decompose the centralized function more and only use the pieces you need.

Your definition of hackable is almost it. What’s missing is that being decentralized makes things much, much easier to understand because the code is very straightforward and doesn’t have to take 10 different things into account.

You can't just copy-paste a file if it's centralized, you'll have to copy-paste multiple, and the main issue is it's gonna take a while to understand which ones (and you'll have to modify the imports etc., unless you copy the entire repo! are you seriously suggesting that lmao) and what's safe to modify inside of them. Decomposing is just going to make things more complicated for no gain.

Deep learning is about the details, and whenever you start breaking things apart and putting the details in different corners that’s how you end up with code that is hard to understand and people making mistakes and not understanding what’s going on.

> Maybe it should cause 100s of failures if it's a breaking change (a bug). That's a pretty good sign you really did screw something up.

It's a syntax/interface/some-other-not-fundamental bug. A real bug would have already been spotted when checking the test-set performance.

> No it's not. If new code uses a battle tested core, I don't have to review those parts as thoroughly. If it's copy pasted, I still have to review it and make sure they didn't copy an old version with bugs or slightly modified it and broke something. Sounds like this is common as many people have complained about dozens of bugs!

The way code is shown to be correct is by getting SOTA results. If it does that it is "battle tested". If it didn't do that no one would even think of merging it in the first place.

> Yep, you've identified a place where you shouldn't try to fit every idea under a single "Attention" class. That's just common sense programming, not an argument against writing good shared functions or classes.

It is an argument against having shared classes. At the same time, sure you can have some shared code, Huggingface does that.

> It can sometimes. But not always. Having one massive file named main.py is not more readable than a well split program. This seems like basic common sense to me, but here's an actual paper on the subject:

There is an important distinction that you're ignoring here. Having semantically separate objects in one file is indeed confusing. But if you put everything related to the model in one file, that simplifies things and reduces the working memory people require to read your code.

> Then why does the Bert module have changes as recent as this week with changes from dozens of authors going back years?

The recent change for Bert is some inference interface code which has to be kept common across all models. That's their decision; I wouldn't even do that, just make kwargs mandatory imo.

> Maybe you should check your assumptions before you make a fundamental decision (you know, basic engineering). There's plenty of forked libraries that are not modified and are forked for archival purposes. Nor should you cater to a small minority if most people aren't doing this.

Everyone in deep learning likes to gamble on making some tweaks to the model hoping they’ll get the next ICLR oral. Why else would they care about modifying the model code?

--

I suggest you go read some modeling code from different frameworks; one example is fairseq. I like fairseq, I think it's well done considering its aims and constraints. But you're crazy if you think it's easier to understand and modify the code for some specific model than in huggingface. Here's the link to fairseq's roberta: you'll need to look at a dozen files to see what's happening. In contrast, huggingface is one file.

Spent too much time on this already, not gonna reply anymore.

0

drinkingsomuchcoffee OP t1_j8v3y80 wrote

>You can't just copy-paste a file if it's centralized, you'll have to copy-paste multiple, and the main issue is it's gonna take a while to understand which ones (and you'll have to modify the imports etc., unless you copy the entire repo! are you seriously suggesting that lmao)

Yep, apparently they themselves claim to do this for every module. Thank you for pointing out how crazy this is and proving my point.

>Your definition of hackable is almost it. What’s missing is that being decentralized makes things much, much easier to understand because the code is very straightforward and doesn’t have to take 10 different things into account.

Oh really? I think those files depend on pytorch functions and also numpy. Should they copy those entire libraries into the file to be more "hackable"? Lmao

3

drinkingsomuchcoffee OP t1_j8tw3yt wrote

There are so many contradictions and fallacies in that blog post, I don't even know where to begin. I think I'll let empirical evidence do the talking for me, aka the many people agreeing with my post.

3

baffo32 t1_j8vsq9s wrote

looks like there is emotional or funded influence here, counterintuitive votes, strange statements stated as facts

Duplicated code makes a very, very _unhackable project_, because one has to learn the code-duplicating systems and add functionality to them for every factorization. It does make _hackable examples_, but the codebase doesn't seem to understand where to draw the line at all.

The library looks like it was made entirely without an experienced lead software engineer. As a corporation they should have one.


HuggingFace, please understand that software developers find DRY to be hackable. The two terms usually go together. It reads like a contradiction, like fake news trying to manipulate people by ignoring facts, to state it the other way around.

4

drinkingsomuchcoffee OP t1_j8zael9 wrote

I am the "bad guy" of the thread, so anything I say will be seen negatively, even if it's correct. This is typical human behavior, unfortunately.

I have a feeling most people here do not understand DRY done well, and are used to confusing inheritance hierarchies and incredibly deep function chains. Essentially they have conflated DRY with bad code, simple as that.

2

baffo32 t1_j8zd3ge wrote

You’re not the bad guy, I’m guessing maybe it’s a community of data workers who’ve never had a reason to value DRY.

2

Shinsekai21 t1_j8wrcvo wrote

>HuggingFace, FastAI and similar frameworks are designed to lower the barrier to ML, such that any person with programming skills can harness the power of SoTA ML progress.

I started out with FastAI and am now learning PyTorch. I agree.

I'm more of a top-down student (learn the practical stuff first, then the fundamentals). FastAI is doing a great job at showing me what is possible and interesting with their lectures.

I moved to PyTorch because I wanted to understand more about what's underneath FastAI. I'm currently doing ZeroToMastery PyTorch and found that the knowledge I gained with FastAI is helping a lot.

−1

gradientpenalty t1_j8ruzh9 wrote

Maybe you don't do much NLP research then? Before the huggingface transformers and datasets libraries (still think that's a bad name), we had to write that validation code ourselves, the same validation code hundreds of your peers had written before, because there was no de facto code for doing it (since we were all using different kinds of models). NLP models (or so-called transformers) are a mess nowadays with no fixed way to use them; running benchmarks is certainly a nightmare.

When transformers first came out, it was limited, but it served to simplify using BERT embeddings and GPT-2 beam search generation in a few lines of code. The library does all the model downloads, version checks, and abstraction for you. Then there's datasets, which unifies all NLP datasets on a central platform and lets me run the GLUE benchmark in one single py file.
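As a hedged sketch of what that "few lines of code" workflow looks like (the dataset and model identifiers are common examples, not a recipe from the thread):

```python
# Load a GLUE split with datasets and get BERT embeddings with transformers.
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

dataset = load_dataset("glue", "mrpc", split="validation")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(dataset[0]["sentence1"], return_tensors="pt")
embeddings = model(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
```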

Oh, back then the code was even worse: all modeling_(name).py files under the transformers/ directory. The latest 4.2X versions are somewhat maintainable and readable with all the complex abstractions they have. But it's a fast-moving domain, and any contribution will be irrelevant a few years later, so complexity and mess add up (would you rather spend time cleaning than implementing the new flashy self-attention alternative?).

One day they might sell out, as many for-profit companies do, but they have saved so much time and helped so many researchers advance NLP. If they manage to piss off the community, someone will rise up and challenge their dominance (see tensorflow vs pytorch).

38

weightedflowtime t1_j8r0bet wrote

I think that your post is more likely to get somewhere if reworded in a respectful way.

30

borisfin t1_j8upuj9 wrote

The huggingface devs will clean up their libraries over time. It's not fair to denounce the value and convenience they provide for new users. What other comparable options even are there?

13

qalis t1_j8r6o9x wrote

Completely agree. Their "side libraries" are even worse, such as Optimum. The design decisions there are not questionable, they are outright stupid at times. Like forcing the input to be a PyTorch tensor... and then converting it to a NumPy array inside. Without an option to pass a NumPy array. Even first-time interns at my company tend not to make such mistakes.
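A generic illustration of the kind of anti-pattern being complained about here; this is a hedged sketch of the shape of the problem, not the actual Optimum code:

```python
import numpy as np
import torch


def bad_predict(x: torch.Tensor) -> np.ndarray:
    # Anti-pattern: the signature insists on a torch.Tensor...
    if not isinstance(x, torch.Tensor):
        raise TypeError("expected a torch.Tensor")
    # ...only to convert it to NumPy immediately, forcing callers who
    # already hold NumPy arrays to round-trip through torch for nothing.
    return x.cpu().numpy() * 2.0


def better_predict(x) -> np.ndarray:
    # Accept either representation and normalize once at the boundary.
    x_np = x.cpu().numpy() if isinstance(x, torch.Tensor) else np.asarray(x)
    return x_np * 2.0
```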

12

fxmarty t1_j8r8inv wrote

Thank you for the feedback, I feel the same; it does not make much sense. My understanding is that the goal is to be compatible with transformers pipelines - but that makes things a bit illogical when trying to mix ONNX Runtime and PyTorch.

That said, Optimum is an open-source library, and you are very free to submit a PR or make this kind of request in the github issues!

7

krumb0y t1_j8r8mfl wrote

Why don't you build us a better alternative?

−6

qalis t1_j8r97h8 wrote

I do make PRs for those things. The average waiting time for a review is a few months. The average time for it to actually be released is even longer. I both support and criticize Huggingface.

17

dahdarknite t1_j8usx4b wrote

It’s literally software that you don’t pay a dime for. OK, there are bugs, but guess what? It’s fully open source, so you can fix them.

As someone who maintains an open source project in my spare time, there’s nothing that irks me more than entitled users.

11

drinkingsomuchcoffee OP t1_j8uvzdr wrote

This is such a terrible attitude to have. This isn't about money at all.

You don't pay for many services. Does this mean they should be able to treat you like garbage? Should Google be able to lock you out of all your services because their automated system falsely accused you? By your logic, you don't pay so you have no right to be annoyed.

HuggingFace is a for profit company. They will be asking for your money now or in the future. This isn't a bad thing, they need to eat too.

By even existing, HuggingFace has disincentivized possibly more competent devs from creating their own frameworks. That's fine, but it is a very real thing. In fact, it's pretty common for a business to corner a market at a loss and then ratchet up prices.

Finally you may work for a company that chooses HuggingFace and you will be forced to use the library whether you want to or not.

−1

t0t0t4t4 t1_j8ria22 wrote

Is there a specific reason why you have to use Hugging Face?

10

[deleted] t1_j8rpl3f wrote

[deleted]

10

baffo32 t1_j8vt8ob wrote

if we start one we’ll either make something good or bump into the project we accidentally duplicated as we get popular

1

XPlutonium t1_j8qy684 wrote

I appreciate and respect your rant, have been there

However, in the interest of both of us getting some good out of this: how about, next time you face an issue, opening an issue? If you can fix it as a community contribution, that's the gold standard, but even opening an issue will tell them where the problem is

While they’re trying to ‘hog’ the users for their platform, it can also be looked at as a way of democratising AI. There were MANY ML APIs where I just used HuggingFace because I don’t understand the ML itself, so I just call Hug and get the job done. I can understand why it’s buggy when the ecosystem itself moves so fast that you have to add features faster than you can fix old ones

So, you know, I relate. But in the interest of getting shit done, so to say, let’s try to fix it. Opening an issue, fixing the issue, writing competitive similar libraries, or EVEN AS LITTLE AS participating productively in the issue discussions or GitHub discussions (if there are any) will actually be a step in the direction of getting it done

8

andreichiffa t1_j8vqd42 wrote

It’s a RedHat for ML, and especially LLMs. You want clean internals and things that work? You pay the consulting/on-premises fees. In the meantime they are pushing forward FOSS models and supporting sharing and experimentation on established models.

I really don’t think you realize how much worse the domains that don’t have their HuggingFace are doing.

7

Ronny_Jotten t1_j8r9x0y wrote

Username checks out. Maybe cut down on the coffee.

6

Fit_Schedule5951 t1_j8scasi wrote

Well, huggingface is VERY convenient for inference. I work with speech, so if I need to train with existing/new models, I always go back to an established toolkit like fairseq/espnet/speechbrain etc.

6

Tawa-online t1_j8u8wtz wrote

So apart from Hugging Face, what other alternatives would you suggest using?

6

baffo32 t1_j8vrc4d wrote

HuggingFace recently implemented a PEFT library that reimplements the core functionality of AdapterHub. AdapterHub had reached out to them to contribute and integrate their work, but this failed in February of last year ( https://github.com/adapter-hub/adapter-transformers/issues/65#issuecomment-1031983053 ). Hugging Face was asked how the new work related to the old, and it was sad to see they had done it completely independently, completely ignoring the past outreach ( https://github.com/huggingface/peft/issues/92#issuecomment-1431227939 ). The reply reads to me as if they are implementing the same feature set, unaware that it is the same one.

I would like to know why this didn‘t go better. The person who spearheaded AdapterHub for years appears to be one of the most prominent PEFT researchers, with published papers. It looks as if they were tossed out in the snow. I can only imagine that management never learned of the outreach, or, equally likely, that they have no idea how to work with other projects to refactor concepts from multiple codebases together, or don’t consider it a good thing to do. It would have been nice to at least see lip service paid.

The library and hub are not complex. Is there a community alternative conducive to code organization or do we need to start yet another?

Sometimes I think it would make sense to train language models to transform the code, organize it, merge things, using techniques like langchain and chatgpt, to integrate future work into a more organized system.

Projects where everyone can work together are best.

6

tysam_and_co t1_j8x4sjv wrote

I have been torn about Huggingface. They provide some wonderful services to the community, but unfortunately the API design is very unintuitive and hard to work with, and the documentation is outdated. Also, much of the design tries to accommodate too many standards at once, I think, and switching between them or doing other similar things requires in-place operations or setting markers that permanently become part of an object, instead of a chain that I can update with normal control flow operations.

On top of that, far too many external libraries are installed with any hf stuff, and the library is very slow to load and to work with. I avoid it like the plague unless I'm required to use it, because it usually takes the most debugging time. For example, I spent well over half the time implementing a new method trying to debug huggingface, before just shutting down the server because I had already spent an hour, hour and a half tracing through the source code to try to fix it. And when I did, it was incredibly slow.

Now, that said, they also provide free models, and free access to datasets, like Imagenet. Do I wish it was an extremely light, fast, and simple wrapper? Yes. That would be great. But they do provide what they provide, and they put in a lot of effort to try to make it accessible to everyone. That's something that should not be ignored because of any potential personal beefs with the library.

All in all, it's a double-edged sword, and I wish there was a bit more simplicity, focus, self-containment, understandability and speed in the hf codebase at large. But at the same time, I sincerely appreciate the models and datasets services that they offer to the community, regardless of the hoops one might have to jump through to get them. If one stays within the HF ecosystem, certain things are indeed pretty easy.

I hope if anyone from HF is reading this that this doesn't feel like a total dunk or anything like that. Only that I'm very torn because it's a mixed bag, and I think I can see that a lot of care really did go into a lot of this codebase, and that I think it really could be tightened down a ton for the future. There are positives about HF despite my beefs with the code (HF spaces included within this particular calculus at hand).

4

Didicito t1_j8rr4s7 wrote

Yeah, software is hard, especially if it involves cutting-edge tech like the stuff published there. But I would consider it harmful ONLY if I detected monopolistic practices. If there are none, I don't have any reason to believe they are not doing their best, and the rest of the world can try to build something better.

2

ZCEyPFOYr0MWyHDQJZO4 t1_j8slixf wrote

My (very limited) experience is that HF needs to provide a much more stable API for their "production"-level libraries. Marking a library with a version <1.0.0 as "production" quality then introducing breaking API changes in a minor release (0.x.0) shouldn't be done unless necessary.

2

Seankala t1_j8u8ogp wrote

I hear my colleagues complain about the same thing. And then go back to doing AutoModel.from_pretrained(sdfsdf).

2

outthemirror t1_j8xotjc wrote

This is like complaining that Linux is bad because you have to debug various things.

2

threevox t1_j8sakww wrote

Replicate demos actually work consistently because they’re containerized

1

mrdrozdov t1_j8t57wp wrote

Huggingface is amazing, and it has a really active community. You can always go to the forum with questions.

1

pc_4_life t1_j8uzqxo wrote

Huggingface is an incredible library that makes NLP tasks trivial. If you can't get the same code to work on multiple machines, that's on you. Learn how to use docker and containerize your code.

1

dancingnightly t1_j8y7fny wrote

"If you look at the internals, it's a nightmare. A literal nightmare."

Yes, the copy paste button is heavily rinsed at HF HQ.

But you won't believe how much easier they made it to run, tokenize and train models in 2018-19, and at that, train compatible models.

We probably owe a month of NLP progress just to them coming in with those one liners and sensible argument API surfaces.


Now, yes, it's getting crazy - but if there's a new paradigm, a new complex way to code, then a similar library will simplify it, and we'll mostly jump there except for legacy. It'll become like scikit-learn (although that still holds up for most real ML tasks): lots of fine-grained detail and a slightly questionable number of edge cases (looking at the clustering algorithms in particular), but as easy as pie to keep going.


I personally couldn't ask for more. I was worried they were going to push auto-switching models to their API at some point, but they've been brilliant. There are bugs, but I've never seen them in inference (besides your classic CUDA OOM), and like Fit_Schedule5951 says, it's all about that with HF.

1

WarAndGeese t1_j90z7bb wrote

Shoutout to /r/huggingface/

1

pannous t1_j951ckx wrote

IDK, for me the models always work out of the box. Not doing anything fancy though, just three-liners: image to text, text to embedding...
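For readers wondering what those three-liners look like, a hedged sketch (the task names are standard transformers pipeline tasks; the checkpoint names and image path are placeholder assumptions):

```python
from transformers import pipeline

# Image to text (captioning).
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
caption = captioner("photo.jpg")

# Text to embedding.
embedder = pipeline("feature-extraction", model="bert-base-uncased")
embedding = embedder("an example sentence")  # nested list of floats
```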

1

muffdivemcgruff t1_j8r12hh wrote

Ever consider that in order to use these tools you need to build up your skills? I found huggingface after the Apple demo, and I found it quite easy to incorporate models; it just requires some skill in debugging.

−10