
AdComfortable1514

AdComfortable1514@lemmy.world
7 posts • 8 comments

Simple and cool.

Florence 2 image captioning sounds interesting to use.

Does anyone know of any other image-to-text models (apart from CLIP)?


Wow, yeah, I found a demo here: https://huggingface.co/spaces/Qwen/Qwen2.5

A whole host of LLMs seems to have been released. Thanks for the tip!

I’ll see if I can turn them into something useful 👍


That’s good to know. I’ll try them out. Thanks.


Hmm. I mean, the FLUX model looks good, so maybe there is some magic in the T5?

I have no clue, so any insights are welcome.

T5 Huggingface: https://huggingface.co/docs/transformers/model_doc/t5

T5 paper: https://arxiv.org/pdf/1910.10683

Any suggestions on what LLM I ought to use instead of T5?


New stuff

Paper: https://arxiv.org/abs/2303.03032

Takes only a few seconds to calculate.

Most similar suffix tokens: "vfx cleanup |warcraft |defend |avatar |wall |blu |indigo |dfs |bluetooth |orian |alliance |defence |defenses |defense |guardians |descendants |navis |raid |avengersendgame "

Most similar prefix tokens: "imperi-blue-|bluec-|war-|blau-|veer-|blu-|vau-|bloo-|taun-|kavan-|kair-|storm-|anarch-|purple-|honor-|spartan-|swar-|raun-|andor-"
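
For anyone curious how a "most similar tokens" list like this can be produced, here is a minimal sketch. It assumes the ranking is done by cosine similarity between a query embedding and each token's embedding, which may not be what the paper actually does. The `toy_vocab` vectors, `toy_query`, and the function names are all made up for illustration; real embeddings would come from the text encoder.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_tokens(query, vocab, top_k=3):
    """Return the top_k tokens whose embeddings are closest to the query."""
    scored = sorted(
        vocab.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [token for token, _ in scored[:top_k]]

# Hypothetical 3-dimensional embeddings, NOT real model weights.
toy_vocab = {
    "blu": [0.9, 0.1, 0.0],
    "indigo": [0.8, 0.2, 0.1],
    "defense": [0.1, 0.9, 0.2],
    "wall": [0.0, 0.3, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # stand-in for the encoded prompt or image

ranked = most_similar_tokens(toy_query, toy_vocab, top_k=2)
print(" |".join(ranked))
```

With a real vocabulary the loop is the same, only the vectors are larger, so a run over a few tens of thousands of tokens in a few seconds is plausible.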


I count casualty_rate = number_shot / (number_shot + number_subdued)

In this case that gives 22/64 ≈ 34% casualty rate for civilians

and 98/131 ≈ 75% casualty rate for police
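
To make the arithmetic easy to check, here is a minimal Python sketch of the formula above. The shot counts and totals (22 of 64, 98 of 131) are the ones quoted in this comment; the "subdued" counts are derived as total minus shot, which assumes every incident ended one of those two ways.

```python
def casualty_rate(number_shot: int, number_subdued: int) -> float:
    """Share of incidents that ended with the attacker being shot."""
    total = number_shot + number_subdued
    if total == 0:
        raise ValueError("no incidents to compute a rate from")
    return number_shot / total

# Figures from the comment: 22 shot out of 64 (civilians),
# 98 shot out of 131 (police). Subdued = total - shot (assumption).
civilians = casualty_rate(22, 64 - 22)
police = casualty_rate(98, 131 - 98)

print(f"civilians: {civilians:.0%}")  # 34%
print(f"police: {police:.0%}")        # 75%
```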


So it's 64 vs. 131 between work done by bystanders and work done by police?

And casualty rate is actually lower for bystanders doing the work (with their guns) than the police?
