Publishers Always Innovating (mander.xyz)

posted 2 months ago

fossilesque@mander.xyz

science_memes@mander.xyz

39 comments

keepthepace@slrpnk.net

2 points

2 months ago

Yes, PDFs are much more permissive and may not have any semantic information at all. Hell, some old publications are just scanned images!

Going from PDF to semantic structure seems to be a hard problem that basically requires OCR, like these people are doing.

JackbyDev@programming.dev

2 points

2 months ago

Oh nice, thanks for sharing that project. I haven’t heard of it before!

thevoidzero@lemmy.world

1 point

2 months ago

Not just semantics. PDFs don’t even have segmentation like spaces, lines, or paragraphs. It’s just text drawn at whatever locations the word processor or other software placed it. Many PDF editors just detect how close the characters are to each other to group them together.

And one step further, you can convert text to paths. Then the file won’t even have glyph (character) info or font info; all the characters will just be geometric shapes. In that case you can’t even copy the text. OCR is your only choice.

PDF is for finalizing something and printing/sharing without the ability to edit.
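
To make the "group characters by closeness" point concrete, here is a minimal Python sketch of how a tool might rebuild words and lines from positioned glyphs. Everything in it is an assumption for illustration: the `Glyph` record, the `group_into_lines` function, and the `gap_ratio` and `line_tolerance` thresholds are hypothetical and not taken from any particular PDF library, and real extractors use more elaborate heuristics.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one drawn character (not a real PDF API)."""
    char: str
    x: float      # left edge of the glyph
    y: float      # baseline; PDF y coordinates grow upward
    width: float  # horizontal advance of the glyph


def group_into_lines(glyphs, gap_ratio=0.3, line_tolerance=2.0):
    """Rebuild text lines from glyph positions using proximity alone."""
    # 1. Cluster glyphs into lines: similar baselines belong together.
    lines = []
    for g in sorted(glyphs, key=lambda g: -g.y):  # top of page first
        if lines and abs(lines[-1][0].y - g.y) <= line_tolerance:
            lines[-1].append(g)
        else:
            lines.append([g])

    # 2. Within each line, read left to right and insert a space
    #    wherever the horizontal gap is large relative to glyph width.
    result = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        text = line[0].char
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * prev.width:
                text += " "
            text += cur.char
        result.append(text)
    return result


# Glyphs arrive in arbitrary drawing order, with no spaces or line breaks.
sample = [
    Glyph("y", 14, 100, 5), Glyph("k", 5, 88, 5), Glyph("H", 0, 100, 5),
    Glyph("u", 24, 100, 5), Glyph("o", 0, 88, 5), Glyph("i", 5, 100, 5),
    Glyph("o", 19, 100, 5),
]
print(group_into_lines(sample))  # ['Hi you', 'ok']
```

The thresholds are guesses, which is the fragile part: tight kerning, justified text, or multi-column layouts can easily fool this kind of heuristic.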
