The developers of Manjaro Linux, a distribution based on Arch Linux and aimed at beginners, have announced the start of testing for a new service, MDD (Manjaro Data Donor), which collects statistics about the system and sends them to the project's external server. The author of MDD intended to enable telemetry by default (opt-out), but the decision has not yet been approved. Judging by the objections of some developers and users, telemetry will likely be offered as an option requiring the user's prior consent (a request to enable telemetry is proposed for the welcome screen shown after the first boot).
The report includes data such as the host name, kernel version, desktop component versions, detailed information about the hardware and drivers in use, screen size and resolution, network device MAC addresses, disk serial numbers, disk partition data, the number of running processes and installed packages, and the versions of basic packages such as systemd, gcc, bash and PipeWire.
The sent data is stored on the project server in the ClickHouse database and visualized using the Grafana platform. The IP addresses of users are not stored, and the hash from the /etc/machine-id file is used as the system identifier.
According to the code (https://github.com/manjaro/mdd/blob/master/mdd.py#L40), it sends everything.
I don’t get why someone would use Manjaro after so many fuckups… If you don’t know what I’m talking about, you’re either too new to Linux or don’t care. Just look for “manjaro certificates” or “manjaro drama” and you’ll find out for yourself.
Don’t like it, don’t opt in
Even Debian has popcon
There are lots of benefits for developers to gather telemetry.
Don’t like that? Fork and do your own distro (presumably though you don’t contribute anything to open source, so I’d expect such people to simply whine and get angry at contributors)
Debian popcon is opt-in, first of all.
Q) What information is reported by popularity-contest ?
A) popularity-contest reports the system vendor [1], the system architecture you use, the version of popularity-contest you use and the list of packages installed on your system. For each package, popularity-contest looks at the most recently used (based on atime) files, and reports the filename, its last access time (atime) and last change time (ctime). However, some files are not considered, because they have unreliable atime. For privacy reasons, the times are truncated to multiple of twelve hours.
[1] i.e. the dpkg Vendor field, see dpkg-vendor(1).
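To make the twelve-hour truncation from the FAQ answer concrete, here is a minimal sketch. It is not popularity-contest's actual code; the function name `truncate_to_twelve_hours` is made up for illustration, and it assumes times are handled as Unix timestamps in seconds.

```python
from datetime import datetime, timezone

TWELVE_HOURS = 12 * 60 * 60  # seconds


def truncate_to_twelve_hours(timestamp: int) -> int:
    """Round a Unix timestamp down to a multiple of twelve hours."""
    return timestamp - (timestamp % TWELVE_HOURS)


if __name__ == "__main__":
    # Hypothetical access time: 2024-05-17 15:42:09 UTC
    atime = int(datetime(2024, 5, 17, 15, 42, 9, tzinfo=timezone.utc).timestamp())
    coarse = truncate_to_twelve_hours(atime)
    # Only the half-day bucket survives, not the exact moment of access.
    print(datetime.fromtimestamp(coarse, tz=timezone.utc).isoformat())
    # -> 2024-05-17T12:00:00+00:00
```

Every access within the same half-day collapses to the same value, so the report says roughly when a package was used but not exactly when.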
So no fucking MAC addresses and machine-ids and harddrive serial numbers and stuff.
They only want package statistics, the point being to have statistics about the popularity of packages, mainly so they can be prioritized for the CD/DVD isos. You know, information that actually has a use, not hardware identifiers that can only be used for tracking purposes.
Each popularity-contest host is identified by a random 128bit uuid (MY_HOSTID in /etc/popularity-contest.conf). This uuid is used to track submissions issued by the same host. It should be kept secret.
Oh, and by default, IP, unless usetor is enabled
A machine ID is just a hash too
Can you explain to me how you track MAC addresses and serial numbers over the internet?
Just fyi, the backend project I made 20 years ago was hardware related. There are potential reasons to grab this info…
But, if it is a concern, I’m sure they’d welcome submissions to improve the parsing and allow things to be filtered.
In fact, popcon could be used for digital fingerprinting technically
In all likelihood, OP never spoke to the Manjaro developers either
Yeah, my only concern here was if it was opt-out. That’d be bad.
Now I completely understand the developer on this. This is useful info to have to help decide future changes/features and general direction, but balancing the right to privacy means this kind of data provision should ALWAYS be opt-in. Microsoft, you hearing me here?
- users can be identified
- probably Opt-out (still in discussion)
Two nogos combined makes nonogogos. Why do they need host name, MAC address and disk serial numbers? Why can’t people set how much they want to send in, like KDE Plasma does? Will the data be shown to the user before it’s sent in? Steam does that perfectly (it shows the data and it’s opt-in) and that is even a proprietary application. Telemetry is okay if it’s done right: without user identification, opt-in and not hiding what’s sent, preferably with multiple levels of what is being sent.
I used Manjaro before and switched to EndeavourOS because I was not happy. Now I am. Manjaro can’t stop being stupid (not the users, I’m not attacking any user here, only the maintainers or developers of Manjaro).
The way I read it, the developer wanted opt-out but it’s likely it will be opt-in. I’m fine with opt-in and vehemently against opt-out for telemetry.
I would prefer the information was statistical only. Rather than hostname (making the assumption they only want hostname to be able to somehow separate the data to follow changes over time), a much better idea would be some kind of hash based on information unlikely to change, but with enough input that brute-forcing the original data out of the hash would be impractical. So all they know is, this data came from the same machine, but they cannot ID the machine. Maybe some kind of unique but otherwise untrackable ID is created at install time and ONLY used for this purpose and no other.
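A rough sketch of the install-time ID idea, for what it's worth. This is not anything MDD does; the function names and the file location passed in are hypothetical. The ID is purely random (`uuid.uuid4()`), so it is not derived from the hostname, machine-id or any hardware serial, and deleting the file gives the machine a fresh identity.

```python
import tempfile
import uuid
from pathlib import Path


def get_or_create_telemetry_id(path: Path) -> str:
    """Return a random ID used only for telemetry, creating it on first use."""
    if path.exists():
        return path.read_text().strip()
    new_id = str(uuid.uuid4())  # random; carries no hardware or host information
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id + "\n")
    return new_id


def reset_telemetry_id(path: Path) -> None:
    """Forget the current ID; the next report gets a new, unlinkable one."""
    path.unlink(missing_ok=True)


if __name__ == "__main__":
    # Demo in a temporary directory; a real location would be a distro decision.
    with tempfile.TemporaryDirectory() as tmp:
        id_file = Path(tmp) / "telemetry" / "id"
        first = get_or_create_telemetry_id(id_file)
        print(first == get_or_create_telemetry_id(id_file))  # stable across reports
        reset_telemetry_id(id_file)
        print(first == get_or_create_telemetry_id(id_file))  # new ID after reset
```

The server could still follow one install over time, but the value cannot be matched against anything else on the system and the user can rotate it at will.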
This may be illegal in the EU if they don’t use opt-in. Even then, it may be illegal to collect MAC addresses and disk serial numbers from users under 18, as those can potentially be used for identification.
The data is anonymized, and the IP is NOT stored. So I’m not sure this violates GDPR?
From the code we can see the machine ID is anonymized, sending only a SHA256 checksum.
import hashlib
import uuid

def get_hashed_device_id():
    # Read the machine ID
    with open("/etc/machine-id", "r") as f:
        machine_id = f.read().strip()
    # Hash the machine ID using SHA-256 to anonymize it
    hashed_id = hashlib.sha256(machine_id.encode()).digest()
    # Convert the first 16 bytes of the hash to a UUID (version 5 UUID format)
    return str(uuid.UUID(bytes=hashed_id[:16], version=5))
This makes it somewhat a nothingburger IMO.
That’s not anonymous, that’s pseudonymous.
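To illustrate the difference, here is a small sketch that mirrors the hashing step quoted above but takes the machine ID as an argument (that parameter, the sample IDs and the report layout are my assumptions, not MDD's code). The hash hides the original value, yet the same machine always produces the same identifier, so separate reports can be linked to each other.

```python
import hashlib
import uuid


def hashed_device_id(machine_id: str) -> str:
    """Same idea as the quoted function: SHA-256, first 16 bytes as a UUID."""
    digest = hashlib.sha256(machine_id.encode()).digest()
    return str(uuid.UUID(bytes=digest[:16], version=5))


if __name__ == "__main__":
    machine_a = "0123456789abcdef0123456789abcdef"  # made-up machine IDs
    machine_b = "fedcba9876543210fedcba9876543210"

    # Hypothetical reports arriving at the server on different days.
    reports = [
        {"day": 1, "device": hashed_device_id(machine_a), "kernel": "6.6"},
        {"day": 2, "device": hashed_device_id(machine_b), "kernel": "6.6"},
        {"day": 9, "device": hashed_device_id(machine_a), "kernel": "6.9"},
    ]

    # Grouping by the hashed ID reconstructs each machine's history.
    history = {}
    for report in reports:
        history.setdefault(report["device"], []).append(report["day"])
    print(sorted(history.values()))  # -> [[1, 9], [2]]
```

That stable link across submissions is what makes the identifier pseudonymous rather than anonymous.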
What is the point of this? The machine-id already looks to be some unique random number, so you’re calculating another unique random-looking number from that, might as well use the original number.
You can’t glean any useful information from a unique random-looking number that would help with developing Manjaro. You can’t calculate any statistics from that. The only use is tracking.
Edit: And as mentioned in my other comment, reversing the MAC SHA by brute force is trivial, so that one at least (and possibly the other hardware serial numbers they collect) shouldn’t even be considered pseudonymous.
Nah, it’s still considered Personal Data under GDPR, because it’s possible to connect to natural persons. So GDPR applies. And this is illegal, there is no legal basis for processing this data.
because it’s possible to connect to natural persons.
That’s debatable, and is only based on the claim that it’s just a 24bit decoding that can be brute forced. I don’t know for a fact that it’s true that it can be boiled down to 24bit.
I checked my own /etc/machine-id, and the folder doesn’t even exist, so what exactly is supposed to be in it IDK. And yes I use Manjaro.
I edited my comment on your other reply and by my estimation, calculating every SHA256 of all MACs ever potentially issued takes less than 89 seconds on an RTX 3090.
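For anyone who wants to see why a hashed MAC is so weak, here is a minimal sketch of the brute-force idea. Big assumptions: the MAC is hashed as a lowercase, colon-separated string with plain unsalted SHA-256 (I have not checked how MDD formats it), the vendor prefix `0x001122` is made up, and the helper names are mine. With a known 24-bit vendor prefix (OUI), only the remaining 24 bits need to be searched; the demo restricts the range further so it finishes instantly.

```python
import hashlib
from typing import Optional


def format_mac(value: int) -> str:
    """Render a 48-bit integer as a lowercase colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(6, "big"))


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def recover_mac(target_hash: str, oui: int,
                suffix_range: range = range(1 << 24)) -> Optional[str]:
    """Try every device suffix under one vendor prefix until the hash matches."""
    for suffix in suffix_range:
        candidate = format_mac((oui << 24) | suffix)
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    oui = 0x001122  # made-up vendor prefix
    secret_mac = format_mac((oui << 24) | 0x00ABCD)
    leaked_hash = sha256_hex(secret_mac)

    # Demo searches only the first 2**16 suffixes; the full range is 2**24.
    recovered = recover_mac(leaked_hash, oui, range(1 << 16))
    print(recovered == secret_mac)  # -> True
```

Even the full 2**24 range per prefix is about 16.7 million hashes, which is small by GPU standards; the open question is only how many prefixes have to be tried.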
I also think MACs are (or should be considered) personally identifiable information, since there is potentially a paper trail back to the person who bought the device. Plus MACs are not secret information: they’re broadcast on the LAN and, for wireless modules, over the air in the immediate vicinity (though some systems will randomize wireless MACs for privacy reasons). Privacy-unfriendly software has been known to collect MACs (even from other devices on the network and in the vicinity), so there are already databases connecting MAC addresses with other data.
I just don’t see a good reason to use Manjaro and many reasons not to.
Like if you’re going to use Arch btw, go all the way and use actual Arch.