• 0 Posts
  • 29 Comments
Joined 6 months ago
Cake day: June 29th, 2025

  • Ulefone is a tiny Chinese company that employs between 500 and 1000 employees. It’s far, far, far from being a “corporation”.

    They market themselves as approachable and say they have a 24/7 chat available. Sure, you’d probably start out talking to a bot, but you could end up talking to a human.

    But, well, you do you. Bought a phone, spent months troubleshooting on your own, and now want to stop using it completely - if you feel good about it, it’s all good.

    But if I were in your shoes, I’d probably at least see what happens if I contact support.

  • Spreading knowledge via machine translation where there are no human translators available, had to be better than not translating

    Have you not read my entire comment…?

    One of the Greenlandic Wiki articles “claimed Canada had only 41 inhabitants”. What use is a text like that? In what world is learning that Canada has 41 inhabitants better than going to the English version of the article and translating it yourself?

    Perhaps part of the solution is machine readable citations

    The contents of the citations are already used for training, as long as they’re publicly available. That’s not the problem. The problem is that LLMs do not understand context well, they are not, well, intelligent.

    The “Chinese Room” thought experiment explains it best, I think: imagine you’re in a room with writing utensils and a manual. Every now and again a letter falls into the room through a slit in the wall. Your task is to take the letter and use the manual to write a response. If you see such and such shape, you’re supposed to write this and that shape on the reply paper, etc. Once you’re done, you throw the reply out through the slit. This goes back and forth.

    To the person on the other side of the wall it seems like they’re having a conversation with someone fluent in Chinese whereas you’re just painting shapes based on what the manual tells you.

    LLMs don’t understand the prompts - they generate responses based on the probability of certain characters or words or sentences being next to each other when the prompt contains certain characters, words, and sentences. That’s all there is.
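
    To make the “probability of words being next to each other” idea concrete, here is a deliberately tiny sketch. It is only a toy: real LLMs use neural networks over tokens with far more context, not simple word-pair counts. The corpus and the function names (`train_bigrams`, `most_likely_next`) are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text: the wrong claim simply appears more often.
corpus = (
    "canada has 41 inhabitants . "
    "canada has 41 inhabitants . "
    "canada has many inhabitants ."
)
model = train_bigrams(corpus)
print(most_likely_next(model, "has"))  # the follower seen most often wins
```

    The toy model continues “has” with whatever followed it most often in its training text. It has no way to check whether that continuation is true.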

    There was a famous botched experiment where scientists were training an AI model to detect tumours. It got really accurate on the training data so they tested it on new cases gathered more recently. It gave a 100% certainty of a tumour being present if the photograph analysed had a yellow ruler on it, because most photos of tumours in the training data had that ruler for scale.
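
    The ruler shortcut can be sketched with a toy “learner” that just picks whichever single feature best matches the labels in its training data. Everything here is hypothetical (the data, the feature names, the function `best_single_feature`) and it is not how the actual study worked. It only shows why a perfectly correlated but irrelevant feature wins.

```python
def best_single_feature(rows, feature_names):
    """Pick the feature whose value most often equals the label."""
    def accuracy(name):
        return sum(row[name] == row["tumour"] for row in rows) / len(rows)
    return max(feature_names, key=accuracy)

# Hypothetical training set: every tumour photo happens to include a ruler.
train = [
    {"ruler": 1, "irregular_border": 1, "tumour": 1},
    {"ruler": 1, "irregular_border": 0, "tumour": 1},
    {"ruler": 1, "irregular_border": 1, "tumour": 1},
    {"ruler": 0, "irregular_border": 0, "tumour": 0},
    {"ruler": 0, "irregular_border": 1, "tumour": 0},
    {"ruler": 0, "irregular_border": 0, "tumour": 0},
]

chosen = best_single_feature(train, ["ruler", "irregular_border"])

# A new, healthy case that was photographed with a ruler for scale.
new_case = {"ruler": 1, "irregular_border": 0, "tumour": 0}
prediction = new_case[chosen]  # the "model" just copies the chosen feature
print(chosen, prediction)
```

    On the training data the ruler is a perfect predictor, so it gets chosen, and the healthy new case is flagged as a tumour.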

    But even then you have huge gaps on one side with untrustworthy humans (like comedy) and on the other side with machine generated facts such as from a database

    “Machine generated facts” are not facts, they’re just hallucinations and falsehoods. It is 100% better to NOT have them at all and have to resort to the English wiki, than have them and learn bullshit.

    Especially because, again, the contents of Wikipedia are absolutely being used for training further LLMs. The more errors there are, the worse the models become, eventually leading to a collapse of truth. We are already seeing this with whole “research” publications being generated, including “source” material invented on the spot, proving bogus results.


  • I’m not “trying to be nice to minority languages”, I’m directly pushing back against the chauvinistic idea that the English Wikipedia is so important that those without it are somehow inferior. There is no “doom spiral”.

    I think you missed the problem described here.

    The “doom spiral” is not because of the English Wiki; the English Wiki has nothing to do with it.

    The problem described is that people who don’t know a “niche” language try to contribute to a niche Wiki by using machine translation/LLMs.

    As per the article:

    Virtually every single article had been published by people who did not actually speak the language. Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed. But what worried him most was something else: Over time, he had noticed that a growing number of articles appeared to be copy-pasted into Wikipedia by people using machine translators. They were riddled with elementary mistakes—from grammatical blunders to meaningless words to more significant inaccuracies, like an entry that claimed Canada had only 41 inhabitants. Other pages sometimes contained random strings of letters spat out by machines that were unable to find suitable Greenlandic words to express themselves.

    Now, another problem is Model Collapse (or, well, a similar phenomenon strictly in terms of language itself).

    We now have a bunch of “niche” languages’ Wikis containing such errors… that are being used to train machine translators and LLMs to handle these languages. This is contaminating their input data with errors and hallucinations, but since this is the training data, these LLMs consider everything in there as the truth, propagating the errors/hallucinations forward.
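
    A back-of-the-envelope sketch of that propagation, under loud assumptions: each round of “translate, publish, retrain” corrupts a fixed share of the facts that are still correct, and nobody ever fixes anything. The function name `correct_fraction` and the 10% rate are invented for illustration, not measured values.

```python
def correct_fraction(generations, error_rate):
    """Share of facts still correct after repeated retraining on own output.

    Assumes (hypothetically) that every generation corrupts `error_rate`
    of the remaining correct facts and that errors are never repaired.
    """
    correct = 1.0
    history = [correct]
    for _ in range(generations):
        correct *= 1.0 - error_rate
        history.append(correct)
    return history

for generation, share in enumerate(correct_fraction(5, 0.10)):
    print(f"generation {generation}: {share:.1%} of facts still correct")
```

    Under these assumptions the share of correct facts can only shrink, because each generation treats the previous generation’s output as ground truth.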

    I honestly have no clue where you’re getting anything chauvinistic here. The problem is imperfect technology being misused by irresponsible people.