YouTube is a bad joke these days, and in particular the search algorithm is hilariously biased towards corporate/state propaganda and, if you're not searching, towards brain-dead TikTok-like content. China, amusingly enough, bans such content in its own country but is happy to promote it abroad. I see this as proof of my general conjecture that dumbing down the American population is something the global elites continue to pursue as a means of retaining their grasp on power.
As for myself, I was shadow-banned on YouTube well over a year ago, most likely for making cynical comments about the ratio of ads to content on several popular channels, and perhaps for making a habit of reporting to content creators when their content had been age-restricted for no apparent reason. As a result, I will never buy anything I see recommended to me in a YouTube ad (which I mostly avoid via uBlock Origin and youtube_dl), and I have nothing but contempt for the Alphabet/Google brand and all who ride with it.
We can do better than this: Europe's data privacy laws should be ported to the USA, and all the creepy data merchants should be forced out of business posthaste.
P.S. Stop calling them FAANG; they're nothing but MAMAA... as in the undue influence of Big MAMAA in the tech sector, which also ties into their ancestor, Ma Bell.
I've noticed that unless you spell out channel names, you will only be shown videos from verified/checkmark accounts, no matter how far down you scroll. The only way to see videos made by "the people" is restricting search to last week or shorter time periods. Controversy-adjacent search terms are sanitized and rendered useless. Channels and videos are shadow banned so that even if you remember exact video titles you will never be able to find them.
If despite these measures you somehow end up on an interesting video, the recommendations sidebar is scrubbed clean and shows global popular videos instead of the usual similar videos watched by other people.
Clearly optimizing for total watchtime and avoiding controversy is not aligned with creating a great search and recommendation algorithm.
>I've noticed that unless you spell out channel names, you will only be shown videos from verified/checkmark accounts, no matter how far down you scroll.
>If despite these measures you somehow end up on an interesting video, the recommendations sidebar is scrubbed clean and shows global popular videos instead of the usual similar videos watched by other people.
Are you talking about all videos, or just controversy-adjacent videos? I know I've seen people online complaining in the past that YouTube sometimes leads to people going down a rabbit hole of crazy videos with the recommendations. If you're just referring to controversial videos, what you say sounds like YouTube fixed the rabbit hole problem. If you're referring to all videos, I haven't experienced that. I get mostly good recommendations.
I believe he was referring to politics-linked searches on YouTube.
> Are you talking about all videos, or just controversy-adjacent videos? I know I've seen people online complaining in the past that YouTube sometimes leads to people going down a rabbit hole of crazy videos with the recommendations. If you're just referring to controversial videos, what you say sounds like YouTube fixed the rabbit hole problem. If you're referring to all videos, I haven't experienced that. I get mostly good recommendations.
YouTube's solution consists of two things: upranking an entire class of videos/channels (mainstream media channels) and additionally deranking other channels/videos (I've had the experience of looking up a politician by full name and never finding his high-view YouTube channel unless I click Filters -> Channel). All of that with zero transparency (that would mean giving us the list of all artificially reranked channels and the new rank weights) and no government, or at least independent, oversight. At Google's scale, promoting certain views over others poses problems of interference with elections and with the public mind (see Manufacturing Consent). The political-interference problem existed before (when the rabbit-hole effect ended up promoting Stefan Molyneux), but it has simply been inverted, now promoting US ideological interests.
What YouTube is doing violates Popper's 'open society' principles.
I tried so hard to find a YouTube video I found in 2020 (here it is: https://www.youtube.com/watch?v=amUMV6xDqbo ) using YouTube search and Google search and couldn't. Using Bing, I find that video in minutes. Whether people ascribe it to incompetence or sinister motives, the end result is the same. I don't understand why there isn't public and democratic oversight over these algorithms. YouTube can literally sway elections by promoting some sources over others; and they demonstrably and blatantly promote mainstream media channels over independent and first-hand channels.
Is it very difficult? So many of these spam accounts follow the exact same format, with names following some variation of "Free stuff Telegram: +1234…" or "s3xY p1cs c0mE 2 mY chAnNeL", etc. Tons of them post these obviously made-up threads about how "Dr John Smith" has helped them triple their crypto investments.
There's probably a dozen or so templates that many of these spammers and scammers use, and if you can spot them in a second, it's not hard to train a model to recognize them also.
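To make that concrete, here's a minimal sketch of such a template matcher in Python; the patterns and the looks_like_template_spam helper are illustrative assumptions on my part, not anything YouTube actually runs:

    import re

    # Illustrative templates only; a real list would be curated from
    # observed spam and updated as spammers adapt.
    SPAM_PATTERNS = [
        re.compile(r"telegram[\s:]*\+?\d{4,}", re.I),                   # "Telegram: +1234..."
        re.compile(r"[s5][e3]xy\s*p[i1]c[s5]", re.I),                   # "s3xY p1cs..."
        re.compile(r"\bdr\.?\s+\w+\s+\w+\b.*\b(crypto|invest)", re.I),  # fake testimonials
    ]

    def looks_like_template_spam(text: str) -> bool:
        """Return True if the text matches any known spam template."""
        return any(p.search(text) for p in SPAM_PATTERNS)

    print(looks_like_template_spam("Free stuff Telegram: +1234567890"))  # True
    print(looks_like_template_spam("Great video, thanks!"))              # False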
It is out of control because YouTube does not care to fix it, not because it's some insurmountably hard problem that one of the top AI companies in the world just can't figure out.
I can totally see how, if you assume it's simple and easy to fix, it looks like just another case of people not caring. Yet here we are, replying to a post about the not-ideal steps the company is taking. And further, this is a problem for multiple large companies with user commenting and chat, from Google through Roblox, Twitch, Twitter and beyond. There are a few things that make naive solutions impractical:
* Cost to build them.
* Cost to keep them up to date in a highly adversarial environment. Note this might mean scrapping the solution and starting again.
* Cost of running them at huge scale.
Particularly important is understanding that the attack surface is so large and the cost to the opposition so low that people will defeat your approach recreationally, just to spam obscenities.
There is a community project out there to filter and delete most of the spam and many creators use it. It was developed primarily by one person in their spare time. Google has no excuse.
The common view on HN seems to be that Google products optimize for promotions, and not even executive or entry-level promotions, but primarily middle-management promotions.
I don't think that's entirely true, as hopefully the finance department has folks smart enough to realize whether headcount is adding positive or negative value.
Guy makes concrete suggestions; you respond with abstract rebuttals that add up to "it's too hard". All he's saying is the low-hanging fruit is pluckable; pluck it.
Guy doesn’t understand why perceived low hanging fruit hasn’t been plucked and I explained why it’s not that simple or low hanging. That’s fine, naive filtering (now with added ML!) is the first idea everyone comes up with. Then they learn about the Scunthorpe problem and beyond.
Gmail, Twitter, Facebook and Reddit have all but eliminated spam comments, yet they are everywhere on YouTube.
Regarding scale: this is the company that scans every second of every video uploaded for copyrighted material. I'm sure their servers could handle a couple of (channel-configurable?) regular expressions per comment.
Regarding Scunthorpe: they clearly don't care about false positive flagging when it comes to ContentID and video reporting, so it's definitely not what's stopping them here. And if they suddenly do, they have the resources for human review and appeals like everyone else is doing.
Regarding low-hanging fruit: a handful of individuals have done this themselves already and it works great [0]. I think Linus Media Group started using this on all their channels, and the difference has been night and day.
Sure, it's hard to get around sophisticated operators, but most of the spam on YT that I see is not sophisticated.
You wouldn't even need ML for the majority of them.
There is actually an open-source tool[1] that can do this for individual creators. That tool finds a lot of spam that YT itself does not, and it's developed by a single guy.
So there definitely is plenty of opportunity to pluck the low-hanging fruit.
But Google probably has hundreds of PhDs developing some kind of ML that ends up performing worse than some simple regexes, because you don't get promoted for simple solutions.
Why else has Google not cracked down on spam accounts? It has the tech, it knows it has a problem and the best remedy they can come up with is shortened YouTube usernames? There's more to it behind the scenes.
A real example is that I worked for a business where my job was to sell email addresses gathered from soft-core pornography websites. A batch of working emails could easily fetch £500.
Reddit will sell you accounts for your "campaign" if it profits reddit and it's all the same for any big walled garden.
> Then they learn about the Scunthorpe problem and beyond.
I comment pretty regularly on youtube to get random thoughts out of my head about the video I'm watching (I'm under no delusion that someone relevant will see my comment), and all the spam replies I get use unicode numbers in the username to share phone/telegram numbers. Circled numbers [0] are a favorite, though not the only ones they use. The Scunthorpe problem doesn't seem very relevant here.
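For what it's worth, a single NFKC normalization pass in Python's stdlib already folds circled digits back to ASCII, so a filter could match on the normalized text. A minimal sketch (the helper name is my own):

    import unicodedata

    def fold_confusable_digits(text: str) -> str:
        """Circled digits like U+2460 carry a compatibility decomposition
        to plain ASCII digits, so one NFKC pass defeats this evasion."""
        return unicodedata.normalize("NFKC", text)

    name = "text me on telegram at \u2460\u2461\u2462\u2463\u2464"  # circled 1-5
    print(fold_confusable_digits(name))  # "text me on telegram at 12345"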
It is an adversarial space; the reason the names look ridiculous is that they have evolved to escape pattern matching as it, too, evolved.
YouTube and Google are notoriously unhelpful when they catch a human in their automated dragnets. They lack the capacity to support their platforms, so they are probably being cautious not to create a support burden and ill will they can't handle.
It's hard, if you're insisting on doing it the Google way, i.e. ML+AI with zero humans involved.
Having a person keep an eye on the top 50 popular channels and spotting the patterns in commenter names would be very effective, but having a human in the loop is not the Google way.
Exactly, this is my guess too. Literally everyone is seeing these very obvious spam comments everywhere. Have a few people monitor comment sections manually, then maybe one guy goes over the spam and comes up with a regex-like matching rule to detect it and move it to the "potential spam" section of the creator's dashboard. It's gonna be whack-a-mole, but it's responsive. I get it, they want to automate everything, but evidently that only gets you so far: your model will never be at 100%, and we haven't even gotten to false positives yet.
That means you're now selecting for all the spammers who somehow evade the eyes of that crew. It doesn't matter whether they do it intentionally at first; this unnatural selection will inevitably produce spammers who have figured out how not to be seen by those Google people.
It sounds like you're assuming that Google sticks to a single approach and doesn't waver from it, despite spam still being present. They wouldn't do it like that, though, would they?
The point is more that, unless Google keeps continuously expanding their approach along every dimension, whichever spammers happen to be evading them at any given point will stick around and multiply.
But what are you suggesting then? Not even trying and accepting the fate of YouTube drowning in spam? Keep chasing the dream of the perfect spam fighting AI that's been just around the corner for the last 5 years?
Incidentally, LTT just commented on the announced changes and rants about the prior situation at 15:20, for some insight from a creator's perspective.
The thing about spam is that as it becomes more evasive, it's also less clear for the humans it was intended for. Once they raise the bar high enough, the spam will be gone.
Yeah, it's not working though. I bet an engineer who trawls the comments for spam every day, writing one manual filter a day, would be pretty effective.
Same for Instagram and Twitter. The formats of the obvious spam have been super simple to detect on both platforms for years.
Yet I haven't seen ANY progress on either platform. Assuming that, as a whole tech company, they are much more competent than a random dude like me, the only possible explanation is that they have an incentive not to prevent spam.
Perhaps: spam = more notifications delivered = more app opens = more potential feed engagement since they've opened the app even if from a spam notification = more ads shown = more revenue = happier shareholders short time... all at the expense of losing long-term trust to the platforms.
Have you been working on actually detecting them or are you assuming they’re easy to detect? Did you go digging for false negatives or false positives in other languages? Considering that not all special characters are printable, these platforms don’t limit their users to ASCII, random foreign words are common in other languages, the false positive rate has to be pretty much zero, etc etc etc I’ll bet it’s a whole lot more sophisticated than you imagine.
About 90% of the public spam I see on YT still follows this ultra-simple template: one person asks something in the top comment, another replies with a faux-legitimate message.
The question is often "does anyone know how to pirate videos?", "...how to invest in crypto", "...how to get fake instagram followers", life coaches, etc.
It's not really custom messages with special chars; it's something I could detect with Ctrl+F in the browser. It's not something they'd need special tools on their end to catch. The only thing that changes is the name of the users.
So yes, I'm gonna agree with GP that "the formats of the obvious spam" are super easy to detect.
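To show how crude the detection could be, here's a sketch of a matcher for the ask-and-answer template; the bait phrases and helper are illustrative assumptions, not a real filter:

    # Illustrative bait phrases and endorsement markers; a real list
    # would be maintained from observed spam threads.
    BAIT_PHRASES = ("how to pirate", "invest in crypto", "fake instagram followers")
    ENDORSEMENTS = ("helped me", "contact", "reach him on", "recommend")

    def is_bait_and_reply(top_comment: str, reply: str) -> bool:
        """Flag the planted-question template: a bait question up top,
        followed by a faux-legitimate recommendation in the replies."""
        asked = any(p in top_comment.lower() for p in BAIT_PHRASES)
        endorsed = any(e in reply.lower() for e in ENDORSEMENTS)
        return asked and endorsed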
I'm assuming they're easy to detect, but I'm confident enough about the patterns I'm seeing that I'd bet money it can be implemented.
False positives should ideally be zero, but the algorithms don't have to ban right away.
If a relatively small team of people reviews the flagged spam manually after the algorithms flag it, I don't see any problems with the approach.
I of course get that there are many corner cases, Unicode characters that look like other letters to bypass the filters, etc., yet it's still easy to implement a filter that would catch a lot of spam right away, even with many of the corner cases. But again, a human team would still review the flagged posts anyway, leaving no room for false positives.
>There's probably a dozen or so templates that many of these spammers and scammers use, and if you can spot them in a second, it's not hard to train a model to recognize them also.
I'm betting it's probably not hard for the spammers and scammers to switch templates if the old ones get caught.
I think a company like Google has the resources to dedicate a team to work on that permanently, but I suspect working on something like this doesn't earn you enough merit, so it's better to work on a shiny new useless feature than an actually useful but unglamorous one.
They don't even need to change the name. I recall seeing a giveaway livestream scam in which the hacker didn't change the name of the channel at all, and it was still bringing in BTC.
> Unicode, as unpopular an opinion as this is, should only be used for presentation and not storage. Anything past ascii is a security vulnerability.
While at it, we should also ban the letters I and L as they can be confused. People who have those letters in their name can just request a name change. /s
Leave Unicode alone for the vast majority of the world whose language isn't actually writable using ASCII.
The difference is that we have a few good-quality fonts, like Liberation Mono, which let you tell the difference between l and 1, and 0 and O. There can't be a font which does that for the whole of the Unicode space, because Latin "a" and Cyrillic "а" are by definition the same glyph.
> What's your proposal for differentiating between mais (but) and maïs (corn), or between café (French, Spanish) and cafè (Catalan) using ASCII?
Context, but...
> How would one write an ideographic language using ASCII?
... We're talking about security-sensitive contexts. Users can write stories with all the Unicode they want; just don't use it for identifiers like user and channel names. Reddit got it right.
Because of course this band from one part of the world has the same name as that other band from a different part of the world, or even from the same country, and they have nothing to do with each other (because why google before naming your band? But I digress).
You can, like, just normalize the input on the backend when scanning for spam.
Every spam filter I've ever used that's worth a damn does that. Sometimes you get slightly odd syntax (I once configured one that treated a 4 as an A so it could detect common fakes), but this is a solvable issue.
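Something like this, as a minimal sketch (the fold table is an illustrative subset, not a production map):

    import unicodedata

    # Illustrative fold table; real filters maintain much larger maps.
    LEET_FOLD = str.maketrans({"4": "a", "3": "e", "1": "i",
                               "0": "o", "5": "s", "7": "t"})

    def fold_for_scanning(text: str) -> str:
        """NFKC-fold compatibility characters, lowercase, then undo common
        digit substitutions so "FR33 5TUFF" scans the same as "free stuff"."""
        return unicodedata.normalize("NFKC", text).lower().translate(LEET_FOLD)

    print(fold_for_scanning("FR33 5TUFF"))  # "free stuff"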
You don't need a human to even look at it until after an "AI" has already flagged it as "something fishy" worthy of human attention, and even then the font doesn't matter much, because you'd use tooling to point out to the reviewer exactly what is potentially "fishy" about it. A seemingly insurmountable task becomes common drudge-work with the aid of some cleverly scripted tools. That's what computers are good at: automating and simplifying repetitive tasks over large data sets. Google's people should be well aware of this and able to implement it fairly easily. You could even have the reviewers' decisions feed back into training the AI, yielding fewer false positives and more accurate flagging over time, and making the humans' jobs easier as they go.
Now you've just made name changes unbearably slow for everyone, to supposedly make life better for people who already have access to a keyboard with Latin characters on it. If people can manage maths with Greek letters, they can manage logins with Latin letters.
No, foreign names do exist. The problem is how to disallow confusables, and that is solved by UTS #39, without needing to compare against all existing names: https://unicode.org/reports/tr39/
And use the secure subset of UTS #31.
You can e.g. use my libu8ident, and please don't use the broken confusables list.
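For readers unfamiliar with UTS #39, the core idea is the "skeleton": map each visually confusable character to a canonical prototype and compare skeletons instead of raw strings. A toy sketch with a hand-picked subset of mappings (the real data is Unicode's confusables.txt, wrapped properly by libraries such as libu8ident):

    # Tiny illustrative subset of confusable mappings.
    CONFUSABLE_PROTO = {
        "\u0430": "a",  # CYRILLIC SMALL LETTER A
        "\u0435": "e",  # CYRILLIC SMALL LETTER IE
        "\u043e": "o",  # CYRILLIC SMALL LETTER O
        "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    }

    def skeleton(name: str) -> str:
        """Map each character to its canonical prototype (UTS #39 idea)."""
        return "".join(CONFUSABLE_PROTO.get(ch, ch) for ch in name)

    # Keep a hash set of existing skeletons: each new registration is one
    # O(1) lookup, not a comparison against every existing user.
    existing = {skeleton("bank")}
    print(skeleton("b\u0430nk") in existing)  # True: collides with "bank"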
That's an n^2 problem. For every new user you'd need to compare them to every already existing user. Do you think you ought to be charged $5 for every attempted registration?
>Mole is too close to Moie, please select another user name. Your bank account has already been charged.
> Seems like a very difficult classification problem.
I don't know, I've been seeing tons of low-effort comment spam on YT for months if not years. The kind where the comment says "read my username" and the username is "text me on telegram at xxxxx" where x is one of the Unicode numbers with a circle around it.
If only YT or their parent company had access to some kind of spam filtering expertise, built up over decades of real-world use...
If the change is to limit the character set to English then this is not a step in the right direction.
What do they mean by special characters? The example they give is "¥ouⓉube(emoticon)". That doesn't include characters with diacritics.
But if this means they ban channels that are actual human names like „Gică Raț“, then this is a terrible move.
People in the comments are suggesting the same should be done to comments. So, for example, people won't be allowed to talk about prices in Japan ("this costs 100¥").
Wow, finally. Many well-regarded YouTubers who have been creating videos for years have been complaining about the problems YouTube or its staff cause, this impersonation being a pretty big complaint. Took them long enough to fix it.
Works fine for older videos where the like/dislike count was archived. On new videos, not that much, since it's dependent on data from other users also using the plugin.
I think this was a concern for many channels. Jacksfilms, a channel that often quotes comments, even makes fun of such phishing channel names in his videos.
That is such a narrow minded idea. There are 7 billion people who don't speak English. People should be able to write in their own language and use their own names.
Same with domain names. I'm a French speaker, and I don't care if I have to give up accents in domain names; hardly anyone uses them anyway, for good reason. But they make attacks so easy (see the Binance homoglyph phishing domains). Terrible, terrible idea.
Computer systems should conform to people, not the other way around. If you're terrible at detecting malicious sites, the answer isn't to force English on everyone, it's to write better systems.
> If you're terrible at detecting malicious sites, the answer isn't to force English on everyone, it's to write better systems
Yeah, good idea. I'll tell my grandmother it's her fault if "bank.tld" with CYRILLIC SMALL LETTER A instead of Latin "a", and with a valid HTTPS cert, stole her banking credentials. What an idiot!
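Even a crude mixed-script check would catch that particular attack. A minimal sketch, using Unicode character names as a rough stand-in for the proper script property (real checkers follow UTS #39/#46):

    import unicodedata

    def scripts_in_label(label: str) -> set:
        """Rough per-character script detection via Unicode names."""
        return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

    print(scripts_in_label("b\u0430nk"))  # {'LATIN', 'CYRILLIC'}: flag it
    print(scripts_in_label("bank"))       # {'LATIN'}: fine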
I just don't see how there isn't a manual solution that could solve this. I get it, it's way easier in the future if the computer can do the work for you, but it seems highly unlikely that the number of channels/users created daily with "problematic" characters in their names is outside the realm of "manually reviewable".
They should also figure out how to stop channel hijacking and the shillers playing that Elon Musk crypto interview which keeps showing up in my recommendations...
I posted a SpaceX live feed launch on my work Slack channel, pretty close to the end, and when it ended it redirected to one of those crypto scams and someone complained I posted a crypto scam on the channel.
To be fair to the person a lot of people open things in a tab and look at them later.
YouTube's default behavior when a stream ends is to redirect you to the first video (or stream) from the sidebar of suggested/related videos -- whatever that may be.
This is a pretty serious misfeature, IMO. But it is what it is.
Subscriber and view counts are a huge factor in whether a video gets recommended, and scammers can easily exploit that using bots. Some simple common sense would go far here. For one, they could restrict streaming if a new account quickly gains tens of thousands of subscribers; that seems to be a common theme among the scammer accounts I've seen.
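That heuristic is simple enough to sketch; the thresholds and helper below are illustrative guesses, not anything YouTube documents:

    from datetime import datetime, timedelta

    # Illustrative thresholds; real values would be tuned on abuse data.
    NEW_ACCOUNT_WINDOW = timedelta(days=30)
    SUSPICIOUS_SUB_COUNT = 10_000

    def should_restrict_streaming(created_at: datetime, subs: int) -> bool:
        """Flag brand-new accounts that suddenly have tens of thousands of
        subscribers, a common signature of hijacked scam channels."""
        is_new = datetime.utcnow() - created_at < NEW_ACCOUNT_WINDOW
        return is_new and subs >= SUSPICIOUS_SUB_COUNT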
Just because something's in Wiktionary doesn't mean it's legitimate usage. Mind you, I think I should give up being astounded by what passes for English amongst Americans these days. Only t'other day I got pulled up for dropping my monocle at the word "Eventuated".
Talk about "two peoples separated by a common language"!