DeepSeek collects keystroke data and more, storing it in Chinese servers

@[email protected] · 5 months ago

@[email protected] · 5 months ago

Yes? Is it a surprise that a Chinese company stores its data on a Chinese server?

@[email protected] · edit-2 5 months ago

Not excusing Chinese companies but everyone does the same shit. I bet a lot of US companies that behave the same or worse will be looking for trade barriers to protect their business so their interests will be stoking fear of Chinese competitors. I don’t really give a shit which country is doing it, I am not buying what they are selling.

US companies have a stranglehold on government, education and business and are getting access to my family’s data despite my personal objections. I’m far more concerned about that than about a Chinese service I have no intention of using.

DeepSeek can at least be self-hosted if you want AI in your life. I can happily live without it.

@[email protected] · 5 months ago

It is open source; if they did something like this, we would know for sure.

@[email protected] · 5 months ago

Everyone must ask to see Xi jing jing ping pong nudes! But without mentioning Xi or nudes.

That would be a great way of poisoning their plans.

@[email protected] · 5 months ago

Yeah, uh… If you think that American companies aren’t doing this same thing and handing your data over to the government without a warrant, among other bad uses, I have some bad news for you. This is pretty much par for the course, and I’m pretty sure that we’re witnessing a well-financed negative media blitz to try and keep OpenAI from getting all of its spaghetti spilled. Watch for the government to try and ban DeepSeek for “national security” reasons soon.

@[email protected] · 5 months ago

Not gonna happen. Someone in China gave to Trump’s inauguration fund, so nothing’s getting banned.

@[email protected] · 5 months ago

I swear people do not understand how the internet works.

Anything you use on a remote server is going to be seen to some degree. They may or may not keep track of you, but you can’t be surprised if they are. If you run the model locally, there is no indication it is sending anything anywhere. It runs using the same open source LLM tools that run all the other models you can run locally.

This is very much like someone doing surprised Pikachu when they find out that Facebook saves all the photos they upload to Facebook or that Gmail can read their email.

@[email protected] · 5 months ago

The telephone company knows your phone number!

Riley · 5 months ago

They’re desperate to manufacture consent against their competition

@[email protected] · 5 months ago

No, this is just propaganda

billwashere · 5 months ago

If I’m typing into the app, is that really collecting keystrokes?

@[email protected] · edit-2 5 months ago

And why is that an issue? It’s typing data sent to a language model. What nefarious info might they be looking for? Learning to imitate humans? Fingerprinting? Making the best virtual keyboard asmr?

@[email protected] · edit-2 5 months ago

If you’ve got nothing to hide you don’t have to worry?

Edit: For clarification, I consider “If you’ve got nothing to hide you don’t have to worry” to be a naive argument, at best, in any privacy conversation, but I’m not averse to a well-reasoned argument to the contrary.

The wording here was unclear; what I meant to ask was:

“Do you believe that if you’ve got nothing to hide you don’t have to worry?”

@[email protected] · 5 months ago

Fair, but I don’t know what exactly I’d be hiding here

@[email protected] · 5 months ago

Also fair.

@[email protected] · 5 months ago

Beware! Anything you type into a Google search is sent to Google’s servers!

@[email protected] · 5 months ago

I mean… yes? That is generally how search platforms work.

I wouldn’t recommend anybody use any Google-based stuff directly (or at all, if possible), but if you do, then sending the search query is generally what would happen.

@[email protected] · 5 months ago

That’s the point. There is nothing strange or shady about the fact that things you type into DeepSeek.com are sent to DeepSeek.com. Obviously keystrokes you submit to a website are submitted to the website.

@[email protected] · 5 months ago

Oh yeah, the whole article could be reductively summed up as

“DeepSeek and all the other LLM services are almost as bad as each other, but we think deepseek is worse…because the Chinese government are known for doing bad things”.

The title is factual, if a little clickbaity.

Obviously keystrokes you submit to a website are submitted to the website.

This, though, is not technically accurate: a lot of form input is handled client-side, and then the resulting information is parceled up and sent to the server.

The actual keystroke data isn’t normally sent.

This article doesn’t go into what kind of keystroke data is sent, though. If it were something more than just which keys in which order, then that’s perhaps an indicator that it’s actively being collected for a reason, rather than just incidentally.

If you want to get really paranoid about such things, it’s known that you can do interesting things with actual keystroke data.

Also, afaict none of the non-Chinese services have specified that they don’t do this.
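To make the distinction above concrete, here is a minimal Python sketch of the difference between the text a form would normally submit and keystroke-level data. Everything in it is hypothetical and for illustration only: the `KeyEvent` record, the function names and the sample timings are assumptions, and nothing here reflects what DeepSeek or any other service actually collects.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical record of one key press (not any real service's format)."""
    key: str      # key name, e.g. "h" or "Backspace"
    down_ms: int  # timestamp when the key was pressed
    up_ms: int    # timestamp when the key was released


def submitted_text(events):
    """What an ordinary form post carries: only the final text."""
    chars = []
    for e in events:
        if e.key == "Backspace":
            if chars:
                chars.pop()  # corrections vanish from the final text
        elif len(e.key) == 1:
            chars.append(e.key)
    return "".join(chars)


def dwell_times(events):
    """How long each key was held down, in milliseconds."""
    return [e.up_ms - e.down_ms for e in events]


def flight_times(events):
    """Gap between releasing one key and pressing the next, in milliseconds."""
    return [b.down_ms - a.up_ms for a, b in zip(events, events[1:])]


# Made-up sample: the user types "h", "u", deletes the "u", then types "i".
events = [
    KeyEvent("h", 0, 80),
    KeyEvent("u", 150, 220),
    KeyEvent("Backspace", 400, 460),
    KeyEvent("i", 520, 610),
]

print(submitted_text(events))  # final text only
print(dwell_times(events))     # per-key hold durations
print(flight_times(events))    # pauses between keys
```

The final text hides the deleted character and all of the timing, while the event log keeps both. Timing features like these are the sort of thing keystroke-dynamics fingerprinting relies on, which is why "which keys in which order" and "actual keystroke data" are not the same thing.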

@[email protected] · 5 months ago

I shouldn’t have anything to hide, but I’m part of a group the current fascist leadership in government wants to eradicate, so hide I shall.

That said, I also feel like people acting like the remote server they are connected to is tracking what you do on it as some kind of surprise is so stupid. “Facebook is keeping track of the pictures I uploaded to it!!!” There’s a lot of stuff to complain about Facebook, google, or whoever, but them tracking stuff you send to them willingly isn’t one of them.

@[email protected] · 5 months ago

I shouldn’t have anything to hide, but I’m part of a group the current fascist leadership in government wants to eradicate, so hide I shall.

I agree, and I think a lot of people who espouse “nothing to hide” as an approach haven’t actually thought it all the way through.

Then there’s the fascists, dictators, oligarchs and other all around shitbags who just want the control.

That said, I also feel like people acting like the remote server they are connected to is tracking what you do on it as some kind of surprise is so stupid. “Facebook is keeping track of the pictures I uploaded to it!!!” There’s a lot of stuff to complain about Facebook, google, or whoever, but them tracking stuff you send to them willingly isn’t one of them.

This always surprises me. I originally thought it was because people didn’t understand how these things work or how capitalist companies work.

More and more it seems like people don’t care until it affects them, which is somewhat understandable: it takes effort to care about this stuff, and a lot of people will never be directly affected by the consequences.

What I do still think is that the general population has no idea of the extent of what can be done with all of the information they are volunteering.

That’s very slowly changing, but the uses of the data are also increasing at a much more rapid pace than before.

@[email protected] · edit-2 5 months ago

Any chat AI logs your keystrokes and your inputs in order to work and to update its LLM. DeepSeek’s PP and TOS are the same as, or even better than, those of its US competitors. DeepSeek is open source.

Anyway I prefer Andisearch and its PP, the best of all these big tech AIs.

@[email protected] · 5 months ago

Is Deepseek Open Source?

Hugging Face researchers are trying to build a more open version of DeepSeek’s AI ‘reasoning’ model

Hugging Face head of research Leandro von Werra and several company engineers have launched Open-R1, a project that seeks to build a duplicate of R1 and open source all of its components, including the data used to train it.

The engineers said they were compelled to act by DeepSeek’s “black box” release philosophy. Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loath to reveal its secret sauce.

Autonomous User · edit-2 5 months ago

When will Hugging Face replace BLOOM’s anti-libre software license?

@[email protected] · 5 months ago

Ok, so they’ll ban it under that guise to appease US companies, same as TikTok. I really didn’t care about TikTok since it’s all brain rot to me, but this might actually be a tool I’ll use if it’s as efficient as they say.

Good thing I can run it locally, I guess.

Queen HawlSera · 5 months ago

So I won’t use this for the same reason I don’t use any AI? Cool

@[email protected] · 5 months ago

Doubtful, since it’s both open source and you can run it locally. This seems more like a smear piece.

@[email protected] · 5 months ago

This article is about the app, which does not run the model locally. Why would you doubt that a Chinese app which openly claims they send your data to China, actually does so?

@[email protected] · 5 months ago

It seems like a smear piece because it makes it sound like DeepSeek is doing something that the others aren’t, while the truth is that every single one of them collects your data.

At best, it’s disingenuous. At worst, with the ability to run locally, it’s a blatant lie.

@[email protected] · 5 months ago

What would you have preferred? “Most apps sell your data, news at 11”? Would anyone care if it was written like that?

@[email protected] · 5 months ago

Ah yes, selling your integrity for clicks and pushing propaganda for cash, welcome to the information age.

@[email protected] · 5 months ago

I think you’re incorrectly assuming that everyone knows they all do it. I see nothing wrong with raising awareness.

@[email protected] · 5 months ago

Detective Conan sure had a hard time cracking the case!

“The personal information we collect from you may be stored on a server located outside of the country where you live. We store the information we collect in secure servers located in the People’s Republic of China,” the privacy policy reads.

Oh the horror! Let’s look at what our glorious spawns-of-techbro heroism has for us in store:

ChatGPT:

OpenAI processes your Personal Data for the purposes described in this Privacy Policy on servers located in various jurisdictions, including processing and storing your Personal Data in our facilities and servers in the United States. While data protection law varies by country, we apply the protections described in this policy to your Personal Data regardless of where it is processed, and only transfer that data pursuant to legally valid transfer mechanisms.

Claude:

When you access our website or Services, your personal data may be transferred to our servers in the US, or to other countries outside the European Economic Area (“EEA”) and the UK. This may be a direct provision of your personal data to us, or a transfer that we or a third party make.

So not only is your data “possibly” stored in one country, now there’s a possibility of it being stored in many different countries. Where’s the outcry for that?

Ok, so maybe your data being under the jurisdiction of another country is sus, right?

In another section about how DeepSeek shares user data, the company states that it may share user information to “comply with applicable law, legal process, or government requests.”

OH MY GOD SOUND THE ALARM!

ChatGPT:

We may use Personal Data for the following purposes: […] To comply with legal obligations and to protect the rights, privacy, safety, or property of our users, OpenAI, or third parties.

Claude:

Pursuant to regulatory or legal requirements, safety, rights of others, and to enforce our rights or our terms. We may disclose personal data to governmental regulatory authorities as required by law, including for legal, tax or accounting purposes, in response to their requests for such information or to assist in investigations. We may also disclose personal data to third parties in connection with claims, disputes or litigation, when otherwise permitted or required by law, or if we determine its disclosure is necessary to protect the health and safety of you or any other person, to protect against fraud or credit risk, to enforce our legal rights or the legal rights of others, to enforce contractual commitments that you have made, or as otherwise permitted or required by applicable law.

So not only can your data be subject to the authorities, but it’s also handed out to 3rd parties (mind you, DeepSeek does the exact same, so why is it any surprise?).

Not only does DeepSeek collect “text or audio input, prompt, uploaded files, feedback, chat history, or other content that [the user] provide[s] to our model and Services,” …

🤦… You get the idea now, bother yourself with the privacy policies of the respective contemporaries and CTRL + F to “User Content” or “User Input”… Same fucking shit.

Companies with AI models like Google, Meta, and OpenAI collect similar troves of information, but their privacy policies do not mention collecting keystrokes.

Yes, collecting keystrokes is probably the oddest thing here. To compare data farming giants with a decade and a half’s worth of data collection to a startup in terms of data collection is so astronomically dumb.

I could go on but I’m bored now. Do your own research.

@[email protected] · 5 months ago

Not quite on topic but semi related… It’s reasons like this that I started reading privacy policies many times before signing up for a service.

People would be surprised at some of the extremely concerning things that are listed in there. Some are there for good reason, but some stuff is absolutely unnecessary and should be an issue for some people.

@[email protected] · 5 months ago

Off-topic here as well, but why stop at privacy policies? EULAs can get wilder, the best example of which is Apple’s:

@[email protected] · 5 months ago

Lmaooooo great find. I wonder why exactly they had to clarify that? Maybe a semi Easter egg? Or a genuine concern? Thanks for sharing.

@[email protected] · 5 months ago

The way this is worded, technically you’re not allowed to use a Mac for designing a 3D printed nerf dart.