In the relentless march of technological advancement, AI, specifically generative AI, has emerged as a transformative force. While there's much discussion about bias and ethics in AI models, there's less talk about the ethics of these models being trained on private data without regard for the intellectual property and ethical rights of the data's owners. Through conversations with my daughter, I've realized that this incredibly important issue deserves more attention.
According to a recent research paper, we could run out of "fresh" data (data no Gen-AI model has seen) in less than two years. For context, GPT-3 was pre-trained on "570 GB of text culled from open sources, including BookCorpus (Book1 and Book2), Common Crawl, Wikipedia, and WebText2," while GPT-4 was trained on over 20 times that amount of data. Now that public training data is getting exhausted, the next frontier is to train on private data within enterprises.
I believe there has not been enough discussion about the ethics behind ChatGPT and other large language models (LLMs) using private data for training purposes without the data owners' permission.
Insofar as there has been discussion around this issue, it has revolved around the IP of authors and creators being used without their permission. On September 23, the Authors Guild and 17 authors filed a class-action suit against OpenAI on behalf of fiction writers whose works have been used to train GPT, alleging copyright infringement.
The news about OpenAI allegedly using a voice strikingly similar to Scarlett Johansson's without her permission has gotten a lot of publicity. If a celebrity of her stature could have her identity co-opted so easily, what hope do ordinary folks like me have?
Businesses across industries are leveraging AI to streamline operations, enhance decision-making and drive innovation. However, beneath the surface of this technological revolution lies a complex and ethically charged question: To what extent should companies utilize employee data to train their AI systems?
The practice of collecting and analyzing employee data is not new. For years, companies have gathered information on employee performance, communication patterns and productivity metrics to inform HR decisions and strategic planning. But as AI capabilities have soared, so too has the potential for data exploitation.
With our digital footprints scattered across the company's intranet in the form of thousands of emails, hundreds of articles and countless hours of Zoom calls, we leave breadcrumbs for some digital entity to follow, to learn from and potentially become us. I believe it may now be possible for a company to use this treasure trove of information to create a digital twin, an AI version of an employee (whether an individual contributor or a manager) that could work 24/7 without complaint.
This possibility raises many ethical questions for me, such as: Would this digital twin be entitled to compensation? Would I have any say in how it would be used? If my digital twin continued to learn and evolve based on my data long after I had left the company, should I receive ongoing royalties?
These questions led me to dig deeper into the terms of employment contracts I'd blindly accepted over the years. Buried in the legalese of one 56-page document, I found vague clauses about data usage rights that seemed to give companies carte blanche to use my information however they saw fit. It was sobering: I had effectively signed away my digital rights without even realizing it.
As I delved further into this issue, I discovered I wasn't alone in my concerns. Privacy advocates, ethicists and even some tech insiders are raising alarms about the unchecked collection and use of personal data for AI training. They argue for greater transparency, more stringent regulations and clearer consent processes. But the reality is, we're already deep into this brave new world. Our digital selves are being used to train the very AI systems that might one day replace us in the workforce. The question now is: How do we regain control of our digital identities?
And beyond ethical concerns, many governments and privacy advocates have been raising various legal issues. As AI technology continues to evolve, it is imperative that lawmakers keep pace. Robust data privacy regulations are essential to protect employees' rights and prevent the misuse of their personal information. While the potential benefits of using employee data to train AI systems are undeniable, it is crucial to proceed with caution. Companies must prioritize transparency, consent and fairness in their data practices. Employees deserve to know how their data is being used and have a say in the matter.
Perhaps it starts with awareness. By understanding the value of our data and the potential implications of its use, we can begin to demand better protections and fairer compensation. Maybe it's time for a new social contract for the AI age, one that recognizes our digital selves as extensions of our physical beings, deserving of the same rights and protections. Because in this rapidly evolving digital landscape, we need to ask ourselves: Are we unknowingly training our own replacements? And if so, shouldn't we at least be getting paid for it?