Notebook or My Personal Learning Network
Scooped by Gilbert C FAURE
April 2, 1:13 PM

Steady rise: female scientists & engineers reach 7.9 mln | Eurostat

The number of women working as scientists and engineers in the EU reached 7.9 million in 2024, representing 40.5% of the scientists and engineers’ workforce across all economic activities. 🧑‍🔬🔬

Across EU regions, highest shares in:

đŸ‡Ș🇾 Canarias (58.8%)
đŸ‡”đŸ‡č RegiĂŁo AutĂłnoma dos Açores (57.3%)
đŸ‡”đŸ‡č Madeira (56.4%)

Lowest in:

🇭đŸ‡ș KözĂ©p-MagyarorszĂĄg (30.0%)
đŸ‡«đŸ‡ź Manner-Suomi (30.7%)
🇼đŸ‡č Sud (31.1%)

â„č Please note that the map includes available regional data from EU countries, EFTA and candidate countries. The ranking in the caption of the post is based on data from EU countries only.

Learn more 👉 https://lnkd.in/eHQqWP_g
Notebook or My Personal Learning Network
a personal notebook since summer 2013, a virtual scrapbook
Scooped by Gilbert C FAURE
October 13, 2013 8:40 AM

This notebook...

is a personal notebook

 

Thanks to John Dudley for the following tweet:

"If you like interesting snippets on all sorts of subjects relevant to academia, information, the world, highly recommended is @grip54 's collection:"

 

Content curation: the shared memory of a scientific and societal information watch

Gilbert C FAURE's insight:

... designed to collect posts and information I found and want to keep available, but that are not relevant to the other topics I am curating on Scoop.it (on behalf of ASSIM):

 

the most successful being

Immunology, teaching and learning immunology

http://www.scoop.it/t/immunology

and

From flow cytometry to cytomics

http://www.scoop.it/t/from-flow-cytometry-to-cytomics

Immunology and Biotherapies, a page of resources for the DIU 

 http://www.scoop.it/t/immunology-and-biotherapies

 

followed by

Nancy, Lorraine

 http://www.scoop.it/t/nancy-lorraine

I am based at Université Lorraine in Nancy

Wuhan, Hubei,

 http://www.scoop.it/t/wuhan

because we have a long-standing collaboration through a French-speaking medical training program between the Faculté de Médecine de Nancy and WuDA, Wuhan University medical school, and Zhongnan Hospital

  

CME-CPD,

 http://www.scoop.it/t/cme-cpd

because I am the representative of the medical biopathology and laboratory medicine UEMS section at EACCME in Brussels

 

Mucosal Immunity,

 http://www.scoop.it/t/mucosal-immunity

because it was one of our main research interests some years ago

 

It is a kind of electronic scrapbook with many ideas shared by others.

It focuses more and more on new ways of Teaching and Learning: e-, m-, a-, b-, h-, c-, d, ld-, s-, p-, w-, pb-, ll- ....

Thanks to all

Scooped by Gilbert C FAURE
Today, 11:28 AM

#earthatnight #artemisii | Peter Stumpf

Earth at night, illuminated by the full moon: a dark sphere which, with editing or a longer exposure, turns pale blue and shows bright spots. These are cities at night across the Iberian Peninsula, Africa's Mediterranean coastline, and South America on the right.

Dark shot: https://lnkd.in/d6gmWgiF

Colorful shot https://lnkd.in/dr_4iEEm

#earthatnight #artemisII
Scooped by Gilbert C FAURE
April 3, 11:38 AM

Academic Research vs Case Study: Key Differences | Docadeson R. posted on the topic

đ—Șđ—”đ˜† đ—–đ—”đ—Œđ—Œđ˜€đ—¶đ—»đ—Ž 𝗕đ—Č𝘁𝘄đ—Čđ—Čđ—» đ—”đ—°đ˜đ—¶đ—Œđ—» đ—„đ—Č𝘀đ—Čđ—źđ—żđ—°đ—” đ—źđ—»đ—± 𝗖𝗼𝘀đ—Č đ—Šđ˜đ˜‚đ—±đ˜† đ—–đ—źđ—» 𝗠𝗼𝗾đ—Č đ—Œđ—ż 𝗕𝗿đ—Č𝗼𝗾 đ—Źđ—Œđ˜‚đ—ż đ—§đ—”đ—Čđ˜€đ—¶đ˜€.

Many graduate students weaken their thesis by confusing đ—źđ—°đ˜đ—¶đ—Œđ—» 𝗿đ—Č𝘀đ—Čđ—źđ—żđ—°đ—” with 𝗰𝗼𝘀đ—Č đ˜€đ˜đ˜‚đ—±đ˜†â€”yet the two serve fundamentally different academic purposes.

đ—”đ—°đ˜đ—¶đ—Œđ—» 𝗿đ—Č𝘀đ—Čđ—źđ—żđ—°đ—” is initiated to solve an đ—¶đ—șđ—șđ—Čđ—±đ—¶đ—źđ˜đ—Č đ—œđ—żđ—Œđ—Żđ—čđ—Čđ—ș It focuses on đ—¶đ—șđ—œđ—čđ—Čđ—șđ—Čđ—»đ˜đ—¶đ—»đ—Ž đ˜€đ—Œđ—čđ˜‚đ˜đ—¶đ—Œđ—»đ˜€, often within the đ—łđ—¶đ—Čđ—čđ—± đ—Œđ—ł đ—Čđ—±đ˜‚đ—°đ—źđ˜đ—¶đ—Œđ—», where researchers may also 𝗼𝗰𝘁 𝗼𝘀 đ—œđ—źđ—żđ˜đ—¶đ—°đ—¶đ—œđ—źđ—»đ˜đ˜€ in the research process. This approach is practical, intervention-based, and solution-oriented.

𝗖𝗼𝘀đ—Č đ˜€đ˜đ˜‚đ—±đ˜†, by contrast, involves đ—¶đ—»-đ—±đ—Čđ—œđ˜đ—” đ—źđ—»đ—źđ—čđ˜†đ˜€đ—¶đ˜€ of a đ—œđ—źđ—żđ˜đ—¶đ—°đ˜‚đ—č𝗼𝗿 đ—Č𝘃đ—Čđ—»đ˜ đ—Œđ—ż 𝗰𝗼𝘀đ—Č đ—Œđ˜ƒđ—Č𝗿 𝗼 đ—čđ—Œđ—»đ—Ž đ—œđ—Čđ—żđ—¶đ—Œđ—± đ—Œđ—ł đ˜đ—¶đ—șđ—Č. It emphasizes đ—Œđ—Żđ˜€đ—Čđ—żđ˜ƒđ—¶đ—»đ—Ž đ—źđ—»đ—± đ—źđ—»đ—źđ—čđ˜†đ˜€đ—¶đ—»đ—Ž 𝗼 đ˜€đ—¶đ˜đ˜‚đ—źđ˜đ—¶đ—Œđ—», is 𝘂𝘀đ—Čđ—± đ—¶đ—» đ—șđ—źđ—»đ˜† đ—łđ—¶đ—Čđ—čđ—±đ˜€, and đ—±đ—Œđ—Č𝘀 đ—»đ—Œđ˜ đ—œđ—żđ—Œđ˜ƒđ—¶đ—±đ—Č 𝗼 đ˜€đ—Œđ—čđ˜‚đ˜đ—¶đ—Œđ—» đ˜đ—Œ 𝗼 đ—œđ—żđ—Œđ—Żđ—čđ—Čđ—ș. Researchers typically đ—±đ—Œ đ—»đ—Œđ˜ 𝘁𝗼𝗾đ—Č đ—œđ—źđ—żđ˜ in the research setting.

Misunderstanding this distinction leads to flawed methodology, weak research design, and inconsistent findings—common issues in rejected proposals.

đŸ“Č If you need thesis help, WhatsApp DocAdeson on: +14243487554

♻ find this useful? follow + like + repost + comment.

#DrAdeson
#AcademicResearch
#ResearchMatters
#ResearchCommunity
#AcademicWriting
#PhDLife
#PostdocLife
#GradSchool
Scooped by Gilbert C FAURE
April 3, 11:29 AM

Reliability of LLMs as medical assistants for the general public: a randomized preregistered study - Nature Medicine | Vitaly Herasevich

LLMs are OK for medical diagnoses, but AI chatbots for the public are not.
LLMs complete the scenarios accurately, correctly identifying conditions in 94.9% of cases and disposition in 56.3% on average. However, participants using the same LLMs identified relevant conditions in fewer than 34.5% of cases and disposition in fewer than 44.2%, both no better than the control group.

https://lnkd.in/gRCrBSkE

#LLM #AI
Scooped by Gilbert C FAURE
April 2, 1:18 PM

Reddit wants to scan your eye before letting you write a comment. It sounds like science fiction. But no. It's an official announcement from March 25, 2026. Reddit has a problem. Millions ...

Reddit wants to scan your eye before letting you write a comment.
It sounds like science fiction. But no: it is an official announcement from March 25, 2026.

Reddit has a problem. Millions of fake automated accounts are flooding the platform: programs that post, comment and like in place of real users. Reddit deletes 100,000 of them per day, and that is no longer enough. Digg, its former competitor, has just shut down, overwhelmed by the machines.

The solution? Ask suspicious accounts to prove a human is behind them, using Face ID, passkeys, or even World ID, a system that scans your iris.

The detail worth knowing: World ID is a project co-founded by Sam Altman, the CEO of OpenAI. The same Sam Altman who has invested more than 60 million dollars in Reddit, sat on its board for seven years, and owns more shares than the platform's CEO. His stake is worth more than a billion dollars. A slight conflict of interest.

So here we are, caught between two risks: letting the open internet die under fake accounts, or saving it by entrusting our biometric data to the owner of ChatGPT.

I don't have the solution. But if the only way to prove you are human is to sacrifice your anonymity, then we have a serious design problem.
Scooped by Gilbert C FAURE
April 1, 10:39 AM

#epidemiology #publichealth #evidencebasedpractice #clinicalresearch #datascience #globalhealth #researchmethods | Collins Ogweno MPH, MSc, PMP

Not all evidence is created equal—and in public health, that distinction saves lives.

Understanding clinical study designs is the foundation of evidence-based decision-making:

1. Observational studies (cohort, case-control, cross-sectional) help us detect patterns, associations, and disease burden—critical for surveillance and hypothesis generation.
2. Experimental studies (randomized vs non-randomized) go a step further—establishing causality through controlled intervention.

But here’s the nuance many overlook:

✔ Cohort studies track exposure → outcome (powerful for incidence & risk)
✔ Case-control studies work backward (efficient for rare diseases)
✔ Cross-sectional studies provide snapshots (essential for prevalence)
✔ Randomized trials minimize bias and remain the gold standard—but are not always feasible in real-world public health settings

The real expertise lies not in choosing the “best” design—but in choosing the right design for the right question.

In an era of data abundance and rapid policy decisions, strengthening our understanding of study designs is not optional—it is a professional responsibility.

#Epidemiology #PublicHealth #EvidenceBasedPractice #ClinicalResearch #DataScience #GlobalHealth #ResearchMethods
Scooped by Gilbert C FAURE
April 1, 6:12 AM

Can you really publish 20-40 articles a day?

Introducing my new white paper: The myth of the academic superstar - or why name disambiguation is crucial
Scooped by Gilbert C FAURE
April 1, 3:59 AM

What should your organization know about the FedNow Service, an instant payment infrastructure developed by the Federal Reserve? These resources from Federal Reserve Financial Services are a grea...

What should your organization know about the FedNow Service, an instant payment infrastructure developed by the Federal Reserve?


These resources from Federal Reserve Financial Services are a great starting point. Get up to speed on the basics about these innovative payments, how they work and the benefits they offer: https://bit.ly/47zdgnJ
Scooped by Gilbert C FAURE
April 1, 3:57 AM

Researchers publish 1 paper every 5 days, sparking concerns over authorship norms | Phil Baty posted on the topic

More than 9,000 researchers published at least 72 papers in a single year - more than one paper every five days - in one or more of the years between 2019 and 2024.

When a “more conservative threshold” of 40 papers a year was applied, so called "hyper-prolific authors" increased in number by 66 per cent, from 2,517 in 2019 to 4,189 in 2023, a 2025 study found, against a wider increase in publications of 15 per cent over that period.

Last year, Clarivate excluded 432 authors from its latest Highly Cited Researchers list in response to concerns over “extreme levels of publication relative to field baselines”.

There is also a “growing trend of multiple institutional affiliations”, often across different countries, with “some authors listing affiliations with more than 20 institutions”.

Concerns abound that authors who publish on a weekly basis are cutting corners, corrupting authorship norms and overburdening the peer review system – with AI likely to make matters worse. But if incentives are misaligned, what can be done? And is the moral panic exaggerated? Jack Grove reports for Times Higher Education. https://lnkd.in/embsRNmZ
Scooped by Gilbert C FAURE
March 31, 6:26 AM

"My precious": why academics protect their teaching resources and their data (but gladly share their articles) – Chaire UNESCO RELIA

Javiera Atenas is a senior lecturer at the Faculty of Business, Arts, Social Sciences and Technology of the University of Suffolk, UK. She directs the postgraduate certificate in pedagogical practice and teaches the analysis and visualization of...
Scooped by Gilbert C FAURE
March 31, 4:18 AM

#décision #veilleur #veille #compétences #fresque #collaboratif #pragmatique | Christelle Urvoy

Information that is captured but not shared is information lost.
This is one of the most frequent blind spots I encounter in organizations: teams that monitor, signals that come up, and decision-makers who see only a tiny fraction of them.

Monitoring without a distribution circuit is not monitoring. It is collecting.

Three questions to test your organization:
âžĄïž Who receives your monitoring digests today?
âžĄïž In what form? At what frequency?
âžĄïž Does it change anything in the decisions being made?

If you have no clear answer to these three questions, that is the starting point. A knowledge fresk and a collaborative workshop to move clearly into action.

This is exactly what we will work on May 20 with those who monitor, and May 28 with those who decide.

👉 Registration link in reply to your comments or DMs.

#décision #veilleur #veille #compétences #fresque #collaboratif #pragmatique
Scooped by Gilbert C FAURE
March 30, 8:29 AM

NotebookLM : de l'outil de recherche à la plateforme pédagogique complète | Fidel Navamuel

🎹 NotebookLM strikes again! Infographics with custom styles are now available: 10 predefined styles (editorial, clay, kawaii
) plus the option to create your own with a simple prompt. Your documents turned into striking visuals in one click.
👉
Scooped by Gilbert C FAURE
March 29, 4:06 AM

Which AI collects the most personal data? According to Surfshark's study, Meta AI is the AI that collects the most personal data. It covers 33 of 35 data types, which is
 | D...

Which AI collects the most personal data?

According to Surfshark's study, Meta AI is the AI that collects the most personal data.

It covers 33 of 35 data types, almost everything it is possible to collect.

It is also one of the few to include sensitive data, such as financial information or certain personal data.

Behind it, other tools such as Gemini also collect sensitive data.
ChatGPT has broadened its collection in recent months, but remains below.

Claude is among the most restrained.

The takeaway: your conversations are not trivial. They can be stored, analyzed, and sometimes used for other purposes.

A good reflex: avoid sharing sensitive information there.
Scooped by Gilbert C FAURE
Today, 11:33 AM

Drug Administration Routes Impact Treatment Outcomes | Dr. Aarunee Krishna posted on the topic

Most people think a drug works simply because of what is inside it.
In reality, how it enters your body can be just as important.

This is called the route of drug administration—oral, intravenous, intramuscular, subcutaneous, inhalational, topical—and each route changes how a drug behaves inside you.

Why does this matter for the general population?

Because the same drug can act very differently depending on how it is taken:
‱ A tablet may take 30–60 minutes to act, while an injection can work within seconds
‱ Some drugs are destroyed in the stomach and must never be taken orally
‱ Incorrect use (like crushing sustained-release tablets) can lead to toxicity
‱ Inhalers, if used incorrectly, may deliver almost no benefit despite regular use

In simple terms: right drug + wrong route = wrong outcome

As an MD trainee in Medical Pharmacology, this is not just academic knowledge for me—it is a responsibility.

We often see:
‱ Antibiotic misuse due to wrong administration practices
‱ Poor control of chronic diseases because of improper drug use
‱ Adverse drug reactions that could have been prevented with basic awareness

Educating people about how to take medicines correctly can:
‱ Improve treatment outcomes
‱ Reduce side effects
‱ Prevent drug resistance
‱ Empower patients to participate in their own care

Pharmacology is not just about drugs—it is about optimizing how those drugs interact with human biology.

And sometimes, the smallest detail—like the route of administration—makes the biggest difference.

#Medicine #Pharmacology #PatientEducation #RationalUseOfMedicines #Healthcare #MedicalEducation
Scooped by Gilbert C FAURE
Today, 11:25 AM

Claude 4.5 has become a real work suite. And if you are an executive, here is the essential to remember: ✩ 1. Choose the right model Opus 4.5 → strategy, complex reasoning, high-stakes to...

Claude 4.5 has become a real work suite.

And if you are an executive, here is the essential to remember:

✩ 1. Choose the right model

Opus 4.5 → strategy, complex reasoning, high-stakes topics
Sonnet 4.5 → writing, synthesis, everyday business use
Haiku 4.5 → simple tasks, speed, large volumes

Don't ask the same thing of every model.

It's like using the same vehicle to deliver a parcel
 or to cross a desert.

✩ 2. Claude is not just for writing

It can also:

→ search the web
→ analyze files
→ work with Google Drive
→ manage Projects
→ code with Claude Code
→ produce deliverables with Artifacts

The point is no longer "write me a text".

The point is: move my work forward.

✩ 3. The most useful features for an executive

Artifacts → produce an action plan, an SOP, a proposal, a table

Web Search → monitoring, meeting preparation, competitive analysis

File analysis → understand your sales, leads, quotes, margins

Projects → create a working memory per topic: marketing, sales, recruitment, management

Claude Code → prototype an internal tool, accelerate a digital project, build an MVP

File upload → provide PDFs, contracts, meeting minutes, screenshots so it can analyze your reality

✩ 4. The best concrete use cases

→ prepare a sales meeting in 3 minutes
→ turn a meeting into decisions + tasks + priorities
→ analyze a sales or leads file
→ create a business proposal faster
→ structure a 90-day plan
→ turn scattered documents into a clear system

✩ 5. The real method

The beginner says:

"Summarize this for me."

The advanced user says:

→ here is my objective
→ here are my files
→ here are my constraints
→ ask me the missing questions
→ propose a plan
→ execute
→ improve the V1

In short: they don't "prompt".

They manage Claude like a collaborator.

That is the real upgrade.

---------------------------------

PS: I am running a masterclass on April 10 at 8 PM to help you make your Claude upgrade and learn to use it to free yourself from day-to-day operations, reduce your mental load, and accelerate your company's growth.

👉 Register here:

https://lnkd.in/eN6mBd9B
Scooped by Gilbert C FAURE
April 3, 11:33 AM

#phd #research | Faheem Ullah | 16 comments

PhD Students - How to check if your research idea is actually new?

First, let's understand why novelty is important for research

Here is what reviewers will look for in your research

1ïžâƒŁ đđšđŻđžđ„đ­đČ → Is it new?
2ïžâƒŁ 𝐒𝐱𝐠𝐧𝐱𝐟𝐱𝐜𝐚𝐧𝐜𝐞 → Is it important for anyone?
3ïžâƒŁ đŒđžđ­đĄđšđđšđ„đšđ đČ → Is it conducted the right way?
4ïžâƒŁ đ•đžđ«đąđŸđąđœđšđ­đąđšđ§ → Can other researchers verify it?
5ïžâƒŁ đđ«đžđŹđžđ§đ­đšđ­đąđšđ§ → Is it presented in the right way?

You see novelty comes on the top of this list.

To confirm novelty, meet đđšđ­đ’đ§đšđ© đ„đźđ«đžđ€đš.

Eureka thinks like an IP expert.

Here is how it works.

1. Go to https://lnkd.in/dqiq55cM
2. Describe your research idea in 20-30 words
3. Eureka scans 200M+ patents to compare your idea
4. It shows you a side-by-side table of your idea vs existing ones
5. Export the entire novelty report to share with others

𝐖𝐡đČ đŹđĄđšđźđ„đ đČ𝐹𝐼 đ­đ«đČ 𝐱𝐭?

✓ Confirms the novelty of your research idea
✓ Gives you confidence in your research direction
✓ Change research idea if it's not novel
✓ After confirmation, dive deep into your research

đŸŽ—ïž Try Eureka for FREE: https://lnkd.in/dqiq55cM

❄ Anything you'd like to add?

#phd #research
Scooped by Gilbert C FAURE
April 2, 1:21 PM

The TOP 10 | Elliacare | 19 comments

AI has settled into everyday medical tools. Not by collective decision. By gradual drift.

Here is the Top 10 of its uses in healthcare 📊

We wanted to take stock. Not of the promises, but of what actually exists: what is deployed, what is still in progress, and what remains to be built.

This carousel lists the 10 best-documented uses of AI for caregivers in 2025-2026.
AI did not wait to be invited. It slipped into prescription software, EHRs, documentation tools, alert systems.
By small touches. By successive integrations. And without caregiver training anticipating this shift.

What the panorama shows is a two-speed reality.

On one side, tools that keep their promises.
â–Ș AI scribing cuts documentation time in half
â–Ș The literature watch that used to take an hour now takes 15 minutes, provided you know how to do it with AI.
â–Ș Clinical decision support, when coupled with medical judgment, produces better diagnoses than either alone.

The paradox is stark. The tools work. The gains are measurable.
And yet 45% of caregivers never tell their patients that they use AI in their care.
While 84% of French people would like to be told.

It is not bad faith. It is the absence of a framework.
We automate tasks without training people in what that entails, then we deploy tools.
But we do not prepare the caregiver to evaluate what they delegate, to identify where the tool fails, to keep their clinical judgment sovereign in the face of an algorithmic recommendation.

Training is not an accessory to deployment. It is its condition of validity.

That is exactly what Elliacare addresses. Not enthusiasm for the tools: the competence to use them, and to know when not to trust them.

👉 Swipe the carousel to see the 10 uses, their maturity levels, and the figures behind them.

Of these 10 uses, which do you already do, and which are you still missing?

#IAenSanté #Elliacare #MédecineAugmentée #FormationIA
Scooped by Gilbert C FAURE
April 1, 10:15 AM

https://french.visitbeijing.com.cn/article/47DYL69KFDB

Lao...

Scooped by Gilbert C FAURE
April 1, 4:46 AM

At Prisma Media, 40% of articles are generated by AI. Journalists are digitally cloned to produce videos. Nobody told you. 📉 Le Monde has signed with Meta and OpenAI to...

At Prisma Media, 40% of articles are generated by AI. Journalists are digitally cloned to produce videos. Nobody told you. 📉

Le Monde has signed with Meta and OpenAI to integrate its content into AI assistants. TF1 is experimenting with assisted production. AFP supplies the feed that AI agents summarize and redistribute. In the United States, 9% of articles are already partially written by AI, without mentioning it.

3,434 journalist positions cut in 2025. In 2026, the pace is accelerating. Washington Post, Politico, Wall Street Journal: all affected.

What is dying in journalism:
👉 The routine article. Sports results, stock prices, weather: an AI agent writes that in 10 seconds. The Associated Press produces 730,000 automated articles per year. The journalist covering the factual faces a machine that never sleeps.
👉 The media outlet as sole intermediary. When readers ask a question of an AI assistant that draws on Le Monde, AFP, and 200 other sources, they no longer need to visit the newspaper's site. Traffic falls. Advertising falls. The business model crumbles.
👉 Trust by default. 9% of articles partially AI-written without disclosure. Cloned journalists at Prisma. And only 12% of readers are comfortable with 100% AI content. The day readers no longer know who is writing, they disengage.

The survival test for Le Monde, TF1, Prisma Media, and AFP:
1ïžâƒŁ Bet on investigation, not on the feed. The factual article is dying. Investigation, analysis, decryption: that is what AI cannot do. The journalist who survives is the one who goes into the field, not the one who rephrases a wire story.
2ïžâƒŁ Become the trusted source for AI agents. Le Monde understood: if AI assistants cite your articles, you become the infrastructure of truth. The outlet that refuses to feed the LLMs disappears from the answers. The one that negotiates its place survives.
3ïžâƒŁ Embrace total transparency about AI use. The reader forgives AI. They do not forgive the lie. Prisma Media produces 40% AI content? Fine. But say so. The outlet that plays it transparent wins trust.

My conviction: journalism will not die. But the journalist producing content that AI does better will. The investigators, the analysts, the editorialists will remain. Those who think.

Series "La Chute des GĂ©ants - Saison 3" [13/15]. Yesterday: Clifford Chance, Gide, Bredin Prat. Tomorrow: HEC, ESSEC, INSEAD.

🚀 Executives: is your monitoring human or automated? 👉 https://lnkd.in/e6k46944
🎓 Consultants: AI communication is a new playing field. 👉 https://lnkd.in/eaJd3bZ8
🎯 Free masterclass on April 16: go from prompting to agentic AI in 1 hour. 👉 https://lnkd.in/eZGGrvvY
🚀 Our bootcamps run on Claude. An AI project? 👉 https://decisionia.com/rdv
You are reading AI-written articles without knowing it. Does that bother you? 👇
Scooped by Gilbert C FAURE
April 1, 3:59 AM

Are boys really in crisis? What the science says in the age of the manosphere | Helen Pearson | 15 comments

Are boys in ‘crisis’ — and is the manosphere playing a part? My new feature for Nature Magazine looks at data on boys and young men, including education, health and attitudes. And it asks whether talk of a male crisis risks fueling hostility towards, or sidelining, women and girls. https://lnkd.in/eJGhkXuA

The data and interviews suggest that: 
- Globally, more boys than girls are out of school; young men are less likely to attend higher education. 

- Injuries — from road accidents, violence, self-harm — are strikingly higher for male adolescents. More boys than girls die by suicide.

- Mental health disorders are a large and growing problem for boys and girls. 

- Stereotypical ideas of masculinity are common e.g. that men must be tough, self-sufficient, financial providers and in control in relationships.  
In one survey, 63% of young men said they regularly engaged with a masculinity or men influencer. But research on the manosphere and its impact is still limited.

It’s uncomfortable & controversial to talk about ‘boys in crisis’ in the face of entrenched and worsening discrimination against girls and women. Many things are worse for adolescent girls. 

The message I heard was: it's important to understand the challenges that all young people are facing. 
 
Many thanks to the researchers & experts who spoke to me about this important topic – one that I was particularly interested to report on as the mum of three boys.
Scooped by Gilbert C FAURE
April 1, 3:56 AM

#é›Łé“èŠç­‰æ”¶ćˆ°é€šçŸ„é‚Łć€©æˆ‘æ‰è”° | Edwin Chan

#é›Łé“èŠç­‰æ”¶ćˆ°é€šçŸ„é‚Łć€©æˆ‘æ‰è”° | Edwin Chan | Notebook or My Personal Learning Network | Scoop.it
A few days ago, over lunch with some senior academics, I heard a really meaningful saying:

â€œć­žèĄ“äžæ˜Żćż™ć‡șäŸ†çš„ïŒŒè€Œæ˜Żé–’ć‡ș䟆的” (Science doesn't come from being constantly busy— it emerges from having periods of idleness.)

That instantly reminded me of this powerful insight on creativity in science from Max Perutz, a Nobel laureate:
"Creativity in science, as in art, cannot be organized. It arises spontaneously from individual talent. Well-run laboratories can foster it, but hierarchical organizations, inflexible bureaucratic rules, and mountains of futile paperwork can kill it. Discoveries cannot be planned—they pop up, like Puck, in unexpected corners."

In a world full of endless meetings, grant deadlines, metrics, and "productivity" pressure, these two thoughts hit hard.

Real breakthroughs often come not from grinding harder, but from protecting unstructured time—time to think, wander, connect dots that no schedule could predict.

Scooped by Gilbert C FAURE
March 31, 4:24 AM

#ai #artificialintelligence #genai #aiagents #futureofwork #automation #techtrends #ai2026 | Harish kumar

Most people are still using AI like a chatbot.

That’s the biggest mistake in 2026. ⚠

Because AI is evolving into something much bigger


👉 Agentic AI— systems that don’t just respond
but think, plan, and execute tasks on their own.

If you understand this, you’re already ahead of 99% 🚀


Here’s the simple breakdown of how modern AI works:

đŸ”č AI & ML Foundations
– NLP, Deep Learning, Transformers
– The core that powers everything

đŸ”č Gen AI Layer
– Text, Image, Audio, Video generation
– Prompt Engineering + RAG

đŸ”č AI Agents
– Tool usage & automation
– Memory + decision-making
– Multi-step task execution

đŸ”č Agentic AI (Next Level)
– Autonomous systems
– Goal-based execution
– Self-improving workflows

---

💡 In simple words:
We are moving from asking AI → to assigning work to AI


⚡ If you learn this now:
You won’t just use AI
You’ll build systems that work for you 24/7


📌 Save this before it disappears

🔁 Repost to help others learn AI

đŸ‘„ Tag someone who needs to see this

💬 What’s your take on Agentic AI?

🚀 Follow Harish Kumar for more AI insights


#AI #ArtificialIntelligence #GenAI #AIAgents #FutureOfWork #Automation #TechTrends #AI2026
Scooped by Gilbert C FAURE
March 31, 4:13 AM

The IKEA Effect: Why Effort Increases Ownership - Learnnovators | Learnnovators®

Some learning sticks because it’s clear.
Some sticks because it’s repeated.

But some stays with us simply because we built it ourselves.

Our latest blog explores the IKEA Effect and why effort increases ownership. When people contribute, solve, or create, learning stops feeling like something delivered to them and starts feeling like something they own.

That shift matters.

Because effort changes the relationship people have with what they learn.
It deepens engagement.
It strengthens memory.
And most importantly, it makes people far more likely to use it.

In learning design, the goal isn’t just to make things easy.
It’s to make space for contribution.

When learners build, decide, and shape outcomes, even in small ways, the experience becomes personal. And personal learning is the kind that lasts.

📌 Write to elearning@learnnovators.com to craft learning that transforms behaviour.

#LearningDesign #LearningScience #WorkplaceLearning #InstructionalDesign

https://lnkd.in/eMnCi9Nx
Scooped by Gilbert C FAURE
March 30, 3:42 AM

PARHAF, a human-authored corpus of clinical reports for fictitious patients in French

PARHAF, a human-authored corpus of clinical reports for fictitious patients in French

Xavier Tannier (Sorbonne Université, Université Sorbonne Paris Nord, Inserm, Limics, F-75006 Paris, France)
Salam Abbara (Université Paris-Saclay, UVSQ, Assistance Publique-HÎpitaux de Paris, Raymond Poincaré University Hospital, Infectious Disease Department, Garches, France; Yonsei University College of Medicine, Gangnam Severance Hospital, Department of Laboratory Medicine, Seoul, South Korea)
RĂ©mi Flicoteaux (Assistance Publique-HĂŽpitaux de Paris, Department of Medical Information, Paris, France)
Youness Khalil (Health Data Hub, 75015, Paris, France)
Aurélie Névéol (Université Paris-Saclay, CNRS, LISN, 91400, Orsay, France)
Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, 91400, Orsay, France)
Emmanuel Bacry (Health Data Hub, 75015, Paris, France; Université Paris-Dauphine, PSL, CNRS, CEREMADE, 75016, Paris, France)

Abstract

The development of clinical natural language processing (NLP) systems is severely hampered by the sensitive nature of medical records, which restricts data sharing under stringent privacy regulations, particularly in France and the broader European Union. To address this gap, we introduce PARHAF, a large open-source corpus of clinical documents in French. PARHAF comprises expert-authored clinical reports describing realistic yet entirely fictitious patient cases, making it anonymous and freely shareable by design. The corpus was developed using a structured protocol that combined clinician expertise with epidemiological guidance from the French National Health Data System (SNDS), ensuring broad clinical coverage. A total of 104 medical residents across 18 specialties authored and peer-reviewed the reports following predefined clinical scenarios and document templates. The corpus contains 7,394 clinical reports covering 5,009 patient cases across a wide range of medical and surgical specialties.
It includes a general-purpose component designed to approximate real-world hospitalization distributions, and four specialized subsets that support information-extraction use cases in oncology, infectious diseases, and diagnostic coding. Documents are released under a CC-BY open license, with a portion temporarily embargoed to enable future benchmarking under controlled conditions. PARHAF provides a valuable resource for training and evaluating French clinical language models in a fully privacy-preserving setting, and establishes a replicable methodology for building shareable synthetic clinical corpora in other languages and health systems.

Corresponding author: Xavier Tannier, xavier.tannier@sorbonne-universite.fr

1 Background & Summary

1.1 Context and Motivation

Much of the information in electronic health records is conveyed by text such as clinical notes and discharge summaries (see, e.g., [17]). Natural language processing aims to unlock that information and make it available for downstream tasks. Publicly available clinical text corpora are a key asset to design, tune, and evaluate clinical natural language processing systems [9]. Sharing clinical text is, however, difficult: the tension between individual data privacy and corpus distributability has been widely acknowledged as the central obstacle to making clinical corpora publicly available [3]. Some resources have been made available for U.S. English clinical NLP over the past few decades [9], starting with the 2007 Computational Medicine Challenge [29] and the i2b2 series of clinical NLP shared tasks [34], many of which relied on the MIMIC database [30, 16]. However, access hurdles remain particularly salient in the French context. The European regulatory framework, among the most protective of health data, imposes severe restrictions on the circulation and secondary use of medical records. This creates a marked scarcity of open and usable corpora of French medical reports [27].
Beyond data access, models trained on clinical reports may themselves become sensitive, as they can memorize patient information during training, making the sharing of trained models legally and ethically challenging [2]. Together, these factors create a fragmented ecosystem in which institutions and research teams operate in isolation, unable to effectively pool data or models. This combination of restricted data access, model sensitivity, lack of open resources, and resulting fragmentation severely limits the development and robust evaluation of NLP systems applied to French clinical text. This creates the following challenge: Given this privacy bottleneck, how can a large, realistic, and fully privacy-preserving corpus of clinical reports be created to help clinical language processing research and development while being freely shareable? To address this challenge, NLP researchers have studied de-identification methods that remove personally identifying information from original clinical text [34, 26, 11, 5, 4] and used them to de-identify clinical datasets such as that in MIMIC [26]. However, the resulting text is pseudonymized (directly identifying information has been removed) but not anonymized (there is no guarantee that reidentification is impossible). This prevents it from being freely distributed. In the United States, the MIMIC database can be shared under a stringent data-use agreement, but it remains unclear whether this protocol is compatible with E.U. regulation. For this reason, clinical document collections in French (e.g., the MERLOT corpus [3] and other corpora extracted from French clinical data warehouses [15, 33]) were used in evaluation studies with explicit targeted ethical board approval but could not be shared due to privacy restrictions. Hahn [12] notes that, faced with the non-shareability of real patient records, NLP researchers have developed a variety of proxies for clinical text. 
One type of proxy is machine translation of English clinical datasets: Becker et al. [1] translated into German the ShARe/CLEF eHealth 2013 training dataset based on MIMIC-II data [30], Neves et al. [28] translated some clinical cases from English into French (with a focus on evaluating the performance of machine translation for measures and acronyms) and Frei et al. [7] translated into German the 2018 n2c2 shared task dataset that reused data from MIMIC-III [16]. However, translated text requires thorough human review, and cultural and health system differences make the resulting text sensibly different from native clinical text. Another proxy is synthetic clinical text. GraSCCo [24] manually edited 63 deidentified German discharge summaries and case reports at multiple linguistic levels to make reidentification virtually impossible. Recent efforts have also explored the use of autoregressive generative language models to produce synthetic clinical documents in English [14], French [13], German [8], Swedish and Spanish [35]. Nonetheless, the balance between privacy and utility of the resulting material needs further analysis [21, 6]. Published case reports are a more distant proxy for clinical text, but their open-source status and existence in multiple languages have made them particularly attractive for clinical NLP. Case reports have been collected, for instance, in the following corpora: CAS [10] (French), E3C [20] (Italian, English, French, Spanish, and Basque), CANTEMIST [22] and DISTEMIST [23] (Spanish), and FRASIMED [36] (French translations of CANTEMIST and DISTEMIST). The style of case reports, however, is quite different from that of electronic health records. The closest proxy for true clinical texts is those written by health care professionals about fictitious patients, for instance, in medical textbooks or course material. The JSynCC corpus [19] extracted 400 operative reports and 470 case reports from such textbooks. 
The initial copyright on the textbooks, though, prevents the free distribution of the corpus. The PARROT corpus [18] contains 2,658 radiology reports about fictitious patients, including 475 in French, written on a volunteer basis by healthcare professionals from 21 countries. This endeavor was made possible through human networking, including leveraging professional radiological societies, which may be difficult to scale up to a diversity of medical specialties.

1.2 Objectives and Contributions

To overcome this limitation, the approach adopted in the present work was to ask healthcare professionals to write new clinical reports describing fictitious patients specifically for the creation of a shareable corpus, and to distribute these reports under an open license. Because the reports are created for this purpose and do not derive from real patient data, they are anonymous and shareable by design. However, this approach raises important methodological questions: how can such reports be generated in a way that ensures both medical realism and statistical representativeness while preserving privacy? To address this challenge, we designed a corpus creation protocol that leverages clinicians’ expertise while being guided by public health statistics, principles of corpus development [31, 37], and a set of predefined clinical scenarios. The protocol was implemented using a large pool of French-speaking clinicians through a partnership with associations and unions of medical residents across multiple medical and surgical specialties, which recruited 104 residents as report authors. Guidelines were developed for selecting clinical cases, using data from the French National Health Data System (SNDS [25]) as reference scenarios for report creation, and the residents authored synthetic medical reports following these guidelines. The resulting open-source French-language corpus can now be used to train and evaluate language models on targeted medical use cases.
In this article, we introduce this open-source corpus of French clinical documents. PARHAF comprises 7,394 expert-authored clinical reports describing 5,009 realistic yet fictitious patient cases. Each case is accompanied by structured documentation of the underlying clinical scenario, including the primary diagnosis, main procedure, care pathway, and discharge information when applicable. We further provide three specialized subsets specifically designed to support information extraction tasks in oncology and infectious diseases. This corpus offers a valuable resource for the development and evaluation of clinical NLP models, directly tackling the root cause of all the challenges outlined above: the inherently sensitive nature of clinical data. We release 6,185 documents corresponding to 4,254 fictitious patients under an open-source CC-BY license. The remaining portion of the corpus will be temporarily embargoed to enable future evaluations under controlled conditions, thereby limiting the risk of large language model contamination through prior exposure to the data.

1.3 Intended uses of PARHAF

This corpus is intended for research, development, and educational purposes in clinical natural language processing. It enables the sharing of clinical-style notes and annotations and supports community-wide pooling of efforts around a common, openly accessible resource. The corpus is suitable for benchmarking French medical language models, including large language models, and for conducting reproducible clinical NLP research under controlled and privacy-safe conditions. The corpus also supports uses for medical teaching, such as training medical students and residents in structured clinical report writing, diagnostic reasoning, and clinical information synthesis. It can also serve as a resource for clinical case preparation and supports training in clinical natural language processing using realistic yet fictitious reports without exposure to sensitive patient data.
The corpus further enables privacy-preserving data augmentation, either as a standalone resource or as a complement to restricted-access clinical datasets, provided its fictitious nature is explicitly acknowledged. Finally, the representativeness of part of the corpus is geared towards three use cases of the PARTAGES project, allowing methodological comparisons across these specific use cases.

1.4 Limitations and non-intended uses

Although efforts were made to create a diverse corpus that includes a variety of document types and clinical specialties, the corpus does not cover all specialties and variations of French clinical text. This corpus is intended for research purposes only, specifically for training and evaluating natural language processing models on French clinical text. It is not a substitute for clinically validated data and must not be used to support regulatory approval, clinical certification, or deployment decisions in real healthcare settings. It is not suitable for clinical use. It cannot be used for clinical decision-making, diagnosis, prognosis, treatment, or patient care. Models trained or evaluated on this data are not clinically validated, and results obtained on this corpus cannot be presented as evidence of clinical performance or safety. The corpus does not support generalization claims to real hospitals, regions, or clinical practices, nor does it allow epidemiological or population-level inference, as its distributions do not reflect real-world prevalence. It is also unsuitable for longitudinal studies or for assessing real-world clinical risk or safety, including rare adverse events or edge cases, and must not be used as a replacement for real clinical data in deployment settings. Finally, the corpus does not capture the operational constraints of real clinical environments (e.g., time pressure, workload, interruptions) and should not be used for stress-testing models under realistic clinical conditions.
2 Methods

2.1 Challenges

Building the PARHAF corpus required addressing two main challenges. The first was ensuring that recruited physicians and collected texts adequately represent the relevant dimensions of clinical language, while remaining within concrete implementation constraints (a limited author pool, a fixed corpus size, and the specific use cases targeted by the project). The second was encouraging healthcare professionals to write reports that closely resemble real clinical documents, while minimizing the risk of privacy leaks.

2.2 Clinical Scenario Design

For the above reason, we deemed it essential to provide relatively precise guidelines to assist healthcare professionals in authoring clinical reports that closely resemble real-world documents while minimizing the risk of privacy breaches. These guidelines addressed both the content and the format of the documents. Given that the recruited physicians primarily worked in hospital settings, the resulting corpus of documents focused predominantly on hospital-based clinical situations.

2.2.1 Content Development

The selection of clinical scenarios was guided by our goal of guaranteeing the representativeness of the clinical situations actually observed in French hospitals (see Section 2.3.1) and by the constraint of ensuring physicians were familiar with the clinical situation in relation to their specialty of practice or training. Both aspects were addressed using hospitalization claims data available in the French National Health Data System (SNDS [25]). Scenarios were constructed by sampling observed distributions of Diagnosis-Related Groups (DRGs), principal diagnoses (ICD-10), age, sex, type of management (e.g., ambulatory surgery), and admission and discharge modes (e.g., emergency department admission). DRGs were used (in a less formal format) to describe the type of hospitalization (e.g., surgery, medicine) and to map clinical cases to physicians’ qualifications (specialties).
Secondary diagnoses were incorporated into the scenarios as a list of 10 randomly selected diagnoses from the pool of diagnoses frequently associated with the primary diagnosis-DRG pair. Patient names were assigned at random. Based on these core elements, authors were encouraged to develop the clinical case details, enriching the content with relevant and realistic information, maintaining medical consistency with the baseline information, and ensuring depth and authenticity while adhering to principles of plausibility and ethics.

2.2.2 Document Format

For document format, we aligned official recommendations with physicians’ actual practices to develop specialized templates for each type of hospitalization:

‱ Medical hospitalization
– Hospital discharge summary
‱ Surgical hospitalization
– Pre-operative consultation report (for scheduled admissions)
– Operative report
– Hospital discharge summary (for ambulatory surgery, a single document was requested combining both the operative report and the discharge summary)
‱ Obstetrics (childbirth)
– Pre-delivery hospitalization report (for high-risk pregnancies) or emergency department visit report (for low-risk pregnancies)
– Delivery room report
– Postpartum hospitalization report (maternity ward)
‱ Oncology
– Pathology report

For discharge summaries, the template included: department name, reason for admission, medical history, surgical history, family history, allergies, lifestyle factors, treatment at admission, history of the present illness, clinical examination, complementary investigations, in-hospital course, discharge treatment, and conclusion. Similar minimal templates were developed for surgeries, obstetrics, and pathology reports. Authors were encouraged to follow these structures or to write in free text format, provided that all required information was included.
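As a rough illustration (not the project's tooling; the English section names are translations of the template fields described above), a completeness check over a free-text discharge summary could look like:

```python
# Section headings of the discharge-summary template, as listed in the text
# (English renderings; the actual templates are in French).
DISCHARGE_SUMMARY_SECTIONS = [
    "department name", "reason for admission", "medical history",
    "surgical history", "family history", "allergies", "lifestyle factors",
    "treatment at admission", "history of the present illness",
    "clinical examination", "complementary investigations",
    "in-hospital course", "discharge treatment", "conclusion",
]

def missing_sections(report_text: str) -> list:
    """Return template sections that a free-text report never mentions."""
    lowered = report_text.lower()
    return [s for s in DISCHARGE_SUMMARY_SECTIONS if s not in lowered]
```

Because authors could also write in free text, a check like this would only flag candidate omissions for human review, not reject a document outright.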
Finally, a structured summary section was completed at the end of each report, in which the authors specified the primary diagnosis, length of stay, and associated diagnoses mentioned in the report. The use of generative artificial intelligence tools was discouraged because it could bias both the content and the stylistic features of the reports.

2.3 Document Type and Distribution Strategy

This corpus is structured in two complementary components, targeting a total of 5,000 patients. The primary component (n = 3,900) includes patients across a wide range of medical specialties and is designed to maximize diversity and approximate representativeness, although the target size does not allow full coverage of the spectrum of possible clinical cases. The secondary component focuses on specific use cases (ICD-10 coding, oncology, and infectious diseases) and comprises patients selected outside the main distribution to support more targeted evaluation scenarios.

2.3.1 Core distribution

To approximate real-world distributions of medical activity, we relied on diagnosis frequencies derived from the SNDS [25], which provides exhaustive, nationwide hospital claims data and served as a proxy for the underlying epidemiological and care distribution across medical conditions. For the year 2024, the national claims database consisted of approximately 18 million hospitalizations drawn from the SNDS. From the data, we defined clinical cases as the association of a DRG, sex, age group, and length-of-stay group. With these associations, we created a sampling database of 100,000 different clinical cases. These cases covered around 4,000 distinct ICD-10 primary diagnoses. To ensure patient privacy and data confidentiality, the sampling strategy over this clinical case distribution adheres to the principles of k-anonymity and l-diversity [32].
To preserve epidemiological realism while avoiding excessive over-representation of very frequent conditions, which would reduce clinical diversity in the corpus, we applied a square-root transformation to the empirical frequencies, yielding a preliminary sampling probability proportional to sqrt(f_i), where f_i is the frequency of condition i in the SNDS data. To further limit the dominance of the most common conditions, we capped this value at a maximum probability p_max (corresponding to a 0.1% sampling chance) and renormalized, giving the final sampling probability for each condition:

p_i = min(p_max, sqrt(f_i)) / sum_j min(p_max, sqrt(f_j))

In practice, this theoretical distribution required iterative adjustment to account for operational constraints. Because hired authors had uneven expertise across medical specialties, document production could not be distributed uniformly, and not all specialties could be covered. The final allocation therefore used the square-root-with-cap model as a guiding principle, with reallocation based on actual case availability and author capacity, while preserving broad clinical coverage. Figures S1 and S2 in the Supplementary Materials provide, respectively, a detailed breakdown of these adjustments by specialty and the number of cases written per author.

2.3.2 Specific Use Cases

In addition to the documents from the initial distribution, four specific sets of reports were assembled. Each set was designed to address a specific clinical information extraction use case:

Coding. Surgery reports from digestive surgery, orthopedic surgery, traumatology, and urology were specifically collected for a use case on ICD-10 diagnostic coding.
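The capped square-root sampling scheme can be sketched in a few lines of Python (an illustrative reimplementation of the published formula, not the authors' code; the example frequencies, ICD-10 codes, and cap value are made up):

```python
import math

def sampling_probabilities(freqs: dict, p_max: float) -> dict:
    """Capped square-root sampling:
    p_i = min(p_max, sqrt(f_i)) / sum_j min(p_max, sqrt(f_j)).

    freqs maps each condition to its empirical frequency f_i in the claims
    data; p_max caps the unnormalized weight of very frequent conditions.
    """
    # The square-root transform flattens the empirical distribution; the cap
    # then limits the dominance of the most common conditions.
    weights = {c: min(p_max, math.sqrt(f)) for c, f in freqs.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

# Toy example: the frequent condition is capped, rare ones are boosted
# relative to their raw frequencies.
probs = sampling_probabilities({"I10": 0.05, "J18": 0.001, "C50": 0.0001}, p_max=0.1)
```

The cap plus square root compress the spread between frequent and rare conditions while keeping their relative ordering, which is exactly the diversity-preserving behavior the text motivates.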
Identifying biomarkers in oncology. Pathology reports containing descriptions of tumor biomarkers used to inform diagnosis, prognosis, and targeted therapy selection in oncology: tissue and genomic alterations such as mutations, amplifications, and protein expression. The use case for this dataset is the automatic identification of these biomarkers.

Identifying the response to treatment in oncology. Oncology consultation reports mentioning treatment response (complete response, partial response, stable disease, progressive disease, not applicable, or indeterminate). The associated use case aims at classifying RECIST-style information from these reports.

Infectiology. Reports describing infectious episodes (including bacteremia) along with the causative bacteria and the primary site of infection.

Other use cases (pseudonymization and summarization) are planned for the PARTAGES project (described below). However, the documents dedicated to these tasks do not require a distribution that differs from that described in the previous section.

2.4 Implementation

The PARTAGES project, funded by the French government under the France 2030 initiative (operated by Bpifrance), brings together a consortium of 32 partners, including research teams, public and private healthcare institutions, and AI-focused deeptech companies. Its aim is to develop open resources to support the emergence of generative AI solutions in healthcare. The creation of the PARHAF corpus of clinical reports is one of the consortium’s initiatives. The corpus development was initiated through a scoping phase involving NLP experts and physicians, aimed at balancing production volume with budgetary constraints. Writing time was estimated at 60 minutes per document: 45 minutes for the first author and 15 minutes for review, validation, and correction by a second expert.
This estimation was based on the expertise of the physicians involved and consultation with twelve residents from different specialties, active within their respective residents’ associations. A maximum duration of 60 minutes per document was retained, with a gross hourly compensation of €40, corresponding to a maximum of €40 per completed and reviewed report. Recruitment was conducted through a temporary employment agency. A national outreach campaign targeted 21 residents’ associations across different specialties in France. Eleven associations disseminated the call for participation, representing the following specialties: internal medicine, infectious diseases, visceral surgery, obstetrics and gynecology, neurology, pulmonology, public health, urology, oncology, anesthesiology and intensive care, and anatomical pathology. The call for participation was further circulated via the project’s hospital partners and their associated medical networks, enabling the inclusion of residents from additional specialties: nephrology, hematology, orthopedics, pediatrics, gastroenterology and hepatology, and cardiology. More than 500 applications were received, reflecting strong engagement from the medical community. A final panel of 104 residents was selected, prioritizing residents in the later stages of training and ensuring broad geographic representation. This response confirmed residents’ commitment to developing digital commons and supporting generative AI projects in healthcare. To ensure consistency and quality of the outputs, a structured support framework was implemented, including regular webinars, methodological guidelines, a centralized communication platform, and dedicated support. Contributors were also involved in methodological refinements through specialty-specific meetings, enabling adaptation of instructions to clinical practice and ensuring the validity and representativeness of the corpus. 
From a financial perspective, operational costs primarily consisted of physician compensation, increased by the management fees of the temporary employment agency (multiplicative coefficient of 1.9 applied to the gross remuneration). The production of the 7,394 clinical reports, totaling 5,518 hours of effective work, resulted in a total operational cost of approximately €495,000.

3 Data Records

PARHAF consists of a single JSON file containing structured metadata about fictitious patients and the clinical documents associated with them. Each entry in the data array corresponds to one patient record and includes patient-level metadata, contextual information about the care scenario, and a list of associated documents. The documents themselves are not embedded in the JSON file. Instead, each document is referenced via a relative file path pointing to an external text file. These text files are stored separately and organized by medical specialty, with one directory per specialty. Each document file contains raw, unannotated plain text in French, with no markup, labels, or structural tags applied. The JSON file therefore acts as the index and metadata layer of the corpus, while the directory structure contains the raw textual content. The linkage between metadata and text relies exclusively on the relative paths specified in the documents[].path fields. The high-level structure of the JSON format and the path-based schema description are described in Figure 1 and Table 1, respectively. In addition to this standalone dataset, we also distribute a Hugging Face dataset (Parquet/Arrow) that is a derived representation generated automatically from the JSON files. Both formats therefore contain identical information and differ only in storage layout. The PARHAF corpus is openly available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license and the Etalab 2.0 license. It was released on March 25, 2026.
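Under the layout described above, iterating over the corpus might look like the following sketch (the documents[].path field is from the paper; the top-level "data" key and the per-document dict shape are assumptions about the exact schema):

```python
import json
from pathlib import Path

def iter_corpus(index_file: str, root_dir: str):
    """Yield (patient_record, document_text) pairs from a PARHAF-style index.

    The JSON index carries the metadata; each document is resolved through
    the relative path stored in documents[].path.
    """
    root = Path(root_dir)
    with open(index_file, encoding="utf-8") as f:
        index = json.load(f)
    for patient in index["data"]:  # assumed name of the top-level array
        for doc in patient["documents"]:
            text = (root / doc["path"]).read_text(encoding="utf-8")
            yield patient, text
```

For the Hugging Face distribution, the same content can instead be loaded through the datasets library, since both formats are stated to carry identical information.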
The primary distribution is on Hugging Face (https://huggingface.co/datasets/HealthDataHub/PARHAF).

4 Data Overview

A subset of the dataset is temporarily embargoed to enable future evaluations under controlled conditions, thereby limiting the risk of large language model contamination through prior exposure to the data. In total, we release 4,254 patients (6,185 documents) and keep 755 patients (1,209 documents) under embargo for future release. Figures 2, 3, and 4 show, respectively, the patient count per medical specialty, the population pyramid, and the document and word counts by document type.

5 Technical Validation

All contributors involved in report writing and validation completed standardized online training covering the study objectives, authoring guidelines, and validation procedures. The full clinical scenario framework was presented in detail to ensure a consistent understanding of context and expectations. Additional specialty-specific instructions were provided where relevant, including requirements regarding the number and types of documents per patient and scenario-dependent constraints. To preserve ecological validity and avoid burdening contributors with rigid formatting that would diverge from real-world clinical practice, reports were produced as guided but free-text documents within an online text-editing environment.

Human validators were responsible for content quality control, assessing clinical coherence and adherence to the instructions. This step resulted in a rejection rate of 3.6% of submitted documents. Beyond human review, an automated “sanity check” pipeline was implemented to verify compliance with key structural and procedural requirements. These checks included validating the document type and the expected number of documents per patient, confirming final hospitalization duration entries, and recording required clinical procedures or acts.
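The sanity checks listed above can be sketched as a per-patient validation pass. This is an illustrative assumption-laden sketch, not the project's pipeline (which is in the GitHub repository cited below): the allowed document types, field names, and expected counts here are invented for the example.

```python
# Hypothetical allowed document types; the real list is project-specific.
ALLOWED_TYPES = {"hospitalization report", "discharge letter", "operative report"}

def sanity_check(patient, expected_doc_count):
    """Return a list of human-readable issues for one patient record."""
    issues = []
    docs = patient.get("documents", [])
    # Check the expected number of documents for this patient's scenario.
    if len(docs) != expected_doc_count:
        issues.append(f"expected {expected_doc_count} documents, found {len(docs)}")
    # Check that every document carries a recognized type.
    for doc in docs:
        if doc.get("type") not in ALLOWED_TYPES:
            issues.append(f"unknown document type: {doc.get('type')!r}")
    # Check required structured fields at the patient level.
    if "hospitalization_duration_days" not in patient:
        issues.append("missing final hospitalization duration")
    if not patient.get("procedures"):
        issues.append("no clinical procedures or acts recorded")
    return issues

record = {"documents": [{"type": "discharge letter"}],
          "hospitalization_duration_days": 4,
          "procedures": ["appendectomy"]}
print(sanity_check(record, expected_doc_count=1))  # → []
```

A record that fails any check would instead yield a non-empty issue list, which is the kind of output that triggered the 120 manual corrections reported below.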
This automated stage ensured the consistency of core structured elements across the corpus. Following automated validation, 120 documents required manual correction. These interventions addressed minor typographical errors within structured fields, formatting deviations that interfered with automated parsing, or unauthorized modifications to the original instruction template. No corrections altered the underlying clinical content; all changes were limited to restoring technical conformity with the dataset specifications.

6 Data Availability

The PARHAF corpus is openly available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license and the Etalab 2.0 license. It was released on March 25, 2026. The primary distribution is on Hugging Face at https://huggingface.co/datasets/HealthDataHub/PARHAF.

7 Code Availability

The code used for data processing and quality control is publicly available on GitHub at https://github.com/xtannier/PAHRAF_cleaning_and_publication. This includes scripts for converting source documents from .docx to plain text, generating JSON metadata, building the Hugging Face Parquet dataset, and running the automated sanity-check pipeline described above.

8 Author Contributions

Xavier Tannier: Conceptualization, Methodology, Software, Validation, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Funding acquisition. Salam Abbara: Conceptualization, Methodology, Writing - Review & Editing, Funding acquisition. RĂ©mi Flicoteaux: Conceptualization, Methodology, Validation, Writing - Review & Editing. Youness Khalil: Methodology, Writing - Review & Editing, Project administration. AurĂ©lie NĂ©vĂ©ol: Conceptualization, Writing - Review & Editing, Funding acquisition. Pierre Zweigenbaum: Conceptualization, Writing - Review & Editing, Funding acquisition.
Emmanuel Bacry: Conceptualization, Supervision, Project coordination, Funding acquisition, Writing - Review & Editing.

9 Competing Interests

The authors declare no competing interests related to this work.

10 Acknowledgements

We thank the authors of the reports for their contribution and feedback on the protocol, as well as the PARTAGES consortium members for fruitful discussions towards corpus development. We also thank Florian Pons for operational support and project coordination.

11 Funding

This work was carried out as part of the PARTAGES project, awardee of the Bpifrance France 2030 call for proposals “Digital Commons for Generative Artificial Intelligence.”

12 Ethics Statement

Through their affiliations with French public service agencies, the developers of the PARHAF corpus have benefited from access to SNDS data. The clinical scenarios used to write the clinical documents in the PARHAF corpus are based on aggregated public health statistics and do not pertain to identifiable real patients. No private information about individual subjects was used in this study; therefore, no IRB or ethics approval was required to create or distribute the PARHAF corpus. The clinical document authors involved in this study were apprised of the full document creation protocol. Participation was voluntary, and authors were compensated for their work in accordance with French labor laws. The PARHAF corpus is intended for use as educational material and as support for the development and evaluation of clinical NLP systems. It is not intended for clinical use.

Supplementary Material

Figure S1 shows the changes in distribution for each specialty after the adjustment described in Section 2.3.1. Figure S2 illustrates the number of patients written by each author.
Scooped by Gilbert C FAURE
March 29, 3:50 AM

98% of people don’t fail with AI



They’re just stuck on beginner tools forever.

And then wonder why results stay average.

When ChatGPT and some AI tools launched towards the end of 2022, 
I was doing everything the “easy way”.
→ I used basic prompts
→ Used generic tools
→ Got average results
But it felt productive.

But now I do 10x more work in half the time.
With a better tool stack.

That’s when it clicked:
AI rewards leverage.

And the gap between novice and expert level AI tools is where that leverage lives.

Here’s what changed everything for me:

‱ Beginners use AI to assist
‱ Experts use AI to do the work

‱ Beginners ask questions
‱ Experts build workflows and agents

‱ Beginners chase outputs
‱ Experts chase systems

Now look at how the shift actually plays out:

→ Presentations: Gamma creates beautiful presentations
→ Data: Rows for structured analysis
→ Research: ChatGPT Deep Research for depth
→ Learning: NotebookLM for different learning formats
→ Video: VEED for end-to-end video content editing
→ Coding: Claude Code for precision
→ Websites: Webflow for aesthetic websites
→ Apps: Replit to build anything

The difference is the level of thinking behind the tool.

Do’s for expert-level AI usage:
✔ Build repeatable workflows
✔ Combine multiple tools into one pipeline
✔ Use AI for execution
✔ Validate outputs with real-world data
✔ Continuously upgrade your stack

Don’ts for expert-level AI usage:
✖ Don’t rely on a single tool for everything
✖ Don’t skip context 
✖ Don’t blindly trust outputs 
✖ Don’t ignore speed vs quality tradeoffs
✖ Don’t stay comfortable with beginner setups

Most people get stuck because beginner tools feel “good enough”.
But “good enough” is the enemy of scale.

If you want unfair advantage,
You need better tools.
Used like an expert.

Check the infographic 👇
I’ve covered all the expert-level tools clearly.

And now I’m curious:
What do you think about using expert-level AI tools?
Are they overkill or the real edge?
Comment below 👇

➕ Follow for more breakdowns like this.

♻ Repost to help your network use expert-level AI tools.